Danny To Eun Kim
@teknology.bsky.social
1K followers 420 following 20 posts
PhD student @CMU LTI NLP | IR | Evaluation | RAG https://kimdanny.github.io
teknology.bsky.social
Excited to present at #CLEF2025 #Touché Lab (Session 2) shared task "Advertisement in RAG"🇪🇸!
@webis.de
🗓️Sept 9 (Tue)
⏲️5:20PM (CEST) / 11:20AM (EDT)
📍Florentino Sanz Room
🧠https://arxiv.org/abs/2507.00509
Join us for insights on #RAG + advertising!
advertisement generation and detection in RAG
Reposted by Danny To Eun Kim
bmitra.bsky.social
Some exciting news! 🤗 After 3 amazing years at TREC, the Tip-of-the-Tongue (ToT) shared task will be a core task at NTCIR-19 in 2026. The new track will focus on tip-of-the-tongue information needs in English and East Asian languages.

More details coming soon. See you all in Tokyo next year!
an aerial view of tokyo at night with lots of lights
Reposted by Danny To Eun Kim
bmitra.bsky.social
Gentle reminder 📢
All run submissions for the Tip-of-the-Tongue (ToT) Track are due next week Wednesday (Aug 27).

More info: trec-tot.github.io/guidelines
#TREC2025 #TRECToT #TREC2025ToT
teknology.bsky.social
This year's TREC Tip of the Tongue (ToT) track will be amazing! Based on our rigorous experiments on synthetic ToT query generation presented at #SIGIR2025, we extended the track to open domain ToT queries.
We provide code for baseline systems, and submissions are due by August 27th!
bmitra.bsky.social
Important announcement: All run submissions for TREC'25 Tip-of-the-Tongue (TREC-ToT) Track are due by **August 27th**. The run submission form is now open. Please submit your runs before the deadline.

More information: trec-tot.github.io/guidelines
#TREC2025 #TRECToT #TREC2025ToT

Spread the word!
Reposted by Danny To Eun Kim
maik-froebe.bsky.social
To Eun Kim just presented the work on "Tip of the Tongue Query Elicitation for Simulated Evaluation" at #SIGIR2025. The approach will be used in the #TREC2025 Tip-of-the-Tongue track, and we had some sweets at the poster :)

The paper is available online: dl.acm.org/doi/10.1145/...
Reposted by Danny To Eun Kim
bmitra.bsky.social
Hello TREC-ToTers!

We have released the test queries for the TREC 2025 Tip-of-the-Tongue (TREC-ToT) Track. Please see the guidelines for more information: trec-tot.github.io/guidelines. Run submission deadline will tentatively be in August. #TREC2025 #TRECToT #TREC2025ToT

Please spread the word!
teknology.bsky.social
❓How do LLMs respond to fair ranking in RAG?
🤩 See how fair ranking boosts downstream utility while promoting fairer attribution of cited sources.
Catch our oral presentation at #ICTIR2025!
#SIGIR2025 @841io.bsky.social
teknology.bsky.social
Heading to #NeurIPS2024 to present our ‘Fair RAG’ paper at the #AFME2024 workshop! Let's talk about RAG, Information Retrieval, and Fairness. Honored that our paper was selected as one of the Top 5 Spotlight Papers! 🎉 Let’s connect and chat!
Paper: arxiv.org/abs/2409.11598
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
Many language models now enhance their responses with retrieval capabilities, leading to the widespread adoption of retrieval-augmented generation (RAG) systems. However, despite retrieval being a cor...
arxiv.org
Reposted by Danny To Eun Kim
maik-froebe.bsky.social
Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :)

The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API.

More details are available at: trec-tot.github.io/guidelines
Dory from Finding Nemo with the quote: "I remember it like it was yesterday. Of course, I don't remember yesterday."
Reposted by Danny To Eun Kim
shaily99.bsky.social
🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]
An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes how the authors survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). They then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).
Reposted by Danny To Eun Kim
bmitra.bsky.social
Hello TREC-ToTers! 👋🏽

Excited to announce the release of TREC 2025 Tip-of-the-Tongue (TREC-ToT) Track guidelines: trec-tot.github.io/guidelines. We will release test queries in July and run submission deadline will be in August. #TREC2025 #TRECToT #TREC2025ToT

Please register to participate:
TREC 2025 Tip-of-the-Tongue (ToT) Track
Tip of the tongue: The phenomenon of failing to retrieve something from memory, combined with partial recall and the feeling that retrieval is imminent.
trec-tot.github.io
Reposted by Danny To Eun Kim
athiya.bsky.social
Ever trusted a metric that works great on average, only for it to fail in your specific use case?

In our #NAACL2025 paper (w/ @841io.bsky.social), we show why global evaluations are not enough and why context matters more than you think.

📄 aclanthology.org/2025.finding...
#NLP #Evaluation

(🧵1/9)
Reposted by Danny To Eun Kim
841io.bsky.social
If you're working on a recall-oriented task or with ranking systems evaluated across varied users, content, or intents, check it out. 5/5

dl.acm.org/doi/10.1145/...
Reposted by Danny To Eun Kim
841io.bsky.social
📢 New Paper: "Recall, Robustness, and Lexicographic Evaluation" (ACM TORS)
F Diaz, M Ekstrand (@md.ekstrandom.net), B Mitra (@bmitra.bsky.social)

For IR, NLP, and ML researchers working on ranking systems evaluated for recall and robustness. 🧵 1/5 dl.acm.org/doi/10.1145/...
A Venn diagram showing that recall and robustness, each of which has many different conceptions, intersect when thinking about recall as "totality" and robustness as "worst-case performance". It's in this intersection that lexicographic recall (lexirecall) lives.
teknology.bsky.social
Here's an overview of TREC 2024 TOT track runs with the test queries:
trec.nist.gov/pubs/trec33/...
trec.nist.gov
teknology.bsky.social
Yes! That's exactly the case of TOT retrieval for academics :)
teknology.bsky.social
⚡️Multi-Domain Coverage
Combining both methods allows TOT query evaluation in multiple domains. We tested simulated evaluation in the Movie, Landmark, and Person domains. Moreover, we built a broader, more inclusive TOT test collection.
teknology.bsky.social
Solution2️⃣: Human-Elicitation
We designed an interface with visual prompts to induce a TOT state in human participants. Their queries closely match authentic TOT queries and capture genuine TOT experiences in a controlled setting.
Human TOT query elicitation interface
teknology.bsky.social
Solution1️⃣: LLM-Elicitation
We built a TOT user simulator to produce synthetic queries. Results show high system rank correlation and linguistic similarity compared to real queries. This scalable simulated evaluation method overcomes data scarcity by simulating new queries on demand.
System rank correlation as a validation method for synthetic TOT queries.
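For readers unfamiliar with system rank correlation, here is a minimal sketch of the idea: score the same set of retrieval systems once on real queries and once on synthetic queries, then check whether both query sets order the systems the same way. Kendall's tau is one common coefficient for this; the post does not say which one was used, so treat the choice as an assumption. The system names and scores below are made up for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system_name: score} dicts.

    Counts system pairs ordered the same way (concordant) versus
    the opposite way (discordant) by the two score sets.
    """
    systems = sorted(set(scores_a) & set(scores_b))
    if len(systems) < 2:
        raise ValueError("need at least two shared systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Ties (product == 0) count toward neither side in tau-a.
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical effectiveness scores (e.g. NDCG) per system; illustrative only.
real = {"bm25": 0.12, "dense": 0.21, "rerank": 0.30, "llm": 0.35}
synthetic = {"bm25": 0.15, "dense": 0.19, "rerank": 0.33, "llm": 0.31}

tau = kendall_tau(real, synthetic)
print(round(tau, 3))  # 0.667: only the rerank/llm pair swaps order
```

A tau close to 1 means the synthetic queries rank systems almost exactly as the real queries do, which is the property a simulated test collection needs.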
teknology.bsky.social
🤔Why the Problem?
TOT query data collection relies heavily on community question answering websites (e.g., Reddit). This causes data availability issues and domain bias (most TOT queries end up being about movies or books).
teknology.bsky.social
👅Tip-of-the-Tongue (TOT) search is a complex form of known-item search, shaped by the expression of partial recall, personal context, and uncertain memories. However, TOT research has long been hindered by the scarcity of high-quality TOT queries.
teknology.bsky.social
🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research!

We address data limitations and offer a fresh evaluation method for these complex queries.

Curious how TREC TOT track test queries are created? Check out this thread 🧵 and our paper 📄: arxiv.org/abs/2502.17776
Tip of the Tongue Query Elicitation for Simulated Evaluation
Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scena...
arxiv.org