Andrew Drozdov
@mrdrozdov.com
5.2K followers 610 following 180 posts
Research Scientist @ Mosaic x Databricks. Adaptive Methods for Retrieval, Generation, NLP, AI, LLMs https://mrdrozdov.github.io/
Pinned
mrdrozdov.com
Using 100+ tokens to answer 2 + 3 =
Reposted by Andrew Drozdov
markriedl.bsky.social
The transformer was invented at Google. RLHF was not invented in industry labs, but came to prominence at OpenAI and DeepMind. I took 5 of the most influential papers (black dots) and visualized their references. Blue dots are papers that acknowledge federal funding (DARPA, NSF).
Reposted by Andrew Drozdov
flrp.bsky.social
LongEval is turning three this year!

This is a Call for Participation for our CLEF 2025 Lab - see how your IR system does in the long term.

Check the details on our page:
clef-longeval.github.io
LongEval 2025
mrdrozdov.com
The PhD is pretraining. Interview prep is alignment. Take this to heart. :)
Reposted by Andrew Drozdov
markar.bsky.social
We have updated #nocha, a leaderboard for reasoning over long-context narratives 📖, with some new models including #Gemini 2.5 Pro which shows massive improvements over the previous version! Congrats to #Gemini team 🪄 🧙 Check 🔗 novelchallenge.github.io for details :)
Leaderboard showing performance of language models on claim verification task over book-length input. o1-preview is the best model with 67.36% accuracy followed by Gemini 2.5 Pro with 64.17% accuracy.
mrdrozdov.com
I think ARR used to do this? Seems like it’s missing in the recent cycle(s).

stats.aclrollingreview.org/iterations/2...
ARR Dashboard
mrdrozdov.com
A corollary here is that a relevant context might not improve the probability of the right answer.
mrdrozdov.com
Perhaps the most misunderstood aspect of retrieval: For a context to be relevant, it is not enough for it to improve the probability of the right answer.
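A minimal sketch of the quantity these two posts are talking about (my own illustration, not from the thread): score the gold answer's log-probability with and without a retrieved context using a small causal LM. The point of the posts is that this delta on its own does not certify relevance, and a genuinely relevant context may not move it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum of token log-probs of `answer` given `prompt`.

    Assumes prompt + answer tokenize into the prompt's tokens followed by the
    answer's tokens (true here for GPT-2 with a leading space on the answer).
    """
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    n_prompt = prompt_ids.shape[1]
    # keep only the positions that predict answer tokens
    return token_lp[0, n_prompt - 1:].sum().item()

query = "Q: Who wrote 'Born to Run'?\nA:"
context = "Born to Run is a 1975 song by Bruce Springsteen.\n"
answer = " Bruce Springsteen"

delta = answer_logprob(context + query, answer) - answer_logprob(query, answer)
print(f"log p(answer) improvement from context: {delta:.3f}")
```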
Reposted by Andrew Drozdov
danliden.com
MLflow is on BlueSky! Follow @mlflow.org to keep up to date on new releases, blogs and tutorials, events, and more.
mrdrozdov.com
---Born To Add, Sesame Street
---(sung to the tune of Bruce Springsteen’s Born to Run)
mrdrozdov.com
One, and two, and three police persons spring out of the shadows
Down the corner comes one more
And we scream into that city night: “three plus one makes four!”
Well, they seem to think we’re disturbing the peace
But we won’t let them make us sad
’Cause kids like you and me baby, we were born to add
mrdrozdov.com
"How Claude Code is using a 50-Year-Old trick to revolutionize programming"
mrdrozdov.com
Somehow my most controversial take of 2025 is that agents relying on grep are a form of RAG.
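The shape of the claim is easy to show in code. A rough sketch (mine, not any particular agent's implementation; `call_llm` is a hypothetical stand-in for whatever model the agent actually invokes): grep is the retrieval step, the matches are the augmented context, and the model call is the generation step.

```python
import subprocess

def grep_retrieve(pattern: str, repo_dir: str, max_hits: int = 20) -> list[str]:
    # Retrieval step: lexical search over the codebase with plain grep.
    proc = subprocess.run(["grep", "-rn", pattern, repo_dir],
                          capture_output=True, text=True)
    return proc.stdout.splitlines()[:max_hits]

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for whatever model the agent actually calls.
    raise NotImplementedError

def answer_with_grep_rag(question: str, pattern: str, repo_dir: str) -> str:
    hits = grep_retrieve(pattern, repo_dir)    # retrieve
    context = "\n".join(hits)                  # augment the prompt
    prompt = f"Repo context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)                    # generate
```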
mrdrozdov.com
Search is the key to building trustworthy AI and will only be more important as we build more ambitious applications. With that in mind, there's not nearly enough energy spent improving the quality of search systems.

Follow the link for the full episode:
www.linkedin.com/posts/data-b...
Data Brew by Databricks on LinkedIn: Join us on the latest Data Brew episode for a deep dive on Retrieval…
Join us on the latest Data Brew episode for a deep dive on Retrieval, rerankers, and RAG tips and tricks with our very own Andrew Drozdov, Research Scientist…
mrdrozdov.com
It was a real pleasure talking about effective IR approaches with Brooke and Denny on the Data Brew podcast.

Among other things, I'm excited about embedding finetuning and reranking as modular ways to improve RAG pipelines. Everyone should use these more!
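As a concrete example of the "modular" part, a reranker can be dropped between an existing retriever and the generator without touching either. A small sketch (my own, not from the episode; the cross-encoder checkpoint is just a common public one, not a recommendation from the podcast):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Score each (query, document) pair and reorder the first-pass results.
    scores = reranker.predict([(query, doc) for doc in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]

# Usage: slot between the existing retriever and the generator.
# top_docs = rerank(user_query, first_pass_docs)
```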
mrdrozdov.com
We're probably a little too obsessed with zero-shot retrieval. If you have documents (you do), then you can generate synthetic data and finetune your embedding. Blog post led by @jacobianneuro.bsky.social shows how well this works in practice.

www.databricks.com/blog/improvi...
Improving Retrieval and RAG with Embedding Model Finetuning
Fine-tune embedding models on Databricks to enhance retrieval and RAG accuracy with synthetic data—no manual labeling required.
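For the flavor of the recipe, here is a compressed sketch of my own under simple assumptions, not the blog's exact pipeline (`generate_query` stands in for whatever LLM writes the synthetic queries): generate a query per document chunk, then finetune the embedder on the resulting pairs with an in-batch-negatives loss.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

def generate_query(document: str) -> str:
    # Placeholder: in practice, prompt an LLM to write a question the chunk answers.
    return "What is this passage about? " + document[:50]

documents = ["chunk one of your corpus ...", "chunk two of your corpus ..."]
pairs = [InputExample(texts=[generate_query(d), d]) for d in documents]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any base embedder
loader = DataLoader(pairs, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
model.save("finetuned-embedder")
```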
mrdrozdov.com
I do want to see aggregate stats about the model’s generation, but total reasoning tokens is perhaps the least informative one.
mrdrozdov.com
"All you need to build a strong reasoning model is the right data mix."

The pipeline that creates the data mix:
mrdrozdov.com
After frequent road runs during a Finland visit I tend to feel the same
mrdrozdov.com
Using 100+ tokens to answer 2 + 3 =
mrdrozdov.com
It’s pretty obvious we’re in a local minimum for pretraining. Would expect more breakthroughs in the 5-10 year range. Granted, it’s still incredibly hard and expensive to do good research in this space, despite the number of labs working on it.
kyunghyuncho.bsky.social
"the gap between OAI/Anthropic/Meta/etc. and a large group of companies all over the world you've never cared to know of, in terms of LM pre-training? tiny" - 💡 me (Nov 2, 2024)
Reposted by Andrew Drozdov
susiedent.com
Word of the day (of course) is ‘scurryfunging’, from US dialect: the frantic attempt to tidy the house just before guests arrive.
Reposted by Andrew Drozdov
kyunghyuncho.bsky.social
... didn't know this would be one of the hottest takes i've had ...

for more on my thoughts, see drive.google.com/file/d/1sk_t...