Vaishnavh Nagarajan
@vaishnavh.bsky.social
3.2K followers 380 following 170 posts
Foundations of AI. I like simple and minimal examples and creative ideas. I also like thinking about the next token 🧮🧸 Google Research | PhD, CMU | https://arxiv.org/abs/2504.15266 | https://arxiv.org/abs/2403.06963 vaishnavh.github.io
Reposted by Vaishnavh Nagarajan
csdatcmu.bsky.social
Congratulations to CSD faculty Aditi Raghunathan and her research collaborators on receiving an ICML Outstanding Paper award for Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction (icml.cc/virtual/2025...).

Paper: arxiv.org/abs/2504.15266
ICML 2025 Awards
icml.cc
Reposted by Vaishnavh Nagarajan
eugenevinitsky.bsky.social
Reading the dedications of a PhD thesis is often a cure for a bad day. There’s so much affection in them
Reposted by Vaishnavh Nagarajan
abeirami.bsky.social
As the NeurIPS review deadline is around the corner, please remember that you cannot use any non-local LLM (like ChatGPT/Gemini) to understand a paper or to draft/revise your review, as that breaks the confidentiality agreement.

NeurIPS 2025 Official LLM Policy:
neurips.cc/Conferences/...
LLM Policy
neurips.cc
Reposted by Vaishnavh Nagarajan
gtof.bsky.social
I really enjoyed "When We Cease to Understand the World", although it's more fiction than history of science
Reposted by Vaishnavh Nagarajan
jburroni.bsky.social
“Science in History” by Bernal is my first recommendation. The work of Ian Hacking is a good recommendation for probability.
Reposted by Vaishnavh Nagarajan
eugenevinitsky.bsky.social
When we are doing science, we are unknowingly executing our mythology, taken from movies and friends and textbooks, of what science is. History of science helps us ground that myth in reality
vaishnavh.bsky.social
I finally wrote a full-fledged blog about this: reading the history of science is an **amazing** yet under-recognized way to develop (emotional) maturity as a researcher.

If you have thoughts/recommendations, please share!
vaishnavh.github.io/2025/04/29/h...
vaishnavh.bsky.social
haha that's a new idiom for me. it's perfect! and the flip side is "target and (potentially) regret," which causes quite a lot of stress. (what if your work gets rejected by the community or, worse, overlooked?)
vaishnavh.bsky.social
but these pressures are real and have always persisted.

I think @abeirami.bsky.social may be interested in this rant.
vaishnavh.bsky.social
but now I have the maturity to seek validation from things like "a specific person complimenting my work" or, even better, "a meaningful citation where someone substantially builds on my work." (ofc, i also seek internal validation/satisfaction, but I gotta be realistic, lol).
vaishnavh.bsky.social
i struggled intensely, first-hand, with a lot of these effects during my phd since i had ≤1 paper/year for the most part. i only started managing it after one of my papers got visibly recognized by experts at one point. i still struggle with it at some level.
vaishnavh.bsky.social
then there are many other insidious feedback cycles, like the fact that publishing more => more visibility => more opportunities/networks/interfaces with the community/more citations => more opportunities/internships etc. => more papers
vaishnavh.bsky.social
for example, with the advent of twitter, there's pressure to stay constantly visible and to have many different things to say every now and then (because everyone else is doing that), rather than pitch your one paper again and again, which starts feeling awkward :-(
vaishnavh.bsky.social
someday I hope to write a blog about "all the other forces that discourage me from publishing less." people always say "publish less!" without acknowledging these varied and nuanced forces.
vaishnavh.bsky.social
all other incentivization strategies I had thought of are much more negative/mean, like:
- "evaluating someone based on their bottom-k papers" or
- "judging someone negatively for publishing >N papers"
vaishnavh.bsky.social
haha thank you! honored you feel that way!

btw, i just noticed that this sort of compliment is actually a great way to incentivize people to be more selective in publishing papers (and to counter all the other forces that discourage me from sticking to my rate of ~1 paper a year)
Reposted by Vaishnavh Nagarajan
tpimentel.bsky.social
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our new ACL paper proposes an observational method to estimate this causal effect! Longer thread soon!
Title of paper "Causal Estimation of Tokenisation Bias" and schematic of how we define tokenisation bias, which is the causal effect we are interested in.
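(The ~17x gap above is between LMs trained from scratch with each tokenisation, and the paper's contribution is an observational causal estimator. Just to make "probability under a segmentation" concrete, here is a minimal toy sketch that scores the same string under two candidate piece sequences with an off-the-shelf causal LM; the gpt2 checkpoint and the specific pieces are my assumptions, not from the paper.)

```python
# Toy illustration only (not the paper's estimator): sum the conditional log-probs a
# causal LM assigns to a fixed sequence of vocabulary pieces, for two segmentations
# of the same string. Pieces are assumed to exist in the vocab; unknown pieces would
# silently map to the unk id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # any causal LM works in principle
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def seq_logprob(pieces):
    # Sum of log p(piece_t | preceding pieces), with BOS as the initial context.
    ids = [tok.bos_token_id] + tok.convert_tokens_to_ids(pieces)
    with torch.no_grad():
        logits = model(torch.tensor([ids])).logits
    logps = torch.log_softmax(logits[0, :-1], dim=-1)   # predict token t+1 from its prefix
    return sum(logps[t, ids[t + 1]].item() for t in range(len(ids) - 1))

print(seq_logprob(["hello"]))        # one-piece segmentation ⟨hello⟩
print(seq_logprob(["he", "llo"]))    # two-piece segmentation ⟨he, llo⟩
```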
Reposted by Vaishnavh Nagarajan
saxelab.bsky.social
How does in-context learning emerge in attention models during gradient descent training?

Sharing our new Spotlight paper @icmlconf.bsky.social: Training Dynamics of In-Context Learning in Linear Attention
arxiv.org/abs/2501.16265

Led by Yedi Zhang with @aaditya6284.bsky.social and Peter Latham
Reposted by Vaishnavh Nagarajan
eugenevinitsky.bsky.social
This paper is quite nice. It mixes some useful toy models of creativity with insights about how to induce more creativity in LLMs with methods that do better than greedy sampling
vaishnavh.bsky.social
📢 New #paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity as they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ #MLSky #AI #arxiv 🧵👇🏽
vaishnavh.bsky.social
But, there's a lot of scope for exciting work:
→ generalizing these insights to real cows,
→ studying RL/CoT for creativity,
→ understanding surprising behaviors of seed-conditioning 10/👇🏽
vaishnavh.bsky.social
Of course, this is all a study of spherical cows. 🐮
Given the noisy, subjective studies of real cows, we believe an objective study brings
→much-needed clarity of thought (like disentangling the two modes of creativity),
→more ideas,
→better-defined experiments. 9/👇🏽
vaishnavh.bsky.social
Our vision is that seed-conditioning can help models sample a latent thought and articulate that one thought into words,

but temp sampling has to articulate multiple latent thoughts in parallel to produce a marginal next-word distribution -- this is more burdensome! 8/👇🏽
vaishnavh.bsky.social
Next, we revisit how to produce randomness: the go-to temp sampling 🌡️ vs. injecting a random prefix (seed-conditioning). 🌱

Remarkably, seed-conditioning produces meaningful diversity even w *greedy* decoding 🤑; it is competitive with temp & in some conditions, superior. 7/👇🏽
Figure showing algorithmic creativity with and without seed-conditioning.
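(To make the contrast in 7/ concrete: in the paper, seed-conditioning is something the model is trained with, so the sketch below only illustrates the decoding-time difference the post describes. It uses an off-the-shelf checkpoint and made-up hyperparameters (prompt, seed length, temperature), so treat it as a rough sketch rather than the paper's method.)

```python
# Minimal sketch of the two ways of injecting randomness contrasted above, assuming a
# HuggingFace-style causal LM: temperature sampling vs. prepending a random token
# prefix ("seed") and then decoding greedily.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
prompt_ids = tok("Once upon a time", return_tensors="pt").input_ids

def temperature_sample(temp=1.0, max_new_tokens=30):
    # Go-to approach: randomness comes from sampling each next token at temperature `temp`.
    out = model.generate(prompt_ids, do_sample=True, temperature=temp,
                         max_new_tokens=max_new_tokens, pad_token_id=tok.eos_token_id)
    return tok.decode(out[0], skip_special_tokens=True)

def seed_conditioned_greedy(seed_len=5, max_new_tokens=30):
    # Seed-conditioning style: prepend `seed_len` random tokens, then decode *greedily*;
    # all the randomness lives in the seed prefix, none in the decoding step.
    seed = torch.randint(0, model.config.vocab_size, (1, seed_len))
    out = model.generate(torch.cat([seed, prompt_ids], dim=1), do_sample=False,
                         max_new_tokens=max_new_tokens, pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, seed_len:], skip_special_tokens=True)  # drop the random seed

print(temperature_sample())
print(seed_conditioned_greedy())
```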