David Bau
@davidbau.bsky.social
2.1K followers 240 following 140 posts
Interpretable Deep Networks. http://baulab.info/ @davidbau
Looking forward to #COLM2025 tomorrow. DM me if you'll also be there and want to meet to chat.
Who is going to be at #COLM2025?

I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.

And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...
There are a lot of interesting details that surface when you use SAEs to understand and control diffusion image synthesis models. Learn more in @wendlerc.bsky.social's talk.
On the Good Fight podcast w substack.com/@yaschamounk I give a quick but careful primer on how modern AI works.

I also chat about our responsibility as machine learning scientists, and what we need to fix to get AI right.

Take a listen and reshare -

www.persuasion.community/p/david-bau
David Bau on How Artificial Intelligence Works
Yascha Mounk and David Bau delve into the “black box” of AI.
www.persuasion.community
I love the 'opinionated' approach taken by Aaron + team in this survey. It captures the ongoing work around the central causal puzzles we face in mechanistic interpretability.
amuuueller.bsky.social
What's the right unit of analysis for understanding LLM internals? We explore this question in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
kmahowald.bsky.social
At UT we just got to hear about this in a zoom talk from @sfeucht.bsky.social. I echo the endorsement:
cool ideas about representations in llms with linguistic relevance!
The takeaway for me: LLMs separate their token processing from their conceptual processing, akin to humans' dual-route processing of speech.

We need to be aware of whether an LM is thinking about tokens or about concepts.

It does both, and it makes a difference which way it's thinking.
If token-processing and concept-processing are largely separate, does killing one kill the other? Chris Olah's team in Olsson 2022 hypothesized that ICL emerges from token induction.

@keremsahin22.bsky.social + Sheridan are finding cool ways to look into Olah's induction hypothesis too!
The representation space within the concept induction heads also has a more "meaningful" geometry than the transformer as a whole.

Sheridan discovered (NeurIPS mechint 2025) that semantic vector arithmetic works better in this space. (Token semantics work in token space.)

arithmetic.baulab.info/
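For intuition, here is a minimal sketch of what "semantic vector arithmetic in hidden-state space" means: take hidden states from a mid layer, form king − man + woman, and see which candidate it lands nearest. This is not the paper's code; the model (GPT-2), layer index, and probe words are illustrative assumptions.

```python
# Minimal sketch of semantic vector arithmetic on hidden states.
# Assumptions: GPT-2, layer 6, and these probe words are placeholders;
# the actual setup at arithmetic.baulab.info differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL, LAYER = "gpt2", 6
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)

def hidden(word: str) -> torch.Tensor:
    """Hidden state of the word's final token at LAYER."""
    ids = tok(word, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]

# king - man + woman, scored by cosine similarity against candidates
v = hidden(" king") - hidden(" man") + hidden(" woman")
for cand in [" queen", " princess", " table"]:
    sim = torch.cosine_similarity(v, hidden(cand), dim=0)
    print(cand, round(sim.item(), 3))
```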
If you disable token induction heads and ask the model to copy text with only the concept induction heads, it will NOT copy exactly. It will paraphrase the text.

That happens even for computer code. The model copies the BEHAVIOR of the code, but writes it in a totally different way!
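If you want to play with this kind of knockout yourself, HuggingFace models accept a head_mask that zeroes out chosen attention heads. A rough sketch only: the (layer, head) pairs below are placeholders, not the token induction heads the paper actually identifies.

```python
# Sketch: knock out chosen attention heads and compare the model's copying
# behavior on a repeated sequence. The heads listed are hypothetical
# placeholders, NOT the heads identified in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ABLATE = [(5, 1), (5, 5)]  # (layer, head) pairs -- placeholders
head_mask = torch.ones(model.config.n_layer, model.config.n_head)
for layer, head in ABLATE:
    head_mask[layer, head] = 0.0  # zero these heads' contributions

prompt = "One two three four five. One two three four"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    clean = model(ids).logits[0, -1]
    ablated = model(ids, head_mask=head_mask).logits[0, -1]
print("clean prediction:  ", tok.decode([clean.argmax().item()]))
print("ablated prediction:", tok.decode([ablated.argmax().item()]))
```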
An amazing thing about the "concepts" in this 2nd route: they are *not* literal words. They are totally language-independent.

If the target context is in Chinese, the heads will copy the concept into Chinese. Or patch the concepts between runs to get Italian. They mediate translation.
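"Patch them between runs" is ordinary activation patching. A generic sketch of the idea, not Sheridan's setup: the layer, position, and prompts are assumptions. Copy one layer's hidden state at one position from a source run into a target run and watch the target prediction change.

```python
# Generic activation-patching sketch. Layer, position, and prompts are
# illustrative assumptions, not the paper's concept-copying setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER, POS = 8, -1  # which block and token position to patch (placeholders)

def capture(prompt):
    """Run the source prompt and record the hidden state at (LAYER, POS)."""
    store = {}
    def hook(mod, inp, out):
        store["h"] = out[0][:, POS, :].detach().clone()
    handle = model.transformer.h[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return store["h"]

source_h = capture("The capital of France is Paris")

def patch(mod, inp, out):
    """Overwrite the target run's hidden state with the source activation."""
    out[0][:, POS, :] = source_h
    return out

handle = model.transformer.h[LAYER].register_forward_hook(patch)
with torch.no_grad():
    logits = model(**tok("The capital of Italy is", return_tensors="pt")).logits
handle.remove()
print("patched prediction:", tok.decode([logits[0, -1].argmax().item()]))
```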
This second set of text-copying attention heads also shows up in every LLM we tested, and these heads work in a totally different way from token induction heads.

Instead of copying tokens, they copy *concepts*.
So Sheridan scrutinized copying mechanisms in LLMs and found a SECOND route.

Yes, the token induction of Elhage and Olsson is there.

But there is *another* route where the copying is done in a different way. It shows up in attention heads that do 2-ahead copying.
bsky.app/profile/sfe...
Sheridan's erasure is Bad News for induction heads.

Induction heads are how transformers copy text: they find earlier tokens in identical contexts. (Elhage 2021, Olsson 2022 arxiv.org/abs/2209.11895)

But when that context "what token came before" is erased, how could induction possibly work?
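To see the original token-induction mechanism for yourself, the standard diagnostic is to feed a repeated random-token sequence and measure how much each head attends from a position in the second copy back to the token that followed the previous occurrence of the same token. A simplified stand-in for the Elhage/Olsson metric, using GPT-2 as an assumed example model:

```python
# Rough prefix-matching test for token induction heads (a simplified
# stand-in for the metric in Elhage 2021 / Olsson 2022). On a repeated
# random sequence, an induction head at position i attends back to the
# position right after the previous occurrence of token i.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

torch.manual_seed(0)
half = torch.randint(1000, 20000, (1, 50))   # random token ids
ids = torch.cat([half, half], dim=1)          # ...repeated once
with torch.no_grad():
    attns = model(ids).attentions             # per layer: [1, heads, T, T]

T = ids.shape[1]
L = T // 2
q = torch.arange(L, T - 1)   # query positions in the second copy
k = torch.arange(1, L)       # "one past the previous occurrence" keys
scores = {}
for layer, a in enumerate(attns):
    for head in range(a.shape[1]):
        scores[(layer, head)] = a[0, head, q, k].mean().item()

for (layer, head), s in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"layer {layer} head {head}: prefix-matching score {s:.2f}")
```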
Why is that weird? In a transformer, each token position knows its context ("which tokens came before"), and with probes Sheridan found that this info is always there when sequences are meaningLESS.

But in meaningFUL phrases, the LM often ERASES the context!!

Exactly the opposite of what we expected.
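The probes here are simple in spirit: train a linear classifier to read "which token came before" out of a later layer's hidden state. Below is a toy version of that kind of probe; the layer and the tiny dataset are placeholders, not the paper's setup.

```python
# Toy linear probe: can a later layer's hidden state tell us which token
# came *before* this position? Layer and data are placeholders.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
LAYER = 6  # placeholder

texts = ["the cat sat on the mat", "a dog ran in the park",
         "the sun set over the hill", "a bird flew over the lake"]
X, y = [], []
for t in texts:
    ids = tok(t, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids).hidden_states[LAYER][0]   # [seq_len, hidden_dim]
    for pos in range(1, ids.shape[1]):
        X.append(hs[pos].numpy())                 # state at this position
        y.append(ids[0, pos - 1].item())          # id of the previous token

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy recovering the previous token:", probe.score(X, y))
```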
The work starts with a mystery!

In footprints.baulab.info (EMNLP), while dissecting how LMs read badly tokenized words like " n.ort.he.astern", Sheridan found a huge surprise: they do it by _erasing_ contextual information.
As you know ndif.us/ is a "white-box" inference service.

It lets you crack open the model and trace and modify its internals. We run the models for you on NSF servers.

Up to now, NDIF has supported a small set of a dozen models.
NSF National Deep Inference Fabric
NDIF is a research computing project that enables researchers and students to crack open the mysteries inside large-scale AI systems.
ndif.us
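If you haven't tried it, interventions on NDIF go through the nnsight library. Roughly, it looks like the sketch below; the API details and the way you point it at remotely hosted models are my assumptions from the public nnsight docs, so check nnsight.net and ndif.us for the current interface.

```python
# Sketch of white-box access in the nnsight style. The exact API (trace
# context, .save(), .value) and the option to run remotely on NDIF-hosted
# models are assumptions -- consult the nnsight/NDIF docs for specifics.
from nnsight import LanguageModel

model = LanguageModel("gpt2")

with model.trace("The Eiffel Tower is in the city of"):
    # Save the residual-stream output of block 6 for inspection afterwards.
    hidden = model.transformer.h[6].output[0].save()
    # Save the final logits too.
    logits = model.lm_head.output.save()

# Depending on the nnsight version, saved results are read via .value
print(hidden.value.shape, logits.value.shape)
```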
Especially if you're curious about questions like
• What is GPT-OSS 120B thinking inside?
• What does OLMo-32B learn between all its hundreds of checkpoints?
• Why do Qwen3 layers have such different roles from Llama's?
• How does Foundation-Sec reason about cybersecurity?
Announcing a broad expansion of the National Deep Inference Fabric.

This could be relevant to your research...
The NDIF YouTube talk series continues... Don't miss the fascinating talks by Xu Pan and Josh Engels on the NDIF YouTube channel.

www.youtube.com/channel/UCaQ...
In the wake of the Jimmy Kimmel firing: Do not underestimate the power of the truth.

The truth is our superpower.

davidbau.com/archives/202...
The Truth is Our Superpower
davidbau.com