David Bau
@davidbau.bsky.social
2.1K followers 240 following 140 posts
Interpretable Deep Networks. http://baulab.info/ @davidbau
Looking forward to #COLM2025 tomorrow. DM me if you'll also be there and want to meet to chat.
Who is going to be at #COLM2025?

I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.

And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...
There are a lot of interesting details that surface when you use SAEs to understand and control diffusion image synthesis models. Learn more in @wendlerc.bsky.social's talk.
On the Good Fight podcast w substack.com/@yaschamounk I give a quick but careful primer on how modern AI works.

I also chat about our responsibility as machine learning scientists, and what we need to fix to get AI right.

Take a listen and reshare -

www.persuasion.community/p/david-bau
David Bau on How Artificial Intelligence Works
Yascha Mounk and David Bau delve into the “black box” of AI.
www.persuasion.community
I love the 'opinionated' approach taken by Aaron + team in this survey. It captures the ongoing work around the central causal puzzles we face in mechanistic interpretability.
amuuueller.bsky.social
What's the right unit of analysis for understanding LLM internals? We explore this question in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
kmahowald.bsky.social
At UT we just got to hear about this in a zoom talk from @sfeucht.bsky.social. I echo the endorsement:
cool ideas about representations in llms with linguistic relevance!
The takeaway for me: LLMs separate their token processing from their conceptual processing, akin to humans' dual-route processing of speech.

We need to be aware of whether an LM is thinking about tokens or about concepts.

It does both, and it makes a difference which way it's thinking.
If token-processing and concept-processing are largely separate, does killing one kill the other? Chris Olah's team in Olsson 2022 hypothesized that ICL emerges from token induction.

@keremsahin22.bsky.social + Sheridan are finding cool ways to look into Olah's induction hypothesis too!
The representation space within the concept induction heads also has a more "meaningful" geometry than the transformer as a whole.

Sheridan discovered (NeurIPS mechint 2025) that semantic vector arithmetic works better in this space. (Token semantics work in token space.)

arithmetic.baulab.info/
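For intuition, here is a minimal sketch of what "semantic vector arithmetic in hidden-state space" means: take hidden states from a mid layer, form king − man + woman, and see which candidate it lands nearest. This is not the paper's code; the model (GPT-2), layer index, and probe words are illustrative assumptions.

```python
# Minimal sketch of semantic vector arithmetic on hidden states.
# Assumptions: GPT-2, layer 6, and these probe words are placeholders;
# the actual setup at arithmetic.baulab.info differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL, LAYER = "gpt2", 6
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)

def hidden(word: str) -> torch.Tensor:
    """Hidden state of the word's final token at LAYER."""
    ids = tok(word, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]

# king - man + woman, scored by cosine similarity against candidates
v = hidden(" king") - hidden(" man") + hidden(" woman")
for cand in [" queen", " princess", " table"]:
    sim = torch.cosine_similarity(v, hidden(cand), dim=0)
    print(cand, round(sim.item(), 3))
```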
If you disable token induction heads and ask the model to copy text with only the concept induction heads, it will NOT copy exactly. It will paraphrase the text.

That happens even for computer code. The model copies the BEHAVIOR of the code, but writes it in a totally different way!
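If you want to play with this kind of knockout yourself, HuggingFace models accept a head_mask that zeroes out chosen attention heads. A rough sketch only: the (layer, head) pairs below are placeholders, not the token induction heads the paper actually identifies.

```python
# Sketch: knock out chosen attention heads and compare the model's copying
# behavior on a repeated sequence. The heads listed are hypothetical
# placeholders, NOT the heads identified in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ABLATE = [(5, 1), (5, 5)]  # (layer, head) pairs -- placeholders
head_mask = torch.ones(model.config.n_layer, model.config.n_head)
for layer, head in ABLATE:
    head_mask[layer, head] = 0.0  # zero these heads' contributions

prompt = "One two three four five. One two three four"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    clean = model(ids).logits[0, -1]
    ablated = model(ids, head_mask=head_mask).logits[0, -1]
print("clean prediction:  ", tok.decode([clean.argmax().item()]))
print("ablated prediction:", tok.decode([ablated.argmax().item()]))
```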
An amazing thing about the "concepts" in this 2nd route: they are *not* literal words. They are totally language-independent.

If the target context is in Chinese, the heads will copy the concept into Chinese. Or patch the concepts between runs to get Italian. They mediate translation.
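"Patch them between runs" is ordinary activation patching. A generic sketch of the idea, not Sheridan's setup: the layer, position, and prompts are assumptions. Copy one layer's hidden state at one position from a source run into a target run and watch the target prediction change.

```python
# Generic activation-patching sketch. Layer, position, and prompts are
# illustrative assumptions, not the paper's concept-copying setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER, POS = 8, -1  # which block and token position to patch (placeholders)

def capture(prompt):
    """Run the source prompt and record the hidden state at (LAYER, POS)."""
    store = {}
    def hook(mod, inp, out):
        store["h"] = out[0][:, POS, :].detach().clone()
    handle = model.transformer.h[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return store["h"]

source_h = capture("The capital of France is Paris")

def patch(mod, inp, out):
    """Overwrite the target run's hidden state with the source activation."""
    out[0][:, POS, :] = source_h
    return out

handle = model.transformer.h[LAYER].register_forward_hook(patch)
with torch.no_grad():
    logits = model(**tok("The capital of Italy is", return_tensors="pt")).logits
handle.remove()
print("patched prediction:", tok.decode([logits[0, -1].argmax().item()]))
```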
This second set of text-copying attention heads also shows up in every LLM we tested, and these heads work in a totally different way from token induction heads.

Instead of copying tokens, they copy *concepts*.
So Sheridan scrutinized copying mechanisms in LLMs and found a SECOND route.

Yes, the token induction of Elhage and Olsson is there.

But there is *another* route where the copying is done in a different way. It shows up in attention heads that do 2-ahead copying.
bsky.app/profile/sfe...
Sheridan's erasure is Bad News for induction heads.

Induction heads are how transformers copy text: they find earlier tokens in identical contexts. (Elhage 2021, Olsson 2022 arxiv.org/abs/2209.11895)

But when that context "what token came before" is erased, how could induction possibly work?
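To see the original token-induction mechanism for yourself, the standard diagnostic is to feed a repeated random-token sequence and measure how much each head attends from a position in the second copy back to the token that followed the previous occurrence of the same token. A simplified stand-in for the Elhage/Olsson metric, using GPT-2 as an assumed example model:

```python
# Rough prefix-matching test for token induction heads (a simplified
# stand-in for the metric in Elhage 2021 / Olsson 2022). On a repeated
# random sequence, an induction head at position i attends back to the
# position right after the previous occurrence of token i.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

torch.manual_seed(0)
half = torch.randint(1000, 20000, (1, 50))   # random token ids
ids = torch.cat([half, half], dim=1)          # ...repeated once
with torch.no_grad():
    attns = model(ids).attentions             # per layer: [1, heads, T, T]

T = ids.shape[1]
L = T // 2
q = torch.arange(L, T - 1)   # query positions in the second copy
k = torch.arange(1, L)       # "one past the previous occurrence" keys
scores = {}
for layer, a in enumerate(attns):
    for head in range(a.shape[1]):
        scores[(layer, head)] = a[0, head, q, k].mean().item()

for (layer, head), s in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"layer {layer} head {head}: prefix-matching score {s:.2f}")
```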
Why is that weird? In a transformer, each token position knows its context ("which tokens came before"), and with probes Sheridan found that this info is always there when sequences are meaningLESS.

But in meaningFUL phrases, the LM often ERASES the context!!

Exactly the opposite of what we expected.
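The probes here are simple in spirit: train a linear classifier to read "which token came before" out of a later layer's hidden state. Below is a toy version of that kind of probe; the layer and the tiny dataset are placeholders, not the paper's setup.

```python
# Toy linear probe: can a later layer's hidden state tell us which token
# came *before* this position? Layer and data are placeholders.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
LAYER = 6  # placeholder

texts = ["the cat sat on the mat", "a dog ran in the park",
         "the sun set over the hill", "a bird flew over the lake"]
X, y = [], []
for t in texts:
    ids = tok(t, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids).hidden_states[LAYER][0]   # [seq_len, hidden_dim]
    for pos in range(1, ids.shape[1]):
        X.append(hs[pos].numpy())                 # state at this position
        y.append(ids[0, pos - 1].item())          # id of the previous token

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy recovering the previous token:", probe.score(X, y))
```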
The work starts with a mystery!

In footprints.baulab.info (EMNLP), while dissecting how LMs read badly tokenized words like " n.ort.he.astern", Sheridan found a huge surprise: they do it by _erasing_ contextual information.
As you know ndif.us/ is a "white-box" inference service.

It lets you crack open the model and trace and modify its internals. We run the models for you on NSF servers.

Up to now, NDIF has supported a small set of a dozen models.
NSF National Deep Inference Fabric
NDIF is a research computing project that enables researchers and students to crack open the mysteries inside large-scale AI systems.
ndif.us
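If you haven't tried it, interventions on NDIF go through the nnsight library. Roughly, it looks like the sketch below; the API details and the way you point it at remotely hosted models are my assumptions from the public nnsight docs, so check nnsight.net and ndif.us for the current interface.

```python
# Sketch of white-box access in the nnsight style. The exact API (trace
# context, .save(), .value) and the option to run remotely on NDIF-hosted
# models are assumptions -- consult the nnsight/NDIF docs for specifics.
from nnsight import LanguageModel

model = LanguageModel("gpt2")

with model.trace("The Eiffel Tower is in the city of"):
    # Save the residual-stream output of block 6 for inspection afterwards.
    hidden = model.transformer.h[6].output[0].save()
    # Save the final logits too.
    logits = model.lm_head.output.save()

# Depending on the nnsight version, saved results are read via .value
print(hidden.value.shape, logits.value.shape)
```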
Especially if you're curious about questions like
• What is GPT-OSS 120B thinking inside?
• What does OLMo-32B learn between all its hundreds of checkpoints?
• Why do Qwen3 layers have such different roles from Llama's?
• How does Foundation-Sec reason about cybersecurity?
Announcing a broad expansion of the National Deep Inference Fabric.

This could be relevant to your research...
The NDIF YouTube talk series continues... Don't miss the fascinating talks by Xu Pan and Josh Engels on the NDIF YouTube channel.

www.youtube.com/channel/UCaQ...
In the wake of the Jimmy Kimmel firing: Do not underestimate the power of the truth.

The truth is our superpower.

davidbau.com/archives/202...
The Truth is Our Superpower
davidbau.com