David Bau
@davidbau.bsky.social
Interpretable Deep Networks. http://baulab.info/ @davidbau
If Sam Altman can't listen to his moral convictions, he will listen to his employees.

It is important for tech employees to make it clear that we will not accept making AI for authoritarianism.

davidbau.github.io/poetsandnurses

On GitHub. PRs welcome.
github.com/davidbau/poetsandnurses
January 28, 2026 at 12:58 PM
I left Google in 2015 to pursue the insight of "Volo Ergo Sum" (I will, therefore I am).

That the central challenge in AI is how to amplify human agency. This is not easy.

Do you think AI will ever be superhuman at taking responsibility for what should be?

Read more:
davidbau.com/archives/20...
January 27, 2026 at 3:32 PM
On the other hand we have @dhadfieldmenell.bsky.social who draws a line at normative judgments.

x.com/dhadfieldme...
January 27, 2026 at 3:32 PM
That question is in the air. Today @polynoamial.bsky.social pokes fun at the series of "AI can't do what a human brain can do" predictions.

Of course AI can do anything a human brain can do, Noam argues. Including making wise decisions.
x.com/polynoamial...
January 27, 2026 at 3:32 PM
The Art of Wanting.

About the question I see as central in AI ethics, interpretability, and safety. Can an AI take responsibility? I do not think so, but *not* because it's not smart enough.

davidbau.com/archives/20...
January 27, 2026 at 3:32 PM
What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...
January 26, 2026 at 3:27 AM
From induction heads to function vectors (FVs), every ICL mechanism we've pinned down is fuzzy copying.

Is copying all there is?

@ericwtodd.bsky.social trained on groups where tokens have no fixed meaning and found a basket of mechanisms beyond copying.

Watch them emerge, a grokking cascade! ↓

bsky.app/profile/eri...
January 25, 2026 at 4:37 PM
I can't read Chinese, but my family has old genealogy documents I've always wanted to understand. Claude and Gemini helped me build an interactive reader to explore the calligraphy character by character.

I can finally read my great-grandfather's epitaph. Try it:
davidbau.com/archives/202...
January 12, 2026 at 3:12 AM
My vibe-coded Mandelbrot viewer is 40x faster now! New GPU synchronization tricks go outside the design intent of WebGPU specs. But the real story: Claude tells me what happens in the AGI break room.

What superhuman AGIs say when the boss is not around:
davidbau.com/archives/202...
January 6, 2026 at 1:00 AM
I have been teaching myself to vibe code.

Watch Claude Code grow my 780 lines to 13,600 - mandelbrot.page/coverage/ca...

Two fundamental rules for staying in control:
davidbau.com/archives/20...
December 18, 2025 at 8:01 PM
At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today.

Here is a blog post summarizing the talk:

davidbau.com/archives/202...
December 11, 2025 at 3:03 PM
When you read the paper, be sure to check out the appendix where @arnab_api discusses how pointer and value data are entangled in filters.

And possible applications of the filter mechanism, like as a zero-shot "lie detector" that can flag incorrect statements in ordinary text.
November 6, 2025 at 2:00 PM
The neural representations for LLM filter heads are language-independent!

If we pick up the representation for a question posed in French, it will accurately match items expressed in Thai.
November 6, 2025 at 2:00 PM
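A hedged sketch of what this cross-language matching test can look like in code. `filter_head_rep` is a placeholder standing in for the paper's actual readout from the filter heads; the random vectors below exist only so the snippet runs.

```python
# Hedged sketch: score Thai items against a French question by similarity of
# their (placeholder) filter-head representations.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def filter_head_rep(text):
    # Placeholder: the real experiment reads this vector out of specific
    # filter heads inside the LLM, not from a random generator.
    return torch.randn(1024)

question_fr = "Lequel de ces animaux peut voler ?"   # "Which of these animals can fly?"
items_th = ["ช้าง", "นกอินทรี", "ปลาวาฬ"]             # elephant, eagle, whale (in Thai)

q = filter_head_rep(question_fr)
scores = [F.cosine_similarity(q, filter_head_rep(item), dim=0).item() for item in items_th]
# With real filter-head representations, the matching item ("นกอินทรี", eagle) should score highest.
print(items_th[int(torch.tensor(scores).argmax())])
```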
How embarrassing for me and confusing to the LLM!

OK, here it is fixed. A nice thing about the workbench is that it takes just a second to edit the prompt, and you can see how the LLM responds, now deciding very early that it should be ':'
October 11, 2025 at 2:21 PM
The lens reveals: the model does NOT go directly from "amore" to "amor" or "amour" by just dropping or adding letters!

Instead it first "thinks" about the (English) word "love".

In other words: LLMs translate using *concepts*, not tokens.
October 11, 2025 at 12:02 PM
Enter a translation prompt: "Italiano: amore, Español: amor, François:".

The workbench doesn't just show you the model's output. It shows the grid of internal states that lead to the output. Researchers call this visualization the "logit lens".
October 11, 2025 at 12:02 PM
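For readers who want to reproduce this outside the workbench, here is a minimal logit-lens sketch (not the workbench's own code; the small model is purely illustrative): project each layer's hidden state through the unembedding and print the top token at every depth.

```python
# Minimal logit lens: decode every layer's hidden state at the last position.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Italiano: amore, Español: amor, Français:"  # corrected prompt, per the fix posted above
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the embedding layer plus every transformer layer.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))   # unembed the intermediate state
    print(f"layer {layer:2d}: top token = {tok.decode(logits.argmax(-1))!r}")
```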
What does an LLM do when it translates from Italian "amore" to Spanish "amor" or French "amour"?

That's easy, you might think! Surely it knows that amore, amor, and amour all come from the same Latin word. It can just drop the "e", or add a "u".
October 11, 2025 at 12:02 PM
The takeaway for me: LLMs separate their token processing from their conceptual processing. Akin to humans' dual-route processing of speech.

We need to be aware of whether an LM is thinking about tokens or about concepts.

They do both, and it makes a difference which way they're thinking.
September 27, 2025 at 8:54 PM
If token-processing and concept-processing are largely separate, does killing one kill the other? Chris Olah's team in Olsson 2022 hypothesized that ICL emerges from token induction.

@keremsahin22.bsky.social + Sheridan are finding cool ways to look into Olah's induction hypothesis too!
September 27, 2025 at 8:54 PM
The representation space within the concept induction heads also has a more "meaningful" geometry than the transformer as a whole.

Sheridan discovered (NeurIPS 2025 mechanistic interpretability workshop) that semantic vector arithmetic works better in this space. (Token semantics work in token space.)

arithmetic.baulab.info/
September 27, 2025 at 8:54 PM
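A hedged sketch of the vector-arithmetic test. The placeholder vectors below stand in for word representations read out of the concept-induction-head subspace (the actual extraction procedure is the one documented at arithmetic.baulab.info); they are random here only so the snippet runs.

```python
# Hedged sketch: does "Paris" - "France" + "Italy" land nearest "Rome" in the
# concept-head subspace? Placeholder random vectors stand in for real readouts.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
words = ["Paris", "France", "Italy", "Rome", "Berlin"]
reps = {w: torch.randn(512) for w in words}          # NOT real data, just a stand-in

def nearest(query, reps, exclude=()):
    sims = {w: F.cosine_similarity(query, v, dim=0).item()
            for w, v in reps.items() if w not in exclude}
    return max(sims, key=sims.get)

query = reps["Paris"] - reps["France"] + reps["Italy"]
# With real concept-head representations, the claim is that "Rome" wins.
print(nearest(query, reps, exclude=("Paris", "France", "Italy")))
```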
If you disable token induction heads and ask the model to copy text with only the concept induction heads, it will NOT copy exactly. It will paraphrase the text.

That happens even for computer code: the model copies the BEHAVIOR of the code, but writes it in a totally different way!
September 27, 2025 at 8:54 PM
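A hedged sketch of what disabling a set of attention heads can look like with Hugging Face GPT-2 hooks. The (layer, head) indices are placeholders; the real experiment first identifies the token induction heads and ablates those.

```python
# Hedged sketch: zero out chosen attention heads' contributions during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

HEADS_TO_ABLATE = [(5, 1), (5, 5)]    # placeholder (layer, head) pairs, not real induction heads
head_dim = model.config.hidden_size // model.config.num_attention_heads

def make_hook(head):
    # c_proj receives the concatenated per-head outputs; zero one head's slice.
    def hook(module, args):
        hidden = args[0].clone()
        hidden[..., head * head_dim:(head + 1) * head_dim] = 0.0
        return (hidden,)
    return hook

handles = [model.transformer.h[layer].attn.c_proj.register_forward_pre_hook(make_hook(head))
           for layer, head in HEADS_TO_ABLATE]

prompt = "Copy this sentence exactly: The quick brown fox jumps over the lazy dog.\n"
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=20)
print(tok.decode(out[0]))

for h in handles:
    h.remove()
```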
An amazing thing about the "concepts" in this second route: they are *not* literal words. They are totally language-independent.

If the target context is in Chinese, the heads will copy the concept into Chinese. Or patch the concept representations between runs to get Italian. They mediate translation.
September 27, 2025 at 8:54 PM
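A hedged sketch of patching a representation from one run into another. Hooking a whole transformer block at a placeholder layer is a simplification; the actual experiment patches the outputs of the concept induction heads specifically.

```python
# Hedged sketch: cache a hidden state from a source run, overwrite it in a second run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6                                     # placeholder layer index
block = model.transformer.h[LAYER]
cache = {}

def save_hook(module, args, output):
    cache["h"] = output[0][:, -1].detach()    # hidden state at the last position

def patch_hook(module, args, output):
    patched = output[0].clone()
    patched[:, -1] = cache["h"]               # overwrite with the cached source state
    return (patched,) + output[1:]

# Run 1: cache a representation from an Italian-context prompt.
h1 = block.register_forward_hook(save_hook)
model(**tok("Italiano: il gatto dorme.", return_tensors="pt"))
h1.remove()

# Run 2: patch it into an English-context prompt and read the next-token prediction.
h2 = block.register_forward_hook(patch_hook)
out = model(**tok("English: the cat sleeps.", return_tensors="pt"))
h2.remove()
print(tok.decode(int(out.logits[0, -1].argmax())))
```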
This second set of text-copying attention heads also shows up in every LLM we tested, and these heads work in a totally different way from token induction heads.

Instead of copying tokens, they copy *concepts*.
September 27, 2025 at 8:54 PM
So Sheridan scrutinized copying mechanisms in LLMs and found a SECOND route.

Yes, the token induction of Elhage and Olsson is there.

But there is *another* route where the copying is done in a different way. It shows up in attention heads that do 2-ahead copying.
bsky.app/profile/sfe...
September 27, 2025 at 8:54 PM
Sheridan's erasure is Bad News for induction heads.

Induction heads are how transformers copy text: they find earlier tokens in identical contexts. (Elhage 2021; Olsson 2022, arxiv.org/abs/2209.11895)

But when that context, "what token came before," is erased, how could induction possibly work?
September 27, 2025 at 8:54 PM
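A toy, non-neural illustration of the induction pattern: to continue the sequence, find an earlier occurrence of the current token and copy whatever followed it. Real induction heads implement a soft, learned version of this match-and-copy over attention scores.

```python
# Toy match-and-copy: the induction-head pattern without any neural network.
def induction_predict(tokens):
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan earlier positions, most recent first
        if tokens[i] == current:               # found the current token seen earlier
            return tokens[i + 1]               # copy the token that followed it last time
    return None

tokens = "the quick brown fox jumped over the quick brown".split()
print(induction_predict(tokens))   # -> "fox", copied from the earlier "quick brown fox"
```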