Lightnews — Scholar-powered news

David Bau

@davidbau.bsky.social

The Art of Wanting.

About the question I see as central in AI ethics, interpretability, and safety. Can an AI take responsibility? I do not think so, but *not* because it's not smart enough.

davidbau.com/archives/20...

January 27, 2026 at 3:32 PM

Reposted by David Bau

Juan Diego Rodriguez

@juand-r.bsky.social

I think everyone (not just academics) should read this.

David Bau @davidbau.bsky.social · 4d

What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...

Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.

January 26, 2026 at 4:14 AM

David Bau

@davidbau.bsky.social

What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...

January 26, 2026 at 3:27 AM

David Bau

@davidbau.bsky.social

From induction to FVs, every ICL mechanism we've pinned down is fuzzy copying.

Is copying all there is?

@ericwtodd.bsky.social trained on groups where tokens have no fixed meaning and found a basket of mechanisms beyond copying.

Watch them emerge, a grokking cascade! ↓

bsky.app/profile/eri...

January 25, 2026 at 4:37 PM

David Bau

@davidbau.bsky.social

I can't read Chinese, but my family has old genealogy documents I've always wanted to understand. Claude and Gemini helped me build an interactive reader to explore the calligraphy character by character.

I can finally read my great-grandfather's epitaph. Try it:
davidbau.com/archives/202...

Screenshot of Chinese calligraphy reader web application

January 12, 2026 at 3:12 AM

David Bau

@davidbau.bsky.social

My vibe-coded Mandelbrot viewer is 40x faster now! New GPU synchronization tricks go outside the design intent of WebGPU specs. But the real story: Claude tells me what happens in the AGI break room.

What superhuman AGIs say when the boss is not around:
davidbau.com/archives/202...

January 6, 2026 at 1:00 AM

David Bau

@davidbau.bsky.social

I have been teaching myself to vibe code.

Watch Claude Code grow my 780 lines to 13,600 - mandelbrot.page/coverage/ca...

Two fundamental rules for staying in control:
davidbau.com/archives/20...

December 18, 2025 at 8:01 PM

David Bau

@davidbau.bsky.social

At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today.

Here is a blog post summarizing the talk:

davidbau.com/archives/202...

The Doge of Venice visits a Murano glassworks in the 17th century. I will talk about why glassmaking in this era has some similarities to AI research today.

December 11, 2025 at 3:03 PM

David Bau

@davidbau.bsky.social

The secret life of an LM is defined by its internal data types. Inner layers transport abstractions that are more robust than words, like concepts, functions, or pointers.

In new work yesterday, @arnabsensharma.bsky.social et al. identify a data type for *predicates*.

bsky.app/profile/arn...

Arnab Sen Sharma (@arnabsensharma.bsky.social)

How can a language model find the veggies in a menu? New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options. Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from python)! 🧵

bsky.app

November 6, 2025 at 2:00 PM

David Bau

@davidbau.bsky.social

What does an LLM do when it translates from Italian "amore" to Spanish "amor" or French "amour"?

That's easy! (you might think) Because surely it knows: amore, amor, amour are all based on the same Latin word. It can just drop the "e", or add a "u".

October 11, 2025 at 12:02 PM

David Bau

@davidbau.bsky.social

Looking forward to #COLM2025 tomorrow. DM me if you'll also be there and want to meet to chat.

David Bau @davidbau.bsky.social · Sep 27

Who is going to be at #COLM2025?

I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.

And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...

October 6, 2025 at 12:10 PM

David Bau

@davidbau.bsky.social

There are a lot of interesting details that surface when you use SAEs to understand and control diffusion image synthesis models. Learn more in @wendlerc.bsky.social's talk.

NDIF Team @ndif-team.bsky.social · Oct 3

New YouTube video posted! @wendlerc.bsky.social presents his work using SAEs for diffusion text-to-image models. The authors find interpretable SAE features and demonstrate how these features can alter generated images.

Watch here: youtu.be/43NnaqGjArA

Interpreting SDXL Turbo Using Sparse Autoencoders with Chris Wendler

In this talk, Chris Wendler presents his recent work on using sparse autoencoders for diffusion models. In this work, they train SAEs on SDXL Turbo, finding ...

www.youtube.com

October 3, 2025 at 6:52 PM

David Bau

@davidbau.bsky.social

On the Good Fight podcast w substack.com/@yaschamounk I give a quick but careful primer on how modern AI works.

I also chat about our responsibility as machine learning scientists, and what we need to fix to get AI right.

Take a listen and reshare -

www.persuasion.community/p/david-bau

David Bau on How Artificial Intelligence Works

Yascha Mounk and David Bau delve into the “black box” of AI.

www.persuasion.community

October 3, 2025 at 8:58 AM

David Bau

@davidbau.bsky.social

I love the 'opinionated' approach taken by Aaron + team in this survey. It captures the ongoing work around the central causal puzzles we face in mechanistic interpretability.

Aaron Mueller @amuuueller.bsky.social · Oct 1

What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!

October 1, 2025 at 2:25 PM

David Bau

@davidbau.bsky.social

Who is going to be at #COLM2025?

I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.

And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...

September 27, 2025 at 8:54 PM

David Bau

@davidbau.bsky.social

Announcing a broad expansion of the National Deep Inference Fabric.

This could be relevant to your research...

September 26, 2025 at 6:47 PM

David Bau

@davidbau.bsky.social

The NDIF YouTube talk series continues... Don't miss the fascinating talks by Xu Pan and Josh Engels on the NDIF YouTube channel.

www.youtube.com/channel/UCaQ...

September 20, 2025 at 7:20 PM

David Bau

@davidbau.bsky.social

In the wake of the Jimmy Kimmel firing: Do not underestimate the power of the truth.

The truth is our superpower.

davidbau.com/archives/202...

The Truth is Our Superpower

davidbau.com

September 20, 2025 at 7:17 PM

Reposted by David Bau

David Bau

@davidbau.bsky.social

Monday: Trump tries to fire Fed Governor Lisa Cook (first time in 111 years).
Thursday: CDC chief dismissed, four top scientists resign.

Discredit, dismiss, blame.

History shows exactly where this three-step pattern leads.

August 29, 2025 at 2:04 AM

David Bau

@davidbau.bsky.social

This Friday NEMI 2025 is at Northeastern in Boston, 8 talks, 24 roundtables, 90 posters; 200+ attendees. Thanks to
goodfire.ai/ for sponsoring! nemiconf.github.io/summer25/

If you can't make it in person, the livestream will be here:
www.youtube.com/live/4BJBis...

New England Mechanistic Interpretability Workshop

About: The New England Mechanistic Interpretability (NEMI) workshop aims to bring together academic and industry researchers from the New England and surround...

www.youtube.com

August 18, 2025 at 6:06 PM

David Bau

@davidbau.bsky.social

Announcing a deep net interpretability talk series!

Every week you will find new talks on recent research in the science of neural networks. The first few are posted: jackmerullo.bsky.social, Roy Rinberg, and me.

At the @ndif-team.bsky.social Youtube Channel: www.youtube.com/@NDIFTeam

NDIF Team

We're a research computing project cracking open the mysteries inside large-scale AI systems. The NSF National Deep Inference Fabric consists of a unique combination of hardware and software that provides a remotely-accessible computing resource for scientists and students to perform detailed and reproducible experiments on large pretrained AI models, such as open large language models. We aim to make AI interpretability research more accessible through this channel by publishing lectures and educational content covering real interpretability research.

www.youtube.com

August 18, 2025 at 6:02 PM

David Bau

@davidbau.bsky.social

The New England Mechanistic Interpretability Workshop, NEMI 2025 is August 22 in Boston.

Talks, posters, meals, discussion... Most of all, an excellent chance to chat about new ideas with other great researchers in the field!

Help spread the word - register and repost -

bsky.app/profile/koy...

Koyena Pal (@koyena.bsky.social)

🚨 Registration is live! 🚨 The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University! A chance for the mech interp community to nerd out on how models really work 🧠🤖 🌐 Info: nemiconf.github.io/summer25/ 📝 Register: https://forms.gle/v4kJCweE3UUHUE81A

bsky.app

July 1, 2025 at 3:00 PM

David Bau

@davidbau.bsky.social

The new "Lookback" paper from @nikhil07prakash.bsky.social contains a surprising insight...

70b/405b LLMs use double pointers, akin to C programmers' double (**) pointers. They show up when the LLM is "knowing what Sally knows Ann knows", i.e., Theory of Mind.

bsky.app/profile/nik...

@nikhil07prakash.bsky.social

How do language models track mental states of each character in a story, often referred to as Theory of Mind? We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!

bsky.app

June 25, 2025 at 3:00 PM

David Bau

@davidbau.bsky.social

FRIENDS: American science is being decimated by Congress NOW.

Your help is needed to fix this. The current DC plan PERMANENTLY slashes NSF, NIH, all science training. Money isn't redirected—it's gone.

Please read+share what's happening

thevisible.net/posts/004-s...

June 3, 2025 at 4:15 PM
