Lightnews — Scholar-powered news

David Bau

@davidbau.bsky.social

If Sam Altman can't listen to his moral convictions, he will listen to his employees.

It is important for tech employees to make it clear that we will not accept making AI for authoritarianism.

davidbau.github.io/poetsandnurses

On GitHub. PRs welcome.
github.com/davidbau/poetsandnurses

Murders of American Citizens on the streets of Minneapolis.

January 28, 2026 at 12:58 PM

David Bau

@davidbau.bsky.social

Also posted here mag.re-alignment.com/p/the-art-o...

The Art of Wanting

Wanting the world to be a certain way is our privilege and our unique responsibility. Understanding what you really want is nontrivial, utterly difficult, essentially human.

mag.re-alignment.com

January 27, 2026 at 3:32 PM

David Bau

@davidbau.bsky.social

I left Google in 2015 to pursue the insight that "Volo Ergo Sum."

That the central challenge in AI is how to amplify human agency. This is not easy.

Do you think AI will ever be superhuman at taking responsibility for what should be?

Read more:
davidbau.com/archives/20...

January 27, 2026 at 3:32 PM

David Bau

@davidbau.bsky.social

On the other hand we have @dhadfieldmenell.bsky.social who draws a line at normative judgments.

x.com/dhadfieldme...

January 27, 2026 at 3:32 PM

David Bau

@davidbau.bsky.social

Noam's comment is a response to a NY Times Opinion piece by Blair Effron, www.nytimes.com/2026/01/25/... which is worth reading.

Opinion | Why A.I. Can’t Make Thoughtful Decisions

Computers still don’t do well with vagueness and uncertainty.

www.nytimes.com

January 27, 2026 at 3:32 PM

David Bau

@davidbau.bsky.social

That question is in the air. Today @polynoamial.bsky.social pokes fun at the series of "AI can't do what a human brain can do" predictions.

Of course AI can do anything a human brain can do, Noam argues. Including making wise decisions.
x.com/polynoamial...

January 27, 2026 at 3:32 PM

David Bau

@davidbau.bsky.social

It is interesting how the formality of code is such a good opportunity for dealing with AI complexity.

December 19, 2025 at 1:30 AM

David Bau

@davidbau.bsky.social

Paper, code, website. Please help reshare Arnab's bsky thread:

bsky.app/profile/arn...

Arnab Sen Sharma (@arnabsensharma.bsky.social)

Thanks to my collaborators Giordano Rogers, @natalieshapira.bsky.social, and @davidbau.bsky.social. Check out our paper for more details: 📜 arxiv.org/pdf/2510.26784 💻 https://github.com/arnab-api/filter 🌐 filter.baulab.info

bsky.app

November 6, 2025 at 2:00 PM

David Bau

@davidbau.bsky.social

When you read the paper, be sure to check out the appendix where @arnab_api discusses how pointer and value data are entangled in filters.

And possible applications of the filter mechanism, such as a zero-shot "lie detector" that can flag incorrect statements in ordinary text.

November 6, 2025 at 2:00 PM

David Bau

@davidbau.bsky.social

Curiously, when the question precedes the list of candidates, there is an abstract predicate for "this is the answer I am looking for" that tags items in a list as soon as they are seen.
bsky.app/profile/arn...

Arnab Sen Sharma (@arnabsensharma.bsky.social)

When the question is presented *after* the options, filter heads can achieve high causality scores across language and format changes! This suggests that the encoded predicate is robust against such perturbations.

bsky.app

November 6, 2025 at 2:00 PM

David Bau

@davidbau.bsky.social

The neural representations for LLM filter heads are language independent!

If we pick up the representation for a question in French, it will accurately match items expressed in the Thai language.

November 6, 2025 at 2:00 PM

David Bau

@davidbau.bsky.social

Arnab calls predicate attention heads "filter heads" because the same heads filter many properties across objects, people, and landmarks.

The generic structure resembles functional programming's "filter" function, with a common mechanism handling a wide range of predicates.
bsky.app/profile/arn...

Arnab Sen Sharma (@arnabsensharma.bsky.social)

🔍 In Llama-70B and Gemma-27B, we found special attention heads that consistently focus their attention on the filtered items. This behavior seems consistent across a range of different formats and semantic types.

bsky.app

November 6, 2025 at 2:00 PM
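
For readers unfamiliar with the functional-programming "filter" that the post above compares filter heads to, here is a minimal Python sketch. It only illustrates the programming idiom: one generic mechanism (the built-in `filter`) reused with interchangeable predicates. It is not a model of the attention heads or of the paper's method, and the candidate list and the predicate names `is_fruit` and `is_landmark` are hypothetical, chosen for illustration.

```python
# Illustrative only: the programming-language "filter" idiom, not the LLM mechanism.
# The candidate list and predicates below are made-up examples.

candidates = ["apple", "Eiffel Tower", "banana", "Marie Curie", "Taj Mahal"]

FRUITS = {"apple", "banana"}
LANDMARKS = {"Eiffel Tower", "Taj Mahal"}


def is_fruit(item: str) -> bool:
    """Predicate: True when the item is a known fruit."""
    return item in FRUITS


def is_landmark(item: str) -> bool:
    """Predicate: True when the item is a known landmark."""
    return item in LANDMARKS


# The same generic mechanism (built-in filter) handles both predicates;
# only the predicate passed in changes.
fruits = list(filter(is_fruit, candidates))
landmarks = list(filter(is_landmark, candidates))

print(fruits)
print(landmarks)
```

The point of the analogy is that `filter` itself knows nothing about fruits or landmarks; swapping the predicate changes what is selected while the selection machinery stays the same.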

David Bau

@davidbau.bsky.social

How embarrassing for me and confusing to the LLM!

OK, here it is fixed. Nice thing about workbench is that it just takes a second to edit the prompt, and you can see how the LLM responds, now deciding very early it should be ':'

October 11, 2025 at 2:21 PM

David Bau

@davidbau.bsky.social

... @wendlerc.bsky.social and @sfeucht.bsky.social ....

October 11, 2025 at 12:25 PM

David Bau

@davidbau.bsky.social

Help me thank the NDIF team for rolling out workbench.ndif.us/ by using it to make your own discoveries inside LLM internals. We should all be looking inside our LLMs.

Share the tool! Share what you find!

And send the team feedback -
bsky.app/profile/ndi...

NDIF Team (@ndif-team.bsky.social)

This is a public beta, so we expect bugs and actively want your feedback: https://forms.gle/WsxmZikeLNw34LBV9

bsky.app

October 11, 2025 at 12:02 PM

David Bau

@davidbau.bsky.social

That process was noticed by @wendlerch in arxiv.org/abs/2402.10588 and studied by @sheridan_feucht in dualroute.baulab.info

Try it out yourself on workbench.ndif.us/.

Does it work with other words? Can you find interesting exceptions? How about prompts beyond translation?

October 11, 2025 at 12:02 PM
