gsarti.com
Find them here: gsarti.com/langlearn
About the question I see as central in AI ethics, interpretability, and safety. Can an AI take responsibility? I do not think so, but *not* because it's not smart enough.
davidbau.com/archives/20...
When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way?
Our new preprint with @davidbau.bsky.social and @csinva.bsky.social explores CoT generalizability 🧵👇
(1/7)
cb=c, ac=b, ab=?
A small transformer can learn to solve problems like this!
And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️
🪄interpreto is an interpretability toolbox for HF language models🤗. In both generation and classification!
Why do you need it, and for what?
1/8 (links at the end)
Watch the video: youtu.be/4eqvABPX5rA
Learn more: ndif-team.github.io/nnterp/
What superhuman AGIs say when the boss is not around:
davidbau.com/archives/202...
In 2026, we'll grow the NDIF ecosystem and democratize access to interpretability methods for academics and domain experts! 🚀
We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 📚
SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality.
🧵1/
simonwillison.net/2025/Dec/31/...
This year it's divided into 26 sections! This is the table of contents:
Watch Claude Code grow my 780 lines to 13,600 - mandelbrot.page/coverage/ca...
Two fundamental rules for staying in control:
davidbau.com/archives/20...
I'm grateful to my advisors @arianna-bis.bsky.social @malvinanissim.bsky.social and to everyone who played a role in this journey! 🎉 #PhDone
arxiv.org/abs/2512.04759
We warmly thank all the individuals involved for their extraordinary work, dedication, and collaborative spirit that made this project possible!
by matching their DBLP entries. Have a look!
Our Temporal Feature Analyzer discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.
"Evaluating Interpretability Methods: Challenges and Future Directions" just started! 🎉 Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello and Mingyang Wang!
Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...