samuelstevens.bsky.social
@samuelstevens.bsky.social
thanks to my labmates for the great discussion and for forcing me to introspect about why I'm unbothered that LLMs can't do multi-digit addition. 10/10 recommend the OSU NLP group for stuff like this!
April 12, 2025 at 4:08 PM
why this might be wrong:
1. gradient descent is a sort of evolution, where each step of "learning" must lead to improvement.
2. the ARC-AGI benchmark: a non-trivial task that models fail at and cannot solve with existing tools.
April 12, 2025 at 4:08 PM
under this model, it doesn't matter that LLMs can't do multi-digit multiplication or inductive reasoning in their weight space, because they have other ways to do so (coding, tool use, etc.). furthermore, the models can still achieve superhuman intelligence.
April 12, 2025 at 4:08 PM
another plausible argument (to me) is that LLMs can learn higher-level skills without lower-level skills, because evolution never killed off LLMs that were missing foundational skills.
April 12, 2025 at 4:08 PM
if this is your model of intelligence, then missing the ability to copy strings or do multi-digit addition implies that LLMs cannot reach superhuman intelligence, because they lack the foundation to do so.
April 12, 2025 at 4:08 PM
why doesn't it bother me? I think a popular mental model of intelligence is that skills build on top of each other like a pyramid: through evolution or schooling, humans learned lots of new skills, but each skill depended on previous building blocks.
April 12, 2025 at 4:08 PM
Thanks to my wonderful mentors @weilunchao and Tanya Berger-Wolf and my advisor @ysu_nlp for their support on this project!

website: osu-nlp-group.github.io/SAE-V/
SAE checkpoints: huggingface.co/collections...
code: github.com/osu-nlp-gro...
arxiv: arxiv.org/abs/2502.06755
Sparse Autoencoders for Scientifically Rigorous Interpretation of...
To truly understand vision models, we must not only interpret their learned features but also validate these interpretations through controlled experiments. Current approaches either provide...
arxiv.org
February 26, 2025 at 1:12 PM
By unifying interpretation with controlled experiments, SAEs enable rigorous scientific investigation of vision models. Check out our full paper for more examples and analysis.
February 26, 2025 at 1:12 PM
🏖️ For semantic segmentation, we can suppress specific concepts like "sand" across the entire image. The model then predicts the next most likely class (like "earth" or "water") while leaving other parts of the scene unchanged:
February 26, 2025 at 1:12 PM
🐦 In this bird classification example, when we suppress the "spotted" feature (technically, mottling) on this bird's breast and neck, the ViT switches from predicting "Canada Warbler" to "Wilson Warbler", a similar species but without the necklace pattern:
February 26, 2025 at 1:12 PM
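Mechanically, "suppressing" a feature in the two examples above amounts to editing the model's activations through the SAE. Here's a minimal sketch of that intervention, assuming a trained SAE object with `encode`/`decode` methods (illustrative only, not the released SAE-V code):

```python
import torch

@torch.no_grad()
def suppress_feature(sae, acts, feature_idx):
    """Zero one SAE latent in a ViT activation and return the edited activation.

    sae: trained sparse autoencoder; encode maps (n_patches, d_model) -> (n_patches, d_sae)
    acts: patch activations from the ViT layer the SAE was trained on
    feature_idx: the latent to suppress (e.g. a "sand" or "spotted" feature)
    """
    codes = sae.encode(acts)            # sparse codes, one column per concept
    recon = sae.decode(codes)           # the part of the activation the SAE explains
    error = acts - recon                # the part it doesn't; pass through unchanged
    codes[:, feature_idx] = 0.0         # remove the concept everywhere in the image
    return sae.decode(codes) + error    # feed this back into the downstream head
```

Keeping the reconstruction error means everything the SAE doesn't capture passes through untouched, so any change in the prediction can be attributed to the suppressed feature.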
So we built interactive demos where you can suppress specific features and watch model predictions change.

osu-nlp-group.github.io/SAE-V/#demos

See below for examples of what you can do.
February 26, 2025 at 1:12 PM
By decomposing dense activations into a larger but sparse space, we find a diverse and precise visual vocabulary.

But discovering features isn't enough. We need to prove they actually matter for model behavior.
February 26, 2025 at 1:12 PM
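If you haven't seen one before, the sparse autoencoder doing that decomposition can be sketched in a few lines. This is a generic ReLU SAE with an L1 sparsity penalty; the widths and coefficient are illustrative defaults, not the exact SAE-V architecture:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Map d_model-dim ViT activations into a much larger, mostly-zero code."""

    def __init__(self, d_model: int = 768, d_sae: int = 16384):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def encode(self, acts: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.enc(acts))   # non-negative, sparse codes

    def decode(self, codes: torch.Tensor) -> torch.Tensor:
        return self.dec(codes)

    def forward(self, acts: torch.Tensor, l1_coeff: float = 1e-3):
        codes = self.encode(acts)
        recon = self.decode(codes)
        # train to reconstruct the activation while keeping few codes active
        loss = (recon - acts).pow(2).mean() + l1_coeff * codes.abs().mean()
        return recon, codes, loss
```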
The difference between CLIP and visual-only models like DINOv2 is striking. CLIP forms country-specific visual representations, while DINOv2 doesn't see these cultural connections. Here are examples from a USA feature and a Brazil feature.
February 26, 2025 at 1:12 PM
It feels like cheating to build tools with such high-quality UX, *using* tools with such high-quality UX (aider running commands and fixing exceptions on its own, uv installing half a dozen packages instantly, etc.).
December 24, 2024 at 8:32 PM
I wrote a small script to track progress towards 10K pullups in 2025 (samuelstevens.me/writing/10k) using these tools and was done in about an hour without getting a notebook out.
Behold: 10K
A small Python utility to track personal habits.
samuelstevens.me
December 24, 2024 at 8:32 PM
Awesome work. Will the preprint be added to arXiv or another open-access site? I'm very excited to read about the transformer-based models.
December 22, 2024 at 3:13 PM
It took me years of programming to fully realize this. The earlier you internalize this idea, the better.

grugbrain.dev
December 12, 2024 at 5:29 PM
They send the same message: complexity is bad.

Grug: "apex predator of grug is complexity...given choice between complexity or one on one against t-rex, grug take t-rex"

John: "The greatest limitation in writing software is our ability to understand the systems we are creating"
December 12, 2024 at 5:29 PM