1. gradient descent is a sort of evolution, where each step of "learning" must lead to improvement.
2. the ARC-AGI benchmark: a non-trivial task that models fail at and cannot solve with existing tools.
website: osu-nlp-group.github.io/SAE-V/
SAE checkpoints: huggingface.co/collections...
code: github.com/osu-nlp-gro...
arxiv: arxiv.org/abs/2502.06755
osu-nlp-group.github.io/SAE-V/#demos
See below for examples of what you can do.
But discovering features isn't enough. We need to prove they actually matter for model behavior.
grugbrain.dev
Grug: "apex predator of grug is complexity...given choice between complexity or one on one against t-rex, grug take t-rex"
John: "The greatest limitation in writing software is our ability to understand the systems we are creating"