Kai Sandbrink
@ackaisa.bsky.social
880 followers 490 following 14 posts
Computational cognitive neuroscience PhD Student, Oxford & EPFL
Reposted by Kai Sandbrink
brianchristian.bsky.social
Reward models (RMs) are the moral compass of LLMs – but no one has x-rayed them at scale. We just ran the first exhaustive analysis of 10 leading RMs, and the results were...eye-opening. Wild disagreement, base-model imprint, identity-term bias, mere-exposure quirks & more: 🧵
ackaisa.bsky.social
In summary, we show that task abstractions can be learned in simple models, and how they result from learning dynamics in multi-task settings. These abstractions allow for cognitive flexibility in neural nets. This was a really fun collaborative project - I look forward to seeing where we go next!
ackaisa.bsky.social
As a proof of concept, we show that our linear model can be used in conjunction with nonlinear networks trained on MNIST. We also show that our flexible model qualitatively matches human behavior in a task-switching experiment (Steyvers et al., 2019), while a forgetful model does not.
ackaisa.bsky.social
We show that our minimal components are sufficient to induce the flexible regime in a fully-connected network, where first-layer weights specialize to teacher components and second-layer weights produce distinct task-specific gating in single units of each row.
ackaisa.bsky.social
Using an SVD reduction, we study the network’s learning dynamics in the 2D task space. We reveal a virtuous cycle that facilitates the transition to the flexible regime: teacher-aligned weights accelerate gating, and fast gating protects teacher alignment, preventing interference.
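A minimal sketch of what such an SVD projection into a 2D task space might look like, assuming two teacher directions; the setup and all names here are illustrative, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
# Two hypothetical orthogonal teacher directions (illustrative stand-ins)
t1, t2 = np.eye(d)[0], np.eye(d)[1]
# Student path weights that have largely specialized to the teachers, plus noise
W = np.stack([t1, t2]) + 0.05 * rng.standard_normal((2, d))

# The SVD exposes the dominant directions of the student's weights; projecting
# weight snapshots onto the top-2 right singular vectors gives a 2D "task space"
U, s, Vt = np.linalg.svd(W, full_matrices=False)
coords = W @ Vt[:2].T  # each row: one path's position in the 2D task space

# Alignment check: a teacher direction lies almost entirely in this 2D plane
proj_t1 = np.linalg.norm(Vt[:2] @ t1)
```

Tracking `coords` over training snapshots would trace each path's trajectory through the task space.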
ackaisa.bsky.social
We identify 3 components that facilitate flexible learning: (1) bounded (regularized), nonnegative gate activity; (2) temporally correlated signals (task block length); and (3) gates that update on a faster timescale than the weights.
ackaisa.bsky.social
These learned abstractions are not only useful for switching between computations, but can also be used to combine different computations flexibly for generalization to compositional tasks composed of the same core learned components.
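A toy illustration (not the paper's code) of how learned gates could be recombined: if each path has specialized to one teacher component, a composed task is solved by simply activating both gates.

```python
import numpy as np

# Illustrative setup: two paths that have specialized to two teacher components
W = np.array([[1.0, 0.0],   # path 1 ~ teacher A
              [0.0, 1.0]])  # path 2 ~ teacher B
x = np.array([2.0, 3.0])

g_A = np.array([1.0, 0.0])  # gate pattern for task A
g_B = np.array([0.0, 1.0])  # gate pattern for task B
g_AB = g_A + g_B            # recombined gates for the composed task A+B

# The recombined gating reproduces the sum of the two task computations
y_A, y_B, y_AB = g_A @ W @ x, g_B @ W @ x, g_AB @ W @ x
print(y_A, y_B, y_AB)  # → 2.0 3.0 5.0
```

The point is that composition happens entirely in gate space; no path weights need to change.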
ackaisa.bsky.social
We describe (1) a *flexible* learning regime, where weights specialize to task computations and gates represent tasks (as abstractions), preserving learned information, and (2) a *forgetful* regime, where knowledge is overwritten in each successive task.
ackaisa.bsky.social
We study a linear network with multiple paths modulated by gates with bounded activity and a faster timescale. We adopt a teacher-student setup and train the network on alternating task (teacher) blocks, jointly optimizing gates and weights using gradient descent.
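A minimal sketch of such a teacher-student setup with gated paths, with illustrative hyperparameters rather than the paper's implementation: gates are kept nonnegative and regularized, updated at a faster rate than the weights, and teachers alternate in blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, n_paths, n_tasks = 10, 2, 2
teachers = [rng.standard_normal(d_in) / np.sqrt(d_in) for _ in range(n_tasks)]

W = 0.01 * rng.standard_normal((n_paths, d_in))  # slow path weights
g = 0.5 * np.ones(n_paths)                       # fast, nonnegative gates
lr_w, lr_g, lam = 0.05, 0.5, 1e-3                # gates learn 10x faster
block_len, n_blocks, batch = 200, 6, 32

first_loss = None
for b in range(n_blocks):
    t = teachers[b % n_tasks]                    # alternating task (teacher) blocks
    for step in range(block_len):
        x = rng.standard_normal((batch, d_in))
        y = x @ t                                # teacher target
        yhat = x @ W.T @ g                       # gated student output
        e = yhat - y
        loss = 0.5 * np.mean(e ** 2)
        if first_loss is None:
            first_loss = loss
        # joint gradient descent on weights and gates
        grad_W = g[:, None] * (e @ x) / batch
        grad_g = (x @ W.T).T @ e / batch + lam * g
        W -= lr_w * grad_W
        g = np.maximum(g - lr_g * grad_g, 0.0)   # bounded, nonnegative gates
last_loss = loss
```

With these (assumed) settings the student fits the current teacher within each block while the gate values track which task is active.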
ackaisa.bsky.social
Animals learn tasks by segmenting them into computations that are controlled by internal abstractions. Selecting and recombining these task abstractions then allows for flexible adaptation.

How do such abstractions emerge in neural networks in a multi-task environment?
ackaisa.bsky.social
Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.
ackaisa.bsky.social
Remarkably, we find that this individual variation in behavior correlates well with principal components (PCs) extracted from anxiety & depression and compulsivity transdiagnostic factor scores. We hope these findings can pave the way for using ANNs to study healthy and pathological meta-control! (4/4)
ackaisa.bsky.social
We perturb the hidden representations of the meta-RL networks along the axis used for APE prediction. When perturbed systematically, the models replicate human individual differences in performance across levels of controllability (3/4)
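A generic sketch of this kind of representation perturbation, with hypothetical names; in the paper the axis is read out from the trained meta-RL networks, whereas here it is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))       # hidden states: (trials, units)
ape_axis = rng.standard_normal(8)     # stand-in for the APE-prediction axis
ape_axis /= np.linalg.norm(ape_axis)  # unit-normalize

def perturb(H, axis, alpha):
    """Shift every hidden state along a unit axis by magnitude alpha."""
    return H + alpha * axis

H_pert = perturb(H, ape_axis, 2.0)
# The readout along the axis shifts by exactly alpha; content in
# orthogonal directions is untouched
shift = H_pert @ ape_axis - H @ ape_axis
```

Sweeping `alpha` systematically would correspond to perturbing the models along the axis at different strengths and reading out the behavioral consequences.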
ackaisa.bsky.social
We ask humans and neural networks to complete "observe or bet" task variants that require adapting to changes in controllability. Meta-RL-trained neural networks only match human performance when explicitly trained to predict APEs, mirroring error-likelihood prediction in ACC (2/4)
ackaisa.bsky.social
Excited that the preprint for the work from my first two years of PhD at @summerfieldlab.bsky.social is out! In this work, we examine the role of action prediction errors (APEs) in cognitive control: osf.io/5ezxs (1/4)