Kai Sandbrink
@ackaisa.bsky.social
880 followers 490 following 14 posts
Computational cognitive neuroscience PhD Student, Oxford & EPFL
Reposted by Kai Sandbrink
brianchristian.bsky.social
Reward models (RMs) are the moral compass of LLMs – but no one has x-rayed them at scale. We just ran the first exhaustive analysis of 10 leading RMs, and the results were...eye-opening. Wild disagreement, base-model imprint, identity-term bias, mere-exposure quirks & more: 🧵
ackaisa.bsky.social
In summary, we show that task abstractions can be learned in simple models, and how they result from learning dynamics in multi-task settings. These abstractions allow for cognitive flexibility in neural nets. This was a really fun collaborative project - I look forward to seeing where we go next!
ackaisa.bsky.social
As a proof of concept, we show that our linear model can be used in conjunction with nonlinear networks trained on MNIST. We also show that our flexible model qualitatively matches human behavior in a task-switching experiment (Steyvers et al., 2019), while a forgetful model does not.
ackaisa.bsky.social
We show that our minimal components are sufficient to induce the flexible regime in a fully-connected network, where first-layer weights specialize to teacher components and second-layer weights produce distinct task-specific gating in single units of each row.
ackaisa.bsky.social
Using an SVD reduction, we study the network’s learning dynamics in the 2D task space. We reveal a virtuous cycle that facilitates the transition to the flexible regime: teacher-aligned weights accelerate gating, and fast gating protects teacher alignment, preventing interference.
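A minimal sketch of what such an SVD projection into a 2D task space might look like, assuming two teacher directions; the setup and all names here are illustrative, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
# Two hypothetical orthogonal teacher directions (illustrative stand-ins)
t1, t2 = np.eye(d)[0], np.eye(d)[1]
# Student path weights that have largely specialized to the teachers, plus noise
W = np.stack([t1, t2]) + 0.05 * rng.standard_normal((2, d))

# The SVD exposes the dominant directions of the student's weights; projecting
# weight snapshots onto the top-2 right singular vectors gives a 2D "task space"
U, s, Vt = np.linalg.svd(W, full_matrices=False)
coords = W @ Vt[:2].T  # each row: one path's position in the 2D task space

# Alignment check: a teacher direction lies almost entirely in this 2D plane
proj_t1 = np.linalg.norm(Vt[:2] @ t1)
```

Tracking `coords` over training snapshots would trace each path's trajectory through the task space.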
ackaisa.bsky.social
We identify 3 components that facilitate flexible learning: (1) bounded (regularized), nonnegative gate activity; (2) temporally correlated signals (task block length); and (3) gates that update on a faster timescale than the weights.
ackaisa.bsky.social
These learned abstractions are not only useful for switching between computations, but can also be used to combine different computations flexibly for generalization to compositional tasks composed of the same core learned components.
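A toy illustration (not the paper's code) of how learned gates could be recombined: if each path has specialized to one teacher component, a composed task is solved by simply activating both gates.

```python
import numpy as np

# Illustrative setup: two paths that have specialized to two teacher components
W = np.array([[1.0, 0.0],   # path 1 ~ teacher A
              [0.0, 1.0]])  # path 2 ~ teacher B
x = np.array([2.0, 3.0])

g_A = np.array([1.0, 0.0])  # gate pattern for task A
g_B = np.array([0.0, 1.0])  # gate pattern for task B
g_AB = g_A + g_B            # recombined gates for the composed task A+B

# The recombined gating reproduces the sum of the two task computations
y_A, y_B, y_AB = g_A @ W @ x, g_B @ W @ x, g_AB @ W @ x
print(y_A, y_B, y_AB)  # → 2.0 3.0 5.0
```

The point is that composition happens entirely in gate space; no path weights need to change.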
ackaisa.bsky.social
We describe (1) a *flexible* learning regime, where weights specialize to task computations and gates represent tasks (as abstractions), preserving learned information, and (2) a *forgetful* regime, where knowledge is overwritten in each successive task.
ackaisa.bsky.social
We study a linear network with multiple paths modulated by gates with bounded activity and a faster timescale. We adopt a teacher-student setup and train the network on alternating task (teacher) blocks, jointly optimizing gates and weights using gradient descent.
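A minimal sketch of such a teacher-student setup with gated paths, with illustrative hyperparameters rather than the paper's implementation: gates are kept nonnegative and regularized, updated at a faster rate than the weights, and teachers alternate in blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, n_paths, n_tasks = 10, 2, 2
teachers = [rng.standard_normal(d_in) / np.sqrt(d_in) for _ in range(n_tasks)]

W = 0.01 * rng.standard_normal((n_paths, d_in))  # slow path weights
g = 0.5 * np.ones(n_paths)                       # fast, nonnegative gates
lr_w, lr_g, lam = 0.05, 0.5, 1e-3                # gates learn 10x faster
block_len, n_blocks, batch = 200, 6, 32

first_loss = None
for b in range(n_blocks):
    t = teachers[b % n_tasks]                    # alternating task (teacher) blocks
    for step in range(block_len):
        x = rng.standard_normal((batch, d_in))
        y = x @ t                                # teacher target
        yhat = x @ W.T @ g                       # gated student output
        e = yhat - y
        loss = 0.5 * np.mean(e ** 2)
        if first_loss is None:
            first_loss = loss
        # joint gradient descent on weights and gates
        grad_W = g[:, None] * (e @ x) / batch
        grad_g = (x @ W.T).T @ e / batch + lam * g
        W -= lr_w * grad_W
        g = np.maximum(g - lr_g * grad_g, 0.0)   # bounded, nonnegative gates
last_loss = loss
```

With these (assumed) settings the student fits the current teacher within each block while the gate values track which task is active.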
ackaisa.bsky.social
Animals learn tasks by segmenting them into computations that are controlled by internal abstractions. Selecting and recombining these task abstractions then allows for flexible adaptation.

How do such abstractions emerge in neural networks in a multi-task environment?
ackaisa.bsky.social
Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.
ackaisa.bsky.social
Remarkably, we find that this individual variation in behavior correlates well with principal components (PCs) extracted from anxiety & depression and compulsivity transdiagnostic factor scores. We hope these findings can pave the way for using ANNs to study healthy and pathological meta-control! (4/4)
ackaisa.bsky.social
We perturb the hidden representations of the meta-RL networks along the axis used for APE prediction. When perturbed systematically, the models replicate human individual differences in performance across levels of controllability (3/4)
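A generic sketch of this kind of representation perturbation, with hypothetical names; in the paper the axis is read out from the trained meta-RL networks, whereas here it is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))       # hidden states: (trials, units)
ape_axis = rng.standard_normal(8)     # stand-in for the APE-prediction axis
ape_axis /= np.linalg.norm(ape_axis)  # unit-normalize

def perturb(H, axis, alpha):
    """Shift every hidden state along a unit axis by magnitude alpha."""
    return H + alpha * axis

H_pert = perturb(H, ape_axis, 2.0)
# The readout along the axis shifts by exactly alpha; content in
# orthogonal directions is untouched
shift = H_pert @ ape_axis - H @ ape_axis
```

Sweeping `alpha` systematically would correspond to perturbing the models along the axis at different strengths and reading out the behavioral consequences.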
ackaisa.bsky.social
We ask humans and neural networks to complete "observe or bet" task variants that require adapting to changes in controllability. Meta-RL-trained neural networks only match human performance when explicitly trained to predict APEs, mirroring error-likelihood prediction in ACC (2/4)
ackaisa.bsky.social
Excited that the preprint for the work from my first two years of PhD at @summerfieldlab.bsky.social is out! In this work, we examine the role of action prediction errors (APEs) in cognitive control: osf.io/5ezxs (1/4)