Stephanie Chan
@scychan.bsky.social
940 followers 280 following 15 posts
Staff Research Scientist at Google DeepMind. Artificial and biological brains 🤖 🧠
Reposted by Stephanie Chan
lampinen.bsky.social
In neuroscience, we often try to understand systems by analyzing their representations — using tools like regression or RSA. But are these analyses biased towards discovering a subset of what a system represents? If you're interested in this question, check out our new commentary! Thread:
[Image: "What do representations tell us about a system?" A mouse with a scope, shown with a vector of activity patterns, alongside a neural network with a vector of unit activity patterns.]
[Image: Common analyses of neural representations. Encoding models: relating activity to task features (an arrow from a stimulus trace to a neuron and its spike train). Comparing models via neural predictivity: comparing two neural networks by their R^2 to mouse brain activity. RSA: assessing brain-brain or model-brain correspondence using representational dissimilarity matrices.]
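[Editor's note: as a rough illustration of the RSA analysis named above (not taken from the commentary itself), here is a minimal sketch in Python/NumPy; the random "brain" and "model" activity matrices are placeholders, and correlation distance plus Spearman comparison is just one common choice.]

# Minimal RSA sketch: compare two systems' representations of the same
# stimuli via representational dissimilarity matrices (RDMs).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(activity):
    """Pairwise correlation-distance RDM for a (stimuli x units) activity matrix."""
    return squareform(pdist(activity, metric="correlation"))

rng = np.random.default_rng(0)
brain_like = rng.normal(size=(50, 200))                 # e.g. 50 stimuli x 200 neurons
model_like = brain_like @ rng.normal(size=(200, 300))   # a linear re-encoding of the same signal

# Compare only the upper triangles (RDMs are symmetric with zero diagonal).
iu = np.triu_indices(50, k=1)
rho, _ = spearmanr(rdm(brain_like)[iu], rdm(model_like)[iu])
print(f"RDM similarity (Spearman rho): {rho:.2f}")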
scychan.bsky.social
Great new paper by @jessegeerts.bsky.social, looking at a certain type of generalization in transformers -- transitive inference -- and what conditions induce this type of generalization
jessegeerts.bsky.social
🧠 How do transformers learn relational reasoning? We trained small transformers on transitive inference (if A>B and B>C, then A>C) and discovered striking differences between learning paradigms. Our latest work reveals when and why AI systems generalize beyond training data 🤖
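[Editor's note: for readers unfamiliar with the task, here is a toy sketch of how transitive inference can be posed for a sequence model. The item set, prompt format, and train/test split below are illustrative assumptions, not necessarily the paper's exact setup.]

# Toy transitive-inference task: items A..G have a hidden rank; training
# shows only adjacent-pair comparisons, and generalization is tested on
# non-adjacent pairs (e.g. if A>B and B>C, infer A>C).
import random

items = list("ABCDEFG")                      # hidden order: A > B > ... > G
rank = {x: i for i, x in enumerate(items)}   # lower index = "greater" item

def label(a, b):
    return ">" if rank[a] < rank[b] else "<"

# Training pairs: adjacent in the hidden order (premise pairs), both directions.
train_pairs = [(items[i], items[i + 1]) for i in range(len(items) - 1)]
train_pairs += [(b, a) for a, b in train_pairs]

# Test pairs: non-adjacent, never seen during training.
test_pairs = [(a, b) for a in items for b in items
              if a != b and abs(rank[a] - rank[b]) > 1]

def to_example(a, b):
    # e.g. input tokens ["B", "?", "E"], target token ">" or "<"
    return ([a, "?", b], label(a, b))

random.seed(0)
print("train:", to_example(*random.choice(train_pairs)))
print("test :", to_example(*random.choice(test_pairs)))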
scychan.bsky.social
New paper: Generalization from context often outperforms generalization from finetuning.

And you might get the best of both worlds by spending extra compute and training time to augment finetuning.
lampinen.bsky.social
How do language models generalize from information they learn in-context vs. via finetuning? In arxiv.org/abs/2505.00661 we show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. 1/
arxiv.org
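[Editor's note: a rough, hypothetical sketch of the two learning modes contrasted above, and of one way "extra compute" could augment the finetuning set with in-context inferences. The function names and the augmentation recipe are illustrative assumptions, not the paper's exact method; model_generate is a stub standing in for any LM call.]

def model_generate(prompt: str) -> str:
    return "stub completion"  # placeholder for an actual LM sampling call

new_facts = ["B is the mother of A."]  # information the model should learn

# (1) In-context learning: put the new information directly in the prompt.
icl_answer = model_generate("\n".join(new_facts) + "\nQuestion: Who is A's mother?\n")

# (2) Finetuning: train on the new facts, then query without them in context.
#     (finetune() is hypothetical; in practice this is a standard SFT loop.)
# finetuned = finetune(model, new_facts)
# ft_answer = finetuned.generate("Question: Who is A's mother?")

# (3) Augmented finetuning: spend extra compute asking the model, in context,
#     to spell out implications of each fact, and add those to the training set.
augmented = list(new_facts)
for fact in new_facts:
    implication = model_generate(f"{fact}\nRestate an implication of this fact:\n")
    augmented.append(implication)
print(augmented)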
scychan.bsky.social
It was such a pleasure to co-supervise this research, but
@aaditya6284.bsky.social should really take the bulk of the credit :)

And thank you so much to all our wonderful collaborators, who made fundamental contributions as well!
Ted Moskovitz, Sara Dragutinovic, Felix Hill, @saxelab.bsky.social
scychan.bsky.social
This paper is dedicated to our collaborator
Felix Hill, who passed away recently. This is our last ever paper with him.

It was bittersweet to finish this research, which contains so much of the scientific spark that he shared with us. Rest in peace Felix, and thank you so much for everything.
scychan.bsky.social
Some general takeaways for interp:
scychan.bsky.social
4. We provide intuition for these dynamics through a simple mathematical model.
scychan.bsky.social
3. A lot of previous work (including our own) has emphasized *competition* between in-context and in-weights learning.

But we find that cIWL and ICL actually compete AND cooperate, via shared subcircuits. In fact, ICL cannot emerge if cIWL is blocked from emerging, even though ICL emerges first!
scychan.bsky.social
2. At the end of training, ICL doesn't give way to in-weights learning (IWL), as we previously thought. Instead, the model prefers a surprising strategy that is a *combination* of the two!

We call this combo "cIWL" (context-constrained in-weights learning).
scychan.bsky.social
1. We aimed to better understand the transience of in-context learning (ICL) -- where ICL can emerge but then disappear after long training times.
scychan.bsky.social
Dropping a few high-level takeaways in this thread.

For more details please see Aaditya's thread,
or the paper itself.
bsky.app/profile/aadi...
arxiv.org/abs/2503.05631
scychan.bsky.social
New work led by
@aaditya6284.bsky.social

"Strategy coopetition explains the emergence and transience of in-context learning in transformers."

We find some surprising things!! E.g. that circuits can simultaneously compete AND cooperate ("coopetition") 😯 🧵👇
Reposted by Stephanie Chan
lampinen.bsky.social
What counts as in-context learning (ICL)? Typically, you might think of it as learning a task from a few examples. However, we've just written a perspective (arxiv.org/abs/2412.03782) suggesting that a much broader spectrum of behaviors can be interpreted as ICL! Quick summary thread: 1/7
The broader spectrum of in-context learning
The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning...
arxiv.org
Reposted by Stephanie Chan
noemielteto.bsky.social
Introducing the :milkfoamo: emoji
scychan.bsky.social
Hahaha. We need a cappuccino emoji?!
scychan.bsky.social
I won't be at NeurIPS this week. Let's grab coffee if you want to fomo-commiserate with me
scychan.bsky.social
Hello hello. Testing testing 123