Jesse Geerts
@jessegeerts.bsky.social
86 followers 55 following 14 posts
Cognitive neuroscientist and AI researcher
Pinned
jessegeerts.bsky.social
🧠 How do transformers learn relational reasoning? We trained small transformers on transitive inference (if A>B and B>C, then A>C) and discovered striking differences between learning paradigms. Our latest work reveals when and why AI systems generalize beyond training data 🤖
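For readers who want the gist of the setup, here is a minimal sketch of a transitive-inference dataset of the kind described here (item names, number of items, and the train/test split are illustrative assumptions, not the paper's exact format):

```python
import itertools
import random

items = list("ABCDEFG")                 # hidden linear order: A > B > ... > G
rank = {item: i for i, item in enumerate(items)}

def label(pair):
    """1 if the first item outranks the second, else 0."""
    a, b = pair
    return int(rank[a] < rank[b])

# Training pairs: adjacent items only, in both orders.
train_pairs = [(items[i], items[i + 1]) for i in range(len(items) - 1)]
train_pairs += [(b, a) for a, b in train_pairs]

# Test pairs: non-adjacent items, answerable only by transitive inference.
test_pairs = [p for p in itertools.permutations(items, 2)
              if abs(rank[p[0]] - rank[p[1]]) > 1]

random.shuffle(train_pairs)
print(train_pairs[:3], [label(p) for p in train_pairs[:3]])
print(test_pairs[:3], [label(p) for p in test_pairs[:3]])
```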
Reposted by Jesse Geerts
kristorpjensen.bsky.social
I’m super excited to finally put my recent work with @behrenstimb.bsky.social on bioRxiv, where we develop a new mechanistic theory of how PFC structures adaptive behaviour using attractor dynamics in space and time!

www.biorxiv.org/content/10.1...
Reposted by Jesse Geerts
neuroversepod.bsky.social
Check out our latest episode on habit formation with Dr Francesca Greenstreet ✅📝 We talk about how habits are made and how they may not require reward-based learning … 🎧 open.spotify.com/episode/1gZI...
Reposted by Jesse Geerts
danielwurgaft.bsky.social
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient?

Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵

1/
Reposted by Jesse Geerts
jessegeerts.bsky.social
Thank you! Yes, I think that’s a fair summary. Another way of looking at it is that pre-training on a match-and-copy task gives it a hint in the “wrong” direction. Our takeaway is that what the transformer learns to implement in-context depends on the pretraining task
Reposted by Jesse Geerts
neurokim.bsky.social
New work on relational reasoning in transformers!

TLDR: Inductive biases of In-Weight and In-Context Learning in transformers are really different for relational reasoning, and pretraining can make a big difference for in-context.

Check out @jessegeerts.bsky.social's thread for more!
jessegeerts.bsky.social
This is a nice paper which applies and refines some of the ideas we put forward in our Psych Review paper. Our model combines multiple Successor Representations and switches between them based on uncertainty (rough sketch below). Jess's model adds reward outcomes to this process and captures splitter cells and more!
macaskillaf.bsky.social
Congrats to the fantastic Jess P for her new paper! She compared how feature- vs outcome-focused agents learn to solve contextual inference problems. She found that you need a balance of both to learn these tasks - and that this mix recapitulates PFC and hippocampal activity in rodent tasks!
biorxiv-neursci.bsky.social
Contextual inference through flexible integration of environmental features and behavioural outcomes https://www.biorxiv.org/content/10.1101/2025.05.28.656607v1
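Very loose sketch of that switching idea, in the spirit of the post above (the update rule and arbitration below are simplified illustrations, not the published model's equations):

```python
import numpy as np

n_states, n_maps, gamma, alpha = 6, 2, 0.9, 0.1
M = np.stack([np.eye(n_states) for _ in range(n_maps)])   # one SR matrix per map
uncertainty = np.ones(n_maps)                              # running prediction error per map

def sr_td_error(M_k, s, s_next):
    """TD error on the successor row of state s."""
    onehot = np.eye(n_states)[s]
    return onehot + gamma * M_k[s_next] - M_k[s]

def step(s, s_next):
    global uncertainty
    # Track how badly each map predicts the observed transition ...
    errors = np.array([np.linalg.norm(sr_td_error(M[k], s, s_next)) for k in range(n_maps)])
    uncertainty = 0.9 * uncertainty + 0.1 * errors
    # ... and let the currently most reliable map control behaviour and get updated.
    k = int(np.argmin(uncertainty))
    M[k][s] += alpha * sr_td_error(M[k], s, s_next)
    return k

for s, s_next in [(0, 1), (1, 2), (2, 3)]:
    print("active map:", step(s, s_next))
```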
jessegeerts.bsky.social
The key insight: the computational strategies underlying ICL aren't fixed but depend on both the learning paradigm and the structure of pre-training. This helps explain when AI systems will generalize beyond their training data.
jessegeerts.bsky.social
5. We could see these differences in their internal representations. Successful models organized items along continuous dimensions in representation space, while unsuccessful models showed no such structure.
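A hedged sketch of one way such structure can be checked: project item embeddings onto their first principal component and see whether it recovers the latent rank order (the "embeddings" below are synthetic, purely for illustration, not the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
ranks = np.arange(7)                                # latent order of 7 items
# Fake "learned" embeddings: rank encoded along one random direction, plus noise.
direction = rng.normal(size=16)
emb = ranks[:, None] * direction[None, :] + 0.1 * rng.normal(size=(7, 16))

centered = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]                              # projection onto the first PC

# A model that organizes items along a continuous dimension gives |correlation| near 1.
print("correlation of PC1 with rank:", abs(np.corrcoef(pc1, ranks)[0, 1]))
```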
jessegeerts.bsky.social
4. Pre-training ICL models on linear regression tasks changed this outcome. These models then succeeded at transitive inference and didn't rely on induction circuits.
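For illustration, a sketch of what an in-context linear-regression pretraining example could look like (dimensions and layout are assumptions, not the paper's exact format):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(n_points=8, dim=4):
    """Each sequence is its own regression problem: w must be inferred in context."""
    w = rng.normal(size=dim)
    x = rng.normal(size=(n_points, dim))
    y = x @ w
    return x, y

x, y = sample_task()
# The first n-1 (x, y) pairs form the context; the model predicts y for the last x.
context_x, context_y, query_x, target_y = x[:-1], y[:-1], x[-1], y[-1]
print(context_x.shape, query_x.shape, round(float(target_y), 3))
```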
jessegeerts.bsky.social
3. Mechanistic analysis revealed why: ICL models developed induction circuits - specialized attention patterns that implement match-and-copy operations rather than encoding hierarchical relationships.
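Toy illustration of the match-and-copy behaviour attributed to induction circuits (this is the algorithm the attention pattern is thought to approximate, not an actual transformer):

```python
def induction_predict(tokens):
    """Find the previous occurrence of the current token and copy what followed it."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):    # scan backwards for a match
        if tokens[i] == current:
            return tokens[i + 1]                # copy the token after the match
    return None                                 # nothing earlier to copy from

print(induction_predict(list("ABCAB")))         # -> 'C', i.e. what followed the earlier 'B'
```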
jessegeerts.bsky.social
2. In-context learning models failed to generalize transitively. Despite perfect performance on training pairs, they couldn't infer relationships between non-adjacent items.
jessegeerts.bsky.social
1. In-weights learning models developed transitive inference despite only seeing adjacent pairs during training. They also showed behavioral patterns consistent with human and animal performance on these tasks.
jessegeerts.bsky.social
We compared two learning approaches: storing relationships in model weights vs. using relationships provided in the input context. The results show different computational strategies.
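Roughly, the two presentation formats might look like this (the token layout is an illustrative assumption, not the paper's exact encoding):

```python
# In-weights learning: each example is just a query pair; the underlying
# A > B > C > ... relations have to end up stored in the weights over training.
iwl_example = (["B", "D"], 1)        # query: is B > D ?  label: yes

# In-context learning: the premise pairs sit in the prompt itself, and the
# model has to use them on the fly to answer the final query.
icl_example = ["A>B", "B>C", "C>D", "query:", "B", "D"]

print(iwl_example, icl_example)
```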
jessegeerts.bsky.social
🧠 How do transformers learn relational reasoning? We trained small transformers on transitive inference (if A>B and B>C, then A>C) and discovered striking differences between learning paradigms. Our latest work reveals when and why AI systems generalize beyond training data 🤖