Simon Schug
@smonsays.bsky.social
910 followers 220 following 15 posts
compositional generalization in neural networks, et al. @Princeton https://smn.one
Pinned
smonsays.bsky.social
Neural networks used to struggle with compositionality but transformers got really good at it. How come?

And why does attention work so much better with multiple heads?

There might be a common answer to both of these questions.
Reposted by Simon Schug
brendenlake.bsky.social
I'm joining Princeton University as an Associate Professor of Computer Science and Psychology this fall! Princeton is ambitiously investing in AI and Natural & Artificial Minds, and I'm excited for my lab to contribute. Recruiting postdocs and Ph.D. students in CS and Psychology — join us!
[Image: Nassau Hall. Photo credit: Debbie and John O'Boyle]
smonsays.bsky.social
Are transformers smarter than you? Hypernetworks might explain why.

Come check out our Oral at #ICLR tomorrow (Apr 26th, poster at 10:00, Oral session 6C in the afternoon).

openreview.net/forum?id=V4K...
Reposted by Simon Schug
taylorwwebb.bsky.social
LLMs have shown impressive performance in some reasoning tasks, but what internal mechanisms do they use to solve these tasks? In a new preprint, we find evidence that abstract reasoning in LLMs depends on an emergent form of symbol processing arxiv.org/abs/2502.20332 (1/N)
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
Many recent studies have found evidence for emergent reasoning capabilities in large language models, but debate persists concerning the robustness of these capabilities, and the extent to which they ...
arxiv.org
Reposted by Simon Schug
bellecguill.bsky.social
Pre-print 🧠🧪
Is mechanism modeling dead in the AI era?

ML models trained to predict neural activity fail to generalize to unseen optogenetic perturbations. But mechanism modeling can solve that.

We say "perturbation testing" is the right way to evaluate mechanisms in data-constrained models

1/8
Reposted by Simon Schug
markdhumphries.bsky.social
Cutting it a bit fine, but here’s my review of the year in neuroscience for 2024

The eighth of these, would you believe? We’ve got dark neurons, tiny monkeys, the most complete brain wiring diagram ever constructed, and much more…
Published on The Spike

Enjoy!

medium.com/the-spike/20...
2024: A Review of the Year in Neuroscience
Feeling a bit wired
medium.com
Reposted by Simon Schug
kristorpjensen.bsky.social
I wrote an introduction to RL for neuroscience last year that was just published in NBDT: tinyurl.com/5f58zdy3

This review aims to provide some intuition for and derivations of RL methods commonly used in systems neuroscience, ranging from TD learning through the successor representation (SR) to deep and distributional RL!
An introduction to reinforcement learning for neuroscience | Published in Neurons, Behavior, Data analysis, and Theory
By Kristopher T. Jensen. Reinforcement learning for neuroscientists
tinyurl.com
Reposted by Simon Schug
beenwrekt.bsky.social
Stitching component models into system models has proven difficult in biology. But how much easier has it been in engineering? www.argmin.net/p/monster-mo...
Monster Models
Systems-level biology is hard because systems-level engineering is hard.
www.argmin.net
Reposted by Simon Schug
bkhmsi.bsky.social
🚨 New Paper!

Can neuroscience localizers uncover brain-like functional specializations in LLMs? 🧠🤖

Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!

w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
🧵👇
Reposted by Simon Schug
tyrellturing.bsky.social
1/ Okay, one thing that has been revealed to me from the replies to this is that many people don't know (or refuse to recognize) the following fact:

The units in ANNs are actually not a terrible approximation of how real neurons work!

A tiny 🧵.

🧠📈 #NeuroAI #MLSky
tyrellturing.bsky.social
Why does anyone have any issue with this?

I've seen people suggesting it's problematic, that neuroscientists won't like it, and so on.

But, I literally don't see why this is problematic...
pessoabrain.bsky.social
This would be funny if it weren't sad...
Coming from the "giants" of AI.
Or maybe this was posted out of context? Please clarify.
I can't process this...
Reposted by Simon Schug
razvan-pascanu.bsky.social
For my first post on Bluesky... I'll start by announcing our 2025 edition of EEML, which will be in Sarajevo :)! I'm really excited about it and hope to see many of you there. Please follow the website (and Bluesky account) for more details, which are coming soon...
eemlcommunity.bsky.social
Hello Bluesky! 🦋

This will be the official account of the Eastern European Machine Learning (EEML) community.

Follow us for news regarding our summer schools, workshops, education/community initiatives, and more!
Reposted by Simon Schug
mameister4.bsky.social
Have you had private doubts whether we'll ever understand the brain? Whether we'll be able to explain psychological phenomena in an exhaustive way that ranges from molecules to membranes to synapses to cells to cell types to circuits to computation to perception and behavior?
Reposted by Simon Schug
lampinen.bsky.social
What counts as in-context learning (ICL)? Typically, you might think of it as learning a task from a few examples. However, we’ve just written a perspective (arxiv.org/abs/2412.03782) suggesting interpreting a much broader spectrum of behaviors as ICL! Quick summary thread: 1/7
The broader spectrum of in-context learning
The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning...
arxiv.org
Reposted by Simon Schug
ackaisa.bsky.social
Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.
smonsays.bsky.social
Would love to be added as well :)
Reposted by Simon Schug
tyrellturing.bsky.social
Great thread from @michaelhendricks.bsky.social!

Reminds me of something Larry Abbott once said to me at a summer school:

Many physicists come into neuroscience assuming that the failure to find laws of the brain was just because biologists aren't clever enough. In fact, there are no laws.

🧠📈 🧪
michaelhendricks.bsky.social
I came across a quote in an article, which I will paraphrase: the ultimate goal of neuroscience is to model the brain and derive laws that define the brain’s computational abilities. Statements like this are common and presented as self-evident, but I think they are wrong.
Reposted by Simon Schug
cocoscilab.bsky.social
(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: mitpress.mit.edu/978026204941...
smonsays.bsky.social
To help find people at the intersection of neuroscience and AI. Of course let me know if I missed someone or you’d like to be added 🧪 🧠

#neuroskyence

go.bsky.app/CAfmKQs
smonsays.bsky.social
I think you are already part of it - just double checked :)
smonsays.bsky.social
With language being highly compositional itself, could the hypernetwork mechanism play a part in explaining the success of multi-head attention?

Maybe! Have a look at the paper in case you are curious!

arxiv.org/abs/2406.05816
Attention as a Hypernetwork
Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms und...
arxiv.org
smonsays.bsky.social
Indeed, in line with the hypothesis that the hypernetwork mechanism supports compositionality, this modification (HYLA) improves performance on unseen tasks.
smonsays.bsky.social
So what happens if we strengthen the hypernetwork mechanism?
Could we maybe further improve compositionality?

We can, for instance, make the value network nonlinear without introducing additional parameters.
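For the curious, here is a minimal sketch of the reinterpretation in plain NumPy (the function name, toy shapes, and variable names are made up for illustration, and only the linear case is shown, not the nonlinear HYLA variant): for every query-key pair, the attention scores across heads act as a latent code that linearly mixes per-head value-to-output maps, so multi-head attention can be read as a hypernetwork that generates a linear value network on the fly.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mha_as_hypernetwork(X, Wq, Wk, Wv, Wo):
    """Multi-head attention, rearranged so the attention scores act as a
    latent code that configures a (here: linear) value network.

    Hypothetical toy shapes:
      X:          (T, d)      tokens
      Wq, Wk, Wv: (H, d, dh)  per-head projections
      Wo:         (H, dh, d)  per-head output maps
    """
    Q = np.einsum('td,hdk->htk', X, Wq)                                   # (H, T, dh)
    K = np.einsum('td,hdk->htk', X, Wk)
    A = softmax(np.einsum('hik,hjk->hij', Q, K) / np.sqrt(Q.shape[-1]))   # (H, T, T)

    # Per-head value->output map: a linear network from d to d.
    per_head_map = np.einsum('hdk,hke->hde', Wv, Wo)                      # (H, d, d)

    # Hypernetwork view: for every query/key pair (i, j), the scores
    # A[:, i, j] across heads mix the per-head maps into one generated
    # weight matrix W_ij. (As I understand it, HYLA makes this generated
    # value network nonlinear; omitted here.)
    W_ij = np.einsum('hij,hde->ijde', A, per_head_map)                    # (T, T, d, d)

    # Apply the generated value network to the key tokens and sum over keys;
    # this equals standard multi-head attention with output projection Wo.
    return np.einsum('ijde,jd->ie', W_ij, X)                              # (T, d)


# Toy usage with arbitrary dimensions: 4 tokens, model dim 8, 2 heads, head dim 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(2, 8, 4)) for _ in range(3))
Wo = rng.normal(size=(2, 4, 8))
print(mha_as_hypernetwork(X, Wq, Wk, Wv, Wo).shape)  # (4, 8)
```

Written this way, strengthening the hypernetwork mechanism, e.g. by making the generated value network nonlinear as the thread describes, becomes a natural modification to try.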