Shahab Bakhtiari
@shahabbakht.bsky.social
6.1K followers 1K following 1.2K posts
|| assistant prof at University of Montreal || leading the systems neuroscience and AI lab (SNAIL: https://www.snailab.ca/) 🐌 || associate academic member of Mila (Quebec AI Institute) || #NeuroAI || vision and learning in brains and machines
Pinned
shahabbakht.bsky.social
So excited to see this preprint released from the lab into the wild.

Charlotte has developed a theory of how the learning curriculum influences the generalization of learning.
Our theory makes straightforward neural predictions that can be tested in future experiments. (1/4)

🧠🤖 🧠📈 #MLSky
charlottevolk.bsky.social
🚨 New preprint alert!

🧠🤖
We propose a theory of how learning curriculum affects generalization through neural population dimensionality. Learning curriculum is a determining factor of neural dimensionality - where you start from determines where you end up.
🧠📈

A 🧵:

tinyurl.com/yr8tawj3
The curriculum effect in visual learning: the role of readout dimensionality
Generalization of visual perceptual learning (VPL) to unseen conditions varies across tasks. Previous work suggests that training curriculum may be integral to generalization, yet a theoretical explan...
tinyurl.com
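For a concrete handle on the "neural population dimensionality" mentioned in the quoted post, here is a minimal sketch, not the preprint's code: the participation ratio is one standard way to quantify the effective dimensionality of population activity or of a readout. The toy low-D vs. high-D comparison below is an illustrative assumption, not an analysis from the paper.

```python
# Minimal sketch (not the preprint's code): participation ratio as a standard
# measure of the effective dimensionality of neural population activity.
import numpy as np

def participation_ratio(activity):
    """activity: (n_trials, n_neurons). Returns the effective dimensionality."""
    centered = activity - activity.mean(axis=0, keepdims=True)
    eigvals = np.clip(np.linalg.eigvalsh(np.cov(centered, rowvar=False)), 0, None)
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

# Hypothetical comparison: activity confined to a ~2-D readout vs. a ~40-D one.
rng = np.random.default_rng(0)
low_d  = rng.normal(size=(500, 2))  @ rng.normal(size=(2, 100))
high_d = rng.normal(size=(500, 40)) @ rng.normal(size=(40, 100))
print(participation_ratio(low_d), participation_ratio(high_d))  # roughly 2 vs. 30+
```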
shahabbakht.bsky.social
I don't see a direct causal path, but pessimistically speaking, when bubbles burst they often leave subconscious biases against the bubbled topic, e.g., in evaluation committees. In other words, the current abundance of AI funding (relative to other fields) might not last.
shahabbakht.bsky.social
What if the bubble collapse also takes down our funding so we can't even afford H100s at half price?! :)
Reposted by Shahab Bakhtiari
drlaschowski.bsky.social
Imagine a brain decoding algorithm that could generalize across different subjects and tasks. Today, we’re one step closer to achieving that vision.

Introducing the flagship paper of our brain decoding program: www.biorxiv.org/content/10.1...
#neuroAI #compneuro @utoronto.ca @uhn.ca
Reposted by Shahab Bakhtiari
sushrutthorat.bsky.social
and the low-D part has been on the horizon for a while now - proceedings.neurips.cc/paper/2019/h... - given complex numbers you can go loooowwww haha (O(1)). Also this is linked to top-down attention: arxiv.org/abs/1907.12309 , arxiv.org/abs/2502.15634 - which is a low-D modulation (O(N) vs O(N^2)).
Superposition of many models into one
proceedings.neurips.cc
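To make the O(N) vs O(N^2) contrast concrete, a rough sketch (my own illustration, not code from the linked papers): modulating a fixed layer with a per-unit gain, attention-style, introduces only N parameters, whereas rewriting the weight matrix itself would cost N^2.

```python
# Rough sketch of low-D (O(N)) modulation of a fixed layer, vs. the O(N^2)
# cost of changing the full weight matrix. Illustrative only.
import numpy as np

N = 512
W = np.random.randn(N, N) / np.sqrt(N)   # fixed weights: N^2 parameters
gain = np.random.rand(N)                 # per-unit top-down gain: N parameters

def layer(x, g):
    # The gain multiplicatively modulates the layer's output units.
    return g * np.tanh(W @ x)

x = np.random.randn(N)
baseline = layer(x, np.ones(N))   # unmodulated response
modulated = layer(x, gain)        # same weights, low-D change in behavior
```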
shahabbakht.bsky.social
Yeah, it all makes sense in hindsight. I think the low-rank structure of weight updates was actually the rationale behind LoRA when it was proposed.
Reposted by Shahab Bakhtiari
sgray.bsky.social
This seems to imply that, with a large enough context and the right prompt, you could “prototype” a LoRA in-context before creating it?

As someone removed from the low-level implementation details, this speaks to something I've long wondered:

Could you “freeze” a context to create a LoRA from it?
shahabbakht.bsky.social
Interesting paper suggesting a mechanism for why in-context learning happens in LLMs.

They show that LLMs implicitly apply an internal low-rank weight update adjusted by the context. It's cheap (due to the low rank) but effective for adapting the model's behavior.

#MLSky

arxiv.org/abs/2507.16003
Learning without training: The implicit dynamics of in-context learning
One of the most striking features of Large Language Models (LLM) is their ability to learn in context. Namely at inference time an LLM is able to learn new patterns without any additional weight updat...
arxiv.org
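For intuition only, a toy sketch of the general idea (not the paper's derivation): a frozen linear layer can be adapted at inference time by a context-dependent low-rank update A @ B with rank r much smaller than the hidden size, so forming and applying the adjustment stays cheap. The `contextual_update` helper below is a hypothetical stand-in for whatever function of the prompt the model implicitly computes.

```python
# Toy illustration (not the paper's construction): adapting a frozen linear
# map at inference time with a context-dependent low-rank update W + A @ B.
import numpy as np

d, r = 1024, 4                                   # hidden size, update rank (r << d)
W = np.random.randn(d, d) / np.sqrt(d)           # frozen "pretrained" weights

def contextual_update(context_vecs, rank=r):
    # Hypothetical stand-in: derive low-rank factors from a few context vectors.
    A = context_vecs[:, :rank]                   # (d, r)
    B = context_vecs[:, :rank].T / rank          # (r, d)
    return A, B

ctx = np.random.randn(d, 16)                     # pretend these summarize the prompt
A, B = contextual_update(ctx)
x = np.random.randn(d)
y = W @ x + A @ (B @ x)   # behaves like (W + A @ B) @ x, with no weight training
```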
shahabbakht.bsky.social
Interesting connection to the recent Thinking Machines blog post on LoRA: thinkingmachines.ai/blog/lora/

Both seem to suggest that low-rank weight adjustments are sufficient for model adaptation, whether explicitly (LoRA fine-tuning) or implicitly (in-context learning).
LoRA Without Regret
How LoRA matches full training performance more broadly than expected.
thinkingmachines.ai
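By contrast, LoRA makes the same low-rank structure explicit and trainable. A minimal sketch of a LoRA-style layer, assuming PyTorch and not taken from the blog post: the base weight stays frozen and only the rank-r factors A and B are learned.

```python
# Minimal LoRA-style layer sketch (illustrative, not the blog's code): the
# pretrained weight is frozen; only the low-rank factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)        # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))   # gradients flow only through A and B
```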
shahabbakht.bsky.social
To be fair, most people don’t understand biology that well when it comes to the computational role of evolution, but then again, most people don’t make such strong claims either.
shahabbakht.bsky.social
The bitter lesson of the bitter lesson :)
shahabbakht.bsky.social
The charitable take would be that he’s arguing against the typical audience of Patel’s podcast, who might need to move away from LLM extremism a bit.
shahabbakht.bsky.social
It does feel a lot like that in this interview actually. He seems to be pushing a strong position for purely experience-dependent intelligence.

Though I just remembered this bit from their ‘reward is enough’ paper, which makes the notion of reward so wide it becomes almost meaningless.
shahabbakht.bsky.social
Though it’s surprising how much he downplays the role of evolution in bootstrapping animal learning and intelligence.
shahabbakht.bsky.social
The way Sutton himself interprets the “bitter lesson” in this interview definitely caught a lot of bitter lesson enthusiasts off guard.
LLMs not actually being an example of the bitter lesson was quite a nuance no one saw coming.

youtu.be/21EYKqUsPfg?...
Richard Sutton – Father of RL thinks LLMs are a dead end
YouTube video by Dwarkesh Patel
youtu.be
shahabbakht.bsky.social
Lone scientists are fictional creatures.
marspidermonkey.bsky.social
When we hear about a lone scientist who made groundbreaking discoveries on their own, it’s usually erasing the truth that science is a team sport, and field research builds on the local knowledge and expertise of the people that live there (8/10)
Reposted by Shahab Bakhtiari
marspidermonkey.bsky.social
As a primatologist, Jane Goodall was a huge inspiration to me. I admired the way she describes chimpanzee behavior with such detail and empathy, and she’s inspired so many people and advocated for chimpanzee conservation and welfare.

However, I'm dismayed at what her narrative leaves out (1/10)
Photo of Jane Goodall in the center, signing a book, with three women standing slightly hunched behind her. A very young Michelle is to the right, smiling.
Reposted by Shahab Bakhtiari
charlottevolk.bsky.social
🚨 New preprint alert!

🧠🤖
We propose a theory of how learning curriculum affects generalization through neural population dimensionality. Learning curriculum is a determining factor of neural dimensionality - where you start from determines where you end up.
🧠📈

A 🧵:

tinyurl.com/yr8tawj3
The curriculum effect in visual learning: the role of readout dimensionality
Generalization of visual perceptual learning (VPL) to unseen conditions varies across tasks. Previous work suggests that training curriculum may be integral to generalization, yet a theoretical explan...
tinyurl.com
shahabbakht.bsky.social
This is also a very special paper to me: it's the first project I initiated in my lab, motivated by a long-standing curiosity to compare continual visual learning in humans and ANNs.

Big thanks to @charlottevolk.bsky.social for getting it to the finish line. (4/4)
shahabbakht.bsky.social
I'd value proven-wrong predictions even more, as they open up a path to 'what should be changed in the ANN' – a new direction of change in architectures, objectives, or learning rules that can evolve *independently* of advances in AI, but who knows… maybe with implications for AI. (3/4)
shahabbakht.bsky.social
On a broader note: Our ANN-based models should generate bold predictions with direct prescriptions for how to test them. This is how I believe #NeuroAI can avoid becoming an isolated, self-referential domain of neuroscience, and the only way to flywheel NeuroAI into a theory of the brain. (2/4)
shahabbakht.bsky.social
So excited to see this preprint released from the lab into the wild.

Charlotte has developed a theory of how the learning curriculum influences the generalization of learning.
Our theory makes straightforward neural predictions that can be tested in future experiments. (1/4)

🧠🤖 🧠📈 #MLSky
Reposted by Shahab Bakhtiari
cpaxton.bsky.social
The length of tasks that AI can complete at a 50% success rate continues to increase
Reposted by Shahab Bakhtiari
neurograce.bsky.social
FYI, if you're an educator looking for videos about the brain and/or how neuroscience is done, check out
@brainfacts.org's YouTube page: youtube.com/@brainfactsorg
It includes submissions to their Brain Awareness Week video contest, which can be quite fun! #neuroskyence
BrainFacts.org
BrainFacts.org is an authoritative source of information about the brain and nervous system for the public. The site is a public information initiative of The Kavli Foundation, the Gatsby Charitable F...
youtube.com