Shahab Bakhtiari
@shahabbakht.bsky.social
6.1K followers 1.1K following 1.2K posts
|| assistant prof at University of Montreal || leading the systems neuroscience and AI lab (SNAIL: https://www.snailab.ca/) 🐌 || associate academic member of Mila (Quebec AI Institute) || #NeuroAI || vision and learning in brains and machines
Pinned
shahabbakht.bsky.social
So excited to see this preprint released from the lab into the wild.

Charlotte has developed a theory of how the learning curriculum influences the generalization of learning.
Our theory makes straightforward neural predictions that can be tested in future experiments. (1/4)

🧠🤖 🧠📈 #MLSky
charlottevolk.bsky.social
🚨 New preprint alert!

🧠🤖
We propose a theory of how learning curriculum affects generalization through neural population dimensionality. Learning curriculum is a determining factor of neural dimensionality - where you start from determines where you end up.
🧠📈

A 🧵:

tinyurl.com/yr8tawj3
The curriculum effect in visual learning: the role of readout dimensionality
Generalization of visual perceptual learning (VPL) to unseen conditions varies across tasks. Previous work suggests that training curriculum may be integral to generalization, yet a theoretical explan...
tinyurl.com
shahabbakht.bsky.social
It's definitely not 50/50 for me. More like 10/90 ;)
shahabbakht.bsky.social
This feels a lot like systems neuro, honestly. You could hear similar advice there, especially from the more experimentally-oriented minds.
shahabbakht.bsky.social
I guess the whole predictive circuit finding approach can be seen as a convergent evolution, which probably doesn’t scale and generalize outside of the experimental setting?
shahabbakht.bsky.social
Having full observation and control over the studied system is definitely the main advantage of MI. But the unintuitive mess of high-d computation is their shared problem, which seems to need more theories than experiments.
shahabbakht.bsky.social
A systems neuroscientist turned mech interp researcher should write a paper on what the field should absolutely avoid, then observe how thoroughly they’ll be ignored :)

Though what I find intriguing in this domain (watching from afar): its much slower rate of progress compared to the rest of AI.
shahabbakht.bsky.social
Regardless of what explainability/mech interp in AI is actually after, and whether or not they know what they’re searching for, we can confidently say they’re pursuing what systems neuroscience has pursued for decades, with very similar puzzles and confusions.
bayesianboy.bsky.social
What problem is explainability/interpretability research trying to solve in ML, and do you have a favorite paper articulating what that problem is?
shahabbakht.bsky.social
I don't see a direct causal path, but pessimistically speaking, when bubbles burst they often leave subconscious biases against the bubbled topic, e.g., in evaluation committees. In other words, the current abundance of AI funding (relative to other fields) might not last.
shahabbakht.bsky.social
What if the bubble collapse also takes down our funding so we can't even afford H100s at half price?! :)
Reposted by Shahab Bakhtiari
drlaschowski.bsky.social
Imagine a brain decoding algorithm that could generalize across different subjects and tasks. Today, we’re one step closer to achieving that vision.

Introducing the flagship paper of our brain decoding program: www.biorxiv.org/content/10.1...
#neuroAI #compneuro @utoronto.ca @uhn.ca
Reposted by Shahab Bakhtiari
sushrutthorat.bsky.social
and the low-D part has been on the horizon for a while now - proceedings.neurips.cc/paper/2019/h... - given complex numbers you can go loooowwww haha (O(1)). Also this is linked to top-down attention: arxiv.org/abs/1907.12309 , arxiv.org/abs/2502.15634 - which is a low-D modulation (O(N) vs O(N^2)).
Superposition of many models into one
proceedings.neurips.cc
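(A rough NumPy sketch, mine and not from either linked paper, of what the O(N) vs O(N^2) contrast means when modulating a single linear layer; the layer size and gain magnitudes are arbitrary.)

```python
# Illustrative parameter-count comparison for modulating a linear map y = W x,
# with W of shape (N, N). Not code from either linked paper.
import numpy as np

N = 512
rng = np.random.default_rng(0)
W = rng.standard_normal((N, N)) / np.sqrt(N)   # base weights: O(N^2) parameters
x = rng.standard_normal(N)

# Full modulation: a separate multiplicative factor per weight -> O(N^2) extra parameters.
M_full = 1.0 + 0.1 * rng.standard_normal((N, N))
y_full = (W * M_full) @ x

# Low-D (feature-wise) modulation: one gain per unit -> O(N) extra parameters,
# the flavor of top-down attentional gain being discussed in the thread.
g = 1.0 + 0.1 * rng.standard_normal(N)
y_lowd = np.diag(g) @ (W @ x)                  # equivalently g * (W @ x)

print(M_full.size, g.size)                     # 262144 vs 512 modulation parameters
```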
shahabbakht.bsky.social
Yeah, it all makes sense in hindsight. I think the low-d structure of weights was actually the rationale behind LoRA when it was proposed.
Reposted by Shahab Bakhtiari
sgray.bsky.social
This seems to imply that, with a large enough context and the right prompt, you could “prototype” a LoRA in-context before creating it?

As someone removed from the low-level implementation details, this speaks to something I’ve long wondered:

Could you “freeze” a context to create a LoRA from it?
shahabbakht.bsky.social
Interesting paper suggesting a mechanism for why in-context learning happens in LLMs.

They show that LLMs implicitly apply an internal low-rank weight update adjusted by the context. It’s cheap (due to the low rank) but effective for adapting the model’s behavior.

#MLSky

arxiv.org/abs/2507.16003
Learning without training: The implicit dynamics of in-context learning
One of the most striking features of Large Language Models (LLM) is their ability to learn in context. Namely at inference time an LLM is able to learn new patterns without any additional weight updat...
arxiv.org
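(A toy numerical sketch of the flavor of result being described: the change that context induces in an attention output can be folded into a rank-1 update of the downstream weight matrix. This is an illustrative paraphrase with arbitrary dimensions, not the paper's exact construction.)

```python
# Toy check: the effect of context on an attention output can be absorbed into
# a rank-1 update of the weight matrix that consumes it. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((d, d)) / np.sqrt(d)   # downstream weight matrix
a_no_ctx = rng.standard_normal(d)              # attention output, query alone
a_ctx = rng.standard_normal(d)                 # attention output, query + context

# Rank-1 update built from the context-induced change in the attention output.
delta = a_ctx - a_no_ctx
dW = np.outer(W @ delta, a_no_ctx) / (a_no_ctx @ a_no_ctx)   # rank 1, cheap to form

# Feeding the context-free activation through the "updated" weights reproduces
# the effect of actually having the context present.
assert np.allclose((W + dW) @ a_no_ctx, W @ a_ctx)
print("rank of dW:", np.linalg.matrix_rank(dW))              # 1
```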
shahabbakht.bsky.social
Interesting connection to the recent Thinking Machines blog post on LoRA: thinkingmachines.ai/blog/lora/

Both seem to suggest that low-rank weight adjustments are sufficient for model adaptation, whether explicitly (LoRA fine-tuning) or implicitly (in-context learning).
LoRA Without Regret
How LoRA matches full training performance more broadly than expected.
thinkingmachines.ai
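(For reference, a minimal generic LoRA-style layer, illustrative only and not code from either link: the frozen weight gets an explicit trainable low-rank adjustment B A, the explicit counterpart of the implicit context-driven update above. The rank and scaling are arbitrary choices.)

```python
# Generic LoRA-style linear layer: frozen base weight plus a trainable
# low-rank correction. A sketch, not the code behind either linked source.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)               # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # trainable, rank r
        self.B = nn.Parameter(torch.zeros(d_out, r))          # trainable, starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha/r) * B A x : only O(r * (d_in + d_out)) new parameters
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(1024, 1024, r=8)
n_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(n_trainable)   # 16384 trainable adapter parameters vs 1,048,576 frozen ones
```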
shahabbakht.bsky.social
To be fair, most people don’t understand biology that well when it comes to the computational role of evolution, but then again, most people don’t make such strong claims either.
shahabbakht.bsky.social
The bitter lesson of the bitter lesson :)
shahabbakht.bsky.social
The charitable take would be that he’s arguing against the typical audience of Patel’s podcast, who might need to move away from LLM extremism a bit.
shahabbakht.bsky.social
It does feel a lot like that in this interview actually. He seems to be pushing a strong position for purely experience-dependent intelligence.

Though I just remembered this bit from their ‘reward is enough’ paper, which makes the notion of reward so wide it becomes almost meaningless.
shahabbakht.bsky.social
Though it’s surprising how much he downplays the role of evolution in bootstrapping animal learning and intelligence.
shahabbakht.bsky.social
The way Sutton himself interprets the “bitter lesson” in this interview definitely caught a lot of bitter lesson enthusiasts off guard.
LLMs not actually being an example of the bitter lesson was quite a nuance no one saw coming.

youtu.be/21EYKqUsPfg?...
Richard Sutton – Father of RL thinks LLMs are a dead end
YouTube video by Dwarkesh Patel
youtu.be