Jascha Achterberg
@achterbrain.bsky.social
Neuroscience & AI at University of Oxford and University of Cambridge | Principles of efficient computations + learning in brains, AI, and silicon 🧠 NeuroAI | Gates Cambridge Scholar

www.jachterberg.com
Giacomo's commentary was in response to this great recent paper by Iqbal et al., also in PNAS:

"Biologically grounded neocortex computational primitives implemented on neuromorphic hardware improve vision transformer performance"
www.pnas.org/doi/10.1073/...
December 18, 2025 at 3:22 PM
This new model opens up a whole new world of analysing multi-region interactions across trials and tasks! More analyses and findings can be found in our paper linked below. Work led by Jack Cook, with great help from @danakarca.bsky.social and @somnirons.bsky.social!

arxiv.org/abs/2506.02813
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts
Examples of such pathways can be found in the interactions between cortical and subcortical networks during learning, or in sub-networks specializing for task characteristics such as difficulty or mod...
arxiv.org
November 21, 2025 at 12:01 PM
We also find that while complex regions are needed to learn complex tasks, these tasks are eventually moved toward simpler regions, similar to how you may struggle the first time when learning a new skill, but slowly get better with practice.
November 21, 2025 at 12:01 PM
Furthermore, we find that these pathways mirror the behavior we expect of pathways in the brain! We find that difficult tasks need to be learned in more complex regions, similar to how you need to think “harder” when learning how to solve a difficult math problem.
November 21, 2025 at 12:01 PM
With these three features in place, we find that our third criterion of distinct pathways is also met. While baseline models exhibit largely random expert usage patterns, our models exhibit highly structured pathways between regions that reliably emerge during learning.
November 21, 2025 at 12:01 PM
Our third contribution is expert dropout. Without this feature, we find that models suffer large performance deficits when experts outside of the active pathway are disabled. Ideally, however, a model should depend primarily on the experts it uses most.
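
To make the idea concrete, here is a minimal sketch of one way expert dropout could work during training. The post does not give the exact scheme, so everything here is an assumption for illustration: the hypothetical function `expert_dropout`, the choice to always keep the most-used expert, and the independent drop probability `drop_prob`.

```python
import random


def expert_dropout(gates, drop_prob, rng):
    """Randomly silence experts, then renormalize the remaining gate weights.

    Assumption (not from the paper): the expert with the largest gate is
    always kept, and every other expert is dropped independently with
    probability `drop_prob`.
    """
    top = max(range(len(gates)), key=lambda i: gates[i])
    kept = [
        g if (i == top or rng.random() >= drop_prob) else 0.0
        for i, g in enumerate(gates)
    ]
    total = sum(kept)  # positive as long as the top gate is positive
    return [g / total for g in kept]


rng = random.Random(0)
gates = [0.2, 0.7, 0.1]  # illustrative routing weights over three experts
only_top = expert_dropout(gates, drop_prob=1.0, rng=rng)
unchanged = expert_dropout(gates, drop_prob=0.0, rng=rng)
```

Training with the less-used experts occasionally switched off pushes the model to solve each task with the experts on its main pathway, which is the self-sufficiency property described above.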
November 21, 2025 at 12:01 PM
When put together, these two contributions resulted in remarkable pathway consistency in our model, which we measured by correlating the routing patterns across 10 different models trained on the same tasks.
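
A rough sketch of how such a consistency score could be computed is shown below. It assumes each model's routing pattern is a task-by-expert usage matrix and that consistency is the mean pairwise Pearson correlation across models; the function names and the toy matrices are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pathway_consistency(routing_patterns):
    """Mean pairwise correlation of flattened task-by-expert usage matrices."""
    flat = [[u for row in pattern for u in row] for pattern in routing_patterns]
    pairs = list(combinations(flat, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy usage matrices (rows: tasks, columns: experts), made up for illustration.
model_a = [[0.9, 0.1], [0.2, 0.8]]
model_b = [[0.9, 0.1], [0.2, 0.8]]  # same pathways as model_a
model_c = [[0.1, 0.9], [0.8, 0.2]]  # opposite pathways

same = pathway_consistency([model_a, model_b])
opposite = pathway_consistency([model_a, model_c])
```

A score near 1 means independently trained models route the same tasks through the same experts; a score near 0 means their routing is unrelated.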
November 21, 2025 at 12:01 PM
We then identify three inductive biases that yield pathways that meet each of these criteria.

The first of these is a routing loss that penalizes the use of more complex experts during training, and the second scales this loss by the model’s performance on the task being solved.
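
As a minimal sketch of these two ideas, the penalty below charges each expert in proportion to its size and multiplies the result by task performance. The exact functional form is an assumption: the hypothetical `routing_penalty` uses relative expert size as the complexity cost and treats `task_performance` as a value in [0, 1] where higher means the task is better solved.

```python
def routing_penalty(gates, expert_sizes, task_performance):
    """Penalty for routing through large experts, scaled by task performance.

    Assumptions (not from the paper): complexity is expert size relative to
    the largest expert, and the penalty grows linearly with performance, so
    well-learned tasks feel more pressure to move to simpler experts.
    """
    largest = max(expert_sizes)
    complexity = [size / largest for size in expert_sizes]
    usage_cost = sum(g * c for g, c in zip(gates, complexity))
    return task_performance * usage_cost


sizes = [2, 8, 32]  # illustrative hidden widths of three experts
big_expert = routing_penalty([0.0, 0.0, 1.0], sizes, task_performance=1.0)
small_expert = routing_penalty([1.0, 0.0, 0.0], sizes, task_performance=1.0)
unlearned = routing_penalty([0.0, 0.0, 1.0], sizes, task_performance=0.0)
```

Under these assumptions a task that is not yet solved pays no penalty for using the largest expert, while a mastered task is nudged toward the smallest expert that still works.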
November 21, 2025 at 12:01 PM
We then set three criteria to determine whether pathways had formed:

(1) Consistency: Models trained on the same tasks should have similar pathways

(2) Self-sufficiency: Pathways should be primarily reliant on their own experts

(3) Distinctness: Many distinct pathways should be present
November 21, 2025 at 12:01 PM
We first needed to create a model in which we could study pathway formation. We chose a Heterogeneous Mixture-of-Experts architecture, in which information can be dynamically routed to computational experts, or regions, of varying sizes.

We train the model on 82 tasks of varying complexity (ModCog)!
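
For readers unfamiliar with the setup, here is a small self-contained sketch of a heterogeneous mixture-of-experts forward pass. It is illustrative only and not the paper's implementation: the expert widths, the softmax router, and all function names are assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(in_dim, hidden, rng):
    """One-hidden-layer expert (in_dim -> hidden -> in_dim) with random weights."""
    w_in = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(in_dim)]
    return w_in, w_out


def run_expert(expert, x):
    """Apply the expert: linear layer, ReLU, linear layer."""
    w_in, w_out = expert
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_out]


def moe_forward(x, experts, router):
    """Route x to all experts and mix their outputs by the router's gates."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    gates = softmax(logits)
    outputs = [run_expert(e, x) for e in experts]
    mixed = [sum(g * o[i] for g, o in zip(gates, outputs)) for i in range(len(x))]
    return mixed, gates


rng = random.Random(0)
IN_DIM = 4
HIDDEN_SIZES = [2, 8, 32]  # "regions" of increasing complexity (assumed sizes)
experts = [make_expert(IN_DIM, h, rng) for h in HIDDEN_SIZES]
router = [[rng.gauss(0, 0.1) for _ in range(IN_DIM)] for _ in HIDDEN_SIZES]
y, gates = moe_forward([1.0, 0.5, -0.5, 0.0], experts, router)
```

The `gates` vector is the quantity of interest for pathway analysis: it records how much each region contributes to a given input.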
November 21, 2025 at 12:01 PM
All good Dan!
November 14, 2025 at 1:41 PM