Badr AlKhamissi
@bkhmsi.bsky.social
210 followers 350 following 46 posts
PhD at EPFL 🧠💻 Ex @MetaAI, @SonyAI, @Microsoft Egyptian 🇪🇬
bkhmsi.bsky.social
Excited to be part of this cool work led by Melika Honarmand!

We show that selectively targeting VLM units that mirror the brain’s visual word form area induces dyslexic-like reading impairments in models, while leaving other abilities intact!! 🧠🤖

Details in the 🧵👇
bkhmsi.bsky.social
Now that the ICLR deadline is behind us, happy to share that From Language to Cognition has been accepted as an Oral at #EMNLP2025! 🎉

Looking forward to seeing many of you in Suzhou 🇨🇳
bkhmsi.bsky.social
🚨 New Preprint!!

LLMs trained on next-word prediction (NWP) show high alignment with brain recordings. But what drives this alignment—linguistic structure or world knowledge? And how does this alignment evolve during training? Our new paper explores these questions. 👇🧵
Reposted by Badr AlKhamissi
bayazitdeniz.bsky.social
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
Reposted by Badr AlKhamissi
silingao.bsky.social
NEW PAPER ALERT: Recent studies have shown that LLMs often lack robustness to distribution shifts in their reasoning. Our paper proposes a new method, AbstRaL, to augment LLMs’ reasoning robustness, by promoting their abstract thinking with granular reinforcement learning.
Reposted by Badr AlKhamissi
abosselut.bsky.social
Check out @bkhmsi.bsky.social's great work on mixture-of-experts models specialized to represent the behavior of known brain networks.
bkhmsi.bsky.social
🚨 New Preprint!!

Thrilled to share with you our latest work: “Mixture of Cognitive Reasoners”, a modular transformer architecture inspired by the brain’s functional networks: language, logic, social reasoning, and world knowledge.

1/ 🧵👇
bkhmsi.bsky.social
10/ 🧾 Conclusion:
MiCRo weaves together modularity, interpretability & brain-inspired design to build controllable and high-performing models, moving toward truly cognitively grounded LMs.
bkhmsi.bsky.social
9/ 💡 Key insights:
1. Minimal data (~3k samples) in Stage 1 can induce lasting specialization
2. Modular structure enables interpretability, control, and scalability (e.g., top‑2 routing can boost performance)
3. Approach generalizes across domains & base models
bkhmsi.bsky.social
8/ 🧬 Brain alignment:
Neuroscience localizers (e.g., for language, multiple-demand) rediscover the corresponding experts in MiCRo, showing functional alignment with brain networks. However, the ToM localizer fails to identify the social expert.

Figures for MiCRo-Llama & MiCRo-OLMo.
bkhmsi.bsky.social
7/ 🧩 Steering & controllability:
Removing or emphasizing specific experts steers model behavior: ablating the logic expert hurts math accuracy, while suppressing the social expert slightly improves math, showcasing fine-grained control.
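For intuition, this kind of steering could be implemented by nudging the router’s logits for the targeted expert at inference time. A minimal sketch, assuming access to per-layer router logits (EXPERT_NAMES, adjust_expert, and the logit interface are illustrative, not the paper’s released code):

```python
import torch

# Hypothetical expert ordering, matching the four MiCRo experts.
EXPERT_NAMES = ["language", "logic", "social", "world"]

def adjust_expert(router_logits: torch.Tensor, expert: str, delta: float) -> torch.Tensor:
    """Add delta to one expert's routing logits before the top-1 argmax:
    delta = float('-inf') ablates the expert; a positive delta emphasizes it."""
    out = router_logits.clone()
    out[..., EXPERT_NAMES.index(expert)] += delta
    return out

# e.g. ablate the logic expert:      choices = adjust_expert(logits, "logic", float("-inf")).argmax(-1)
# e.g. emphasize the social expert:  choices = adjust_expert(logits, "social", 2.0).argmax(-1)
```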
bkhmsi.bsky.social
6/ 🔄 Interpretable routing:
Early layers route most tokens to the language expert; deeper layers route to domain-relevant experts (e.g., logic expert for math), matching task semantics.
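One way to surface such a pattern, assuming you log the top-1 expert index chosen for each token at every layer (the names below are illustrative, not the paper’s analysis code):

```python
from collections import Counter

EXPERT_NAMES = ["language", "logic", "social", "world"]

def routing_profile(per_layer_choices):
    """per_layer_choices: one sequence of expert indices (ints) per layer.
    Returns, for each layer, the fraction of tokens routed to each expert."""
    profile = []
    for choices in per_layer_choices:
        counts = Counter(int(c) for c in choices)
        total = sum(counts.values()) or 1
        profile.append({name: counts.get(i, 0) / total for i, name in enumerate(EXPERT_NAMES)})
    return profile

# Early layers would show a high "language" fraction; deeper layers shift toward
# the domain-relevant expert (e.g. "logic" for math prompts).
```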
bkhmsi.bsky.social
5/ 📈 Performance gains:
We evaluate on 6 reasoning benchmarks (MATH, GSM8K, MMLU, BBH…). MiCRo outperforms both dense baselines and "general‑expert" baselines: modular models with random specialist assignment in Stage 1.
bkhmsi.bsky.social
4/ 📚 Training curriculum (3 stages):
• Stage 1: Expert training on small curated domain-specific datasets (~3k samples)
• Stage 2: Router training, experts frozen
• Stage 3: End-to-end finetuning on large instruction corpus (939k samples)
This seeds specialization effectively.
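In pseudo-PyTorch, the staging could look roughly like this (a hedged sketch: model, finetune, and the expert/router handles are hypothetical names, not the actual training code):

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: tune each expert alone on its small curated domain dataset (~3k samples each).
for expert_modules, domain_data in zip(experts_per_domain, domain_datasets):
    set_trainable(model, False)
    for m in expert_modules:          # that expert's copy in every layer
        set_trainable(m, True)
    finetune(model, domain_data)

# Stage 2: freeze the experts, train only the per-layer routers.
set_trainable(model, False)
for layer in model.layers:
    set_trainable(layer.router, True)
finetune(model, mixed_domain_data)

# Stage 3: unfreeze everything, finetune end-to-end on the instruction corpus (~939k samples).
set_trainable(model, True)
finetune(model, instruction_corpus)
```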
bkhmsi.bsky.social
3/ ⚙️ Architecture:
We start from a pretrained model (e.g., Llama‑3.2‑1B) and clone each layer into four experts. A lightweight router then assigns each token dynamically to a single expert per layer (top‑1 routing), keeping the number of active parameters comparable to the base model.
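To make the wiring concrete, here is a minimal sketch of the idea (illustrative only, not the paper’s implementation; MixtureLayer, EXPERT_NAMES, and the dense per-token dispatch are simplifications, and each expert is assumed to map hidden states of shape (batch, seq, hidden) back to the same shape):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

EXPERT_NAMES = ["language", "logic", "social", "world"]

class MixtureLayer(nn.Module):
    """One pretrained layer cloned into four experts with a lightweight top-1 token router."""

    def __init__(self, pretrained_layer: nn.Module, hidden_size: int):
        super().__init__()
        # Each expert starts as an identical copy of the pretrained layer.
        self.experts = nn.ModuleList(copy.deepcopy(pretrained_layer) for _ in EXPERT_NAMES)
        # Lightweight router: one logit per expert for every token.
        self.router = nn.Linear(hidden_size, len(EXPERT_NAMES))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        logits = self.router(hidden_states)            # (B, T, 4)
        choice = logits.argmax(dim=-1)                 # top-1 expert index per token
        # Dense compute for clarity: run every expert, keep each token's chosen output.
        expert_out = torch.stack([e(hidden_states) for e in self.experts], dim=2)  # (B, T, 4, H)
        gate = F.one_hot(choice, num_classes=len(EXPERT_NAMES)).unsqueeze(-1).to(expert_out.dtype)
        return (expert_out * gate).sum(dim=2)
```

In a sparse implementation only the selected expert would run per token, so the active parameter count stays close to the base model even though the total parameters per layer roughly quadruple.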
bkhmsi.bsky.social
2/ 🔍 Motivation:
Humans rely on specialized brain networks—e.g., language, multiple-demand, ToM, default mode—for different cognitive tasks. MiCRo mimics this by dividing transformer layers into four experts.

Figure from @evfedorenko.bsky.social's review paper: www.nature.com/articles/s41...
bkhmsi.bsky.social
🚨 New Preprint!!

Thrilled to share with you our latest work: “Mixture of Cognitive Reasoners”, a modular transformer architecture inspired by the brain’s functional networks: language, logic, social reasoning, and world knowledge.

1/ 🧵👇
bkhmsi.bsky.social
Excited to present tomorrow at the @c3nlp.bsky.social workshop at #NAACL2025 our position paper:

"Hire Your Anthropologist!" 🎓

Led by the amazing Mai Alkhamissi & @lrz-persona.bsky.social, under the supervision of @monadiab77.bsky.social. Don’t miss it! 😄

arXiv link coming soon!
bkhmsi.bsky.social
Excited to be at #NAACL2025 in Albuquerque! I’ll be presenting our paper “The LLM Language Network” as an Oral tomorrow at 2:00 PM in Ballroom C, hope to see you there!

Looking forward to all the discussions! 🎤 🧠
Reposted by Badr AlKhamissi
hannesmehrer.bsky.social
Before ICLR 2025 comes to an end today, a few #NeuroAI impressions from Singapore.
First, very happy to present our work on TopoLM as an oral, here with @neilrathi.bsky.social
initial thread: bsky.app/profile/hann...
paper: doi.org/10.48550/arX...
code: github.com/epflneuroailab
bkhmsi.bsky.social
Not at #ICLR2025 this year, but excited that @neilrathi.bsky.social and @hannesmehrer.bsky.social will be presenting our TopoLM paper during Friday’s Oral Session 4C. Don’t miss it!
hannesmehrer.bsky.social
Together with @neil_rathi, I will present our #ICLR2025 Oral paper on TopoLM, a topographic language model!

Oral: Friday, 25 Apr 4:18 p.m. (session 4C)
Poster: Friday, 25 Apr 10 a.m. --> Hall 3 + Hall 2B
Paper: arxiv.org/abs/2410.11516
Code and weights: github.com/epflneuroailab
bkhmsi.bsky.social
With the Studio Ghibli AI trend taking over the internet, it's a good moment to reshare a blog post I wrote two years ago: The Curse of "Creative" AI.

Interested to hear your thoughts on this matter!

medium.com/@bkhmsi/the-...
The Curse of ‘Creative’ AI
Should we create art using AI?
medium.com
bkhmsi.bsky.social
12/
So, what’s happening? LLM representations start out brain-like, as shown by high brain alignment early in training, but as models surpass human proficiency they outgrow the human language network (alignment plateaus or trends downward), shifting toward learning other cognitive mechanisms.