Badr AlKhamissi
@bkhmsi.bsky.social
210 followers 350 following 46 posts
PhD at EPFL 🧠💻 Ex @MetaAI, @SonyAI, @Microsoft Egyptian 🇪🇬
bkhmsi.bsky.social
Excited to be part of this cool work led by Melika Honarmand!

We show that selectively targeting VLM units that mirror the brain’s visual word form area induces dyslexic-like reading impairments in models, while leaving other abilities intact!! 🧠🤖

Details in the 🧵👇
bkhmsi.bsky.social
Now that the ICLR deadline is behind us, happy to share that From Language to Cognition has been accepted as an Oral at #EMNLP2025! 🎉

Looking forward to seeing many of you in Suzhou 🇨🇳
bkhmsi.bsky.social
🚨 New Preprint!!

LLMs trained on next-word prediction (NWP) show high alignment with brain recordings. But what drives this alignment—linguistic structure or world knowledge? And how does this alignment evolve during training? Our new paper explores these questions. 👇🧵
Reposted by Badr AlKhamissi
bayazitdeniz.bsky.social
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
Reposted by Badr AlKhamissi
silingao.bsky.social
NEW PAPER ALERT: Recent studies have shown that LLMs often lack robustness to distribution shifts in their reasoning. Our paper proposes a new method, AbstRaL, to augment LLMs’ reasoning robustness, by promoting their abstract thinking with granular reinforcement learning.
Reposted by Badr AlKhamissi
abosselut.bsky.social
Check out @bkhmsi.bsky.social's great work on mixture-of-experts models specialized to represent the behavior of known brain networks.
bkhmsi.bsky.social
🚨 New Preprint!!

Thrilled to share with you our latest work: “Mixture of Cognitive Reasoners”, a modular transformer architecture inspired by the brain’s functional networks: language, logic, social reasoning, and world knowledge.

1/ 🧵👇
bkhmsi.bsky.social
10/ 🧾 Conclusion:
MiCRo weaves together modularity, interpretability & brain-inspired design to build controllable and high-performing models, moving toward truly cognitively grounded LMs.
bkhmsi.bsky.social
9/ 💡 Key insights:
1. Minimal data (~3k samples) in Stage 1 can induce lasting specialization
2. Modular structure enables interpretability, control, and scalability (e.g., top‑2 routing can boost performance)
3. Approach generalizes across domains & base models
bkhmsi.bsky.social
8/ 🧬 Brain alignment:
Neuroscience localizers (e.g., for language, multiple-demand) rediscover the corresponding experts in MiCRo, showing functional alignment with brain networks. However, the ToM localizer fails to identify the social expert.

Figures for MiCRo-Llama & MiCRo-OLMo.
bkhmsi.bsky.social
7/ 🧩 Steering & controllability:
Removing or emphasizing specific experts steers model behavior: ablating the logic expert hurts math accuracy, while suppressing the social expert slightly improves math, showcasing fine-grained control.
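For intuition, this kind of steering could be implemented by nudging the router’s logits for the targeted expert at inference time. A minimal sketch, assuming access to per-layer router logits (EXPERT_NAMES, adjust_expert, and the logit interface are illustrative, not the paper’s released code):

```python
import torch

# Hypothetical expert ordering, matching the four MiCRo experts.
EXPERT_NAMES = ["language", "logic", "social", "world"]

def adjust_expert(router_logits: torch.Tensor, expert: str, delta: float) -> torch.Tensor:
    """Add delta to one expert's routing logits before the top-1 argmax:
    delta = float('-inf') ablates the expert; a positive delta emphasizes it."""
    out = router_logits.clone()
    out[..., EXPERT_NAMES.index(expert)] += delta
    return out

# e.g. ablate the logic expert:      choices = adjust_expert(logits, "logic", float("-inf")).argmax(-1)
# e.g. emphasize the social expert:  choices = adjust_expert(logits, "social", 2.0).argmax(-1)
```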
bkhmsi.bsky.social
6/ 🔄 Interpretable routing:
Early layers route most tokens to the language expert; deeper layers route to domain-relevant experts (e.g., logic expert for math), matching task semantics.
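One way to surface such a pattern, assuming you log the top-1 expert index chosen for each token at every layer (the names below are illustrative, not the paper’s analysis code):

```python
from collections import Counter

EXPERT_NAMES = ["language", "logic", "social", "world"]

def routing_profile(per_layer_choices):
    """per_layer_choices: one sequence of expert indices (ints) per layer.
    Returns, for each layer, the fraction of tokens routed to each expert."""
    profile = []
    for choices in per_layer_choices:
        counts = Counter(int(c) for c in choices)
        total = sum(counts.values()) or 1
        profile.append({name: counts.get(i, 0) / total for i, name in enumerate(EXPERT_NAMES)})
    return profile

# Early layers would show a high "language" fraction; deeper layers shift toward
# the domain-relevant expert (e.g. "logic" for math prompts).
```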
bkhmsi.bsky.social
5/ 📈 Performance gains:
We evaluate on 6 reasoning benchmarks (MATH, GSM8K, MMLU, BBH…). MiCRo outperforms both dense baselines and "general‑expert" baselines: modular models with random specialist assignment in Stage 1.
bkhmsi.bsky.social
4/ 📚 Training curriculum (3 stages):
• Stage 1: Expert training on small curated domain-specific datasets (~3k samples)
• Stage 2: Router training, experts frozen
• Stage 3: End-to-end finetuning on large instruction corpus (939k samples)
This seeds specialization effectively.
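In pseudo-PyTorch, the staging could look roughly like this (a hedged sketch: model, finetune, and the expert/router handles are hypothetical names, not the actual training code):

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: tune each expert alone on its small curated domain dataset (~3k samples each).
for expert_modules, domain_data in zip(experts_per_domain, domain_datasets):
    set_trainable(model, False)
    for m in expert_modules:          # that expert's copy in every layer
        set_trainable(m, True)
    finetune(model, domain_data)

# Stage 2: freeze the experts, train only the per-layer routers.
set_trainable(model, False)
for layer in model.layers:
    set_trainable(layer.router, True)
finetune(model, mixed_domain_data)

# Stage 3: unfreeze everything, finetune end-to-end on the instruction corpus (~939k samples).
set_trainable(model, True)
finetune(model, instruction_corpus)
```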
bkhmsi.bsky.social
3/ ⚙️ Architecture:
We start from a pretrained model (e.g., Llama‑3.2‑1B) and clone each layer into four experts. A lightweight router then assigns each token dynamically to a single expert per layer (top‑1 routing), keeping the number of active parameters comparable to the base model.
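To make the wiring concrete, here is a minimal sketch of the idea (illustrative only, not the paper’s implementation; MixtureLayer, EXPERT_NAMES, and the dense per-token dispatch are simplifications, and each expert is assumed to map hidden states of shape (batch, seq, hidden) back to the same shape):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

EXPERT_NAMES = ["language", "logic", "social", "world"]

class MixtureLayer(nn.Module):
    """One pretrained layer cloned into four experts with a lightweight top-1 token router."""

    def __init__(self, pretrained_layer: nn.Module, hidden_size: int):
        super().__init__()
        # Each expert starts as an identical copy of the pretrained layer.
        self.experts = nn.ModuleList(copy.deepcopy(pretrained_layer) for _ in EXPERT_NAMES)
        # Lightweight router: one logit per expert for every token.
        self.router = nn.Linear(hidden_size, len(EXPERT_NAMES))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        logits = self.router(hidden_states)            # (B, T, 4)
        choice = logits.argmax(dim=-1)                 # top-1 expert index per token
        # Dense compute for clarity: run every expert, keep each token's chosen output.
        expert_out = torch.stack([e(hidden_states) for e in self.experts], dim=2)  # (B, T, 4, H)
        gate = F.one_hot(choice, num_classes=len(EXPERT_NAMES)).unsqueeze(-1).to(expert_out.dtype)
        return (expert_out * gate).sum(dim=2)
```

In a sparse implementation only the selected expert would run per token, so the active parameter count stays close to the base model even though the total parameters per layer roughly quadruple.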
bkhmsi.bsky.social
2/ 🔍 Motivation:
Humans rely on specialized brain networks—e.g., language, multiple-demand, ToM, default mode—for different cognitive tasks. MiCRo mimics this by dividing transformer layers into four experts.

Figure from @evfedorenko.bsky.social's review paper: www.nature.com/articles/s41...
bkhmsi.bsky.social
🚨 New Preprint!!

Thrilled to share with you our latest work: “Mixture of Cognitive Reasoners”, a modular transformer architecture inspired by the brain’s functional networks: language, logic, social reasoning, and world knowledge.

1/ 🧵👇
bkhmsi.bsky.social
Excited to present tomorrow at the @c3nlp.bsky.social workshop at #NAACL2025 our position paper:

"Hire Your Anthropologist!" 🎓

Led by the amazing Mai Alkhamissi & @lrz-persona.bsky.social, under the supervision of @monadiab77.bsky.social. Don’t miss it! 😄

arXiv link coming soon!
bkhmsi.bsky.social
Excited to be at #NAACL2025 in Albuquerque! I’ll be presenting our paper “The LLM Language Network” as an Oral tomorrow at 2:00 PM in Ballroom C, hope to see you there!

Looking forward to all the discussions! 🎤 🧠
Reposted by Badr AlKhamissi
hannesmehrer.bsky.social
Before ICLR 2025 comes to an end today, a few #NeuroAI impressions from Singapore.
First, very happy to present our work on TopoLM as an oral, here with @neilrathi.bsky.social
initial thread: bsky.app/profile/hann...
paper: doi.org/10.48550/arX...
code: github.com/epflneuroailab
bkhmsi.bsky.social
Not at #ICLR2025 this year, but excited that @neilrathi.bsky.social and @hannesmehrer.bsky.social will be presenting our TopoLM paper during Friday’s Oral Session 4C. Don’t miss it!
hannesmehrer.bsky.social
Together with @neil_rathi, I will present our #ICLR2025 Oral paper on TopoLM, a topographic language model!

Oral: Friday, 25 Apr 4:18 p.m. (session 4C)
Poster: Friday, 25 Apr 10 a.m. --> Hall 3 + Hall 2B
Paper: arxiv.org/abs/2410.11516
Code and weights: github.com/epflneuroailab
bkhmsi.bsky.social
With the Studio Ghibli AI trend taking over the internet, it's a good moment to reshare a blog post I wrote two years ago: The Curse of "Creative" AI.

Interested to hear your thoughts on this matter!

medium.com/@bkhmsi/the-...
The Curse of ‘Creative’ AI
Should we create art using AI?
medium.com
bkhmsi.bsky.social
12/
So, what’s happening? LLM representations start out brain-like, as shown by high brain alignment early in training, but as models surpass human proficiency they outgrow the human language network (alignment plateaus or trends downward), shifting toward learning other cognitive mechanisms.