Antoine Bosselut
@abosselut.bsky.social
480 followers 130 following 57 posts
Helping machines make sense of the world. Asst Prof @icepfl.bsky.social; Before: @stanfordnlp.bsky.social @uwnlp.bsky.social AI2 #NLProc #AI Website: https://atcbosselut.github.io/
Reposted by Antoine Bosselut
tpimentel.bsky.social
Very happy this paper got accepted to NeurIPS 2025 as a Spotlight! 😁

Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (e.g., the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!
tpimentel.bsky.social
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis🧵
Paper title "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?" with the paper's graphical abstract showing how more powerful alignment maps between a DNN and an algorithm allow more complex features to be found and more "accurate" abstractions.
Reposted by Antoine Bosselut
amuuueller.bsky.social
What's the right unit of analysis for understanding LLM internals? We explore this in our mech interp survey (a major update of our 2024 manuscript).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
abosselut.bsky.social
I don't see why the answer would be no, but since you specifically say "October": what if we submitted to ARR in July and want to do an early submission to ACL 2026?
Reposted by Antoine Bosselut
bayazitdeniz.bsky.social
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
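For readers new to crosscoders, a hedged sketch of the general architecture (sizes and names are mine, not the paper's code): one shared sparse latent code is decoded separately into each checkpoint's activation space, so the per-checkpoint decoder norm of a latent tracks when that feature appears, strengthens, or fades during training.

```python
# Sketch of a crosscoder across training checkpoints (illustrative, not the
# paper's implementation): a shared sparse code, one decoder per checkpoint.
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int, n_checkpoints: int):
        super().__init__()
        self.enc = nn.Linear(d_model * n_checkpoints, d_latent)
        self.dec = nn.ModuleList([nn.Linear(d_latent, d_model, bias=False)
                                  for _ in range(n_checkpoints)])

    def forward(self, acts):  # acts: list of (batch, d_model), one per checkpoint
        z = torch.relu(self.enc(torch.cat(acts, dim=-1)))  # shared sparse code
        return z, [dec(z) for dec in self.dec]

model = Crosscoder(d_model=512, d_latent=4096, n_checkpoints=3)
acts = [torch.randn(8, 512) for _ in range(3)]
z, recons = model(acts)
# L1 on z encourages sparsity; ||dec[t].weight[:, j]|| is a rough proxy for
# feature j's strength at checkpoint t.
loss = sum(((r - a) ** 2).mean() for r, a in zip(recons, acts)) + 1e-3 * z.abs().mean()
```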
Reposted by Antoine Bosselut
mismayil.bsky.social
💡Can we optimize LLMs to be more creative?
Introducing Creative Preference Optimization (CrPO) and MuCE (Multi-task Creativity Evaluation Dataset).
Result: More novel, diverse, surprising text—without losing quality!
📝 Appearing at #EMNLP2025
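A rough sketch of the recipe as I read the announcement (not CrPO's released code): the optimizer can be a standard DPO-style preference loss, with the creativity-specific part living in how chosen/rejected pairs are ranked upstream, e.g. by novelty, diversity, and surprise scores.

```python
# DPO-style preference loss (standard); the creativity-specific step would be
# ranking the response pairs by creativity signals upstream (illustrative).
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Illustrative sequence log-probs for a pair ranked by a creativity score.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-11.8]),
                torch.tensor([-12.0]), torch.tensor([-12.0]))
```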
abosselut.bsky.social
Special thanks to everyone who participated in this journey!
abosselut.bsky.social
(5) Transparency: We're fully open, pairing our weights with a full suite of reproduction artifacts.

Check out our artifacts and technical report here: huggingface.co/swiss-ai
swiss-ai (Swiss AI Initiative)
Org profile for Swiss AI Initiative on Hugging Face, the AI community building the future.
huggingface.co
abosselut.bsky.social
(4) Multilinguality: We pretrain the model on 15T tokens from 1,811 languages, and post-train with 3.8M examples from 149 languages.
abosselut.bsky.social
(3) Memorization Prevention: Adopting the Goldfish objective, we suppress verbatim recall and reduce the risk of memorization.
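For context, a minimal sketch of a goldfish-style loss (my simplification, not the Apertus training code; the actual objective derives the mask from a hash of local context rather than the fixed stride used here): the loss is never computed on a subset of token positions, so no training sequence is fully supervised and verbatim recall is suppressed.

```python
# Goldfish-style masked cross-entropy (simplified sketch): drop 1-in-k target
# positions from the loss so the model cannot memorize any sequence verbatim.
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, targets: torch.Tensor, k: int = 4):
    B, T, V = logits.shape
    keep = torch.arange(T) % k != k - 1  # fixed stride; real versions hash context
    loss = F.cross_entropy(logits.reshape(-1, V), targets.reshape(-1), reduction="none")
    return loss.reshape(B, T)[:, keep].mean()

logits = torch.randn(2, 16, 1000)
targets = torch.randint(0, 1000, (2, 16))
print(goldfish_loss(logits, targets))
```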
abosselut.bsky.social
(2) Data Compliance: We pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for copyrighted, non-permissive, toxic, and personally identifiable content.
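As a toy illustration of one such step (not the Apertus pipeline; the URLs are hypothetical), Python's standard-library robots.txt parser can check whether a document's host currently disallows crawling:

```python
# Toy retroactive robots.txt check using only the standard library.
from urllib import robotparser
from urllib.parse import urlparse

def allowed_by_robots(url: str, user_agent: str = "*") -> bool:
    parsed = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()  # fetch and parse the host's current robots.txt
    return rp.can_fetch(user_agent, url)

corpus = ["https://example.com/article1", "https://example.com/private/doc"]  # hypothetical
kept = [u for u in corpus if allowed_by_robots(u)]
```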
abosselut.bsky.social
What makes Apertus special?
(1) Scale: Apertus-70B is the first fully open model to be trained at the 70B-parameter scale on 15T tokens, requiring us to scale out training to 4096 GPUs at @cscsch.bsky.social.
abosselut.bsky.social
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
icepfl.bsky.social
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.

Read more: actu.epfl.ch/news/apertus...
Reposted by Antoine Bosselut
dorialexander.bsky.social
Very happy to see that Pleias's multilingual data processing pipelines have contributed to the largest open pretraining project in Europe.

From their tech report: huggingface.co/swiss-ai/Ape...
Reposted by Antoine Bosselut
rvgt.ch
Reto Vogt @rvgt.ch · Sep 2
Switzerland is entering the race of large language models. Under the name #Apertus, @ethz.ch, @icepfl.bsky.social, and @cscsch.bsky.social are releasing the country's first fully open, multilingual #LLM.

For MAZ, I briefly analyzed Apertus:

www.maz.ch/news/apertus...
Apertus: a new language model for Switzerland
www.maz.ch
abosselut.bsky.social
Thank you for your incredible work!
Reposted by Antoine Bosselut
kyunghyuncho.bsky.social
recently gave a talk on <Reality Checks> at two venues, and discussed (and rambled about) how leaderboard chasing is awesome (and we want it to continue) but that this isn't easy because everyone (me! me! me!) wants to write more papers.

the link to the slide deck is in the reply.
Reposted by Antoine Bosselut
negarforoutan.bsky.social
🚨New Preprint!

In multilingual models, the same meaning can take far more tokens in some languages, penalizing users of underrepresented languages with worse performance and higher API costs. Our Parity-aware BPE algorithm is a step toward addressing this issue: 🧵
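A sketch of the quantity at stake (the measurement, not the paper's algorithm): under a shared tokenizer, the "premium" a language pays is its token count on parallel text relative to a reference language; a parity-aware tokenizer chooses its vocabulary so these ratios stay near 1.

```python
# Token-premium measurement on parallel text (illustrative toy example).
def token_premium(tokenize, parallel: dict, ref: str = "en") -> dict:
    counts = {lang: sum(len(tokenize(s)) for s in sents)
              for lang, sents in parallel.items()}
    return {lang: counts[lang] / counts[ref] for lang in counts}

parallel = {"en": ["the cat sat on the mat"],
            "de": ["die Katze sass auf der Matte"]}
# Character-level toy "tokenizer": German pays ~1.27x here.
print(token_premium(list, parallel))
```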
Reposted by Antoine Bosselut
lasha.bsky.social
📣 Life update: Thrilled to announce that I’ll be starting as faculty at the Max Planck Institute for Software Systems this Fall!

I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
Kaiserslautern, Germany
Reposted by Antoine Bosselut
epfl-ai-center.bsky.social
EPFL and ETH Zürich are jointly building a Swiss-made LLM from scratch.
Fully open and multilingual, the model is trained on CSCS's supercomputer "Alps" and supports sovereign, transparent, and responsible AI in Switzerland and beyond.
Read more here: ai.epfl.ch/a-language-m...
#ResponsibleAI
A language model built for the public good     - EPFL AI Center
ETH Zurich and EPFL will release a large language model (LLM) developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM ma...
ai.epfl.ch
abosselut.bsky.social
Check out Silin's paper done in collaboration with Apple on reinforcing abstract thinking in reasoning traces!
silingao.bsky.social
NEW PAPER ALERT: Recent studies have shown that LLMs often lack robustness to distribution shifts in their reasoning. Our paper proposes a new method, AbstRaL, to augment LLMs’ reasoning robustness, by promoting their abstract thinking with granular reinforcement learning.
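As a toy illustration of the abstraction idea (my simplification; AbstRaL's actual method and its reinforcement-learning setup are in the paper): concrete values in a reasoning trace are lifted into symbolic slots, so the reasoning pattern can be evaluated independently of the surface numbers that distribution shifts perturb.

```python
# Toy abstraction of a reasoning trace: replace literal numbers with symbols.
import re

def abstract_trace(trace: str) -> tuple:
    bindings = {}
    def slot(m):
        name = f"N{len(bindings)}"
        bindings[name] = m.group(0)
        return name
    return re.sub(r"\d+(?:\.\d+)?", slot, trace), bindings

trace = "Alice has 3 apples and buys 12 more, so she has 15 apples."
print(abstract_trace(trace))
# ('Alice has N0 apples and buys N1 more, so she has N2 apples.',
#  {'N0': '3', 'N1': '12', 'N2': '15'})
```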
abosselut.bsky.social
Check out @bkhmsi.bsky.social's great work on mixture-of-experts models whose experts are specialized to represent the behavior of known brain networks.
bkhmsi.bsky.social
🚨 New Preprint!!

Thrilled to share with you our latest work: “Mixture of Cognitive Reasoners”, a modular transformer architecture inspired by the brain’s functional networks: language, logic, social reasoning, and world knowledge.

1/ 🧵👇
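A minimal sketch of the architectural idea as the abstract describes it (assumptions mine, not the released code): a top-1 mixture-of-experts layer whose experts are intended to specialize into the four brain-network-like roles.

```python
# Top-1 MoE layer with four named experts (illustrative sketch).
import torch
import torch.nn as nn

class CognitiveMoELayer(nn.Module):
    EXPERTS = ("language", "logic", "social", "world_knowledge")

    def __init__(self, d_model: int):
        super().__init__()
        self.router = nn.Linear(d_model, len(self.EXPERTS))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in self.EXPERTS])

    def forward(self, x):                          # x: (batch, seq, d_model)
        gates = self.router(x).softmax(dim=-1)     # per-token expert scores
        idx = gates.argmax(dim=-1)                 # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():                         # scale by gate so router trains
                out[mask] = gates[mask][:, i:i+1] * expert(x[mask])
        return out

y = CognitiveMoELayer(d_model=64)(torch.randn(2, 10, 64))
```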
Reposted by Antoine Bosselut
epfl-ai-center.bsky.social
Many AI models speak dozens of languages, but do they grasp cultural context? 🗣️🌍
The INCLUDE benchmark from EPFL's NLP Lab and @cohereforai.bsky.social reveals that there is still a gap...
👉 Find out how benchmarks like INCLUDE can help make AI truly inclusive: actu.epfl.ch/news/beyond-...
Beyond translation – making AI multicultural
A team of international researchers led by EPFL developed a multilingual benchmark to determine Large Language Models' ability to grasp cultural context.
actu.epfl.ch