Antoine Bosselut
@abosselut.bsky.social
480 followers 130 following 57 posts
Helping machines make sense of the world. Asst Prof @icepfl.bsky.social; Before: @stanfordnlp.bsky.social @uwnlp.bsky.social AI2 #NLProc #AI Website: https://atcbosselut.github.io/
Reposted by Antoine Bosselut
tpimentel.bsky.social
Very happy this paper got accepted to NeurIPS 2025 as a Spotlight! 😁

Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (e.g., the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!
tpimentel.bsky.social
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis🧵
Paper title "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?" with the paper's graphical abstract showing how more powerful alignment maps between a DNN and an algorithm allow more complex features to be found and more "accurate" abstractions.
Reposted by Antoine Bosselut
amuuueller.bsky.social
What's the right unit of analysis for understanding LLM internals? We explore this in our mech interp survey (a major update of our 2024 manuscript).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
abosselut.bsky.social
I don't see why the answer would be no, but since you specifically say "October": what if we submitted to ARR in July and want to do an early submission to ACL 2026?
Reposted by Antoine Bosselut
bayazitdeniz.bsky.social
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
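For readers new to crosscoders, a hedged sketch of the general architecture (sizes and names are mine, not the paper's code): one shared sparse latent code is decoded separately into each checkpoint's activation space, so the per-checkpoint decoder norm of a latent tracks when that feature appears, strengthens, or fades during training.

```python
# Sketch of a crosscoder across training checkpoints (illustrative, not the
# paper's implementation): a shared sparse code, one decoder per checkpoint.
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int, n_checkpoints: int):
        super().__init__()
        self.enc = nn.Linear(d_model * n_checkpoints, d_latent)
        self.dec = nn.ModuleList([nn.Linear(d_latent, d_model, bias=False)
                                  for _ in range(n_checkpoints)])

    def forward(self, acts):  # acts: list of (batch, d_model), one per checkpoint
        z = torch.relu(self.enc(torch.cat(acts, dim=-1)))  # shared sparse code
        return z, [dec(z) for dec in self.dec]

model = Crosscoder(d_model=512, d_latent=4096, n_checkpoints=3)
acts = [torch.randn(8, 512) for _ in range(3)]
z, recons = model(acts)
# L1 on z encourages sparsity; ||dec[t].weight[:, j]|| is a rough proxy for
# feature j's strength at checkpoint t.
loss = sum(((r - a) ** 2).mean() for r, a in zip(recons, acts)) + 1e-3 * z.abs().mean()
```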
Reposted by Antoine Bosselut
mismayil.bsky.social
💡Can we optimize LLMs to be more creative?
Introducing Creative Preference Optimization (CrPO) and MuCE (Multi-task Creativity Evaluation Dataset).
Result: More novel, diverse, surprising text—without losing quality!
📝 Appearing at #EMNLP2025
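A rough sketch of the recipe as I read the announcement (not CrPO's released code): the optimizer can be a standard DPO-style preference loss, with the creativity-specific part living in how chosen/rejected pairs are ranked upstream, e.g. by novelty, diversity, and surprise scores.

```python
# DPO-style preference loss (standard); the creativity-specific step would be
# ranking the response pairs by creativity signals upstream (illustrative).
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Illustrative sequence log-probs for a pair ranked by a creativity score.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-11.8]),
                torch.tensor([-12.0]), torch.tensor([-12.0]))
```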
abosselut.bsky.social
Special thanks to everyone who participated in this journey!
abosselut.bsky.social
(5) Transparency: We're fully open, pairing our weights with a full suite of reproduction artifacts.

Check out our artifacts and technical report here: huggingface.co/swiss-ai
swiss-ai (Swiss AI Initiative)
Org profile for Swiss AI Initiative on Hugging Face, the AI community building the future.
huggingface.co
abosselut.bsky.social
(4) Multilinguality: We pretrain the model on 15T tokens from 1,811 languages, and post-train with 3.8M examples from 149 languages.
abosselut.bsky.social
(3) Memorization Prevention: Adopting the Goldfish objective, we suppress verbatim recall and reduce the risk of memorization.
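For context, a minimal sketch of a goldfish-style loss (my simplification, not the Apertus training code; the actual objective derives the mask from a hash of local context rather than the fixed stride used here): the loss is never computed on a subset of token positions, so no training sequence is fully supervised and verbatim recall is suppressed.

```python
# Goldfish-style masked cross-entropy (simplified sketch): drop 1-in-k target
# positions from the loss so the model cannot memorize any sequence verbatim.
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, targets: torch.Tensor, k: int = 4):
    B, T, V = logits.shape
    keep = torch.arange(T) % k != k - 1  # fixed stride; real versions hash context
    loss = F.cross_entropy(logits.reshape(-1, V), targets.reshape(-1), reduction="none")
    return loss.reshape(B, T)[:, keep].mean()

logits = torch.randn(2, 16, 1000)
targets = torch.randint(0, 1000, (2, 16))
print(goldfish_loss(logits, targets))
```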
abosselut.bsky.social
(2) Data Compliance: We pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for copyrighted, non-permissive, toxic, and personally identifiable content.
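As a toy illustration of one such step (not the Apertus pipeline; the URLs are hypothetical), Python's standard-library robots.txt parser can check whether a document's host currently disallows crawling:

```python
# Toy retroactive robots.txt check using only the standard library.
from urllib import robotparser
from urllib.parse import urlparse

def allowed_by_robots(url: str, user_agent: str = "*") -> bool:
    parsed = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()  # fetch and parse the host's current robots.txt
    return rp.can_fetch(user_agent, url)

corpus = ["https://example.com/article1", "https://example.com/private/doc"]  # hypothetical
kept = [u for u in corpus if allowed_by_robots(u)]
```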
abosselut.bsky.social
What makes Apertus special?
(1) Scale: Apertus-70B is the first fully open model to be trained at the 70B-parameter scale on 15T tokens, requiring us to scale out training to 4096 GPUs at @cscsch.bsky.social.
abosselut.bsky.social
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
icepfl.bsky.social
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.

Read more: actu.epfl.ch/news/apertus...
Reposted by Antoine Bosselut
dorialexander.bsky.social
Very happy to see that Pleias's multilingual data processing pipelines have contributed to the largest open pretraining project in Europe.

From their tech report: huggingface.co/swiss-ai/Ape...
Reposted by Antoine Bosselut
rvgt.ch
Reto Vogt @rvgt.ch · Sep 2
Switzerland is entering the race of large language models. Under the name #Apertus, @ethz.ch, @icepfl.bsky.social, and @cscsch.bsky.social are releasing the country's first fully open, multilingual #LLM.

For MAZ, I briefly analyzed Apertus:

www.maz.ch/news/apertus...
Apertus: a new language model for Switzerland
www.maz.ch
abosselut.bsky.social
Thank you for your incredible work!
Reposted by Antoine Bosselut
kyunghyuncho.bsky.social
recently gave a talk on <Reality Checks> at two venues, and discussed (and rambled about) how leaderboard chasing is awesome (and we want it to continue) but that this isn't easy because everyone (me! me! me!) wants to write more papers.

the link to the slide deck is in the reply.
Reposted by Antoine Bosselut
negarforoutan.bsky.social
🚨New Preprint!

In multilingual models, the same meaning can take far more tokens in some languages, penalizing users of underrepresented languages with worse performance and higher API costs. Our Parity-aware BPE algorithm is a step toward addressing this issue: 🧵
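A sketch of the quantity at stake (the measurement, not the paper's algorithm): under a shared tokenizer, the "premium" a language pays is its token count on parallel text relative to a reference language; a parity-aware tokenizer chooses its vocabulary so these ratios stay near 1.

```python
# Token-premium measurement on parallel text (illustrative toy example).
def token_premium(tokenize, parallel: dict, ref: str = "en") -> dict:
    counts = {lang: sum(len(tokenize(s)) for s in sents)
              for lang, sents in parallel.items()}
    return {lang: counts[lang] / counts[ref] for lang in counts}

parallel = {"en": ["the cat sat on the mat"],
            "de": ["die Katze sass auf der Matte"]}
# Character-level toy "tokenizer": German pays ~1.27x here.
print(token_premium(list, parallel))
```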
Reposted by Antoine Bosselut
lasha.bsky.social
📣 Life update: Thrilled to announce that I’ll be starting as faculty at the Max Planck Institute for Software Systems this Fall!

I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
Kaiserslautern, Germany
Reposted by Antoine Bosselut
epfl-ai-center.bsky.social
EPFL and ETH Zürich are jointly building a Swiss-made LLM from scratch.
Fully open and multilingual, the model is trained on CSCS's supercomputer "Alps" and supports sovereign, transparent, and responsible AI in Switzerland and beyond.
Read more here: ai.epfl.ch/a-language-m...
#ResponsibleAI
A language model built for the public good     - EPFL AI Center
ETH Zurich and EPFL will release a large language model (LLM) developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM ma...
ai.epfl.ch
abosselut.bsky.social
Check out Silin's paper done in collaboration with Apple on reinforcing abstract thinking in reasoning traces!
silingao.bsky.social
NEW PAPER ALERT: Recent studies have shown that LLMs often lack robustness to distribution shifts in their reasoning. Our paper proposes a new method, AbstRaL, to augment LLMs’ reasoning robustness, by promoting their abstract thinking with granular reinforcement learning.
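As a toy illustration of the abstraction idea (my simplification; AbstRaL's actual method and its reinforcement-learning setup are in the paper): concrete values in a reasoning trace are lifted into symbolic slots, so the reasoning pattern can be evaluated independently of the surface numbers that distribution shifts perturb.

```python
# Toy abstraction of a reasoning trace: replace literal numbers with symbols.
import re

def abstract_trace(trace: str) -> tuple:
    bindings = {}
    def slot(m):
        name = f"N{len(bindings)}"
        bindings[name] = m.group(0)
        return name
    return re.sub(r"\d+(?:\.\d+)?", slot, trace), bindings

trace = "Alice has 3 apples and buys 12 more, so she has 15 apples."
print(abstract_trace(trace))
# ('Alice has N0 apples and buys N1 more, so she has N2 apples.',
#  {'N0': '3', 'N1': '12', 'N2': '15'})
```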
abosselut.bsky.social
Check out @bkhmsi.bsky.social's great work on mixture-of-experts models whose experts are specialized to represent the behavior of known brain networks.
bkhmsi.bsky.social
🚨 New Preprint!!

Thrilled to share with you our latest work: “Mixture of Cognitive Reasoners”, a modular transformer architecture inspired by the brain’s functional networks: language, logic, social reasoning, and world knowledge.

1/ 🧵👇
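A minimal sketch of the architectural idea as the abstract describes it (assumptions mine, not the released code): a top-1 mixture-of-experts layer whose experts are intended to specialize into the four brain-network-like roles.

```python
# Top-1 MoE layer with four named experts (illustrative sketch).
import torch
import torch.nn as nn

class CognitiveMoELayer(nn.Module):
    EXPERTS = ("language", "logic", "social", "world_knowledge")

    def __init__(self, d_model: int):
        super().__init__()
        self.router = nn.Linear(d_model, len(self.EXPERTS))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in self.EXPERTS])

    def forward(self, x):                          # x: (batch, seq, d_model)
        gates = self.router(x).softmax(dim=-1)     # per-token expert scores
        idx = gates.argmax(dim=-1)                 # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():                         # scale by gate so router trains
                out[mask] = gates[mask][:, i:i+1] * expert(x[mask])
        return out

y = CognitiveMoELayer(d_model=64)(torch.randn(2, 10, 64))
```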
Reposted by Antoine Bosselut
epfl-ai-center.bsky.social
Many AI models speak dozens of languages, but do they grasp cultural context? 🗣️🌍
The INCLUDE benchmark from EPFL's NLP Lab and @cohereforai.bsky.social reveals that there is still a gap...
👉 Find out how benchmarks like INCLUDE can help make AI truly inclusive: actu.epfl.ch/news/beyond-...
Beyond translation – making AI multicultural
A team of international researchers led by EPFL developed a multilingual benchmark to determine Large Language Models' ability to grasp cultural context.
actu.epfl.ch