Clément Dumas
@butanium.bsky.social
530 followers
210 following
43 posts
Master's student at ENS Paris-Saclay / aspiring AI safety researcher / improviser
Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West
MATS Winter 7.0 Scholar w/ neelnanda.bsky.social
https://butanium.github.io
Pinned
Clément Dumas
@butanium.bsky.social
· Sep 5
Clément Dumas
@butanium.bsky.social
· Sep 5
Clément Dumas
@butanium.bsky.social
· Sep 5
Reposted by Clément Dumas
John David Pressman
@jdp.extropian.net
· Aug 29
Reposted by Clément Dumas
Clément Dumas
@butanium.bsky.social
· Aug 8
Clément Dumas
@butanium.bsky.social
· Aug 6
Reposted by Clément Dumas
Clément Dumas
@butanium.bsky.social
· Jun 29
Clément Dumas
@butanium.bsky.social
· Jun 29
Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
A central question in multilingual language modeling is whether large language models (LLMs) develop a universal concept representation, disentangled from specific languages. In this paper, we address...
arxiv.org
Clément Dumas
@butanium.bsky.social
· Jun 29
Clément Dumas
@butanium.bsky.social
· Jun 29
Clément Dumas on X: "Excited to share our latest paper, accepted as a spotlight at the #ICML2024 mechanistic interpretability workshop! We find evidence that LLMs use language-agnostic representations of concepts 🧵↘️ https://t.co/dDS5iv199i"
x.com
Clément Dumas
@butanium.bsky.social
· Jun 26
Reposted by Clément Dumas
Clément Dumas
@butanium.bsky.social
· Apr 9
Reposted by Clément Dumas
Clément Dumas
@butanium.bsky.social
· Apr 7
Robustly identifying concepts introduced during chat fine-tuning using crosscoders
Model diffing is the study of how fine-tuning changes a model's representations and internal algorithms. Many behaviours of interest are introduced during fine-tuning, and model diffing offers a promi...
arxiv.org