Tiago Pimentel
@tpimentel.bsky.social
2.2K followers 120 following 32 posts
Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)
Pinned
tpimentel.bsky.social
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our new ACL paper proposes an observational method to estimate this causal effect! Longer thread soon!
Title of paper "Causal Estimation of Tokenisation Bias" and schematic of how we define tokenisation bias, which is the causal effect we are interested in.
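A minimal sketch of the quantity at play, assuming HuggingFace transformers and GPT-2 (the token split below is illustrative): it scores the same string under two tokenisations with a single off-the-shelf LM, whereas the paper's causal effect compares LMs trained from scratch with and without the merged token in their vocabulary.

```python
# Sketch: compare the log-probability an off-the-shelf causal LM assigns to the
# same string under two different tokenisations. (This only illustrates scoring;
# the paper's causal effect compares LMs *trained* under different tokenisations.)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(token_ids):
    """Sum of log p(token_t | tokens_<t) for one tokenisation of a string."""
    ids = torch.tensor([tok.bos_token_id] + token_ids).unsqueeze(0)
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return logprobs[torch.arange(len(token_ids)), ids[0, 1:]].sum().item()

single = tok.encode(" hello")                  # usually a single token
split = tok.encode(" he") + tok.encode("llo")  # same characters, split into smaller pieces
print(sequence_logprob(single), sequence_logprob(split))
```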
Reposted by Tiago Pimentel
mryskina.bsky.social
Interested in language models, brains, and concepts? Check out our COLM 2025 🔦 Spotlight paper!

(And if you’re at COLM, come hear about it on Tuesday – sessions Spotlight 2 & Poster 2)!
Paper title: Language models align with brain regions that represent concepts across modalities.
Authors:  Maria Ryskina, Greta Tuckute, Alexander Fung, Ashley Malkin, Evelina Fedorenko. 
Affiliations: Maria is affiliated with the Vector Institute for AI, but the work was done at MIT. All other authors are affiliated with MIT. 
Email address: maria.ryskina@vectorinstitute.ai.
Reposted by Tiago Pimentel
alexanderhoyle.bsky.social
Accepted to EMNLP (and more to come 👀)! The camera-ready version is now online---very happy with how this turned out

arxiv.org/abs/2507.01234
alexanderhoyle.bsky.social
New preprint! Have you ever tried to cluster text embeddings from different sources, but the clusters just reproduce the sources? Or attempted to retrieve similar documents across multiple languages, and even multilingual embeddings return items in the same language?

Turns out there's an easy fix🧵
Barchart of number of items in four clusters of text embeddings, with colors showing the distribution of sources in each cluster.

Caption: Clustering text embeddings from disparate sources (here, U.S. congressional bill summaries and senators’ tweets) can produce clusters where one source dominates (Panel A). Using linear erasure to remove the source information produces more evenly balanced clusters that maintain semantic coherence (Panel B; sampled items relate to immigration). Four random clusters of k-means shown (k=25), trained on a combined 5,000 samples from each dataset
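A minimal sketch of this kind of fix, using a generic single-step linear erasure (projecting out the directions a linear source classifier relies on) rather than necessarily the authors' exact method; data and shapes are placeholders.

```python
# Sketch: erase linearly-decodable source information from embeddings, then
# cluster. Single-step projection onto the orthogonal complement of a linear
# source classifier's weights (INLP-style); data and shapes are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def erase_source(X, sources):
    """Remove the subspace a linear classifier uses to predict the source."""
    clf = LogisticRegression(max_iter=1000).fit(X, sources)
    Q, _ = np.linalg.qr(clf.coef_.T)   # orthonormal basis of the source directions
    return X - (X @ Q) @ Q.T           # project every embedding off that subspace

X = np.random.randn(10_000, 384)       # placeholder embeddings (e.g., bills + tweets)
sources = np.repeat([0, 1], 5_000)     # 0 = bill summaries, 1 = tweets
clusters = KMeans(n_clusters=25, n_init=10).fit_predict(erase_source(X, sources))
```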
tpimentel.bsky.social
See our paper for more: we have analyses on other models, downstream tasks, and considering only subsets of tokens (e.g., only tokens with a certain part-of-speech)!
tpimentel.bsky.social
This means that: (1) LMs can get less similar to each other, even while they all get closer to the true distribution; and (2) larger models reconverge faster, while small ones may never reconverge.
tpimentel.bsky.social
* A sharp-divergence phase, where models diverge as they start using context.
* A slow-reconvergence phase, where predictions slowly become more similar again (especially in larger models).
tpimentel.bsky.social
Surprisingly, convergence isn’t monotonic. Instead, we find four convergence phases across model training.
* A uniform phase, where all seeds output nearly-uniform distributions.
* A sharp-convergence phase, where models align, largely due to unigram frequency learning.
tpimentel.bsky.social
In this paper, we define convergence as the similarity between outputs of LMs trained under different seeds, where similarity is measured as a per-token KL divergence. This lets us track whether models trained under identical settings, but different seeds, behave the same.
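A minimal sketch of this measure, assuming two HuggingFace-style causal LMs that share a vocabulary; names and shapes are illustrative.

```python
# Sketch of the convergence measure: per-token KL divergence between the
# next-token distributions of two LMs trained with different seeds (they must
# share a tokeniser/vocabulary). Models are assumed to be HuggingFace-style.
import torch
import torch.nn.functional as F

def per_token_kl(model_a, model_b, input_ids):
    """Mean over positions of KL(p_a(. | context) || p_b(. | context))."""
    with torch.no_grad():
        logp_a = F.log_softmax(model_a(input_ids).logits, dim=-1)
        logp_b = F.log_softmax(model_b(input_ids).logits, dim=-1)
    kl = (logp_a.exp() * (logp_a - logp_b)).sum(dim=-1)  # KL at each position
    return kl.mean().item()
```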
tpimentel.bsky.social
LLMs are trained to mimic a “true” distribution—their decreasing cross-entropy confirms they get closer to this target during training. Do similar models approach this target distribution in similar ways, though? 🤔 Not really! Our new paper studies this, finding 4 convergence phases in training 🧵
Figure showing the four phases of convergence in LM training
tpimentel.bsky.social
Very happy this paper got accepted to NeurIPS 2025 as a Spotlight! 😁

Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (e.g., the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!
tpimentel.bsky.social
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵
Paper title "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?" with the paper's graphical abstract showing how more powerful alignment maps between a DNN and an algorithm allow more complex features to be found and more "accurate" abstractions.
tpimentel.bsky.social
Honoured to receive two (!!) SAC highlights awards at #ACL2025 😁 (Conveniently placed on the same slide!)
With the amazing: @philipwitti.bsky.social, @gregorbachmann.bsky.social and @wegotlieb.bsky.social,
@cuiding.bsky.social, Giovanni Acampa, @alexwarstadt.bsky.social, @tamaregev.bsky.social
tpimentel.bsky.social
We are presenting this paper at #ACL2025 😁 Find us at poster session 4 (Wednesday morning, 11h~12h30) to learn more about tokenisation bias!
tpimentel.bsky.social
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our new ACL paper proposes an observational method to estimate this causal effect! Longer thread soon!
Title of paper "Causal Estimation of Tokenisation Bias" and schematic of how we define tokenisation bias, which is the causal effect we are interested in.
tpimentel.bsky.social
@philipwitti.bsky.social will be presenting our paper "Tokenisation is NP-Complete" at #ACL2025 😁 Come to the language modelling 2 session (Wednesday morning, 9h~10h30) to learn more about how challenging tokenisation can be!
Reposted by Tiago Pimentel
pietrolesci.bsky.social
Headed to Vienna for #ACL2025 to present our tokenisation bias paper and co-organise the L2M2 workshop on memorisation in language models. Reach out to chat about tokenisation, memorisation, and all things pre-training (esp. data-related topics)!
pietrolesci.bsky.social
All modern LLMs run on top of a tokeniser, an often overlooked “preprocessing detail”. But what if that tokeniser systematically affects model behaviour? We call this tokenisation bias.

Let’s talk about it and why it matters👇
@aclmeeting.bsky.social #ACL2025 #NLProc
Reposted by Tiago Pimentel
jkminder.bsky.social
Causal Abstraction, the theory behind DAS, tests if a network realizes a given algorithm. We show (w/ @denissutter.bsky.social, T. Hofmann, @tpimentel.bsky.social) that the theory collapses without the linear representation hypothesis—a problem we call the non-linear representation dilemma.
tpimentel.bsky.social
Importantly, despite these results, we still believe causal abstraction is one of the best frameworks available for mech interpretability. Going forward, we should try to better understand how it is impacted by assumptions about how DNNs encode information. Longer 🧵 soon by @denissutter.bsky.social
tpimentel.bsky.social
Overall, our results show that causal abstraction (and interventions) is not a silver bullet, as it relies on assumptions about how features are encoded in DNNs' representations. We then connect our results to the linear representation hypothesis and to older debates in the probing literature.
tpimentel.bsky.social
We show—both theoretically (under reasonable assumptions) and empirically (on real-world models)—that, if we allow variables to be encoded in arbitrarily complex subspaces of the DNN’s representations, any algorithm can be mapped to any model.
tpimentel.bsky.social
Causal abstraction identifies this correspondence by finding subspaces in the DNN's hidden states which encode the algorithm’s hidden variables. Given such a map, we say the DNN implements the algorithm if the two behave identically under interventions.
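A toy sketch of such an interchange intervention on a one-dimensional linear subspace; all names, shapes, and the random subspace are illustrative.

```python
# Toy sketch of an interchange intervention on a linear subspace of a hidden
# state: replace the component of the base run's representation lying in the
# subspace spanned by Q with the corresponding component from a source run,
# then check whether the model's output changes as the algorithm predicts.
import torch

def interchange(hidden_base, hidden_source, Q):
    """Q: (d, k) orthonormal basis of the intervened subspace."""
    swap_out = hidden_base @ Q @ Q.T     # base component inside the subspace
    swap_in = hidden_source @ Q @ Q.T    # the same component from the source run
    return hidden_base - swap_out + swap_in

d, k = 768, 1                            # illustrative hidden size and subspace rank
Q, _ = torch.linalg.qr(torch.randn(d, k))
h_base, h_source = torch.randn(1, d), torch.randn(1, d)
h_intervened = interchange(h_base, h_source, Q)
```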
tpimentel.bsky.social
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵
Paper title "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?" with the paper's graphical abstract showing how more powerful alignment maps between a DNN and an algorithm allow more complex features to be found and more "accurate" abstractions.
Reposted by Tiago Pimentel
rtommccoy.bsky.social
The word "laundry" contains both steps of the laundry process:
1. Undry
2. Dry
Reposted by Tiago Pimentel
musashihi.bsky.social
Love this! Especially the explicit operationalization of what “bias” they are measuring via specifying the relevant counterfactual.
Definitely an approach that more papers talking about effects could adopt to better clarify the phenomenon they are studying.
tpimentel.bsky.social
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our new ACL paper proposes an observational method to estimate this causal effect! Longer thread soon!
Title of paper "Causal Estimation of Tokenisation Bias" and schematic of how we define tokenisation bias, which is the causal effect we are interested in.
tpimentel.bsky.social
If you use LLMs, tokenisation bias probably affects you:
* Text generation: tokenisation bias ⇒ length bias 🤯
* Psycholinguistics: tokenisation bias ⇒ systematically biased surprisal estimates 🫠
* Interpretability: tokenisation bias ⇒ biased logits 🤔