Deniz Bayazit
@bayazitdeniz.bsky.social
#NLProc PhD student @EPFL #interpretability
6/ Concurrently, recent work uses sparse crosscoders to show broad phases of concept evolution (statistical → feature learning); we track the causal dynamics of specific concepts over time and across languages with RelIE, giving a complementary, finer-grained view.

arxiv.org/abs/2509.17196
Evolution of Concepts in Language Model Pre-Training
5/ Looking closer, feature sharing has limits: in Hindi & Arabic, overlap stays low even at 341B tokens. This may be due to richer agreement systems (e.g., verbs agreeing w/ subjects & objects) forcing BLOOM to keep language-specific features—or simply data scarcity!
4/ In #multilingual models, cross-language feature overlap starts low and rises with training. At 6B tokens in BLOOM, most detectors are language-specific or for punctuation; by 341B tokens shared crosslingual features emerge, capturing syntactic abstractions over token patterns.
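The overlap trend in 4/ could be measured, for instance, as the Jaccard overlap between the sets of crosscoder features active on parallel text in two languages. This is a minimal illustrative sketch, not the paper's actual statistic; the feature IDs below are made up.

```python
def jaccard_overlap(active_a, active_b):
    """Fraction of features shared between two languages' active sets."""
    a, b = set(active_a), set(active_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)

# Early training: detectors are language-specific, so the sets are disjoint.
early = jaccard_overlap({1, 2, 3}, {7, 8, 9})       # -> 0.0
# Late training: shared crosslingual features appear in both sets.
late = jaccard_overlap({1, 2, 3, 4}, {2, 3, 4, 5})  # -> 0.6
```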
3/ Which features matter early but fade, and which gain importance later? In Pythia, token-level detectors drop out, while higher-level grammatical features—like plural-noun detectors and nouns formed from verbs (e.g., runner from run)—strengthen by 286B tokens.
2/ We align critical checkpoints for a task with sparse crosscoders, measure each feature’s causal role, and introduce RelIE to compare their influence across checkpoints. This lets us trace how internal features shift—and when they matter—in models like Pythia, OLMo, and BLOOM.
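The pipeline in 2/ can be sketched as follows, assuming (this is my assumption, not the paper's stated formula) that each checkpoint's crosscoder yields an indirect effect (IE) per feature via ablation, and that RelIE normalizes a feature's IE at one checkpoint by its summed IE across all checkpoints, so rows compare *when* a feature matters rather than *how much* overall.

```python
import numpy as np

# Toy stand-in for ablation-based indirect effects: IE[f, t] is the
# (nonnegative) change in task loss when feature f is ablated at
# checkpoint t. Real values would come from the aligned crosscoders.
rng = np.random.default_rng(0)
n_features, n_checkpoints = 5, 4
ie = np.abs(rng.normal(size=(n_features, n_checkpoints)))

def relie(ie_matrix):
    """Hypothetical relative indirect effect: each feature's IE per
    checkpoint, normalized by its total IE over training, giving a
    per-feature distribution over checkpoints."""
    totals = ie_matrix.sum(axis=1, keepdims=True)
    return ie_matrix / totals

scores = relie(ie)  # each row sums to 1; peaks show when a feature matters
```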
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability