Lightnews — Scholar-powered news

bowphs.bsky.social

@bowphs.bsky.social

Looking at Bruegel's Tower of Babel in Vienna makes you wonder: How can multilingual language models overcome the language barriers? Find out tomorrow!
📍 Level 1 (ironic, right?), Room 1.15-1
🕐 2 PM
#ACL2025NLP

Me in front of Bruegel's painting "The Tower of Babel".

July 27, 2025 at 9:11 PM

bowphs.bsky.social

@bowphs.bsky.social

This phenomenon has a visible effect on text generation: In BLOOM-560m, activating 'earthquake' neurons derived from Spanish data at checkpoint 10,000 generates Spanish text. At checkpoint 400,000, the same method yields English text!

Table comparing text generations between early and late checkpoints for the concepts "earthquake" and "joy". Early checkpoint generations show language-specific text, while late checkpoint generations demonstrate a shift toward "language-agnostic" (= English) text.

June 7, 2025 at 10:12 AM

bowphs.bsky.social

@bowphs.bsky.social

This is not a bug, it's a feature! These layers are repurposing the space to form cross-lingual abstractions.
We track this by examining how specific concepts (like "earthquake" or "joy") align across languages.

Average overlap proportion of expert neurons across layers and training checkpoints. Later checkpoints exhibit more shared neurons, particularly in the middle layers.

June 7, 2025 at 10:12 AM
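
The overlap measure named in the figure description above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each language's "expert neurons" for a concept are already available as a set of neuron indices within one layer, and it defines overlap as the shared count divided by the size of the smaller set. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def overlap_proportion(experts_a, experts_b):
    """Fraction of expert neurons shared by two languages.

    Assumed definition: |A & B| / min(|A|, |B|). The paper may normalise differently.
    """
    a, b = set(experts_a), set(experts_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def mean_pairwise_overlap(experts_by_language):
    """Average the overlap proportion over all unordered language pairs."""
    pairs = list(combinations(sorted(experts_by_language), 2))
    if not pairs:
        return 0.0
    total = sum(
        overlap_proportion(experts_by_language[x], experts_by_language[y])
        for x, y in pairs
    )
    return total / len(pairs)


# Toy data (made up): expert neuron indices for one concept in one layer.
early = {"es": {1, 2, 3, 4}, "en": {5, 6, 7, 8}, "fr": {1, 9, 10, 11}}
late = {"es": {1, 2, 3, 4}, "en": {1, 2, 3, 8}, "fr": {1, 2, 9, 10}}

early_score = mean_pairwise_overlap(early)
late_score = mean_pairwise_overlap(late)
```

In this toy setup the late checkpoint scores higher than the early one, which mirrors the trend the post describes. Repeating the calculation per layer and per checkpoint would give a curve of the kind the figure summarises.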

bowphs.bsky.social

@bowphs.bsky.social

How and when do multilingual LMs achieve cross-lingual generalization during pre-training? And why do later, supposedly more advanced, checkpoints lose some language identification abilities in the process? Our #ACL2025 paper investigates.

Probing classifier performance across layers for an early and a late checkpoint. The early checkpoint shows uniformly high performance, while the later checkpoint varies considerably across layers.

June 7, 2025 at 10:12 AM
