Lightnews — Scholar-powered news

Reposted by Sarah Wiegreffe

Aaron Mueller @amuuueller.bsky.social · Jul 17

If you're at #ICML2025, chat with me, @sarah-nlp.bsky.social, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark.

We're planning to keep this a living benchmark; come by and share your ideas/hot takes!

Sarah Wiegreffe @sarah-nlp.bsky.social · Jul 16

I am also recruiting PhD students @univofmaryland.bsky.social for fall 2026 with interests in (causal/mechanistic) LM interpretability and its practical applications (steering, efficient adaptation, model editing, textual explanations for users, etc.).

Sarah Wiegreffe @sarah-nlp.bsky.social · Jul 16

I am at #ICML2025! 🇨🇦🏞️
Catch me:

1️⃣ Presenting this paper👇 tomorrow 11am-1:30pm at East #1205

2️⃣ At the Actionable Interpretability @actinterp.bsky.social workshop on Saturday in East Ballroom A (I’m an organizer!)

Aaron Mueller @amuuueller.bsky.social · Apr 23

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!

Reposted by Sarah Wiegreffe

Ai2 @ai2.bsky.social · Jul 14

This week is #ICML in Vancouver, and a number of our researchers are participating. Here's the full list of Ai2's conference engagements—we look forward to connecting with fellow attendees. 👋

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 26

Thank you! Look forward to being colleagues.

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 26

Thank you!

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 26

Thank you!

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 26

Thanks :))

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 16

Thanks so much for all your support ☺️🥰

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 16

Thank you!

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 16

Thank you 😄

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 16

☺️ come visit!

Sarah Wiegreffe @sarah-nlp.bsky.social · Jun 13

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at UMD CS @univofmaryland.bsky.social this August.

I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

Sarah Wiegreffe @sarah-nlp.bsky.social · May 30

Congrats Kristina! 😍

Reposted by Sarah Wiegreffe

Actionable Interpretability Workshop ICML2025 @actinterp.bsky.social · May 20

🚨 We're looking for more reviewers for the workshop!
📆 Review period: May 24-June 7

If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input.

💡🔍 Self-nominate here:
docs.google.com/forms/d/e/1F...

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

🤖: "Great review, but it could be improved by doing [exact thing I wrote in subsequent sentences]"

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

Where are version control and shared editing for Keynote files?! 🤦‍♀️

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

We are quite excited about the leaderboard and release, and are open to feedback to help this remain a living benchmark.

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

Check out our new preprint/project, which has been over a year in the making! This has been a very fun collaboration (and one of the biggest I've personally participated in).

@amuuueller.bsky.social @boknilev.bsky.social and other co-authors are around #ICLR2025 if you want to find out more. 😊

Aaron Mueller @amuuueller.bsky.social · Apr 23

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

See Yanai's thread for more info:
bsky.app/profile/yana...

Yanai Elazar @yanai.bsky.social · Apr 25

💡 New ICLR paper! 💡
"On Linear Representations and Pretraining Data Frequency in Language Models":

We provide an explanation for when & why linear representations form in large (or small) language models.

Led by @jackmerullo.bsky.social, w/ @nlpnoah.bsky.social & @sarah-nlp.bsky.social

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

2) On the connection between linear relational embeddings in LMs and frequency of relations in pretraining data
- Led by @jackmerullo.bsky.social w/ @nlpnoah.bsky.social @yanai.bsky.social
- arxiv.org/abs/2504.12459
- Yanai is presenting the poster tomorrow 04/26 10am-12:30pm (Hall 3+Hall 2B #236)!

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 25

I'm not at #ICLR2025, but have 2 works being presented:

1) Understanding how LMs answer multiple-choice questions
- arxiv.org/abs/2407.15018
- @boknilev.bsky.social is presenting the poster *now* until 12:30 (Hall 3+Hall 2B #207)
- & w/ @oyvind-t.bsky.social @hanna-nlp.bsky.social Ashish Sabharwal

Reposted by Sarah Wiegreffe

Yanai Elazar @yanai.bsky.social · Apr 25

I'm in Singapore for ICLR to present this paper:
Tomorrow, April 26th, 10-12:30 in Hall 3+2B #236
Come check it out!

arxiv.org/abs/2504.12459

Reposted by Sarah Wiegreffe

Yanai Elazar @yanai.bsky.social · Apr 25

💡 New ICLR paper! 💡
"On Linear Representations and Pretraining Data Frequency in Language Models":

We provide an explanation for when & why linear representations form in large (or small) language models.

Led by @jackmerullo.bsky.social, w/ @nlpnoah.bsky.social & @sarah-nlp.bsky.social

Sarah Wiegreffe @sarah-nlp.bsky.social · Apr 3

Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.

Website: actionable-interpretability.github.io
Deadline: May 9

Mor Geva @megamor2.bsky.social · Mar 31

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!
