Sarah Wiegreffe
@sarah-nlp.bsky.social
Research in NLP (mostly LM interpretability & explainability). Assistant prof at UMD CS + CLIP. Previously @ai2.bsky.social, @uwnlp.bsky.social. Views my own. sarahwie.github.io
Pinned
sarah-nlp.bsky.social
A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at UMD CS @univofmaryland.bsky.social this August.

I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
Reposted by Sarah Wiegreffe
amuuueller.bsky.social
If you're at #ICML2025, chat with me, @sarah-nlp.bsky.social, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark.

We're planning to keep this a living benchmark; come by and share your ideas/hot takes!
sarah-nlp.bsky.social
I am also recruiting PhD students @univofmaryland.bsky.social for fall 2026 with interests in (causal/mechanistic) LM interpretability and its practical applications (steering, efficient adaptation, model editing, textual explanations for users, etc.).
sarah-nlp.bsky.social
I am at #ICML2025! 🇨🇦🏞️
Catch me:

1️⃣ Presenting this paper👇 tomorrow 11am-1:30pm at East #1205

2️⃣ At the Actionable Interpretability @actinterp.bsky.social workshop on Saturday in East Ballroom A (I’m an organizer!)
amuuueller.bsky.social
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!
Logo for MIB: A Mechanistic Interpretability Benchmark
Reposted by Sarah Wiegreffe
ai2.bsky.social
Ai2 @ai2.bsky.social · Jul 14
This week is #ICML in Vancouver, and a number of our researchers are participating. Here's the full list of Ai2's conference engagements—we look forward to connecting with fellow attendees. 👋
sarah-nlp.bsky.social
Thank you! Look forward to being colleagues.
sarah-nlp.bsky.social
Thanks so much for all your support ☺️🥰
sarah-nlp.bsky.social
Congrats Kristina! 😍
Reposted by Sarah Wiegreffe
actinterp.bsky.social
🚨 We're looking for more reviewers for the workshop!
📆 Review period: May 24-June 7

If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input.

💡🔍 Self-nominate here:
docs.google.com/forms/d/e/1F...
An image with the Vancouver skyline and the words "sign up to review". At the top are the logos of both the Actionable Interpretability workshop (a magnifying glass) and the ICML conference (a brain).
sarah-nlp.bsky.social
🤖: "Great review, but it could be improved by doing [exact thing I wrote in subsequent sentences]"
sarah-nlp.bsky.social
Where is version control and shared editing for keynote files?! 🤦‍♀️
sarah-nlp.bsky.social
We are quite excited about the leaderboard and release, and are open to feedback to help this remain a living benchmark.
sarah-nlp.bsky.social
Check out our new preprint/project, which has been over a year in the making! This has been a very fun collaboration (and one of the biggest I've personally participated in).

@amuuueller.bsky.social @boknilev.bsky.social and other co-authors are around #ICLR2025 if you want to find out more. 😊
amuuueller.bsky.social
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!
Logo for MIB: A Mechanistic Interpretability Benchmark
sarah-nlp.bsky.social
See Yanai's thread for more info:
bsky.app/profile/yana...
yanai.bsky.social
💡 New ICLR paper! 💡
"On Linear Representations and Pretraining Data Frequency in Language Models":

We provide an explanation for when & why linear representations form in large (or small) language models.

Led by @jackmerullo.bsky.social, w/ @nlpnoah.bsky.social & @sarah-nlp.bsky.social
sarah-nlp.bsky.social
2) On the connection between linear relational embeddings in LMs and frequency of relations in pretraining data
- Led by @jackmerullo.bsky.social w/ @nlpnoah.bsky.social @yanai.bsky.social
- arxiv.org/abs/2504.12459
- Yanai is presenting the poster tomorrow 04/26 10am-12:30pm (Hall 3+Hall 2B #236)!
sarah-nlp.bsky.social
I'm not at #ICLR2025, but have 2 works being presented:

1) Understanding how LMs answer multiple-choice questions
- arxiv.org/abs/2407.15018
- @boknilev.bsky.social is presenting the poster *now* until 12:30 (Hall 3+Hall 2B #207)
- & w/ @oyvind-t.bsky.social @hanna-nlp.bsky.social Ashish Sabharwal
Reposted by Sarah Wiegreffe
yanai.bsky.social
I'm in Singapore for ICLR to present this paper:
Tomorrow, April 26th, 10-12:30 in Hall 3+2B #236
Come check it out!

arxiv.org/abs/2504.12459
sarah-nlp.bsky.social
Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.

Website: actionable-interpretability.github.io
Deadline: May 9
megamor2.bsky.social
🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!