Mor Geva
@megamor2.bsky.social
840 followers 76 following 29 posts
https://mega002.github.io
Reposted by Mor Geva
yoav.ml
🧠 To reason over text and track entities, we find that language models use three types of 'pointers'!

They were thought to rely only on a positional one—but when many entities appear, that system breaks down.

Our new paper shows what these pointers are and how they interact 👇
Reposted by Mor Geva
soheeyang.bsky.social
🚨 New Paper 🚨
How effectively do reasoning models reevaluate their thoughts? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
1/N 🧵
Reposted by Mor Geva
yoav.ml
New Paper Alert! Can we precisely erase conceptual knowledge from LLM parameters?
Most methods are shallow or coarse, or they overreach, adversely affecting related or general knowledge.

We introduce 🪝𝐏𝐈𝐒𝐂𝐄𝐒 — a general framework for Precise In-parameter Concept EraSure. 🧵 1/
Reposted by Mor Geva
mariusmosbach.bsky.social
Check out Benno's notes about our paper on the impact of interpretability 👇.

Also, we are organizing a workshop at #ICML2025 which is inspired by some of the questions discussed in the paper: actionable-interpretability.github.io
Reposted by Mor Geva
sarah-nlp.bsky.social
Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.

Website: actionable-interpretability.github.io
Deadline: May 9
megamor2.bsky.social
🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!
megamor2.bsky.social
Forgot to tag the one and only @hadasorgad.bsky.social !!!
megamor2.bsky.social
Communication between LLM agents can be super noisy! One rogue agent can easily drag the whole system into failure 😱

We find that (1) it's possible to detect rogue agents early on
(2) interventions can boost system performance by up to 20%!

Thread with details and paper link below!
ohav.bsky.social
"One bad apple can spoil the bunch 🍎", and that's doubly true for language agents!
Our new paper shows how monitoring and intervention can prevent agents from going rogue, boosting performance by up to 20%. We're also releasing a new multi-agent environment 🕵️‍♂️
megamor2.bsky.social
In a final experiment, we show that output-centric methods can be used to "revive" features previously thought to be "dead" 🧟‍♂️ reviving hundreds of SAE features in Gemma 2! 6/
megamor2.bsky.social
Unsurprisingly, while activating inputs better describe what activates a feature, output-centric methods do much better at predicting how steering the feature will affect the model’s output!

But combining the two works best! 🚀 5/
megamor2.bsky.social
Next, we evaluate the widely used activating-inputs approach versus two output-centric methods:
- vocabulary projection (a.k.a. logit lens)
- tokens with max probability change in the output

Our output-centric methods require no more than a few inference passes! 4/
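
For intuition, here is a minimal sketch of the two output-centric methods, not the paper's code: it assumes GPT-2 as a stand-in model, a random placeholder for the feature's residual-stream direction, and an arbitrary layer and steering strength.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumptions: GPT-2 as a stand-in model and a random placeholder direction for the
# feature; in practice the direction would come from an SAE decoder or a neuron's weights.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
feature_dir = torch.randn(model.config.hidden_size)

# (a) Vocabulary projection ("logit lens"): read the feature direction directly
# through the unembedding matrix and keep the top-scoring tokens.
W_U = model.get_output_embeddings().weight                  # (vocab, hidden)
proj = W_U @ feature_dir                                    # (vocab,)
print("vocab projection:", [tok.decode(i) for i in proj.topk(10).indices.tolist()])

# (b) Max probability change: add the direction to the residual stream at one layer
# and see which next-token probabilities increase the most (one extra forward pass).
layer, alpha = 6, 8.0                                       # illustrative layer / steering strength
inputs = tok("The", return_tensors="pt")

def steer(module, args, output):
    return (output[0] + alpha * feature_dir,) + output[1:]  # shift the block's hidden states

with torch.no_grad():
    base = model(**inputs).logits[0, -1].softmax(-1)
    handle = model.transformer.h[layer].register_forward_hook(steer)
    steered = model(**inputs).logits[0, -1].softmax(-1)
    handle.remove()

delta = steered - base
print("max prob change:", [tok.decode(i) for i in delta.topk(10).indices.tolist()])
```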
megamor2.bsky.social
To fix this, we first propose using both input- and output-based evaluations for feature descriptions.
Our output-based eval measures how well a description of a feature captures its effect on the model's generation. 3/
megamor2.bsky.social
Autointerp pipelines describe neurons and SAE features based on inputs that activate them.

This is problematic ⚠️
1. Collecting activations over large datasets is expensive, time-consuming, and often infeasible.
2. It overlooks how features affect model outputs!

2/
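
For contrast, the activating-inputs baseline looks roughly like the sketch below: run the model over a corpus, record the feature's activation on every token, and keep the top-activating snippets. The SAE encoder here is a random placeholder and the corpus is a toy list, both assumptions for illustration; the full pass over the data is what makes this approach expensive.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Toy setup (all assumptions): GPT-2 as the model, a random vector as the "SAE
# encoder" of a single feature, and a tiny in-memory corpus instead of a real dataset.
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
tok = AutoTokenizer.from_pretrained("gpt2")
layer = 6                                              # illustrative layer
w_enc = torch.randn(model.config.hidden_size)          # placeholder encoder weights for one feature
corpus = [
    "The Eiffel Tower is in Paris.",
    "Cats sleep for most of the day.",
    "Paris hosted the 1900 Summer Olympics.",
]

scored = []
with torch.no_grad():
    for text in corpus:                                # the expensive part: a full pass over the data
        ids = tok(text, return_tensors="pt")
        hidden = model(**ids).hidden_states[layer][0]  # (seq, hidden) residual stream at `layer`
        acts = torch.relu(hidden @ w_enc)              # the feature's activation on each token
        scored.append((acts.max().item(), text))

# The feature description is then written (by a human or an LLM) from the top-activating snippets.
print("top activating inputs:", sorted(scored, reverse=True)[:2])
```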
megamor2.bsky.social
How can we interpret LLM features at scale? 🤔

Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs!
We propose efficient output-centric methods that better predict the steering effect of a feature.

New preprint led by @yoav.ml 🧵1/
Reposted by Mor Geva
fbarez.bsky.social

🚨 New Paper Alert: Open Problem in Machine Unlearning for AI Safety 🚨

Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇

Paper: arxiv.org/pdf/2501.04952
1/8
megamor2.bsky.social
Most operation descriptions are plausible based on human judgment.
We also observe interesting operations implemented by heads, like the extension of time periods (day → month → year) and the association of known figures with years relevant to their historical significance (9/10)
megamor2.bsky.social
Next, we establish an automatic pipeline that uses GPT-4o to annotate the salient mappings from MAPS.
We map the attention heads of Pythia 6.9B and GPT2-xl and manage to identify operations for most heads, covering 60%-96% of heads in the middle and upper layers (8/10)
megamor2.bsky.social
(3) Smaller models tend to encode more relations in a single head

(4) In Llama-3.1 models, which use grouped-query attention, grouped heads often implement the same or similar relations (7/10)
megamor2.bsky.social
(1) Different models encode certain relations across attention heads to similar degrees

(2) Different heads implement the same relation to varying degrees, which has implications for localization and editing of LLMs (6/10)
megamor2.bsky.social
Using MAPS, we study the distribution of operations across heads in different models -- Llama, Pythia, Phi, GPT2 -- and see some cool trends of function encoding universality and architecture biases: (5/10)
megamor2.bsky.social
Experiments on 20 operations and 6 LLMs show that MAPS estimates strongly correlate with the heads' outputs during inference.

Ablating heads that implement an operation damages the model's ability to perform tasks requiring that operation more than removing other heads does (4/10)
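
A rough sketch of what such an ablation check can look like, under stated assumptions (GPT-2 as a stand-in model, a single toy prompt instead of the paper's tasks, and hypothetical head indices), using the head_mask mechanism in Hugging Face transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumptions: GPT-2 as a stand-in model, one toy prompt instead of the paper's
# evaluation tasks, and made-up (layer, head) indices for illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

def answer_logprob(prompt, answer, ablate=()):
    """Log-prob of `answer` given `prompt`, with the given (layer, head) pairs zeroed out."""
    head_mask = torch.ones(model.config.n_layer, model.config.n_head)
    for layer, head in ablate:
        head_mask[layer, head] = 0.0                            # knock out this attention head
    ids = tok(prompt + answer, return_tensors="pt").input_ids
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(ids, head_mask=head_mask).logits
    logprobs = logits.log_softmax(-1)[0, prompt_len - 1 : -1]   # positions predicting the answer
    answer_ids = ids[0, prompt_len:]
    return logprobs.gather(-1, answer_ids[:, None]).sum().item()

prompt, answer = "The capital of France is", " Paris"
print("intact model:            ", answer_logprob(prompt, answer))
print("ablate 'operation' heads:", answer_logprob(prompt, answer, [(7, 2), (9, 5)]))   # hypothetical heads
print("ablate control heads:    ", answer_logprob(prompt, answer, [(3, 0), (11, 8)]))  # hypothetical heads
```

In this setup, a larger drop for the "operation" heads than for the control heads would mirror the ablation effect reported in the thread.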