Arnab Sen Sharma
@arnabsensharma.bsky.social
PhD Student at Northeastern, working to make LLMs interpretable
We validate this flag-based eager-evaluation hypothesis with a series of carefully designed causal analyses. If we swap this flag over to another item, then in the question-before context the LM consistently picks the item now carrying the flag. The question-after context, however, is not sensitive to this intervention.
November 4, 2025 at 5:48 PM
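For concreteness, here is a minimal sketch of what a flag-swap intervention could look like. It is an illustration, not the paper's code: it assumes a Hugging Face Llama-style checkpoint, a made-up layer index, and made-up token positions for the two items, and it swaps the items' full residual states as a coarse proxy for moving the "flag".

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # assumption: any Llama-style causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# Question-before prompt: the filter criterion is stated before the list.
prompt = ("Which of these is a vegetable? Menu: 1. beef burger "
          "2. roasted broccoli 3. grilled chicken. Answer:")
inputs = tok(prompt, return_tensors="pt")

# Assumed coordinates for illustration: the layer where the flag has formed and the
# last-token positions of the two items whose states we swap (located by hand here).
layer_idx = 40
pos_a, pos_b = 12, 18

def swap_flag(module, hook_inputs, output):
    # Swap the residual-stream states of the two items (a coarse proxy for the flag).
    hidden = output[0].clone()
    hidden[:, [pos_a, pos_b]] = hidden[:, [pos_b, pos_a]]
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(swap_flag)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
handle.remove()

# If the flag carries the filtering decision, the model should now name the item
# that received the flagged state instead of the originally matching one.
print(tok.decode([logits.argmax().item()]))
```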
We test this across a range of semantic types, presentation formats, and languages, and even on tasks that require a different "reduce" step after filtering.
November 4, 2025 at 5:48 PM
🤔 But do these heads play a *causal* role in the operation?

To test them, we transport their query states from one context to another. We find that this triggers the execution of the same filtering operation, even when the new context contains a different list of items in a different format!
November 4, 2025 at 5:48 PM
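A rough sketch of what transporting a head's query state could look like, again under assumptions: a Hugging Face Llama-style module layout, hypothetical layer/head coordinates, and toy prompts. It caches one head's query at the source prompt's final token and patches it into a run on a new list in a new format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # assumption
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# Assumed coordinates of a candidate "filter head" (illustrative only).
layer_idx, head_idx, head_dim = 40, 12, 128
q_proj = model.model.layers[layer_idx].self_attn.q_proj
head_slice = slice(head_idx * head_dim, (head_idx + 1) * head_dim)

src = tok("Menu: burger, broccoli, chicken. Which of these are vegetables? Answer:",
          return_tensors="pt")
dst = tok("Inventory | hammer | carrot | wrench | kale | Answer:", return_tensors="pt")

cached = {}

def capture(module, hook_inputs, output):
    # Save this head's query state at the source prompt's final token.
    cached["q"] = output[0, -1, head_slice].clone()

def transplant(module, hook_inputs, output):
    # Overwrite the same head's query state at the destination prompt's final token.
    output = output.clone()
    output[0, -1, head_slice] = cached["q"]
    return output

with torch.no_grad():
    h = q_proj.register_forward_hook(capture)
    model(**src)
    h.remove()

    h = q_proj.register_forward_hook(transplant)
    logits = model(**dst).logits[0, -1]
    h.remove()

# If the query state encodes the "keep vegetables" instruction, the patched run on the
# new list and format should now favor the vegetable items (carrot, kale).
print(tok.decode([logits.argmax().item()]))
```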
🔍 In Llama-70B and Gemma-27B, we found special attention heads that consistently concentrate their attention on the filtered items. This behavior holds across a range of formats and semantic types.
November 4, 2025 at 5:48 PM
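One way to eyeball this behavior is to read out a candidate head's attention weights and check how much mass lands on the tokens of the items that satisfy the filter. A minimal sketch, with an assumed checkpoint, assumed layer/head indices, and hand-picked item token positions (none of these come from the paper):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # assumption
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # eager attention exposes the weight matrices
)
model.eval()

prompt = ("Menu: 1. beef burger 2. roasted broccoli 3. caesar salad. "
          "Which items are vegetarian? Answer:")
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    attentions = model(**inputs, output_attentions=True).attentions

# Assumed head coordinates and assumed token positions of the vegetarian items
# (in practice these are located programmatically).
layer_idx, head_idx = 40, 12
item_positions = [9, 10, 15, 16]

# Attention from the final (answer-writing) token over the whole context.
weights = attentions[layer_idx][0, head_idx, -1]
print("mass on filtered items:", weights[item_positions].sum().item())
print("total mass:", weights.sum().item())  # ~1.0 by construction
```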
How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: it turns out LLMs use strategies surprisingly similar to functional programming (think Python's "filter")! 🧵
November 4, 2025 at 5:48 PM
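For intuition, the functional-programming analogy in plain Python; the menu and predicate are made up for illustration:

```python
# A predicate applied over the whole list, followed by a "reduce"-style step
# that turns the surviving items into a single answer.
menu = ["grilled chicken", "caesar salad", "roasted broccoli", "beef burger"]
is_veggie = lambda item: item in {"caesar salad", "roasted broccoli"}  # toy predicate

veggies = list(filter(is_veggie, menu))   # filter: keep only the matching items
answer = ", ".join(veggies)               # reduce: collapse them into one response
print(answer)  # caesar salad, roasted broccoli
```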