Nishant Subramani @ ACL
@nsubramani23.bsky.social
1.4K followers 510 following 23 posts
PhD student @CMU LTI - working on model #interpretability, student researcher @google; prev predoc @ai2; intern @MSFT nishantsubramani.github.io
Pinned
nsubramani23.bsky.social
👏🏽 Intro

💼 PhD student @ltiatcmu.bsky.social

📜 My research is in model interpretability 🔎, understanding the internals of LLMs to build more controllable and trustworthy systems

🫵🏽 If you're interested in better understanding language technology or model interpretability, let's connect!
nsubramani23.bsky.social
At @colmweb.org all week 🥯🍁! Presenting 3 mechinterp + actionable interp papers at @interplay-workshop.bsky.social

1. BERTology in the Modern World w/ @bearseascape.bsky.social
2. MICE for CATs
3. LLM Microscope w/ Jiarui Liu, Jivitesh Jain, @monadiab77.bsky.social

Reach out to chat! #COLM2025
nsubramani23.bsky.social
Excited to be attending NEMI in Boston today to present 🐁 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools, and to co-moderate the model steering and control roundtable! Come find me to connect and chat about steering and actionable interp.
nsubramani23.bsky.social
At #ACL2025 in Vienna 🇦🇹 till next Saturday! Love to chat about anything #interpretability 🔎, understanding model internals 🔬, and finding yummy vegan food 🥬
nsubramani23.bsky.social
At #ICML2025 🇨🇦 till Sunday! Love to chat about #interpretability, understanding model internals, and finding yummy vegan food in Vancouver 🥬🍜
Reposted by Nishant Subramani @ ACL
🚨New #interpretability paper with @nsubramani23.bsky.social: 🕵️ Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models
nsubramani23.bsky.social
🚨 Check out our new #interpretability paper: 🕵🏽 Model Internal Sleuthing led by the amazing @bearseascape.bsky.social who is an undergrad at @scsatcmu.bsky.social @ltiatcmu.bsky.social
nsubramani23.bsky.social
Excited to announce that I started at @googleresearch.bsky.social on the cloud team as a student researcher last month, working with Hamid Palangi on actionable #interpretability 🔍 to build better tool-using #agents ⚒️🤖
nsubramani23.bsky.social
Presenting this today at the poster session at #NAACL2025!

Come chat about interpretability, trustworthiness, and tool-using agents!

🗓️ - Thursday May 1st (today)
📍 - Hall 3
🕑 - 2:00–3:30pm
nsubramani23.bsky.social
🚀 Excited to share a new interp+agents paper: 🐭🐱 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools appearing at #NAACL2025

This was work done @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson

1/🧵
nsubramani23.bsky.social
At #NAACL2025 🌵till Sunday! Love to chat about interpretability, understanding model internals, and finding vegan food 🥬
nsubramani23.bsky.social
Come to our poster in Albuquerque on Thursday, 2:00–3:30pm, in the interpretability & analysis section!

Paper: aclanthology.org/2025.naacl-l...
Code (coming soon): github.com/microsoft/mi...

🧵/🧵
nsubramani23.bsky.social
MICE 🐭:
🎯 - significantly beats baselines on expected tool-calling utility, especially in high-risk scenarios
✅ - matches the expected calibration error of baselines
✅ - is sample-efficient
✅ - generalizes zero-shot to unseen tools

5/🧵
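Expected calibration error (ECE), mentioned above, bins predictions by confidence and averages the gap between each bin's mean confidence and its empirical accuracy. A minimal sketch of the standard equal-width-bin version (not necessarily the paper's exact variant):

```python
def expected_calibration_error(confs, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the
    |accuracy - confidence| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confs, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += len(b) / len(confs) * abs(accuracy - avg_conf)
    return ece
```

Note this is exactly why calibration alone isn't sufficient: a predictor that always outputs the base rate gets ECE near zero while being useless for deciding individual tool calls.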
nsubramani23.bsky.social
Calibration is not sufficient: both an oracle and a model that just predicts the base rate are perfectly calibrated🤦🏽‍♂️

We develop a new metric, expected tool-calling utility 🛠️, to measure the utility of deciding whether or not to execute a tool call via a confidence score!

4/🧵
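The thread doesn't give the metric's formula, but the decision rule it scores can be sketched: execute a tool call only when the expected utility of executing, under confidence `conf`, beats abstaining. The utility values below are hypothetical placeholders, not numbers from the paper:

```python
def should_execute(conf, u_correct=1.0, u_wrong=-5.0, u_abstain=0.0):
    """Execute iff the expected utility of calling the tool exceeds
    the utility of abstaining. A large negative u_wrong models a
    high-risk tool (e.g. one with irreversible side effects)."""
    expected = conf * u_correct + (1 - conf) * u_wrong
    return expected > u_abstain
```

With these placeholder utilities the break-even confidence is 5/6, so a well-calibrated confidence score directly drives how often the agent acts versus defers.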
nsubramani23.bsky.social
We propose 🐭 MICE to better assess confidence when calling tools:

1️⃣ decode from each intermediate layer of an LM
2️⃣ compute similarity scores between each layer’s generation and the final output
3️⃣ train a probabilistic classifier on these features

3/🧵
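A toy sketch of the three steps above, using difflib string similarity as a stand-in for the paper's similarity scores and hand-written strings in place of real early-exit decoding — everything here is illustrative, not the actual implementation:

```python
from difflib import SequenceMatcher

def layer_similarity_features(layer_generations, final_output):
    """Step 2: one similarity score per intermediate-layer generation,
    comparing each against the model's final output."""
    return [SequenceMatcher(None, gen, final_output).ratio()
            for gen in layer_generations]

# Step 1 stand-in: pretend these strings were decoded from
# successive intermediate layers of the LM.
layer_gens = ["get weather", "weather(city)", "weather(city='Vienna')"]
final_out = "weather(city='Vienna')"

features = layer_similarity_features(layer_gens, final_out)
# Step 3 would train a probabilistic classifier (e.g. logistic
# regression) on these per-layer features to predict whether the
# generated tool call is correct, yielding a confidence score.
```

The intuition: if early layers already "agree" with the final output, the model is likely more certain of the call than its raw token probabilities suggest.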
nsubramani23.bsky.social
1️⃣ Tool-using agents need to be useful and safe as they take actions in the world
2️⃣ Language models are poorly calibrated

🤔 Can we use model internals to better calibrate language models to make tool-using agents safer and more useful?

2/🧵
nsubramani23.bsky.social
1) I'm working on using intermediate model generations from LLMs to calibrate tool-using agents ⚒️🤖 better than the output probabilities themselves do! Turns out you can 🥳

2) There's gotta be a nice geometric understanding of what's going on within LLMs when we tune them 🤔
lastpositivist.bsky.social
Bluesky academics, let's get to know each other! Quote this & tell me: 1) a project you are working on & 2) an odd idea/theory you aren't working on but keep thinking about

1. I came to hate my work and thinking so don't do it anymore.
2.
etvpod.bsky.social
Bluesky academics, let's get to know each other! Quote this & tell me: 1) a project you are working on & 2) an odd idea/theory you aren't working on but keep thinking about

1. Convincing everyone that everything is luck, all the way down.

2. LLMs can reason and understand in the external sense.