Sheridan Feucht
@sfeucht.bsky.social
PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io
Pinned
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
Reposted by Sheridan Feucht
Looking forward to attending #cogsci2025 (Jul 29 - Aug 3)! I’m especially excited to meet students who will be applying to PhD programs in Computational Ling/CogSci in the coming cycle.

Please reach out if you want to meet up and chat! Email is the best way, but DM also works if you must!

quick🧵:
July 28, 2025 at 9:20 PM
We've added a quick new section to this paper, which was just accepted to @COLM_conf! By summing weights of concept induction heads, we created a "concept lens" that lets you read out semantic information in a model's hidden states. 🔎
July 22, 2025 at 12:40 PM
Reposted by Sheridan Feucht
🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...
June 30, 2025 at 10:55 PM
Nikhil's recent paper is a tour de force in causal analysis! They show that LLMs keep track of what characters know in a story using "pointer" mechanisms. Definitely worth checking out.
How do language models track the mental states of each character in a story, an ability often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
June 24, 2025 at 5:48 PM
I used to think formal reasoning was central to language and intelligence, but now I’m not so sure. Wrote a short post about my thoughts on this, with a couple chewy anecdotes. Would love to get some feedback or pointers to further reading.
sfeucht.github.io/syllogisms/
Sheridan Feucht
Solving Syllogisms is Not Intelligence April 23, 2025 (I think that we overvalue logical reasoning when it comes to measuring "intelligence.") What do we mean by intelligence in the context of cogniti...
sfeucht.github.io
April 25, 2025 at 3:39 PM
I'll present a poster for this work at NENLP tomorrow! Come find me at poster #80...
April 10, 2025 at 9:19 PM
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
April 7, 2025 at 1:54 PM
Reposted by Sheridan Feucht
I'm searching for some comp/ling experts to provide a precise definition of “slop” as it refers to text (see: corp.oup.com/word-of-the-...)

I put together a google form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏
Oxford Word of the Year 2024 - Oxford University Press
The Oxford Word of the Year 2024 is 'brain rot'. Discover more about the winner, our shortlist, and 20 years of words that reflect the world.
corp.oup.com
March 10, 2025 at 8:00 PM
I like this work a lot. Racism+misogyny in medicine is genuinely dangerous, so it's really important to keep tabs on model biases if we're going to use LLMs in clinical settings. It's nice to see that interpretability techniques are useful here.
LLMs are known to perpetuate social biases in clinical tasks. Can we locate and intervene upon LLM activations that encode patient demographics like gender and race? 🧵

Work w/ @arnabsensharma.bsky.social, @silvioamir.bsky.social, @davidbau.bsky.social, @byron.bsky.social

arxiv.org/abs/2502.13319
February 22, 2025 at 10:33 PM
Reposted by Sheridan Feucht
Do you have a great experiment that you want to run on Llama 405b but not enough GPUs?

🚨 #NDIF is opening up more spots in our 405b pilot program! Apply now for a chance to conduct your own groundbreaking experiments on the 405b model. Details: 🧵⬇️
December 9, 2024 at 8:04 PM
Reposted by Sheridan Feucht
yes, this is what mechanistic interpretability research looks like
November 24, 2024 at 7:51 PM