Jennifer Hu @ COLM (recruiting PhDs and postdocs!)
@jennhu.bsky.social
2.5K followers 170 following 45 posts
Asst Prof at Johns Hopkins Cognitive Science • Director of the Group for Language and Intelligence (GLINT) ✨ • Interested in all things language, cognition, and AI • jennhu.github.io
jennhu.bsky.social
At #COLM2025 and would love to chat all things cogsci, LMs, & interpretability 🍁🥯 I'm also recruiting!

👉 I'm presenting at two workshops (PragLM, Visions) on Fri

👉 Also check out "Language Models Fail to Introspect About Their Knowledge of Language" (presented by @siyuansong.bsky.social Tue 11-1)
jennhu.bsky.social
Can AI models introspect? What does introspection even mean for AI?

We revisit a recent proposal by Comșa & Shanahan, and provide new experiments + an alternate definition of introspection.

Check out this new work w/ @siyuansong.bsky.social, @harveylederman.bsky.social, & @kmahowald.bsky.social 👇
siyuansong.bsky.social
How reliable is what an AI says about itself? The answer depends on whether models can introspect. But, if an LLM says its temperature parameter is high (and it is!)….does that mean it’s introspecting? Surprisingly tricky to pin down. Our paper: arxiv.org/abs/2508.14802 (1/n)
jennhu.bsky.social
Due to popular demand, we are extending the CogInterp submission deadline again! 🗓️🥳

Submit by *8/27* (midnight AoE)
jennhu.bsky.social
🗓️ The submission deadline for CogInterp @ NeurIPS has officially been *extended* to 8/22 (AoE)! 👇

Looking forward to seeing your submissions!
jennhu.bsky.social
Heading to CogSci this week! ✈️

Find me giving talks on:
💬 Production-comprehension asymmetry in children and LMs (Thu 7/31)
💬 How people make sense of nonsense (Sat 8/2)

📣 Also, I’m recruiting grad students + postdocs for my new lab at Hopkins! 📣

If you’re interested in language / cognition / AI, let’s chat! 😄
jennhu.bsky.social
Join us at NeurIPS in San Diego this December for talks by experts in the field, including James McClelland, @cgpotts.bsky.social, @scychan.bsky.social, @ari-holtzman.bsky.social, @mtoneva.bsky.social, & @sydneylevine.bsky.social!

🗓️ Submit your 4-page paper (non-archival) by August 15!

4/4
jennhu.bsky.social
We're bringing together researchers in fields such as machine learning, psychology, linguistics, and neuroscience to discuss new empirical findings + theories which help us interpret high-level cognitive abilities in deep learning models.

3/4
jennhu.bsky.social
Deep learning models (e.g. LLMs) show impressive abilities. But what generalizations have these models acquired? What algorithms underlie model behaviors? And how do these abilities develop?

Cognitive science offers a rich body of theories and frameworks which can help answer these questions.

2/4
jennhu.bsky.social
Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣

How can we interpret the algorithms and representations underlying complex behavior in deep learning models?

🌐 coginterp.github.io/neurips2025/

1/4
Reposted by Jennifer Hu @ COLM (recruiting PhDs and postdocs!)
rdhawkins.bsky.social
Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🎉
How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach?
🌐 sites.google.com/berkeley.edu/praglm/
📅 Submit by June 23rd
jennhu.bsky.social
Our work also suggests a new way of using AI models to study cognition: not just as black boxes mapping stimuli to outputs, but potentially also as processing models.

Excited about future work using mechanistic interpretability to make new, testable predictions about human cognition!

(11/12)
jennhu.bsky.social
From an AI perspective, our approach could be leveraged to better understand how certain inputs are easier or harder for models to process.

This could help AI researchers design evaluations with better construct validity, or more efficient early-exiting methods to save test-time compute.

(10/12)
jennhu.bsky.social
Our results suggest that model processing & human processing may be facilitated by similar properties of an input stimulus, and that this similarity has emerged through general-purpose objectives like next-token prediction or image recognition.

Why does this matter for AI/cogsci? 👇

(9/12)
jennhu.bsky.social
Moreover, across our experiments, larger models do not always show more human-like processing patterns.

Interestingly, this seems to generalize prior findings that larger LMs do not always predict human reading times better.

(8/12)
jennhu.bsky.social
We then test whether measures of forward-pass dynamics (including competitor interference, among others) predict signatures of processing in humans.

We find that dynamic measures improve prediction of human measures above static (final-layer) measures, across models, domains, & modalities.

(7/12)
Screenshot of Figure 3, which has two panels, labeled (a) and (b). The caption says the following. Figure 3: Experiment 2 results for text domains. (a) Top: R2 achieved by model processing measures (x-axis) across groups of human DVs (hue). Bottom: Log Bayes Factor comparing critical to baseline regression models. Horizontal line = log(3). (b) Mean R2 across bins of model sizes.
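A minimal sketch (not the paper's pipeline; simulated data and made-up measure names) of the kind of nested-regression comparison described in the post above: a baseline regression using only a static final-layer measure vs. a critical regression that adds dynamic forward-pass measures, scored with R² and a BIC-based approximation to the log Bayes factor.

```python
# Hypothetical illustration only (simulated data; not the paper's analysis code).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
static = rng.normal(size=n)           # placeholder: a static, final-layer measure (e.g. surprisal)
dynamic = rng.normal(size=(n, 2))     # placeholder: dynamic forward-pass measures
human_dv = 0.5 * static + 0.8 * dynamic[:, 0] + rng.normal(size=n)  # simulated human DV

# Baseline model: static measure only. Critical model: static + dynamic measures.
baseline = sm.OLS(human_dv, sm.add_constant(static)).fit()
critical = sm.OLS(human_dv, sm.add_constant(np.column_stack([static, dynamic]))).fit()

print(f"baseline R^2 = {baseline.rsquared:.3f}, critical R^2 = {critical.rsquared:.3f}")

# Rough BIC-based approximation to the log Bayes factor favoring the critical model;
# values above log(3) are conventionally read as positive evidence.
log_bf = (baseline.bic - critical.bic) / 2
print(f"approx. log Bayes factor = {log_bf:.2f} (reference: log(3) = {np.log(3):.2f})")
```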
jennhu.bsky.social
First, we use simple mech interp tools to measure competitor interference, via signatures such as evidence for “two-stage processing” and the “time to decision”.

We find that models indeed appear to initially favor a competing incorrect answer in the cases where we expect decision conflict in humans.

(6/12)
Screenshot of Figure 2, which has three panels, labeled (a), (b), and (c). The caption says the following. Figure 2: Experiment 1 results. (a) LMs generally show stronger signs of two-stage processing for the items with competing intuitive answers. Asterisks denote sig. t-tests comparing means across conditions within each domain. (b) ∆LogProb across layers for sample LMs in the capitals recall domain, illustrating different processing strategies. (c) Two-stage processing interacts with size.
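For readers curious what a layer-wise measure like the ∆LogProb panel above can look like in practice, here is a rough logit-lens-style sketch with GPT-2 (my own illustration; the prompt, answer pair, and code are assumptions, not the paper's materials), tracking how the log-probability gap between a correct answer and a salient competitor evolves over the forward pass.

```python
# Hypothetical sketch (not the paper's code): logit-lens-style dLogProb across layers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

prompt = "The capital of Australia is"
correct, competitor = " Canberra", " Sydney"   # illustrative item, not from the paper
correct_id = tok(correct)["input_ids"][0]      # first sub-token of each answer
competitor_id = tok(competitor)["input_ids"][0]

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"))

# Project each layer's last-position hidden state through the final layer norm and
# unembedding (the standard "logit lens"), then compare the two candidate answers.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    logprobs = torch.log_softmax(logits, dim=-1)
    delta = (logprobs[correct_id] - logprobs[competitor_id]).item()
    print(f"layer {layer:2d}  logprob(correct) - logprob(competitor) = {delta:+.2f}")
```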
jennhu.bsky.social
1. Do models show signs of competitor interference effects?
2. Do measures characterizing (a) competitor interference effects or (b) other aspects of processing difficulty in models predict human processing load?
3. How does model size affect the similarity between model vs human processing?

(5/12)
jennhu.bsky.social
One phenomenon in human processing is competitor interference: conflict btwn a salient (but incorrect) answer and a correct answer. We use this as a hypothesis-driven case study to look at the relationship btwn human/machine processing, before broadening our investigation.

We explore 3 RQs👇 (4/12)
jennhu.bsky.social
Meanwhile, mechanistic interpretability is starting to reveal the “pipelines” that models use to perform high-level cognitive tasks.

So we ask: Do models & humans use similar processing pipelines?

(3/12)
jennhu.bsky.social
Many studies treat AI models as black boxes mapping inputs to outputs, and compare output quantities (eg logprobs) to human behavior (eg offline judgments, reading times).

This is often theoretically motivated: eg, using LMs to test the role of prediction in reading, or the relationship between probability and grammar. (2/12)
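As a concrete example of that output-only recipe, here is a minimal sketch (my own illustration, not tied to any particular study) that computes per-token surprisal from GPT-2, the kind of quantity typically regressed against reading times or judgments.

```python
# Minimal sketch of the standard output-based approach (illustrative only):
# per-token surprisal from GPT-2, the kind of quantity typically compared
# against human reading times.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The horse raced past the barn fell."
ids = tok(sentence, return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits = model(ids).logits

# Surprisal of token t is -log p(token_t | tokens_<t), read off the shifted logits.
logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]

for token, s in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), surprisal):
    print(f"{token:>12}  surprisal = {s.item():.2f} nats")
```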
jennhu.bsky.social
Excited to share a new preprint w/ @michael-lepori.bsky.social & Michael Franke!

A dominant approach in AI/cogsci uses *outputs* from AI models (eg logprobs) to predict human behavior.

But how does model *processing* (across layers in a forward pass) relate to human real-time processing? 👇 (1/12)
Screenshot of Figure 1, which has two panels labeled (a) and (b). The caption states the following. Figure 1: Overview of our study. (a) Experiment 1: We explore whether forward passes show mechanistic signatures of competitor interference, first preferring a salient competing intuitive answer before preferring the correct answer. (b) Experiment 2: We systematically investigate the ability of dynamic measures derived from forward passes to predict indicators of processing load in humans.
jennhu.bsky.social
Check out our new work on introspection in LLMs! 🔍

TL;DR we find no evidence that LLMs have privileged access to their own knowledge.

Beyond the study of LLM introspection, our findings inform an ongoing debate in linguistics research: prompting (eg for grammaticality judgments) =/= direct probability measurement! (toy illustration below)
siyuansong.bsky.social
New preprint w/ @jennhu.bsky.social @kmahowald.bsky.social : Can LLMs introspect about their knowledge of language?
Across models and domains, we did not find evidence that LLMs have privileged access to their own predictions. 🧵(1/8)
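A toy illustration of the prompting vs. direct-probability contrast flagged in the post above, using GPT-2 and a hypothetical minimal pair (not the paper's models or materials): the direct route scores each sentence's probability under the model, whereas the prompting route would instead ask the model a metalinguistic question such as "Which sentence is grammatical?", and the two need not agree.

```python
# Hypothetical sketch: direct probability measurement for a grammatical vs.
# ungrammatical minimal pair. (The contrasting "prompting" route would query the
# model with a metalinguistic question instead; not shown here.)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(text: str) -> float:
    """Total log-probability the model assigns to a sentence (direct measurement)."""
    ids = tok(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]].sum().item()

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."
print("log p(grammatical)   =", round(sentence_logprob(grammatical), 2))
print("log p(ungrammatical) =", round(sentence_logprob(ungrammatical), 2))
```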
Reposted by Jennifer Hu @ COLM (recruiting PhDs and postdocs!)
tomerullman.bsky.social
new preprint on Theory of Mind in LLMs, a topic I know a lot of people care about (I care. I'm part of people):

"Re-evaluating Theory of Mind evaluation in large language models"

(by Hu* @jennhu.bsky.social, Sosa, and me)

link: arxiv.org/pdf/2502.21098