Jennifer Hu @ COLM (recruiting PhDs and postdocs!)
@jennhu.bsky.social
2.5K followers 170 following 45 posts
Asst Prof at Johns Hopkins Cognitive Science • Director of the Group for Language and Intelligence (GLINT) ✨ • Interested in all things language, cognition, and AI • jennhu.github.io
jennhu.bsky.social
At #COLM2025 and would love to chat all things cogsci, LMs, & interpretability 🍁🥯 I'm also recruiting!

👉 I'm presenting at two workshops (PragLM, Visions) on Fri

👉 Also check out "Language Models Fail to Introspect About Their Knowledge of Language" (presented by @siyuansong.bsky.social Tue 11-1)
jennhu.bsky.social
Can AI models introspect? What does introspection even mean for AI?

We revisit a recent proposal by Comșa & Shanahan, and provide new experiments + an alternate definition of introspection.

Check out this new work w/ @siyuansong.bsky.social, @harveylederman.bsky.social, & @kmahowald.bsky.social 👇
siyuansong.bsky.social
How reliable is what an AI says about itself? The answer depends on whether models can introspect. But, if an LLM says its temperature parameter is high (and it is!)….does that mean it’s introspecting? Surprisingly tricky to pin down. Our paper: arxiv.org/abs/2508.14802 (1/n)
jennhu.bsky.social
Due to popular demand, we are extending the CogInterp submission deadline again! 🗓️🥳

Submit by *8/27* (midnight AoE)
jennhu.bsky.social
🗓️ The submission deadline for CogInterp @ NeurIPS has officially been *extended* to 8/22 (AoE)! 👇

Looking forward to seeing your submissions!
jennhu.bsky.social
Heading to CogSci this week! ✈️

Find me giving talks on:
💬 Production-comprehension asymmetry in children and LMs (Thu 7/31)
💬 How people make sense of nonsense (Sat 8/2)

📣 Also, I’m recruiting grad students + postdocs for my new lab at Hopkins! 📣

If you’re interested in language / cognition / AI, let’s chat! 😄
jennhu.bsky.social
Join us at NeurIPS in San Diego this December for talks by experts in the field, including James McClelland, @cgpotts.bsky.social, @scychan.bsky.social, @ari-holtzman.bsky.social, @mtoneva.bsky.social, & @sydneylevine.bsky.social!

🗓️ Submit your 4-page paper (non-archival) by August 15!

4/4
jennhu.bsky.social
We're bringing together researchers in fields such as machine learning, psychology, linguistics, and neuroscience to discuss new empirical findings + theories which help us interpret high-level cognitive abilities in deep learning models.

3/4
jennhu.bsky.social
Deep learning models (e.g. LLMs) show impressive abilities. But what generalizations have these models acquired? What algorithms underlie model behaviors? And how do these abilities develop?

Cognitive science offers a rich body of theories and frameworks which can help answer these questions.

2/4
jennhu.bsky.social
Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣

How can we interpret the algorithms and representations underlying complex behavior in deep learning models?

🌐 coginterp.github.io/neurips2025/

1/4
Reposted by Jennifer Hu @ COLM (recruiting PhDs and postdocs!)
rdhawkins.bsky.social
Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🎉
How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach?
🌐 sites.google.com/berkeley.edu/praglm/
📅 Submit by June 23rd
jennhu.bsky.social
Our work also suggests a new way of using AI models to study cognition: not just as black boxes mapping stimuli to outputs, but potentially also as processing models.

Excited about future work using mechanistic interpretability to make new, testable predictions about human cognition!

(11/12)
jennhu.bsky.social
From an AI perspective, our approach could be leveraged to better understand how certain inputs are easier or harder for models to process.

This could help AI researchers design evaluations with better construct validity, or more efficient early-exiting methods to save test-time compute.

(10/12)
jennhu.bsky.social
Our results suggest that model processing & human processing may be facilitated by similar properties of an input stimulus, and that this similarity has emerged through general-purpose objectives like next-token prediction or image recognition.

Why does this matter for AI/cogsci? 👇

(9/12)
jennhu.bsky.social
Moreover, across our experiments, larger models do not always show more human-like processing patterns.

Interestingly, this seems to generalize prior findings that larger LMs do not always predict human reading times better.

(8/12)
jennhu.bsky.social
We then test whether measures of forward-pass dynamics (including competitor interference, among others) predict signatures of processing in humans.

We find that dynamic measures improve prediction of human measures above static (final-layer) measures, across models, domains, & modalities.

(7/12)
Screenshot of Figure 3, which has two panels, labeled (a) and (b). The caption says the following. Figure 3: Experiment 2 results for text domains. (a) Top: R2 achieved by model processing measures (x-axis) across groups of human DVs (hue). Bottom: Log Bayes Factor comparing critical to baseline regression models. Horizontal line = log(3). (b) Mean R2 across bins of model sizes.
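A minimal sketch (not the paper's pipeline; simulated data and made-up measure names) of the kind of nested-regression comparison described in the post above: a baseline regression using only a static final-layer measure vs. a critical regression that adds dynamic forward-pass measures, scored with R² and a BIC-based approximation to the log Bayes factor.

```python
# Hypothetical illustration only (simulated data; not the paper's analysis code).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
static = rng.normal(size=n)           # placeholder: a static, final-layer measure (e.g. surprisal)
dynamic = rng.normal(size=(n, 2))     # placeholder: dynamic forward-pass measures
human_dv = 0.5 * static + 0.8 * dynamic[:, 0] + rng.normal(size=n)  # simulated human DV

# Baseline model: static measure only. Critical model: static + dynamic measures.
baseline = sm.OLS(human_dv, sm.add_constant(static)).fit()
critical = sm.OLS(human_dv, sm.add_constant(np.column_stack([static, dynamic]))).fit()

print(f"baseline R^2 = {baseline.rsquared:.3f}, critical R^2 = {critical.rsquared:.3f}")

# Rough BIC-based approximation to the log Bayes factor favoring the critical model;
# values above log(3) are conventionally read as positive evidence.
log_bf = (baseline.bic - critical.bic) / 2
print(f"approx. log Bayes factor = {log_bf:.2f} (reference: log(3) = {np.log(3):.2f})")
```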
jennhu.bsky.social
First, we use simple mech interp tools to measure competitor interference, via signatures such as evidence for “two-stage processing” and the “time to decision”.

We find that models indeed appear to initially favor a competing incorrect answer in the cases where we expect decision conflict in humans.

(6/12)
Screenshot of Figure 2, which has three panels, labeled (a), (b), and (c). The caption says the following. Figure 2: Experiment 1 results. (a) LMs generally show stronger signs of two-stage processing for the items with competing intuitive answers. Asterisks denote sig. t-tests comparing means across conditions within each domain. (b) ∆LogProb across layers for sample LMs in the capitals recall domain, illustrating different processing strategies. (c) Two-stage processing interacts with size.
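For readers curious what a layer-wise measure like the ∆LogProb panel above can look like in practice, here is a rough logit-lens-style sketch with GPT-2 (my own illustration; the prompt, answer pair, and code are assumptions, not the paper's materials), tracking how the log-probability gap between a correct answer and a salient competitor evolves over the forward pass.

```python
# Hypothetical sketch (not the paper's code): logit-lens-style dLogProb across layers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

prompt = "The capital of Australia is"
correct, competitor = " Canberra", " Sydney"   # illustrative item, not from the paper
correct_id = tok(correct)["input_ids"][0]      # first sub-token of each answer
competitor_id = tok(competitor)["input_ids"][0]

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"))

# Project each layer's last-position hidden state through the final layer norm and
# unembedding (the standard "logit lens"), then compare the two candidate answers.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    logprobs = torch.log_softmax(logits, dim=-1)
    delta = (logprobs[correct_id] - logprobs[competitor_id]).item()
    print(f"layer {layer:2d}  logprob(correct) - logprob(competitor) = {delta:+.2f}")
```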
jennhu.bsky.social
1. Do models show signs of competitor interference effects?
2. Do measures characterizing (a) competitor interference effects or (b) other aspects of processing difficulty in models predict human processing load?
3. How does model size affect the similarity between model vs human processing?

(5/12)
jennhu.bsky.social
One phenomenon in human processing is competitor interference: conflict btwn a salient (but incorrect) answer and a correct answer. We use this as a hypothesis-driven case study to look at the relationship btwn human/machine processing, before broadening our investigation.

We explore 3 RQs👇 (4/12)
jennhu.bsky.social
Meanwhile, mechanistic interpretability is starting to reveal the “pipelines” that models use to perform high-level cognitive tasks.

So we ask: Do models & humans use similar processing pipelines?

(3/12)
jennhu.bsky.social
Many studies treat AI models as black boxes mapping inputs to outputs, and compare output quantities (eg logprobs) to human behavior (eg offline judgments, reading times).

This is often theoretically motivated: eg, using LMs to test the role of prediction in reading, or the relationship between probability and grammar. (2/12)
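As a concrete example of that output-only recipe, here is a minimal sketch (my own illustration, not tied to any particular study) that computes per-token surprisal from GPT-2, the kind of quantity typically regressed against reading times or judgments.

```python
# Minimal sketch of the standard output-based approach (illustrative only):
# per-token surprisal from GPT-2, the kind of quantity typically compared
# against human reading times.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The horse raced past the barn fell."
ids = tok(sentence, return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits = model(ids).logits

# Surprisal of token t is -log p(token_t | tokens_<t), read off the shifted logits.
logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]

for token, s in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), surprisal):
    print(f"{token:>12}  surprisal = {s.item():.2f} nats")
```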
jennhu.bsky.social
Excited to share a new preprint w/ @michael-lepori.bsky.social & Michael Franke!

A dominant approach in AI/cogsci uses *outputs* from AI models (eg logprobs) to predict human behavior.

But how does model *processing* (across layers in a forward pass) relate to human real-time processing? 👇 (1/12)
Screenshot of Figure 1, which has two panels labeled (a) and (b). The caption states the following. Figure 1: Overview of our study. (a) Experiment 1: We explore whether forward passes show mechanistic signatures of competitor interference, first preferring a salient competing intuitive answer before preferring the correct answer. (b) Experiment 2: We systematically investigate the ability of dynamic measures derived from forward passes to predict indicators of processing load in humans.
jennhu.bsky.social
Check out our new work on introspection in LLMs! 🔍

TL;DR we find no evidence that LLMs have privileged access to their own knowledge.

Beyond the study of LLM introspection, our findings inform an ongoing debate in linguistics research: prompting (eg for grammaticality judgments) =/= direct probability measurement! (toy illustration below)
siyuansong.bsky.social
New preprint w/ @jennhu.bsky.social @kmahowald.bsky.social : Can LLMs introspect about their knowledge of language?
Across models and domains, we did not find evidence that LLMs have privileged access to their own predictions. 🧵(1/8)
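A toy illustration of the prompting vs. direct-probability contrast flagged in the post above, using GPT-2 and a hypothetical minimal pair (not the paper's models or materials): the direct route scores each sentence's probability under the model, whereas the prompting route would instead ask the model a metalinguistic question such as "Which sentence is grammatical?", and the two need not agree.

```python
# Hypothetical sketch: direct probability measurement for a grammatical vs.
# ungrammatical minimal pair. (The contrasting "prompting" route would query the
# model with a metalinguistic question instead; not shown here.)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(text: str) -> float:
    """Total log-probability the model assigns to a sentence (direct measurement)."""
    ids = tok(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]].sum().item()

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."
print("log p(grammatical)   =", round(sentence_logprob(grammatical), 2))
print("log p(ungrammatical) =", round(sentence_logprob(ungrammatical), 2))
```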
Reposted by Jennifer Hu @ COLM (recruiting PhDs and postdocs!)
tomerullman.bsky.social
new preprint on Theory of Mind in LLMs, a topic I know a lot of people care about (I care. I'm part of people):

"Re-evaluating Theory of Mind evaluation in large language models"

(by Hu* @jennhu.bsky.social, Sosa, and me)

link: arxiv.org/pdf/2502.21098