Lightnews — Scholar-powered news

James Michaelov

@jamichaelov.bsky.social

I'll also be presenting this paper with @catherinearnett.bsky.social
at #CogInterp!

Catherine Arnett @catherinearnett.bsky.social · Nov 24

@jamichaelov.bsky.social and I will be presenting our paper at the CogInterp workshop 13:15 - 14:45 on Dec 7th. The paper shows how disaggregating grammatical benchmarks over the course of training reveals stages of training where models learn heuristics before learning more generalizable patterns.

November 25, 2025 at 2:32 PM

James Michaelov

@jamichaelov.bsky.social

Preprint: www.arxiv.org/abs/2510.24963

Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale

We show that across architecture (Transformer vs. Mamba vs. RWKV), training dataset (OpenWebText vs. The Pile), and scale (14 million parameters to 12 billion parameters), autoregressive language mode...

www.arxiv.org

November 25, 2025 at 2:27 PM

James Michaelov

@jamichaelov.bsky.social

See the full paper here: arxiv.org/abs/2506.06808
3/3

Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events

Can language models reliably predict that possible events are more likely than merely improbable ones? By teasing apart possibility, typicality, and contextual relatedness, we show that despite the re...

arxiv.org

June 12, 2025 at 5:54 PM

James Michaelov

@jamichaelov.bsky.social

In the most extreme case, LMs assign sentences such as ‘the car was given a parking ticket by the explorer’ (unlikely but possible event) a lower probability than ‘the car was given a parking ticket by the brake’ (animacy-violating event, semantically-related final word) over half of the time. 2/3

June 12, 2025 at 5:54 PM

James Michaelov

@jamichaelov.bsky.social

I’ve had success using the infini-gram API for this (though it can get overloaded with user requests at times): infini-gram.io

Home

infini-gram.io

February 8, 2025 at 12:40 PM

James Michaelov

@jamichaelov.bsky.social

I don’t think this is quite what you’re looking for, but @camrobjones.bsky.social recently ran some Turing-test-style studies and found that some people believed ELIZA to be a human (and participants were asked to give reasons for their responses)

December 3, 2024 at 1:09 PM

James Michaelov

@jamichaelov.bsky.social

Seems like a great initiative to have some of these location-based ones! I’d love to be added if possible!

November 19, 2024 at 4:17 PM

James Michaelov

@jamichaelov.bsky.social

If there’s still space (and you accept postdocs), could I be added?

November 11, 2024 at 6:54 PM

James Michaelov

@jamichaelov.bsky.social

Thanks for creating this list - looks great! I’d love to be added if there’s still room

November 11, 2024 at 6:46 PM

James Michaelov

@jamichaelov.bsky.social

Thank you!

November 11, 2024 at 12:15 PM

James Michaelov

@jamichaelov.bsky.social

If there’s still room, is there any chance you could add me to this list?

November 11, 2024 at 11:40 AM

James Michaelov

@jamichaelov.bsky.social

Also, I’m going to be attending EMNLP next week - reach out if you want to meet/chat

November 10, 2024 at 7:34 PM

James Michaelov

@jamichaelov.bsky.social

Anyway, excited to learn and chat about about research along these lines and beyond here on Bluesky!

November 10, 2024 at 7:34 PM

James Michaelov

@jamichaelov.bsky.social

Of course, none of this work would have been possible without my amazing PhD advisor Ben Bergen, and my other great collaborators: Seana Coulson, @catherinearnett.bsky.social, Tyler Chang, Cyma Van Petten, and Megan Bardolph!

November 10, 2024 at 7:34 PM

James Michaelov

@jamichaelov.bsky.social

5: Recurrent models like RWKV and Mamba have recently emerged as viable alternatives to transformers. While they are intuitively more cognitively plausible, when used to model human language processing, how do they compare transformers? We find that they perform about the same overall:

Revenge of the Fallen? Recurrent Models Match Transformers at...

Transformers have generally supplanted recurrent neural networks as the dominant architecture for both natural language processing tasks and for modelling the effect of predictability on online...

openreview.net

November 10, 2024 at 7:34 PM

James Michaelov

@jamichaelov.bsky.social

4: Is the N400 sensitive only to the predicted probability of the stimuli encountered, or also the predicted probability of alternatives? We revisit this question with state-of-the-art NLP methods, with the results supporting the former hypothesis:

Ignoring the alternatives: The N400 is sensitive to stimulus preactivation alone

The N400 component of the event-related brain potential is a neural signal of processing difficulty. In the language domain, it is widely believed to …

www.sciencedirect.com

November 10, 2024 at 7:34 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news