micha heilbron
@mheilbron.bsky.social
780 followers 320 following 64 posts
Assistant Professor of Cognitive AI @UvA Amsterdam. Language and vision in brains & machines. Cognitive science 🤝 AI 🤝 cognitive neuroscience. michaheilbron.github.io
mheilbron.bsky.social
omg. what journal? name and shame
mheilbron.bsky.social
huh! if these effects are similar and consistent, I think it should work, but the question is how you get a vector representation for novel pseudowords: we currently use lexicosemantic word vectors, and they are undefined for novel words.

so how to represent the novel words? v. interesting test case
mheilbron.bsky.social
Together, our results support a classic idea: cognitive limitations can be a powerful inductive bias for learning

Yet they also reveal a curious distinction: a model with more human-like *constraints* is not necessarily more human-like in its predictions
mheilbron.bsky.social
This paradox (better language models yielding worse behavioural predictions) could not be accounted for by existing explanations: the mechanism appears distinct from those linked to superhuman training scale or memorisation
mheilbron.bsky.social
However, we then used these models to predict human behaviour

Strikingly, these same models, which were demonstrably better at the language task, were worse at predicting human reading behaviour
mheilbron.bsky.social
The benefit was robust

Fleeting memory models achieved better next-token prediction (lower loss) and better syntactic knowledge (higher accuracy) on the BLiMP benchmark

This was consistent across seeds and for both 10M and 100M training sets
mheilbron.bsky.social
But we noticed this naive decay was too strong

Human memory has a brief 'echoic' buffer that perfectly preserves the immediate past. When we added this (a short window of perfect retention before the decay), the pattern flipped

Now, fleeting memory *helped* (lower loss)
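The mechanism described here (perfect retention within a short buffer, power-law decay beyond it) can be pictured as a weight matrix over past tokens. A minimal NumPy sketch; `fleeting_memory_weights`, `buffer_len`, and `alpha` are illustrative names and defaults, not the paper's actual implementation:

```python
import numpy as np

def fleeting_memory_weights(seq_len, buffer_len=4, alpha=1.0):
    """Hypothetical sketch of per-position memory weights.

    Tokens within `buffer_len` positions of the current token are
    perfectly retained (weight 1.0, the 'echoic' buffer); older
    tokens decay as a power law of their distance beyond the buffer.
    """
    weights = np.ones((seq_len, seq_len))
    for i in range(seq_len):        # query (current) position
        for j in range(i + 1):      # key positions in the past
            dist = i - j
            if dist > buffer_len:
                weights[i, j] = (dist - buffer_len + 1) ** (-alpha)
    return np.tril(weights)         # zero out future positions

w = fleeting_memory_weights(8, buffer_len=2, alpha=1.0)
```

Scaling causal attention by such a matrix would make distant tokens progressively harder to access while leaving the immediate past untouched; with `buffer_len=0` this reduces to the 'naive' decay that starts from the most recent word.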
mheilbron.bsky.social
Our first attempt, a "naive" memory decay starting from the most recent word, actually *impaired* language learning. Models with this decay had higher validation loss, and this worsened (even higher loss) as the decay became stronger
mheilbron.bsky.social
To test this in a modern context, we propose the ‘fleeting memory transformer’

We applied a power-law memory decay to the self-attention scores, simulating how access to past words fades over time, and ran controlled experiments on the developmentally realistic BabyLM corpus
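As a rough sketch of what applying such a decay to self-attention could look like (assumed mechanics in plain NumPy; the paper's exact formulation may differ), one can scale causal attention probabilities by a power-law function of token distance:

```python
import numpy as np

def decayed_attention(q, k, v, alpha=0.5):
    """Sketch: causal attention whose probabilities are scaled by a
    power-law decay in token distance, so access to past words fades.
    `alpha` controls the decay strength (illustrative parameter)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                  # raw attention scores
    i, j = np.indices((seq_len, seq_len))
    dist = i - j
    causal = dist >= 0                             # only attend to the past
    decay = np.where(causal, (np.abs(dist) + 1.0) ** (-alpha), 0.0)
    scores = np.where(causal, scores, -np.inf)     # mask future tokens
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs = probs * decay                          # fade older positions
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v
```

Larger `alpha` means faster forgetting; `alpha=0` recovers standard causal attention with no memory limitation.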
mheilbron.bsky.social
However, this appears difficult to reconcile with the success of transformers, which can learn language very effectively despite lacking working-memory limitations or other recency biases

Would the blessing of fleeting memory still hold in transformer language models?
mheilbron.bsky.social
A core idea in cognitive science is that the fleetingness of working memory isn't a flaw

It may actually aid language learning by forcing a focus on the recent past and providing an incentive to discover abstract structure rather than surface details
mheilbron.bsky.social
New preprint! w/@drhanjones.bsky.social

Adding human-like memory limitations to transformers improves language learning, but impairs reading time prediction

This supports ideas from cognitive science but complicates the link between architecture and behavioural prediction
arxiv.org/abs/2508.05803
Human-like fleeting memory improves language learning but impairs reading time prediction in transformer language models
Human memory is fleeting. As words are processed, the exact wordforms that make up incoming sentences are rapidly lost. Cognitive scientists have long believed that this limitation of memory may, para...
mheilbron.bsky.social
On Wednesday, Maithe van Noort will present a poster on “Compositional Meaning in Vision-Language Models and the Brain”

First results from a much larger project on visual and linguistic meaning in brains and machines, with many collaborators. More to come!

t.ly/TWsyT
mheilbron.bsky.social
On Friday, during a contributed talk (and a poster), @wiegerscheurer will present the project he spearheaded: “A hierarchy of spatial predictions across human visual cortex during natural vision” 

(Full preprint soon)

t.ly/fTJqy
mheilbron.bsky.social
CCN has arrived here in Amsterdam!

Come find me to meet or catch up

Some highlights from students and collaborators:
mheilbron.bsky.social
Why do you forget names but remember exactly what someone does? And are memories ever really completely gone?

I spoke with Oplossing Gezocht about how our brain stores information and why forgetting is actually quite clever:
www.nemokennislink.nl/publicaties/...
Why can't I think of that name?
Uh, you know who... whatshisname! Do you also sometimes struggle to come up with a name? Brain researcher Micha Heilbron explains why that happens, and why a name actually isn't that important.
Reposted by micha heilbron
timkietzmann.bsky.social
Exciting new preprint from the lab: “Adopting a human developmental visual diet yields robust, shape-based AI vision”. A most wonderful case where brain inspiration massively improved AI solutions.

Work with @zejinlu.bsky.social @sushrutthorat.bsky.social and Radek Cichy

arxiv.org/abs/2507.03168
mheilbron.bsky.social
i’m all in the “this is a neat way to help explain things” camp fwiw :)
mheilbron.bsky.social
Our findings, together with some other recent studies, suggest the brain may use a similar strategy, constantly predicting higher-level features, to efficiently learn robust visual representations of (and from!) the natural world
mheilbron.bsky.social
This preference for higher-level information departs from traditional predictive coding, but aligns with recent, successful predictive self-supervised learning algorithms in AI, which encourage predicting higher- rather than lower-level visual features (e.g. MAE, CPC, JEPA)
mheilbron.bsky.social
So, what does this all mean?

The visual system seems to be constantly engaged in a sophisticated guessing game, predicting sensory input based on context

But interestingly, it seems to predict more abstract, higher-level properties, even in the earliest stages of cortex