lampinen.github.io
w/ @fepdelia.bsky.social, @hopekean.bsky.social, @lampinen.bsky.social, and @evfedorenko.bsky.social
Link: www.pnas.org/doi/10.1073/... (1/6)
deepmind.google/sima
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵👇
Kudos to @sucholutsky.bsky.social @lukasmut.bsky.social for leading this!
🧠🤖
We propose a theory of how learning curriculum affects generalization through neural population dimensionality. Learning curriculum is a determining factor of neural dimensionality - where you start from determines where you end up.
🧠📈
A 🧵:
tinyurl.com/yr8tawj3
Genie 3 is a new frontier for world models: its environments remain largely consistent for several minutes, with visual memory extending as far back as 1 minute. These limitations will only decrease with time.
Welcome to the future.🙌
deepmind.google/discover/blo...
Work with @zejinlu.bsky.social @sushrutthorat.bsky.social and Radek Cichy
arxiv.org/abs/2507.03168
Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵
1/
authors.elsevier.com/a/1lIFK4sIRv...
🧵1/4