Greta Tuckute
@gretatuckute.bsky.social
870 followers 260 following 76 posts
Studying language in biological brains and artificial ones at the Kempner Institute at Harvard University. www.tuckute.com
gretatuckute.bsky.social
Check out @mryskina.bsky.social's talk and poster at COLM on Tuesday—we present a method to identify 'semantically consistent' brain regions (responding to concepts across modalities) and show that more semantically consistent brain regions are better predicted by LLMs.
mryskina.bsky.social
Interested in language models, brains, and concepts? Check out our COLM 2025 🔦 Spotlight paper!

(And if you’re at COLM, come hear about it on Tuesday – sessions Spotlight 2 & Poster 2)!
Paper title: Language models align with brain regions that represent concepts across modalities.
Authors: Maria Ryskina, Greta Tuckute, Alexander Fung, Ashley Malkin, Evelina Fedorenko.
Affiliations: Maria is affiliated with the Vector Institute for AI, but the work was done at MIT. All other authors are affiliated with MIT. 
Email address: maria.ryskina@vectorinstitute.ai.
Reposted by Greta Tuckute
samnastase.bsky.social
I'm recruiting PhD students to join my new lab in Fall 2026! The Shared Minds Lab at @usc.edu will combine deep learning and ecological human neuroscience to better understand how we communicate our thoughts from one brain to another.
Reposted by Greta Tuckute
kmahowald.bsky.social
Do you want to use AI models to understand human language?

Are you fascinated by whether linguistic representations are lurking in LLMs?

Are you in need of a richer model of spatial words across languages?

Consider UT Austin for all your Computational Linguistics Ph.D. needs!

mahowak.github.io
Reposted by Greta Tuckute
cmuscience.bsky.social
Elizabeth Lee, a first-year Ph.D. student in Neural Computation, has been awarded CMU’s 2025 Sutherland-Merlino Fellowship. Her work bridges neuroscience and machine learning, and she’s passionate about advancing STEM access for underrepresented groups.
www.cmu.edu/mcs/news-eve...
Elizabeth Lee smiles at the camera.
Reposted by Greta Tuckute
neurotaha.bsky.social
🚨 Paper alert:
To appear at the DBM NeurIPS Workshop

LITcoder: A General-Purpose Library for Building and Comparing Encoding Models

📄 arxiv: arxiv.org/abs/2509.091...
🔗 project: litcoder-brain.github.io
Reposted by Greta Tuckute
bkhmsi.bsky.social
Now that the ICLR deadline is behind us, happy to share that From Language to Cognition has been accepted as an Oral at #EMNLP2025! 🎉

Looking forward to seeing many of you in Suzhou 🇨🇳
bkhmsi.bsky.social
🚨 New Preprint!!

LLMs trained on next-word prediction (NWP) show high alignment with brain recordings. But what drives this alignment—linguistic structure or world knowledge? And how does this alignment evolve during training? Our new paper explores these questions. 👇🧵
Reposted by Greta Tuckute
hsmall.bsky.social
Excited to share new work with @hleemasson.bsky.social, Ericka Wodka, Stewart Mostofsky and @lisik.bsky.social! We investigated how simultaneous vision and language signals are combined in the brain using naturalistic+controlled fMRI. Read the paper here: osf.io/b5p4n
1/n
Reposted by Greta Tuckute
isabelpapad.bsky.social
Are there conceptual directions in VLMs that transcend modality? Check out our COLM oral spotlight 🔦 paper! We use SAEs to analyze the multimodality of linear concepts in VLMs

with @chloesu07.bsky.social, @thomasfel.bsky.social, @shamkakade.bsky.social and Stephanie Gil
arxiv.org/abs/2504.11695
Reposted by Greta Tuckute
dyamins.bsky.social
Here is our best thinking about how to make world models. I would apologize for it being a massive 40-page behemoth, but it's worth reading. arxiv.org/pdf/2509.09737
Reposted by Greta Tuckute
nsaphra.bsky.social
I thought I wouldn’t be one of those academics super into outreach talks, but I just put together something about understanding LLMs for laypeople and I get to talk about results that I don’t really focus on in any of my technical talks! It’s actually really cool. I made this lil takeaway slide
Reposted by Greta Tuckute
mdhk.net
✨ Do self-supervised speech models learn to encode language-specific linguistic features from their training data, or only more language-general acoustic correlates?

At #Interspeech2025 we presented our new Wav2Vec2-NL model and SSL-NL evaluation dataset to test this!

📄 arxiv.org/abs/2506.00981

⬇️
Interspeech paper title: What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training

Authors: Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem Zuidema, Martijn Bentum
Reposted by Greta Tuckute
euripsconf.bsky.social
So, what is #EurIPS anyway? 🤔

EurIPS is a community-driven conference taking place in Copenhagen, Denmark, endorsed by @neuripsconf.bsky.social and @nordicair.bsky.social, and co-developed with @ellis.eu, where you can additionally present your NeurIPS papers.
Reposted by Greta Tuckute
mdhk.net
Had such a great time presenting our tutorial on Interpretability Techniques for Speech Models at #Interspeech2025! 🔍

For anyone looking for an introduction to the topic, we've now uploaded all materials to the website: interpretingdl.github.io/speech-inter...
Reposted by Greta Tuckute
david-g-clark.bsky.social
Wanted to share a new version (much cleaner!) of a preprint on how connectivity structure shapes collective dynamics in nonlinear RNNs. Neural circuits have highly non-iid connectivity (e.g., rapidly decaying singular values, structured singular-vector overlaps), unlike classical random RNN models.
Connectivity structure and dynamics of nonlinear recurrent neural networks
Studies of the dynamics of nonlinear recurrent neural networks often assume independent and identically distributed couplings, but large-scale connectomics data indicate that biological neural circuit...
arxiv.org
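For a concrete picture of what "non-iid connectivity with rapidly decaying singular values" means, here is a toy construction (an illustration only, not the preprint's model): an i.i.d. Gaussian coupling matrix has a broad, flat bulk of singular values, whereas a matrix built with a power-law spectrum concentrates its structure in a few modes.

```python
# Toy illustration (not from the preprint): contrast i.i.d. Gaussian couplings
# with a structured coupling matrix whose singular values decay rapidly.
import numpy as np

rng = np.random.default_rng(0)
n = 500
g = 1.5  # coupling strength

# Classical random-RNN assumption: i.i.d. Gaussian entries with variance g^2 / n.
J_iid = rng.normal(0.0, g / np.sqrt(n), size=(n, n))

# Structured alternative: random orthogonal singular vectors, power-law spectrum.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
s = g / (1 + np.arange(n)) ** 1.5  # rapidly decaying singular values
J_struct = U @ np.diag(s) @ V.T

for name, J in [("iid", J_iid), ("structured", J_struct)]:
    sv = np.linalg.svd(J, compute_uv=False)
    print(f"{name:>10}: top-5 singular values {np.round(sv[:5], 3)}")
```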
Reposted by Greta Tuckute
eringrant.me
I’m recruiting committee members for the Technical Program Committee at #CCN2026.

Please apply if you want to help make submission, review & selection of contributed work (Extended Abstracts & Proceedings) more useful for everyone! 🌐

Helps to have: programming/communications/editorial experience.
gretatuckute.bsky.social
We hope that AuriStream will serve as a task-performant model system for studying how language structure is learned from speech.

The Interspeech paper sets the stage—more work building on this idea coming soon! And as always, please feel free to get in touch with comments etc.!
gretatuckute.bsky.social
3️⃣ Temporally fine-grained → 5ms tokens preserve acoustic detail (e.g. speaker identity).
4️⃣ Unified → AuriStream learns strong speech representations and generates plausible continuations—bridging representation learning and sequence modeling in the audio domain.
gretatuckute.bsky.social
4 key advantages of AuriStream:

1️⃣ Causal → allows the study of speech/language processing as it unfolds in real time.
2️⃣ Inspectable → predictions can naturally be decoded into the cochleagram/audio, enabling visualization and interpretation.
gretatuckute.bsky.social
Examples: audio before the red line = ground-truth prompt; after = AuriStream’s prediction, visualized in the time-frequency cochleagram space.

AuriStream shows that causal prediction over short audio chunks (cochlear tokens) is enough to generate meaningful sentence continuations!
gretatuckute.bsky.social
Complementing its strong representational capabilities, AuriStream learns short- and long-range speech statistics: it completes phonemes and common words at short scales and generates diverse continuations at longer scales, as evidenced by the qualitative examples below.
gretatuckute.bsky.social
We demonstrate that:

🔹 AuriStream embeddings capture information about phoneme identity, word identity, and lexical semantics.
🔹 AuriStream embeddings serve as a strong backbone for downstream audio tasks on the SUPERB benchmark, such as ASR and intent classification.
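To make "capture information about phoneme identity" concrete, here is a minimal linear-probe sketch. The embeddings, labels, and classifier below are placeholders standing in for frozen AuriStream features, not the paper's actual evaluation pipeline.

```python
# Minimal probing sketch (placeholder data, not the paper's setup): train a linear
# classifier on frozen embeddings to read out phoneme identity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5000, 256))        # stand-in for frozen model embeddings
phoneme_labels = rng.integers(0, 40, size=5000)  # stand-in labels, ~40 phoneme classes

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, phoneme_labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)        # linear readout on frozen features
probe.fit(X_tr, y_tr)
print("phoneme probe accuracy:", probe.score(X_te, y_te))
```

Word identity and lexical semantics can be probed the same way, swapping in the relevant labels.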
gretatuckute.bsky.social
We present a two-stage framework, loosely inspired by the human auditory hierarchy:

1️⃣ WavCoch: a small model that transforms raw audio into a cochlea-like time-frequency representation, from which we extract discrete “cochlear tokens”.
2️⃣ AuriStream: an autoregressive model over the cochlear tokens.
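A minimal PyTorch sketch of the two-stage idea, to make the pipeline concrete. This is not the released code: class names, codebook size, layer sizes, and the 5 ms / 64-channel cochleagram shape are all illustrative assumptions, and in practice the stage-1 tokenizer would be trained (e.g., as a vector-quantized autoencoder) before the autoregressive stage.

```python
# Illustrative two-stage sketch (assumed names and sizes, not the released models).
import torch
import torch.nn as nn

class CochlearTokenizer(nn.Module):
    """Stage 1 (WavCoch-like, hypothetical): map each cochleagram frame to the
    nearest entry of a codebook, yielding a discrete 'cochlear token'. A real
    system would learn this codebook (e.g., vector quantization); argmin here
    is just a non-differentiable nearest-neighbour lookup for illustration."""
    def __init__(self, n_freq=64, codebook_size=1024, d=128):
        super().__init__()
        self.encode = nn.Linear(n_freq, d)
        self.codebook = nn.Embedding(codebook_size, d)

    def forward(self, cochleagram):                       # (batch, time, n_freq)
        z = self.encode(cochleagram)                      # (batch, time, d)
        codes = self.codebook.weight.expand(z.shape[0], -1, -1)
        dists = torch.cdist(z, codes)                     # (batch, time, codebook_size)
        return dists.argmin(-1)                           # token ids, (batch, time)

class CausalTokenLM(nn.Module):
    """Stage 2 (AuriStream-like, hypothetical): causal transformer that predicts
    the next cochlear token from all previous ones."""
    def __init__(self, codebook_size=1024, d=256, n_layers=4, n_heads=4, max_len=2048):
        super().__init__()
        self.tok = nn.Embedding(codebook_size, d)
        self.pos = nn.Embedding(max_len, d)
        layer = nn.TransformerEncoderLayer(d, n_heads, dim_feedforward=4 * d, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, codebook_size)

    def forward(self, tokens):                            # (batch, time)
        t = tokens.shape[1]
        x = self.tok(tokens) + self.pos(torch.arange(t, device=tokens.device))
        # additive causal mask: -inf strictly above the diagonal
        mask = torch.triu(torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1)
        h = self.blocks(x, mask=mask)                     # causal self-attention
        return self.head(h)                               # next-token logits

# Toy usage: 1 s of a 64-channel cochleagram at 5 ms frames -> 200 tokens.
cochleagram = torch.randn(1, 200, 64)
tokens = CochlearTokenizer()(cochleagram)
logits = CausalTokenLM()(tokens)
loss = nn.functional.cross_entropy(                       # next-token prediction loss
    logits[:, :-1].reshape(-1, 1024), tokens[:, 1:].reshape(-1))
```

The point of the sketch is the division of labor: a small tokenizer turns continuous audio into discrete cochlear tokens, and all sequence modeling happens causally over those tokens with a plain next-token loss.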
gretatuckute.bsky.social
Many prior speech-based models rely on heuristics such as:
🔹 Global clustering of the embedding space
🔹 Non-causal objectives
🔹 Fixed-duration “language” units
...

To our knowledge, no high-performing, open-source audio model avoids such constraints; AuriStream is built to fill that gap.