Marianne de Heer Kloots
@mdhk.net
1.2K followers 490 following 100 posts
Linguist in AI & CogSci 🧠👩‍💻🤖 PhD student @ ILLC, University of Amsterdam 🌐 https://mdhk.net/ 🐘 https://scholar.social/@mdhk 🐦 https://twitter.com/mariannedhk
Pinned
mdhk.net
✨ Do current neural speech models show human-like linguistic biases in speech perception?

We took inspiration from classic phonetic categorization experiments to explore where sensitivity to phonotactic context emerges in Wav2Vec2 models 🔍
(w/ @wzuidema.bsky.social)

📑 arxiv.org/abs/2407.03005

⬇️
Reposted by Marianne de Heer Kloots
susannebrouwer.bsky.social
PhD Position: Accented Speech Processing - Apply now!

Come work with Mirjam Broersma, @davidpeeters.bsky.social, and me at the Centre for Language Studies, Radboud University in the Netherlands.

Application deadline: 19 October 2025

For more information, see
www.ru.nl/en/working-a...
mdhk.net
Huge congrats to the envisionBOX team for the Open Science award nomination! 🎉

My tutorial on speech analysis tools in Python from the Unboxing Multimodality summer school (github.com/mdhk/unboxin...) is now also available at envisionbox.org

Thanks for the invitation & this great initiative! 👏
Reposted by Marianne de Heer Kloots
frap98.bsky.social
The 𝗜𝗟𝗖𝗕 𝗦𝘂𝗺𝗺𝗲𝗿 𝗦𝗰𝗵𝗼𝗼𝗹 in Marseille went beyond all my expectations! 💯

A week has already flown by since I had one of the most formative experiences of my PhD so far. 👩‍🎨
mdhk.net
Thanks to all co-authors in the Dutch SSL training team @hmohebbi.bsky.social @cpouw.bsky.social @gaofeishen.com @wzuidema.bsky.social + Martijn Bentum

And to @itcooperativesurf.bsky.social (EINF-8324) for granting me the resources that enabled this project 👩‍💻✨
mdhk.net
Check out the paper for more details:
📄 arxiv.org/abs/2506.00981

Or the model, dataset and code released alongside it:
🤗 huggingface.co/amsterdamNLP...
🗃️ zenodo.org/records/1554...
🔍 github.com/mdhk/SSL-NL-...

We hope these resources help further research on language-specificity in speech models!
mdhk.net
Finally, downstream performance on Dutch speech-to-text transcription reflects the language-specific advantage for Dutch linguistic feature encoding in model-internal representations: on average, Wav2Vec2-NL has a 27% lower word error rate than the multilingual model.
Word Error Rate results for models fine-tuned for Dutch ASR (speech-to-text transcription), across 4 models and 5 evaluation datasets.
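Not from the paper's released code, but as a quick illustration of the metric compared above: word error rate is commonly computed with the jiwer package. The transcript strings below are made-up examples.

```python
# A minimal sketch of WER computation, assuming transcripts are plain strings;
# jiwer is a common choice for this (not necessarily what the paper used).
import jiwer

reference = "de kat zat op de mat"      # gold transcript (hypothetical example)
hypothesis = "de kat zat op mat"        # model output (hypothetical example)

# WER = (substitutions + deletions + insertions) / number of reference words
wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2%}")                # here: 1 deletion / 6 words ≈ 16.67%
```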
mdhk.net
Furthermore, Wav2Vec2-NL shows a stronger advantage on dialogue (IFADV) than on audiobook (MLS) data.
➡️ Training on conversational speech is important not only for enhancing the representation of conversation-level structures, but also for the encoding of smaller linguistic units (phones & words).
mdhk.net
But there are also interesting differences between methods: for example, trained probes show stronger language-specific advantages for phonetic encoding than zero-shot metrics.

➡️ Language-specific phonetic information may only take up a relatively small subspace of model-internal representations.
mdhk.net
We find that language-specific advantages are well-detected by trained clustering or classification probes, and partially observable using zero-shot metrics. That is, the encoding of Dutch linguistic features is enhanced in the Dutch model compared to models trained on English and multilingual data.
Layerwise phonetic and lexical analyses, across a read speech (MLS, top row) and a dialogue (IFADV, bottom row) dataset of spoken Dutch. Measures marked * involve optimized linear transforms, whereas others are computed zero-shot; shading indicates 95% confidence intervals. The Dutch Wav2Vec2-NL model achieves highest scores across most analyses of Dutch phone and word encoding, though the size of this language-specific advantage varies considerably across analyses.
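For readers unfamiliar with the zero-shot side: RSA, one of the zero-shot metrics used here, correlates pairwise distances between model embeddings with distances derived from linguistic labels. A rough sketch on toy data (not the paper's exact pipeline; all names and shapes below are illustrative):

```python
# Zero-shot RSA sketch: no probe is trained, we only compare distance structure.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 768))   # 50 phone segments x 768-dim features (toy data)
labels = rng.integers(0, 10, size=50)     # phone category per segment (toy data)

model_rdm = pdist(embeddings, metric="cosine")        # model representational distances
label_rdm = pdist(labels[:, None], metric="hamming")  # 0 if same phone, 1 otherwise
rho, _ = spearmanr(model_rdm, label_rdm)              # higher rho = stronger phone encoding
print(f"RSA correlation: {rho:.3f}")
```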
mdhk.net
But they also used different analysis techniques.

We designed the SSL-NL dataset to test the encoding of Dutch phonetic and lexical features in SSL speech representations, while allowing for comparisons across different analysis methods.

We compare both trained probes (*) and zero-shot metrics:
The model comparison set includes Wav2Vec2-NL and 3 other existing Wav2Vec2-base models: Facebook's multilingual VoxPopuli model, Facebook's English base model, and another model trained on non-speech acoustics.

The set of analysis techniques includes probing classifiers (logistic regression), ABX similarities, PCA clustering, LDA clustering, and representational similarity analysis (RSA).

Word- and phone-level embeddings were created by mean-pooling model frame embeddings within words and phones respectively.

The SSL-NL evaluation dataset is a curated dataset of Dutch speech recordings and accompanying forced alignments, across two domains: audiobooks (MLS) and face-to-face conversations (IFADV).
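To make the pooling + probing recipe concrete, here is a minimal sketch (not the released code, which is linked above): mean-pool Wav2Vec2 frame embeddings within forced-aligned phone spans, then fit a logistic-regression probe. The checkpoint name, alignment values, and frame rate below are stand-ins.

```python
# Sketch of the pooling + probing recipe; all data here is dummy/illustrative.
import torch
import numpy as np
from transformers import Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")  # swap in Wav2Vec2-NL
waveform = torch.randn(1, 16000)                                 # 1 s of dummy 16 kHz audio

with torch.no_grad():
    frames = model(waveform).last_hidden_state[0]  # (n_frames, hidden_dim), ~20 ms per frame

# Forced alignments give (start_s, end_s, label) per phone; values here are made up.
alignments = [(0.00, 0.25, "d"), (0.25, 0.55, "a"), (0.55, 0.90, "x")]
frames_per_sec = 50                                              # Wav2Vec2 frame rate at 16 kHz

X, y = [], []
for start, end, label in alignments:
    span = frames[int(start * frames_per_sec):int(end * frames_per_sec)]
    X.append(span.mean(dim=0).numpy())             # mean-pool frames within the phone
    y.append(label)

probe = LogisticRegression(max_iter=1000).fit(np.stack(X), y)    # trained probe (*)
```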
mdhk.net
Wav2Vec2-NL is trained exclusively (from scratch) on 831 hours of Dutch speech recordings. So does this help the model to encode Dutch-specific phonetic and lexical information?

Previous studies analyzing language-specific representations in speech SSL models have reported mixed results.
mdhk.net
✨ Do self-supervised speech models learn to encode language-specific linguistic features from their training data, or only more language-general acoustic correlates?

At #Interspeech2025 we presented our new Wav2Vec2-NL model and SSL-NL evaluation dataset to test this!

📄 arxiv.org/abs/2506.00981

⬇️
Interspeech paper title: What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training

Authors: Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem Zuidema, Martijn Bentum
mdhk.net
We also share a working bibliography of recent publications reporting speech model interpretability analyses, which we've compiled while surveying the literature. It is incomplete and we would love your input! github.com/mdhk/awesome...
mdhk.net
Had such a great time presenting our tutorial on Interpretability Techniques for Speech Models at #Interspeech2025! 🔍

For anyone looking for an introduction to the topic, we've now uploaded all materials to the website: interpretingdl.github.io/speech-inter...
Reposted by Marianne de Heer Kloots
gretatuckute.bsky.social
Humans largely learn language through speech. In contrast, most LLMs learn from pre-tokenized text.

In our #Interspeech2025 paper, we introduce AuriStream: a simple, causal model that learns phoneme, word & semantic information from speech.

Poster P6, tomorrow (Aug 19) at 1:30 pm, Foyer 2.2!
Reposted by Marianne de Heer Kloots
annabavaresco.bsky.social
What a privilege to have #CCN2025 in (an exceptionally warm and sunny) Amsterdam this year!

It was my first time attending the conference, and being surrounded by so many talented researchers whose interests are similar to mine has been a deeply enriching experience ✨
mdhk.net
Huge congrats to @maithevannoort.bsky.social on her very popular poster! 🎉 She is now also on bluesky (and looking for a PhD position 👀)
mdhk.net
Last but not least, I personally can’t wait for the social event on Thursday night that we’ve been planning for the past year ✨
It features a *live brain-controlled music act* by the AIAR collective 🧠🎶 2025.ccneuro.org/social-event/ Get one of the last remaining tickets at the registration desk now!
mdhk.net
Raquel Fernández will present our joint project with @annabavaresco.bsky.social and Sandro Pezzelle: Modelling Multimodal Integration in Human Concept Processing with Vision-Language Models (poster B32)
🔗 2025.ccneuro.org/poster/?id=D...
mdhk.net
MSc student Maithe van Noort will present her project (co-supervised with @mheilbron.bsky.social) on Compositional Meaning in Vision Language Models and the Brain, testing the waters with new fMRI data of the human brain on Winoground! (poster B26)
🔗 2025.ccneuro.org/poster/?id=1...
mdhk.net
Also don’t miss my collaborators at the Wednesday poster session, featuring two vision-language-themed projects that I’ve been a part of:
mdhk.net
So exciting, #CCN2025 in Amsterdam started today! We have stroopwafels!!

Catch me at my poster on Friday to chat about the role of context in neural representational alignment to spoken language systems (C34) 🙌

🔗 2025.ccneuro.org/poster/?id=K...
mdhk.net
I’m in hall X5 at board 3! See you there 🙌
mdhk.net
Next week I’ll be in Vienna for my first *ACL conference! 🇦🇹✨

I will present our new BLiMP-NL dataset for evaluating language models on Dutch syntactic minimal pairs and human acceptability judgments ⬇️

🗓️ Tuesday, July 29th, 16:00-17:30, Hall X4 / X5 (Austria Center Vienna)
The BLiMP-NL dataset consists of 84 Dutch minimal pair paradigms covering 22 syntactic phenomena, and comes with graded human acceptability ratings & self-paced reading times. 

An example minimal pair:
A. Ik bekijk de foto van mezelf in de kamer (I watch the photograph of myself in the room; grammatical)
B. Wij bekijken de foto van mezelf in de kamer (We watch the photograph of myself in the room; ungrammatical)

Differences in human acceptability ratings between sentences correlate with differences in model syntactic log-odds ratio scores.
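For illustration, minimal-pair scoring of this kind can be sketched as follows. The paper's syntactic log-odds ratio applies additional normalization beyond the raw log-probability difference, and "gpt2" stands in for a Dutch language model here.

```python
# Sketch of minimal-pair scoring with a causal LM (not the paper's exact metric).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")       # a Dutch LM would be used in practice
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities under the language model."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logprobs = logits[:, :-1].log_softmax(-1)     # predict token t+1 from tokens up to t
    target = ids[:, 1:]
    return logprobs.gather(2, target.unsqueeze(-1)).sum().item()

good = sentence_logprob("Ik bekijk de foto van mezelf in de kamer")
bad = sentence_logprob("Wij bekijken de foto van mezelf in de kamer")
print(good > bad)  # a model sensitive to the contrast should prefer the grammatical one
```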
mdhk.net
ah yes this is the unfortunate divide.. not at cogsci this year sadly, hopefully we’ll meet another time!