Marianne de Heer Kloots
@mdhk.net
1.2K followers 490 following 100 posts
Linguist in AI & CogSci 🧠👩‍💻🤖 PhD student @ ILLC, University of Amsterdam 🌐 https://mdhk.net/ 🐘 https://scholar.social/@mdhk 🐦 https://twitter.com/mariannedhk
Pinned
mdhk.net
✨ Do current neural speech models show human-like linguistic biases in speech perception?

We took inspiration from classic phonetic categorization experiments to explore where sensitivity to phonotactic context emerges in Wav2Vec2 models 🔍
(w/ @wzuidema.bsky.social)

📑 arxiv.org/abs/2407.03005

⬇️
Reposted by Marianne de Heer Kloots
susannebrouwer.bsky.social
PhD Position: Accented Speech Processing - Apply now!

Come work with Mirjam Broersma, @davidpeeters.bsky.social, and me at the Centre for Language Studies, Radboud University in the Netherlands.

Application deadline: 19 October 2025

For more information, see
www.ru.nl/en/working-a...
mdhk.net
Huge congrats to the envisionBOX team for the Open Science award nomination! 🎉

My tutorial on speech analysis tools in Python from the Unboxing Multimodality summer school (github.com/mdhk/unboxin...) is now also available at envisionbox.org

Thanks for the invitation & this great initiative! 👏
Reposted by Marianne de Heer Kloots
frap98.bsky.social
The 𝗜𝗟𝗖𝗕 𝗦𝘂𝗺𝗺𝗲𝗿 𝗦𝗰𝗵𝗼𝗼𝗹 in Marseille went beyond all my expectations! 💯

A week has already flown by since I had one of the most formative experiences of my PhD so far. 👩‍🎨
mdhk.net
Thanks to all co-authors in the Dutch SSL training team @hmohebbi.bsky.social @cpouw.bsky.social @gaofeishen.com @wzuidema.bsky.social + Martijn Bentum

And to @itcooperativesurf.bsky.social (EINF-8324) for granting me the resources that enabled this project 👩‍💻✨
mdhk.net
Check out the paper for more details:
📄 arxiv.org/abs/2506.00981

Or the model, dataset and code released alongside it:
🤗 huggingface.co/amsterdamNLP...
🗃️ zenodo.org/records/1554...
🔍 github.com/mdhk/SSL-NL-...

We hope these resources help further research on language-specificity in speech models!
mdhk.net
Finally, downstream performance on Dutch speech-to-text transcription reflects the language-specific advantage for Dutch linguistic feature encoding in model-internal representations: on average, Wav2Vec2-NL has a 27% lower word error rate than the multilingual model.
Word Error Rate results for models fine-tuned for Dutch ASR (speech-to-text transcription), across 4 models and 5 evaluation datasets.
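Not from the paper's released code, but as a quick illustration of the metric compared above: word error rate is commonly computed with the jiwer package. The transcript strings below are made-up examples.

```python
# A minimal sketch of WER computation, assuming transcripts are plain strings;
# jiwer is a common choice for this (not necessarily what the paper used).
import jiwer

reference = "de kat zat op de mat"      # gold transcript (hypothetical example)
hypothesis = "de kat zat op mat"        # model output (hypothetical example)

# WER = (substitutions + deletions + insertions) / number of reference words
wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2%}")                # here: 1 deletion / 6 words ≈ 16.67%
```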
mdhk.net
Furthermore, Wav2Vec2-NL shows a stronger advantage on dialogue (IFADV) than on audiobook (MLS) data.
➡️ Training on conversational speech is important not only for enhancing the representation of conversation-level structures, but also for the encoding of smaller linguistic units (phones & words).
mdhk.net
But there are also interesting differences between methods: for example, trained probes show stronger language-specific advantages for phonetic encoding than zero-shot metrics.

➡️ Language-specific phonetic information may only take up a relatively small subspace of model-internal representations.
mdhk.net
We find that language-specific advantages are well-detected by trained clustering or classification probes, and partially observable using zero-shot metrics. That is, the encoding of Dutch linguistic features is enhanced in the Dutch model compared to models trained on English and multilingual data.
Layerwise phonetic and lexical analyses, across a read speech (MLS, top row) and a dialogue (IFADV, bottom row) dataset of spoken Dutch. Measures marked * involve optimized linear transforms, whereas others are computed zero-shot; shading indicates 95% confidence intervals. The Dutch Wav2Vec2-NL model achieves highest scores across most analyses of Dutch phone and word encoding, though the size of this language-specific advantage varies considerably across analyses.
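For readers unfamiliar with the zero-shot side: RSA, one of the zero-shot metrics used here, correlates pairwise distances between model embeddings with distances derived from linguistic labels. A rough sketch on toy data (not the paper's exact pipeline; all names and shapes below are illustrative):

```python
# Zero-shot RSA sketch: no probe is trained, we only compare distance structure.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 768))   # 50 phone segments x 768-dim features (toy data)
labels = rng.integers(0, 10, size=50)     # phone category per segment (toy data)

model_rdm = pdist(embeddings, metric="cosine")        # model representational distances
label_rdm = pdist(labels[:, None], metric="hamming")  # 0 if same phone, 1 otherwise
rho, _ = spearmanr(model_rdm, label_rdm)              # higher rho = stronger phone encoding
print(f"RSA correlation: {rho:.3f}")
```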
mdhk.net
But they also used different analysis techniques.

We designed the SSL-NL dataset to test the encoding of Dutch phonetic and lexical features in SSL speech representations, while allowing for comparisons across different analysis methods.

We compare both trained probes (*) and zero-shot metrics:
The model comparison set includes Wav2Vec2-NL and 3 other existing Wav2Vec2-base models: Facebook's multilingual VoxPopuli model, Facebook's English base model, and another model trained on non-speech acoustics.

The set of analysis techniques includes probing classifiers (logistic regression), ABX similarities, PCA clustering, LDA clustering, and representational similarity analysis (RSA).

Word- and phone-level embeddings were created by mean-pooling model frame embeddings within words and phones respectively.

The SSL-NL evaluation dataset is a curated dataset of Dutch speech recordings and accompanying forced alignments, across two domains: audiobooks (MLS) and face-to-face conversations (IFADV).
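To make the pooling + probing recipe concrete, here is a minimal sketch (not the released code, which is linked above): mean-pool Wav2Vec2 frame embeddings within forced-aligned phone spans, then fit a logistic-regression probe. The checkpoint name, alignment values, and frame rate below are stand-ins.

```python
# Sketch of the pooling + probing recipe; all data here is dummy/illustrative.
import torch
import numpy as np
from transformers import Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")  # swap in Wav2Vec2-NL
waveform = torch.randn(1, 16000)                                 # 1 s of dummy 16 kHz audio

with torch.no_grad():
    frames = model(waveform).last_hidden_state[0]  # (n_frames, hidden_dim), ~20 ms per frame

# Forced alignments give (start_s, end_s, label) per phone; values here are made up.
alignments = [(0.00, 0.25, "d"), (0.25, 0.55, "a"), (0.55, 0.90, "x")]
frames_per_sec = 50                                              # Wav2Vec2 frame rate at 16 kHz

X, y = [], []
for start, end, label in alignments:
    span = frames[int(start * frames_per_sec):int(end * frames_per_sec)]
    X.append(span.mean(dim=0).numpy())             # mean-pool frames within the phone
    y.append(label)

probe = LogisticRegression(max_iter=1000).fit(np.stack(X), y)    # trained probe (*)
```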
mdhk.net
Wav2Vec2-NL is trained exclusively (from scratch) on 831 hours of Dutch speech recordings. So does this help the model to encode Dutch-specific phonetic and lexical information?

Previous studies analyzing language-specific representations in speech SSL models have reported mixed results.
mdhk.net
✨ Do self-supervised speech models learn to encode language-specific linguistic features from their training data, or only more language-general acoustic correlates?

At #Interspeech2025 we presented our new Wav2Vec2-NL model and SSL-NL evaluation dataset to test this!

📄 arxiv.org/abs/2506.00981

⬇️
Interspeech paper title: What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training

Authors: Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem Zuidema, Martijn Bentum
mdhk.net
We also share a working bibliography of recent publications reporting speech model interpretability analyses, which we've compiled while surveying the literature. It is incomplete and we would love your input! github.com/mdhk/awesome...
mdhk.net
Had such a great time presenting our tutorial on Interpretability Techniques for Speech Models at #Interspeech2025! 🔍

For anyone looking for an introduction to the topic, we've now uploaded all materials to the website: interpretingdl.github.io/speech-inter...
Reposted by Marianne de Heer Kloots
gretatuckute.bsky.social
Humans largely learn language through speech. In contrast, most LLMs learn from pre-tokenized text.

In our #Interspeech2025 paper, we introduce AuriStream: a simple, causal model that learns phoneme, word & semantic information from speech.

Poster P6, tomorrow (Aug 19) at 1:30 pm, Foyer 2.2!
Reposted by Marianne de Heer Kloots
annabavaresco.bsky.social
What a privilege to have #CCN2025 in (an exceptionally warm and sunny) Amsterdam this year!

It was my first time attending the conference, and being surrounded by so many talented researchers whose interests are similar to mine has been a deeply enriching experience ✨
mdhk.net
Huge congrats to @maithevannoort.bsky.social on her very popular poster! 🎉 She is now also on bluesky (and looking for a PhD position 👀)
mdhk.net
Last but not least, I personally can’t wait for the social event on Thursday night that we’ve been planning for the past year ✨
It features a *live brain-controlled music act* by the AIAR collective 🧠🎶 2025.ccneuro.org/social-event/ Get one of the last remaining tickets at the registration desk now!
mdhk.net
Raquel Fernández will present our joint project with @annabavaresco.bsky.social and Sandro Pezzelle: Modelling Multimodal Integration in Human Concept Processing with Vision-Language Models (poster B32)
🔗 2025.ccneuro.org/poster/?id=D...
mdhk.net
MSc student Maithe van Noort will present her project (co-supervised with @mheilbron.bsky.social) on Compositional Meaning in Vision Language Models and the Brain, testing the waters with new fMRI data of the human brain on Winoground! (poster B26)
🔗 2025.ccneuro.org/poster/?id=1...
mdhk.net
Also don’t miss my collaborators at the Wednesday poster session, featuring two vision-language-themed projects that I’ve been a part of:
mdhk.net
So exciting, #CCN2025 in Amsterdam started today! We have stroopwafels!!

Catch me at my poster on Friday to chat about the role of context in neural representational alignment to spoken language systems (C34) 🙌

🔗 2025.ccneuro.org/poster/?id=K...
mdhk.net
I’m in hall X5 at board 3! See you there 🙌
mdhk.net
Next week I’ll be in Vienna for my first *ACL conference! 🇦🇹✨

I will present our new BLiMP-NL dataset for evaluating language models on Dutch syntactic minimal pairs and human acceptability judgments ⬇️

🗓️ Tuesday, July 29th, 16:00-17:30, Hall X4 / X5 (Austria Center Vienna)
The BLiMP-NL dataset consists of 84 Dutch minimal pair paradigms covering 22 syntactic phenomena, and comes with graded human acceptability ratings & self-paced reading times. 

An example minimal pair:
A. Ik bekijk de foto van mezelf in de kamer (I watch the photograph of myself in the room; grammatical)
B. Wij bekijken de foto van mezelf in de kamer (We watch the photograph of myself in the room; ungrammatical)

Differences in human acceptability ratings between sentences correlate with differences in model syntactic log-odds ratio scores.
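For illustration, minimal-pair scoring of this kind can be sketched as follows. The paper's syntactic log-odds ratio applies additional normalization beyond the raw log-probability difference, and "gpt2" stands in for a Dutch language model here.

```python
# Sketch of minimal-pair scoring with a causal LM (not the paper's exact metric).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")       # a Dutch LM would be used in practice
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities under the language model."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logprobs = logits[:, :-1].log_softmax(-1)     # predict token t+1 from tokens up to t
    target = ids[:, 1:]
    return logprobs.gather(2, target.unsqueeze(-1)).sum().item()

good = sentence_logprob("Ik bekijk de foto van mezelf in de kamer")
bad = sentence_logprob("Wij bekijken de foto van mezelf in de kamer")
print(good > bad)  # a model sensitive to the contrast should prefer the grammatical one
```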
mdhk.net
ah yes this is the unfortunate divide.. not at cogsci this year sadly, hopefully we’ll meet another time!