Verena Blaschke
@verenablaschke.bsky.social
PhD student @mainlp.bsky.social (@cislmu.bsky.social, LMU Munich). Interested in language variation & change, currently working on NLP for dialects and low-resource languages. verenablaschke.github.io
Pinned
verenablaschke.bsky.social
At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.
Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken
verenablaschke.bsky.social
#Interspeech2025 had a science fair today with lots of interactive speech tech demos, not just for conference attendees but also/especially for curious laypeople! The demos were fun, and I like the idea of combining a conference w/ a bit of scicomm for the local public
interspeech.bsky.social
NL: Ben je nieuwsgierig naar taal, technologie en wetenschap?
Op 17/8 ben je van harte welkom op het Speech Science Festival in Ahoy Rotterdam!
----
EN: Are you curious about language, technology, and science?
Join us on Aug 17 at the Speech Science Festival in Ahoy Rotterdam!
speech science festival logo
verenablaschke.bsky.social
Automatic metrics like WER and human quality judgements are moderately correlated. Dialectal words are often rendered as nonsense. Dialectal syntactic structures are often retained in the output – whether this is acceptable in Std German is hit-or-miss.
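For readers unfamiliar with the metric, word error rate (WER) is commonly defined as the word-level edit distance between a reference transcription and the ASR output, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function name and the toy sentences are hypothetical, and this is not the evaluation code used for Betthupferl.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Toy example (made-up sentences): one substitution and one deletion
# against a four-word reference.
print(word_error_rate("the cat sat down", "the hat sat"))  # 0.5
```

Because WER counts every differing word equally, a dialectal word rendered as nonsense and a harmless spelling variant cost the same, which is one reason it can diverge from human quality judgements.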
verenablaschke.bsky.social
All ASR models we benchmark perform much better on Standard German than dialectal audio. Whether the transcriptions of the dialectal audios tend to be closer to the Std German references or to the dialectal references depends on the model decoder type.
verenablaschke.bsky.social
Betthupferl contains sentences from three dialect groups spoken in southeast Germany, as well as Std German sentences for comparison. The dialectal sentences have both dialectal and Std German gold transcriptions, showing differences between pronunciation, word choice and morphosyntax.
A sentence from the dataset with a Standard German and a dialectal transcription that differ on the word and phrase level.
verenablaschke.bsky.social
UPDATE: Our poster presentation got moved to Tuesday, 16:00–17:30 (session 10)! #ACL2025NLP
verenablaschke.bsky.social
At #ACL2025NLP I'll present our analysis of the effect of linguistic similarity on cross-lingual transfer! We looked at how 10 similarity measures correlate w/ transfer results btwn 263 languages across 3 NLP tasks. Different similarity measures matter for diff. experiments (no one-size-fits-all)!
Correlations between transfer results per experiment (parsing, POS tagging, topic classification with different input representations) and similarity measures. The results vary a lot across experiments and measures – some are described in the next posts.
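As a rough illustration of what correlating a similarity measure with transfer results can look like, here is a minimal sketch that computes a Spearman rank correlation in plain Python. The choice of Spearman correlation, the function names, and all numbers are assumptions made for illustration; the post does not state which correlation coefficient the paper uses.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: similarity of four source languages to one target
# language, and the transfer score obtained with each source language.
similarity = [0.10, 0.40, 0.80, 0.90]
transfer_score = [52.0, 61.5, 70.2, 74.8]
print(spearman(similarity, transfer_score))
```

A value near 1 would mean that more similar source languages tend to transfer better for that task; repeating this per similarity measure and per experiment gives a grid like the one described in the figure.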
verenablaschke.bsky.social
The poster presentation slot got moved to Tuesday, 16:00–17:30!
verenablaschke.bsky.social
In practice, selecting a transfer language based on just one relevant similarity measure or the transfer results on a similar NLP task w/ similar input representations works well -- although it's best to compare multiple promising transfer candidates.
verenablaschke.bsky.social
... Topic classification based on n-grams is sensitive to string overlap (+ correlated linguistic measures), but topic classification based on mBERT embeddings doesn't show any strong correlations – here, inclusion in the pre-training data is important instead.
verenablaschke.bsky.social
Fortunately, the patterns confirm our intuitions – e.g., syntactic similarity matters for parsing but not for topic classification. However, input representations matter too....
Reposted by Verena Blaschke
barbaraplank.bsky.social
My ACL 2024 keynote talk on "Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!" is online now:

underline.io/events/466/s...

2024.aclweb.org/program/keyn...

It was a huge honor for me to give last year's flagship-in-NLP-conference keynote in Bangkok 🇹🇭
verenablaschke.bsky.social
Your Bavarian doesn't stop at "Servus" and "Pfiade"? Then you're exactly who we're looking for!
We are looking for Bavarian speakers who would like to answer a short survey about AI-generated Bavarian for a master's thesis.
With every participation, we bring the Bavarian dialect a little further into the digital world!
verenablaschke.bsky.social
Bavarian dialect speakers needed! Our MSc student Miriam wants to find out 1. how good/bad LLM-generated "Bavarian" is, and 2. whether dialect speakers agree with each other on this. The survey takes <5 min: survey.ifkw.lmu.de/dialquali25/ Thank you for sharing/participating!
Reposted by Verena Blaschke
queerinai.com
The first archival *CL Queer in AI workshop will kick off in about 15 min! Join us in-person if you're at NAACL or virtually 💜

We will have presentations from our amazing contributors and invited speakers. Read on for more details 🧵
Reposted by Verena Blaschke
vinodkpg.bsky.social
Happening now at #NAACL2025 in room Pecos.

Kicking off with amazing talks and a panel by Monojit Choudhury, Isabelle Augenstein, and Katia Shutova
c3nlp.bsky.social
📣 Excited that our C3NLP 2025 Workshop program is finalized — just one week to go! 🎉

Full program: c3nlp.github.io

Co-organized with @vinodkpg.bsky.social @sunipadev.bsky.social @lucianabenotti.bsky.social @yongcao.bsky.social @danielhers.bsky.social Laura Cabello, Ife Adebara, and Li Zhou. ❤️
Reposted by Verena Blaschke
pywirrarika.bsky.social
Happening now at @americasnlp.bsky.social 2025. Telegram in Aymara and how to translate tech terminology. #NAACL2025
Reposted by Verena Blaschke
alanramponi.bsky.social
📣 Join us tomorrow May 3rd for the 10th Workshop on Noisy and User-generated Text #W-NUT at #NAACL2025 (📍 Room Navajo/Nambe)!

The workshop features 16 paper presentations and 2 exciting keynote talks by @verenablaschke.bsky.social and Su Lin Blodgett (titles+abstracts below)! #NLProc #NAACL

👇
verenablaschke.bsky.social
On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🗓️ Saturday, May 3, 9:30–10:30
verenablaschke.bsky.social
This article is about a success story, but it also mentions unsuccessful prior attempts and discusses the different perspectives/priorities that NLP researchers vs. field linguists might have: hdl.handle.net/10125/24793
Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit
Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons, such as that the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.
Reposted by Verena Blaschke
barbaraplank.bsky.social
Are you attending NAACL 2025 and are you interested in low-resource languages and dialects?

Then don't miss our very own @verenablaschke.bsky.social's keynote talk at the WNUT 2025 workshop on May 3rd:

Beyond “noisy” text: How (and why) to process dialect data

🌐 ☀️
noisy-text.github.io/2025/
verenablaschke.bsky.social
I just finished reading the preprint -- cool paper + very timely!