BUT Speech
@butspeech.bsky.social
We do impactful research and raise new leading scientific personalities in the field of speech processing.
butspeech.bsky.social
Thu July 24 11:00, we will have the last plenary talk of #JSALT2025 - Jordan Boyd-Graber Ying [University of Maryland] will present "Helpful AI Models: You can't always get what you want, but you might get what you need". You can also watch it on YT: youtube.com/playlist?lis...
butspeech.bsky.social
Tue July 22 11:00, we will have another plenary talk of #JSALT2025 - Xavier Serra [UPF Barcelona] will speak about Methodologies for Music Understanding and Generation in the Context of Trustworthy AI. You can also watch it on YT: youtube.com/playlist?lis...

jsalt2025.fit.vut.cz/plenary-lect...
butspeech.bsky.social
Today, July 18, at 11:00, Herve Bredin [pyannoteAI, France] will give the 5th plenary talk at the JSALT workshop: "Speaker diarization, a love loss story". See jsalt2025.fit.vut.cz/plenary-lect... for details.
butspeech.bsky.social
📢 Barbara Schuppler (TU Graz) gives the 2nd #JSALT plenary tomorrow, Tue July 1, 11:00 in Room E112: "Cross-layer models for conversational speech recognition in low-resourced scenarios". Join in person or on YouTube: www.youtube.com/playlist?lis... 🎤📺
jsalt2025.fit.vut.cz/plenary-lect...
butspeech.bsky.social
Honza is giving a lecture at Charles University in Prague (Faculty of Mathematics and Physics, MFF) today. If you want to attend, note that it takes place in the new buildings of MFF in Troja, not the historical one in Mala Strana.
www.mff.cuni.cz/en/research-...
butspeech.bsky.social
We are pleased to invite you to a talk by an excellent Czech scientist, Professor at #EPFL, Lenka Zdeborová. We have never seen a talk treating machine learning as a problem of statistical physics! Tuesday, May 20, 2025 at 13:00 in lecture room E112 and online www.youtube.com/live/FCvPhHm...
butspeech.bsky.social
Several speech students participated in the FIT Conference of innovations, technology and science - Excel@FIT. Congratulations to Sathvik, Dominik, and Ondrej for winning Excel prizes!
excel.fit.vutbr.cz/vysledky/
butspeech.bsky.social
Congratulations to Lin Zhang for her new post-doc position at CLSP at Johns Hopkins University! She is working on anti-spoofing and anonymization with Nicholas Andrews and Matthew Wiesner. She also collaborates closely with Sanjeev Khudanpur, Leibny Paola García-Perera, and Kevin Duh.
butspeech.bsky.social
Over the weekend we met to plan for the upcoming #JSALT25 workshop on the topic "Advancing Expert-Level Reasoning and Understanding in Large Audio Language Models". Two days of intense brainstorming 🧠 and planning powered by extra portions of coffee ☕️.
jsalt2025.fit.vut.cz/summer-works...
butspeech.bsky.social
Glad to announce another married man in the group - Pradyoth's wedding with Sameeksha took place in Mangalore on Thu 3rd April (before ICASSP) in the presence of 3.5k guests! All the best!
butspeech.bsky.social
And after ICASSP, Johan, Lukas, Alex, Martas and Santosh even made it to the local newspapers after their visit to the Ramappa Temple UNESCO heritage site!
butspeech.bsky.social
The networking activities around ICASSP continued after Meeami: on Monday 7 April, Honza took part in an official visit to IIIT Hyderabad, met its director and spent a nice time with Anil Kumar Vuppala and his colleagues and students at the Language Technologies Research Center (LTRC).
butspeech.bsky.social
A lot of interesting discussions happened during and after the presentations, and also during the amazing lunch at ITC Peshawar. We thank Meeami for hosting us.
butspeech.bsky.social
Santosh presented the team's work on aligning foundation models for (1) speech-to-text translation and (2) dialogue state tracking from speech.
butspeech.bsky.social
Alex presented the team's work on (1) Target speaker ASR with Whisper, (2) Robust ASR via internal language model regularisation, (3) Speech foundation models for European languages using open and legally accessible datasets.
butspeech.bsky.social
The Speech@FIT research group continues its industry collaboration at a global scale, with Santosh and Alex recently visiting Meeami Technologies in Hyderabad.
butspeech.bsky.social
Leveraging Self-Supervised Learning for Speaker Diarization, by Jiangyu Han et al. ieeexplore.ieee.org/stamp/stamp....
utilizes SSL models to alleviate the problem of data scarcity for neural speaker diarization.
Apr 9: 5:00 pm - 6:30 pm, Lecture, Room: MRG.04, Johan Rohdin
butspeech.bsky.social
Our papers to be presented at ICASSP in Hyderabad!

Target Speaker ASR with Whisper, ieeexplore.ieee.org/document/108...
Introduces a novel approach to training target-speaker ASR systems utilizing frame-level diarization outputs.
Apr 11: 2:00 pm - 3:30 pm, Poster 2E, presented by Alexander Polok
Reposted by BUT Speech
ufal.mff.cuni.cz
Yesterday, we hosted folks from @butspeech.bsky.social, Phonexia, Phrase, and MAMA AI at the first meeting of the Linguistics, AI, Speech, and Language Technologies project, which is funded by @msmtcr and the EU's Programme Johannes Amos Comenius.
butspeech.bsky.social
🔗 Competition details: www.nexdata.ai/competition/...
This work builds on DiCoW, our diarization-conditioned ASR model—learn more in our paper:
🔗 arxiv.org/abs/2501.00114
🖥️ Codebase available on GitHub:
🔗 github.com/BUTSpeechFIT...
[4/4]
butspeech.bsky.social
🔍 Why should you try it?
✅ Strong starting point for multilingual conversational ASR research
✅ Open for experimentation, adaptation, and fine-tuning
✅ Join us in pushing the boundaries of robust, multilingual speech recognition
🚀 Test and improve multilingual conversational ASR
[3/4]
butspeech.bsky.social
📊 Baseline WER (No Domain Adaptation Yet, Oracle diarization):
🇺🇸 English (American): 9.4%
🇮🇳 English (Indian): 15.1%
🇵🇭 English (Filipino): 11.3%
🇩🇪 German: 19.7%
🆕 Now supports transcription of multiple speakers speaking different languages! 🌍🗣️
[2/4]
butspeech.bsky.social
🗣️ Are you participating in the Interspeech 2025 Workshop on Multilingual Conversational Speech Language Models organised by Nexdata【旧Datatang株式会社公式】?

We’ve released our baseline model for the community—ready for you to explore and build upon!
🔗 Try it here: pccnect.fit.vutbr.cz/gradio-demo/
[1/4]
butspeech.bsky.social
Congratulations to Dr. Karel! The defense as well as the "one" in the evening were serious and successful.
Committee Beer bill