Lightnews — Scholar-powered news

Reposted by MT Group at FBK

Lina Conti @linaconti.bsky.social · 7d

🎉 Excited to share that my paper "The Unheard Alternative" was accepted to @blackboxnlp.bsky.social 2025!
We introduce contrastive explanations for speech-to-text, identifying which audio features ST models use to assign a grammatical gender to the speaker.
📄 Preprint: arxiv.org/abs/2509.265...

The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

Contrastive explanations, which indicate why an AI system produced one output (the target) instead of another (the foil), are widely regarded in explainable AI as more informative and interpretable th...

MT Group at FBK @fbk-mt.bsky.social · 14d

Our very own @sarapapi.bsky.social presenting FAMA at #clicit2025:

📗Paper: clic2025.unica.it/wp-content/u...
🔗 Models: hf.co/collections/...
📊 Data: hf.co/datasets/FBK...
💻 Code: github.com/hlt-mt/FBK-f...

Joint work with @speechtekfbk.bsky.social

Reposted by MT Group at FBK

sarapapi.bsky.social @sarapapi.bsky.social · 15d

🚀 Excited to present FAMA, the first large-scale #OpenScience #Speech foundation model for 🇮🇹 Italian & 🇬🇧 English, at #clicit2025 (17:30–18:45 oral session)!

🔗 Models: hf.co/collections/...
📊 Data: hf.co/datasets/FBK...
💻 Code: github.com/hlt-mt/FBK-f...
📄 Preprint: arxiv.org/pdf/2505.22759

Reposted by MT Group at FBK

DH Group at FBK @dh-fbk.bsky.social · 15d

We are on our way to Casteddu for #clicit2025 with a guest from @fbk-mt.bsky.social @ailc-nlp.bsky.social

MT Group at FBK @fbk-mt.bsky.social · 20d

Our pick of the week by @sarapapi.bsky.social: "Retrieval-Augmented Generation for AI-Generated Content: A Survey" by Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui.

arxiv.org/pdf/2402.19473

#RAG #survey

MT Group at FBK @fbk-mt.bsky.social · 26d

Our pick of the week by Marco Gaido: "Context-Driven Dynamic #Pruning for Large #Speech #Foundation Models" by Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, @shinjiw.bsky.social, et al. #INTERSPEECH2025.

arxiv.org/abs/2505.18860

Context-Driven Dynamic Pruning for Large Speech Foundation Models

Speech foundation models achieve strong generalization across languages and acoustic conditions, but require significant computational resources for inference. In the context of speech foundation mode...

MT Group at FBK @fbk-mt.bsky.social · Sep 3

Our pick of the week by @zhihangxie.bsky.social: "SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation" by Chenyang Le, Bing Han, Jinshun Li, Songyong Chen, and Yanmin Qian (2025)

#Speech #Simultaneous #Translation #MOE #SpeechTech

Zhihang Xie @zhihangxie.bsky.social · Sep 3

🚀 SimulMEGA: MoE Routers as advanced policy makers for Simultaneous Speech Translation 🎧🌍
Mixture-of-Experts routing → smarter decisions on when & how to translate, balancing latency vs quality in real-time speech. Paper link at arxiv.org/pdf/2509.012...

MT Group at FBK @fbk-mt.bsky.social · Aug 28

Our pick of the week by @beomseok-lee.bsky.social: "Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs" by Dingdong Wang, Junan Li, Mingyu Cui, Dongchao Yang, Xueyuan Chen, and Helen Meng (EMNLP 2025)

Beomseok Lee @beomseok-lee.bsky.social · Aug 28

🤔 Ever wondered how discrete tokens vs. continuous features behave in SpeechLLMs?
This new work dives into 6 SLU tasks and reveals some interesting takeaways!
arxiv.org/abs/2508.17863

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs

With the rise of Speech Large Language Models (SpeechLLMs), two dominant approaches have emerged for speech processing: discrete tokens and continuous features. Each approach has demonstrated strong c...

MT Group at FBK @fbk-mt.bsky.social · Aug 21

Our pick of the week by @linaconti.bsky.social: "I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2" by @jackmerullo.bsky.social, Arjun Khurana, and Oliver McLaughlin (ICML 2025 Workshop on Assessing World Models)

arxiv.org/abs/2508.02527

#XAI #LLM