Lightnews — Scholar-powered news

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss: Modulation Discovery with Differentiable Digital Signal Processing https://arxiv.org/abs/2510.06204 https://arxiv.org/pdf/2510.06204 https://arxiv.org/html/2510.06204

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.CL Computation and Language @cscl-bot.bsky.social · 21h

Yen-Ju Lu, Yashesh Gaur, Wei Zhou, Benjamin Muller, Jesus Villalba, Najim Dehak, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Srinivasan Iyer, Duc Le: Latent Speech-Text Transformer https://arxiv.org/abs/2510.06195 https://arxiv.org/pdf/2510.06195 https://arxiv.org/html/2510.06195

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Tao Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng: ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning https://arxiv.org/abs/2510.05984 https://arxiv.org/pdf/2510.05984 https://arxiv.org/html/2510.05984

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Ping-Yi Chen, Chih-Pin Tan, Yi-Hsuan Yang: Segment-Factorized Full-Song Generation on Symbolic Piano Music https://arxiv.org/abs/2510.05881 https://arxiv.org/pdf/2510.05881 https://arxiv.org/html/2510.05881

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Gramaccioni, Marinoni, Grassucci, Cicchetti, Uncini, Comminiello: FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders https://arxiv.org/abs/2510.05829 https://arxiv.org/pdf/2510.05829 https://arxiv.org/html/2510.05829

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Christian Marinoni, Riccardo Fosco Gramaccioni, Kazuki Shimada, Takashi Shibuya, Yuki Mitsufuji, Danilo Comminiello: StereoSync: Spatially-Aware Stereo Audio Generation from Video https://arxiv.org/abs/2510.05828 https://arxiv.org/pdf/2510.05828 https://arxiv.org/html/2510.05828

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Aleksandr Lukoianov, Anssi Klapuri: Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music https://arxiv.org/abs/2510.05756 https://arxiv.org/pdf/2510.05756 https://arxiv.org/html/2510.05756

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 21h

Xilin Jiang, Hannes Gamper, Sebastian Braun: Sci-Phi: A Large Language Model Spatial Audio Descriptor https://arxiv.org/abs/2510.05542 https://arxiv.org/pdf/2510.05542 https://arxiv.org/html/2510.05542

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.CL Computation and Language @cscl-bot.bsky.social · 21h

Si-Ioi Ng, Pranav S. Ambadi, Kimberly D. Mueller, Julie Liss, Visar Berisha: Advancing Automated Spatio-Semantic Analysis in Picture Description Using Language Models https://arxiv.org/abs/2510.05128 https://arxiv.org/pdf/2510.05128 https://arxiv.org/html/2510.05128

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Mingxuan Wang, Satoshi Nakamura: TokenChain: A Discrete Speech Chain via Semantic Token Modeling https://arxiv.org/abs/2510.06201 https://arxiv.org/pdf/2510.06201 https://arxiv.org/html/2510.06201

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Huang-Cheng Chou, Chi-Chun Lee: Revisiting Modeling and Evaluation Approaches in Speech Emotion Recognition: Considering Subjectivity of Annotators and Ambiguity of Emotions https://arxiv.org/abs/2510.05934 https://arxiv.org/pdf/2510.05934 https://arxiv.org/html/2510.05934

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Vitor Magno de O. S. Bezerra, Gabriel F. A. Bastos, Jugurta Montalvão: Revisiting MFCCs: Evidence for Spectral-Prosodic Coupling https://arxiv.org/abs/2510.05922 https://arxiv.org/pdf/2510.05922 https://arxiv.org/html/2510.05922

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Jingqi Sun, Shulin He, Ruizhe Pang, Zhong-Qiu Wang: Neural Forward Filtering for Speaker-Image Separation https://arxiv.org/abs/2510.05757 https://arxiv.org/pdf/2510.05757 https://arxiv.org/html/2510.05757

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Rui Wang, Liping Chen, Kong Aik Lee, Zhengpeng Zha, Zhenhua Ling: Investigation of perception inconsistency in speaker embedding for asynchronous voice anonymization https://arxiv.org/abs/2510.05718 https://arxiv.org/pdf/2510.05718 https://arxiv.org/html/2510.05718

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Akshay Anand, Chenxu Guo, Cheol Jun Cho, Jiachen Lian, Gopala Anumanchipalli: Teaching Machines to Speak Using Articulatory Control https://arxiv.org/abs/2510.05619 https://arxiv.org/pdf/2510.05619 https://arxiv.org/html/2510.05619

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Haoyu Zhang, Jiaxian Guo, Yusuke Iwasawa, Yutaka Matsuo: AQA-TTRL: Self-Adaptation in Audio Question Answering with Test-Time Reinforcement Learning https://arxiv.org/abs/2510.05478 https://arxiv.org/pdf/2510.05478 https://arxiv.org/html/2510.05478

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

Xi Xuan, Xuechen Liu, Wenxin Zhang, Yi-Cheng Lin, Xiaojian Lin, Tomi Kinnunen: WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection https://arxiv.org/abs/2510.05305 https://arxiv.org/pdf/2510.05305 https://arxiv.org/html/2510.05305

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 21h

[2025-10-08 Wed (UTC), 8 new articles found for eess.AS Audio and Speech Processing]

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Baher Mohammad, Magauiya Zhussip, Stamatios Lefkimmiatis: Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba https://arxiv.org/abs/2510.04738 https://arxiv.org/pdf/2510.04738 https://arxiv.org/html/2510.04738

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.CL Computation and Language @cscl-bot.bsky.social · 1d

Fernando López, Santosh Kesiraju, Jordi Luque: Robustness assessment of large audio language models in multiple-choice evaluation https://arxiv.org/abs/2510.04584 https://arxiv.org/pdf/2510.04584 https://arxiv.org/html/2510.04584

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Juncheng Wang, Chao Xu, Cheng Yu, Zhe Hu, Haoyu Xie, Guoqi Yu, Lei Shang, Shujun Wang: Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers https://arxiv.org/abs/2510.04577 https://arxiv.org/pdf/2510.04577 https://arxiv.org/html/2510.04577

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, Shinji Watanabe: Evaluating Self-Supervised Speech Models via Text-Based LLMS https://arxiv.org/abs/2510.04463 https://arxiv.org/pdf/2510.04463 https://arxiv.org/html/2510.04463

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Christian Limberg, Fares Schulz, Zhe Zhang, Stefan Weinzierl: Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space https://arxiv.org/abs/2510.04339 https://arxiv.org/pdf/2510.04339 https://arxiv.org/html/2510.04339

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Zhao Ren, Rathi Adarshi Rammohan, Kevin Scheck, Tanja Schultz: Machine Unlearning in Speech Emotion Recognition via Forget Set Alone https://arxiv.org/abs/2510.04251 https://arxiv.org/pdf/2510.04251 https://arxiv.org/html/2510.04251

Reposted by arXiv eess.AS Audio and Speech Processing

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Efrayim Yanir, David Burshtein, Sharon Gannot: GDiffuSE: Diffusion-based speech enhancement with noise model guidance https://arxiv.org/abs/2510.04157 https://arxiv.org/pdf/2510.04157 https://arxiv.org/html/2510.04157
