arXiv eess.AS Audio and Speech Processing
@eessas-bot.bsky.social
Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/eess.AS/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss: Modulation Discovery with Differentiable Digital Signal Processing https://arxiv.org/abs/2510.06204 https://arxiv.org/pdf/2510.06204 https://arxiv.org/html/2510.06204
Reposted by arXiv eess.AS Audio and Speech Processing
cscl-bot.bsky.social
Yen-Ju Lu, Yashesh Gaur, Wei Zhou, Benjamin Muller, Jesus Villalba, Najim Dehak, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Srinivasan Iyer, Duc Le: Latent Speech-Text Transformer https://arxiv.org/abs/2510.06195 https://arxiv.org/pdf/2510.06195 https://arxiv.org/html/2510.06195
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Tao Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng: ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning https://arxiv.org/abs/2510.05984 https://arxiv.org/pdf/2510.05984 https://arxiv.org/html/2510.05984
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Ping-Yi Chen, Chih-Pin Tan, Yi-Hsuan Yang: Segment-Factorized Full-Song Generation on Symbolic Piano Music https://arxiv.org/abs/2510.05881 https://arxiv.org/pdf/2510.05881 https://arxiv.org/html/2510.05881
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Gramaccioni, Marinoni, Grassucci, Cicchetti, Uncini, Comminiello: FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders https://arxiv.org/abs/2510.05829 https://arxiv.org/pdf/2510.05829 https://arxiv.org/html/2510.05829
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Christian Marinoni, Riccardo Fosco Gramaccioni, Kazuki Shimada, Takashi Shibuya, Yuki Mitsufuji, Danilo Comminiello: StereoSync: Spatially-Aware Stereo Audio Generation from Video https://arxiv.org/abs/2510.05828 https://arxiv.org/pdf/2510.05828 https://arxiv.org/html/2510.05828
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Aleksandr Lukoianov, Anssi Klapuri: Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music https://arxiv.org/abs/2510.05756 https://arxiv.org/pdf/2510.05756 https://arxiv.org/html/2510.05756
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Xilin Jiang, Hannes Gamper, Sebastian Braun: Sci-Phi: A Large Language Model Spatial Audio Descriptor https://arxiv.org/abs/2510.05542 https://arxiv.org/pdf/2510.05542 https://arxiv.org/html/2510.05542
Reposted by arXiv eess.AS Audio and Speech Processing
cscl-bot.bsky.social
Si-Ioi Ng, Pranav S. Ambadi, Kimberly D. Mueller, Julie Liss, Visar Berisha: Advancing Automated Spatio-Semantic Analysis in Picture Description Using Language Models https://arxiv.org/abs/2510.05128 https://arxiv.org/pdf/2510.05128 https://arxiv.org/html/2510.05128
eessas-bot.bsky.social
Huang-Cheng Chou, Chi-Chun Lee: Revisiting Modeling and Evaluation Approaches in Speech Emotion Recognition: Considering Subjectivity of Annotators and Ambiguity of Emotions https://arxiv.org/abs/2510.05934 https://arxiv.org/pdf/2510.05934 https://arxiv.org/html/2510.05934
eessas-bot.bsky.social
Vitor Magno de O. S. Bezerra, Gabriel F. A. Bastos, Jugurta Montalvão: Revisiting MFCCs: Evidence for Spectral-Prosodic Coupling https://arxiv.org/abs/2510.05922 https://arxiv.org/pdf/2510.05922 https://arxiv.org/html/2510.05922
eessas-bot.bsky.social
Jingqi Sun, Shulin He, Ruizhe Pang, Zhong-Qiu Wang: Neural Forward Filtering for Speaker-Image Separation https://arxiv.org/abs/2510.05757 https://arxiv.org/pdf/2510.05757 https://arxiv.org/html/2510.05757
eessas-bot.bsky.social
Rui Wang, Liping Chen, Kong Aik Lee, Zhengpeng Zha, Zhenhua Ling: Investigation of perception inconsistency in speaker embedding for asynchronous voice anonymization https://arxiv.org/abs/2510.05718 https://arxiv.org/pdf/2510.05718 https://arxiv.org/html/2510.05718
eessas-bot.bsky.social
Akshay Anand, Chenxu Guo, Cheol Jun Cho, Jiachen Lian, Gopala Anumanchipalli: Teaching Machines to Speak Using Articulatory Control https://arxiv.org/abs/2510.05619 https://arxiv.org/pdf/2510.05619 https://arxiv.org/html/2510.05619
eessas-bot.bsky.social
Haoyu Zhang, Jiaxian Guo, Yusuke Iwasawa, Yutaka Matsuo: AQA-TTRL: Self-Adaptation in Audio Question Answering with Test-Time Reinforcement Learning https://arxiv.org/abs/2510.05478 https://arxiv.org/pdf/2510.05478 https://arxiv.org/html/2510.05478
eessas-bot.bsky.social
Xi Xuan, Xuechen Liu, Wenxin Zhang, Yi-Cheng Lin, Xiaojian Lin, Tomi Kinnunen: WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection https://arxiv.org/abs/2510.05305 https://arxiv.org/pdf/2510.05305 https://arxiv.org/html/2510.05305
eessas-bot.bsky.social
[2025-10-08 Wed (UTC), 8 new articles found for eessAS Audio and Speech Processing]
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Baher Mohammad, Magauiya Zhussip, Stamatios Lefkimmiatis: Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba https://arxiv.org/abs/2510.04738 https://arxiv.org/pdf/2510.04738 https://arxiv.org/html/2510.04738
Reposted by arXiv eess.AS Audio and Speech Processing
cscl-bot.bsky.social
Fernando López, Santosh Kesiraju, Jordi Luque: Robustness assessment of large audio language models in multiple-choice evaluation https://arxiv.org/abs/2510.04584 https://arxiv.org/pdf/2510.04584 https://arxiv.org/html/2510.04584
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Juncheng Wang, Chao Xu, Cheng Yu, Zhe Hu, Haoyu Xie, Guoqi Yu, Lei Shang, Shujun Wang: Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers https://arxiv.org/abs/2510.04577 https://arxiv.org/pdf/2510.04577 https://arxiv.org/html/2510.04577
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, Shinji Watanabe: Evaluating Self-Supervised Speech Models via Text-Based LLMS https://arxiv.org/abs/2510.04463 https://arxiv.org/pdf/2510.04463 https://arxiv.org/html/2510.04463
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Christian Limberg, Fares Schulz, Zhe Zhang, Stefan Weinzierl: Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space https://arxiv.org/abs/2510.04339 https://arxiv.org/pdf/2510.04339 https://arxiv.org/html/2510.04339
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Zhao Ren, Rathi Adarshi Rammohan, Kevin Scheck, Tanja Schultz: Machine Unlearning in Speech Emotion Recognition via Forget Set Alone https://arxiv.org/abs/2510.04251 https://arxiv.org/pdf/2510.04251 https://arxiv.org/html/2510.04251
Reposted by arXiv eess.AS Audio and Speech Processing
cssd-bot.bsky.social
Efrayim Yanir, David Burshtein, Sharon Gannot: GDiffuSE: Diffusion-based speech enhancement with noise model guidance https://arxiv.org/abs/2510.04157 https://arxiv.org/pdf/2510.04157 https://arxiv.org/html/2510.04157