arXiv cs.SD Sound
@cssd-bot.bsky.social
29 followers 1 following 4.4K posts
Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/cs.SD/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g
Reposted by arXiv cs.SD Sound
eessas-bot.bsky.social
Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Lingxin Xiong, Lei Xie, Pengcheng Zhu: MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows https://arxiv.org/abs/2510.08392 https://arxiv.org/pdf/2510.08392 https://arxiv.org/html/2510.08392
Reposted by arXiv cs.SD Sound
cscv-bot.bsky.social
Harsh Kavediya, Vighnesh Nayak, Bheeshm Sharma, Balamurugan Palaniappan: IsoSignVid2Aud: Sign Language Video to Audio Conversion without Text Intermediaries https://arxiv.org/abs/2510.07837 https://arxiv.org/pdf/2510.07837 https://arxiv.org/html/2510.07837
Reposted by arXiv cs.SD Sound
csmm-bot.bsky.social
Krish Patel, et al.: AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMs with Audio-visual Cues https://arxiv.org/abs/2510.07355 https://arxiv.org/pdf/2510.07355 https://arxiv.org/html/2510.07355
Reposted by arXiv cs.SD Sound
csmm-bot.bsky.social
Han Hu, Dongheng Lin, Qiming Huang, Yuqi Hou, Hyung Jin Chang, Jianbo Jiao: Audio-Visual Separation with Hierarchical Fusion and Representation Alignment https://arxiv.org/abs/2510.07326 https://arxiv.org/pdf/2510.07326 https://arxiv.org/html/2510.07326
cssd-bot.bsky.social
Eleonora Mancini, Joan Serrà, Paolo Torroni, Yuki Mitsufuji: Leveraging Whisper Embeddings for Audio-based Lyrics Matching https://arxiv.org/abs/2510.08176 https://arxiv.org/pdf/2510.08176 https://arxiv.org/html/2510.08176
cssd-bot.bsky.social
Liyang Chen, Hongkai Chen, Yujun Cai, Sifan Li, Qingwen Ye, Yiwei Wang: Detecting and Mitigating Insertion Hallucination in Video-to-Audio Generation https://arxiv.org/abs/2510.08078 https://arxiv.org/pdf/2510.08078 https://arxiv.org/html/2510.08078
cssd-bot.bsky.social
Fabio Morreale, Wiebke Hutiri, Joan Serrà, Alice Xiang, Yuki Mitsufuji: Attribution-by-design: Ensuring Inference-Time Provenance in Generative Music Systems https://arxiv.org/abs/2510.08062 https://arxiv.org/pdf/2510.08062 https://arxiv.org/html/2510.08062
cssd-bot.bsky.social
Honghong Wang, Jing Deng, Rong Zheng: Personality-Enhanced Multimodal Depression Detection in the Elderly https://arxiv.org/abs/2510.08004 https://arxiv.org/pdf/2510.08004 https://arxiv.org/html/2510.08004
cssd-bot.bsky.social
Wei Wang, Rong Cao, Yi Guo, Zhengyang Chen, Kuan Chen, Yuanyuan Huo: IntMeanFlow: Few-step Speech Generation with Integral Velocity Distillation https://arxiv.org/abs/2510.07979 https://arxiv.org/pdf/2510.07979 https://arxiv.org/html/2510.07979
cssd-bot.bsky.social
Ji Yu, Yang shuo, Xu Yuetonghui, Liu Mengmei, Ji Qiang, Han Zerui: ACMID: Automatic Curation of Musical Instrument Dataset for 7-Stem Music Source Separation https://arxiv.org/abs/2510.07840 https://arxiv.org/pdf/2510.07840 https://arxiv.org/html/2510.07840
cssd-bot.bsky.social
Harshvardhan C. Takawale, Nirupam Roy, Phil Brown: INFER: Learning Implicit Neural Frequency Response Fields for Confined Car Cabin https://arxiv.org/abs/2510.07442 https://arxiv.org/pdf/2510.07442 https://arxiv.org/html/2510.07442
cssd-bot.bsky.social
[2025-10-10 Fri (UTC), 7 new articles found for csSD Sound]
Reposted by arXiv cs.SD Sound
eessas-bot.bsky.social
Peter Plantinga, et al.: Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease https://arxiv.org/abs/2510.07299 https://arxiv.org/pdf/2510.07299 https://arxiv.org/html/2510.07299
Reposted by arXiv cs.SD Sound
cscl-bot.bsky.social
Zhu Li, Yuqing Zhang, Xiyuan Gao, Shekhar Nayak, Matt Coler: Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis https://arxiv.org/abs/2510.07096 https://arxiv.org/pdf/2510.07096 https://arxiv.org/html/2510.07096
Reposted by arXiv cs.SD Sound
cscl-bot.bsky.social
Vaibhav Srivastav, et al.: Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation https://arxiv.org/abs/2510.06961 https://arxiv.org/pdf/2510.06961 https://arxiv.org/html/2510.06961
Reposted by arXiv cs.SD Sound
eessas-bot.bsky.social
Yun-Ning (Amy) Hung, Igor Pereira, Filip Korzeniowski: Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation https://arxiv.org/abs/2510.06785 https://arxiv.org/pdf/2510.06785 https://arxiv.org/html/2510.06785
cssd-bot.bsky.social
He, Wen, Wang, Wang, Liu, Huang, Lei, Gu, Jin, Yang, Li, Liu, Li, Wang, He, Zhang: AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs https://arxiv.org/abs/2510.07293 https://arxiv.org/pdf/2510.07293 https://arxiv.org/html/2510.07293
cssd-bot.bsky.social
Phuong Tuan Dat, Tran Huy Dat: XLSR-Kanformer: A KAN-Integrated model for Synthetic Speech Detection https://arxiv.org/abs/2510.06706 https://arxiv.org/pdf/2510.06706 https://arxiv.org/html/2510.06706
cssd-bot.bsky.social
Murat Yasar Baskin: Pitch Estimation With Mean Averaging Smoothed Product Spectrum And Musical Consonance Evaluation Using MASP https://arxiv.org/abs/2510.06625 https://arxiv.org/pdf/2510.06625 https://arxiv.org/html/2510.06625
cssd-bot.bsky.social
Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin: Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race https://arxiv.org/abs/2510.06544 https://arxiv.org/pdf/2510.06544 https://arxiv.org/html/2510.06544
cssd-bot.bsky.social
Mingyang Yao, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick: BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music https://arxiv.org/abs/2510.06528 https://arxiv.org/pdf/2510.06528 https://arxiv.org/html/2510.06528
cssd-bot.bsky.social
[2025-10-09 Thu (UTC), 5 new articles found for csSD Sound]
Reposted by arXiv cs.SD Sound
cscl-bot.bsky.social
Rikuto Kotoge, Yuichi Sasaki: Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech https://arxiv.org/abs/2510.05799 https://arxiv.org/pdf/2510.05799 https://arxiv.org/html/2510.05799
cssd-bot.bsky.social
Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss: Modulation Discovery with Differentiable Digital Signal Processing https://arxiv.org/abs/2510.06204 https://arxiv.org/pdf/2510.06204 https://arxiv.org/html/2510.06204