Lightnews — Scholar-powered news

arXiv cs.SD Sound

@cssd-bot.bsky.social

29 followers 1 following 4.4K posts

Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/cs.SD/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 1d

Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Lingxin Xiong, Lei Xie, Pengcheng Zhu: MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows https://arxiv.org/abs/2510.08392 https://arxiv.org/pdf/2510.08392 https://arxiv.org/html/2510.08392

Reposted by arXiv cs.SD Sound

arXiv cs.CV Computer Vision and Pattern Recognition @cscv-bot.bsky.social · 1d

Harsh Kavediya, Vighnesh Nayak, Bheeshm Sharma, Balamurugan Palaniappan: IsoSignVid2Aud: Sign Language Video to Audio Conversion without Text Intermediaries https://arxiv.org/abs/2510.07837 https://arxiv.org/pdf/2510.07837 https://arxiv.org/html/2510.07837

Reposted by arXiv cs.SD Sound

arXiv cs.MM Multimedia @csmm-bot.bsky.social · 1d

Krish Patel, et al.: AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues https://arxiv.org/abs/2510.07355 https://arxiv.org/pdf/2510.07355 https://arxiv.org/html/2510.07355

Reposted by arXiv cs.SD Sound

arXiv cs.MM Multimedia @csmm-bot.bsky.social · 1d

Han Hu, Dongheng Lin, Qiming Huang, Yuqi Hou, Hyung Jin Chang, Jianbo Jiao: Audio-Visual Separation with Hierarchical Fusion and Representation Alignment https://arxiv.org/abs/2510.07326 https://arxiv.org/pdf/2510.07326 https://arxiv.org/html/2510.07326

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Eleonora Mancini, Joan Serrà, Paolo Torroni, Yuki Mitsufuji: Leveraging Whisper Embeddings for Audio-based Lyrics Matching https://arxiv.org/abs/2510.08176 https://arxiv.org/pdf/2510.08176 https://arxiv.org/html/2510.08176

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Liyang Chen, Hongkai Chen, Yujun Cai, Sifan Li, Qingwen Ye, Yiwei Wang: Detecting and Mitigating Insertion Hallucination in Video-to-Audio Generation https://arxiv.org/abs/2510.08078 https://arxiv.org/pdf/2510.08078 https://arxiv.org/html/2510.08078

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Fabio Morreale, Wiebke Hutiri, Joan Serrà, Alice Xiang, Yuki Mitsufuji: Attribution-by-design: Ensuring Inference-Time Provenance in Generative Music Systems https://arxiv.org/abs/2510.08062 https://arxiv.org/pdf/2510.08062 https://arxiv.org/html/2510.08062

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Honghong Wang, Jing Deng, Rong Zheng: Personality-Enhanced Multimodal Depression Detection in the Elderly https://arxiv.org/abs/2510.08004 https://arxiv.org/pdf/2510.08004 https://arxiv.org/html/2510.08004

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Wei Wang, Rong Cao, Yi Guo, Zhengyang Chen, Kuan Chen, Yuanyuan Huo: IntMeanFlow: Few-step Speech Generation with Integral Velocity Distillation https://arxiv.org/abs/2510.07979 https://arxiv.org/pdf/2510.07979 https://arxiv.org/html/2510.07979

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Ji Yu, Yang shuo, Xu Yuetonghui, Liu Mengmei, Ji Qiang, Han Zerui: ACMID: Automatic Curation of Musical Instrument Dataset for 7-Stem Music Source Separation https://arxiv.org/abs/2510.07840 https://arxiv.org/pdf/2510.07840 https://arxiv.org/html/2510.07840

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

Harshvardhan C. Takawale, Nirupam Roy, Phil Brown: INFER: Learning Implicit Neural Frequency Response Fields for Confined Car Cabin https://arxiv.org/abs/2510.07442 https://arxiv.org/pdf/2510.07442 https://arxiv.org/html/2510.07442

arXiv cs.SD Sound @cssd-bot.bsky.social · 1d

[2025-10-10 Fri (UTC), 7 new articles found for csSD Sound]

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 2d

Peter Plantinga, et al.: Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease https://arxiv.org/abs/2510.07299 https://arxiv.org/pdf/2510.07299 https://arxiv.org/html/2510.07299

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language @cscl-bot.bsky.social · 2d

Zhu Li, Yuqing Zhang, Xiyuan Gao, Shekhar Nayak, Matt Coler: Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis https://arxiv.org/abs/2510.07096 https://arxiv.org/pdf/2510.07096 https://arxiv.org/html/2510.07096

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language @cscl-bot.bsky.social · 2d

Vaibhav Srivastav, et al.: Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation https://arxiv.org/abs/2510.06961 https://arxiv.org/pdf/2510.06961 https://arxiv.org/html/2510.06961

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 2d

Yun-Ning (Amy) Hung, Igor Pereira, Filip Korzeniowski: Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation https://arxiv.org/abs/2510.06785 https://arxiv.org/pdf/2510.06785 https://arxiv.org/html/2510.06785

arXiv cs.SD Sound @cssd-bot.bsky.social · 2d

He, Wen, Wang, Wang, Liu, Huang, Lei, Gu, Jin, Yang, Li, Liu, Li, Wang, He, Zhang: AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs https://arxiv.org/abs/2510.07293 https://arxiv.org/pdf/2510.07293 https://arxiv.org/html/2510.07293

arXiv cs.SD Sound @cssd-bot.bsky.social · 2d

Phuong Tuan Dat, Tran Huy Dat: XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection https://arxiv.org/abs/2510.06706 https://arxiv.org/pdf/2510.06706 https://arxiv.org/html/2510.06706

arXiv cs.SD Sound @cssd-bot.bsky.social · 2d

Murat Yasar Baskin: Pitch Estimation With Mean Averaging Smoothed Product Spectrum And Musical Consonance Evaluation Using MASP https://arxiv.org/abs/2510.06625 https://arxiv.org/pdf/2510.06625 https://arxiv.org/html/2510.06625

arXiv cs.SD Sound @cssd-bot.bsky.social · 2d

Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin: Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race https://arxiv.org/abs/2510.06544 https://arxiv.org/pdf/2510.06544 https://arxiv.org/html/2510.06544

arXiv cs.SD Sound @cssd-bot.bsky.social · 2d

Mingyang Yao, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick: BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music https://arxiv.org/abs/2510.06528 https://arxiv.org/pdf/2510.06528 https://arxiv.org/html/2510.06528

arXiv cs.SD Sound @cssd-bot.bsky.social · 2d

[2025-10-09 Thu (UTC), 5 new articles found for csSD Sound]

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing @eessas-bot.bsky.social · 3d

Mingxuan Wang, Satoshi Nakamura: TokenChain: A Discrete Speech Chain via Semantic Token Modeling https://arxiv.org/abs/2510.06201 https://arxiv.org/pdf/2510.06201 https://arxiv.org/html/2510.06201

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language @cscl-bot.bsky.social · 3d

Rikuto Kotoge, Yuichi Sasaki: Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech https://arxiv.org/abs/2510.05799 https://arxiv.org/pdf/2510.05799 https://arxiv.org/html/2510.05799

arXiv cs.SD Sound @cssd-bot.bsky.social · 3d

Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss: Modulation Discovery with Differentiable Digital Signal Processing https://arxiv.org/abs/2510.06204 https://arxiv.org/pdf/2510.06204 https://arxiv.org/html/2510.06204