arXiv cs.MM Multimedia
@csmm-bot.bsky.social
3 followers 1 following 1.1K posts
Unofficial bot by @vele.bsky.social w/ http://github.com/so-okada/bXiv https://arxiv.org/list/cs.MM/new List https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j ModList https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g
Reposted by arXiv cs.MM Multimedia
cssd-bot.bsky.social
Ping-Yi Chen, Chih-Pin Tan, Yi-Hsuan Yang: Segment-Factorized Full-Song Generation on Symbolic Piano Music https://arxiv.org/abs/2510.05881 https://arxiv.org/pdf/2510.05881 https://arxiv.org/html/2510.05881
Reposted by arXiv cs.MM Multimedia
cssd-bot.bsky.social
Gramaccioni, Marinoni, Grassucci, Cicchetti, Uncini, Comminiello: FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders https://arxiv.org/abs/2510.05829 https://arxiv.org/pdf/2510.05829 https://arxiv.org/html/2510.05829
Reposted by arXiv cs.MM Multimedia
cssd-bot.bsky.social
Christian Marinoni, Riccardo Fosco Gramaccioni, Kazuki Shimada, Takashi Shibuya, Yuki Mitsufuji, Danilo Comminiello: StereoSync: Spatially-Aware Stereo Audio Generation from Video https://arxiv.org/abs/2510.05828 https://arxiv.org/pdf/2510.05828 https://arxiv.org/html/2510.05828
Reposted by arXiv cs.MM Multimedia
cscv-bot.bsky.social
Daniel Gonzálbez-Biosca, Josep Cabacas-Maso, Carles Ventura, Ismael Benito-Altamirano: When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach https://arxiv.org/abs/2510.05661 https://arxiv.org/pdf/2510.05661 https://arxiv.org/html/2510.05661
Reposted by arXiv cs.MM Multimedia
cssd-bot.bsky.social
M. Sajid, et al.: AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement https://arxiv.org/abs/2510.05295 https://arxiv.org/pdf/2510.05295 https://arxiv.org/html/2510.05295
csmm-bot.bsky.social
Christian Marinoni, Riccardo Fosco Gramaccioni, Eleonora Grassucci, Danilo Comminiello: Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information https://arxiv.org/abs/2510.06060 https://arxiv.org/pdf/2510.06060 https://arxiv.org/html/2510.06060
csmm-bot.bsky.social
Hengyang Zhou, Yiwei Wei, Jian Yang, Zhenyu Zhang: Towards Robust and Realible Multimodal Fake News Detection with Incomplete Modality https://arxiv.org/abs/2510.05839 https://arxiv.org/pdf/2510.05839 https://arxiv.org/html/2510.05839
csmm-bot.bsky.social
[2025-10-08 Wed (UTC), 2 new articles found for csMM Multimedia]
Reposted by arXiv cs.MM Multimedia
cscv-bot.bsky.social
Sarkhoosh, Øye, Sørlie, Vu, Johansen, Midoglu, Kupka, Halvorsen: ExposureEngine: Oriented Logo Detection and Sponsor Visibility Analytics in Sports Broadcasts https://arxiv.org/abs/2510.04739 https://arxiv.org/pdf/2510.04739 https://arxiv.org/html/2510.04739
Reposted by arXiv cs.MM Multimedia
cscv-bot.bsky.social
Luo Cheng, Song Siyang, Yan Siyuan, Yu Zhen, Ge Zongyuan: ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model https://arxiv.org/abs/2510.04712 https://arxiv.org/pdf/2510.04712 https://arxiv.org/html/2510.04712
Reposted by arXiv cs.MM Multimedia
cscv-bot.bsky.social
Vrushank Ahire, Aniruddh Muley, Shivam Zample, Siddharth Verma, Pranav Menon, Surbhi Madan, Abhinav Dhall: SFANet: Spatial-Frequency Attention Network for Deepfake Detection https://arxiv.org/abs/2510.04630 https://arxiv.org/pdf/2510.04630 https://arxiv.org/html/2510.04630
Reposted by arXiv cs.MM Multimedia
cssd-bot.bsky.social
Juncheng Wang, Chao Xu, Cheng Yu, Zhe Hu, Haoyu Xie, Guoqi Yu, Lei Shang, Shujun Wang: Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers https://arxiv.org/abs/2510.04577 https://arxiv.org/pdf/2510.04577 https://arxiv.org/html/2510.04577
Reposted by arXiv cs.MM Multimedia
cscv-bot.bsky.social
Yuyan Bu, Qiang Sheng, Juan Cao, Shaofei Wang, Peng Qi, Yuhui Shi, Beizhe Hu: Enhancing Fake News Video Detection via LLM-Driven Creative Process Simulation https://arxiv.org/abs/2510.04024 https://arxiv.org/pdf/2510.04024 https://arxiv.org/html/2510.04024
Reposted by arXiv cs.MM Multimedia
csir-bot.bsky.social
Yu-Fei Shih, An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen: Visual Lifelog Retrieval through Captioning-Enhanced Interpretation https://arxiv.org/abs/2510.04010 https://arxiv.org/pdf/2510.04010 https://arxiv.org/html/2510.04010
Reposted by arXiv cs.MM Multimedia
eessiv-bot.bsky.social
Shuoyan Wei, Feng Li, Shengeng Tang, Runmin Cong, Yao Zhao, Meng Wang, Huihui Bai: Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events https://arxiv.org/abs/2510.03833 https://arxiv.org/pdf/2510.03833 https://arxiv.org/html/2510.03833
csmm-bot.bsky.social
Bastian Jäckl, Jiří Kruchina, Lucas Joos, Daniel A. Keim, Ladislav Peška, Jakub Lokoč: Evaluating Keyframe Layouts for Visual Known-Item Search in Homogeneous Collections https://arxiv.org/abs/2510.04396 https://arxiv.org/pdf/2510.04396 https://arxiv.org/html/2510.04396
csmm-bot.bsky.social
Dong Shu, Yanguang Liu, Huopu Zhang, Mengnan Du: FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction https://arxiv.org/abs/2510.03965 https://arxiv.org/pdf/2510.03965 https://arxiv.org/html/2510.03965
csmm-bot.bsky.social
[2025-10-07 Tue (UTC), 2 new articles found for csMM Multimedia]
Reposted by arXiv cs.MM Multimedia
csmm-bot.bsky.social
Géré Léo (Cnam, CEDRIC - VERTIGO), Nicolas Audebert (LaSTIG, IGN, CEDRIC - VERTIGO), Florent Jacquemard (CEDRIC - VERTIGO): Detecting Notational Errors in Digital Music Scores https://arxiv.org/abs/2510.02746 https://arxiv.org/pdf/2510.02746 https://arxiv.org/html/2510.02746
csmm-bot.bsky.social
[2025-10-06 Mon (UTC), 1 new article found for csMM Multimedia]
Reposted by arXiv cs.MM Multimedia
csir-bot.bsky.social
Seungheon Doh, Keunwoo Choi, Juhan Nam: TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling https://arxiv.org/abs/2510.01698 https://arxiv.org/pdf/2510.01698 https://arxiv.org/html/2510.01698
Reposted by arXiv cs.MM Multimedia
eessiv-bot.bsky.social
Conall Daly, Darren Ramsook, Anil Kokaram: An Efficient Quality Metric for Video Frame Interpolation Based on Motion-Field Divergence https://arxiv.org/abs/2510.01361 https://arxiv.org/pdf/2510.01361 https://arxiv.org/html/2510.01361
csmm-bot.bsky.social
Donghuo Zeng: Comparing Contrastive and Triplet Loss in Audio-Visual Embedding: Intra-Class Variance and Greediness Analysis https://arxiv.org/abs/2510.02161 https://arxiv.org/pdf/2510.02161 https://arxiv.org/html/2510.02161