Lightnews — Scholar-powered news

Stefan Lattner

@stefanlattner.bsky.social

Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, A. Gagneré, G. Hadjeres, S. Lattner, G. Peeters
Tuesday, April 8 (pm): Music analysis I

April 8, 2025 at 6:40 AM

Stefan Lattner

@stefanlattner.bsky.social

Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Tuesday, April 8 (pm): Music analysis I

April 8, 2025 at 6:40 AM

Stefan Lattner

@stefanlattner.bsky.social

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Friday, April 11 (am): Applied Signal Processing Systems

Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer
Wednesday, April 9 (am): Music analysis II

April 8, 2025 at 6:40 AM

Stefan Lattner

@stefanlattner.bsky.social

We also show that our IC estimates can help predict EEG measurements. 💆‍♀️

Surprisal can be used for segment boundary detection and to simulate the information processing of a listener. 🎶 🧠

📜 Link to the paper: arxiv.org/pdf/2501.07474

Model weights are coming soon! 🏋️

💫✨ #SonyCSLMusic 💫✨

January 21, 2025 at 3:26 PM

Stefan Lattner

@stefanlattner.bsky.social

3/ Results show:

- Higher fidelity (FAD ↓ by 20%)
- Better adherence to text & audio prompts (APA ↑)
- Faster generation with 5-step inference!

AI-assisted music production. 🎼💡 Let us know your thoughts!

Congrats to the authors Javier Nistal and Marco Pasini!

#AI #MusicGeneration #Transformers

January 20, 2025 at 1:44 PM

Stefan Lattner

@stefanlattner.bsky.social

2/ 🎤 What’s new?

- Stereo output with superior fidelity
- Bridging the gap in Text-to-audio CLAP embeddings 📝🎵
- Faster inference using a consistency framework ⚡

Audio examples: sonycslparis.github.io/improved_dar/ 🎶👂

Improving Musical Accompaniment Co-creation via Diffusion Transformers

January 20, 2025 at 1:43 PM

Stefan Lattner

@stefanlattner.bsky.social

1/ Building on Diff-A-Riff, we’ve upgraded to a stereo-capable autoencoder & replaced the U-Net with a Diffusion Transformer (DiT) to improve quality, diversity, and control. 🎧📈 Plus, our model generates high-quality audio with fewer denoising steps. 🚀

January 20, 2025 at 1:42 PM

Stefan Lattner

@stefanlattner.bsky.social

Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis

Congrats to the authors!

January 14, 2025 at 12:55 PM

Stefan Lattner

@stefanlattner.bsky.social

Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas

Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, A. Gagneré, G. Hadjeres, S. Lattner, G. Peeters

January 14, 2025 at 12:54 PM
