Ben Hayes
@ben-hayes.bsky.social
Machine learning for audio synthesis @ Sony CSL Paris
PhD @ C4DM, QMUL.
Former intern at Spotify, Sony CSL, Bytedance
🔊 Follow the links above for audio examples, full training code, and the arXiv pre-print.
June 10, 2025 at 10:13 AM
🏆 We then apply this method to a dataset of sounds sampled from Surge XT — a feature-rich software synthesizer — and find that it dramatically outperforms state-of-the-art baselines on audio reconstruction.
June 10, 2025 at 10:13 AM
🤔 However, in the case of real synthesizers, we may not know the appropriate symmetries a priori. To allow them to be discovered adaptively, we introduce a technique called Param2Tok, which learns a mapping from synthesizer parameters to model tokens.
June 10, 2025 at 10:13 AM
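The thread doesn't spell out how Param2Tok works internally, so here's a minimal sketch of what a learned parameter-to-token mapping could look like — the class name, the embedding-plus-value-projection design, and all shapes are my assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class Param2TokSketch(nn.Module):
    """Hypothetical sketch: map each scalar synth parameter to a model token.

    Each parameter gets its own learned identity embedding, so the model can
    discover which parameters play interchangeable roles (symmetries) rather
    than having those roles hard-coded.
    """

    def __init__(self, num_params: int, d_model: int):
        super().__init__()
        # One learned identity vector per synthesizer parameter.
        self.param_embedding = nn.Embedding(num_params, d_model)
        # Project the scalar parameter value into the token dimension.
        self.value_proj = nn.Linear(1, d_model)

    def forward(self, param_values: torch.Tensor) -> torch.Tensor:
        # param_values: (batch, num_params) -> tokens: (batch, num_params, d_model)
        ids = torch.arange(param_values.shape[-1], device=param_values.device)
        return self.param_embedding(ids) + self.value_proj(param_values.unsqueeze(-1))
```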
🗺️ We can further improve performance by designing a model with equivariance to the appropriate symmetry.
June 10, 2025 at 10:13 AM
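For intuition, here is a standard permutation-equivariant layer in the DeepSets style — a generic construction, not the paper's architecture: permuting the input set permutes the outputs identically, so the model cannot favour one arbitrary ordering of interchangeable components.

```python
import torch
import torch.nn as nn

class PermEquivariantLinear(nn.Module):
    """DeepSets-style layer: permuting the input set permutes the output the
    same way, because each element sees only itself plus a pooled summary."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.local = nn.Linear(d_in, d_out)   # acts on each set element
        self.pooled = nn.Linear(d_in, d_out)  # acts on the (permutation-invariant) set mean

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, d_in) -> (batch, set_size, d_out)
        return self.local(x) + self.pooled(x.mean(dim=1, keepdim=True))
```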
📈 We design a toy task that isolates this phenomenon and find that the presence of permutation symmetry degrades the performance of conventional methods. We then show that a generative approach, which can assign predictive weight to multiple possible solutions, performs considerably better.
June 10, 2025 at 10:13 AM
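The thread doesn't specify the toy task, but a self-contained stand-in shows the failure mode; the sum/product construction and the sklearn regressor below are my own choices, not the paper's setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy symmetric task: observe s = a + b and p = a * b, try to recover (a, b).
# Both orderings (a, b) and (b, a) explain every observation equally well.
a, b = rng.uniform(size=(2, 20000))
X = np.stack([a + b, a * b], axis=1)
Y = np.stack([a, b], axis=1)  # labels carry one arbitrary ordering

reg = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
reg.fit(X, Y)
pred = reg.predict(X)

# Under squared error, the optimal point prediction is the conditional mean,
# which averages the two valid orderings: both output slots collapse towards
# (a + b) / 2 rather than recovering either valid solution.
print(np.abs(pred - X[:, :1] / 2).mean())  # small: predictions sit near the midpoint
```

A generative model can instead place probability mass on both orderings, which is why it avoids this collapse.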
‼️ In this work, we argue that the problem is ill-posed: there are multiple sets of parameters that produce any given sound. Further, we show that many of these equivalent solutions are due to intrinsic symmetries of the synthesizer! (For example, swapping the settings of two identical oscillators leaves the output unchanged.)
June 10, 2025 at 10:13 AM
🧑‍🔬 Previous approaches have struggled to scale to the full complexity of synthesizers used in modern audio production. Why?
June 10, 2025 at 10:13 AM
🎛️ Programming synthesizers is a fiddly business, and so a line of work known as "sound matching" has, over the last few decades, sought to answer the question: given an audio signal and a synthesizer, which configuration of parameters best approximates the signal?
June 10, 2025 at 10:13 AM
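In symbols (my paraphrase; this notation is not from the paper): given target audio $x$, a synthesizer $g$, and an audio distance $d$, sound matching seeks

$$\hat{\theta} = \operatorname*{arg\,min}_{\theta \in \Theta} \; d\big(g(\theta),\, x\big),$$

and the catch, as argued above, is that $g$ is many-to-one, so the minimiser is generally not unique.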
🎹 Audio synthesizers are diverse and complex beasts, combining a variety of techniques to produce sounds ranging from familiar to entirely alien.
June 10, 2025 at 10:13 AM
TL;DR: Predicting synthesizer parameters from audio is hard because multiple parameter configurations can produce the same sound. We design a model that accounts for this and find that it dramatically outperforms previous approaches and works on production-grade, feature-rich VST synthesizers.
June 10, 2025 at 10:13 AM
the best ones combine two or more
March 29, 2025 at 12:23 AM
Two excellent recent resources:

1. (not strictly a paper) This tutorial from the last ISMIR, courtesy of: geoffroypeeters.github.io/deeplearning...
2. This overview of model-based deep learning for MIR: arxiv.org/abs/2406.11540
Deep Learning 101 for Audio-based MIR
geoffroypeeters.github.io
February 13, 2025 at 10:15 AM
I look at it as squeezing a *slightly* better coupling out of the batch.

they do something related here (arxiv.org/abs/2306.15030) with the Kabsch algorithm, but they transform the target samples, as they're specifically trying to learn a rotation-invariant distribution with an equivariant flow.
Equivariant flow matching
arxiv.org
January 29, 2025 at 11:02 AM
haven't crunched through it on paper, but my hunch is this works because of the spherical symmetry of the Gaussian dist, so any orthogonal transformation of the batch is exactly as probable (should work for any O(d)-invariant distribution if true)
January 29, 2025 at 11:02 AM
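A quick numerical check of that hunch (my own snippet, not from the thread): for a standard normal source, any orthogonal transform of a batch leaves its log-probability unchanged, since the density depends only on the norm.

```python
import numpy as np
from scipy.stats import multivariate_normal, ortho_group

rng = np.random.default_rng(0)
d = 8
z = rng.standard_normal((16, d))        # batch of source samples
Q = ortho_group.rvs(d, random_state=0)  # random orthogonal matrix

mvn = multivariate_normal(mean=np.zeros(d))
# The standard normal density depends only on ||z||, and ||z @ Q.T|| = ||z||,
# so rotating/reflecting the batch leaves its log-probability unchanged.
print(np.allclose(mvn.logpdf(z), mvn.logpdf(z @ Q.T)))  # True
```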
very anecdotally, I've found that when using a normal source distribution, performing orthogonal Procrustes on the source samples (to match the target samples) after minibatch coupling by exact linear assignment (Hungarian algo) seems to speed up convergence by a noticeable amount.
January 29, 2025 at 11:02 AM
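A sketch of how I read that recipe, using SciPy's linear_sum_assignment and orthogonal_procrustes — the function name and details are my reconstruction, not the author's code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.linalg import orthogonal_procrustes
from scipy.spatial.distance import cdist

def couple_and_rotate(x0: np.ndarray, x1: np.ndarray):
    """Sketch of the trick described above (my reconstruction):
    1) exact minibatch coupling via the Hungarian algorithm,
    2) orthogonal Procrustes on the source batch to best match the targets.

    x0: (n, d) source samples (e.g. standard normal); x1: (n, d) data samples.
    """
    # 1) Pair each source sample with a target sample to minimise total cost.
    cost = cdist(x0, x1, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    x0, x1 = x0[rows], x1[cols]

    # 2) Find the orthogonal matrix R minimising ||x0 @ R - x1||_F and apply it
    # to the source batch; legal because the normal source is O(d)-invariant.
    R, _ = orthogonal_procrustes(x0, x1)
    return x0 @ R, x1

# Usage: produce (source, target) pairs for flow-matching training.
rng = np.random.default_rng(0)
src = rng.standard_normal((64, 3))
tgt = rng.standard_normal((64, 3)) + 2.0
src_rot, tgt_matched = couple_and_rotate(src, tgt)
```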