Ben Walker
@benjamincwalker.bsky.social
🎓 Machine Learning PhD
🌍 Mathematical Institute, Oxford
📈 Researching Neural Differential Equations & Rough Path Theory
📧 Email: [email protected]
🌐 GitHub: Benjamin-Walker
Alternative Title: ‘A Timely Series of Talks on Time Series.’

Was too proud of this one so had to post it somewhere!
March 3, 2025 at 5:54 PM
Huge thanks to my incredible co-authors Nicola Cirone, Antonio Orvieto, Cristopher Salvi, and Terry Lyons!

#NeurIPS2024 #MachineLearning #DeepLearning #StateSpaceModels

🧵6/6
November 23, 2024 at 9:04 AM
S4, Mamba, and Transformers need 4 blocks just to compose 12 permutations!

In contrast, using a dense state-transition matrix (IDS4/Linear CDE) or a non-linear state transition (as in an RNN) allows for state-tracking with only 1 layer.

🧵5/6
November 23, 2024 at 9:04 AM
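The state-tracking claim above can be sketched in a few lines (a minimal illustration of mine, not the paper's code): a single dense linear recurrence h_k = A(x_k) h_{k-1} tracks a running permutation composition exactly, because the state can hold the composed permutation matrix itself.

```python
import numpy as np

def perm_matrix(p):
    """Permutation matrix P with P @ e_j = e_{p[j]}."""
    n = len(p)
    P = np.zeros((n, n))
    P[p, np.arange(n)] = 1.0
    return P

# Two permutations of {0, 1, 2} that do not commute.
p = [1, 0, 2]  # swap 0 and 1
q = [0, 2, 1]  # swap 1 and 2

# One dense linear recurrence h_k = A(x_k) @ h_{k-1} tracks the
# running composition exactly: the state IS the composed permutation.
h = np.eye(3)
for x in [p, q, p, q]:
    h = perm_matrix(x) @ h

# Read the composed permutation back off the state.
composed = [int(np.argmax(h[:, j])) for j in range(3)]
print(composed)  # [1, 2, 0]
```

One layer, one recurrence; no stacking of blocks is needed because the dense transition can represent every group element directly.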
An excellent empirical example of this limited capacity is the A5 benchmark, from “The Illusion of State in State-Space Models” by Merrill et al.

The benchmark tests state-tracking, an ability crucial for tasks that reduce to composing permutations, such as tracking the state of a chess game.

The results? 👇

🧵4/6
November 23, 2024 at 9:04 AM
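For readers who want to poke at the task themselves, here is a hedged sketch of generating A5 word-problem instances in the spirit of Merrill et al.'s benchmark (the function names and data format are my illustrative assumptions, not the benchmark's actual code): a model sees a sequence of A5 elements and must predict their running composition.

```python
import itertools
import random

def is_even(p):
    """True if the permutation (given as a tuple) has even parity."""
    inversions = sum(
        1
        for i in range(len(p))
        for j in range(i + 1, len(p))
        if p[i] > p[j]
    )
    return inversions % 2 == 0

# A5: the 60 even permutations of 5 elements.
A5 = [p for p in itertools.permutations(range(5)) if is_even(p)]

def compose(p, q):
    """(p . q)(i) = p[q[i]]: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def make_example(length, rng):
    """One word-problem instance: a sequence of A5 elements and,
    as the label, the index in A5 of their composition."""
    seq = [rng.choice(A5) for _ in range(length)]
    state = tuple(range(5))  # identity
    for g in seq:
        state = compose(g, state)
    return seq, A5.index(state)

rng = random.Random(0)
seq, label = make_example(8, rng)
```

Because A5 is a non-solvable group, composing its elements is exactly the kind of state-tracking that constant-depth diagonal SSMs and Transformers struggle with.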
We rigorously show that Mamba’s selectivity mechanism boosts expressiveness.

However, we also show that using a diagonal state-transition matrix—while drastically reducing computational costs—also significantly limits the model's capacity.

🧵3/6
November 23, 2024 at 9:04 AM
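A quick numerical illustration of the diagonal limitation (my own sketch, not from the paper): diagonal matrices commute, so a purely diagonal linear recurrence produces the same product for any ordering of its inputs, whereas dense transitions are order-sensitive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Diagonal state-transition matrices commute, so a diagonal
# recurrence cannot distinguish "p then q" from "q then p".
D1 = np.diag(rng.normal(size=4))
D2 = np.diag(rng.normal(size=4))
assert np.allclose(D1 @ D2, D2 @ D1)  # order is invisible

# Dense (generically non-commuting) transitions do distinguish order.
A1 = rng.normal(size=(4, 4))
A2 = rng.normal(size=(4, 4))
print(np.allclose(A1 @ A2, A2 @ A1))  # False: order matters
```

This is the algebraic core of the capacity gap: order-dependent state-tracking requires non-commuting transitions.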
In this paper, we introduce a unified framework for state-space models using Rough Path Theory, providing a rigorous theoretical foundation for why the Mamba recurrence outperforms other SSMs—and precisely where their expressiveness may be limited.

🧵2/6
November 23, 2024 at 9:04 AM
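As a rough sketch of the kind of recurrence such a framework covers (the shapes, names, and Euler discretisation here are my illustrative assumptions, not the paper's formulation), a linear CDE step uses a dense transition matrix that depends linearly on the input increment:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 3, 8

# Illustrative dense transition tensors: one d_h x d_h matrix per
# input channel, plus a bias map. Not taken from the paper.
A = rng.normal(size=(d_in, d_h, d_h)) * 0.1
B = rng.normal(size=(d_in, d_h)) * 0.1

def linear_cde_step(h, dx):
    """Euler step of a linear CDE: dh = (A . dx) h + B . dx.
    The state-transition matrix M is dense and input-dependent,
    unlike the diagonal transitions used by S4/Mamba."""
    M = np.tensordot(dx, A, axes=1)  # (d_h, d_h), built from the input
    return h + M @ h + B.T @ dx

h = np.zeros(d_h)
xs = rng.normal(size=(10, d_in))
for dx in xs:
    h = linear_cde_step(h, dx)
```

With dense input-dependent transitions the recurrence inherits the full expressiveness of the signature of the driving path, which is what the rough-path viewpoint makes precise.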