Volkan Cevher
@cevherlions.bsky.social
Associate Professor of Electrical Engineering, EPFL.
Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
It turns out that the algorithm is closely related to the continuous greedy algorithm used in submodular optimization.
February 13, 2025 at 5:04 PM
This is joint work, and I am very grateful to have worked with the exceptionally talented team of Thomas Pethick, @wanyunxie.bsky.social, Kimon Antonakopoulos, and Zhenyu Zhu at LIONS@EPFL, and @tonysf.bsky.social from CentraleSupélec.
February 13, 2025 at 4:51 PM
🧑‍🍳 We provide a complete cookbook for choosing the right LMO for your architecture: 📚
- Input layers (1-hot vs image)
- Hidden layers (spectral norms)
- Output layers (flexible norm choices)
All with explicit formulas and guidance for when to use each one.
February 13, 2025 at 4:51 PM
🌟 It turns out many popular optimizers (SignSGD, Muon, etc.) are special cases of our framework - just with different norm choices.
Our unified analysis reveals deep connections between seemingly different approaches and provides new insights into why they work 🤔
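To make the norm-choice correspondence concrete, here is a minimal NumPy sketch (not the paper's code; `lmo_linf` and `lmo_spectral` are illustrative names): the LMO over an ℓ∞ ball gives a sign update, as in SignSGD, while the LMO over a spectral-norm ball gives an orthogonalized gradient, as in Muon.

```python
import numpy as np

def lmo_linf(g, radius=1.0):
    # argmin of <G, D> over ||D||_inf <= radius is -radius * sign(G):
    # choosing the l-infinity ball recovers a SignSGD-style update.
    return -radius * np.sign(g)

def lmo_spectral(g, radius=1.0):
    # argmin of <G, D> over ||D||_{2->2} <= radius is -radius * U V^T,
    # where G = U S V^T: an orthogonalized gradient, as in Muon.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vt)

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))

D_sign = lmo_linf(G)      # entries are +/- 1
D_spec = lmo_spectral(G)  # all singular values equal 1
```

Both are closed-form solutions of the same linear minimization problem; only the norm ball changes, which is the sense in which these optimizers are special cases of one framework.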
February 13, 2025 at 4:51 PM
📝 Check out the preprint: arxiv.org/abs/2502.07529
Worst-case convergence analysis with rates, guarantees for learning rate transfer, and practical advice on how to properly choose norms adapted to network geometry, backed by theory 🎯
February 13, 2025 at 4:51 PM
🕵️ It’s “just” stochastic conditional gradient. The secret sauce? Don't treat your weight matrices like they're flat vectors! SCION adapts to the geometry of matrices using LMOs with respect to the correct norm: the induced operator norm.
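As a toy illustration of the conditional-gradient idea (this is not SCION itself — the objective, names, and step-size schedule here are hypothetical, and the paper's method adds stochastic gradients, momentum, and per-layer norm choices), each step calls a spectral-norm LMO and takes a convex combination, so the iterate never leaves the norm ball:

```python
import numpy as np

def spectral_lmo(g, radius):
    # Minimizer of <g, S> over ||S||_{2->2} <= radius is -radius * U V^T.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vt)

def cg_step(w, grad, radius, gamma):
    # One conditional-gradient (Frank-Wolfe) step: move toward the LMO
    # output; the convex combination keeps w inside the ball.
    return (1.0 - gamma) * w + gamma * spectral_lmo(grad, radius)

# Hypothetical toy objective: f(W) = 0.5 * ||W - T||_F^2,
# minimized over a spectral-norm ball just large enough to contain T.
rng = np.random.default_rng(1)
T = rng.standard_normal((5, 5))
radius = np.linalg.svd(T, compute_uv=False).max()

W = np.zeros((5, 5))
for t in range(1000):
    grad = W - T  # gradient of the toy quadratic
    W = cg_step(W, grad, radius, gamma=2.0 / (t + 2))
```

Note that the LMO is the only place the geometry enters: swapping the norm ball changes the update direction without changing the loop.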
February 13, 2025 at 4:51 PM
arxiv.org/abs/2502.07529
🚀 Key results:
- Based on conditional gradient method
- Beats Muon+Adam on NanoGPT (tested up to 3B params)
- Zero-shot learning rate transfer across model size
- Uses WAY less memory (just one set of params + half-precision grads)
- Provides explicit norm control
February 13, 2025 at 4:51 PM
Timeo professores machinae discendi et dona ferentes. ("I fear machine-learning professors, even bearing gifts.")
January 5, 2025 at 7:08 PM
Reposted by Volkan Cevher
This is joint work with wonderful collaborators @leenacvankadara.bsky.social, @cevherlions.bsky.social, and Jin Xu during our time at Amazon.

🧵 10/10
December 10, 2024 at 7:08 AM