Volkan Cevher
@cevherlions.bsky.social
970 followers · 100 following · 12 posts
Associate Professor of Electrical Engineering, EPFL. Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
cevherlions.bsky.social
It turns out that the algorithm is closely related to the continuous greedy algorithm used in submodular optimization.
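For context, here is a minimal sketch of the continuous greedy update (the gradient oracle grad_F and the simple top-k polytope are illustrative assumptions, not from the paper): each step moves a fixed fraction of the way along an LMO output, never shrinking the iterate, which is the same shape as the conditional-gradient steps in the SCION thread below.

    import numpy as np

    def lmo_topk(grad, k):
        # LMO over the polytope {x in [0,1]^n : sum(x) <= k}:
        # put weight 1 on the k coordinates with the largest positive gradient entries.
        v = np.zeros_like(grad)
        idx = np.argsort(grad)[-k:]
        v[idx] = (grad[idx] > 0).astype(float)
        return v

    def continuous_greedy(grad_F, n, k, steps=100):
        # grad_F: gradient oracle of the objective (assumed given).
        # Continuous greedy: move a 1/steps fraction toward the LMO output at every step.
        x = np.zeros(n)
        for _ in range(steps):
            x = x + lmo_topk(grad_F(x), k) / steps
        return x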
Reposted by Volkan Cevher
tonysf.bsky.social
We also provide the first convergence rate analysis that I'm aware of for stochastic unconstrained Frank-Wolfe (i.e., without weight decay), which directly covers the Muon optimizer (and much more)!
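Roughly, as I read the constrained/unconstrained distinction (a sketch, with lr as a placeholder step size): the constrained conditional-gradient step averages the iterate toward the LMO output, and that (1 - lr) shrinkage is what plays the role of weight decay; the unconstrained variant simply drops it.

    def cg_step(w, g, lmo, lr):
        # Constrained conditional-gradient step: the (1 - lr) shrinkage toward zero
        # keeps the iterate inside the norm ball and acts like weight decay.
        return (1 - lr) * w + lr * lmo(g)

    def unconstrained_cg_step(w, g, lmo, lr):
        # "Unconstrained" variant: same LMO direction, no shrinkage, hence no weight decay.
        return w + lr * lmo(g)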
cevherlions.bsky.social
🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡
Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constrained linear minimization oracles (LMOs): 🧵👇
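A minimal sketch of the idea in PyTorch (my own illustration, not the official SCION code; the sign-based LMO, step size, and momentum values are placeholder assumptions): keep a smoothed gradient per parameter, then move along the output of an LMO over the norm ball chosen for that layer.

    import torch

    def lmo_linf(g, radius=1.0):
        # LMO for the entrywise l_inf ball: argmin over ||s||_inf <= radius of <g, s>.
        return -radius * torch.sign(g)

    @torch.no_grad()
    def lmo_train_step(params, grad_avgs, lmos, lr=0.02, momentum=0.9):
        # One optimizer step: smooth each stochastic gradient, then add lr times
        # the LMO output for that parameter's norm ball.
        for p, m, lmo in zip(params, grad_avgs, lmos):
            m.mul_(momentum).add_(p.grad, alpha=1 - momentum)
            p.add_(lmo(m), alpha=lr)

    # Hypothetical wiring for a single linear layer (call after loss.backward()):
    # layer = torch.nn.Linear(1024, 1024)
    # bufs = [torch.zeros_like(p) for p in layer.parameters()]
    # lmo_train_step(list(layer.parameters()), bufs, [lmo_linf] * 2)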
cevherlions.bsky.social
This is joint work that I am very grateful to have done with the exceptionally talented team of Thomas Pethick, @wanyunxie.bsky.social, Kimon Antonakopoulos, and Zhenyu Zhu at LIONS@EPFL, and @tonysf.bsky.social from CentraleSupélec.
cevherlions.bsky.social
🧑‍🍳 We provide a complete cookbook for choosing the right LMO for your architecture: 📚
- Input layers (1-hot vs image)
- Hidden layers (spectral norms)
- Output layers (flexible norm choices)
All with explicit formulas and guidance for when to use each one.
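Two of those formulas, roughly (radii treated as plain tunable constants here; the per-layer scaling in the paper is more careful): the entrywise sign map is the LMO for an ℓ∞ ball, and the LMO for a spectral-norm ball comes from the reduced SVD of the gradient.

    import torch

    def lmo_sign_ball(g, radius=1.0):
        # argmin over ||S||_max <= radius of <G, S>  =  -radius * sign(G)
        return -radius * torch.sign(g)

    def lmo_spectral_ball(g, radius=1.0):
        # With G = U diag(s) V^T (reduced SVD),
        # argmin over ||S||_2 <= radius (operator norm) of <G, S>  =  -radius * U V^T.
        u, s, vh = torch.linalg.svd(g, full_matrices=False)
        return -radius * (u @ vh)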
cevherlions.bsky.social
🌟 It turns out many popular optimizers (SignSGD, Muon, etc.) are special cases of our framework - just with different norm choices.
Our unified analysis reveals deep connections between seemingly different approaches and provides new insights into why they work 🤔
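A toy illustration of that special-case claim (my own sketch, not code from the paper): plugging an ℓ∞-ball LMO into the generic step gives a SignSGD-style update, while a spectral-norm-ball LMO gives a Muon-style orthogonalized update.

    import torch

    def step(w, g, lmo, lr=0.02):
        # Generic LMO-based update: move along the LMO output for the chosen norm ball.
        return w + lr * lmo(g)

    g = torch.randn(64, 64)    # stand-in (smoothed) gradient for one weight matrix
    w = torch.zeros(64, 64)

    # l_inf ball  ->  SignSGD-like update: w - lr * sign(g)
    w_signsgd = step(w, g, lambda d: -torch.sign(d))

    # spectral-norm ball  ->  Muon-like update along -U V^T from the gradient's SVD
    def lmo_spectral(d):
        u, _, vh = torch.linalg.svd(d, full_matrices=False)
        return -(u @ vh)

    w_muonlike = step(w, g, lmo_spectral)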
cevherlions.bsky.social
📝 Check out the preprint: arxiv.org/abs/2502.07529
Worst-case convergence analysis with rates, guarantees for learning rate transfer, and practical advice on how to properly choose norms adapted to network geometry, backed by theory 🎯
cevherlions.bsky.social
🕵️ It’s “just” stochastic conditional gradient. The secret sauce? Don't treat your weight matrices like they're flat vectors! SCION adapts to the geometry of matrices using LMOs with respect to the correct norm: the induced operator norm.
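Concretely (a sketch; the radius is a placeholder): flattening a weight matrix and using an ℓ2 ball just rescales the raw gradient, whereas the induced operator-norm LMO equalizes all singular directions of the update.

    import torch

    def lmo_flat_l2(g, radius=1.0):
        # Treat the matrix as a flat vector: the l2-ball LMO is minus the normalized gradient.
        return -radius * g / g.norm()

    def lmo_operator_norm(g, radius=1.0):
        # Respect the matrix structure: the LMO for the induced (spectral) operator norm
        # replaces every singular value of the gradient by the radius.
        u, _, vh = torch.linalg.svd(g, full_matrices=False)
        return -radius * (u @ vh)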
cevherlions.bsky.social
arxiv.org/abs/2502.07529
🚀 Key results:
- Based on the conditional gradient method
- Beats Muon+Adam on NanoGPT (tested up to 3B params)
- Zero-shot learning rate transfer across model sizes
- Uses WAY less memory (just one set of params + half-precision grads)
- Provides explicit norm control
Hyper-parameter transfer on NanoGPT.
cevherlions.bsky.social
It was a fun panel. Quite informative.
epfl-ai-center.bsky.social
A thought-provoking panel with Scarlet of the EPFL AI Center, @cevherlions.bsky.social and Thomas Schneider from OFCOM - looking at the state of regulations, the business case for GenAI & the opportunities for Swiss research & innovation... a fine balance between talent, data and hardware. #AMLD
cevherlions.bsky.social
Timeo professores machinae discendi et dona ferentes. ("I fear machine learning professors, even bearing gifts.")
Reposted by Volkan Cevher
eugenevinitsky.bsky.social
An illustrated guide to never learning anything
Reposted by Volkan Cevher
wanyunxie.bsky.social
We'll present "SAMPa: Sharpness-Aware Minimization Parallelized" at #NeurIPS24 on Thursday! This is joint work with Thomas Pethick and Volkan Cevher.
📍 Find us at Poster #5904 from 16:30 in the West Ballroom.
Reposted by Volkan Cevher
mohaas.bsky.social
Stable model scaling with width-independent dynamics?

Thrilled to present 2 papers at #NeurIPS 🎉 that study width-scaling in Sharpness Aware Minimization (SAM) (Th 16:30, #2104) and in Mamba (Fr 11, #7110). Our scaling rules stabilize training and transfer optimal hyperparams across scales.

🧵 1/10
Reposted by Volkan Cevher
mohaas.bsky.social
This is joint work with wonderful collaborators @leenacvankadara.bsky.social , @cevherlions.bsky.social and Jin Xu during our time at Amazon.

🧵 10/10
Reposted by Volkan Cevher
docmilanfar.bsky.social
Reviewers take note:
57% of people rejected their own argument when they thought it was someone else's. So take it easy with the criticism.