Volkan Cevher
@cevherlions.bsky.social
Associate Professor of Electrical Engineering, EPFL.
Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
It turns out that the algorithm is closely related to the continuous greedy algorithm used in submodular optimization.
February 13, 2025 at 5:04 PM
This is joint work, and I am very grateful to have worked with the exceptionally talented team of Thomas Pethick, @wanyunxie.bsky.social, Kimon Antonakopoulos, and Zhenyu Zhu at LIONS@EPFL, and @tonysf.bsky.social from CentraleSupélec.
February 13, 2025 at 4:51 PM
🧑‍🍳 We provide a complete cookbook for choosing the right LMO for your architecture: 📚
- Input layers (1-hot vs image)
- Hidden layers (spectral norms)
- Output layers (flexible norm choices)
All with explicit formulas and guidance for when to use each one.
February 13, 2025 at 4:51 PM
🌟 It turns out many popular optimizers (SignSGD, Muon, etc.) are special cases of our framework - just with different norm choices.
Our unified analysis reveals deep connections between seemingly different approaches and provides new insights into why they work 🤔
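To make the norm-choice correspondence concrete, here is a minimal NumPy sketch (not the paper's code; `lmo_linf` and `lmo_spectral` are illustrative names): the LMO over an ℓ∞ ball gives a sign update, as in SignSGD, while the LMO over a spectral-norm ball gives an orthogonalized gradient, as in Muon.

```python
import numpy as np

def lmo_linf(g, radius=1.0):
    # argmin of <G, D> over ||D||_inf <= radius is -radius * sign(G):
    # choosing the l-infinity ball recovers a SignSGD-style update.
    return -radius * np.sign(g)

def lmo_spectral(g, radius=1.0):
    # argmin of <G, D> over ||D||_{2->2} <= radius is -radius * U V^T,
    # where G = U S V^T: an orthogonalized gradient, as in Muon.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vt)

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))

D_sign = lmo_linf(G)      # entries are +/- 1
D_spec = lmo_spectral(G)  # all singular values equal 1
```

Both are closed-form solutions of the same linear minimization problem; only the norm ball changes, which is the sense in which these optimizers are special cases of one framework.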
February 13, 2025 at 4:51 PM
📝 Check out the preprint: arxiv.org/abs/2502.07529
Worst-case convergence analysis with rates, guarantees for learning rate transfer, and practical advice on how to properly choose norms adapted to network geometry, backed by theory 🎯
February 13, 2025 at 4:51 PM
🕵️ It’s “just” stochastic conditional gradient. The secret sauce? Don't treat your weight matrices like they're flat vectors! SCION adapts to the geometry of matrices using LMOs with respect to the correct norm: the induced operator norm.
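As a toy illustration of the conditional-gradient idea (this is not SCION itself — the objective, names, and step-size schedule here are hypothetical, and the paper's method adds stochastic gradients, momentum, and per-layer norm choices), each step calls a spectral-norm LMO and takes a convex combination, so the iterate never leaves the norm ball:

```python
import numpy as np

def spectral_lmo(g, radius):
    # Minimizer of <g, S> over ||S||_{2->2} <= radius is -radius * U V^T.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vt)

def cg_step(w, grad, radius, gamma):
    # One conditional-gradient (Frank-Wolfe) step: move toward the LMO
    # output; the convex combination keeps w inside the ball.
    return (1.0 - gamma) * w + gamma * spectral_lmo(grad, radius)

# Hypothetical toy objective: f(W) = 0.5 * ||W - T||_F^2,
# minimized over a spectral-norm ball just large enough to contain T.
rng = np.random.default_rng(1)
T = rng.standard_normal((5, 5))
radius = np.linalg.svd(T, compute_uv=False).max()

W = np.zeros((5, 5))
for t in range(1000):
    grad = W - T  # gradient of the toy quadratic
    W = cg_step(W, grad, radius, gamma=2.0 / (t + 2))
```

Note that the LMO is the only place the geometry enters: swapping the norm ball changes the update direction without changing the loop.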
February 13, 2025 at 4:51 PM
arxiv.org/abs/2502.07529
🚀 Key results:
- Based on conditional gradient method
- Beats Muon+Adam on NanoGPT (tested up to 3B params)
- Zero-shot learning rate transfer across model size
- Uses WAY less memory (just one set of params + half-precision grads)
- Provides explicit norm control
February 13, 2025 at 4:51 PM
Timeo professores machinae discendi et dona ferentes. ("I fear machine-learning professors, even bearing gifts.")
January 5, 2025 at 7:08 PM
Reposted by Volkan Cevher
This is joint work with wonderful collaborators @leenacvankadara.bsky.social, @cevherlions.bsky.social, and Jin Xu during our time at Amazon.

🧵 10/10
December 10, 2024 at 7:08 AM