Tony S.F.
@tonysf.bsky.social
240 followers 130 following 26 posts
Asst. Prof. of AI at CentraleSupélec in the Centre pour la Vision Numérique.
Pinned
tonysf.bsky.social
That problem is smooth.

And if it's not, it is differentiable everywhere.

And if it's not, we avoid the kinks almost surely.

And if we don't, what is computed is a subgradient.

And if it's not, it approximates one.

And if that's not true, who cares? The loss went down.
Reposted by Tony S.F.
samuelvaiter.com
Now accepted at #NeurIPS2025 :)
samuelvaiter.com
📣 New preprint 📣

**Differentiable Generalized Sliced Wasserstein Plans**

w/
L. Chapel
@rtavenar.bsky.social

We propose a Generalized Sliced Wasserstein method that provides an approximate transport plan and admits a differentiable approximation.

arxiv.org/abs/2505.22049 1/5
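For context, a minimal sketch of the classical sliced idea the paper builds on (not the paper's method; all names here are illustrative): project both point clouds onto a random direction and solve the resulting 1D optimal transport problem by sorting, which yields a plan between the sliced measures.

import numpy as np

def sliced_plan_1d(X, Y, theta):
    """One random slice: project X, Y in R^d onto direction theta and read off
    the 1D optimal plan (a permutation matching sorted projections)."""
    x_proj, y_proj = X @ theta, Y @ theta
    sx, sy = np.argsort(x_proj), np.argsort(y_proj)
    n = len(X)
    plan = np.zeros((n, n))
    plan[sx, sy] = 1.0 / n              # i-th smallest x goes to i-th smallest y
    cost = np.mean((x_proj[sx] - y_proj[sy]) ** 2)
    return plan, cost

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))
theta = rng.normal(size=3); theta /= np.linalg.norm(theta)
plan, cost = sliced_plan_1d(X, Y, theta)

The argsort step is piecewise constant in the inputs, hence non-differentiable; smoothing that step is where a differentiable approximation, as in the abstract above, comes in.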
tonysf.bsky.social
In conditional gradient sliding you use the conditional gradient algorithm to "chase" the projected Nesterov algorithm: instead of computing the projection, you take a few conditional gradient steps to approximate it. I wonder if you can do the same with FISTA/the accelerated proximal point algorithm?
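A rough sketch of the mechanism described above, with illustrative names: replace the exact projection in a projected-gradient-type step by a few Frank-Wolfe steps on the projection subproblem, so only a linear minimization oracle over the set is needed.

import numpy as np

def fw_approx_projection(z, lmo, x0, num_steps=10):
    """Approximate proj_C(z) = argmin_{x in C} 0.5*||x - z||^2 with a few
    conditional gradient steps; only requires an LMO over C."""
    x = x0
    for k in range(num_steps):
        grad = x - z                  # gradient of 0.5*||x - z||^2
        s = lmo(grad)                 # argmin_{s in C} <grad, s>
        x = x + (2.0 / (k + 2)) * (s - x)   # standard FW step size
    return x

def lmo_l1(g, radius=1.0):
    """Example LMO: the l1 ball of the given radius."""
    i = np.argmax(np.abs(g))
    e = np.zeros_like(g)
    e[i] = -radius * np.sign(g[i])
    return e

The FISTA question would amount to chasing a proximal step rather than a projection in the same way.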
tonysf.bsky.social
nerd-sniped by the Bayesian learning rule again and still unsatisfied... ok, so you can explain a lot of DL optimization algorithms with certain approximations of various posteriors, but that's kind of kicking the can down the road - the question becomes: why those approximations instead of others?
tonysf.bsky.social
My paper on Generalized Gradient Norm Clipping & Non-Euclidean (L0, L1)-Smoothness (together with collaborators from EPFL) was accepted as an oral at NeurIPS! We extend the theory for our Scion algorithm to include gradient clipping. Read about it here arxiv.org/abs/2506.01913
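A generic sketch of the clipping template (not the Scion update from the paper; names illustrative): rescale the step whenever the gradient norm, measured in whatever norm the analysis uses, exceeds a threshold.

import numpy as np

def clipped_gradient_step(x, grad, lr, threshold, norm=np.linalg.norm):
    """Gradient norm clipping: take the full step when norm(grad) <= threshold,
    otherwise shrink it so the clipped gradient has norm exactly threshold."""
    g = norm(grad)
    scale = min(1.0, threshold / (g + 1e-12))
    return x - lr * scale * grad

Under (L0, L1)-smoothness the local smoothness constant can grow with the gradient norm, which is the usual motivation for clipping.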
tonysf.bsky.social
Don’t most people use the word increasing in everyday life to mean strictly increasing? If your boss said your salary was increasing next year and then it stayed the same, wouldn’t you object to the use of increasing?
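For the record, the two conventions at issue, in the usual analysis notation:

\begin{align*}
f \text{ increasing (weak sense):} \quad & x < y \implies f(x) \le f(y),\\
f \text{ strictly increasing:} \quad & x < y \implies f(x) < f(y).
\end{align*}

The flat salary is fine under the first convention and objectionable under the second.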
tonysf.bsky.social
My ANR JCJC Grant was funded! 🎉
tonysf.bsky.social
The French branch of Beyond Meat missed the mark by not naming themselves Beyond Viande
tonysf.bsky.social
Gorillas/yetis doing outdoor survival content
Reposted by Tony S.F.
ntamle.bsky.social
🎉🎉🎉 Our paper "Inexact subgradient methods for semialgebraic functions" is accepted at Mathematical Programming!! This is joint work with Jerome Bolte, Eric Moulines and Edouard Pauwels, where we study a subgradient method with errors for nonconvex nonsmooth functions.

arxiv.org/pdf/2404.19517
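A generic sketch of a subgradient method with errors, in the spirit of the abstract above (not the paper's exact scheme; names illustrative):

import numpy as np

def inexact_subgradient_method(subgrad, x0, steps=1000, noise=0.01, seed=0):
    """At each iteration, step along a subgradient corrupted by an error e_k,
    with a diminishing step size."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        g = subgrad(x)                        # some element of the subdifferential
        e = noise * rng.normal(size=x.shape)  # the error term
        x = x - (g + e) / np.sqrt(k + 1)
    return x

# Example on f(x) = |x| (semialgebraic, nonsmooth): sign(x) is a subgradient.
x_end = inexact_subgradient_method(np.sign, x0=np.array([5.0]))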
tonysf.bsky.social
mean-field this, mean-field that, how about a nice field for once
tonysf.bsky.social
Nope, I mean it's relatable to have to defend your choice to study Frank-Wolfe instead of proximal methods or whatever.
tonysf.bsky.social
Canon event for Frank-Wolfe researchers
tonysf.bsky.social
Doing analysis of stochastic Frank-Wolfe and steepest descent/generalized matching pursuit variants at the same time is useful. If your argument/setup isn't symmetric for both then something is probably wrong or you have formulated/parameterized things incorrectly.
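A minimal illustration of that symmetry, assuming the oracle lmo(g) returns the argmin of <g, s> over the constraint set (Frank-Wolfe) or over the unit ball of a chosen norm (steepest descent / generalized matching pursuit):

def fw_update(x, grad, lmo, gamma):
    """Constrained: move toward the oracle output, staying in the set."""
    return x + gamma * (lmo(grad) - x)

def steepest_descent_update(x, grad, lmo, eta):
    """Unconstrained: step along the same oracle output."""
    return x + eta * lmo(grad)

The two updates differ only in whether the iterate is pulled toward the oracle output or translated along it, which is why an asymmetric argument is a red flag.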
tonysf.bsky.social
Which lab is training a language model that can fix the LaTeX for my beamer slides so that things don't shift a few pixels when I go to the next \onslide within a slide???
Reposted by Tony S.F.
alucchi.bsky.social
Our research group in the department of Mathematics and Computer Science at the University of Basel (Switzerland) is looking for several PhD candidates and one post-doc who have a theoretical background in optimization and machine learning or practical experience in the field of reasoning.
Universität Basel: Post-doc position in the field of Optimization and Deep Learning Theory
The Optimization of Machine Learning Systems Group (Prof. A. Lucchi) at the Department of Mathematics and Computer Science at the University of Basel is looking for one post-doctorate to work in the a...
jobs.unibas.ch
tonysf.bsky.social
Really not a fan of people's "creative" paper titles. A few people are able to do it well/tastefully but it inspires so many bad/cringe titles and it's worse for keyword searching.
tonysf.bsky.social
If cover to cover is the requirement then I don't think I can say I've read any books. I made it to curve selection and that was enough for me, but I liked how it was written (it helps to also have Edouard explaining everything; I recommend that way the most).
tonysf.bsky.social
No love for van den Dries?
Reposted by Tony S.F.
samuelvaiter.com
The Tarski–Seidenberg theorem states that semialgebraic sets over 𝐑 are stable under projection. perso.univ-rennes1.fr/michel.coste...
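The projection form, written out:

\[
A \subseteq \mathbf{R}^{n+1} \text{ semialgebraic}
\;\Longrightarrow\;
\pi(A) \text{ semialgebraic in } \mathbf{R}^{n},
\qquad \pi(x_1,\dots,x_{n+1}) = (x_1,\dots,x_n).
\]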
tonysf.bsky.social
We also provide the first convergence rate analysis that I'm aware of for stochastic unconstrained Frank-Wolfe (i.e., without weight decay), which directly covers the muon optimizer (and much more)!
cevherlions.bsky.social
🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡
Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constrained linear minimization oracles (LMOs): 🧵👇
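A minimal sketch of the kind of LMO step involved (illustrative, not the released SCION implementation): for a weight matrix and the spectral-norm ball, the LMO has a closed form via the SVD, and the "unconstrained Frank-Wolfe" update mentioned above just steps along it.

import numpy as np

def spectral_lmo(G, radius=1.0):
    """argmin over ||S||_2 <= radius of <G, S> is -radius * U @ Vt, where
    G = U diag(s) Vt. (Muon-style updates approximate this orthogonalization
    without a full SVD.)"""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -radius * U @ Vt

def unconstrained_fw_step(W, G, eta, radius=1.0):
    """Step along the LMO output instead of projecting; no weight decay needed."""
    return W + eta * spectral_lmo(G, radius)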