Tony S.F.
@tonysf.bsky.social
240 followers 130 following 26 posts
Asst. Prof. of AI at CentraleSupélec in the Centre pour la Vision Numérique.
Pinned
tonysf.bsky.social
That problem is smooth.

And if it's not, it is differentiable everywhere.

And if it's not, we avoid the kinks almost surely.

And if we don't, what is computed is a subgradient.

And if it's not, it approximates one.

And if that's not true, who cares? The loss went down.
Reposted by Tony S.F.
samuelvaiter.com
Now accepted at #NeurIPS2025 :)
samuelvaiter.com
📣 New preprint 📣

**Differentiable Generalized Sliced Wasserstein Plans**

w/
L. Chapel
@rtavenar.bsky.social

We propose a Generalized Sliced Wasserstein method that provides an approximate transport plan and admits a differentiable approximation.

arxiv.org/abs/2505.22049 1/5
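For context, a minimal sketch of the classical sliced idea the paper builds on (not the paper's method; all names here are illustrative): project both point clouds onto a random direction and solve the resulting 1D optimal transport problem by sorting, which yields a plan between the sliced measures.

import numpy as np

def sliced_plan_1d(X, Y, theta):
    """One random slice: project X, Y in R^d onto direction theta and read off
    the 1D optimal plan (a permutation matching sorted projections)."""
    x_proj, y_proj = X @ theta, Y @ theta
    sx, sy = np.argsort(x_proj), np.argsort(y_proj)
    n = len(X)
    plan = np.zeros((n, n))
    plan[sx, sy] = 1.0 / n              # i-th smallest x goes to i-th smallest y
    cost = np.mean((x_proj[sx] - y_proj[sy]) ** 2)
    return plan, cost

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))
theta = rng.normal(size=3); theta /= np.linalg.norm(theta)
plan, cost = sliced_plan_1d(X, Y, theta)

The argsort step is piecewise constant in the inputs, hence non-differentiable; smoothing that step is where a differentiable approximation, as in the abstract above, comes in.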
tonysf.bsky.social
In conditional gradient sliding you use the conditional gradient algorithm to "chase" the projected Nesterov algorithm: instead of computing the projection, you take a few conditional gradient steps to approximate it. I wonder if you can do the same with FISTA/the accelerated proximal point algorithm?
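A rough sketch of the mechanism described above, with illustrative names: replace the exact projection in a projected-gradient-type step by a few Frank-Wolfe steps on the projection subproblem, so only a linear minimization oracle over the set is needed.

import numpy as np

def fw_approx_projection(z, lmo, x0, num_steps=10):
    """Approximate proj_C(z) = argmin_{x in C} 0.5*||x - z||^2 with a few
    conditional gradient steps; only requires an LMO over C."""
    x = x0
    for k in range(num_steps):
        grad = x - z                  # gradient of 0.5*||x - z||^2
        s = lmo(grad)                 # argmin_{s in C} <grad, s>
        x = x + (2.0 / (k + 2)) * (s - x)   # standard FW step size
    return x

def lmo_l1(g, radius=1.0):
    """Example LMO: the l1 ball of the given radius."""
    i = np.argmax(np.abs(g))
    e = np.zeros_like(g)
    e[i] = -radius * np.sign(g[i])
    return e

The FISTA question would amount to chasing a proximal step rather than a projection in the same way.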
tonysf.bsky.social
nerd-sniped by the Bayesian learning rule again and still unsatisfied... ok, so you can explain a lot of DL optimization algorithms with certain approximations of various posteriors, but that's kind of kicking the can down the road - the question becomes: why those approximations instead of others?
tonysf.bsky.social
My paper on Generalized Gradient Norm Clipping & Non-Euclidean (L0, L1)-Smoothness (together with collaborators from EPFL) was accepted as an oral at NeurIPS! We extend the theory for our Scion algorithm to include gradient clipping. Read about it here arxiv.org/abs/2506.01913
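A generic sketch of the clipping template (not the Scion update from the paper; names illustrative): rescale the step whenever the gradient norm, measured in whatever norm the analysis uses, exceeds a threshold.

import numpy as np

def clipped_gradient_step(x, grad, lr, threshold, norm=np.linalg.norm):
    """Gradient norm clipping: take the full step when norm(grad) <= threshold,
    otherwise shrink it so the clipped gradient has norm exactly threshold."""
    g = norm(grad)
    scale = min(1.0, threshold / (g + 1e-12))
    return x - lr * scale * grad

Under (L0, L1)-smoothness the local smoothness constant can grow with the gradient norm, which is the usual motivation for clipping.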
tonysf.bsky.social
Don’t most people use the word increasing in everyday life to mean strictly increasing? If your boss said your salary was increasing next year and then it stayed the same, wouldn’t you object to the use of increasing?
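For the record, the two conventions at issue, in the usual analysis notation:

\begin{align*}
f \text{ increasing (weak sense):} \quad & x < y \implies f(x) \le f(y),\\
f \text{ strictly increasing:} \quad & x < y \implies f(x) < f(y).
\end{align*}

The flat salary is fine under the first convention and objectionable under the second.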
tonysf.bsky.social
My ANR JCJC Grant was funded! 🎉
tonysf.bsky.social
The French branch of Beyond Meat missed the mark by not naming themselves Beyond Viande
tonysf.bsky.social
Gorillas/yetis doing outdoor survival content
Reposted by Tony S.F.
ntamle.bsky.social
🎉🎉🎉 Our paper "Inexact subgradient methods for semialgebraic functions" is accepted at Mathematical Programming!! This is joint work with Jerome Bolte, Eric Moulines and Edouard Pauwels, where we study a subgradient method with errors for nonconvex nonsmooth functions.

arxiv.org/pdf/2404.19517
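A generic sketch of a subgradient method with errors, in the spirit of the abstract above (not the paper's exact scheme; names illustrative):

import numpy as np

def inexact_subgradient_method(subgrad, x0, steps=1000, noise=0.01, seed=0):
    """At each iteration, step along a subgradient corrupted by an error e_k,
    with a diminishing step size."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        g = subgrad(x)                        # some element of the subdifferential
        e = noise * rng.normal(size=x.shape)  # the error term
        x = x - (g + e) / np.sqrt(k + 1)
    return x

# Example on f(x) = |x| (semialgebraic, nonsmooth): sign(x) is a subgradient.
x_end = inexact_subgradient_method(np.sign, x0=np.array([5.0]))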
tonysf.bsky.social
mean-field this, mean-field that, how about a nice field for once
tonysf.bsky.social
Nope, I mean it's relatable to have to defend your choice to study Frank-Wolfe instead of proximal methods or whatever.
tonysf.bsky.social
Canon event for Frank-Wolfe researchers
tonysf.bsky.social
Doing analysis of stochastic Frank-Wolfe and steepest descent/generalized matching pursuit variants at the same time is useful. If your argument/setup isn't symmetric for both then something is probably wrong or you have formulated/parameterized things incorrectly.
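A minimal illustration of that symmetry, assuming the oracle lmo(g) returns the argmin of <g, s> over the constraint set (Frank-Wolfe) or over the unit ball of a chosen norm (steepest descent / generalized matching pursuit):

def fw_update(x, grad, lmo, gamma):
    """Constrained: move toward the oracle output, staying in the set."""
    return x + gamma * (lmo(grad) - x)

def steepest_descent_update(x, grad, lmo, eta):
    """Unconstrained: step along the same oracle output."""
    return x + eta * lmo(grad)

The two updates differ only in whether the iterate is pulled toward the oracle output or translated along it, which is why an asymmetric argument is a red flag.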
tonysf.bsky.social
Which lab is training a language model that can fix the LaTeX for my beamer slides so that things don't shift a few pixels when I go to the next \onslide within a slide???
Reposted by Tony S.F.
alucchi.bsky.social
Our research group in the department of Mathematics and Computer Science at the University of Basel (Switzerland) is looking for several PhD candidates and one post-doc who have a theoretical background in optimization and machine learning or practical experience in the field of reasoning.
Universität Basel: Post-doc position in the field of Optimization and Deep Learning Theory
The Optimization of Machine Learning Systems Group (Prof. A. Lucchi) at the Department of Mathematics and Computer Science at the University of Basel is looking for one post-doctorate to work in the a...
jobs.unibas.ch
tonysf.bsky.social
Really not a fan of people's "creative" paper titles. A few people are able to do it well/tastefully but it inspires so many bad/cringe titles and it's worse for keyword searching.
tonysf.bsky.social
If cover to cover is the requirement then I don't think I can say I've read any books. I made it to curve selection and that was enough for me, but I liked how it was written (it helps to also have Edouard explaining everything; I recommend that way the most).
tonysf.bsky.social
No love for van den Dries?
Reposted by Tony S.F.
samuelvaiter.com
The Tarski–Seidenberg theorem states that semialgebraic sets over 𝐑 are stable under projection. perso.univ-rennes1.fr/michel.coste...
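The projection form, written out:

\[
A \subseteq \mathbf{R}^{n+1} \text{ semialgebraic}
\;\Longrightarrow\;
\pi(A) \text{ semialgebraic in } \mathbf{R}^{n},
\qquad \pi(x_1,\dots,x_{n+1}) = (x_1,\dots,x_n).
\]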
tonysf.bsky.social
We also provide the first convergence rate analysis that I'm aware of for stochastic unconstrained Frank-Wolfe (i.e., without weight decay), which directly covers the muon optimizer (and much more)!
cevherlions.bsky.social
🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡
Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constrained linear minimization oracles (LMOs): 🧵👇
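A minimal sketch of the kind of LMO step involved (illustrative, not the released SCION implementation): for a weight matrix and the spectral-norm ball, the LMO has a closed form via the SVD, and the "unconstrained Frank-Wolfe" update mentioned above just steps along it.

import numpy as np

def spectral_lmo(G, radius=1.0):
    """argmin over ||S||_2 <= radius of <G, S> is -radius * U @ Vt, where
    G = U diag(s) Vt. (Muon-style updates approximate this orthogonalization
    without a full SVD.)"""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -radius * U @ Vt

def unconstrained_fw_step(W, G, eta, radius=1.0):
    """Step along the LMO output instead of projecting; no weight decay needed."""
    return W + eta * spectral_lmo(G, radius)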