Pierre Ablin
@pierreablin.bsky.social
250 followers
220 following
4 posts
Research scientist at Apple | machine learning, optimization, language modeling
pierreablin.com
Reposted by Pierre Ablin
Fabian Schaipp
@fschaipp.bsky.social
· Feb 5
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
We show that learning-rate schedules for large model training behave surprisingly similar to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedul...
arxiv.org
Reposted by Pierre Ablin
Mathieu Blondel
@mblondel.bsky.social
· Jan 31
Pierre Ablin
@pierreablin.bsky.social
· Jan 24
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as...
arxiv.org
Reposted by Pierre Ablin
Marco Cuturi
@marcocuturi.bsky.social
· Dec 18