Lightnews — Scholar-powered news

Valérie Castin

@vcastin.bsky.social

PhD student in machine learning at Ecole Normale Supérieure, Paris. My webpage: https://vcastin.github.io/

Reposted by Valérie Castin

François Fleuret @francois.fleuret.org · Apr 28

I asked "on the other platform" what the most important improvements to the original 2017 transformer were.

That was quite popular and here is a synthesis of the responses:

Reposted by Valérie Castin

Pierre Ablin @pierreablin.bsky.social · Feb 5

Excited to share Soup-of-Experts, a new neural network architecture that, for any given specific task, can instantiate in a flash a small model that is very good on it.

Made with ❤️ at Apple

Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!

arxiv.org/abs/2502.01804

Reposted by Valérie Castin

Gabriel Peyré @gabrielpeyre.bsky.social · Feb 1

A cute result from Valérie’s work is that Gaussian distributions remain closed under evolution by attention layers, allowing one to study an ODE in the (mean, covariance) space. In particular, this enables the analysis of the “clustering of tokens” toward low-rank covariances.

Valérie Castin @vcastin.bsky.social · Jan 31

How do tokens evolve as they are processed by a deep Transformer?

With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint, "A Unified Perspective on the Dynamics of Deep Transformers": arxiv.org/abs/2501.18322

ML and PDE lovers, check it out!

Reposted by Valérie Castin

Pierre Ablin @pierreablin.bsky.social · Jan 24

Excited to see Sigmoid Attention accepted at ICLR 2025!!

Make attention ~18% faster with a drop-in replacement 🚀

Code: github.com/apple/ml-sig...

Paper: arxiv.org/abs/2409.04431

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as...

arxiv.org
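To make the "weighted sum of values" description concrete, here is a minimal pure-Python sketch contrasting the usual softmax weights with element-wise sigmoid weights. It is an illustration only, not the code from the linked repository. The function names are hypothetical, and the default bias of -log(n), where n is the number of keys, is an assumption based on my reading of the paper.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax_weights(scores):
    """Normalize scores across all keys; the weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_weights(scores, bias=None):
    """Squash each score independently; the weights need not sum to 1."""
    if bias is None:
        # Assumption: a -log(n) bias, as I recall the paper recommending.
        bias = -math.log(len(scores))
    return [1.0 / (1.0 + math.exp(-(s + bias))) for s in scores]


def _attend(queries, keys, values, weights_fn):
    """Map each query to a weighted sum of the value vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * _dot(q, k) for k in keys]
        weights = weights_fn(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_out)]
        )
    return outputs


def softmax_attention(queries, keys, values):
    return _attend(queries, keys, values, softmax_weights)


def sigmoid_attention(queries, keys, values, bias=None):
    return _attend(queries, keys, values, lambda s: sigmoid_weights(s, bias))
```

With three all-zero keys, every score is 0, so softmax assigns each value a weight of 1/3, while sigmoid with the assumed -log(3) bias assigns each a weight of 1/4. The sigmoid weights are computed per key with no normalization across the row, which is the structural difference the sketch is meant to show. It makes no attempt to reproduce the speedup quoted in the post.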

Reposted by Valérie Castin

Gabriel Peyré @gabrielpeyre.bsky.social · Jan 22

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

Reposted by Valérie Castin

Carl Allen @carl-allen.bsky.social · Dec 18

Machine learning has made incredible breakthroughs, but our theoretical understanding lags behind.

We take a step towards unravelling its mystery by explaining why the phenomenon of disentanglement arises in generative latent variable models.

Blog post: carl-allen.github.io/theory/2024/...

Reposted by Valérie Castin

Carissa Véliz @carissaveliz.bsky.social · Dec 8

It's like when Google decided to fund itself through ads, but worse, because chatbots are already much more misleading and anthropomorphic than search engines. #AIEthics www.ft.com/content/9350...

OpenAI explores advertising as it steps up revenue drive

ChatGPT maker hires advertising talent from big tech rivals

www.ft.com
