Matteo Saponati
@matteosaponati.bsky.social
200 followers 86 following 15 posts
I am a research scientist in Machine Learning and Neuroscience. I am fascinated by life and intelligence, and I like to study complex systems. I love to play music and dance. Postdoctoral Research Scientist @ ETH Zürich ↳ https://matteosaponati.github.io
matteosaponati.bsky.social
really great work! nice to see some feedback control :)
matteosaponati.bsky.social
Oh! Very interesting, great work! Nice to see that feedback control approaches are gaining traction :)
matteosaponati.bsky.social
Take our short 5-min anonymous survey on the Neuromorphic field’s current state & future:

📋 tinyurl.com/3jkszrnr
🗓️ Open until May 12, 2025

Results will be shared openly and submitted for publication. Your input will help us understand how interdisciplinary trends are shaping the field.
Neuromorphic Questionnaire
This form collects valuable information from the Neuromorphic Community as part of a project led by Matteo Saponati, Laura Kriener, Sebastian Billaudelle, Filippo Moro, and Melika Payvand. The goal is...
tinyurl.com
Reposted by Matteo Saponati
jeromelecoq.bsky.social
How does our brain predict the future? Our review of predictive processing + research program is now on arXiv arxiv.org/abs/2504.09614
50+ neuroscientists distributed across the world worked together to create this unique community project.
Reposted by Matteo Saponati
elisadonati.bsky.social
🌟 Paper out in npj Unconventional Computing!
www.nature.com/articles/s44...

A system built with just a few neurons, yet able to solve a complex task — not by stacking layers or going deeper, but by embracing unconventional thinking.

This is neuromorphic to me!
A neuromorphic multi-scale approach for real-time heart rate and state detection - npj Unconventional Computing
www.nature.com
Reposted by Matteo Saponati
giacomoi.bsky.social
I'm extremely proud of this work, which shows how using the physics of analog electronic circuits helps us understand learning and computational principles of cortical neural networks and build efficient neural processing systems that can complement and outperform AI accelerators in edge computing!
biorxivpreprint.bsky.social
A canonical cortical electronic circuit for neuromorphic intelligence https://www.biorxiv.org/content/10.1101/2025.03.28.646019v1
matteosaponati.bsky.social
#preprint #machinelearning #transformers #selfattention #ml #deeplearning
matteosaponati.bsky.social
7/ I would like to thank Pascal Sager for all the training, the writing, the discussions, and whatnot; Pau Vilimelis Aceituno for the hours spent refining the math; Thilo Stadelmann and Benjamin Grewe for their great contributions and supervision; and all the people at INI.

cheers 💜
ALT: a cartoon of two robots standing next to each other and the word "bye".
media.tenor.com
matteosaponati.bsky.social
6/ TL;DR

- Self-attention matrices in Transformers show universal structural differences based on training.
- Bidirectional models → Symmetric self-attention
- Autoregressive models → Directional, column-dominant
- Using symmetry as an inductive bias improves training.

⬇️
matteosaponati.bsky.social
5/ Finally, we leveraged symmetry to improve Transformer training.

- Initializing self-attention matrices symmetrically improves training efficiency for bidirectional models, leading to faster convergence.

This suggests that imposing structure at initialization can enhance training dynamics (see the sketch below).

⬇️
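For the curious, here is a minimal PyTorch sketch of one way to impose symmetry at initialization: tie W_K to W_Q so that the combined matrix W_Q W_K^T starts out symmetric. This is an illustrative choice of mine, not necessarily the exact scheme used in the paper:

```python
import torch
import torch.nn as nn

def symmetric_qk_init(d_model: int, d_head: int, seed: int = 0):
    """Return query/key projections whose product W_Q @ W_K.T is
    symmetric at initialization (here by simply tying W_K to W_Q;
    the two parameters remain free to diverge during training)."""
    g = torch.Generator().manual_seed(seed)
    W_Q = torch.randn(d_model, d_head, generator=g) / d_model**0.5
    W_K = W_Q.clone()
    return nn.Parameter(W_Q), nn.Parameter(W_K)

W_Q, W_K = symmetric_qk_init(d_model=768, d_head=64)
W_QK = W_Q @ W_K.T
assert torch.allclose(W_QK, W_QK.T)  # symmetric at initialization
```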
matteosaponati.bsky.social
4/ We validate our analysis empirically, showing that these patterns consistently emerge across different models and input modalities:

- ModernBERT, GPT, LLaMA3, Mistral, etc.
- Text, vision, and audio models
- Different model sizes and architectures

⬇️
matteosaponati.bsky.social
3/ We demonstrate that the self-attention matrices behave differently under different training objectives (see the sketch below):

- Bidirectional training (BERT-style) induces symmetric self-attention structures.
- Autoregressive training (GPT-style) induces directional structures with column dominance.

⬇️
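To make "symmetric" vs "column-dominant" concrete, here is a minimal numpy sketch of one way to quantify both properties of the combined matrix W_QK = W_Q W_K^T. The score definitions below are illustrative choices of mine, not the exact metrics used in the paper:

```python
import numpy as np

def symmetry_score(M):
    """Fraction of M's energy in its symmetric part.

    Decompose M = S + A with S = (M + M.T) / 2 and A = (M - M.T) / 2.
    Returns ||S||_F^2 / (||S||_F^2 + ||A||_F^2): 1.0 for a perfectly
    symmetric matrix, about 0.5 for an unstructured random one.
    """
    S, A = 0.5 * (M + M.T), 0.5 * (M - M.T)
    s2, a2 = np.sum(S**2), np.sum(A**2)
    return s2 / (s2 + a2)

def column_dominance(M):
    """Variance of column means relative to variance of row means.

    Values well above 1 indicate that a few columns carry most of the
    structure; values near 1 indicate no clear directional preference.
    """
    return np.var(M.mean(axis=0)) / (np.var(M.mean(axis=1)) + 1e-12)

# toy example: random query/key projections and their combined matrix
rng = np.random.default_rng(0)
W_Q = rng.standard_normal((768, 64))
W_K = rng.standard_normal((768, 64))
W_QK = W_Q @ W_K.T

print(symmetry_score(W_QK))    # roughly 0.5 for this random, untrained example
print(column_dominance(W_QK))  # roughly 1 for this random, untrained example
```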
matteosaponati.bsky.social
2/ Self-attention is the backbone of Transformer models, but how does training shape the internal structure of self-attention matrices?

We introduce a mathematical framework to study these matrices and uncover fundamental differences in how they are updated during gradient descent.

⬇️
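To make the object of study concrete: the attention logits depend on W_Q and W_K only through their product, so one can analyze the combined matrix W_QK = W_Q W_K^T directly. A toy PyTorch check of this standard identity (variable names are mine, not from the paper):

```python
import torch

torch.manual_seed(0)
T, d_model, d_head = 5, 16, 4        # toy sizes
X = torch.randn(T, d_model)          # one sequence of token embeddings
W_Q = torch.randn(d_model, d_head)   # query projection
W_K = torch.randn(d_model, d_head)   # key projection

# usual formulation: project to queries/keys, then take inner products
logits_qk = (X @ W_Q) @ (X @ W_K).T

# equivalent bilinear form with the combined query-key matrix
W_QK = W_Q @ W_K.T
logits_combined = X @ W_QK @ X.T

assert torch.allclose(logits_qk, logits_combined, atol=1e-5)
```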
matteosaponati.bsky.social
1/ I am very excited to announce that our paper "The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training" is available on arXiv 💜

arxiv.org/abs/2502.10927

How is information encoded in self-attention matrices? How can we interpret it?

⬇️
The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
Self-attention is essential to Transformer architectures, yet how information is embedded in the self-attention matrices and how different objective functions impact this process remains unclear. We p...
arxiv.org
matteosaponati.bsky.social
Hey Dan! I would like to be added :)