Ambroise Odonnat
@ambroiseodt.bsky.social
84 followers 130 following 36 posts
Ph.D. student in Machine Learning at Inria. Website: https://ambroiseodt.github.io/ Blog: https://logb-research.github.io
Pinned
ambroiseodt.bsky.social
🚨So, you want to predict your model's performance at test time?🚨

💡Our NeurIPS 2024 paper proposes 𝐌𝐚𝐍𝐨, a training-free and SOTA approach!

📑 arxiv.org/pdf/2405.18979
🖥️https://github.com/Renchunzi-Xie/MaNo

1/🧵(A surprise at the end!)
Reposted by Ambroise Odonnat
rflamary.bsky.social
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities, has been published in TMLR today 🚀. It was a huge team effort to design (and publish) an open-source, fully reproducible DA benchmark 🧵1/n. openreview.net/forum?id=k9F...
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods...
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many...
openreview.net
ambroiseodt.bsky.social
🚀 We are happy to organize the BERT²S workshop @neuripsconf.bsky.social 2025 on Recent Advances in Time Series Foundation Models.
🌐 berts-workshop.github.io
📜Submit by August 22
🎓Speakers and panelists: Chenghao Liu, Mingsheng Long, Zoe Piran, Danielle C. Maddix, Ameet Talwalkar, Qingsong Wen
ambroiseodt.bsky.social
🚀 Very happy to be presenting Large Language Models as Markov Chains at Cohere Labs on June 19th at 6 pm CET (Paris time)!!

Huge thanks to Andrej Jovanović @cohere.com @cohereforai.bsky.social for the invitation 🤗

Paper: arxiv.org/pdf/2410.02724
Learn more: cohere.com/events/Coher...
Reposted by Ambroise Odonnat
tgnassou.bsky.social
Skada Sprint Alert: Contribute to Domain Adaptation in Python

📖 Machine learning models often fail when the data distribution changes between training and testing. That’s where Domain Adaptation comes in — helping models stay reliable across domains.
ambroiseodt.bsky.social
🤗Thanks a lot @haeggee.bsky.social and @mjaggi.bsky.social for having me in the MLO group at EPFL @icepfl.bsky.social to present "Large Language Models as Markov Chains".

Slides are available on my website (link in thread).

🎉 New experiments with Llama and Gemma models in the updated paper!
ambroiseodt.bsky.social
🤗 Very happy to have (humbly) contributed to this work!

This is a collab with the usual open-source suspects from Inria, @polytechniqueparis.bsky.social and @univparissaclay.bsky.social.

Check it out if you are interested in open-source reproducible research 😇
tgnassou.bsky.social
🚀 I’m pleased to announce a new preprint!

"SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities"

📢 Check it out & contribute!
📜 Paper: arxiv.org/abs/2407.11676
💻 Code: github.com/scikit-adapt...
Reposted by Ambroise Odonnat
ozekri.bsky.social
🚀 Policy gradient methods like DeepSeek’s GRPO are great for finetuning LLMs via RLHF.

But what happens when we swap autoregressive generation for discrete diffusion, a rising paradigm promising faster & more controllable LLMs?

Introducing SEPO !

📑 arxiv.org/pdf/2502.01384

🧵👇
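
Background, not from the post and not SEPO itself: a minimal PyTorch sketch of the vanilla score-function (REINFORCE) estimator that policy-gradient methods such as GRPO build on, shown on a toy categorical policy.

```python
# Background sketch, NOT SEPO: vanilla score-function (REINFORCE) policy gradient
# on a toy 5-action categorical policy. Policy-gradient fine-tuning methods
# (PPO, GRPO, ...) build on this estimator: grad J = E[ R * grad log pi(a) ].
import torch

torch.manual_seed(0)
logits = torch.zeros(5, requires_grad=True)      # policy parameters over 5 actions
optimizer = torch.optim.SGD([logits], lr=0.1)

def reward(action: torch.Tensor) -> float:
    return 1.0 if action.item() == 3 else 0.0    # toy reward: prefer action 3

for _ in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    loss = -reward(action) * dist.log_prob(action)   # minimize -R * log-prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))              # probability mass concentrates on action 3
```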
ambroiseodt.bsky.social
Finally, I can't thank you enough, Wes and @viviencabannes.bsky.social, for this collab: you are a rare combination of super-smart and fun to work with!

Hopefully, more to come soon🤠

"Moi, si je devais résumer ma vie aujourd’hui avec vous, je dirais que c’est d’abord des rencontres."
ambroiseodt.bsky.social
We want to thank Elvis Dohmatob, Eshaan Nichani, @giupaolo.bsky.social , Faniriana Rakoto Endor, and Ievgen Redko for fruitful discussions during the elaboration of this work 😇
ambroiseodt.bsky.social
On the theoretical side, we show that clustering heads can be learned via gradient descent and provide insights into the two-stage learning observed in practice.
6/🧵
ambroiseodt.bsky.social
We investigate loss spikes and suggest mitigation strategies that could lead to more stable training. We also look into the transferability of circuits, showcasing the usefulness of curriculum learning and data curation.
5/🧵
ambroiseodt.bsky.social
In the second, we unveil "𝑪𝒍𝒖𝒔𝒕𝒆𝒓𝒊𝒏𝒈 𝑯𝒆𝒂𝒅𝒔", circuits that learn the invariance of the task. Their training dynamics unfold in two phases: 1) clustering of the attention embeddings according to the task's invariance and 2) classifier fitting.
4/🧵
ambroiseodt.bsky.social
In the first paper, we show how gradient descent (GD) reinforces useful circuits in Transformers while pruning others, creating sub-circuits that help solve complex tasks by breaking them down into intermediate reasoning steps.

3/🧵
ambroiseodt.bsky.social
We consider the 𝒔𝒑𝒂𝒓𝒔𝒆 𝒎𝒐𝒅𝒖𝒍𝒂𝒓 𝒂𝒅𝒅𝒊𝒕𝒊𝒐𝒏 problem, where the inputs are sequences of L tokens in the ring of integers modulo p and the corresponding targets are the sum of the first k terms modulo p. Formally, we aim to learn the mapping f : (x_1, …, x_L) ↦ x_1 + … + x_k mod p, with each x_i in {0, …, p − 1}.

2/🧵
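
Not from the thread: a minimal NumPy sketch of the data-generating process defined above, together with a check of one of the task's invariances (to the last L − k tokens).

```python
# Toy sketch, not the paper's code: generate sparse modular addition data
# (sequences of L tokens in Z/pZ, target = sum of the first k tokens mod p)
# and check the task's invariance to the last L - k tokens.
import numpy as np

p, L, k = 11, 8, 3                     # modulus, sequence length, number of summed tokens
rng = np.random.default_rng(0)

def sample_batch(n: int):
    x = rng.integers(0, p, size=(n, L))            # token sequences in {0, ..., p-1}
    y = x[:, :k].sum(axis=1) % p                   # target: sparse modular sum
    return x, y

x, y = sample_batch(4)

# Invariance check: resampling the tokens after position k leaves the target unchanged.
x_pert = x.copy()
x_pert[:, k:] = rng.integers(0, p, size=(len(x), L - k))
assert np.array_equal(y, x_pert[:, :k].sum(axis=1) % p)
```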
ambroiseodt.bsky.social
🚀Proud to share our work on the training dynamics in Transformers with Wassim Bouaziz & @viviencabannes.bsky.social @Inria @MetaAI

📝Easing Optimization Paths arxiv.org/pdf/2501.02362 (accepted @ICASSP 2025 🥳)

📝Clustering Heads 🔥https://arxiv.org/pdf/2410.24050

🖥️ github.com/facebookrese...

1/🧵
ambroiseodt.bsky.social
🎤Presenting our work on Unsupervised Accuracy Estimation at #NeurIPS2024 this week!

✋🏾Poster Session 4 West - on Thu. at 4:30 pm

📍 Poster #4310 - East Exhibit Hall A-C

DM me if you'd like to chat :)
ambroiseodt.bsky.social
Check out the new version of this awesome domain adaptation library! So nice to work with such good people 🤗
tgnassou.bsky.social
🚀 Skada v0.4.0 is out!

Skada is an open-source Python library built for domain adaptation (DA), helping machine learning models to adapt to distribution shifts.
Github: github.com/scikit-adapt...
Doc: scikit-adaptation.github.io
DOI: doi.org/10.5281/zeno...
Installation: `pip install skada`
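
Not from the post: a minimal usage sketch assuming Skada's scikit-learn-style API. The names CORALAdapter and make_da_pipeline and the sample_domain convention below are taken from my reading of the docs and should be checked against the documentation linked above.

```python
# Minimal sketch, NOT from the post: how a Skada DA pipeline is typically set up.
# The names CORALAdapter / make_da_pipeline and the sample_domain convention
# (positive = source, negative = target, target labels masked with -1) are
# assumptions based on the scikit-adaptation docs; verify against the links above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from skada import CORALAdapter, make_da_pipeline

rng = np.random.default_rng(0)

# Toy source/target data with a distribution shift.
X_src = rng.normal(0.0, 1.0, size=(200, 2))
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(0.5, 1.5, size=(200, 2))  # shifted target domain, unlabeled

X = np.concatenate([X_src, X_tgt])
y = np.concatenate([y_src, -np.ones(len(X_tgt), dtype=int)])        # -1 marks unlabeled target
sample_domain = np.concatenate([np.ones(len(X_src)), -np.ones(len(X_tgt))])

# Align source to target (CORAL), then fit a standard classifier.
pipe = make_da_pipeline(CORALAdapter(), LogisticRegression())
pipe.fit(X, y, sample_domain=sample_domain)
preds = pipe.predict(X_tgt)  # some adapters may also need sample_domain at predict time
```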
ambroiseodt.bsky.social
Hi @vickiboykis.com, thanks for your interest. Don't hesitate to reach out if you have any questions about the paper; @ozekri.bsky.social and I would be happy to help :)
ambroiseodt.bsky.social
Ahah, thanks, still a lot to learn before that 😅
ambroiseodt.bsky.social
🤗This is joint work with Renchunzi Xie, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, and Bo An.

Finally, I want to thank @ramealexandre.bsky.social and Youssef Attia El Hili for fruitful discussions during the elaboration of this work.

🧵/🧵
ambroiseodt.bsky.social
🥳Finally, the awaited surprise!
Our work includes a result akin to that of
@petar-v.bsky.social in “softmax is not enough (for sharp out-of-distribution)” (arxiv.org/pdf/2410.01104). We discuss its implications in the context of unsupervised accuracy estimation.

12/🧵
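
Not from the thread: a toy numerical illustration of the dispersion effect that paper formalizes. With logits bounded in [-B, B], the largest softmax probability is at most 1 / (1 + (n − 1)·e^(−2B)), so it shrinks towards 0 as the number of items n grows.

```python
# Toy illustration, not from the paper: softmax over bounded logits cannot stay sharp.
# With logits in [-B, B], the max probability is at most 1 / (1 + (n - 1) * exp(-2B)).
import numpy as np

B = 5.0                                 # bound on the logits
for n in [10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    logits = np.full(n, -B)
    logits[0] = B                       # best case: one logit at +B, the rest at -B
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    print(f"n = {n:>9,d}   max softmax prob = {probs.max():.4f}")   # decays towards 0
```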