Olivier Hénaff
@olivierhenaff.bsky.social
93 followers 7 following 5 posts
Working on something new, combining active, multimodal, and memory-augmented learning. Formerly Senior Staff Scientist @GoogleDeepMind, PhD @NYU, @Polytechnique
Pinned
olivierhenaff.bsky.social
After an amazing 6 years at Google DeepMind, I'm thrilled to announce that I'll be starting a new project at the intersection of multimodal foundation modeling, data curation, and human behavior.

If this is of interest to you please reach out!
olivierhenaff.bsky.social
Looking forward to sharing more about what's next, but in the meantime would like to thank Google DeepMind for creating a fantastic environment for foundational research. Excited to now bridge the gap between research and transformational societal impact!
olivierhenaff.bsky.social
I'm extremely grateful for the amazing collaborations at GDM, spanning multimodal SSL, new evaluations, memory-augmented transformers, and data curation.
olivierhenaff.bsky.social
Active data curation keeps on giving.

This time we enabled the distillation of large multimodal models into much smaller ones, simply by choosing the data they learn from.

Sets a new state of the art in small multimodal models that are very efficient for inference!
vishaalurao.bsky.social
🚀New Paper: Active Data Curation Effectively Distills Multimodal Models
arxiv.org/abs/2411.18674

Smol models are all the rage these days & knowledge distillation (KD) is key for model compression!

We show how data curation can act as an effective distillation method, yielding SoTA FLOP-efficient {C/Sig}LIPs!!
🧵👇
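The idea in the two posts above (a larger reference model guiding which data a small model learns from) can be illustrated with a rough sketch. This is not the paper's implementation: the model interfaces, the learnability score, and the keep_ratio parameter are assumptions made purely for illustration; see arxiv.org/abs/2411.18674 for the actual method.

import torch
import torch.nn.functional as F

def per_example_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric CLIP-style InfoNCE loss, one value per image-text pair.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets, reduction="none")
    loss_t = F.cross_entropy(logits.t(), targets, reduction="none")
    return 0.5 * (loss_i + loss_t)

@torch.no_grad()
def select_training_examples(student, teacher, images, texts, keep_ratio=0.5):
    # Hypothetical interface: both models map (images, texts) to a pair of
    # embedding matrices. Score each example by how much harder it is for
    # the student than for the teacher ("learnability"), and keep the top
    # fraction of the batch for the next training step.
    s_img, s_txt = student(images, texts)
    t_img, t_txt = teacher(images, texts)
    learnability = (per_example_contrastive_loss(s_img, s_txt)
                    - per_example_contrastive_loss(t_img, t_txt))
    k = max(1, int(keep_ratio * images.size(0)))
    idx = torch.topk(learnability, k).indices
    return images[idx], texts[idx]

The intuition, as the thread describes it, is that repeatedly selecting batches with a capable reference model lets that model's knowledge shape the small student purely through the choice of training data.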
Reposted by Olivier Hénaff
confusezius.bsky.social
This was an insightful project I worked on at Google DeepMind alongside the amazing @zeynepakata.bsky.social , @dimadamen.bsky.social , @ibalazevic.bsky.social and @olivierhenaff.bsky.social:

👉Language-image pretraining with CLIP or SigLIP is widely used due to strong zero-shot transfer, but ....
Reposted by Olivier Hénaff
ibalazevic.bsky.social
We maintain strong zero-shot transfer of CLIP / SigLIP across model size and data scale, while achieving up to 4x greater few-shot sample efficiency and up to +16% performance gains!

Fun project with @confusezius.bsky.social, @zeynepakata.bsky.social, @dimadamen.bsky.social and @olivierhenaff.bsky.social.
confusezius.bsky.social
🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?

Turns out you can, and here is how: arxiv.org/abs/2411.15099

Really excited to share this work on multimodal pretraining as my first Bluesky entry!

🧵 A short and hopefully informative thread:
olivierhenaff.bsky.social
Beyond zero-shot generalization, few-shot *adaptation* is critical for many applications.

We find simple changes to multimodal pretraining are sufficient to yield outsized gains on a wide range of few-shot tasks.

Congratulations @confusezius.bsky.social on a very successful internship!
confusezius.bsky.social
🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?

Turns out you can, and here is how: arxiv.org/abs/2411.15099

Really excited to share this work on multimodal pretraining as my first Bluesky entry!

🧵 A short and hopefully informative thread: