Mehdi S. M. Sajjadi
msajjadi.com
@msajjadi.com
Research Scientist
Tech Lead & Manager
Google DeepMind
Looking forward to it!
EurIPS is coming! 📣 Mark your calendar for Dec. 2-7, 2025 in Copenhagen 📅

EurIPS is a community-organized conference where you can present accepted NeurIPS 2025 papers. It is endorsed by @neuripsconf.bsky.social and @nordicair.bsky.social and co-developed by @ellis.eu

eurips.cc
November 2, 2025 at 5:42 AM
Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
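A minimal sketch of the masked auto-encoding recipe on a toy "video" (the shapes, the 90% mask ratio, and the mean-predictor stand-in are all illustrative assumptions, not the paper's 22B-param setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny "video": 4 frames of 8x8 pixels, cut into 2x2 patches.
T, H, W, P = 4, 8, 8, 2
video = rng.standard_normal((T, H, W))
patches = video.reshape(T, H // P, P, W // P, P).transpose(0, 1, 3, 2, 4)
patches = patches.reshape(-1, P * P)          # (num_patches, patch_dim)

# Mask most patches; an encoder would only see the visible ones.
num = patches.shape[0]
mask = rng.permutation(num) < int(0.9 * num)  # True = masked
visible = patches[~mask]

# Stand-in "decoder": predict the mean visible patch for every masked one.
prediction = visible.mean(axis=0, keepdims=True)
loss = np.mean((patches[mask] - prediction) ** 2)
print(f"{mask.sum()} masked / {num} total patches, loss={loss:.3f}")
```

A real model replaces the mean predictor with an encoder-decoder transformer and minimizes the same reconstruction loss on the masked patches only.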
July 10, 2025 at 11:52 AM
Reposted by Mehdi S. M. Sajjadi
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as next-token prediction. For more, see: tap-next.github.io
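The "tracking as next-token prediction" idea can be sketched on a toy grid tokenization (the 16×16 grid and the drift-right dummy predictor are hypothetical; the real TAPNext learns the next-token distribution):

```python
import numpy as np

GRID = 16  # quantize point coordinates onto a 16x16 grid of cells

def point_to_token(x, y):
    return y * GRID + x          # one token per grid cell

def token_to_point(tok):
    return tok % GRID, tok // GRID

# Dummy "model": predicts the point drifting one cell to the right,
# standing in for a learned next-token distribution over cells.
def next_token(history):
    x, y = token_to_point(history[-1])
    return point_to_token(min(x + 1, GRID - 1), y)

# Autoregressively decode a track from a query point, one token per frame.
track = [point_to_token(3, 5)]
for _ in range(4):
    track.append(next_token(track))

print([token_to_point(t) for t in track])
# → [(3, 5), (4, 5), (5, 5), (6, 5), (7, 5)]
```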
April 9, 2025 at 2:04 PM
Generative video diffusion: does a model trained with this objective learn better features than one trained on image generation?

We investigated this question and more in our latest work, please check it out!

*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001
February 13, 2025 at 4:11 PM
Check out @tkipf.bsky.social's post on MooG, the latest in our line of research on self-supervised neural scene representations learned from raw pixels:

SRT: srt-paper.github.io
OSRT: osrt-paper.github.io
RUST: rust-paper.github.io
DyST: dyst-paper.github.io
MooG: moog-paper.github.io
January 13, 2025 at 3:25 PM
TRecViT: A Recurrent Video Transformer
arxiv.org/abs/2412.14294

Causal, with 3× fewer parameters, a 12× smaller memory footprint, and 5× lower FLOPs than (non-causal) ViViT, while matching or outperforming it on Kinetics & SSv2 action recognition.
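The causal, recurrent part can be sketched as a per-token recurrence over frames (the scalar decay and shapes are illustrative assumptions; the actual TRecViT interleaves a learned recurrent unit with spatial self-attention within each frame):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, D = 6, 4, 8            # frames, tokens per frame, channels
frames = rng.standard_normal((T, N, D))

decay = 0.9                  # hypothetical scalar recurrence weight
state = np.zeros((N, D))     # one recurrent state per spatial token
outputs = []
for t in range(T):           # strictly causal: frame t sees only frames <= t
    state = decay * state + (1 - decay) * frames[t]
    outputs.append(state.copy())
outputs = np.stack(outputs)
print(outputs.shape)         # → (6, 4, 8)
```

Because the state is carried forward frame by frame, memory stays constant in video length, which is where the efficiency gains over joint spatio-temporal attention come from.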

Code and checkpoints out soon.
January 10, 2025 at 3:44 PM