Mehdi S. M. Sajjadi
@msajjadi.com
89 followers · 83 following · 5 posts
Research Scientist, Tech Lead & Manager, Google DeepMind · msajjadi.com
msajjadi.com
Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
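A minimal sketch of the masked auto-encoding objective the post describes, assuming MAE-style random masking of spacetime patches with the reconstruction loss computed on masked patches only; the patch size, mask ratio, and the linear encoder/decoder stand-ins are illustrative, not the paper's model:

import numpy as np

rng = np.random.default_rng(0)

# Toy video: (frames, height, width, channels); patchify into 2x16x16 tubes.
T, H, W, C = 4, 32, 32, 3
pt, ph, pw = 2, 16, 16
video = rng.normal(size=(T, H, W, C)).astype(np.float32)

patches = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
patches = patches.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, pt * ph * pw * C)
n, d = patches.shape  # n patches, each flattened to d values

# Mask a large fraction of patches; the encoder sees only the visible ones.
mask_ratio = 0.9
n_masked = int(n * mask_ratio)
perm = rng.permutation(n)
masked, visible = perm[:n_masked], perm[n_masked:]

# Stand-ins for the encoder/decoder: single linear maps (illustrative only).
dim = 64
W_enc = rng.normal(scale=0.02, size=(d, dim)).astype(np.float32)
W_dec = rng.normal(scale=0.02, size=(dim, d)).astype(np.float32)

latents = patches[visible] @ W_enc                  # encode visible patches
mean_latent = latents.mean(axis=0)                  # crude decoding context
pred = np.tile(mean_latent @ W_dec, (n_masked, 1))  # predict masked patches

# Reconstruction loss is computed on masked patches only, as in MAE.
loss = ((pred - patches[masked]) ** 2).mean()
print(f"masked-patch MSE: {loss:.4f}")

The high mask ratio is what makes scaling tractable: the encoder only ever processes the small visible subset of tokens.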
Reposted by Mehdi S. M. Sajjadi
carldoersch.bsky.social
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
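A minimal sketch of that framing only (not TAPNext's actual architecture), assuming point coordinates are discretized into a vocabulary of location tokens and decoded autoregressively, one frame at a time; the grid size and the toy logits function are placeholders:

import numpy as np

rng = np.random.default_rng(1)

# Discretize (x, y) point locations on a coarse grid into a token vocabulary,
# so tracking becomes predicting the next location token, frame by frame.
GRID = 16                      # 16x16 bins -> 256 location tokens
VOCAB = GRID * GRID

def coord_to_token(x, y):
    return int(y * GRID) * GRID + int(x * GRID)   # x, y in [0, 1)

def token_to_coord(t):
    return ((t % GRID) + 0.5) / GRID, ((t // GRID) + 0.5) / GRID

# Stand-in for the sequence model: any function mapping the token history
# (plus video features, omitted here) to next-token logits.
def next_token_logits(history):
    logits = rng.normal(size=VOCAB)
    logits[history[-1]] += 5.0     # toy bias: points tend to stay nearby
    return logits

# Greedy autoregressive decode: one location token per frame.
query = coord_to_token(0.4, 0.6)   # query point in the first frame
track = [query]
for _ in range(7):                  # 7 more frames
    logits = next_token_logits(track)
    track.append(int(np.argmax(logits)))

print([token_to_coord(t) for t in track])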
msajjadi.com
Generative video diffusion: does a model trained with this objective learn better features than one trained on image generation?

We investigated this question and more in our latest work; please check it out!

*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001
[Images: video vs. image diffusion representations; feature visualization for image and video diffusion]
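A minimal sketch of the usual recipe for probing diffusion representations, assuming the standard setup: noise the input to a timestep t, run the frozen denoiser, take an intermediate activation, and fit a linear probe on it; the forward process, stub feature extractor, and toy data here are all illustrative:

import numpy as np

rng = np.random.default_rng(2)

def add_noise(x, t, T=1000):
    # Simple variance-preserving forward process with a linear schedule.
    alpha_bar = 1.0 - t / T
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * rng.normal(size=x.shape)

def denoiser_features(x_noisy, t):
    # Placeholder for an intermediate layer of a frozen diffusion model.
    W = np.ones((x_noisy.size, 32), dtype=np.float32) / x_noisy.size
    return x_noisy.reshape(-1) @ W  # (32,) feature vector

# Toy dataset: 64 "images" with binary labels.
X = rng.normal(size=(64, 8, 8)).astype(np.float32)
y = (X.mean(axis=(1, 2)) > 0).astype(np.float32)

t = 100  # low-noise timestep; feature quality varies strongly with t
feats = np.stack([denoiser_features(add_noise(x, t), t) for x in X])

# Closed-form ridge-regression probe on the frozen features.
lam = 1e-3
w = np.linalg.solve(feats.T @ feats + lam * np.eye(feats.shape[1]), feats.T @ y)
acc = (((feats @ w) > 0.5) == (y > 0.5)).mean()
print(f"linear-probe accuracy: {acc:.2f}")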
msajjadi.com
Check out @tkipf.bsky.social's post on MooG, the latest in our line of research on self-supervised neural scene representations learned from raw pixels:

SRT: srt-paper.github.io
OSRT: osrt-paper.github.io
RUST: rust-paper.github.io
DyST: dyst-paper.github.io
MooG: moog-paper.github.io
msajjadi.com
Authors:
Viorica Pătrăucean, Xu Owen He, Joseph Heyward, Chuhan Zhang, Mehdi S. M. Sajjadi, George-Cristian Muraru, Artem Zholus, Mahdi Karami, Ross Goroshin, Yutian Chen, Simon Osindero, João Carreira, Razvan Pascanu

Original post:
www.linkedin.com/posts/vioric...
msajjadi.com
TRecViT: A Recurrent Video Transformer
arxiv.org/abs/2412.14294

Causal, with 3× fewer parameters, 12× less memory, and 5× lower FLOPs than (non-causal) ViViT, while matching or outperforming it on Kinetics & SSv2 action recognition.

Code and checkpoints out soon.
[Figure: TRecViT architecture]
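A minimal sketch of the alternating pattern the post describes (a linear recurrent unit mixing information over time, self-attention mixing over space), assuming toy diagonal recurrence weights; TRecViT's actual gating, parameterization, and MLP blocks are omitted:

import numpy as np

rng = np.random.default_rng(3)
T, N, D = 8, 16, 32   # frames, spatial tokens per frame, channels
x = rng.normal(size=(T, N, D)).astype(np.float32)

# Time mixing: a diagonal linear recurrent unit (LRU) scanned over frames,
# applied independently to each spatial token. Causal by construction.
def lru_time_mix(x, decay=0.9):
    a = np.full(D, decay, dtype=np.float32)   # diagonal recurrence (toy values)
    h = np.zeros((N, D), dtype=np.float32)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + (1.0 - a) * x[t]          # h_t = a*h_{t-1} + (1-a)*x_t
        out[t] = h
    return out

# Space mixing: softmax self-attention over the tokens within each frame.
def spatial_attention(x):
    Wq, Wk, Wv = (rng.normal(scale=0.1, size=(D, D)).astype(np.float32) for _ in range(3))
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        q, k, v = x[t] @ Wq, x[t] @ Wk, x[t] @ Wv
        att = q @ k.T / np.sqrt(D)
        att = np.exp(att - att.max(axis=-1, keepdims=True))
        att /= att.sum(axis=-1, keepdims=True)
        out[t] = att @ v
    return out

# One TRecViT-style block: recurrence over time, then attention over space.
y = x + lru_time_mix(x)
y = y + spatial_attention(y)
print(y.shape)  # (8, 16, 32)

Because the time mixing is a causal scan rather than attention over all frames, memory grows linearly with sequence length, consistent with the causality and memory savings the post cites.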