@gaoyuezhou.bsky.social
gaoyuezhou.bsky.social
Overall, DINO-WM takes a step toward bridging the gap between task-agnostic world modeling and downstream reasoning and control, offering promising prospects for generic world models in real-world applications.
gaoyuezhou.bsky.social
The object-level and spatial priors in DINOv2 features enable robust scene understanding, which is essential for navigation and manipulation tasks. With these priors, DINO-WM outperforms state-of-the-art world models by 45% in downstream task performance on our hardest tasks.
gaoyuezhou.bsky.social
DINO-WM consists of:

1️⃣ An out-of-the-box DINOv2 model as the observation model.
2️⃣ A causal ViT as the predictor.
3️⃣ An optional decoder, used only for visualization.

DINO-WM plans entirely in latent space, without the need to reconstruct pixel images.
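A minimal sketch of how these pieces might fit together (PyTorch). The module names, sizes, and the one-step rollout below are illustrative assumptions, not the authors' exact configuration; the frozen DINOv2 backbone supplies patch-token latents, and a causal transformer predicts the next latents from the current latents and action:

```python
import torch
import torch.nn as nn


class LatentDynamicsModel(nn.Module):
    """Sketch of a DINO-WM-style world model: frozen DINOv2 patch features
    as the observation latent, a causal transformer as the predictor.
    Hyperparameters and names here are hypothetical."""

    def __init__(self, action_dim: int, embed_dim: int = 384, depth: int = 6):
        super().__init__()
        # Off-the-shelf DINOv2 ViT-S/14 backbone, kept frozen.
        self.encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.encoder.parameters():
            p.requires_grad = False

        # Project actions into the same embedding space as the patch tokens.
        self.action_proj = nn.Linear(action_dim, embed_dim)

        # Causal transformer predictor over (patch latents + action token).
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=6, batch_first=True)
        self.predictor = nn.TransformerEncoder(layer, num_layers=depth)

    @torch.no_grad()
    def encode(self, frames: torch.Tensor) -> torch.Tensor:
        """frames: (B, 3, H, W) -> patch-token latents (B, N, D)."""
        feats = self.encoder.forward_features(frames)
        return feats["x_norm_patchtokens"]

    def predict_next(self, latents: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """One-step latent rollout: current patch latents + action -> predicted next latents."""
        tokens = torch.cat([latents, self.action_proj(action).unsqueeze(1)], dim=1)
        # Standard upper-triangular causal mask so each token attends only to the past.
        n = tokens.size(1)
        causal_mask = torch.triu(
            torch.full((n, n), float("-inf"), device=tokens.device), diagonal=1
        )
        out = self.predictor(tokens, mask=causal_mask)
        # Drop the action token; keep the predicted next-step patch latents.
        return out[:, : latents.size(1), :]
```

Because the decoder is optional, nothing in this loop ever maps latents back to pixels; reconstruction is only needed if you want to visualize imagined rollouts.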
gaoyuezhou.bsky.social
Unlike prior work that couples world-model learning with behavior learning, we train a dynamics-only model and infer actions only at test time. This enables zero-shot goal reaching by reasoning through the dynamics: no expert demonstrations, no rewards, no online interaction.
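Concretely, test-time goal reaching can be posed as planning over the learned latent dynamics. Below is a simple random-shooting planner toward a goal image's latent; the actual optimizer and cost used in the paper may differ, and `plan_to_goal` plus the `encode`/`predict_next` interface are assumptions carried over from the sketch above:

```python
import torch
import torch.nn.functional as F


def plan_to_goal(model, obs_frame, goal_frame, horizon=10, n_samples=256, action_dim=2):
    """Zero-shot goal reaching by searching over action sequences in latent space.

    `model` is assumed to expose `encode(frames)` and `predict_next(latents, action)`.
    This is only an illustrative shooting-style planner, not the paper's exact procedure."""
    with torch.no_grad():
        z0 = model.encode(obs_frame)       # (1, N, D) current latent
        z_goal = model.encode(goal_frame)  # (1, N, D) goal latent

        # Sample candidate action sequences.
        actions = torch.randn(n_samples, horizon, action_dim)
        costs = torch.zeros(n_samples)

        for i in range(n_samples):
            z = z0.clone()
            # Roll the latent dynamics forward under the candidate actions.
            for t in range(horizon):
                z = model.predict_next(z, actions[i, t].unsqueeze(0))
            # Cost: distance between the predicted final latent and the goal latent.
            costs[i] = F.mse_loss(z, z_goal)

        best = costs.argmin()
        return actions[best]  # best action sequence; execute the first action(s)
```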
gaoyuezhou.bsky.social
Can we extend the power of world models beyond just online model-based learning? Absolutely!

We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.