
gaoyuezhou.bsky.social

Huge thanks to all my collaborators who made this project possible: @hengkaipan.bsky.social, @yann-lecun.bsky.social, and @lerrelpinto.com.
We have open-sourced our code and data. For more details, check out the paper and website:
Website: dino-wm.github.io
arXiv: arxiv.org/abs/2411.04983

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

January 31, 2025 at 7:24 PM

Overall, DINO-WM takes a step toward bridging the gap between task-agnostic world modeling on one side and reasoning and control on the other, offering promising prospects for generic world models in real-world applications.

January 31, 2025 at 7:24 PM

The object and spatial understanding priors of DINOv2 features enable robust scene understanding, essential for navigation and manipulation tasks. With this prior, DINO-WM outperforms state-of-the-art world models by 45% in downstream task performance on our hardest tasks.

January 31, 2025 at 7:24 PM

DINO-WM consists of:

1️⃣ An out-of-the-box DINOv2 model as the observation model.
2️⃣ A causal ViT as the predictor.
3️⃣ An optional decoder, used only for visualization.

DINO-WM plans entirely in latent space, without the need to reconstruct pixel images.
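
To make the three-part structure concrete, here is a minimal sketch of planning in latent space. It is not the DINO-WM code: `encode` is a hand-written stand-in for the frozen DINOv2 observation model, `predict` is a hand-written stand-in for the causal ViT predictor, and `rollout` and `latent_cost` are hypothetical helper names. The point is only that the cost is computed between predicted and goal latents, so no decoder is ever called.

```python
from typing import List, Sequence

Vector = List[float]


def encode(observation: Sequence[float]) -> Vector:
    """Stand-in for a frozen, pre-trained observation model.

    Assumption: a fixed toy feature map over a 2-D observation,
    NOT a real vision model.
    """
    x, y = observation
    return [x, y, x + y]


def predict(latent: Vector, action: Sequence[float]) -> Vector:
    """Stand-in for the learned predictor: next latent from latent + action.

    Assumption: hand-written dynamics that happen to match the toy encoder.
    """
    dx, dy = action
    return [latent[0] + dx, latent[1] + dy, latent[2] + dx + dy]


def rollout(z0: Vector, actions: Sequence[Sequence[float]]) -> Vector:
    """Apply the predictor step by step, staying in latent space throughout."""
    z = list(z0)
    for action in actions:
        z = predict(z, action)
    return z


def latent_cost(z: Vector, z_goal: Vector) -> float:
    """Squared distance between two latents; no pixels are reconstructed."""
    return sum((a - b) ** 2 for a, b in zip(z, z_goal))


start = encode([0.0, 0.0])         # latent of the current observation
goal = encode([1.0, 2.0])          # latent of the goal observation
plan = [[0.5, 1.0], [0.5, 1.0]]    # a candidate action sequence
cost = latent_cost(rollout(start, plan), goal)
print(cost)  # 0.0: this candidate reaches the goal latent exactly
```

A decoder would only be needed to turn the predicted latents back into images for inspection; the planning cost above never uses one.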

January 31, 2025 at 7:24 PM

Unlike previous works that couple world model learning with behavior learning, we train a dynamics-only model and infer actions only at test time. This allows zero-shot goal-reaching by reasoning through the dynamics, with no expert demonstrations, no rewards, and no online interactions.
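
One way to picture inferring actions only at test time is a sampling-based planner wrapped around a fixed dynamics model. The sketch below is an illustration under stated assumptions, not the method from the paper: `dynamics` is a toy 2-D point model standing in for a learned world model, and random shooting is an optimizer picked here for brevity (the thread does not say which optimizer DINO-WM uses). No policy, reward function, or demonstration appears anywhere; the only signal is the distance to the goal.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]


def dynamics(state: State, action: Action) -> State:
    """Toy stand-in for a learned dynamics model: the action shifts a 2-D point."""
    return (state[0] + action[0], state[1] + action[1])


def simulate(state: State, actions: List[Action]) -> State:
    """Roll the dynamics forward over a whole action sequence."""
    for action in actions:
        state = dynamics(state, action)
    return state


def goal_cost(state: State, goal: State) -> float:
    """Squared distance to the goal; the only objective the planner sees."""
    return (state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2


def plan_random_shooting(
    state: State,
    goal: State,
    horizon: int = 3,
    n_samples: int = 2000,
    seed: int = 0,
) -> Tuple[List[Action], float]:
    """Sample action sequences and keep the one whose rollout ends nearest the goal."""
    rng = random.Random(seed)
    best_actions: List[Action] = []
    best_cost = float("inf")
    for _ in range(n_samples):
        actions = [
            (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            for _ in range(horizon)
        ]
        cost = goal_cost(simulate(state, actions), goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


start: State = (0.0, 0.0)
goal: State = (1.0, -0.5)
actions, cost = plan_random_shooting(start, goal)
print(f"start cost {goal_cost(start, goal):.3f}, planned cost {cost:.5f}")
```

Because the goal enters only through `goal_cost`, the same dynamics model can be reused for a different goal without retraining, which is the sense in which goal-reaching is zero-shot here.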

January 31, 2025 at 7:24 PM
