Lightnews — Scholar-powered news

@gaoyuezhou.bsky.social

gaoyuezhou.bsky.social

The object and spatial understanding priors of DINOv2 features enable robust scene understanding, which is essential for navigation and manipulation tasks. With this prior, DINO-WM outperforms state-of-the-art world models by 45% in downstream performance on our hardest tasks.

January 31, 2025 at 7:24 PM

gaoyuezhou.bsky.social

DINO-WM consists of:

1️⃣ An out-of-the-box DINOv2 model as the observation model.
2️⃣ A causal ViT as the predictor.
3️⃣ An optional decoder, used only for visualization.

DINO-WM plans entirely in latent space, without the need to reconstruct pixel images.

January 31, 2025 at 7:24 PM
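
The three components listed in the post above can be sketched roughly as follows. This is a toy illustration only, not the authors' code: `LatentWorldModel`, `toy_encode`, and `toy_predict` are hypothetical names, the encoder is a stand-in for a frozen DINOv2 model, and the predictor is a stand-in for a causal ViT. The point is the shape of the interface: the rollout stays in latent space, and the decoder can be left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def toy_encode(observation: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a frozen, pre-trained observation model."""
    return [float(x) for x in observation]


def toy_predict(latents: List[Vector], actions: List[Vector]) -> Vector:
    """Hypothetical stand-in for a causal predictor.

    It only receives latents and actions up to the current step, and here
    it simply adds the latest action to the latest latent.
    """
    return [z + a for z, a in zip(latents[-1], actions[-1])]


@dataclass
class LatentWorldModel:
    encode: Callable[[Sequence[float]], Vector]
    predict: Callable[[List[Vector], List[Vector]], Vector]
    # Optional: only needed to visualize predictions, never for planning.
    decode: Optional[Callable[[Vector], Vector]] = None

    def rollout(self, observation: Sequence[float], actions: List[Vector]) -> List[Vector]:
        """Predict future latents for an action sequence without decoding pixels."""
        latents = [self.encode(observation)]
        for t in range(len(actions)):
            # Causal: step t sees only the history up to and including t.
            latents.append(self.predict(latents[: t + 1], actions[: t + 1]))
        return latents


model = LatentWorldModel(encode=toy_encode, predict=toy_predict)
latents = model.rollout([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]])
print(latents)
```

Because `decode` defaults to `None`, the rollout above never reconstructs an image; a decoder would only be attached when the predicted latents need to be inspected visually.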

gaoyuezhou.bsky.social

Unlike previous works that couple world model learning with behavior learning, we train a dynamics-only model and infer actions only at test time. This allows zero-shot goal-reaching by reasoning through the dynamics—no expert demonstrations, no rewards, no online interactions.

January 31, 2025 at 7:24 PM
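
One way to picture inferring actions only at test time is the sketch below. It assumes a dynamics model that is already trained and searches for an action sequence whose predicted final latent is close to a goal latent. The posts do not name an optimizer; the cross-entropy method used here is an assumption chosen for illustration, and `toy_predict`, `goal_cost`, and `cem_plan` are hypothetical names. The toy dynamics also assume the action has the same dimension as the latent.

```python
import random
from typing import Callable, List

Vector = List[float]
Plan = List[Vector]  # one action vector per time step


def toy_predict(latent: Vector, action: Vector) -> Vector:
    """Hypothetical stand-in for a learned latent dynamics model."""
    return [z + a for z, a in zip(latent, action)]


def goal_cost(predict: Callable[[Vector, Vector], Vector],
              start: Vector, plan: Plan, goal: Vector) -> float:
    """Roll the plan out in latent space and score the final latent against the goal."""
    latent = start
    for action in plan:
        latent = predict(latent, action)
    return sum((z - g) ** 2 for z, g in zip(latent, goal))


def cem_plan(predict: Callable[[Vector, Vector], Vector],
             start: Vector, goal: Vector, horizon: int = 3,
             samples: int = 64, elites: int = 8, iters: int = 10,
             seed: int = 0) -> Plan:
    """Cross-entropy method: sample plans, keep the best, refit the sampler."""
    rng = random.Random(seed)
    dim = len(start)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[1.0] * dim for _ in range(horizon)]
    for _ in range(iters):
        candidates = [
            [[rng.gauss(mean[t][d], std[t][d]) for d in range(dim)]
             for t in range(horizon)]
            for _ in range(samples)
        ]
        # No reward function or demonstrations: the only signal is the
        # distance between the predicted final latent and the goal latent.
        candidates.sort(key=lambda plan: goal_cost(predict, start, plan, goal))
        best = candidates[:elites]
        for t in range(horizon):
            for d in range(dim):
                values = [plan[t][d] for plan in best]
                mean[t][d] = sum(values) / elites
                std[t][d] = (sum((v - mean[t][d]) ** 2 for v in values) / elites) ** 0.5
    return mean


start = [0.0, 0.0]
goal = [1.0, -0.5]
plan = cem_plan(toy_predict, start, goal)
idle = [[0.0, 0.0] for _ in range(3)]
print(goal_cost(toy_predict, start, plan, goal), goal_cost(toy_predict, start, idle, goal))
```

In this sketch the dynamics model is never updated during planning; only the action sequence is optimized, which is what makes goal-reaching possible without rewards or further interaction.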

gaoyuezhou.bsky.social

Can we extend the power of world models beyond just online model-based learning? Absolutely!

We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.

January 31, 2025 at 7:24 PM
