1️⃣An out-of-the-box DINOv2 model as the observation model.
2️⃣A causal ViT as the predictor.
3️⃣A decoder that is optional for visualization.
DINO-WM plans entirely in latent space, without the need to reconstruct pixel images.
1️⃣An out-of-the-box DINOv2 model as the observation model.
2️⃣A causal ViT as the predictor.
3️⃣A decoder that is optional for visualization.
DINO-WM plans entirely in latent space, without the need to reconstruct pixel images.
We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.
We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.