Cansu Sancaktar
@cansusancaktar.bsky.social
72 followers 44 following 8 posts
PhD Student @ Max Planck Institute for Intelligent Systems & University of Tübingen | Working on intrinsically motivated open-ended reinforcement learning 🤖
Reposted by Cansu Sancaktar
gmartius.bsky.social
Sergey Levine was just presenting at the Exploration in AI workshop @ #ICML2025 and made the case that exploration needs to be grounded, and that VLMs are a good source ;-) Check out our paper below
👇
cansusancaktar.bsky.social
Want to find out more about SENSEI?

🗣️ ICML Poster: West Exhibition Hall, 16 Jul, 11 a.m. PDT, No. W-707
📜arxiv.org/abs/2503.01584
🌐sites.google.com/view/sensei-paper

Work done with @cgumbsch.bsky.social (co-first), @zadaianchuk.bsky.social, @pavelkolevbg.bsky.social and @gmartius.bsky.social

8/8
cansusancaktar.bsky.social
SENSEI can also guide exploration in combination with task rewards (sketch below). When playing Pokémon Red from pixels, SENSEI outperforms Dreamer (pure task rewards) and Plan2Explore. Only SENSEI obtains the first Gym Badge within 2M steps of exploration 🥇
7/8
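A hedged one-liner for the idea in the post above: an intrinsic exploration bonus can be mixed with the task reward via a trade-off weight. The additive form and the weight beta are assumptions for illustration, not necessarily the exact combination used in the paper.

```python
# Hedged sketch: guide exploration with task rewards by adding a scaled
# intrinsic bonus. The additive form and beta are illustrative assumptions.
def combined_reward(task_r: float, intrinsic_r: float, beta: float = 0.1) -> float:
    """Task reward plus a scaled intrinsic exploration bonus."""
    return task_r + beta * intrinsic_r
```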
cansusancaktar.bsky.social
The agent learns a world model during exploration that can later be re-used to solve downstream tasks. We demonstrate more sample-efficient policy learning with SENSEI compared to exploration via Plan2Explore.

6/8
cansusancaktar.bsky.social
By combining semantic exploration with epistemic uncertainty, the agent unlocks a variety of interesting behaviors during task-free exploration. For example, in Robodesk the agent focuses on interacting with all available objects 🦾
5/8
cansusancaktar.bsky.social
To continuously push the frontier of experience, we combine semantic rewards with epistemic uncertainty in an adaptive go-explore strategy (sketch below). The agent first tries to reach interesting situations (🔝 semantic reward) and then tries new things from there (🔝 uncertainty).
4/8
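One way the "go" and "explore" phases described above could be combined, sketched under the assumptions that the semantic reward comes from the distilled VLM reward and that epistemic uncertainty is estimated via ensemble disagreement; the switching rule (a running quantile threshold) is an illustrative assumption, not necessarily the paper's exact scheme.

```python
# Hedged sketch of a go-explore-like schedule: follow the semantic reward until
# the agent reaches an "interesting" region, then switch to maximizing ensemble
# disagreement. Threshold rule and uncertainty estimate are illustrative assumptions.
import numpy as np

def ensemble_disagreement(predictions: np.ndarray) -> np.ndarray:
    """Epistemic uncertainty as variance across an ensemble of world-model heads.

    predictions: (n_models, batch, feature_dim) predicted next latent states.
    Returns a (batch,) disagreement score.
    """
    return predictions.var(axis=0).mean(axis=-1)

def update_mode(semantic_r: np.ndarray, history: list, quantile: float = 0.9) -> np.ndarray:
    """Switch a sample to 'explore' mode once its semantic reward exceeds a running quantile."""
    history.extend(semantic_r.tolist())
    threshold = np.quantile(history, quantile)
    return semantic_r >= threshold

def intrinsic_reward(semantic_r: np.ndarray,
                     disagreement: np.ndarray,
                     explore_mode: np.ndarray) -> np.ndarray:
    """Per-step intrinsic reward: semantic reward in 'go' mode, uncertainty in 'explore' mode."""
    return np.where(explore_mode, disagreement, semantic_r)

# Example usage with placeholder values.
history: list = []
sem_r = np.random.rand(16)          # stand-in for the distilled semantic reward
preds = np.random.randn(5, 16, 32)  # 5 ensemble heads predicting 32-dim latents
mode = update_mode(sem_r, history)
r_int = intrinsic_reward(sem_r, ensemble_disagreement(preds), mode)
```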
cansusancaktar.bsky.social
How do we get a signal for meaningful behavior? 🤔
Our approach is to use the human priors found in foundation models. We extend MOTIF to VLMs: a VLM compares pairs of observations collected through self-supervised exploration, and its rankings are distilled into a reward function (sketch below).
3/8
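A minimal sketch of the distillation step described above, assuming MOTIF-style pairwise preferences from a VLM and a Bradley-Terry loss; this is an illustration, not the SENSEI implementation, and all names, shapes, and hyperparameters are placeholders.

```python
# Hedged sketch of MOTIF-style reward distillation: a VLM labels which of two
# observations looks more "interesting", and a small reward network is trained
# on those preferences with a Bradley-Terry loss. Placeholder data throughout.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps an observation embedding to a scalar semantic reward."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def preference_loss(r_a: torch.Tensor, r_b: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: label = 1 if the VLM preferred obs_a, else 0."""
    logits = r_a - r_b
    return nn.functional.binary_cross_entropy_with_logits(logits, label)

# Dummy training loop on placeholder data standing in for VLM-annotated pairs.
obs_dim = 128
reward_net = RewardNet(obs_dim)
opt = torch.optim.Adam(reward_net.parameters(), lr=3e-4)
for _ in range(100):
    obs_a, obs_b = torch.randn(64, obs_dim), torch.randn(64, obs_dim)
    vlm_prefers_a = torch.randint(0, 2, (64,)).float()  # stand-in for VLM judgments
    loss = preference_loss(reward_net(obs_a), reward_net(obs_b), vlm_prefers_a)
    opt.zero_grad(); loss.backward(); opt.step()
```

At exploration time, the frozen reward network would then score new observations to provide the semantic intrinsic reward.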
cansusancaktar.bsky.social
Intrinsically motivated exploration faces a chicken-and-egg problem: how do you know what’s worth exploring before trying it out and experiencing the consequences?
Children solve this by observing and imitating adults. We bring such semantic exploration to artificial agents.
2/8
cansusancaktar.bsky.social
✨Introducing SENSEI✨ We bring semantically meaningful exploration to model-based RL using VLMs.

With intrinsic rewards for novel yet useful behaviors, SENSEI showcases strong exploration in MiniHack, Pokémon Red & Robodesk.

Accepted at ICML 2025🎉

Joint work with @cgumbsch.bsky.social
🧵