Vivek Myers
@vivekmyers.bsky.social
130 followers 68 following 24 posts
PhD student @Berkeley_AI · reinforcement learning, AI, robotics
Reposted by Vivek Myers
dataonbrainmind.bsky.social
🚨 Deadline Extended 🚨
The submission deadline for the Data on the Brain & Mind Workshop (NeurIPS 2025) has been extended to Sep 8 (AoE)! 🧠✨
We invite you to submit your findings or tutorials via the OpenReview portal:
openreview.net/group?id=Neu...
NeurIPS 2025 Workshop DBM
Welcome to the OpenReview homepage for NeurIPS 2025 Workshop DBM
openreview.net
Reposted by Vivek Myers
dataonbrainmind.bsky.social
📢 10 days left to submit to the Data on the Brain & Mind Workshop at #NeurIPS2025!

📝 Call for:
• Findings (4 or 8 pages)
• Tutorials

If you’re submitting to ICLR or NeurIPS, consider submitting here too—and highlight how to use a cog neuro dataset in our tutorial track!
🔗 data-brain-mind.github.io
Data on the Brain & Mind
data-brain-mind.github.io
Reposted by Vivek Myers
dataonbrainmind.bsky.social
🚨 Excited to announce our #NeurIPS2025 Workshop: Data on the Brain & Mind

📣 Call for: Findings (4- or 8-page) + Tutorials tracks

🎙️ Speakers include @dyamins.bsky.social @lauragwilliams.bsky.social @cpehlevan.bsky.social

🌐 Learn more: data-brain-mind.github.io
Reposted by Vivek Myers
raj-ghugare.bsky.social
Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models.

Are NFs fundamentally limited?
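For readers less familiar with the "exact likelihoods" point: a flow gives a closed-form density via the change-of-variables formula. A minimal sketch, assuming a single affine layer and a standard normal base, both chosen purely for illustration:

```python
# Toy illustration (not from the thread): exact log-likelihood of a one-layer
# affine normalizing flow via the change-of-variables formula,
#   log p_X(x) = log p_Z(f(x)) + log |det df/dx|.
import numpy as np

def affine_flow_logprob(x, scale, shift):
    """log p(x) under z = scale * x + shift with a standard normal base."""
    z = scale * x + shift                      # forward map f(x)
    log_det = np.sum(np.log(np.abs(scale)))    # log |det Jacobian| of f
    base_logprob = -0.5 * np.sum(z**2) - 0.5 * len(z) * np.log(2 * np.pi)
    return base_logprob + log_det

x = np.array([0.3, -1.2])
print(affine_flow_logprob(x, scale=np.array([2.0, 0.5]), shift=np.array([0.1, 0.0])))
```

Maximizing this log-likelihood over expert data is the imitation-learning use the post alludes to; sampling only needs the inverse map, which is equally cheap for such flows.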
vivekmyers.bsky.social
How can agents trained to reach (temporally) nearby goals generalize to attain distant goals?

Come to our #ICLR2025 poster now to discuss 𝘩𝘰𝘳𝘪𝘻𝘰𝘯 𝘨𝘦𝘯𝘦𝘳𝘢𝘭𝘪𝘻𝘢𝘵𝘪𝘰𝘯!

w/ @crji.bsky.social and @ben-eysenbach.bsky.social

📍Hall 3 + Hall 2B #637
Reposted by Vivek Myers
aliday.bsky.social
🚨Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of the RL agent’s state🤖=external state🌎+past experience🧠. Rewards based on potentials over the learning agent’s state provably avoid reward hacking!🧵
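The "potentials" here build on classical potential-based shaping (Ng et al., 1999). A minimal sketch of that classical construction, with a placeholder distance-based potential rather than anything from the paper:

```python
# Minimal sketch of classical potential-based reward shaping (Ng et al., 1999);
# the potential function below is a stand-in, not the paper's.
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Add the shaping term F(s, s') = gamma * Phi(s') - Phi(s).

    Because F telescopes along any trajectory, the optimal policy under
    r + F is the same as under r alone -- the sense in which potential-based
    terms provably avoid reward hacking.
    """
    return r + gamma * potential(s_next) - potential(s)

# Example: shape a sparse goal-reaching reward with a distance-based potential.
goal = (5, 5)
potential = lambda s: -abs(s[0] - goal[0]) - abs(s[1] - goal[1])
print(shaped_reward(0.0, s=(0, 0), s_next=(1, 0), potential=potential))
```

Per the post, the paper's framework extends the potential to the learning agent's full state (external state plus past experience); the snippet shows only the standard form.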
vivekmyers.bsky.social
...but to create truly autonomous self-improving agents, we must not only imitate, but also 𝘪𝘮𝘱𝘳𝘰𝘷𝘦 upon the capabilities seen during training. Our findings suggest that this improvement might emerge from better task representations, rather than from more complex learning algorithms. 7/
vivekmyers.bsky.social
𝘞𝘩𝘺 𝘥𝘰𝘦𝘴 𝘵𝘩𝘪𝘴 𝘮𝘢𝘵𝘵𝘦𝘳? Recent breakthroughs in both end-to-end robot learning and language modeling have been enabled not through complex TD-based reinforcement learning objectives, but rather through scaling imitation with large architectures and datasets... 6/
vivekmyers.bsky.social
We validated this in simulation. Across offline RL benchmarks, imitation using our TRA task representations outperformed standard behavioral cloning, especially for stitching tasks. In many cases, TRA beat "true" value-based offline RL, using only an imitation loss. 5/
vivekmyers.bsky.social
Successor features have long been known to boost RL generalization (Dayan, 1993). Our findings suggest something stronger: successor task representations produce emergent capabilities beyond training even without RL or explicit subtask decomposition. 4/
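For reference, the successor-feature idea the thread invokes (Dayan, 1993): a state is represented by the discounted sum of the features of the states that follow it. A rough single-rollout estimate, with a toy feature map standing in for a learned ϕ:

```python
# Rough Monte Carlo estimate of successor features for one rollout:
#   psi(s_0) ~= sum_t gamma^t * phi(s_t)
# (Dayan, 1993). The feature map phi below is a stand-in, not the paper's.
import numpy as np

def successor_features(states, phi, gamma=0.99):
    """Discounted sum of state features along a single trajectory."""
    psi = np.zeros_like(phi(states[0]), dtype=float)
    for t, s in enumerate(states):
        psi += (gamma ** t) * phi(s)
    return psi

phi = lambda s: np.array(s, dtype=float)        # toy feature map
rollout = [(0, 0), (1, 0), (1, 1), (2, 1)]
print(successor_features(rollout, phi))
```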
vivekmyers.bsky.social
This trick encourages a form of time invariance during learning: both nearby and distant goals are represented similarly. By additionally aligning language instructions 𝜉(ℓ) to the goal representations 𝜓(𝑔), the policy can also perform new compound language tasks. 3/
vivekmyers.bsky.social
What does temporal alignment mean? When training, our policy imitates the human actions that lead to the end goal 𝑔 of a trajectory. Rather than training on the raw goals, we use a representation 𝜓(𝑔) that aligns with the successor features 𝜙(𝑠) of the preceding states. 2/
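One plausible way to realize this alignment (my sketch, not necessarily the paper's exact objective) is a contrastive loss that pulls 𝜓(𝑔) toward the 𝜙(𝑠) of states preceding 𝑔 in the same trajectory:

```python
# Sketch (not necessarily the paper's exact loss): align goal representations
# psi(g) with preceding-state features phi(s) via an InfoNCE-style objective,
# where (s, g) pairs from the same trajectory are positives.
import numpy as np

def alignment_loss(phi_s, psi_g):
    """phi_s, psi_g: [batch, dim] arrays; row i of each comes from trajectory i."""
    logits = phi_s @ psi_g.T                      # similarity of every (s, g) pair
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))         # pull matched pairs together

rng = np.random.default_rng(0)
phi_s, psi_g = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(alignment_loss(phi_s, psi_g))
```

The imitation policy would then condition on 𝜓(𝑔) instead of the raw goal.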
vivekmyers.bsky.social
Current robot learning methods are good at imitating tasks seen during training, but struggle to compose behaviors in new ways. When training imitation policies, we found something surprising—using temporally-aligned task representations enabled compositional generalization. 1/
Reposted by Vivek Myers
ben-eysenbach.bsky.social
Excited to share new work led by @vivekmyers.bsky.social and @crji.bsky.social that proves you can learn to reach distant goals by training solely on nearby goals. The key idea is a new form of invariance. This invariance implies generalization w.r.t. the horizon.
vivekmyers.bsky.social
Reinforcement learning agents should be able to improve upon behaviors seen during training.
In practice, RL agents often struggle to generalize to new long-horizon behaviors.
Our new paper studies *horizon generalization*, the degree to which RL algorithms generalize to reaching distant goals. 1/
Reposted by Vivek Myers
crji.bsky.social
Want to see an agent carry out long-horizon tasks when trained only on short-horizon trajectories?

We formalize and demonstrate this notion of *horizon generalization* in RL.

Check out our website! horizon-generalization.github.io
vivekmyers.bsky.social
What does this mean in practice? To generalize to long-horizon goal-reaching behavior, we should consider how our GCRL algorithms and architectures enable invariance to planning. When possible, prefer architectures like quasimetric networks (MRN, IQE) that enforce this invariance. 6/
vivekmyers.bsky.social
Empirical results support this theory. The degree of planning invariance correlates with horizon generalization across environments and GCRL methods. Critics parameterized as a quasimetric distance indeed tend to generalize best across horizons. 5/
vivekmyers.bsky.social
Similar to how CNN architectures exploit the inductive bias of translation-invariance for image classification, RL policies can enforce planning invariance by using a *quasimetric* critic parameterization that is guaranteed to obey the triangle inequality. 4/
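To make the triangle-inequality point concrete, here is a toy quasimetric in the spirit of (but not identical to) MRN/IQE: it is asymmetric, yet the triangle inequality holds by construction:

```python
# Toy quasimetric head, in the spirit of MRN/IQE but not their exact forms:
#   d(x, y) = sum_i max(y_i - x_i, 0)
# satisfies d(x, x) = 0 and the triangle inequality, since componentwise
# max(a + b, 0) <= max(a, 0) + max(b, 0), while remaining asymmetric.
import numpy as np

def quasimetric(x, y):
    return np.sum(np.maximum(y - x, 0.0))

x, y, z = np.array([0.0, 1.0]), np.array([2.0, 0.5]), np.array([3.0, 3.0])
assert quasimetric(x, z) <= quasimetric(x, y) + quasimetric(y, z)  # triangle inequality
print(quasimetric(x, y), quasimetric(y, x))   # asymmetric in general
```

In a GCRL critic, x and y would be learned embeddings of the current state and the goal.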
vivekmyers.bsky.social
The key to achieving horizon generalization is *planning invariance*. A policy is planning invariant if decomposing tasks into simpler subtasks doesn't improve performance. We prove planning invariance can enable horizon generalization. 3/
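A rough formal reading of that statement (my paraphrase, not the paper's verbatim definition), writing d^π(s, g) for the expected time policy π takes to reach goal g from state s:

```latex
% Rough paraphrase, not the paper's verbatim definition.
% d^\pi(s, g): expected time for policy \pi to reach goal g from state s.
% \pi is planning invariant if inserting a waypoint w never helps:
\[
  d^{\pi}(s, g) \;\le\; \min_{w} \Bigl[ d^{\pi}(s, w) + d^{\pi}(w, g) \Bigr]
  \qquad \text{for all } s, g .
\]
```

In words: routing through any intermediate waypoint never beats heading for the goal directly, so stitching short-horizon skills into long-horizon behavior comes for free.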
vivekmyers.bsky.social
Certain RL algorithms are more conducive to horizon generalization than others. Goal-conditioned RL (GCRL) methods with a bilinear critic ϕ(𝑠)ᵀψ(𝑔), as well as quasimetric methods, better enable horizon generalization. 2/
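For concreteness, the bilinear form factorizes the critic into separate state and goal encoders; a sketch with random placeholder projections standing in for trained networks:

```python
# Sketch of the bilinear critic form phi(s)^T psi(g) mentioned above; the two
# encoders are random placeholder projections, not trained networks.
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(16, 4))   # stand-in state encoder weights
W_psi = rng.normal(size=(16, 4))   # stand-in goal encoder weights

phi = lambda s: W_phi @ s          # phi(s): state representation
psi = lambda g: W_psi @ g          # psi(g): goal representation

def critic(s, g):
    """Bilinear goal-conditioned value: inner product of the two embeddings."""
    return float(phi(s) @ psi(g))

s, g = rng.normal(size=4), rng.normal(size=4)
print(critic(s, g))
```

One appeal of the factorization is that the goal enters only through ψ(𝑔), so the critic's behavior on new goals is governed by how the goal encoder extrapolates.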
vivekmyers.bsky.social
Reinforcement learning agents should be able to improve upon behaviors seen during training.
In practice, RL agents often struggle to generalize to new long-horizon behaviors.
Our new paper studies *horizon generalization*, the degree to which RL algorithms generalize to reaching distant goals. 1/
vivekmyers.bsky.social
Website: empowering-humans.github.io
Paper: arxiv.org/abs/2411.02623

Many thanks to wonderful collaborators Evan Ellis, Sergey Levine, Benjamin Eysenbach, and Anca Dragan!
Learning to Assist Humans without Inferring Rewards
empowering-humans.github.io