Sai Prasanna
@saiprasanna.in
2.1K followers 690 following 290 posts
See(k)ing the surreal. Causal World Models for Curious Robots @ University of Tübingen / Max Planck Institute for Intelligent Systems 🇩🇪 #reinforcementlearning #robotics #causality #meditation #vegan
Pinned
saiprasanna.in
📌 Thread of threads for research ideas 💡 Collaborations are most welcome 😁
saiprasanna.in
Use Beta-NLL for regression when you also predict standard deviations: a simple change to the NLL that works reliably better.
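A minimal sketch of the idea, assuming the Beta-NLL formulation from Seitzer et al. (2022): weight each point's Gaussian NLL by σ^(2β), where the weight is treated as a constant (stop-gradient) during backprop and β = 0.5 is a common choice. The numpy version below only computes the loss value; the function name is mine, not from any library.

```python
import numpy as np

def beta_nll_loss(mu, sigma2, target, beta=0.5):
    """Beta-NLL: per-point Gaussian NLL reweighted by sigma2**beta.

    In an autodiff framework the weight must be detached so gradients
    flow only through the NLL term, e.g. in PyTorch:
        weight = sigma2.detach() ** beta
    beta = 0 recovers the plain Gaussian NLL; beta = 1 recovers an
    MSE-like weighting.
    """
    nll = 0.5 * (np.log(2.0 * np.pi * sigma2) + (target - mu) ** 2 / sigma2)
    weight = sigma2 ** beta  # stop-gradient on this factor in training code
    return np.mean(weight * nll)
```

The point of the weighting is that plain NLL down-weights the gradient of high-variance points by 1/σ², so the model can "give up" on hard regions by inflating σ; the σ^(2β) factor counteracts exactly that.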
saiprasanna.in
If open-endedness has to be fundamentally subjectively measured, what factors of the agent make it so if we fix humans as the final arbiter or evaluator? Does the embodiment/action space etc. of the agent matter to a human evaluator of open-endedness?
saiprasanna.in
But this is based on the vibes of Tübingen from a 1.5-day visit; I have lived in Freiburg for 3 years
saiprasanna.in
Tübingen : Freiburg :: Introvert : Extrovert
saiprasanna.in
Had a discussion with a fellow not-so-political Indian colleague doing a PhD in computer science in Europe. He is now thinking twice about his plan to go for an exchange at a US lab
Reposted by Sai Prasanna
vgr.bsky.social
This might be the most fun I’ve had writing an essay in a while. Felt some of that old going-nuts-with-an-idea energy flowing.

open.substack.com/pub/contrapt...
Discworld Rules
And LOTR is brain-rot for technologists
open.substack.com
Reposted by Sai Prasanna
tomssilver.bsky.social
This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).

If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.

PDF: arxiv.org/abs/2406.00592
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming
In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around ...
arxiv.org
saiprasanna.in
Curious to know which show
saiprasanna.in
One strategy, I guess, is to have a steady stream of good (BS-filtered) and diverse (topics, areas) inputs (books, research papers, what not)

And not get bogged down by the fact that I am too distracted to go deep into one input stream (book or podcast or article or paper) at a time
saiprasanna.in
Do any of my fellow fox-brained folks (@vgr.bsky.social) have good strategies for aiding background processing? Intuitively, background processing feels like the more foxy thing

@visakanv.com (not sure if you identify as a fox in the fox hedgehog dichotomy though)
saiprasanna.in
I guess the trick would be to take actions that make the mind and emotional state fertile for the background processing to happen consistently!
saiprasanna.in
I realized how I background-process tonnes of information, from work/research to emotional stuff. And it works well: it leads to good research ideas and wise processing of tough situations! But it's so hard to learn to trust this, as conscious thinking for solving problems feels more under my "control"
saiprasanna.in
The conditioning gap in latent-space world models arises because uncertainty can go either into the latent posterior distribution or into the learnt prior (the dynamics model), and not conditioning on the future pushes that uncertainty incorrectly into the dynamics model.
saiprasanna.in
On re-thinking, I think the problems could be orthogonal. Clever Hans pertains to teacher forcing during training leading to easy solutions for a lot of the timesteps, skewing the model away from learning the hard timesteps which matter most at test time.
saiprasanna.in
(Shame that argmax.org/blog is down now!! They're a really nice, less-known research group at Volkswagen doing important work on world models.)

Anyway, if these two problems are related, just establishing that would make for an amazing paper!
argmax.org
saiprasanna.in
Conditioning gap: when you train a variational encoder that computes an approximate posterior conditioned only partially (say, on past tokens), that posterior gives a worse lower bound than one conditioned on everything (including future tokens).
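A toy illustration of why partial conditioning leaves extra uncertainty, in a linear-Gaussian setting where the exact posterior is computable in closed form (my own construction for illustration, not from a specific paper): a posterior over a latent conditioned only on "past" observations is strictly wider than one conditioned on past and future, so an encoder restricted to the past can never match the fully-conditioned posterior.

```python
import numpy as np

def gaussian_posterior(obs, obs_var, prior_var=1.0):
    """Exact posterior over a scalar latent z ~ N(0, prior_var),
    given observations x_t = z + noise with noise ~ N(0, obs_var).
    Standard conjugate-Gaussian update: precisions add."""
    precision = 1.0 / prior_var + len(obs) / obs_var
    mean = (np.sum(obs) / obs_var) / precision
    return mean, 1.0 / precision

rng = np.random.default_rng(0)
z = rng.normal()
past = z + 0.7 * rng.normal(size=3)    # observations before time t
future = z + 0.7 * rng.normal(size=3)  # observations after time t

# Conditioning only on the past leaves more uncertainty about z ...
_, var_past = gaussian_posterior(past, obs_var=0.49)
# ... than conditioning on past *and* future.
_, var_full = gaussian_posterior(np.concatenate([past, future]), obs_var=0.49)
```

In a world model, that leftover uncertainty has to live somewhere, and with a past-only encoder it gets absorbed by the learnt dynamics prior.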
saiprasanna.in
It reminds me of another problem, and I'm not sure if it's equivalent or if it's some dual problem. It's called the conditioning gap in latent space inference.
saiprasanna.in
The fix involves modelling both forward and backward directions. I haven't grokked it fully, but that's where I learnt about the above problem. I find these two papers a really nice sequence: a fundamental problem and then a solution!
saiprasanna.in
And there is a new paper that claims to fix this for the transformer architecture!!! They call it the "belief state transformer". Apparently it fixes lots of practical problems arising from the Clever Hans cheat!

arxiv.org/abs/2410.23506
The Belief State Transformer
We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previ...
arxiv.org
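A hedged sketch of just the training targets, going off the abstract above: for a prefix and a suffix of the same sequence, the model predicts the next token after the prefix and the token just before the suffix. The helper and its names are mine for illustration, not from the paper's code, and the actual model also learns forward/backward encodings that this sketch omits.

```python
def bst_targets(tokens, prefix_len, suffix_start):
    """Training pair for a belief-state-transformer-style objective
    (sketch after arXiv:2410.23506): given tokens[:prefix_len] as the
    prefix and tokens[suffix_start:] as the suffix, the forward head
    predicts the token right after the prefix and the backward head
    predicts the token right before the suffix."""
    assert 0 < prefix_len < suffix_start <= len(tokens) - 1
    prefix = tokens[:prefix_len]
    suffix = tokens[suffix_start:]
    next_target = tokens[prefix_len]        # forward head's target
    prev_target = tokens[suffix_start - 1]  # backward head's target
    return prefix, suffix, next_target, prev_target
```

The intuition, as I read it: because the suffix also constrains the prediction, the model can't get away with the teacher-forced "easy timestep" shortcut; its internal state has to summarize everything needed to reach the given future.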