Sai Prasanna
@saiprasanna.in
2.1K followers 690 following 290 posts
See(k)ing the surreal. Causal World Models for Curious Robots @ University of Tübingen / Max Planck Institute for Intelligent Systems 🇩🇪 #reinforcementlearning #robotics #causality #meditation #vegan
Pinned
saiprasanna.in
📌 Thread of threads for research ideas 💡 Collaborations are most welcome 😁
saiprasanna.in
Use Beta-NLL for regression when you also predict standard deviations: a simple change to the NLL that works reliably better.
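A minimal sketch of the idea, assuming the Beta-NLL formulation from Seitzer et al. (2022): weight each point's Gaussian NLL by σ^(2β), where the weight is treated as a constant (stop-gradient) during backprop and β = 0.5 is a common choice. The numpy version below only computes the loss value; the function name is mine, not from any library.

```python
import numpy as np

def beta_nll_loss(mu, sigma2, target, beta=0.5):
    """Beta-NLL: per-point Gaussian NLL reweighted by sigma2**beta.

    In an autodiff framework the weight must be detached so gradients
    flow only through the NLL term, e.g. in PyTorch:
        weight = sigma2.detach() ** beta
    beta = 0 recovers the plain Gaussian NLL; beta = 1 recovers an
    MSE-like weighting.
    """
    nll = 0.5 * (np.log(2.0 * np.pi * sigma2) + (target - mu) ** 2 / sigma2)
    weight = sigma2 ** beta  # stop-gradient on this factor in training code
    return np.mean(weight * nll)
```

The point of the weighting is that plain NLL down-weights the gradient of high-variance points by 1/σ², so the model can "give up" on hard regions by inflating σ; the σ^(2β) factor counteracts exactly that.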
saiprasanna.in
If open-endedness has to be fundamentally subjectively measured, what factors of the agent make it so if we fix humans as the final arbiter or evaluator? Does the embodiment/action space etc. of the agent matter to a human evaluator of open-endedness?
saiprasanna.in
But this is based on the vibes of Tübingen from a 1.5-day visit; I have lived in Freiburg for 3 years
saiprasanna.in
Tübingen : Freiburg :: Introvert : Extrovert
saiprasanna.in
Had a discussion with a fellow not-so-political Indian colleague doing a PhD in computer science in Europe. He is now thinking twice about his plan to go for an exchange at a US lab
Reposted by Sai Prasanna
vgr.bsky.social
This might be the most fun I’ve had writing an essay in a while. Felt some of that old going-nuts-with-an-idea energy flowing.

open.substack.com/pub/contrapt...
Discworld Rules
And LOTR is brain-rot for technologists
open.substack.com
Reposted by Sai Prasanna
tomssilver.bsky.social
This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).

If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.

PDF: arxiv.org/abs/2406.00592
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming
In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around ...
arxiv.org
saiprasanna.in
Curious to know which show
saiprasanna.in
One strategy, I guess, is to have a steady stream of good (BS-filtered) and diverse (topics, areas) inputs (books, research papers, what not)

And not get bogged down by the fact that I am too distracted to go deep into one input stream (book or podcast or article or paper) at a time
saiprasanna.in
Do any of my fellow fox-brained folks (@vgr.bsky.social) have good strategies for aiding background processing? Intuitively, background processing feels like the more foxy thing

@visakanv.com (not sure if you identify as a fox in the fox hedgehog dichotomy though)
saiprasanna.in
I guess the trick would be to take actions that make the mind and emotional state fertile for the background processing to happen consistently!
saiprasanna.in
I realized how I background-process tonnes of information, from work/research to emotional stuff. And it works well: it leads to good research ideas and wise processing of tough situations! But it's so hard to learn to trust this, as conscious thinking for solving problems feels more under my "control"
saiprasanna.in
The conditioning gap in latent-space world models arises because uncertainty can go either into the latent posterior distribution or into the learnt prior (the dynamics model), and not conditioning on the future pushes that uncertainty incorrectly into the dynamics model.
saiprasanna.in
On re-thinking, I think the problems could be orthogonal. Clever Hans pertains to teacher forcing during training leading to easy solutions for a lot of the timesteps, skewing the model away from learning the hard timesteps which matter most at test time.
saiprasanna.in
(Shame that argmax.org/blog is down now!! They're a really nice, less-known research group at Volkswagen doing important work on world models.)

Anyway, if these two problems are related, just establishing that would make for an amazing paper!
argmax.org
saiprasanna.in
Conditioning gap: when you train a variational encoder that computes an approximate posterior conditioned only partially (say, on past tokens), that posterior gives a worse lower bound than one conditioned on everything (including future tokens).
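A toy illustration of why partial conditioning leaves extra uncertainty, in a linear-Gaussian setting where the exact posterior is computable in closed form (my own construction for illustration, not from a specific paper): a posterior over a latent conditioned only on "past" observations is strictly wider than one conditioned on past and future, so an encoder restricted to the past can never match the fully-conditioned posterior.

```python
import numpy as np

def gaussian_posterior(obs, obs_var, prior_var=1.0):
    """Exact posterior over a scalar latent z ~ N(0, prior_var),
    given observations x_t = z + noise with noise ~ N(0, obs_var).
    Standard conjugate-Gaussian update: precisions add."""
    precision = 1.0 / prior_var + len(obs) / obs_var
    mean = (np.sum(obs) / obs_var) / precision
    return mean, 1.0 / precision

rng = np.random.default_rng(0)
z = rng.normal()
past = z + 0.7 * rng.normal(size=3)    # observations before time t
future = z + 0.7 * rng.normal(size=3)  # observations after time t

# Conditioning only on the past leaves more uncertainty about z ...
_, var_past = gaussian_posterior(past, obs_var=0.49)
# ... than conditioning on past *and* future.
_, var_full = gaussian_posterior(np.concatenate([past, future]), obs_var=0.49)
```

In a world model, that leftover uncertainty has to live somewhere, and with a past-only encoder it gets absorbed by the learnt dynamics prior.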
saiprasanna.in
It reminds me of another problem, and I'm not sure if it's equivalent or if it's some dual problem. It's called the conditioning gap in latent space inference.
saiprasanna.in
The fix involves modelling both forward and backward directions. I haven't grokked it fully, but that's where I learnt about the above problem. I find these two papers a really nice sequence: a fundamental problem and then a solution!
saiprasanna.in
And there is a new paper that claims to fix this for the transformer architecture!!! They call it the "belief state transformer". Apparently it fixes lots of practical problems arising from the Clever Hans cheat!

arxiv.org/abs/2410.23506
The Belief State Transformer
We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previ...
arxiv.org
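A hedged sketch of just the training targets, going off the abstract above: for a prefix and a suffix of the same sequence, the model predicts the next token after the prefix and the token just before the suffix. The helper and its names are mine for illustration, not from the paper's code, and the actual model also learns forward/backward encodings that this sketch omits.

```python
def bst_targets(tokens, prefix_len, suffix_start):
    """Training pair for a belief-state-transformer-style objective
    (sketch after arXiv:2410.23506): given tokens[:prefix_len] as the
    prefix and tokens[suffix_start:] as the suffix, the forward head
    predicts the token right after the prefix and the backward head
    predicts the token right before the suffix."""
    assert 0 < prefix_len < suffix_start <= len(tokens) - 1
    prefix = tokens[:prefix_len]
    suffix = tokens[suffix_start:]
    next_target = tokens[prefix_len]        # forward head's target
    prev_target = tokens[suffix_start - 1]  # backward head's target
    return prefix, suffix, next_target, prev_target
```

The intuition, as I read it: because the suffix also constrains the prediction, the model can't get away with the teacher-forced "easy timestep" shortcut; its internal state has to summarize everything needed to reach the given future.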