@raj-ghugare.bsky.social
raj-ghugare.bsky.social
I’m excited about the possibilities this opens up for RL and robotics. Large-scale behavioral models can be trained directly via likelihoods—without expensive sampling (diffusion) or discretization errors (transformers). They can also be fine-tuned directly using exact MaxEnt RL!
raj-ghugare.bsky.social
In unsupervised goal-conditioned RL: a simple goal-sampling strategy that uses an NF's density estimates outperforms supervised oracles like contrastive RL on 3 standard exploration tasks.
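A minimal sketch of what density-based goal selection could look like (PyTorch; `flow`, `candidate_states`, and the softmax temperature are illustrative assumptions, not the exact procedure from the paper):

```python
import torch

def sample_exploration_goal(flow, candidate_states, temperature=1.0):
    """Pick a goal the flow assigns low density to, i.e. a rarely visited state."""
    with torch.no_grad():
        log_p = flow.log_prob(candidate_states)               # exact NF densities
        weights = torch.softmax(-log_p / temperature, dim=0)  # favor low-density states
    idx = torch.multinomial(weights, num_samples=1)
    return candidate_states[idx]
```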
raj-ghugare.bsky.social
On offline RL: NF-RLBC outperforms strong baselines like flow matching and diffusion-based Q-learning on half the tasks. This boost comes from simply replacing the Gaussian policy with an NF in the SAC+BC recipe. Unlike diffusion, no distillation or importance sampling is needed.
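A hedged sketch of the actor update in such a SAC+BC-style recipe; `flow_policy`, `critic`, and the coefficient names are assumed interfaces, not the exact implementation:

```python
import torch

def actor_loss(flow_policy, critic, batch, alpha=0.2, bc_weight=1.0):
    s, a_data = batch["obs"], batch["actions"]
    # Reparameterized sample plus exact log-density from the flow:
    # no distillation or importance sampling needed, unlike diffusion policies.
    a_pi, log_pi = flow_policy.rsample_with_log_prob(s)
    sac_term = (alpha * log_pi - critic(s, a_pi)).mean()   # MaxEnt RL objective
    bc_term = -flow_policy.log_prob(a_data, s).mean()      # exact BC log-likelihood
    return sac_term + bc_weight * bc_term
```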
raj-ghugare.bsky.social
On conditional imitation learning: (left) Simply swapping a Gaussian policy for an NF policy leads to significant improvements. (right) NF-GCBC outperforms the flow-matching policies, as well as dedicated offline RL algorithms like IQL or quasimetric RL.
raj-ghugare.bsky.social
On imitation learning: NF-BC is competitive with diffusion / transformer policies, which are the go-to models for imitation learning today. But NF-BC requires fewer hyperparameters (no SDEs, no noise scheduling, no discrete representations).
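The NF-BC objective reduces to plain maximum likelihood on demonstrations; a sketch under the same assumed `flow_policy` interface (a conditional `log_prob(action, obs)`), with no noise schedule, SDE solver, or discretization involved:

```python
def bc_update(flow_policy, optimizer, obs, actions):
    # Maximum-likelihood behavioral cloning with exact NF densities.
    # Goal-conditioned BC (NF-GCBC) is the same loss with the goal
    # concatenated to the observation before conditioning.
    loss = -flow_policy.log_prob(actions, obs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```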
raj-ghugare.bsky.social
Combining this architecture with canonical RL algorithms leads to a strong and simple recipe. On 82 tasks spanning 5 settings, including imitation learning, offline RL, goal-conditioned RL, and unsupervised RL, this recipe rivals or surpasses strong baselines built on diffusion, flow matching, or transformers.
raj-ghugare.bsky.social
Stacking affine coupling networks, permutation layers, and LayerNorm results in a simple and scalable architecture. It integrates seamlessly with canonical imitation learning, offline RL, goal-conditioned RL, and unsupervised RL algorithms.
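A minimal sketch of one such block (PyTorch, RealNVP-style affine coupling; the hidden size, the placement of LayerNorm inside the coupling MLP, and the fixed random permutation are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Transforms half the dimensions conditioned on the other half; exact log-det."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                 # bound the scales for stability
        y2 = x2 * torch.exp(log_s) + t            # affine transform of the second half
        return torch.cat([x1, y2], dim=-1), log_s.sum(-1)

class FlowBlock(nn.Module):
    """Coupling layer followed by a fixed permutation so every dim gets transformed."""
    def __init__(self, dim):
        super().__init__()
        self.coupling = AffineCoupling(dim)
        self.register_buffer("perm", torch.randperm(dim))

    def forward(self, x):
        y, log_det = self.coupling(x)
        return y[:, self.perm], log_det
```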
raj-ghugare.bsky.social
We believe their underuse stems from the (mis)conception that they have overly restrictive architectures or suffer from training instabilities. We revisit one of the simplest NF architectures and show that it can yield strong results across diverse RL problem settings.
raj-ghugare.bsky.social
The core of most RL algorithms is just likelihood estimation, sampling, and variational inference (see attached image). NFs can do all three efficiently! This raises the question: why don't we see them used more often in RL?
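For illustration, a single flow built from torch.distributions supports all three operations with one object (the base distribution and transforms here are toy placeholders, not a policy architecture):

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform, TanhTransform

base = Normal(torch.zeros(2), torch.ones(2))
flow = TransformedDistribution(base, [AffineTransform(loc=0.0, scale=0.5), TanhTransform()])

action = flow.rsample()                         # efficient, differentiable sampling (control)
log_p = flow.log_prob(action)                   # exact log-likelihood (imitation learning)
entropy_bonus = -flow.log_prob(flow.rsample())  # MaxEnt / variational term (Q-learning)
```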
raj-ghugare.bsky.social
Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models.

Are NFs fundamentally limited?