I’m excited about the possibilities this opens up for RL and robotics. Large-scale behavioral models can be trained directly via exact likelihoods, without expensive sampling (as in diffusion) or discretization errors (as in transformers). They can also be fine-tuned with exact MaxEnt RL!
On unsupervised goal-conditioned RL: a simple goal sampling strategy that exploits NFs' ability to provide density estimates outperforms supervised oracles like contrastive RL on 3 standard exploration tasks.
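To make the density-based goal sampling idea concrete, here is one plausible instantiation (illustrative only; the exact strategy may differ): a flow fit to visited states scores candidate goals, and the lowest-density candidates are picked as exploration goals. `pick_exploration_goals`, `flow`, and `num_goals` are hypothetical names.

```python
import torch

def pick_exploration_goals(flow, candidate_goals, num_goals=16):
    # Score candidates with the flow's exact log-density, log p(goal).
    with torch.no_grad():
        log_density = flow.log_prob(candidate_goals)
    # Rarely visited (low-density) states make good frontier goals.
    idx = torch.argsort(log_density)[:num_goals]
    return candidate_goals[idx]
```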
On offline RL: NF-RLBC outperforms strong baselines like flow-matching and diffusion-based Q-learning on half the tasks. This boost comes from simply replacing the Gaussian policy with an NF in the SAC+BC recipe. Unlike diffusion, no distillation or importance sampling is needed.
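A hedged sketch of what that swap looks like in the actor update, assuming a flow policy that exposes `log_prob` and a reparameterized `sample_with_log_prob`; `flow_policy`, `critic`, `alpha`, and `bc_weight` are illustrative placeholders, not the paper's exact interfaces or hyper-parameters:

```python
def nf_actor_loss(flow_policy, critic, obs, dataset_actions, alpha=0.2, bc_weight=1.0):
    # Reparameterized sample from the NF policy; its exact log-prob comes for free.
    actions, log_pi = flow_policy.sample_with_log_prob(obs)
    # SAC-style MaxEnt term: maximize Q while penalizing low entropy.
    sac_term = (alpha * log_pi - critic(obs, actions)).mean()
    # BC term: maximize the exact likelihood of dataset actions under the flow
    # (no distillation or importance sampling needed, unlike diffusion policies).
    bc_term = -flow_policy.log_prob(dataset_actions, obs).mean()
    return sac_term + bc_weight * bc_term
```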
On conditional imitation learning: (left) simply replacing a Gaussian policy with an NF policy leads to significant improvements. (right) NF-GCBC outperforms flow-matching policies, as well as dedicated offline RL algorithms like IQL or quasimetric RL.
On imitation learning: NF-BC is competitive with diffusion and transformer policies, the go-to models for imitation learning today, while requiring fewer hyper-parameters (no SDEs, no noise scheduling, no discrete representations).
Combining this architecture with canonical RL algorithms leads to a simple and strong recipe. On 82 tasks spanning 5 settings, including imitation learning, offline RL, goal-conditioned RL, and unsupervised RL, this recipe rivals or surpasses strong baselines built on diffusion, flow matching, or transformers.
Stacking affine coupling networks, permutation layers, and LayerNorm results in a simple and scalable architecture. It integrates seamlessly with canonical imitation learning, offline RL, goal-conditioned RL, and unsupervised RL algorithms.
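A minimal PyTorch sketch of this kind of stack, under my own assumptions (LayerNorm inside the coupling conditioner, fixed random permutations, conditioning on observations omitted for brevity); not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Affine coupling: transform half the dimensions conditioned on the other half."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.d = dim // 2
        # Conditioner MLP (with LayerNorm) predicts per-dimension scale and shift.
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(), nn.LayerNorm(hidden),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                 # bounded scales for stable training
        y2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, y2], dim=-1), log_s.sum(-1)  # exact log-det Jacobian

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        log_s, t = self.net(y1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

class Flow(nn.Module):
    """Stack of (permutation -> affine coupling) blocks with a Gaussian base."""
    def __init__(self, dim, n_blocks=8):
        super().__init__()
        self.couplings = nn.ModuleList(AffineCoupling(dim) for _ in range(n_blocks))
        self.perms = [torch.randperm(dim) for _ in range(n_blocks)]  # fixed mixing
        self.base = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))

    def log_prob(self, x):                        # exact likelihood (e.g. for BC)
        total = 0.0
        for coupling, perm in zip(self.couplings, self.perms):
            x, log_det = coupling(x[:, perm])
            total = total + log_det
        return self.base.log_prob(x).sum(-1) + total

    def sample(self, n):                          # fast sampling (e.g. for control)
        z = self.base.sample((n,))
        for i in reversed(range(len(self.couplings))):
            z = self.couplings[i].inverse(z)[:, torch.argsort(self.perms[i])]
        return z
```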
We believe their underuse stems from the (mis)conception that they have overly restrictive architectures or suffer from training instabilities. We revisit one of the simplest NF architectures and show that it yields strong results across diverse RL problem settings.
The core of most RL algorithms is just likelihood estimation, sampling, and variational inference (see attached image). NFs can do all three efficiently! So why don't we see them used more often in RL?
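A rough sketch of how those three primitives map onto a single flow model; `flow`, `critic`, `obs`, and `expert_actions` are illustrative placeholders, not a real API:

```python
alpha = 0.2                                            # entropy temperature

# 1) Likelihood estimation: imitation learning via exact maximum likelihood.
bc_loss = -flow.log_prob(expert_actions, obs).mean()

# 2) Sampling: one forward pass yields an action, fast enough for real-time control.
action = flow.sample(obs)

# 3) Variational inference: MaxEnt Q-learning maximizes E[Q(s, a) - alpha * log pi(a|s)];
#    the flow supplies both the reparameterized sample and its exact log-probability.
action, log_pi = flow.sample_with_log_prob(obs)
actor_objective = (critic(obs, action) - alpha * log_pi).mean()
```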
Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models.