https://alexiglad.github.io/
EBTs also learn better representations of images than diffusion models, achieving roughly 10x higher ImageNet classification accuracy.
We think this performance improvement occurs because verification is often easier than generation and because EBTs can learn to express uncertainty in continuous spaces.
We find that EBTs can out-generalize the Transformer++ on out-of-distribution data by thinking longer, and that thinking also improves with scale.
EBMs have struggled to scale due to issues with stability and parallelization. We therefore design Transformers specifically to address these issues, which we call Energy-Based Transformers (EBTs).
EBMs learn to assign a scalar energy value denoting the compatibility of inputs.
Then, EBMs learn to optimize predictions to minimize this energy.
This allows EBMs to know when a problem is difficult (high energy), and adjust resources until a good solution is found.
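To make that loop concrete, here is a minimal sketch of energy-based "thinking" in PyTorch. The toy MLP energy network, step count, and step size are our illustrative assumptions, not the exact EBT architecture or hyperparameters:

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Toy verifier: maps a (context, prediction) pair to a scalar energy,
    where lower energy means higher compatibility. A real EBT uses a
    Transformer backbone here; an MLP keeps the sketch short."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 128), nn.SiLU(), nn.Linear(128, 1))

    def forward(self, context, prediction):
        return self.net(torch.cat([context, prediction], dim=-1)).squeeze(-1)

def think(energy_model, context, n_steps=10, step_size=0.1):
    """'Thinking' as optimization: start from a random guess and descend the
    energy landscape until a low-energy (i.e., plausible) prediction is found."""
    prediction = torch.randn_like(context, requires_grad=True)
    for _ in range(n_steps):
        energy = energy_model(context, prediction).sum()
        (grad,) = torch.autograd.grad(energy, prediction)
        prediction = (prediction - step_size * grad).detach().requires_grad_(True)
    return prediction.detach()

model = EnergyModel(dim=16)
ctx = torch.randn(4, 16)        # a batch of 4 contexts
pred = think(model, ctx)        # predictions refined by "thinking"
print(model(ctx, pred))         # per-example energy after thinking (lower = better)
```

Because the prediction is refined by gradient descent rather than produced in a single forward pass, the number of steps becomes a natural knob for thinking longer on difficult (high-energy) inputs.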
It turns out that there's an elegant solution:💡
Learn to verify predictions
Optimize predictions with respect to this verifier
This is exactly what Energy-Based Models (EBMs) are! EBMs enable thinking longer and self-verifying.
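Since the same scalar energy can score any candidate answer, self-verification comes essentially for free: run the optimization from several random initializations and keep the lowest-energy result. Below is a hedged sketch reusing the `EnergyModel` and `think` helpers from above; best-of-N selection is one illustrative use, and the function name is ours:

```python
import torch  # already imported in the sketch above

def self_verify(energy_model, context, n_candidates=8):
    """Best-of-N self-verification: the energy function doubles as a verifier,
    so the model can score its own candidates and keep the most compatible one."""
    candidates = torch.stack(
        [think(energy_model, context) for _ in range(n_candidates)]
    )                                                  # (N, batch, dim)
    energies = torch.stack(
        [energy_model(context, c) for c in candidates]
    )                                                  # (N, batch)
    best = energies.argmin(dim=0)                      # lowest energy per example
    return candidates[best, torch.arange(context.shape[0])]

best_pred = self_verify(model, ctx)  # keeps the lowest-energy candidate per input
```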
⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards.
🧵Thread: