Working on conditional diffusion models
No more post-training alignment!
We integrate human alignment right from the start, during pretraining!
Results:
✨ 19x faster convergence ⚡
✨ 370x fewer FLOPs than FLUX-dev 💻
🔗 Explore the project: nicolas-dufour.github.io/miro/
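For the curious, here is a minimal sketch of what conditioning pretraining on human preferences can look like. It is my own toy illustration, not MIRO's actual code: the module names, the scalar-reward design, and the noise schedule are all assumptions.

```python
# Toy sketch: reward-conditioned diffusion pretraining (hypothetical, not MIRO's code).
# Instead of aligning after training (RLHF/DPO-style post-training), a scalar
# human-preference score is embedded and added to the conditioning, so the
# model learns alignment jointly with generation.
import torch
import torch.nn as nn

class RewardConditionedDenoiser(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.reward_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.backbone = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, x_noisy, t, text_emb, reward):
        # The reward enters exactly like any other conditioning signal.
        cond = (text_emb
                + self.time_embed(t[:, None].float())
                + self.reward_embed(reward[:, None].float()))
        return self.backbone(x_noisy + cond)

model = RewardConditionedDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Toy batch: latents, text embeddings, and precomputed preference scores.
x0, text_emb = torch.randn(8, 256), torch.randn(8, 256)
reward = torch.rand(8)                 # e.g. a normalized aesthetic/preference score
t = torch.randint(0, 1000, (8,))

# Standard epsilon-prediction diffusion loss, now reward-aware.
noise = torch.randn_like(x0)
alpha = (1 - t.float() / 1000).sqrt()[:, None]   # toy noise schedule
sigma = (t.float() / 1000).sqrt()[:, None]
loss = (model(alpha * x0 + sigma * noise, t, text_emb, reward) - noise).pow(2).mean()
loss.backward()
opt.step()
# At sampling time, condition on a high reward to steer toward preferred images.
```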
Do you think people would be up for that? Do you think it would make for a nice competition?
TL;DR: train a text2image model from scratch on ImageNet only and beat SDXL.
Paper, code, data available! Reproducible science FTW!
🧵👇
📜 arxiv.org/abs/2502.21318
💻 github.com/lucasdegeorg...
💽 huggingface.co/arijitghosh/...
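If you are wondering how class-labeled ImageNet becomes text-to-image training data, here is a hedged sketch of the general recipe (re-caption the images with a vision-language model). The captioner choice and the exact pipeline below are my assumptions, not necessarily the paper's:

```python
# Hypothetical sketch: turning class-labeled ImageNet into (image, caption)
# pairs by re-captioning with an off-the-shelf VLM (BLIP here, as an example).
import torch
from datasets import load_dataset
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

def recaption(example):
    # A dense caption instead of the bare class label ("tench", "goldfish", ...).
    inputs = processor(example["image"].convert("RGB"), return_tensors="pt").to(device)
    out = captioner.generate(**inputs, max_new_tokens=40)
    example["caption"] = processor.decode(out[0], skip_special_tokens=True)
    return example

# imagenet-1k is gated on the Hub; you need to accept its license first.
# ~1.28M images, i.e. roughly 1000x smaller than the billion-scale web
# corpora behind models like SDXL.
imagenet = load_dataset("imagenet-1k", split="train", streaming=True)
recaptioned = imagenet.map(recaption)
```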
If you want to learn more nicolas-dufour.github.io/plonk
www.youtube.com/watch?v=s5oH...
We have released the models from our latest paper "How far can we go with ImageNet for text-to-image generation?"
Check out the models on HuggingFace:
🤗 huggingface.co/Lucasdegeorg...
📜 arxiv.org/abs/2502.21318
But is it necessary?
Our paper "How far can we go with ImageNet for T2I generation?" (@lucasdegeorge.bsky.social @arrijitghosh.bsky.social @nicolasdufour.bsky.social @davidpicard.bsky.social) shows that it isn't, if we are careful: arxiv.org/abs/2502.21318
Conjecture:
As we get more and more well-aligned text-image data, it will become easier and easier to train models.
This will allow us to explore both more streamlined and more exotic training recipes.
More signals that exciting times are coming!
How far can we go with ImageNet for Text-to-Image generation? w. @arrijitghosh.bsky.social @lucasdegeorge.bsky.social @nicolasdufour.bsky.social @vickykalogeiton.bsky.social
TL;DR: Train a text-to-image model with 1000x less data in just 200 GPU hours!
📜 https://arxiv.org/abs/2502.21318
🧵👇
🗺️ Paper, code, and demo: nicolas-dufour.github.io/plonk