Both NoPE and DroPE rely on the causal mask to leak absolute positional information: the number of tokens visible in the attention gets leaked, because the model can encode a bias that grows with that count.
So not a fix for images yet =(
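To make that concrete, here's a tiny PyTorch toy (my own illustration, not code from the NoPE or DroPE papers): with zero positional encoding, token i only attends over i keys, so under uniform logits the weight on the first token is exactly 1/i, and the output already carries a clean function of absolute position.

```python
import torch

# Toy demo (my own sketch, not from either paper): causal attention with no
# positional encoding still exposes absolute position, because token i only
# attends over i tokens. With uniform logits the weight on a distinctive
# first token is exactly 1/i, a position-dependent bias the network can
# read out and invert downstream.
T, d = 8, 4
q = torch.zeros(T, d)           # all-zero queries/keys -> uniform logits
k = torch.zeros(T, d)
v = torch.zeros(T, d)
v[0, 0] = 1.0                   # marker channel carried only by token 0

causal = torch.tril(torch.ones(T, T)).bool()
logits = (q @ k.T).masked_fill(~causal, float("-inf"))
out = logits.softmax(dim=-1) @ v

print(out[:, 0])  # 1.0000, 0.5000, 0.3333, ... -> a clean function of position
```

Image tokens are attended bidirectionally, so there is no per-row truncation like this to exploit.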
Recently it's mostly flow matching (or v-loss) that gets used, basically since SD3.
In my experience, flow doesn't really improve quality, but sampling in fewer steps works better than with epsilon prediction.
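To show what I mean, here is a minimal sketch of the two objectives (the `model` call is a placeholder signature, this is not SD3's code): flow matching regresses a constant velocity along a straight data-to-noise path, which is what makes few-step sampling behave well, while epsilon prediction regresses the injected noise along the curved DDPM schedule.

```python
import torch

def flow_matching_loss(model, x0):
    # Rectified-flow style objective: interpolate on a straight line between
    # data and noise, regress the (constant) velocity noise - x0.
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device).view(-1, 1, 1, 1)  # t ~ U(0,1)
    x_t = (1 - t) * x0 + t * noise
    v_pred = model(x_t, t.flatten())          # placeholder model signature
    return (v_pred - (noise - x0)).pow(2).mean()

def epsilon_loss(model, x0, alphas_cumprod):
    # Classic DDPM-style objective: add noise along the schedule, regress it.
    noise = torch.randn_like(x0)
    t = torch.randint(0, alphas_cumprod.numel(), (x0.shape[0],), device=x0.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    eps_pred = model(x_t, t)                  # placeholder model signature
    return (eps_pred - noise).pow(2).mean()
```

The straight-line paths of the flow objective are easier to integrate with a handful of Euler steps, which is where the few-step advantage comes from.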
MIRO is quite different in the sense that we focus on improving pretraining (not finetuning). Also, we explore the advantages of having multiple rewards to push the Pareto frontier.
This will be the last work of my PhD, as I will be defending on the 26th of November!
Project page for all the details: nicolas-dufour.github.io/miro
The multi-reward conditioning fosters better understanding of complex spatial relationships and object interactions.
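Roughly how reward conditioning can be wired in, as a hedged sketch (my own illustration of the idea, not MIRO's actual code; the module name and signature are made up): embed the vector of reward scores like a timestep embedding, train on the real scores of each image, and at sampling time simply ask for high scores on every reward.

```python
import torch
import torch.nn as nn

# Hedged sketch of multi-reward conditioning (my reading, not MIRO's code):
# each training image carries a vector of reward scores (aesthetic, PickScore,
# ImageReward, HPSv2, ...) and the denoiser is conditioned on that vector,
# much like a timestep embedding.
class RewardConditioning(nn.Module):
    def __init__(self, num_rewards: int, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_rewards, dim), nn.SiLU(), nn.Linear(dim, dim)
        )

    def forward(self, rewards: torch.Tensor) -> torch.Tensor:
        # rewards: (batch, num_rewards), e.g. normalized to [0, 1]
        return self.mlp(rewards)   # added to the usual timestep/text conditioning

cond = RewardConditioning(num_rewards=4, dim=256)
# Training: condition on the real scores of the training image.
train_emb = cond(torch.tensor([[0.62, 0.48, 0.55, 0.70]]))
# Sampling: just ask for images that score high on every reward.
sample_emb = cond(torch.tensor([[0.95, 0.95, 0.95, 0.95]]))
```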
GenEval score of 75, outperforming the 12B FLUX-dev (67) at 370x less inference cost.
Conditioning on rich reward signals is a highly effective way to get the capabilities of much larger models in a compact form!
On PickScore, MIRO needs just 4 samples to match the baseline's 128 samples (a 32x efficiency gain).
For ImageReward, it's a 16x efficiency gain.
This demonstrates superior inference-time efficiency for high-quality generation.
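For context, this is the usual best-of-N protocol (my own sketch of the setup; `generate` and `score` are placeholders for a sampler and a reward model): draw N candidates per prompt, score them, keep the best one.

```python
import torch

def best_of_n(generate, score, prompt, n: int):
    """Best-of-N sampling sketch: `generate` and `score` are placeholders
    for a text-to-image sampler and a reward model (e.g. PickScore)."""
    images = [generate(prompt) for _ in range(n)]
    scores = torch.tensor([score(prompt, img) for img in images])
    return images[scores.argmax().item()]

# MIRO at n=4 matching the baseline at n=128 on PickScore means 32x fewer
# generated candidates for the same selected quality.
```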
AestheticScore: 19.1x faster to reach baseline quality.
HPSv2: 6.2x faster.
You can clearly see the improvements visually.