Nicolas Dufour
@nicolasdufour.bsky.social
Sadly I don't think DroPE will work for images/videos.
Both NoPE and DroPE rely on the causal mask to leak absolute PE: each token only attends to the tokens before it, so the number of visible tokens gets leaked, e.g. the model can encode a bias that grows with that count.
So not a fix for images yet =(
January 12, 2026 at 8:51 PM
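Not from the thread, but a minimal NumPy sketch of the leak described in the post above, assuming plain softmax attention: with a causal mask the softmax normalizer only sums over the visible tokens, so the weight a query gives to a fixed anchor token is a monotonic function of its absolute position, and position becomes recoverable with no positional encoding at all.

```python
import numpy as np

# Sketch: with a causal mask, token i only attends to tokens 0..i, so the
# softmax normalizer depends on i. If the model puts extra logit mass on a
# single anchor token (token 0 here), that token's post-softmax weight is a
# monotonic function of how many tokens are visible, i.e. of absolute position.

n = 8
logits = np.zeros((n, n))
logits[:, 0] = 2.0                        # every query favors the anchor token 0
causal = np.triu(np.ones((n, n)), k=1)    # 1 above the diagonal = future tokens
logits = np.where(causal == 1, -np.inf, logits)

attn = np.exp(logits)
attn = attn / attn.sum(axis=-1, keepdims=True)

for i in range(n):
    # attn[i, 0] = e^2 / (e^2 + i): strictly decreasing in i, so the absolute
    # position i is readable from this single attention weight.
    print(i, round(float(attn[i, 0]), 3))
```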
Reposted by Nicolas Dufour
It was a great pleasure to be on Nicolas's committee. Congratulations to Nicolas for the great work, and congratulations to the advisors too!
November 28, 2025 at 11:49 AM
Apparently some people reported knowing of the bug before the 11th of November, so even before the release of the reviews.
November 27, 2025 at 9:56 PM
Reposted by Nicolas Dufour
Congrats Nicolas! On the PhD and on those beautifully crafted slides 🤩
November 27, 2025 at 5:46 PM
Yes, it's latent space just because I had my setup that way. Might try pixel space in the future.
November 18, 2025 at 2:20 PM
Yes, it's the raw prediction; we predict the velocity directly.
November 18, 2025 at 2:06 PM
It's also very domain-dependent. For example, I know that x-pred works better than epsilon-pred for human motion generation.
November 18, 2025 at 1:49 PM
The epsilon loss was used for image generation for a while, since DDPM.
More recently, flow matching (or the v-loss) has been the most common choice, basically since SD3.
From my experience, flow doesn't really improve quality, but sampling in fewer steps works better than with epsilon prediction.
November 18, 2025 at 1:47 PM
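For readers following along, a rough sketch (mine, not from the thread) of the two training targets being compared, using the standard DDPM convention for epsilon prediction and the rectified-flow convention x_t = (1 - t)·x0 + t·eps for flow matching; exact sign and time conventions vary between papers.

```python
import torch

# Sketch of the two parameterizations discussed above.

def ddpm_eps_target(x0, eps, alpha_bar_t):
    # DDPM-style: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
    # and the network is trained to predict eps from x_t.
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps
    return x_t, eps

def flow_matching_target(x0, eps, t):
    # Rectified-flow style: x_t = (1 - t) * x0 + t * eps, and the network
    # predicts the velocity eps - x0 directly (the "raw prediction" above).
    x_t = (1 - t) * x0 + t * eps
    return x_t, eps - x0

x0 = torch.randn(4, 3, 32, 32)     # clean (latent or pixel) samples
eps = torch.randn_like(x0)         # Gaussian noise
t = torch.rand(4, 1, 1, 1)         # one time per sample, broadcast over dims
x_t, v = flow_matching_target(x0, eps, t)
# training step (model is a placeholder, not defined here):
# loss = torch.nn.functional.mse_loss(model(x_t, t), v)
```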
Thanks for the pointer! We were doing something similar in "Don't drop your samples" (arxiv.org/abs/2405.20324)

MIRO is quite different in the sense that we focus on improving pretraining (not finetuning). Also, we explore the advantages of having multiple rewards to push the Pareto frontier.
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Conditional diffusion models are powerful generative models that can leverage various types of conditional information, such as class labels, segmentation masks, or text captions. However, in many rea...
arxiv.org
November 3, 2025 at 1:20 PM
Yes, thanks for pointing it out, will try to clarify
November 3, 2025 at 1:15 PM
Work with @lucasdegeorge.bsky.social @arrijitghosh.bsky.social @vickykalogeiton.bsky.social and @davidpicard.bsky.social.

This will be the last work of my PhD, as I will be defending on the 26th of November!
October 31, 2025 at 11:24 AM
MIRO demonstrates that aligning T2I models during pretraining is not only viable but superior: it's faster, more compute-efficient, and provides fine-grained, interpretable control.

Project page for all the details: nicolas-dufour.github.io/miro
MIRO: Multi-Reward Conditioning for Efficient Text-to-Image Generation
Train once, align many rewards. MIRO achieves 19× faster convergence and 370× less compute than FLUX while reaching GenEval score of 75. Controllable trade-offs at inference time.
nicolas-dufour.github.io
October 31, 2025 at 11:24 AM
The explicit reward conditioning allows for flexible trade-offs, like optimizing for GenEval by reducing the aesthetic weight in the prompt. We can also isolate the look of a specific reward or interpolate between them via multi-reward classifier-free guidance.
October 31, 2025 at 11:24 AM
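A hypothetical sketch of what such inference-time trade-offs could look like. The reward names match the ones discussed in the thread (AestheticScore, HPSv2, PickScore, ImageReward), but the ordering, default values, and model.sample call are illustrative assumptions, not MIRO's actual API.

```python
import torch

# Hypothetical interface for inference-time reward trade-offs (illustrative only).

REWARDS = ["aesthetic", "hpsv2", "pickscore", "imagereward"]

def reward_vector(**targets):
    # Build the conditioning vector s: one target value per reward, default 1.0.
    return torch.tensor([targets.get(name, 1.0) for name in REWARDS])

# Trade-off from the post: relax the aesthetic target to favor prompt
# faithfulness (and hence GenEval-style compositional accuracy).
s_geneval_focus = reward_vector(aesthetic=0.3)

# Isolate the "look" of a single reward by zeroing out the others.
s_aesthetic_only = reward_vector(aesthetic=1.0, hpsv2=0.0,
                                 pickscore=0.0, imagereward=0.0)

# image = model.sample(prompt, s=s_geneval_focus)   # placeholder call
```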
MIRO excels on challenging compositional tasks (GenEval here).

The multi-reward conditioning fosters better understanding of complex spatial relationships and object interactions.
October 31, 2025 at 11:24 AM
Despite being a compact model (0.36B parameters), MIRO achieves state-of-the-art results:

A GenEval score of 75, outperforming the 12B FLUX-dev (67) at 370x less inference cost.
Conditioning on rich reward signals is a highly effective way to get large-model capabilities in a compact form!
October 31, 2025 at 11:24 AM
MIRO dramatically improves sample efficiency for test-time scaling.

On PickScore, MIRO needs just 4 samples to match the baseline's 128 samples (a 32x efficiency gain).
For ImageReward, it's a 16x efficiency gain

This demonstrates superior inference-time efficiency for high-quality generation.
October 31, 2025 at 11:24 AM
Traditional single-objective optimization often leads to reward hacking. MIRO's multi-dimensional conditioning naturally prevents this by requiring the model to balance multiple objectives simultaneously. This produces balanced, robust performance across all metrics, in contrast to single-reward optimization.
October 31, 2025 at 11:24 AM
The multi-reward conditioning provides a dense supervisory signal, accelerating convergence dramatically. A snapshot of the speed-up:

AestheticScore: 19.1x faster to reach baseline quality.
HPSv2: 6.2x faster.

You can clearly see the improvements in the generated samples.
October 31, 2025 at 11:24 AM
This reward vector s becomes an explicit, interpretable control input at inference time. We extend classifier-free guidance to the multi-reward setting, allowing users to steer generation toward jointly high-reward regions by defining positive (s^+) and negative (s^−) targets.
October 31, 2025 at 11:24 AM
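A hedged sketch of the multi-reward classifier-free guidance described in the post above, written as the usual CFG extrapolation between predictions conditioned on s^+ and s^−; the model signature, number of rewards, and guidance scale are placeholders, and the exact formulation in the paper may differ.

```python
import torch

# Sketch of multi-reward classifier-free guidance: extrapolate from the
# prediction conditioned on a low-reward vector s_neg toward the prediction
# conditioned on a high-reward vector s_pos.

def multi_reward_cfg(model, x_t, t, prompt_emb, s_pos, s_neg, w=5.0):
    v_pos = model(x_t, t, prompt_emb, s=s_pos)   # steer toward jointly high rewards
    v_neg = model(x_t, t, prompt_emb, s=s_neg)   # steer away from low rewards
    return v_neg + w * (v_pos - v_neg)           # standard CFG extrapolation

# Example targets, assuming 4 conditioned rewards:
s_pos = torch.ones(4)    # s^+ : high target for each reward
s_neg = torch.zeros(4)   # s^- : low target for each reward
```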