Paul Couairon
@paulcouairon.bsky.social
PhD student at Sorbonne University
Reposted by Paul Couairon
ssirko.bsky.social
1/n 🚀New paper out - accepted at #ICCV2025!

Introducing DIP: an unsupervised post-training method that enhances dense features in pretrained ViTs for dense in-context scene understanding

Below: Low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!
paulcouairon.bsky.social
Despite the absence of high-resolution ground truth features, we find that training JAFAR at low upsampling ratios and resolutions generalizes remarkably well to significantly higher output scales (4/n)
paulcouairon.bsky.social
Given an image, JAFAR builds high-resolution queries at the target resolution and low-resolution, semantically enriched keys obtained via spatial feature modulation. Together they power a cross-resolution attention mechanism that interpolates the low-resolution features from the foundation vision encoder (3/n)
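The cross-resolution attention idea in this post can be illustrated with a small, dependency-free sketch. This is not the JAFAR implementation: the function names (`softmax`, `cross_attention_upsample`) and the toy vectors are hypothetical, and the queries and keys are supplied by hand rather than derived from an image or from spatial feature modulation. It only shows how each high-resolution query position can receive a softmax-weighted average of low-resolution features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_upsample(queries, keys, values):
    """Interpolate low-res values at high-res query positions.

    queries: N_hi vectors of dimension d (one per output position)
    keys:    N_lo vectors of dimension d (one per low-res position)
    values:  N_lo feature vectors of dimension c (the low-res features)
    Returns N_hi feature vectors of dimension c.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    channels = len(values[0])
    output = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each output feature is a convex combination of low-res features.
        output.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(channels)]
        )
    return output


# Toy data (assumed for illustration): 2 low-res positions, 4 high-res positions.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
queries = [[5.0, 0.0], [3.0, 1.0], [1.0, 3.0], [0.0, 5.0]]

upsampled = cross_attention_upsample(queries, keys, values)
for feature in upsampled:
    print(round(feature[0], 2))
```

Because the attention weights sum to one, every upsampled feature stays within the range spanned by the low-resolution features, and the number of queries alone sets the output resolution.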
paulcouairon.bsky.social
Foundation Vision Encoders produce rich, semantically meaningful features, but at low spatial resolution, so dense vision tasks require feature upsampling. JAFAR tackles this in a single step, without relying on any downstream task supervision (2/n)
paulcouairon.bsky.social
🚀Thrilled to introduce JAFAR—a lightweight, flexible, plug-and-play module that upsamples features from any Foundation Vision Encoder to any desired output resolution (1/n)

Paper: arxiv.org/abs/2506.11136
Project Page: jafar-upsampler.github.io
GitHub: github.com/PaulCouairon...