Thibaut Loiseau
@thibautloiseau.bsky.social
410 followers 270 following 26 posts
PhD Student at IMAGINE (ENPC) Working on camera pose estimation thibautloiseau.github.io
Reposted by Thibaut Loiseau
nicolasdufour.bsky.social
🚀 DinoV3 just became the new go-to backbone for geoloc!
It outperforms CLIP-like models (SigLip2, finetuned StreetCLIP)… and that’s shocking 🤯
Why? CLIP models have an innate advantage — they literally learn place names + images. DinoV3 doesn’t.
Reposted by Thibaut Loiseau
imagineenpc.bsky.social
Some of our IMAGINE members at #CVPR2025
thibautloiseau.bsky.social
I will be at #CVPR2025 to present this work (RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges) at 4pm, poster #88.
Come if you want to discuss!
Reposted by Thibaut Loiseau
vincentlepetit.bsky.social
I am heartbroken that I am not at the conference, but seeing what the government is doing to its people and the world, I simply couldn't go there.
Reposted by Thibaut Loiseau
imagineenpc.bsky.social
Looking forward to #CVPR2025! We will present the following papers:
Reposted by Thibaut Loiseau
nicolasdufour.bsky.social
This is an idea I've had for a while, but wow, it's working way better than expected! 🚀
The model looks really promising, even though it's just 256px for now.
Reposted by Thibaut Loiseau
lucasventura.com
Introducing Chapter-Llama #CVPR2025, a framework for 𝐯𝐢𝐝𝐞𝐨 𝐜𝐡𝐚𝐩𝐭𝐞𝐫𝐢𝐧𝐠 using Large Language Models! 🎬🦙

Check it out:
📄 Paper: arxiv.org/abs/2504.00072
🔗 Project: imagine.enpc.fr/~lucas.ventu...
💻 Code: github.com/lucas-ventur...
🤗 Demo: huggingface.co/spaces/lucas...
Reposted by Thibaut Loiseau
davidpicard.bsky.social
🔥🔥🔥 CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!
Reposted by Thibaut Loiseau
imagineenpc.bsky.social
Starter pack including some of the lab members: go.bsky.app/QK8j87w
thibautloiseau.bsky.social
In the end, we might not care about explicit correspondences during pre-training, as they may already emerge implicitly, as was observed with CroCo. Also, the checks in our pipeline are done in 3D, and it is difficult to get pixel-level correspondences with the current approach.
thibautloiseau.bsky.social
Hi Johan, thanks for your question :) For now, each pixel only has its associated class, but we might be able to add explicit correspondences between pixels in the pipeline.
Reposted by Thibaut Loiseau
parskatt.bsky.social
Introducing DaD, Part 2, a pretty cool keypoint detector.
parskatt.bsky.social
Introducing DaD (arxiv.org/abs/2503.07347), a pretty cool keypoint detector.
As this will get pretty long, this will be two threads.
The first will go into the RL part, and the second on the emergence and distillation.
thibautloiseau.bsky.social
12/13 Code and the Cub3 dataset will be released soon. Stay tuned!
thibautloiseau.bsky.social
11/13 The implications are exciting: Alligat0R enables more robust visual perception systems that can handle and benefit from challenging real-world scenarios with varying degrees of overlap between views.
thibautloiseau.bsky.social
10/13 Our approach not only improves performance but also provides interpretable visualizations of the model's geometric understanding through its segmentation outputs.
thibautloiseau.bsky.social
9/13 Alligat0R works particularly well on difficult pairs, maintaining strong performance even as overlap decreases, while CroCo's accuracy drops dramatically below 40% overlap.
thibautloiseau.bsky.social
8/13 On the RUBIK benchmark, our method achieves 60.3% accuracy (at 5°/2m threshold) compared to just 19.1% for the best CroCo model!
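The 5°/2m accuracy metric used above can be checked per image pair as follows. This is a minimal numpy sketch of the standard pose-accuracy criterion; the function names are illustrative and not taken from the paper's code:

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (in degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_is_accurate(R_est, t_est, R_gt, t_gt,
                     rot_thresh_deg=5.0, trans_thresh_m=2.0):
    """A pose estimate counts as correct when BOTH the rotation error
    and the translation error fall under the thresholds (here 5 deg / 2 m)."""
    rot_err = rotation_error_deg(R_est, R_gt)
    trans_err = np.linalg.norm(t_est - t_gt)
    return rot_err <= rot_thresh_deg and trans_err <= trans_thresh_m
```

The reported accuracy is then simply the fraction of test pairs for which `pose_is_accurate` returns True.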
thibautloiseau.bsky.social
7/13 In experiments, Alligat0R significantly outperforms CroCo for relative pose regression, with the same architecture, especially in challenging scenarios with limited overlap between views.
thibautloiseau.bsky.social
6/13 To enable this approach, we created Cub3, a large-scale dataset with 2.5M image pairs and dense co-visibility annotations derived from nuScenes, featuring pairs with varying overlap, scale ratios, and viewpoint angles.
thibautloiseau.bsky.social
5/13 This formulation offers major advantages:
- Can use image pairs with ANY degree of overlap
- Provides interpretable outputs showing the model's 3D understanding
- Better aligns with downstream binocular vision tasks
thibautloiseau.bsky.social
4/13 The key insight: For each pixel in one image, we explicitly predict whether it is:
- Co-visible in the second image
- Occluded in the second image
- Outside the field of view in the second image
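The per-pixel three-way decision described above amounts to a standard segmentation objective. A minimal numpy sketch of what such a three-class co-visibility loss could look like (class names, indexing, and the plain cross-entropy choice are my assumptions, not details from the paper):

```python
import numpy as np

# Hypothetical class indices for the three per-pixel outcomes.
COVISIBLE, OCCLUDED, OUT_OF_FOV = 0, 1, 2

def covisibility_ce_loss(logits, labels):
    """Mean per-pixel cross-entropy for a 3-class co-visibility map.
    logits: (H, W, 3) raw class scores; labels: (H, W) integer class ids."""
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()
```

Predicting the argmax class per pixel is also what yields the interpretable segmentation maps mentioned later in the thread.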
thibautloiseau.bsky.social
3/13 We introduce Alligat0R, which reformulates this problem as a co-visibility segmentation task instead of trying to reconstruct masked regions of images.
thibautloiseau.bsky.social
2/13 Current methods for training vision models to understand 3D relationships between images (like CroCo) require substantial overlap (>50%) between training image pairs. This limits their effectiveness in many real-world scenarios.
thibautloiseau.bsky.social
1/13 🐊 Introducing our latest work on improving relative camera pose regression with a novel pre-training approach Alligat0R (arxiv.org/abs/2503.07561)!
@gbourmaud.bsky.social @vincentlepetit.bsky.social