Thibaut Loiseau
@thibautloiseau.bsky.social
410 followers 270 following 26 posts
PhD Student at IMAGINE (ENPC) Working on camera pose estimation thibautloiseau.github.io
Reposted by Thibaut Loiseau
nicolasdufour.bsky.social
🚀 DinoV3 just became the new go-to backbone for geoloc!
It outperforms CLIP-like models (SigLip2, finetuned StreetCLIP)… and that’s shocking 🤯
Why? CLIP models have an innate advantage — they literally learn place names + images. DinoV3 doesn’t.
Reposted by Thibaut Loiseau
imagineenpc.bsky.social
Some of our IMAGINE members at #CVPR2025
thibautloiseau.bsky.social
I will be at #CVPR2025 to present this work (RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges) at 4pm, poster #88.
Come if you want to discuss!
Reposted by Thibaut Loiseau
vincentlepetit.bsky.social
I am heartbroken that I am not at the conference, but seeing what the government is doing to its people and the world, I simply couldn't go there.
Reposted by Thibaut Loiseau
imagineenpc.bsky.social
Looking forward to #CVPR2025! We will present the following papers:
Reposted by Thibaut Loiseau
nicolasdufour.bsky.social
This is an idea I've had for a while, but wow, it's working way better than expected! 🚀
The model looks really promising, even though it's just 256px for now.
Reposted by Thibaut Loiseau
lucasventura.com
Introducing Chapter-Llama #CVPR2025, a framework for 𝐯𝐢𝐝𝐞𝐨 𝐜𝐡𝐚𝐩𝐭𝐞𝐫𝐢𝐧𝐠 using Large Language Models! 🎬🦙

Check it out:
📄 Paper: arxiv.org/abs/2504.00072
🔗 Project: imagine.enpc.fr/~lucas.ventu...
💻 Code: github.com/lucas-ventur...
🤗 Demo: huggingface.co/spaces/lucas...
Reposted by Thibaut Loiseau
davidpicard.bsky.social
🔥🔥🔥 CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!
Reposted by Thibaut Loiseau
imagineenpc.bsky.social
Starter pack including some of the lab members: go.bsky.app/QK8j87w
thibautloiseau.bsky.social
In the end, we might not care about explicit correspondences during pre-training, as they may already emerge implicitly, as was observed with CroCo. Also, the checks in our pipeline are done in 3D, and it is difficult to get pixel-level correspondences with the current approach.
thibautloiseau.bsky.social
Hi Johan, thanks for your question :) For now, each pixel only has its associated class, but we might be able to add explicit correspondences between pixels in the pipeline.
Reposted by Thibaut Loiseau
parskatt.bsky.social
Introducing DaD, Part 2, a pretty cool keypoint detector.
parskatt.bsky.social
Introducing DaD (arxiv.org/abs/2503.07347), a pretty cool keypoint detector.
As this will get pretty long, this will be two threads.
The first will go into the RL part, and the second on the emergence and distillation.
thibautloiseau.bsky.social
12/13 Code and the Cub3 dataset will be released soon. Stay tuned!
thibautloiseau.bsky.social
11/13 The implications are exciting: Alligat0R enables more robust visual perception systems that can handle and benefit from challenging real-world scenarios with varying degrees of overlap between views.
thibautloiseau.bsky.social
10/13 Our approach not only improves performance but also provides interpretable visualizations of the model's geometric understanding through its segmentation outputs.
thibautloiseau.bsky.social
9/13 Alligat0R works particularly well on difficult pairs, maintaining strong performance even as overlap decreases, while CroCo's accuracy drops dramatically below 40% overlap.
thibautloiseau.bsky.social
8/13 On the RUBIK benchmark, our method achieves 60.3% accuracy (at 5°/2m threshold) compared to just 19.1% for the best CroCo model!
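The 5°/2m accuracy metric used above can be checked per image pair as follows. This is a minimal numpy sketch of the standard pose-accuracy criterion; the function names are illustrative and not taken from the paper's code:

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (in degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_is_accurate(R_est, t_est, R_gt, t_gt,
                     rot_thresh_deg=5.0, trans_thresh_m=2.0):
    """A pose estimate counts as correct when BOTH the rotation error
    and the translation error fall under the thresholds (here 5 deg / 2 m)."""
    rot_err = rotation_error_deg(R_est, R_gt)
    trans_err = np.linalg.norm(t_est - t_gt)
    return rot_err <= rot_thresh_deg and trans_err <= trans_thresh_m
```

The reported accuracy is then simply the fraction of test pairs for which `pose_is_accurate` returns True.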
thibautloiseau.bsky.social
7/13 In experiments, Alligat0R significantly outperforms CroCo for relative pose regression, with the same architecture, especially in challenging scenarios with limited overlap between views.
thibautloiseau.bsky.social
6/13 To enable this approach, we created Cub3, a large-scale dataset with 2.5M image pairs and dense co-visibility annotations derived from nuScenes, featuring pairs with varying overlap, scale ratios, and viewpoint angles.
thibautloiseau.bsky.social
5/13 This formulation offers major advantages:
- Can use image pairs with ANY degree of overlap
- Provides interpretable outputs showing the model's 3D understanding
- Better aligns with downstream binocular vision tasks
thibautloiseau.bsky.social
4/13 The key insight: For each pixel in one image, we explicitly predict whether it is:
- Co-visible in the second image
- Occluded in the second image
- Outside the field of view in the second image
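The per-pixel three-way decision described above amounts to a standard segmentation objective. A minimal numpy sketch of what such a three-class co-visibility loss could look like (class names, indexing, and the plain cross-entropy choice are my assumptions, not details from the paper):

```python
import numpy as np

# Hypothetical class indices for the three per-pixel outcomes.
COVISIBLE, OCCLUDED, OUT_OF_FOV = 0, 1, 2

def covisibility_ce_loss(logits, labels):
    """Mean per-pixel cross-entropy for a 3-class co-visibility map.
    logits: (H, W, 3) raw class scores; labels: (H, W) integer class ids."""
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()
```

Predicting the argmax class per pixel is also what yields the interpretable segmentation maps mentioned later in the thread.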
thibautloiseau.bsky.social
3/13 We introduce Alligat0R, which reformulates this problem as a co-visibility segmentation task instead of trying to reconstruct masked regions of images.
thibautloiseau.bsky.social
2/13 Current methods for training vision models to understand 3D relationships between images (like CroCo) require substantial overlap (>50%) between training image pairs. This limits their effectiveness in many real-world scenarios.
thibautloiseau.bsky.social
1/13 🐊 Introducing our latest work on improving relative camera pose regression with a novel pre-training approach Alligat0R (arxiv.org/abs/2503.07561)!
@gbourmaud.bsky.social @vincentlepetit.bsky.social