Vladan Stojnić
@stojnicv.xyz
630 followers 220 following 36 posts
Ph.D. student at Visual Recognition Group, Czech Technical University in Prague 🔗 https://stojnicv.xyz
Reposted by Vladan Stojnić
gtolias.bsky.social
The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.

cmp.felk.cvut.cz/colloquium/
Reposted by Vladan Stojnić
astylianou.bsky.social
Super happy that QuARI: Query Adaptive Retrieval Improvement was accepted at #NeurIPS2025. You can significantly boost retrieval performance for very hard retrieval tasks by learning query-specific transformations of your encoders. w/ @jacobsn.bsky.social @pless.bsky.social arxiv.org/pdf/2505.21647
Reposted by Vladan Stojnić
gtolias.bsky.social
Crash-test your foundation models for object recognition at its finest granularity. Here are the updated results on our instance-level image retrieval benchmark (ILIAS, CVPR'25). DINOv3 and Perception Encoder (PE) are included, with DINOv3 setting the new state of the art! Oh, but no, look at this...
gkordo.bsky.social
🚀 new state-of-the-art on ILIAS dataset!

Curious how well the latest models can recognize particular objects?
We evaluated the base and large variants of DINOv3 and Perception Encoder (PE) on instance-level image retrieval.

See the results 👉 vrg.fel.cvut.cz/ilias/
Reposted by Vladan Stojnić
gtolias.bsky.social
The Colloquium in Pattern Recognition and Computer Vision of the Visual Recognition Group at CTU in Prague has a long tradition dating back to 1998. The list of all speakers is available at docs.google.com/spreadsheets.... Enjoy! The 50th edition is coming soon: cmp.felk.cvut.cz/colloquium/
Pattern Recognition and Computer Vision Colloquium - past speakers
Reposted by Vladan Stojnić
ducha-aiki.bsky.social
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

@ryan-ramos.bsky.social @stojnicv.xyz @gkordo.bsky.social Yuta Nakashima @gtolias.bsky.social
@noagarciad.bsky.social
tl;dr: CLIP sees the difference between a DSLR and an iPhone; DINO doesn't.
arxiv.org/abs/2508.10637
1/
stojnicv.xyz
As for the term CVL, we chose it specifically to distinguish CLIP-like VLMs from VLMs that generate text, since the term VLM is overused and means different things in different papers. To an extent, it also follows the naming from arxiv.org/pdf/2405.17247
stojnicv.xyz
I agree that the terminology is confusing. However, I wouldn't agree that CLIP is an SSL method. It uses a contrastive loss, but not with self-supervised labels. The DINOv2 and v3 papers classify it as weakly supervised, since its labels come from text.
stojnicv.xyz
The same pattern can be observed for the acquisition parameters in the task of near-duplicate retrieval. If the negatives are captured using the same camera as the query, the task becomes harder for some models compared to the case when they are captured by a different camera.
stojnicv.xyz
Impact on the semantic performance is again the most pronounced for contrastive VLMs, and the least for SSL models.

Here, we show kNN classification in several settings, depending on whether the semantic positives and negatives share the same processing parameters as the test image.
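The kind of kNN probe described above can be illustrated with a toy sketch. Everything here is synthetic and illustrative, not the paper's actual protocol: a planted "processing trace" direction is added to fake embeddings, and kNN accuracy is compared when semantic positives vs. negatives share the test image's processing setting.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
sem = rng.normal(size=(2, d))    # two semantic classes
proc = rng.normal(size=(2, d))   # two "processing" settings (e.g. JPEG-quality bins)

def embed(cls, p, n):
    # Toy frozen-encoder embedding: semantic signal + processing trace + noise.
    return sem[cls] + 1.5 * proc[p] + 0.5 * rng.normal(size=(n, d))

def knn_acc(pos_proc, neg_proc, n=50, k=5):
    # Test images are class 0 with processing setting 0; the kNN classifier
    # votes among semantic positives (class 0) and negatives (class 1) whose
    # processing settings we control.
    test = embed(0, 0, n)
    train = np.vstack([embed(0, pos_proc, n), embed(1, neg_proc, n)])
    labels = np.array([0] * n + [1] * n)
    a = train / np.linalg.norm(train, axis=1, keepdims=True)
    b = test / np.linalg.norm(test, axis=1, keepdims=True)
    nn = np.argsort(-(b @ a.T), axis=1)[:, :k]   # top-k by cosine similarity
    pred = (labels[nn].mean(axis=1) > 0.5).astype(int)
    return (pred == 0).mean()

aligned = knn_acc(pos_proc=0, neg_proc=1)      # positives share the test processing
adversarial = knn_acc(pos_proc=1, neg_proc=0)  # negatives share it instead
print(f"aligned: {aligned:.2f}, adversarial: {adversarial:.2f}")
```

With a strong enough trace, the adversarial regime collapses: nearest neighbors are chosen by processing similarity rather than semantics, which is the failure mode the thread describes.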
stojnicv.xyz
This impact is especially pronounced when there is a strong correlation/anticorrelation between the semantic and metadata labels. E.g., when semantic positives/negatives have the same/different processing parameters as a query image.
stojnicv.xyz
More strikingly, we show that traces of these metadata labels (processing and acquisition parameters) can significantly impact the semantic recognition abilities.
stojnicv.xyz
A similar pattern is observed for the acquisition parameters, although generally, all models have a harder time predicting these parameters than the processing ones.
stojnicv.xyz
Image processing parameters can be recovered from the representations of frozen models by training a linear layer on top. This ability is especially pronounced for contrastive VLMs (e.g., CLIP). Some supervised models perform strongly as well, while SSL models perform the worst.
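A minimal sketch of this kind of linear probe (synthetic data throughout; the "frozen features" and the planted trace are illustrative stand-ins, not real encoder outputs): keep the features fixed, train only a linear softmax layer on top, and check whether the processing parameter is linearly decodable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen encoder features: 200 images, 64-dim embeddings,
# with a weak planted trace of a 3-way "processing parameter"
# (think JPEG-quality bins) in the first 3 coordinates.
n, d, k = 200, 64, 3
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, d))
X[:, :k] += 2.0 * np.eye(k)[y]

# Linear probe: a single linear layer + softmax, trained by gradient
# descent on the cross-entropy loss; the features themselves stay frozen.
W = np.zeros((d, k))
Y = np.eye(k)[y]
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * (X.T @ (p - Y)) / n

acc = (np.argmax(X @ W, axis=1) == y).mean()
print(f"probe accuracy: {acc:.2f}")   # well above the 1/3 chance level
```

If the probe stays at chance, the representation carries no linearly decodable trace of that parameter; the claim above is that for contrastive VLMs it ends up far above chance.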
stojnicv.xyz
Have you ever asked yourself how much your favorite vision model knows about image capture parameters (e.g., the amount of JPEG compression, the camera model, etc.)? Furthermore, could these parameters influence its semantic recognition abilities?
Reposted by Vladan Stojnić
gkordo.bsky.social
🚨 Deadline Extension

Instance-Level Recognition and Generation (ILR+G) Workshop at ICCV2025 @iccv.bsky.social

📅 new deadline: June 26, 2025 (23:59 AoE)
📄 paper submission: cmt3.research.microsoft.com/ILRnG2025
🌐 ILR+G website: ilr-workshop.github.io/ICCVW2025/

#ICCV2025 #ComputerVision #AI
stojnicv.xyz
Are you at @cvprconference.bsky.social #CVPR2025? Come and check out LPOSS.

We show how graph-based label propagation can be used to improve weak, patch-level predictions from VLMs for open-vocabulary semantic segmentation.

📅 June 13, 2025, 16:00 – 18:00 CDT
📍 Location: ExHall D, Poster #421
Reposted by Vladan Stojnić
gkordo.bsky.social
Are you at @cvprconference.bsky.social? Come by our poster!
📅 Sat 14/6, 10:30-12:30
📍 Poster #395, ExHall D
Reposted by Vladan Stojnić
sattlertorsten.bsky.social
Attending @cvprconference.bsky.social and looking for a PhD or postdoc position in the area of 3D reconstruction (Gaussian splatting, NeRFs, scene understanding, etc.)? Find me or drop me an email ;)
Reposted by Vladan Stojnić
skamalas.bsky.social
6/ 📄 Paper 2:
"LPOSS: Label Propagation Over Patches and Pixels for Open-Vocabulary Semantic Segmentation"

Can graph-based label propagation refine weak, patch-level predictions from VLMs like CLIP? We say yes — introducing LPOSS and LPOSS+.
Reposted by Vladan Stojnić
klara-cz.bsky.social
⚠️❗Open PhD and Postdoc positions in Prague with Lukas Neumann! ❗⚠️

We rank #5 in computer vision in Europe and Lukas is a great supervisor, so this is a great opportunity!

If you are interested, contact him; he will also be at CVPR with his group :)