Vladan Stojnić
@stojnicv.xyz
630 followers 220 following 36 posts
Ph.D. student at Visual Recognition Group, Czech Technical University in Prague 🔗 https://stojnicv.xyz
Reposted by Vladan Stojnić
gtolias.bsky.social
The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.

cmp.felk.cvut.cz/colloquium/
Reposted by Vladan Stojnić
astylianou.bsky.social
Super happy that QuARI: Query Adaptive Retrieval Improvement was accepted at #NeurIPS2025. You can significantly boost retrieval performance for very hard retrieval tasks by learning query-specific transformations of your encoders. w/ @jacobsn.bsky.social @pless.bsky.social arxiv.org/pdf/2505.21647
Reposted by Vladan Stojnić
gtolias.bsky.social
Crash-test your foundation models for object recognition at its finest granularity. Here are the updated results on our instance-level image retrieval benchmark (ILIAS, CVPR'25). DINOv3 and Perception Encoder (PE) are included, with DINOv3 setting the new state of the art! Oh, but no, look at this...
gkordo.bsky.social
🚀 new state-of-the-art on ILIAS dataset!

Curious how well the latest models can recognize particular objects?
We evaluated the base and large variants of DINOv3 and Perception Encoder (PE) on instance-level image retrieval.

See the results 👉 vrg.fel.cvut.cz/ilias/
Reposted by Vladan Stojnić
gtolias.bsky.social
The Colloquium in Pattern Recognition and Computer Vision of the Visual Recognition Group at CTU in Prague has a long tradition dating back to 1998. The list of all speakers is available at docs.google.com/spreadsheets.... Enjoy! The 50th edition is coming soon: cmp.felk.cvut.cz/colloquium/
Pattern Recognition and Computer Vision Colloquium - past speakers
Reposted by Vladan Stojnić
ducha-aiki.bsky.social
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

@ryan-ramos.bsky.social @stojnicv.xyz @gkordo.bsky.social Yuta Nakashima @gtolias.bsky.social
@noagarciad.bsky.social
tl;dr: CLIP sees the difference between a DSLR and an iPhone; DINO doesn't.
arxiv.org/abs/2508.10637
1/
stojnicv.xyz
As for the term CVL, we chose it specifically to distinguish CLIP-like VLMs from VLMs that generate text, since the term VLM is overused and means different things in different papers. To an extent, it also follows the naming from arxiv.org/pdf/2405.17247
stojnicv.xyz
I agree that the terminology is confusing. However, I wouldn't agree that CLIP is an SSL method. It uses a contrastive loss, but not with self-supervised labels. The DINOv2 and v3 papers classify it as weakly supervised, since its labels come from text.
stojnicv.xyz
The same pattern can be observed for the acquisition parameters in the task of near-duplicate retrieval. If the negatives are captured using the same camera as the query, the task becomes harder for some models compared to the case when they are captured by a different camera.
stojnicv.xyz
Impact on the semantic performance is again the most pronounced for contrastive VLMs, and the least for SSL models.

Here, we show kNN classification in several settings, depending on whether the semantic positives and negatives share the same processing parameters as the test image.
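The kind of kNN probe described above can be illustrated with a toy sketch. Everything here is synthetic and illustrative, not the paper's actual protocol: a planted "processing trace" direction is added to fake embeddings, and kNN accuracy is compared when semantic positives vs. negatives share the test image's processing setting.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
sem = rng.normal(size=(2, d))    # two semantic classes
proc = rng.normal(size=(2, d))   # two "processing" settings (e.g. JPEG-quality bins)

def embed(cls, p, n):
    # Toy frozen-encoder embedding: semantic signal + processing trace + noise.
    return sem[cls] + 1.5 * proc[p] + 0.5 * rng.normal(size=(n, d))

def knn_acc(pos_proc, neg_proc, n=50, k=5):
    # Test images are class 0 with processing setting 0; the kNN classifier
    # votes among semantic positives (class 0) and negatives (class 1) whose
    # processing settings we control.
    test = embed(0, 0, n)
    train = np.vstack([embed(0, pos_proc, n), embed(1, neg_proc, n)])
    labels = np.array([0] * n + [1] * n)
    a = train / np.linalg.norm(train, axis=1, keepdims=True)
    b = test / np.linalg.norm(test, axis=1, keepdims=True)
    nn = np.argsort(-(b @ a.T), axis=1)[:, :k]   # top-k by cosine similarity
    pred = (labels[nn].mean(axis=1) > 0.5).astype(int)
    return (pred == 0).mean()

aligned = knn_acc(pos_proc=0, neg_proc=1)      # positives share the test processing
adversarial = knn_acc(pos_proc=1, neg_proc=0)  # negatives share it instead
print(f"aligned: {aligned:.2f}, adversarial: {adversarial:.2f}")
```

With a strong enough trace, the adversarial regime collapses: nearest neighbors are chosen by processing similarity rather than semantics, which is the failure mode the thread describes.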
stojnicv.xyz
This impact is especially pronounced when there is a strong correlation/anticorrelation between the semantic and metadata labels. E.g., when semantic positives/negatives have the same/different processing parameters as a query image.
stojnicv.xyz
More strikingly, we show that traces of these metadata labels (processing and acquisition parameters) can significantly impact the semantic recognition abilities.
stojnicv.xyz
A similar pattern is observed for the acquisition parameters, although generally, all models have a harder time predicting these parameters than the processing ones.
stojnicv.xyz
Image processing parameters can be recovered from the representations of frozen models by training a linear layer on top. This ability is especially pronounced for contrastive VLMs (e.g., CLIP). Some supervised models perform strongly as well, while SSL models perform the worst.
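A minimal sketch of this kind of linear probe (synthetic data throughout; the "frozen features" and the planted trace are illustrative stand-ins, not real encoder outputs): keep the features fixed, train only a linear softmax layer on top, and check whether the processing parameter is linearly decodable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen encoder features: 200 images, 64-dim embeddings,
# with a weak planted trace of a 3-way "processing parameter"
# (think JPEG-quality bins) in the first 3 coordinates.
n, d, k = 200, 64, 3
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, d))
X[:, :k] += 2.0 * np.eye(k)[y]

# Linear probe: a single linear layer + softmax, trained by gradient
# descent on the cross-entropy loss; the features themselves stay frozen.
W = np.zeros((d, k))
Y = np.eye(k)[y]
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * (X.T @ (p - Y)) / n

acc = (np.argmax(X @ W, axis=1) == y).mean()
print(f"probe accuracy: {acc:.2f}")   # well above the 1/3 chance level
```

If the probe stays at chance, the representation carries no linearly decodable trace of that parameter; the claim above is that for contrastive VLMs it ends up far above chance.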
stojnicv.xyz
Have you ever asked yourself how much your favorite vision model knows about image capture parameters (e.g., the amount of JPEG compression, the camera model, etc.)? Furthermore, could these parameters influence its semantic recognition abilities?
Reposted by Vladan Stojnić
gkordo.bsky.social
🚨 Deadline Extension

Instance-Level Recognition and Generation (ILR+G) Workshop at ICCV2025 @iccv.bsky.social

📅 new deadline: June 26, 2025 (23:59 AoE)
📄 paper submission: cmt3.research.microsoft.com/ILRnG2025
🌐 ILR+G website: ilr-workshop.github.io/ICCVW2025/

#ICCV2025 #ComputerVision #AI
stojnicv.xyz
Are you at @cvprconference.bsky.social #CVPR2025? Come and check out LPOSS.

We show how graph-based label propagation can be used to improve weak, patch-level predictions from VLMs for open-vocabulary semantic segmentation.

📅 June 13, 2025, 16:00 – 18:00 CDT
📍 Location: ExHall D, Poster #421
Reposted by Vladan Stojnić
gkordo.bsky.social
Are you at @cvprconference.bsky.social? Come by our poster!
📅 Sat 14/6, 10:30-12:30
📍 Poster #395, ExHall D
Reposted by Vladan Stojnić
sattlertorsten.bsky.social
Attending @cvprconference.bsky.social and looking for a PhD or postdoc position in the area of 3D reconstruction (Gaussian splatting, NeRFs, scene understanding, etc.)? Find me or drop me an email ;)
Reposted by Vladan Stojnić
skamalas.bsky.social
6/ 📄 Paper 2:
"LPOSS: Label Propagation Over Patches and Pixels for Open-Vocabulary Semantic Segmentation"

Can graph-based label propagation refine weak, patch-level predictions from VLMs like CLIP? We say yes — introducing LPOSS and LPOSS+.
Reposted by Vladan Stojnić
klara-cz.bsky.social
⚠️❗Open PhD and Postdoc positions in Prague with Lukas Neumann! ❗⚠️

We rank #5 in computer vision in Europe and Lukas is a great supervisor, so this is a great opportunity!

If you are interested, contact him; he will also be at CVPR with his group :)