gtolias.bsky.social
CLIP is often referred to as a VLM in the CV literature. I'm surprised someone has never seen that; which line of work are you following? VLM is overloaded, though; it is used for generative models too. That is why we chose CVL. Not a big deal though; the screenshot you shared is self-explanatory in my opinion.
gtolias.bsky.social
greeks are sometimes accused of faking statistics, i bet the actual number is more than 26.2 🧀
gtolias.bsky.social
I'd rather focus on more down-to-earth improvements: GS removing self-citations (I am always surprised that they still haven't done it) and computing an additional normalized metric (e.g., dividing citations by the number of authors).
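For concreteness, a toy sketch of the fractional-counting idea (the per-paper numbers are made up; no real Google Scholar API is involved):

```python
# Illustrative only: author-normalized citation count over
# hypothetical per-paper data.
papers = [
    {"citations": 120, "n_authors": 4},
    {"citations": 45,  "n_authors": 2},
    {"citations": 10,  "n_authors": 9},
]

# Fractional counting: each paper contributes citations / n_authors.
raw = sum(p["citations"] for p in papers)
normalized = sum(p["citations"] / p["n_authors"] for p in papers)
print(f"raw: {raw}, author-normalized: {normalized:.1f}")
```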
gtolias.bsky.social
Not just random behaviour. The dot product between two such global descriptors (each a sum of local descriptors) is the same as the sum of all local descriptor dot products between the two images (you can see it as all point correspondences).
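The identity is easy to verify numerically; a minimal NumPy check (descriptor counts and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 128))  # local descriptors of image 1
Y = rng.normal(size=(80, 128))   # local descriptors of image 2

# Global descriptors: sum of local descriptors.
gx, gy = X.sum(axis=0), Y.sum(axis=0)

# <sum_i x_i, sum_j y_j> == sum over all pairs (i, j) of <x_i, y_j>
lhs = gx @ gy
rhs = (X @ Y.T).sum()  # all pairwise local dot products
assert np.isclose(lhs, rhs)
```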
gtolias.bsky.social
Better than just hope: in our ECCV 2020 paper we show that GAP (global average pooling) is a good way to optimize local descriptors using image-level supervision and loss. The dino.txt paper switches from the CLS token to GAP, and look at the segmentation-task improvements.
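A minimal sketch of why GAP has this effect (illustrative PyTorch, not the actual ECCV 2020 or dino.txt code): an image-level loss on the pooled descriptor sends a gradient to every local descriptor.

```python
import torch

# Illustrative patch-level (local) descriptors from a backbone,
# shape (batch, num_patches, dim); values are random stand-ins.
local = torch.randn(8, 196, 256, requires_grad=True)

# GAP: one global descriptor per image.
global_desc = local.mean(dim=1)

# Any image-level loss on the global descriptor back-propagates
# to every local descriptor, so image-level supervision still
# shapes the local ones.
loss = (1 - torch.nn.functional.cosine_similarity(
    global_desc[:4], global_desc[4:])).mean()
loss.backward()
print(local.grad.shape)  # (8, 196, 256): every patch receives a gradient
```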
gtolias.bsky.social
Of course, there is nothing we can say about privately kept training sets.
gtolias.bsky.social
Shortcut-learning hypothesis: captions containing EXIF data or JPEG compression settings may result in shortcut learning for CVLs by capturing pixel-level noise. We looked into LAION and found only a negligible number of such captions, so this kind of shortcut learning is unlikely to happen.
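A sketch of the kind of caption scan described (the regex and captions here are illustrative, not the ones used on LAION):

```python
import re

# Illustrative patterns for EXIF/compression jargon in captions.
EXIF_RE = re.compile(
    r"\b(f/\d+(\.\d+)?|ISO\s?\d+|1/\d+\s?s|jpe?g\s+quality|exif)\b",
    re.IGNORECASE,
)

captions = [
    "a cat sitting on a sofa",
    "sunset, shot at f/2.8, ISO 100, 1/250 s",
    "vacation photo saved at jpeg quality 60",
]
hits = [c for c in captions if EXIF_RE.search(c)]
print(f"{len(hits)}/{len(captions)} captions mention capture settings")
```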
gtolias.bsky.social
A CVL is better able to predict the image processing or acquisition settings given solely the image representation when it is trained without image augmentations.
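This kind of claim is typically quantified with a probe on frozen embeddings; a toy sketch with synthetic data (the paper's actual protocol may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for frozen image embeddings and the acquisition
# setting used for each image (e.g., a JPEG-quality bucket).
emb = rng.normal(size=(1000, 512))      # frozen features
jpeg_q = rng.integers(0, 4, size=1000)  # 4 quality buckets
emb[:, 0] += jpeg_q                     # leak the setting into one dimension

probe = LogisticRegression(max_iter=1000).fit(emb[:800], jpeg_q[:800])
print("probe accuracy:", probe.score(emb[800:], jpeg_q[800:]))
# Higher probe accuracy = the representation encodes more about the setting.
```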
gtolias.bsky.social
Contrastive vision-language models, e.g., CLIP, are very sensitive, while self-supervised models, e.g., DINOv2, are the least sensitive. We identify the lack of training-time image augmentations as one of the reasons for such flaws.
gtolias.bsky.social
The hidden flaws in your favorite foundation model: we've uncovered how subtle image metadata (JPEG parameters, camera type, etc.) systematically biases visual representations and consequently affects object recognition ability. To be presented at #ICCV2025 as a highlight paper.
stojnicv.xyz
Have you ever asked yourself how much your favorite vision model knows about image capture parameters (e.g., the amount of JPEG compression, the camera model, etc.)? Furthermore, could these parameters influence its semantic recognition abilities?

gtolias.bsky.social
I agree that image-level supervision and good image-level recognition do not necessarily translate to well-localized and discriminative local features.
gtolias.bsky.social
Joint vision-language training essentially allows vision-encoder training to be supervised by any description, either at the semantic level or at the instance level. Using the latter, together with the scale of the training, is possibly a way to explain such good instance-level recognition performance.
gtolias.bsky.social
SfM/SLAM follow an instance-level class definition. On the ILIAS benchmark, which evaluates instance-level recognition ability (it's not about geometry), SigLIP (1 & 2) is significantly better than DINOv2. Before this result I had a similar intuition to yours; not anymore.
vrg.fel.cvut.cz/ilias/
ILIAS | Instance-level Retrieval at Scale

Reposted by: gtolias.bsky.social

gkordo.bsky.social
🚨 Deadline Extension

Instance-Level Recognition and Generation (ILR+G) Workshop at ICCV2025 @iccv.bsky.social

📅 new deadline: June 26, 2025 (23:59 AoE)
📄 paper submission: cmt3.research.microsoft.com/ILRnG2025
🌐 ILR+G website: ilr-workshop.github.io/ICCVW2025/

#ICCV2025 #ComputerVision #AI

Reposted by: gtolias.bsky.social

weinzaepfelp.bsky.social
When were #CVPR2025 papers available on arXiv? 👇
gtolias.bsky.social
Unfortunately, we have to wait at least until 2032 for CVPR to be outside of the USA.
cvprconference.bsky.social
Underway at PAMI TC: the upcoming meetings 🌎

Reposted by: gtolias.bsky.social

stojnicv.xyz
Are you at @cvprconference.bsky.social #CVPR2025? Come and check out LPOSS.

We show how graph-based label propagation can be used to improve weak, patch-level predictions from VLMs for open-vocabulary semantic segmentation.

📅 June 13, 2025, 16:00 – 18:00 CDT
📍 Location: ExHall D, Poster #421
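The propagation step can be sketched generically with the classic normalized-graph iteration (a textbook Zhou et al.-style smoother, not LPOSS's exact formulation):

```python
import numpy as np

def propagate(W, Y0, alpha=0.9, iters=50):
    """Smooth initial patch predictions Y0 (n_patches x n_classes)
    over a patch affinity graph W (n_patches x n_patches)."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization D^-1/2 W D^-1/2
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0  # diffuse, keep initial evidence
    return Y

# Toy example: 4 patches, 2 classes, weak per-patch scores smoothed over the graph.
W = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
Y0 = np.array([[0.6, 0.4], [0.55, 0.45], [0.5, 0.5], [0.2, 0.8]])
print(propagate(W, Y0).argmax(axis=1))
```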

Reposted by: gtolias.bsky.social

gkordo.bsky.social
Are you at @cvprconference.bsky.social? Come by our poster!
📅 Sat 14/6, 10:30-12:30
📍 Poster #395, ExHall D
gtolias.bsky.social
Fri 16:00-18:00 LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation,
Sat 10:30-12:30 ILIAS: Instance-Level Image retrieval At Scale
gtolias.bsky.social
VRG is presenting 8 papers at #CVPR2025. You can find me and collaborators at the following 4 posters:

Fri 10:30-12:30 A Dataset for Semantic Segmentation in the Presence of Unknowns
Fri 16:00-18:00 LOCORE: Image Re-ranking with Long-Context Sequence Modeling
gtolias.bsky.social
solidarity is not a crime, genocide is

Reposted by: gtolias.bsky.social

gkordo.bsky.social
Call for Papers update - ILR+G workshop @iccv.bsky.social

We will now feature a single submission track with new submission dates.

📅 New submission deadline: June 21, 2025
🔗 Submit here: cmt3.research.microsoft.com/ILRnG2025
🌐 More details: ilr-workshop.github.io/ICCVW2025/

#ICCV2025
gtolias.bsky.social
then you are not one of us. so yes, such a human exists.
gtolias.bsky.social
is there any human that does not type "ration" 50% of the times they intend to type "ratio"?
gtolias.bsky.social
VRG from CTU in Prague has 9 of its members listed as outstanding reviewers. Congratulations to @gkordo.bsky.social, @billpsomas.bsky.social , @stojnicv.xyz , Pavel Suma, @ducha-aiki.bsky.social , Miroslav Purkrábek, Tomas Vojir, and Yaqing Ding.
cvprconference.bsky.social
Behind every great conference is a team of dedicated reviewers. Congratulations to this year’s #CVPR2025 Outstanding Reviewers!

cvpr.thecvf.com/Conferences/...
