Giorgos Tolias
@gtolias.bsky.social
1.2K followers 440 following 120 posts
Associate Professor at CTU in Prague. Computer Vision Researcher at the Visual Recognition Group vrg.fel.cvut.cz. Made in Greece, exported to France and Czech Republic. https://cmp.felk.cvut.cz/~toliageo
Reposted by Giorgos Tolias
ducha-aiki.bsky.social
For those going to @iccv.bsky.social, welcome to our RANSAC tutorial in October 2025 with
- Daniel Barath
- @ericbrachmann.bsky.social
- Viktor Larsson
- Jiri Matas
- and me
danini.github.io/ransac-2025-...
#ICCV2025
gtolias.bsky.social
This is an in-person event only.
gtolias.bsky.social
The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.

cmp.felk.cvut.cz/colloquium/
Reposted by Giorgos Tolias
euripsconf.bsky.social
Congratulations to everyone who got their @neuripsconf.bsky.social papers accepted 🎉🎉🎉

At #EurIPS we are looking forward to welcoming presentations of all accepted NeurIPS papers, including a new “Salon des Refusés” track for papers which were rejected due to space constraints!
Reposted by Giorgos Tolias
davidpicard.bsky.social
Interesting graphs from csconferences.org
Trying to predict the inflection point of the sigmoid.
gtolias.bsky.social
Hypothesis: VLMs only optimize cross-modal relationships and not image-to-image relationships; as a consequence, the visual representation space exhibits low local semantic consistency. Nevertheless, this appears to be easy to fix at a post-pre-training stage.
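One way to make "local semantic consistency" concrete: check how often an image's nearest neighbors in descriptor space share its label. A minimal, hypothetical probe (placeholder data; knn_label_consistency is an illustrative name, not from our paper):

```python
import numpy as np

def knn_label_consistency(X, labels, k=10):
    # Fraction of top-k neighbors (cosine similarity) sharing the query's label.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)            # exclude self-matches
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # top-k neighbors per image
    return (labels[nn_idx] == labels[:, None]).mean()

# Usage with random placeholder descriptors and labels:
X = np.random.randn(500, 256)
labels = np.random.randint(0, 20, size=500)
print(knn_label_consistency(X, labels))
```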
gtolias.bsky.social
After such a linear adaptation, Perception Encoder is the new SoA, achieving 33.4% vs the 28.3% achieved by DINOv3. Without the adaptation step, the respective numbers are 22.0% and 26.5%.
gtolias.bsky.social
In a different evaluation setting, we train a linear layer ("adapt") on top of frozen networks that generate global image descriptors. Here is the recurring observation: vision encoders of VLMs benefit a lot from this step, while vision models pre-trained with SSL benefit little or not at all.
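For concreteness, a minimal sketch of what such a linear adaptation could look like, assuming precomputed global descriptors from the frozen encoder and image-level labels on the adaptation set (all names and sizes are placeholders, not our exact recipe):

```python
import torch
import torch.nn as nn

# Placeholder data: X holds global descriptors from a frozen encoder,
# y holds image-level labels used only to fit the adaptation layer.
N, D, C = 10_000, 768, 1_000
X = nn.functional.normalize(torch.randn(N, D), dim=1)
y = torch.randint(0, C, (N,))

proj = nn.Linear(D, D, bias=False)   # the "adapt" layer kept at test time
head = nn.Linear(D, C)               # classifier head, discarded after training
opt = torch.optim.AdamW(list(proj.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(proj(X)), y)
    loss.backward()
    opt.step()

# Retrieval then uses the re-normalized projected descriptors.
Z = nn.functional.normalize(proj(X), dim=1)
```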
gtolias.bsky.social
Crash-test your foundation models for object recognition at its finest granularity. Here are the updated results on our instance-level image retrieval benchmark (ILIAS, CVPR'25). DINOv3 and Perception Encoder (PE) are included, with DINOv3 being the new SoA! Oh, but no, look at this...
gkordo.bsky.social
🚀 New state-of-the-art on the ILIAS dataset!

Curious how well the latest models can recognize particular objects?
We evaluated the base and large variants of DINOv3 and Perception Encoder (PE) on instance-level image retrieval.

See the results 👉 vrg.fel.cvut.cz/ilias/
gtolias.bsky.social
The Pattern Recognition and Computer Vision Colloquium of the Visual Recognition Group at CTU in Prague has a long tradition dating back to 1998. The list of all past speakers is available at docs.google.com/spreadsheets.... Enjoy! The 50th edition is coming soon: cmp.felk.cvut.cz/colloquium/
Reposted by Giorgos Tolias
ekazakos.bsky.social
Hi friends! I made a computer vision feed that was missing from Bluesky.

You can find it here: bsky.app/profile/did:.... Pin it to your profile if you like it. 😉

It filters relevant posts based on regular expressions. But to make sure that your post is included in the feed, add the #skyvision tag.
gtolias.bsky.social
CLIP is often referred to as a VLM in the CV literature. I'm surprised you've never seen it; which line of work are you following? VLM is overloaded though, as it's also used for generative models. That's why we chose CVL. Not a big deal though; the screenshot you shared is self-explanatory in my opinion.
gtolias.bsky.social
Greeks are sometimes accused of faking statistics; I bet the actual number is more than 26.2 🧀
gtolias.bsky.social
I'd rather focus on more down-to-earth improvements: GS removing self-citations (I'm always surprised they still haven't done it) and computing an additional normalized metric (e.g., divide by the number of authors).
gtolias.bsky.social
Not just random behaviour. The dot product between two such global descriptors (each a sum of local descriptors) equals the sum of dot products between all local descriptor pairs across the two images (which you can see as all point correspondences).
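A two-line check of that identity, with random placeholder descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 128))   # local descriptors of image 1 (placeholder)
B = rng.normal(size=(70, 128))   # local descriptors of image 2

g1, g2 = A.sum(axis=0), B.sum(axis=0)   # sum-pooled global descriptors

lhs = g1 @ g2            # one global dot product
rhs = (A @ B.T).sum()    # sum of all local-to-local dot products
assert np.isclose(lhs, rhs)
```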
gtolias.bsky.social
Better than just hope: in our ECCV 2020 paper we show that GAP (global average pooling) is a good way to optimize local descriptors using image-level supervision and an image-level loss. The dino.txt paper switches from the CLS token to GAP; look at the segmentation-task improvements.
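A toy illustration of why GAP helps here (not the ECCV 2020 setup itself): the image-level gradient reaches every spatial location of the feature map, so local descriptors get optimized even though supervision is global:

```python
import torch
import torch.nn as nn

# Placeholder local features F of shape (batch, dim, H, W) from any backbone.
B, D, H, W = 8, 256, 14, 14
F = torch.randn(B, D, H, W, requires_grad=True)

g = F.mean(dim=(2, 3))                 # global average pooling
logits = nn.Linear(D, 10)(g)           # image-level classifier head
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (B,)))
loss.backward()

# Each spatial location receives the global descriptor's gradient,
# scaled by 1/(H*W), so every local descriptor is trained.
print(F.grad.shape)  # torch.Size([8, 256, 14, 14])
```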
gtolias.bsky.social
Of course, there is nothing we can say about privately kept training sets.
gtolias.bsky.social
Shortcut-learning hypothesis: captions containing EXIF data or JPEG compression settings may lead CVLs to shortcut learning by capturing pixel-level noise. We looked into LAION and found only a negligible number of such captions, so such shortcut learning is unlikely to happen.
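A hypothetical version of that caption scan (the regex and names are illustrative, not our exact filter):

```python
import re

# Flag captions exposing camera EXIF fields or JPEG compression settings.
EXIF_PAT = re.compile(
    r"\b(exif|iso\s*\d{2,5}|f/\d+(\.\d+)?|1/\d+\s*s(ec)?|focal length|"
    r"jpeg quality|compression)\b",
    re.IGNORECASE,
)

def count_metadata_captions(captions):
    # Number of captions containing at least one metadata-style pattern.
    return sum(bool(EXIF_PAT.search(c)) for c in captions)

captions = [
    "a cat on a sofa",
    "sunset, shot at ISO 200, f/2.8, 1/250s",
    "mountain lake in the morning",
]
print(count_metadata_captions(captions))  # 1
```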