Giorgos Tolias
@gtolias.bsky.social
1.2K followers 440 following 120 posts
Associate Professor at CTU in Prague. Computer Vision Researcher at the Visual Recognition Group vrg.fel.cvut.cz. Made in Greece, exported to France and Czech Republic. https://cmp.felk.cvut.cz/~toliageo
Reposted by Giorgos Tolias
ducha-aiki.bsky.social
For those going to @iccv.bsky.social, welcome to our RANSAC tutorial in October 2025 with
- Daniel Barath
- @ericbrachmann.bsky.social
- Viktor Larsson
- Jiri Matas
- and me
danini.github.io/ransac-2025-...
#ICCV2025
gtolias.bsky.social
This is an in-person event only.
gtolias.bsky.social
The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.

cmp.felk.cvut.cz/colloquium/
Reposted by Giorgos Tolias
euripsconf.bsky.social
Congratulations to everyone who got their @neuripsconf.bsky.social papers accepted 🎉🎉🎉

At #EurIPS we are looking forward to welcoming presentations of all accepted NeurIPS papers, including a new “Salon des Refusés” track for papers which were rejected due to space constraints!
Reposted by Giorgos Tolias
davidpicard.bsky.social
Interesting graphs from csconferences.org
Trying to predict the inflection point of the sigmoid.
gtolias.bsky.social
Hypothesis: VLMs only optimize cross-modal relationships and not image-to-image relationships; as a consequence, the visual representation space exhibits low local semantic consistency. Nevertheless, this appears to be easy to fix at a post-pre-training stage.
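One way to make "local semantic consistency" concrete: check how often an image's nearest neighbors in descriptor space share its label. A minimal, hypothetical probe (placeholder data; knn_label_consistency is an illustrative name, not from our paper):

```python
import numpy as np

def knn_label_consistency(X, labels, k=10):
    # Fraction of top-k neighbors (cosine similarity) sharing the query's label.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)            # exclude self-matches
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # top-k neighbors per image
    return (labels[nn_idx] == labels[:, None]).mean()

# Usage with random placeholder descriptors and labels:
X = np.random.randn(500, 256)
labels = np.random.randint(0, 20, size=500)
print(knn_label_consistency(X, labels))
```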
gtolias.bsky.social
After such a linear adaptation, Perception Encoder is the new SoA, achieving 33.4% vs the 28.3% achieved by DINOv3. Without the adaptation step, the respective numbers are 22.0% and 26.5%.
gtolias.bsky.social
In a different evaluation setting, we train a linear layer ("adapt") on top of frozen networks that generate global image descriptors. Here is the recurring observation: vision encoders of VLMs benefit a lot from this step, while vision models pre-trained with SSL benefit little or not at all.
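For concreteness, a minimal sketch of what such a linear adaptation could look like, assuming precomputed global descriptors from the frozen encoder and image-level labels on the adaptation set (all names and sizes are placeholders, not our exact recipe):

```python
import torch
import torch.nn as nn

# Placeholder data: X holds global descriptors from a frozen encoder,
# y holds image-level labels used only to fit the adaptation layer.
N, D, C = 10_000, 768, 1_000
X = nn.functional.normalize(torch.randn(N, D), dim=1)
y = torch.randint(0, C, (N,))

proj = nn.Linear(D, D, bias=False)   # the "adapt" layer kept at test time
head = nn.Linear(D, C)               # classifier head, discarded after training
opt = torch.optim.AdamW(list(proj.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(proj(X)), y)
    loss.backward()
    opt.step()

# Retrieval then uses the re-normalized projected descriptors.
Z = nn.functional.normalize(proj(X), dim=1)
```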
gtolias.bsky.social
Crash-test your foundation models for object recognition at its finest granularity. Here are the updated results on our instance-level image retrieval benchmark (ILIAS, CVPR'25). DINOv3 and Perception Encoder (PE) are included, with DINOv3 being the new SoA! Oh, but no, look at this...
gkordo.bsky.social
🚀 New state-of-the-art on the ILIAS dataset!

Curious how well the latest models can recognize particular objects?
We evaluated the base and large variants of DINOv3 and Perception Encoder (PE) on instance-level image retrieval.

See the results 👉 vrg.fel.cvut.cz/ilias/
gtolias.bsky.social
The Pattern Recognition and Computer Vision Colloquium of the Visual Recognition Group at CTU in Prague has a long tradition dating back to 1998. The list of all past speakers is available at docs.google.com/spreadsheets.... Enjoy! The 50th edition is coming soon: cmp.felk.cvut.cz/colloquium/
Reposted by Giorgos Tolias
ekazakos.bsky.social
Hi friends! I made a computer vision feed that was missing from Bluesky.

You can find it here: bsky.app/profile/did:.... Pin it to your profile if you like it. 😉

It filters relevant posts based on regular expressions. But to make sure that your post is included in the feed, add the #skyvision tag.
gtolias.bsky.social
CLIP is often referred to as a VLM in the CV literature. I'm surprised you've never seen it; which line of work are you following? VLM is overloaded though, as it's also used for generative models. That's why we chose CVL. Not a big deal though; the screenshot you shared is self-explanatory in my opinion.
gtolias.bsky.social
Greeks are sometimes accused of faking statistics; I bet the actual number is more than 26.2 🧀
gtolias.bsky.social
I'd rather focus on more down-to-earth improvements: GS removing self-citations (I'm always surprised they still haven't done it) and computing an additional normalized metric (e.g., divide by the number of authors).
gtolias.bsky.social
Not just random behaviour. The dot product between two such global descriptors (each a sum of local descriptors) equals the sum of dot products between all local descriptor pairs across the two images (which you can see as all point correspondences).
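A two-line check of that identity, with random placeholder descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 128))   # local descriptors of image 1 (placeholder)
B = rng.normal(size=(70, 128))   # local descriptors of image 2

g1, g2 = A.sum(axis=0), B.sum(axis=0)   # sum-pooled global descriptors

lhs = g1 @ g2            # one global dot product
rhs = (A @ B.T).sum()    # sum of all local-to-local dot products
assert np.isclose(lhs, rhs)
```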
gtolias.bsky.social
Better than just hope: in our ECCV 2020 paper we show that GAP (global average pooling) is a good way to optimize local descriptors using image-level supervision and an image-level loss. The dino.txt paper switches from the CLS token to GAP; look at the segmentation-task improvements.
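A toy illustration of why GAP helps here (not the ECCV 2020 setup itself): the image-level gradient reaches every spatial location of the feature map, so local descriptors get optimized even though supervision is global:

```python
import torch
import torch.nn as nn

# Placeholder local features F of shape (batch, dim, H, W) from any backbone.
B, D, H, W = 8, 256, 14, 14
F = torch.randn(B, D, H, W, requires_grad=True)

g = F.mean(dim=(2, 3))                 # global average pooling
logits = nn.Linear(D, 10)(g)           # image-level classifier head
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (B,)))
loss.backward()

# Each spatial location receives the global descriptor's gradient,
# scaled by 1/(H*W), so every local descriptor is trained.
print(F.grad.shape)  # torch.Size([8, 256, 14, 14])
```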
gtolias.bsky.social
Of course, there is nothing we can say about privately kept training sets.
gtolias.bsky.social
Shortcut-learning hypothesis: captions containing EXIF data or JPEG compression settings may lead CVLs to shortcut learning by capturing pixel-level noise. We looked into LAION and found only a negligible number of such captions, so such shortcut learning is unlikely to happen.
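A hypothetical version of that caption scan (the regex and names are illustrative, not our exact filter):

```python
import re

# Flag captions exposing camera EXIF fields or JPEG compression settings.
EXIF_PAT = re.compile(
    r"\b(exif|iso\s*\d{2,5}|f/\d+(\.\d+)?|1/\d+\s*s(ec)?|focal length|"
    r"jpeg quality|compression)\b",
    re.IGNORECASE,
)

def count_metadata_captions(captions):
    # Number of captions containing at least one metadata-style pattern.
    return sum(bool(EXIF_PAT.search(c)) for c in captions)

captions = [
    "a cat on a sofa",
    "sunset, shot at ISO 200, f/2.8, 1/250s",
    "mountain lake in the morning",
]
print(count_metadata_captions(captions))  # 1
```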