gtolias.bsky.social
CLIP is often referred to as a VLM in the CV literature. I'm surprised someone has never seen that; which line of work are you following? VLM is overloaded, though; it is used for generative models too. That is why we chose CVL. Not a big deal though; the screenshot you shared is self-explanatory in my opinion.
gtolias.bsky.social
greeks are sometimes accused of faking statistics, i bet the actual number is more than 26.2 🧀
gtolias.bsky.social
I'd rather focus on more down-to-earth improvements: GS removing self-citations (I am always surprised that they still haven't done it) and computing an additional normalized metric (e.g., dividing citations by the number of authors).
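For concreteness, a toy sketch of the fractional-counting idea (the per-paper numbers are made up; no real Google Scholar API is involved):

```python
# Illustrative only: author-normalized citation count over
# hypothetical per-paper data.
papers = [
    {"citations": 120, "n_authors": 4},
    {"citations": 45,  "n_authors": 2},
    {"citations": 10,  "n_authors": 9},
]

# Fractional counting: each paper contributes citations / n_authors.
raw = sum(p["citations"] for p in papers)
normalized = sum(p["citations"] / p["n_authors"] for p in papers)
print(f"raw: {raw}, author-normalized: {normalized:.1f}")
```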
gtolias.bsky.social
Not just random behaviour. The dot product between two such global descriptors (each a sum of local descriptors) is the same as the sum of all local descriptor dot products between the two images (you can see it as all point correspondences).
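The identity is easy to verify numerically; a minimal NumPy check (descriptor counts and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 128))  # local descriptors of image 1
Y = rng.normal(size=(80, 128))   # local descriptors of image 2

# Global descriptors: sum of local descriptors.
gx, gy = X.sum(axis=0), Y.sum(axis=0)

# <sum_i x_i, sum_j y_j> == sum over all pairs (i, j) of <x_i, y_j>
lhs = gx @ gy
rhs = (X @ Y.T).sum()  # all pairwise local dot products
assert np.isclose(lhs, rhs)
```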
gtolias.bsky.social
Better than just hope: in our ECCV 2020 paper we show that GAP (global average pooling) is a good way to optimize local descriptors using image-level supervision and loss. The dino.txt paper switches from the CLS token to GAP, and look at the segmentation-task improvements.
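A minimal sketch of why GAP has this effect (illustrative PyTorch, not the actual ECCV 2020 or dino.txt code): an image-level loss on the pooled descriptor sends a gradient to every local descriptor.

```python
import torch

# Illustrative patch-level (local) descriptors from a backbone,
# shape (batch, num_patches, dim); values are random stand-ins.
local = torch.randn(8, 196, 256, requires_grad=True)

# GAP: one global descriptor per image.
global_desc = local.mean(dim=1)

# Any image-level loss on the global descriptor back-propagates
# to every local descriptor, so image-level supervision still
# shapes the local ones.
loss = (1 - torch.nn.functional.cosine_similarity(
    global_desc[:4], global_desc[4:])).mean()
loss.backward()
print(local.grad.shape)  # (8, 196, 256): every patch receives a gradient
```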
gtolias.bsky.social
Of course, there is nothing we can say about privately kept training sets.
gtolias.bsky.social
Shortcut-learning hypothesis: captions containing EXIF data or JPEG compression settings may result in shortcut learning for CVLs by capturing pixel-level noise. We looked into LAION and found only a negligible number of such captions, so this kind of shortcut learning is unlikely to happen.
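A sketch of the kind of caption scan described (the regex and captions here are illustrative, not the ones used on LAION):

```python
import re

# Illustrative patterns for EXIF/compression jargon in captions.
EXIF_RE = re.compile(
    r"\b(f/\d+(\.\d+)?|ISO\s?\d+|1/\d+\s?s|jpe?g\s+quality|exif)\b",
    re.IGNORECASE,
)

captions = [
    "a cat sitting on a sofa",
    "sunset, shot at f/2.8, ISO 100, 1/250 s",
    "vacation photo saved at jpeg quality 60",
]
hits = [c for c in captions if EXIF_RE.search(c)]
print(f"{len(hits)}/{len(captions)} captions mention capture settings")
```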
gtolias.bsky.social
A CVL is better able to predict the image processing or acquisition settings given solely the image representation when it is trained without image augmentations.
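This kind of claim is typically quantified with a probe on frozen embeddings; a toy sketch with synthetic data (the paper's actual protocol may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for frozen image embeddings and the acquisition
# setting used for each image (e.g., a JPEG-quality bucket).
emb = rng.normal(size=(1000, 512))      # frozen features
jpeg_q = rng.integers(0, 4, size=1000)  # 4 quality buckets
emb[:, 0] += jpeg_q                     # leak the setting into one dimension

probe = LogisticRegression(max_iter=1000).fit(emb[:800], jpeg_q[:800])
print("probe accuracy:", probe.score(emb[800:], jpeg_q[800:]))
# Higher probe accuracy = the representation encodes more about the setting.
```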
gtolias.bsky.social
Contrastive vision-language models, e.g., CLIP, are very sensitive, while self-supervised models, e.g., DINOv2, are the least sensitive. We identify the lack of training-time image augmentations as one of the reasons for such flaws.
gtolias.bsky.social
The hidden flaws in your favorite foundation model: we've uncovered how subtle image metadata (JPEG parameters, camera type, etc.) systematically biases visual representations and consequently affects object recognition ability. To be presented at #ICCV2025 as a highlight paper.
stojnicv.xyz
Have you ever asked yourself how much your favorite vision model knows about image capture parameters (e.g., the amount of JPEG compression, the camera model, etc.)? Furthermore, could these parameters influence its semantic recognition abilities?

gtolias.bsky.social
I agree that image-level supervision and good image-level recognition do not necessarily translate to well-localized and discriminative local features.
gtolias.bsky.social
Joint vision-language training essentially allows vision-encoder training to be supervised by any description, either at the semantic level or at the instance level. Using the latter, together with the scale of the training, is possibly a way to explain such good instance-level recognition performance.
gtolias.bsky.social
SfM/SLAM follow an instance-level class definition. On the ILIAS benchmark, which evaluates instance-level recognition ability (it's not about geometry), SigLIP (1 & 2) is significantly better than DINOv2. Before this result I had a similar intuition to yours; not anymore.
vrg.fel.cvut.cz/ilias/
ILIAS | Instance-level Retrieval at Scale

Reposted by: gtolias.bsky.social

gkordo.bsky.social
🚨 Deadline Extension

Instance-Level Recognition and Generation (ILR+G) Workshop at ICCV2025 @iccv.bsky.social

📅 new deadline: June 26, 2025 (23:59 AoE)
📄 paper submission: cmt3.research.microsoft.com/ILRnG2025
🌐 ILR+G website: ilr-workshop.github.io/ICCVW2025/

#ICCV2025 #ComputerVision #AI

Reposted by: gtolias.bsky.social

weinzaepfelp.bsky.social
When were #CVPR2025 papers available on arXiv? 👇
gtolias.bsky.social
Unfortunately, we have to wait at least until 2032 for CVPR to be outside of the USA.
cvprconference.bsky.social
Underway at PAMI TC: the upcoming meetings 🌎

Reposted by: gtolias.bsky.social

stojnicv.xyz
Are you at @cvprconference.bsky.social #CVPR2025? Come and check out LPOSS.

We show how graph-based label propagation can be used to improve weak, patch-level predictions from VLMs for open-vocabulary semantic segmentation.

📅 June 13, 2025, 16:00 – 18:00 CDT
📍 Location: ExHall D, Poster #421
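The propagation step can be sketched generically with the classic normalized-graph iteration (a textbook Zhou et al.-style smoother, not LPOSS's exact formulation):

```python
import numpy as np

def propagate(W, Y0, alpha=0.9, iters=50):
    """Smooth initial patch predictions Y0 (n_patches x n_classes)
    over a patch affinity graph W (n_patches x n_patches)."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization D^-1/2 W D^-1/2
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0  # diffuse, keep initial evidence
    return Y

# Toy example: 4 patches, 2 classes, weak per-patch scores smoothed over the graph.
W = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
Y0 = np.array([[0.6, 0.4], [0.55, 0.45], [0.5, 0.5], [0.2, 0.8]])
print(propagate(W, Y0).argmax(axis=1))
```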

Reposted by: gtolias.bsky.social

gkordo.bsky.social
Are you at @cvprconference.bsky.social? Come by our poster!
📅 Sat 14/6, 10:30-12:30
📍 Poster #395, ExHall D
gtolias.bsky.social
Fri 16:00-18:00 LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation,
Sat 10:30-12:30 ILIAS: Instance-Level Image retrieval At Scale
gtolias.bsky.social
VRG is presenting 8 papers at #CVPR2025. You can find me and collaborators at the following 4 posters:

Fri 10:30-12:30 A Dataset for Semantic Segmentation in the Presence of Unknowns
Fri 16:00-18:00 LOCORE: Image Re-ranking with Long-Context Sequence Modeling
gtolias.bsky.social
solidarity is not a crime, genocide is

Reposted by: gtolias.bsky.social

gkordo.bsky.social
Call for Papers update - ILR+G workshop @iccv.bsky.social

We will now feature a single submission track with new submission dates.

📅 New submission deadline: June 21, 2025
🔗 Submit here: cmt3.research.microsoft.com/ILRnG2025
🌐 More details: ilr-workshop.github.io/ICCVW2025/

#ICCV2025
gtolias.bsky.social
then you are not one of us. so yes, such a human exists.
gtolias.bsky.social
is there any human that does not type "ration" 50% of the times they intend to type "ratio"?
gtolias.bsky.social
VRG from CTU in Prague has 9 of its members listed as outstanding reviewers. Congratulations to @gkordo.bsky.social, @billpsomas.bsky.social , @stojnicv.xyz , Pavel Suma, @ducha-aiki.bsky.social , Miroslav Purkrábek, Tomas Vojir, and Yaqing Ding.
cvprconference.bsky.social
Behind every great conference is a team of dedicated reviewers. Congratulations to this year’s #CVPR2025 Outstanding Reviewers!

cvpr.thecvf.com/Conferences/...
