Dmytro Mishkin
@ducha-aiki.bsky.social
2.5K followers 150 following 960 posts
Marrying classical CV and Deep Learning. I do things that work, rather than things that are novel but don't work. http://dmytro.ai
ducha-aiki.bsky.social
True. However, the original paper's visualizations are very weird as well.
ducha-aiki.bsky.social
ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression

Tom Burgert, Oliver Stoll, Paolo Rota, Begüm Demir
tl;dr: maybe CNNs are not texture-dependent? Different protocol -> different results.
arxiv.org/abs/2509.20234
Reposted by Dmytro Mishkin
ericbrachmann.bsky.social
I think it is a great time to have such a tutorial again. As we see competitive RANSAC-free approaches arise, it is worth looking back - and looking forward.
ducha-aiki.bsky.social
For those going to @iccv.bsky.social, welcome to our RANSAC tutorial in October 2025 with
- Daniel Barath
- @ericbrachmann.bsky.social
- Viktor Larsson
- Jiri Matas
- and me
danini.github.io/ransac-2025-...
#ICCV2025
ducha-aiki.bsky.social
Go to our ICCV tutorial to learn how to RANSAC properly :)
danini.github.io/ransac-2025-...
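In the meantime, a minimal sketch of what "RANSAC-ing properly" can look like in practice, using OpenCV's MAGSAC++ variant (the function and flags are standard OpenCV; the threshold values are illustrative assumptions, not the tutorial's code):

import cv2
import numpy as np

def fit_homography(pts_src: np.ndarray, pts_dst: np.ndarray):
    # pts_src, pts_dst: (N, 2) float32 arrays of matched keypoints, N >= 4.
    H, inlier_mask = cv2.findHomography(
        pts_src, pts_dst,
        method=cv2.USAC_MAGSAC,     # MAGSAC++: less sensitive to threshold tuning
        ransacReprojThreshold=1.0,  # upper bound on reprojection error, in px
        maxIters=10_000,
        confidence=0.999,
    )
    return H, inlier_mask.ravel().astype(bool)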
ducha-aiki.bsky.social
We have updated the paper and:
1) Code and data are here: github.com/s23dr/wirefr...
2) Do we have enough comparisons and are our results stable? Yes.
3) Minor fixes and updates.
#ICCV2025
ducha-aiki.bsky.social
And I think it’s beautiful
ducha-aiki.bsky.social
I think that model souping is a bit less surprising, because the tasks are the same there
ducha-aiki.bsky.social
PowerPoint online is so much worse lag-wise than Google Slides.
ducha-aiki.bsky.social
Yes. And this is probably even more amazing.
ducha-aiki.bsky.social
No, just a common practice among people who use diffusion models daily.
ducha-aiki.bsky.social
The most underappreciated thing about ControlNets and LoRA is that we can just combine them and it works as expected.
Way more mind-blowing than the old word2vec king - man + woman = queen example.
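A minimal sketch of that composability with Hugging Face diffusers (the ControlNet and base-model repo ids are real; the LoRA repo id is a hypothetical placeholder):

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Canny edge map as the ControlNet conditioning image.
edges = cv2.Canny(np.array(Image.open("input.jpg")), 100, 200)
cond = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# An independently trained LoRA loads on top and composes with the
# ControlNet conditioning without any joint retraining.
pipe.load_lora_weights("some-user/some-style-lora")  # hypothetical repo id

image = pipe("a cat in the LoRA's style", image=cond).images[0]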
Reposted by Dmytro Mishkin
ekazakos.bsky.social
To keep the tradition, the lineup is 🔥🔥🔥
gtolias.bsky.social
The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.

cmp.felk.cvut.cz/colloquium/
ducha-aiki.bsky.social
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields

Zhiting Mei, Ola Shorinwa, Anirudha Majumdar
tl;dr: who cares, look at those DINO icons!
OK: distilling DINO into a NeRF -> better object localization than VGGT.

arxiv.org/abs/2510.03104
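The general recipe, as a hedged sketch (my naming, not the paper's code): add a feature head to the radiance field and supervise its volume-rendered features with frozen 2D DINO features of the training views:

import torch.nn.functional as F

def feature_distillation_loss(rendered_feats, dino_feats):
    # rendered_feats: (N_rays, C) features volume-rendered from the field.
    # dino_feats: (N_rays, C) frozen DINO features sampled at the same pixels.
    return 1.0 - F.cosine_similarity(rendered_feats, dino_feats, dim=-1).mean()

# total_loss = rgb_loss + lambda_feat * feature_distillation_loss(rf, df)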
ducha-aiki.bsky.social
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features

@axelbarroso.bsky.social, Tommaso Cavallari, Victor Adrian Prisacariu, @ericbrachmann.bsky.social
tl;dr: matching the query to (randomly sampled) database features, MASt3R-style.
arxiv.org/abs/2510.00978
ducha-aiki.bsky.social
Great thread
davidpicard.bsky.social
It's a great discussion because we all have different interpretations of what this means and what the consequences are.

Myself, I have absolutely no doubt that scaling works. If you have all the videos in the world and are able to train a model that can recall and merge any of them, then for sure...
Reposted by Dmytro Mishkin
kmyid.bsky.social
Barroso-Laguna et al., "A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features"

When building the context for your feed-forward 3D point-map estimator, don't use full image pairs -- just randomly subsample! -> fast compute, more images.
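A minimal sketch of the subsampling idea (function and names are mine, not the paper's): pool a fixed random budget of features across the whole mapping set instead of feeding whole images:

import numpy as np

def build_context(db_feats, budget, seed=0):
    # db_feats: list of per-image feature arrays, each of shape (n_i, C).
    rng = np.random.default_rng(seed)
    all_feats = np.concatenate(db_feats, axis=0)  # (sum n_i, C)
    idx = rng.choice(len(all_feats), size=budget, replace=False)
    return all_feats[idx]  # (budget, C): fixed cost, many more images covered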
ducha-aiki.bsky.social
"2m tall reviewer"?
Congrats :)
ducha-aiki.bsky.social
>irrespective of the random seed
Lol what. Is that a RANSAC paper? :)
ducha-aiki.bsky.social
The whole post sounds very generic and “for all good, against all bad”
ducha-aiki.bsky.social
That's why I do a warm-up, and also run this a couple of times -- no changes.
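For reference, the kind of protocol meant here, as a small sketch: untimed warm-up iterations first, then several timed repeats:

import time

def bench(fn, warmup=3, repeats=5):
    for _ in range(warmup):
        fn()  # untimed: JIT compilation, caches, lazy initialization
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)  # best-of-N is robust to OS scheduling jitter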