I like 3D vision and training neural networks.
Code: https://github.com/parskatt
Weights: https://github.com/Parskatt/storage/releases/tag/roma
Zhimin Shao, Abhay Yadav, Rama Chellappa, Cheng Peng
tl;dr: 3D VFM+2D ConvNet->feature extraction backbone; 3D descriptor head (for geometry)+2D warp head (for pattern) fusion
arxiv.org/abs/2511.17750
However, for 3D tasks we show that a scaled and simplified version of multi-view MAE (which we call MuM) can outperform DINOv3, all while using orders of magnitude less compute!
TL;DR: Spiritual successor to CroCo with a simpler multi-view objective and larger scale. Beats DINOv3 and CroCo v2 in RoMa, feedforward reconstruction, and relative pose.
arxiv.org/abs/2511.17309
github.com/davnords/mum
Results are pretty crisp, but it doesn't really deal with clouds (it's literally just a linear model on top of some coarse segmentation output).
Here are the main improvements we made since RoMa:
Mårten Wadenbäck, Marcus Valtonen Örnhag, @parskatt.bsky.social
tl;dr: minimal solvers for one-sided/two-sided equal/two-sided independent radial distortion homography
arxiv.org/abs/2508.21190
Should make stuff involving mask2former a bit smoother.
I like that it adds a bit of excitement to the coding experience.
Some other countries do it the other way round.
Must say I prefer the former.