vilemil.bsky.social
@vilemil.bsky.social
Computer vision engineer
Interested in: rl, cv, 3d reconstruction, robotics, sfm etc.
Climber, ashtangi.
Reposted
MIT 6.S184 Generative AI With Stochastic Differential Equations: Introduction to Flow Matching and Diffusion Models is now online!
diffusion.csail.mit.edu

www.youtube.com/playlist?lis...
MIT 6.S184: Flow Matching and Diffusion Models - YouTube
MIT 6.S184: Generative AI with Stochastic Differential Equations Lecture notes: https://diffusion.csail.mit.edu/docs/lecture-notes.pdf Course website: https:...
March 3, 2025 at 9:19 AM
Reposted
We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data.
It is SOTA on every planning benchmark we tried.
In self-play, it goes 20 years between collisions.
February 6, 2025 at 6:34 PM
Reposted
Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Sérgio Agostinho, @lealtaixe.bsky.social

tl;dr: feed-forward multiview *3R, good for rough pose estimation, optimization might be needed to be more precise

arxiv.org/abs/2501.14914
January 29, 2025 at 8:33 AM
Reposted
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

Yan Xia, Yunxiang Lu, Oussema Dhaouadi, João F. Henriques, Daniel Cremers

tl;dr: DUSt3R-PointNet matching with a SuperGlue/LightGlue-like transformer.

arxiv.org/abs/2412.10308
December 16, 2024 at 1:45 PM
Reposted
Cross-View Completion Models are Zero-shot Correspondence Estimators

Honggyu An, Jinhyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim

arxiv.org/abs/2412.09072
December 13, 2024 at 5:49 AM
Reposted
MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan

tl;dr: multi-view decoder blocks/Cross-Reference-View attention blocks->DUSt3R

arxiv.org/abs/2412.06974
December 11, 2024 at 6:19 AM
Reposted
RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians

Qiankun Gao, et al.

tl;dr: learnable mask->high dynamic foreground+low dynamic background Gaussians; Relay Gaussians->motion trajectories->motion segments

arxiv.org/abs/2412.02493
December 4, 2024 at 5:25 AM
Reposted
Efficient Track Anything

Yunyang Xiong and 12 others

tl;dr: ViT + Memory bank attention = speed
arxiv.org/abs/2411.18933
December 3, 2024 at 6:59 PM
Reposted
COLMAP 3.11 release, a LOT of cool things:
- Incremental mapper with absolute pose priors
- CUDA-based BA through Ceres (disabled by default)
- PoseLib's RANSACs
- New BA covariance estimation, faster and more robust than Ceres
- Fixed affine-covariant SIFT
github.com/colmap/colma...
Release 3.11.0 · colmap/colmap
New Features New pose prior based incremental mapper that can leverage absolute pose priors from e.g. GPS measurements. New bundle adjustment covariance estimation functionality. Significantly fas...
December 2, 2024 at 7:21 PM
Reposted
Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration

Junyuan Deng, Wei Yin, Xiaoyang Guo, Qian Zhang, Xiaotao Hu, Weiqiang Ren, Xiaoxiao Long, Ping Tan
tl;dr: CameraImage ~perspective field+grayscale for diffusion-based monodepth&calibration
arxiv.org/abs/2411.17240
December 2, 2024 at 1:02 PM
Reposted
Great explainer on sinusoidal positional encoding and rotary positional embedding (RoPE).

fleetwood.dev/posts/you-co...
November 30, 2024 at 11:26 PM
Reposted
I would put this even more strongly: open source AI is probably our only realistic chance to avoid a terrifying increase in concentration of power. I do not want to live in a world where the people with all the money also have all the intellectual power.
The most realistic reason to be pro open source AI is to reduce concentration of power.
"money has flowed to tech giants and others in their orbit... [and] raises an uncomfortable prospect: that this supposedly revolutionary technology might never deliver on its promise of broad economic transformation, but instead just concentrate more wealth" www.bloomberg.com/opinion/arti...
November 29, 2024 at 9:35 PM
Reposted
Posting some evergreens for the new crowd. Did you know you can differentiate RANSAC?

If you fix the # of iterations, RANSAC is an argmax over hypotheses. You turn the inlier count into your policy for hypothesis selection, and train with policy gradient (DSAC, CVPR17).

github.com/vislearn/DSA...
November 28, 2024 at 3:42 PM
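A rough sketch of the idea in the post above, on toy 2D line fitting with NumPy. This is not the DSAC code from the linked repo: the soft inlier count, the softmax temperature `alpha`, the threshold `tau`, the sharpness `beta`, and all function names are assumptions made up for illustration. With a fixed number of hypotheses, selection becomes sampling from a softmax over inlier scores, and a REINFORCE-style estimate gives a gradient with respect to those scores (in a learned pipeline the scores would depend on network outputs, so the gradient would flow further back).

```python
import numpy as np


def sample_line_hypotheses(points, n_hyp, rng):
    """Fit one line (a, b, c), with a^2 + b^2 = 1, to each random point pair."""
    idx = np.array([rng.choice(len(points), size=2, replace=False)
                    for _ in range(n_hyp)])
    p, q = points[idx[:, 0]], points[idx[:, 1]]
    d = q - p
    normals = np.stack([-d[:, 1], d[:, 0]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    c = -(normals * p).sum(axis=1, keepdims=True)
    return np.hstack([normals, c])


def soft_inlier_scores(points, lines, tau=0.1, beta=50.0):
    """Soft inlier count per hypothesis (tau and beta are illustrative values)."""
    # Point-to-line distance |a*x + b*y + c| for every (hypothesis, point) pair.
    residuals = np.abs(lines[:, :2] @ points.T + lines[:, 2:3])
    # A sigmoid replaces the hard threshold so the count stays differentiable.
    return (1.0 / (1.0 + np.exp(-beta * (tau - residuals)))).sum(axis=1)


def selection_policy(scores, alpha=1.0):
    """Softmax over scores: the probability of selecting each hypothesis."""
    z = alpha * (scores - scores.max())
    p = np.exp(z)
    return p / p.sum()


def reinforce_gradient(scores, losses, chosen, alpha=1.0, baseline=0.0):
    """Single-sample policy-gradient estimate of d E[loss] / d scores."""
    p = selection_policy(scores, alpha)
    one_hot = np.zeros_like(p)
    one_hot[chosen] = 1.0
    # d log p[chosen] / d scores = alpha * (one_hot - p)
    return (losses[chosen] - baseline) * alpha * (one_hot - p)


# Toy data: 40 noisy points on y = x plus 20 uniform outliers.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=40)
inliers = np.stack([x, x + rng.normal(0.0, 0.02, size=40)], axis=1)
outliers = rng.uniform(-1.0, 1.0, size=(20, 2))
points = np.vstack([inliers, outliers])

lines = sample_line_hypotheses(points, n_hyp=32, rng=rng)
scores = soft_inlier_scores(points, lines)

# Task loss: distance to the ground-truth line, up to the sign of the normal.
gt = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
losses = np.minimum(np.linalg.norm(lines - gt, axis=1),
                    np.linalg.norm(lines + gt, axis=1))

policy = selection_policy(scores, alpha=0.1)
chosen = rng.choice(len(policy), p=policy)
grad = reinforce_gradient(scores, losses, chosen, alpha=0.1,
                          baseline=float(policy @ losses))
print("chosen hypothesis:", chosen, "loss:", losses[chosen])
```

Averaging the single-sample estimate over the policy recovers the exact gradient of the expected loss, `alpha * p * (losses - E[loss])`, so sampling is only needed when evaluating every hypothesis would be too expensive.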
Reposted
🌟 New Research Alert! 🌟
Excited to share our latest work (accepted to NeurIPS 2024) on understanding working memory in multi-task RNN models using naturalistic stimuli! With @takuito.bsky.social and @bashivan.bsky.social
#tweeprint below:
November 28, 2024 at 4:41 PM
Reposted
Introducing “MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM”! We do SLAM with novel view synthesis capabilities on multiple simultaneously operating agents!

vladimiryugay.github.io/magic_slam/i...
1/7
November 27, 2024 at 5:34 AM
Reposted
NexusSplats: Efficient 3D Gaussian Splatting in the Wild

Yuzhou Tang, Dejun Xu, Yongjie Hou, Zhenzhong Wang, Min Jiang

tl;dr: nexus kernel->voxel segment->Gaussian primitives->coordinated color mapping; GS-wise uncertainty+boundary penalty

arxiv.org/abs/2411.14514
November 25, 2024 at 4:11 AM