Kwang Moo Yi
@kmyid.bsky.social
Assistant Professor of Computer Science at the University of British Columbia. I also post my daily finds on arxiv.
kmyid.bsky.social
Xu and Lin et al., "Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers"

Append foundation-model features at the later stages when doing Marigold-like denoising to get monocular depth. Simple, straightforward idea that works.
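A minimal sketch of the idea as I read it (layer and feature names are my own, not the paper's): project the foundation-model features and add them as a prompt only in the later blocks of the DiT denoiser.

```python
import torch
import torch.nn as nn

class PromptedDiT(nn.Module):
    def __init__(self, dim=256, depth=8, prompt_from=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True) for _ in range(depth)]
        )
        self.prompt_from = prompt_from          # inject semantics only in the later blocks
        self.prompt_proj = nn.Linear(dim, dim)  # map foundation features into token space

    def forward(self, noisy_tokens, semantic_feats):
        x = noisy_tokens
        for i, blk in enumerate(self.blocks):
            if i >= self.prompt_from:
                # add projected semantic features as a prompt in the later stages
                x = x + self.prompt_proj(semantic_feats)
            x = blk(x)
        return x

x = torch.randn(2, 196, 256)       # noisy depth latents as tokens
feats = torch.randn(2, 196, 256)   # foundation-model (e.g., ViT-style) features
out = PromptedDiT()(x, feats)
```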
kmyid.bsky.social
Bamberger and Jones et al., "Carré du champ flow matching: better quality-generalisation tradeoff in generative models"

Geometric regularization of the flow manifold. Boils down to adding anisotropic Gaussian noise during flow-matching training. Neat idea, enhances generalization.
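Roughly what that looks like in a flow-matching training step, as a sketch; the anisotropic covariance here (noise amplified along a per-sample direction) is my own placeholder, not the paper's construction.

```python
import torch
import torch.nn as nn

class ToyVelocity(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 1, 64), nn.SiLU(), nn.Linear(64, d))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def flow_matching_loss(model, x1, sigma_dirs, noise_scale=0.05, aniso_scale=2.0):
    """x1: data batch [B, D]; sigma_dirs: unit directions [B, D] along which noise is amplified."""
    b, _ = x1.shape
    x0 = torch.randn_like(x1)                  # base (prior) sample
    t = torch.rand(b, 1)                       # time in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # linear interpolant
    eps = torch.randn_like(x1)                 # isotropic noise
    along = (eps * sigma_dirs).sum(-1, keepdim=True) * sigma_dirs
    xt = xt + noise_scale * (eps + aniso_scale * along)   # anisotropic perturbation
    target = x1 - x0                           # straight-line velocity target
    return ((model(xt, t) - target) ** 2).mean()

x1 = torch.randn(16, 2)
dirs = torch.nn.functional.normalize(torch.randn(16, 2), dim=-1)
loss = flow_matching_loss(ToyVelocity(2), x1, dirs)
loss.backward()
```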
kmyid.bsky.social
Yugay and Nguyen et al., “Visual Odometry with Transformers”

Instead of point maps, you can also directly output poses. Direct pose regression used to be much less accurate, but now it's the opposite. A simple architecture directly predicts camera embeddings, which are then regressed to rotation and translation.
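A sketch of that kind of architecture (the layer names and the 6D rotation head are assumptions on my part, not the paper's): a learnable camera token attends to the frame tokens, and two small heads regress rotation and translation from it.

```python
import torch
import torch.nn as nn

class VOTransformer(nn.Module):
    def __init__(self, dim=256, depth=4):
        super().__init__()
        self.cam_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, depth)
        self.rot_head = nn.Linear(dim, 6)    # 6D rotation parameterization (assumption)
        self.trans_head = nn.Linear(dim, 3)  # translation

    def forward(self, frame_tokens):         # [B, N, dim] image patch tokens
        b = frame_tokens.shape[0]
        x = torch.cat([self.cam_token.expand(b, -1, -1), frame_tokens], dim=1)
        x = self.encoder(x)
        cam = x[:, 0]                        # camera embedding
        return self.rot_head(cam), self.trans_head(cam)

rot, trans = VOTransformer()(torch.randn(2, 196, 256))
```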
kmyid.bsky.social
Chen et al., "TTT3R: 3D Reconstruction as Test-Time Training"

CUT3R + gated state updates (test-time-training layers) = the speed/efficiency of CUT3R, but with higher-quality estimates.
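A sketch of what a gated state update can look like (my assumption of the mechanism, not the authors' code): blend the previous scene state with a candidate update through a learned gate, applied online per frame.

```python
import torch
import torch.nn as nn

class GatedStateUpdate(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, state, frame_feat):
        # state, frame_feat: [B, N, dim]
        joint = torch.cat([state, frame_feat], dim=-1)
        g = torch.sigmoid(self.gate(joint))      # how much of the state to overwrite
        cand = torch.tanh(self.update(joint))    # candidate new state
        return (1 - g) * state + g * cand        # gated blend

upd = GatedStateUpdate(256)
state = torch.zeros(1, 196, 256)
for frame_feat in torch.randn(5, 1, 196, 256):   # online update over a stream of frames
    state = upd(state, frame_feat)
```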
kmyid.bsky.social
Two today: Kim et al., "How Diffusion Models Memorize" and Song and Kim et al., "Selective Underfitting in Diffusion Models"

A deep dive into how memorization and generalization happen in diffusion models. Still trying to digest what these mean. Thought-provoking.
kmyid.bsky.social
Barroso-Laguna et al., "A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features"

When building the context for your feed-forward 3D point-map estimator, don't use full image pairs -- just randomly subsample features! -> faster compute, more images in context.
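A sketch of the subsampling trick (function name and shapes are hypothetical): keep a random subset of tokens per image before building the context, so the same compute budget covers many more images.

```python
import torch

def subsample_context(per_image_feats, keep=128, generator=None):
    """per_image_feats: [num_images, tokens, dim] -> [num_images * keep, dim]."""
    n, t, d = per_image_feats.shape
    # independently pick `keep` random token indices for each image
    idx = torch.stack([torch.randperm(t, generator=generator)[:keep] for _ in range(n)])
    picked = torch.gather(per_image_feats, 1, idx[..., None].expand(-1, -1, d))
    return picked.reshape(n * keep, d)

ctx = subsample_context(torch.randn(100, 1024, 256), keep=128)  # 100 images, 128 tokens each
print(ctx.shape)  # torch.Size([12800, 256])
```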