Zhenjun Zhao
ericzzj.bsky.social
ericzzj1989.github.io
PhD from CUHK. 3D vision, SLAM, SfM, Image Matching (https://github.com/ericzzj1989/Awesome-Image-Matching).
Pinned
🎉 Thrilled to share our CVPR 2025 Award Candidate & Oral paper:

🔹 GlobustVP
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

🧱 Global optimality
💥 Tolerates up to 70% outliers
⚡ Fast runtime

📄 Paper: arxiv.org/abs/2505.04788

💻 Code: github.com/WU-CVGL/GlobustVP

1/
VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction

Yu Hu, Chong Cheng, Sicheng Yu, Xiaoyang Guo, Hao Wang

tl;dr: VGGT global attention->gram similarity statistics->gradient-aware refinement->dynamic masks->VGGT shallow attentions

arxiv.org/abs/2511.19971
November 26, 2025 at 2:00 PM
AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend

Hengyi Wang, Lourdes Agapito

tl;dr: VGGT+scale head->pointmaps+geometric features->sparse voxels->1D sequence->transformer->fused features->zero-convolution->VGGT decoder

arxiv.org/abs/2511.20343
November 26, 2025 at 2:00 PM
SPIDER: Spatial Image CorresponDence Estimator for Robust Calibration

Zhimin Shao, Abhay Yadav, Rama Chellappa, Cheng Peng

tl;dr: 3D VFM+2D ConvNet->feature extraction backbone; 3D descriptor head (for geometry)+2D warp head (for pattern) fusion

arxiv.org/abs/2511.17750
November 25, 2025 at 3:11 PM
SwiftVGGT: A Scalable Visual Geometry Grounded Transformer for Large-Scale Scenes

Jungho Lee, Minhyeok Lee, Sunghun Yang, Minseok Kang, Sangyoun Lee

tl;dr: depth/scale-guided point sampling->non-iterative Sim(3) alignment; DINO patch token->loop closure

arxiv.org/abs/2511.18290
November 25, 2025 at 3:10 PM
4D-VGGT: A General Foundation Model with SpatioTemporal Awareness for Dynamic Scene Geometry Estimation

Haonan Wang, Hanyu Zhou, Haoyue Liu, Luxin Yan

tl;dr: 4D version of VGGT

arxiv.org/abs/2511.18416
November 25, 2025 at 3:09 PM
C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction

Kuan Wei Huang, Brandon Li, @bharathhariharan.bsky.social, @snavely.bsky.social

tl;dr: in title; paired floor plans and ground-view photos with annotated correspondences and poses

arxiv.org/abs/2511.18559
November 25, 2025 at 3:09 PM
IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes

Carl Lindström, Mahan Rafidashti, Maryam Fatemi, Lars Hammarstrand, @martin-r-oswald.bsky.social, Lennart Svensson

tl;dr: coherent instances->dynamic objects

arxiv.org/abs/2511.19235
November 25, 2025 at 3:07 PM
MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes

Kehua Chen, et al.

tl;dr: 2DGS+dense enhancement with π3+monocular&PatchMatch-based multi-view opt.+depth-guided appearance modeling with Tri-MipRF

arxiv.org/abs/2511.19172
November 25, 2025 at 3:06 PM
SP-VINS: A Hybrid Stereo Visual Inertial Navigation System based on Implicit Environmental Map

Xueyu Du, Lilian Zhang, Fuan Duan, Xincan Luo, Maosong Wang, Wenqi Wu, Jun Mao

tl;dr: implicit environment map with keyframes and 2D keypoints->loop closure

arxiv.org/abs/2511.18756
November 25, 2025 at 3:05 PM
An Efficient Closed-Form Solution to Full Visual-Inertial State Initialization

Samuel Cerezo, Seong Hun Lee, @jcivera.bsky.social

tl;dr: in title; not local solver; small-rotation and constant-velocity approximations->analytical solver->VI states

arxiv.org/abs/2511.18910
November 25, 2025 at 3:05 PM
Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion

Yan Xu, Yixing Wang, Stella X. Yu

tl;dr: GS init.+interpolated poses->guidance images with uncertainties->video diffusion->pseudo views->GS supervision

arxiv.org/abs/2511.17932
November 25, 2025 at 3:04 PM
Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction

Yiming Wang, Shaofei Wang, Marko Mihajlovic, Siyu Tang

tl;dr: tri-plane+neural decoder->local RGBA texture fields

arxiv.org/abs/2511.18873
November 25, 2025 at 3:03 PM
SVRecon: Sparse Voxel Rasterization for Surface Reconstruction

Seunghun Oh, Jaesung Choe, Dongjae Lee, Daeun Lee, Seunghoon Jeong, Yu-Chiang Frank Wang, Jaesik Park

tl;dr: SDF->sparse voxel rasterization; initialization+loss improve spatial coherence

arxiv.org/abs/2511.17364
November 24, 2025 at 11:12 AM
SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors

Kunyi Li, @miniemeyer.bsky.social, Sen Wang, Stefano Gasperini, Nassir Navab, Federico Tombari

tl;dr: local 3D reconstruction+global GS

arxiv.org/abs/2511.17207
November 24, 2025 at 11:11 AM
This work has been accepted to WACV 2026!
Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering

@idris1.bsky.social, @ericzzj.bsky.social, Samuel Schmidgall, Yumeng Wang, Paul Maria Scheikl, Axel Krieger

tl;dr: Gaussian Surfels->dynamic surgical scenes

arxiv.org/abs/2503.04079
November 22, 2025 at 6:24 PM
CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis

Zijian Wu, Mingfeng Jiang, Zidian Lin, Ying Song, Hanjie Ma, Qun Wu, Dongping Zhang, Guiyang Pu

tl;dr: real views+multiple perturbation magnitudes->pseudo-views->optimization

arxiv.org/abs/2511.16030
November 22, 2025 at 6:14 PM
SAM 3D: 3Dfy Anything in Images

tl;dr: 3D version of SAM

arxiv.org/abs/2511.16144
November 22, 2025 at 6:14 PM
Reposted by Zhenjun Zhao
RoMa v2: Harder Better Faster Denser Feature Matching
@parskatt.bsky.social and 11 others

tl;dr: in title.
Predict covariance per-pixel, more datasets, use DINOv3, adjust architecture.

arxiv.org/abs/2511.15706
November 20, 2025 at 9:08 AM
Reposted by Zhenjun Zhao
RoMa v2 is now out! (github.com/Parskatt/rom..., arxiv.org/abs/2511.15706)

Here are the main improvements we made since RoMa:
November 20, 2025 at 9:25 AM
Reposted by Zhenjun Zhao
We’re live! 🚀 Streaming: tinyurl.com/bdtk2nzs
The International Workshop on AI4Robotics by @naverlabseurope
2 days of Spatial AI, SLAM, robot learning, HRI, autonomy
This AM CET: @martinhumenberger.bsky.social @marcpollefeys.bsky.social Andrea Vedaldi Cordelia Schmid & @andrewdavidson.bsky.social ⬇️
November 20, 2025 at 8:40 AM
IBGS: Image-Based Gaussian Splatting

Hoang Chuong Nguyen, Wei Mao, Jose M. Alvarez, Miaomiao Liu

tl;dr: base color from 3DGS rendering and learned residual inferred from nearby training images->pixel color

arxiv.org/abs/2511.14357
November 19, 2025 at 7:53 PM
Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers

Yutian Chen, @yuhengqiu.bsky.social, Ruogu Li, Ali Agha, Shayegan Omidshafiei, Jay Patrikar, @smash0190.bsky.social

tl;dr: ViT->distillation->per-token confidence->rank tokens->selective merging

arxiv.org/abs/2511.14751
November 19, 2025 at 7:53 PM
Towards Rotation-only Imaging Geometry: Rotation Estimation

Xinrui Li, Qi Cai, Yuanxin Wu

tl;dr: pose-only->decouple translation from rotation->rotation-only; reprojection error on rotation manifold

arxiv.org/abs/2511.12415
November 18, 2025 at 1:30 PM
CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model

Yuqi Zhang, Guanying Chen, Jiaxing Chen, Chuanyu Fu, Chuan Huang, Shuguang Cui

tl;dr: enhance the quality of conditioning images

arxiv.org/abs/2511.13121
November 18, 2025 at 1:30 PM
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer

Haosong Peng, Hao Li, Yalun Dai, @yushi-lan.bsky.social, Yihang Luo, Tianyu Qi, Zhengshen Zhang, Yufeng Zhan, Junfei Zhang, Wenchao Xu, Ziwei Liu

tl;dr: depth and camera intrinsics/extrinsics->VGGT

arxiv.org/abs/2511.10560
November 14, 2025 at 3:11 PM