Lightnews — Scholar-powered news

@alexmrgd.bsky.social

"No Pose at All: Self-Supervised Pose-Free 3DGS from Sparse Views"
TL;DR: 3DGS with no poses during training or inference; shared feature-extraction backbone; simultaneous prediction of 3D Gaussian primitives and camera poses in a canonical space from unposed inputs (1 feed-forward step).

August 7, 2025 at 3:49 PM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

"Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion"

📖TL;DR: Any-to-Bokeh is a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects.

June 13, 2025 at 1:43 PM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

"QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos"

TL;DR: Efficient representations for streamable free-viewpoint videos with dynamic Gaussians. Reduces model size to just 0.7 MB per frame while training in < 5 s and rendering at 350 FPS.

June 11, 2025 at 9:13 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

"STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes"

TL;DR: Data-driven feed-forward transformer; dense reconstruction of dynamic environments with 3D Gaussians and velocities; self-supervised scene flows

May 20, 2025 at 4:49 PM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

TL;DR: a feed-forward framework that reconstructs and tracks dynamic video content; DUSt3R-like pointmaps for a pair of frames captured at different moments (1/2)

April 22, 2025 at 4:30 PM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

TL;DR: feed-forward model; cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.

March 14, 2025 at 10:21 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

⚡️Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

TL;DR: multi-view generalization to DUSt3R; processing many views in parallel: Transformer-based architecture forwards N images in a single forward pass, bypassing the need for iterative alignment.

March 13, 2025 at 10:07 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

🪄 VACE: All-in-One Video Creation and Editing

from @alibabagroup.bsky.social's Tongyi Lab with:

Zeyinzi Jiang* Zhen Han* Chaojie Mao*† Jingfeng Zhang Yulin Pan Yu Liu

*Equal contribution, †Project lead

March 12, 2025 at 8:39 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

TL;DR: a single-step image diffusion model trained to enhance rendered novel views and remove artifacts caused by underconstrained regions of the 3D representation.

March 10, 2025 at 8:43 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

A Distractor-Aware Memory (DAM) for Visual Object Tracking with SAM2

TL;DR: SAM2.1-based; distractor-distilled (DiDi) dataset to better study the distractor problem

March 5, 2025 at 8:48 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

TL;DR: object-level 2D segmentation+relative depth; GPT-based model to analyze inter-object spatial relationships; occlusion-aware large-scale 3D generation model

March 4, 2025 at 8:41 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Are diffusion models falling for optical illusions?

"The Art of Deception: Color Visual Illusions and Diffusion Models"

TL;DR: Diffusion models exhibit human-like perceptual shifts in brightness and color within their latent space.

February 28, 2025 at 3:15 PM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?

TL;DR: While more accurate volumetric rendering can help for low numbers of primitives, efficient optimization + a large number of Gaussians allow 3DGS to outperform volumetric rendering despite its approximations

February 27, 2025 at 9:58 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

The NeRF-life vengeance?

"Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering"

February 24, 2025 at 10:09 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

TL;DR: Self-calibration + cubemap-based resampling strategy to support large-FOV images

February 20, 2025 at 8:50 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

TL;DR: motion from source video + environmental representations captured as conditional inputs. Shape-agnostic mask strategy for the character/environment relationship.

February 18, 2025 at 4:26 PM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Pippo: High-Resolution Multi-View Humans from a Single Image

TL;DR: 1K Multiview Diffusion Transformer pre-trained on 3B Human images without captions; post-trained on 2.5K studio captures with pixel-aligned control via ControlMLP; generates > 5x views at inference

February 18, 2025 at 10:16 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Since 2024, it's crazy how competitive the field of generative video is. Here is another player but open source this time!

Hong Kong University and ByteDance present "Goku: Flow Based Video Generative Foundation Models"

February 14, 2025 at 9:21 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

📜 Fillerbuster: Multi-View Scene Completion for Casual Captures

TL;DR: Unified framework for scene completion; jointly models images and camera poses to reconstruct missing parts of casually captured scenes. 1B-parameter diffusion model trained from scratch.

February 12, 2025 at 9:02 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

TL;DR: Control by manipulating 3D tracking videos, which link frames and significantly enhance the temporal consistency of the generated videos; 3 days of training on 8 H800 GPUs using fewer than 10k videos

February 11, 2025 at 8:42 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion

TL;DR: diffusion-based; raymap conditioning to augment visual features with spatial information from different viewpoints; multi-task generation of images and depth maps

February 4, 2025 at 9:47 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

TL;DR: multi-scale temporal attention module for spatial accuracy. Noise rescheduling mechanism & latent transition approach for temporal consistency

February 3, 2025 at 11:10 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

TL;DR: 360° panoramas using diffusion-based image models. By combining cubemap representations with fine-tuning of pretrained txt2img models, CubeDiff simplifies the panorama generation process, delivering high-quality, consistent panoramas.

January 31, 2025 at 10:03 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

TL;DR: full perspective projection model without applying heuristics; depth, focal parameters, 3D pose, and 2D alignment estimation

January 29, 2025 at 8:25 AM

Alexandre Morgand, PhD

@alexmrgd.bsky.social

Continuous 3D Perception Model with Persistent State

TL;DR: An online 3D reasoning framework for various 3D tasks from only RGB inputs

January 27, 2025 at 9:37 AM
