Martin JJ. Bucher
mnbucher.bsky.social
PhD Student @Stanford doing next token prediction • RS intern
@Autodesk • Fellow @StanfordHAI • Prev: MSc CS @ETH
Check out our paper!

Work done by myself and the amazing @ir0armeni.bsky.social

Paper: arxiv.org/abs/2506.02459
Website: respace.mnbucher.com
Code: github.com/GradientSpac...
ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment
June 4, 2025 at 7:31 PM
Results: new SOTA on object addition and competitive performance on full scene synthesis. Explicit boundaries via SSR significantly reduce out-of-bounds placements compared to floor plan renderings.

Test-time scaling with Best-of-N shows further potential to improve preference alignment on this task! (7/8)
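Best-of-N test-time scaling just means sampling several candidate scenes and keeping the one the verifiable reward scores highest. A minimal sketch, assuming hypothetical reward components (out-of-bounds volume and object overlap; the paper's reward terms may differ):

```python
def verifiable_reward(scene):
    """Toy verifiable reward: penalize out-of-bounds volume and object overlap.
    (Illustrative components, not the paper's exact reward.)"""
    return -scene["oob"] - scene["overlap"]

def best_of_n(candidates, reward_fn=verifiable_reward):
    """Keep the candidate scene with the highest reward."""
    return max(candidates, key=reward_fn)

candidates = [
    {"id": "a", "oob": 0.3, "overlap": 0.10},
    {"id": "b", "oob": 0.0, "overlap": 0.05},
    {"id": "c", "oob": 0.1, "overlap": 0.00},
]
best = best_of_n(candidates)
print(best["id"])  # → "b"
```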
A novel Voxelization-Based Loss (VBL) captures fine-grained geometry beyond 3D bounding boxes, quantifying realistic object interactions (e.g., a chair tucked partially under a table). (6/8)
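The intuition can be sketched with occupancy grids: voxelize each object and measure interpenetration that bounding-box checks would miss (with real meshes, a chair tucked under a table produces no voxel overlap even though the bounding boxes intersect). A minimal sketch with axis-aligned boxes standing in for meshes, which the paper actually voxelizes:

```python
import numpy as np

def voxelize_box(box_min, box_max, res=32, extent=4.0):
    """Binary occupancy grid for an axis-aligned box (stand-in for mesh voxelization)."""
    c = (np.arange(res) + 0.5) * extent / res          # voxel-center coordinates
    xs, ys, zs = np.meshgrid(c, c, c, indexing="ij")
    inside = np.ones((res, res, res), dtype=bool)
    for axis, pts in enumerate((xs, ys, zs)):
        inside &= (pts >= box_min[axis]) & (pts <= box_max[axis])
    return inside

def voxel_overlap(occ_a, occ_b):
    """Fraction of the grid occupied by both objects: a crude proxy for a voxel-level collision term."""
    return np.logical_and(occ_a, occ_b).mean()

table = voxelize_box((0.0, 0.0, 0.0), (1.6, 0.75, 0.9))
chair = voxelize_box((0.5, 0.0, 0.2), (1.0, 0.9, 0.7))
print(voxel_overlap(table, chair) > 0)  # → True (the box approximations interpenetrate)
```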
For object removal and full scene synthesis, we leverage a zero-shot LLM: it edits the SSR directly in text space for removal, and for full scenes it generates object prompt lists that are passed autoregressively into SG-LLM for addition. (5/8)
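Because the SSR is plain text, the zero-shot edit reduces to a prompting step. A hypothetical removal prompt template (not the paper's exact wording):

```python
def build_removal_prompt(ssr_text, instruction):
    """Hypothetical prompt template; the actual template in the paper may differ."""
    return (
        "You are given a structured scene representation (SSR) as text.\n"
        f"SSR:\n{ssr_text}\n"
        f"Instruction: {instruction}\n"
        "Return the edited SSR, removing only the requested object and "
        "keeping all other objects unchanged."
    )

prompt = build_removal_prompt('{"objects": [...]}', "remove the old wooden chair")
print("Instruction: remove the old wooden chair" in prompt)  # → True
```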
To go from text to mesh-based scene, we employ a sampling engine for 3D assets that matches geometry and semantics for a queried object. (4/8)
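Such a sampling engine can be approximated as nearest-neighbor retrieval over an asset catalog, matching category (semantics) and bounding-box size (geometry). A sketch with a hypothetical catalog, not the paper's actual retrieval pipeline:

```python
import math

CATALOG = [  # hypothetical assets: (category, (width, height, depth) in meters)
    ("chair", (0.5, 0.90, 0.5)),
    ("chair", (0.7, 1.10, 0.7)),
    ("table", (1.6, 0.75, 0.9)),
]

def sample_asset(category, target_dims, catalog=CATALOG):
    """Retrieve the asset of the queried category whose size best matches the target."""
    candidates = [a for a in catalog if a[0] == category]
    return min(candidates, key=lambda a: math.dist(a[1], target_dims))

print(sample_asset("chair", (0.52, 0.92, 0.52)))  # → ("chair", (0.5, 0.9, 0.5))
```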
For single object addition, a short object prompt is fed, together with the existing SSR, into SG-LLM, a model specially trained for spatial reasoning.

We train SG-LLM via SFT + GRPO, the first approach to apply preference alignment with verifiable rewards to 3D scene synthesis. (3/8)
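GRPO's core idea in a few lines: sample a group of completions per prompt, score each with the verifiable reward, and use group-relative normalized rewards as advantages. A sketch of just the advantage computation, omitting the policy-gradient update and KL term:

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: reward minus group mean, divided by group std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

print(grpo_advantages([1.0, 3.0]))  # ≈ [-1.0, 1.0]
```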
At the heart of our method lies a Structured Scene Representation (SSR), encoding non-rectangular room boundaries and object semantics explicitly via text.

Lightweight, interpretable, and editable.

We then formulate scene synthesis and editing as next-token prediction. (2/8)
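Concretely, an SSR might look like the toy instance below (field names are illustrative, not the paper's exact schema). Serialized to text, adding an object amounts to predicting the next tokens of this string:

```python
import json

# Hypothetical SSR instance; a non-rectangular floor polygon plus object entries.
ssr = {
    "bounds": [[0, 0], [4, 0], [4, 3], [2, 3], [2, 5], [0, 5]],  # L-shaped room (m)
    "objects": [
        {"desc": "grey fabric sofa", "pos": [1.0, 0.0, 0.45],
         "rot_deg": 90, "size": [2.0, 0.8, 0.9]},
    ],
}
text = json.dumps(ssr)       # the text the LLM reads and extends token by token
print(len(ssr["bounds"]))    # → 6 corners, i.e. not a rectangle
```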