Martin JJ. Bucher
mnbucher.bsky.social
PhD Student @Stanford doing next token prediction • RS intern
@Autodesk • Fellow @StanfordHAI • Prev: MSc CS @ETH
Check out our paper!

Work done by myself and the amazing @ir0armeni.bsky.social

Paper: arxiv.org/abs/2506.02459
Website: respace.mnbucher.com
Code: github.com/GradientSpac...
ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment
June 4, 2025 at 7:31 PM
Results: new SOTA on object addition and competitive performance on full scene synthesis. Explicit boundaries via SSR significantly reduce out-of-bounds placements compared to floor plan renderings.

Test-time scaling with Best-of-N shows further potential to improve preference alignment on this task! (7/8)
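Best-of-N test-time scaling just means sampling several candidate scenes and keeping the one the verifiable reward scores highest. A minimal sketch, assuming hypothetical reward components (out-of-bounds volume and object overlap; the paper's reward terms may differ):

```python
def verifiable_reward(scene):
    """Toy verifiable reward: penalize out-of-bounds volume and object overlap.
    (Illustrative components, not the paper's exact reward.)"""
    return -scene["oob"] - scene["overlap"]

def best_of_n(candidates, reward_fn=verifiable_reward):
    """Keep the candidate scene with the highest reward."""
    return max(candidates, key=reward_fn)

candidates = [
    {"id": "a", "oob": 0.3, "overlap": 0.10},
    {"id": "b", "oob": 0.0, "overlap": 0.05},
    {"id": "c", "oob": 0.1, "overlap": 0.00},
]
best = best_of_n(candidates)
print(best["id"])  # → "b"
```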
A novel Voxelization-Based Loss (VBL) captures fine-grained geometry beyond 3D bounding boxes, quantifying realistic object interactions (e.g., a chair tucked partially under a table). (6/8)
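The intuition can be sketched with occupancy grids: voxelize each object and measure interpenetration that bounding-box checks would miss (with real meshes, a chair tucked under a table produces no voxel overlap even though the bounding boxes intersect). A minimal sketch with axis-aligned boxes standing in for meshes, which the paper actually voxelizes:

```python
import numpy as np

def voxelize_box(box_min, box_max, res=32, extent=4.0):
    """Binary occupancy grid for an axis-aligned box (stand-in for mesh voxelization)."""
    c = (np.arange(res) + 0.5) * extent / res          # voxel-center coordinates
    xs, ys, zs = np.meshgrid(c, c, c, indexing="ij")
    inside = np.ones((res, res, res), dtype=bool)
    for axis, pts in enumerate((xs, ys, zs)):
        inside &= (pts >= box_min[axis]) & (pts <= box_max[axis])
    return inside

def voxel_overlap(occ_a, occ_b):
    """Fraction of the grid occupied by both objects: a crude proxy for a voxel-level collision term."""
    return np.logical_and(occ_a, occ_b).mean()

table = voxelize_box((0.0, 0.0, 0.0), (1.6, 0.75, 0.9))
chair = voxelize_box((0.5, 0.0, 0.2), (1.0, 0.9, 0.7))
print(voxel_overlap(table, chair) > 0)  # → True (the box approximations interpenetrate)
```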
For object removal and full scene synthesis, we leverage a zero-shot LLM: it edits the SSR directly in text space for removal, and for full scenes it generates object prompt lists that are passed autoregressively into SG-LLM for addition. (5/8)
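Because the SSR is plain text, the zero-shot edit reduces to a prompting step. A hypothetical removal prompt template (not the paper's exact wording):

```python
def build_removal_prompt(ssr_text, instruction):
    """Hypothetical prompt template; the actual template in the paper may differ."""
    return (
        "You are given a structured scene representation (SSR) as text.\n"
        f"SSR:\n{ssr_text}\n"
        f"Instruction: {instruction}\n"
        "Return the edited SSR, removing only the requested object and "
        "keeping all other objects unchanged."
    )

prompt = build_removal_prompt('{"objects": [...]}', "remove the old wooden chair")
print("Instruction: remove the old wooden chair" in prompt)  # → True
```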
To go from text to mesh-based scene, we employ a sampling engine for 3D assets that matches geometry and semantics for a queried object. (4/8)
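Such a sampling engine can be approximated as nearest-neighbor retrieval over an asset catalog, matching category (semantics) and bounding-box size (geometry). A sketch with a hypothetical catalog, not the paper's actual retrieval pipeline:

```python
import math

CATALOG = [  # hypothetical assets: (category, (width, height, depth) in meters)
    ("chair", (0.5, 0.90, 0.5)),
    ("chair", (0.7, 1.10, 0.7)),
    ("table", (1.6, 0.75, 0.9)),
]

def sample_asset(category, target_dims, catalog=CATALOG):
    """Retrieve the asset of the queried category whose size best matches the target."""
    candidates = [a for a in catalog if a[0] == category]
    return min(candidates, key=lambda a: math.dist(a[1], target_dims))

print(sample_asset("chair", (0.52, 0.92, 0.52)))  # → ("chair", (0.5, 0.9, 0.5))
```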
For single object addition, a short object prompt is fed, together with the existing SSR, into SG-LLM, a model specially trained for spatial reasoning.

We train SG-LLM via SFT + GRPO, the first approach to apply preference alignment with verifiable rewards to 3D scene synthesis. (3/8)
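GRPO's core idea in a few lines: sample a group of completions per prompt, score each with the verifiable reward, and use group-relative normalized rewards as advantages. A sketch of just the advantage computation, omitting the policy-gradient update and KL term:

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: reward minus group mean, divided by group std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

print(grpo_advantages([1.0, 3.0]))  # ≈ [-1.0, 1.0]
```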
At the heart of our method lies a Structured Scene Representation (SSR), encoding non-rectangular room boundaries and object semantics explicitly via text.

Lightweight, interpretable, and editable.

We then formulate scene synthesis and editing as next-token prediction. (2/8)
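Concretely, an SSR might look like the toy instance below (field names are illustrative, not the paper's exact schema). Serialized to text, adding an object amounts to predicting the next tokens of this string:

```python
import json

# Hypothetical SSR instance; a non-rectangular floor polygon plus object entries.
ssr = {
    "bounds": [[0, 0], [4, 0], [4, 3], [2, 3], [2, 5], [0, 5]],  # L-shaped room (m)
    "objects": [
        {"desc": "grey fabric sofa", "pos": [1.0, 0.0, 0.45],
         "rot_deg": 90, "size": [2.0, 0.8, 0.9]},
    ],
}
text = json.dumps(ssr)       # the text the LLM reads and extends token by token
print(len(ssr["bounds"]))    # → 6 corners, i.e. not a rectangle
```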