Martin JJ. Bucher
@mnbucher.bsky.social
PhD Student @Stanford doing next token prediction • RS intern @Autodesk • Fellow @StanfordHAI • Prev: MSc CS @ETH
mnbucher.bsky.social
Results: new SOTA on object addition and competitive performance on full scene synthesis. Explicit boundaries via SSR significantly reduce out-of-bounds object placements compared to floor-plan renderings.

Test-time scaling with Best-of-N shows further potential to improve preference alignment on this task! (7/8)
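Best-of-N test-time scaling, in its generic form, just means sampling several candidates and keeping the one a reward function prefers. A minimal sketch, with a toy stand-in for generation and a simple verifiable reward (an in-bounds check is an assumption for illustration, not the paper's exact reward):

```python
import random

def best_of_n(generate, score, n=8, seed=0):
    """Test-time scaling: sample n candidates, keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-in: "generation" proposes a 2D position for a new object,
# and the reward prefers placements inside a 4x4 m room (a verifiable check).
def propose(rng):
    return (rng.uniform(-1, 5), rng.uniform(-1, 5))

def in_bounds_reward(pos):
    x, y = pos
    return 1.0 if (0 <= x <= 4 and 0 <= y <= 4) else -1.0

best = best_of_n(propose, in_bounds_reward, n=16)
print(best, in_bounds_reward(best))
```

With more samples (larger N), the chance that at least one candidate satisfies the verifiable check grows, which is why this improves alignment without retraining.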
A novel Voxelization-Based Loss (VBL) captures fine-grained geometry beyond 3D bounding boxes, quantifying realistic object interactions (e.g., a chair partially under a table). (6/8)
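The intuition behind voxel-level checks: a chair tucked under a table overlaps the table's 3D bounding box, yet the actual geometry does not collide. A minimal sketch of that distinction with boolean voxel grids (shapes and coordinates are illustrative, not the paper's loss):

```python
import numpy as np

def voxelize_box(grid_shape, lo, hi):
    """Rasterize an axis-aligned box [lo, hi) into a boolean voxel grid."""
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(slice(a, b) for a, b in zip(lo, hi))] = True
    return grid

shape = (10, 10, 10)
table_top = voxelize_box(shape, (0, 0, 7), (10, 10, 8))   # thin slab at height 7
chair_seat = voxelize_box(shape, (2, 2, 4), (5, 5, 5))    # seat below the top
table_bbox = voxelize_box(shape, (0, 0, 0), (10, 10, 8))  # table's full bounding box

print(np.any(table_bbox & chair_seat))  # True: bounding boxes intersect
print(np.any(table_top & chair_seat))   # False: voxelized geometry does not
```

A bounding-box-only criterion would flag this valid placement as a collision; voxel occupancy does not.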
For object removal and full scene synthesis, we leverage a zero-shot LLM: it edits the SSR directly in text space (removal) and generates lists of object prompts (full scenes) that are passed autoregressively into SG-LLM. (5/8)
To go from text to mesh-based scene, we employ a sampling engine for 3D assets that matches geometry and semantics for a queried object. (4/8)
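One simple way such a sampling engine can work is to filter assets by semantic category, then rank by geometric fit. A hedged sketch with a hypothetical catalog (`CATALOG`, `sample_asset`, and the size-distance criterion are illustrative assumptions, not the paper's implementation):

```python
import math

# Hypothetical asset catalog: each mesh tagged with a category and size (metres).
CATALOG = [
    {"id": "sofa_01", "category": "sofa", "size": (2.0, 0.9, 0.8)},
    {"id": "sofa_02", "category": "sofa", "size": (1.6, 0.8, 0.7)},
    {"id": "chair_07", "category": "chair", "size": (0.5, 0.5, 0.9)},
]

def sample_asset(category, target_size, catalog=CATALOG):
    """Match semantics first (category), then pick the geometrically
    closest asset by Euclidean distance between size vectors."""
    matches = [a for a in catalog if a["category"] == category]
    if not matches:
        return None
    return min(matches, key=lambda a: math.dist(a["size"], target_size))

print(sample_asset("sofa", (1.9, 0.9, 0.8))["id"])  # → sofa_01
```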
For single-object addition, we feed a short object prompt, together with the existing SSR, into SG-LLM, a model specially trained for spatial reasoning.

We train SG-LLM via SFT + GRPO, the first work to apply preference alignment with verifiable rewards to 3D scene synthesis. (3/8)
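The core of GRPO is that each sampled completion's reward is standardized against its group's mean and standard deviation, so no learned value function is needed. A minimal sketch of that advantage step (the binary "object lies inside the room boundary" reward is an assumed example of a verifiable reward, not the paper's exact one):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within a sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled placements, scored by a verifiable binary check
# (e.g., 1.0 if the object lies inside the room boundary, else 0.0).
advs = group_relative_advantages([1.0, 1.0, 0.0, 0.0])
print([round(a, 2) for a in advs])  # → [1.0, 1.0, -1.0, -1.0]
```

Completions passing the check get positive advantage and are reinforced; the rest are pushed down, without any reward model in the loop.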
At the heart of our method lies a Structured Scene Representation (SSR), encoding non-rectangular room boundaries and object semantics explicitly via text.

Lightweight, interpretable, and editable.

We then formulate scene synthesis and editing as next-token prediction. (2/8)
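To make "encoding boundaries and semantics explicitly via text" concrete, here is a minimal sketch of what an SSR-style scene could look like; the field names and JSON layout are illustrative assumptions, not the paper's exact schema:

```python
import json

# Hypothetical SSR-style scene: explicit (possibly non-rectangular) room
# boundary as a 2D polygon, plus objects with semantics and placement.
# Field names are illustrative, not the paper's exact schema.
scene = {
    "room": {
        # L-shaped floor polygon (metres), vertices in counter-clockwise order
        "boundary": [[0, 0], [5, 0], [5, 3], [3, 3], [3, 5], [0, 5]],
        "height": 2.8,
    },
    "objects": [
        {
            "category": "sofa",
            "description": "tufted dark gray sofa",
            "position": [1.2, 0.6, 0.0],  # x, y, z in metres
            "rotation": 90.0,             # yaw in degrees
            "size": [2.0, 0.9, 0.8],      # width, depth, height
        }
    ],
}

# Serialized to text, the scene is just a token sequence an LLM can
# read, extend (addition), or edit in place (removal/swap).
ssr_text = json.dumps(scene, indent=2)
```

Because the representation is plain text, "next-token prediction" over it directly yields scene synthesis and editing.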
✨What if you could create and redesign scenes by just talking to them?

**ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment**

Add, remove, and swap objects simply via natural language, e.g., "add tufted dark gray sofa". (1/8)