Martin JJ. Bucher
@mnbucher.bsky.social
PhD Student @Stanford doing next token prediction • RS intern @Autodesk • Fellow @StanfordHAI • Prev: MSc CS @ETH
mnbucher.bsky.social
Results: new SOTA on object addition and competitive performance on full scene synthesis. Explicit boundaries via SSR significantly reduce out-of-bounds object placements compared to floor-plan renderings.

Test-time scaling with Best-of-N shows further potential to improve preference alignment on this task! (7/8)
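Best-of-N test-time scaling, in its generic form, just means sampling several candidates and keeping the one a reward function prefers. A minimal sketch, with a toy stand-in for generation and a simple verifiable reward (an in-bounds check is an assumption for illustration, not the paper's exact reward):

```python
import random

def best_of_n(generate, score, n=8, seed=0):
    """Test-time scaling: sample n candidates, keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-in: "generation" proposes a 2D position for a new object,
# and the reward prefers placements inside a 4x4 m room (a verifiable check).
def propose(rng):
    return (rng.uniform(-1, 5), rng.uniform(-1, 5))

def in_bounds_reward(pos):
    x, y = pos
    return 1.0 if (0 <= x <= 4 and 0 <= y <= 4) else -1.0

best = best_of_n(propose, in_bounds_reward, n=16)
print(best, in_bounds_reward(best))
```

With more samples (larger N), the chance that at least one candidate satisfies the verifiable check grows, which is why this improves alignment without retraining.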
A novel Voxelization-Based Loss (VBL) captures fine-grained geometry beyond 3D bounding boxes, quantifying realistic object interactions (e.g., a chair partially under a table). (6/8)
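The intuition behind voxel-level checks: a chair tucked under a table overlaps the table's 3D bounding box, yet the actual geometry does not collide. A minimal sketch of that distinction with boolean voxel grids (shapes and coordinates are illustrative, not the paper's loss):

```python
import numpy as np

def voxelize_box(grid_shape, lo, hi):
    """Rasterize an axis-aligned box [lo, hi) into a boolean voxel grid."""
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(slice(a, b) for a, b in zip(lo, hi))] = True
    return grid

shape = (10, 10, 10)
table_top = voxelize_box(shape, (0, 0, 7), (10, 10, 8))   # thin slab at height 7
chair_seat = voxelize_box(shape, (2, 2, 4), (5, 5, 5))    # seat below the top
table_bbox = voxelize_box(shape, (0, 0, 0), (10, 10, 8))  # table's full bounding box

print(np.any(table_bbox & chair_seat))  # True: bounding boxes intersect
print(np.any(table_top & chair_seat))   # False: voxelized geometry does not
```

A bounding-box-only criterion would flag this valid placement as a collision; voxel occupancy does not.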
For object removal and full scene synthesis, we leverage a zero-shot LLM: it edits the SSR directly in text space (removal) and generates lists of object prompts (full scenes) that are passed autoregressively into SG-LLM. (5/8)
To go from text to mesh-based scene, we employ a sampling engine for 3D assets that matches geometry and semantics for a queried object. (4/8)
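One simple way such a sampling engine can work is to filter assets by semantic category, then rank by geometric fit. A hedged sketch with a hypothetical catalog (`CATALOG`, `sample_asset`, and the size-distance criterion are illustrative assumptions, not the paper's implementation):

```python
import math

# Hypothetical asset catalog: each mesh tagged with a category and size (metres).
CATALOG = [
    {"id": "sofa_01", "category": "sofa", "size": (2.0, 0.9, 0.8)},
    {"id": "sofa_02", "category": "sofa", "size": (1.6, 0.8, 0.7)},
    {"id": "chair_07", "category": "chair", "size": (0.5, 0.5, 0.9)},
]

def sample_asset(category, target_size, catalog=CATALOG):
    """Match semantics first (category), then pick the geometrically
    closest asset by Euclidean distance between size vectors."""
    matches = [a for a in catalog if a["category"] == category]
    if not matches:
        return None
    return min(matches, key=lambda a: math.dist(a["size"], target_size))

print(sample_asset("sofa", (1.9, 0.9, 0.8))["id"])  # → sofa_01
```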
For single-object addition, we feed a short object prompt, together with the existing SSR, into SG-LLM, a model specially trained for spatial reasoning.

We train SG-LLM via SFT + GRPO, the first work to apply preference alignment with verifiable rewards to 3D scene synthesis. (3/8)
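The core of GRPO is that each sampled completion's reward is standardized against its group's mean and standard deviation, so no learned value function is needed. A minimal sketch of that advantage step (the binary "object lies inside the room boundary" reward is an assumed example of a verifiable reward, not the paper's exact one):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within a sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled placements, scored by a verifiable binary check
# (e.g., 1.0 if the object lies inside the room boundary, else 0.0).
advs = group_relative_advantages([1.0, 1.0, 0.0, 0.0])
print([round(a, 2) for a in advs])  # → [1.0, 1.0, -1.0, -1.0]
```

Completions passing the check get positive advantage and are reinforced; the rest are pushed down, without any reward model in the loop.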
At the heart of our method lies a Structured Scene Representation (SSR), encoding non-rectangular room boundaries and object semantics explicitly via text.

Lightweight, interpretable, and editable.

We then formulate scene synthesis and editing as next-token prediction. (2/8)
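To make "encoding boundaries and semantics explicitly via text" concrete, here is a minimal sketch of what an SSR-style scene could look like; the field names and JSON layout are illustrative assumptions, not the paper's exact schema:

```python
import json

# Hypothetical SSR-style scene: explicit (possibly non-rectangular) room
# boundary as a 2D polygon, plus objects with semantics and placement.
# Field names are illustrative, not the paper's exact schema.
scene = {
    "room": {
        # L-shaped floor polygon (metres), vertices in counter-clockwise order
        "boundary": [[0, 0], [5, 0], [5, 3], [3, 3], [3, 5], [0, 5]],
        "height": 2.8,
    },
    "objects": [
        {
            "category": "sofa",
            "description": "tufted dark gray sofa",
            "position": [1.2, 0.6, 0.0],  # x, y, z in metres
            "rotation": 90.0,             # yaw in degrees
            "size": [2.0, 0.9, 0.8],      # width, depth, height
        }
    ],
}

# Serialized to text, the scene is just a token sequence an LLM can
# read, extend (addition), or edit in place (removal/swap).
ssr_text = json.dumps(scene, indent=2)
```

Because the representation is plain text, "next-token prediction" over it directly yields scene synthesis and editing.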
✨What if you could create and redesign scenes by just talking to them?

**ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment**

Add, remove, and swap objects simply via natural language, e.g., "add tufted dark gray sofa". (1/8)