Text-Driven 3D Indoor Scene Synthesis in Non-Manhattan Environments
This paper introduces SPG-Layout, a framework for generating realistic 3D indoor scenes from text descriptions. Many existing models handle "Manhattan" environments well (rooms whose walls are axis-aligned and meet at right angles), but they struggle with "non-Manhattan" spaces that feature irregular shapes, curved walls, and arbitrary angles. SPG-Layout addresses this by combining language modeling with spatial reasoning so that furniture and objects are placed in ways that are both physically feasible and functionally sensible.

Bridging the Gap in Architectural Design

Current generative models often fail in complex rooms because they rely on the assumption that walls are always axis-aligned. When these models encounter irregular floor plans, they often place objects in locations that cause physical collisions or violate the geometry of the room. SPG-Layout overcomes this by moving away from simple grid-based generation and instead using a structured representation of the scene that accounts for the actual boundaries of the room, regardless of their shape or orientation.
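To make the boundary problem concrete, the following is a minimal sketch of a containment check against an arbitrary room polygon rather than an axis-aligned box. It is not the paper's implementation: the function names, the polygon representation (a list of vertices), and the rectangular footprint model are all assumptions made for illustration. It tests only the footprint corners, which is enough for convex rooms; concave rooms would also need edge-intersection tests.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order (hypothetical format).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray from (x, y) towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def footprint_corners(cx, cy, width, depth, angle_deg):
    """Corners of a rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half_w, half_d = width / 2, depth / 2
    offsets = ((-half_w, -half_d), (half_w, -half_d),
               (half_w, half_d), (-half_w, half_d))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]

def fits_in_room(room, cx, cy, width, depth, angle_deg):
    """True if every footprint corner is inside the room (convex rooms only)."""
    corners = footprint_corners(cx, cy, width, depth, angle_deg)
    return all(point_in_polygon(px, py, room) for px, py in corners)

# A convex five-sided room with no axis-aligned assumption on its walls.
room = [(0, 0), (4, 0), (5, 3), (2, 5), (0, 3)]
centre_ok = fits_in_room(room, 2, 2, 1, 1, 30)       # object near the middle
corner_bad = fits_in_room(room, 4.5, 2.5, 1, 1, 0)   # object pokes through a slanted wall
```

The second placement would pass a bounding-box check on the room (the box spans x from 0 to 5), which is the kind of error an axis-aligned assumption produces.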

How SPG-Layout Works

The framework uses a two-part strategy to improve scene generation:

Spatial Prior Guidance (SPG): Instead of relying solely on text, the model uses statistical priors to understand how objects relate to room boundaries and to each other. For example, it understands that a nightstand should be placed near a bed. This acts as a "reward" system during training, guiding the AI to favor layouts that are physically sound.
Hierarchical Layout Strategy (HLS): Inspired by how humans design rooms, the model places large, dominant furniture (like sofas or beds) first. By establishing these "anchors" early, the model avoids the common mistake of filling space with small items that later prevent larger, necessary pieces from fitting into the room.
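The two ideas above can be sketched in a few lines. This is an illustrative simplification, not the authors' method: the summary does not say how "dominant" furniture is identified or how the priors are encoded, so the sketch assumes footprint area as the ordering key and a simple maximum-distance rule per object pair. All names and data formats here are hypothetical.

```python
import math

def placement_order(objects):
    """Sort objects so the largest footprints (the 'anchors') come first.

    Assumption: dominance is approximated by footprint area.
    """
    return sorted(objects, key=lambda o: o["width"] * o["depth"], reverse=True)

def pairwise_prior_reward(positions, priors):
    """Fraction of pairwise priors satisfied by a layout.

    positions: dict mapping object name -> (x, y) centre
    priors: list of (name_a, name_b, max_distance) tuples (hypothetical encoding)
    """
    if not priors:
        return 1.0
    satisfied = 0
    for name_a, name_b, max_distance in priors:
        if name_a in positions and name_b in positions:
            if math.dist(positions[name_a], positions[name_b]) <= max_distance:
                satisfied += 1
    return satisfied / len(priors)

objects = [
    {"name": "lamp", "width": 0.3, "depth": 0.3},
    {"name": "bed", "width": 2.0, "depth": 1.6},
    {"name": "nightstand", "width": 0.5, "depth": 0.4},
]
order = [o["name"] for o in placement_order(objects)]

priors = [("nightstand", "bed", 1.5)]  # "a nightstand should be near a bed"
near = pairwise_prior_reward({"bed": (2.0, 2.0), "nightstand": (3.2, 2.0)}, priors)
far = pairwise_prior_reward({"bed": (2.0, 2.0), "nightstand": (5.0, 5.0)}, priors)
```

Placing the bed first means the nightstand is positioned relative to an already-fixed anchor, so the prior can be evaluated as soon as the smaller object is proposed.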

Training and Validation

To build and test this system, the researchers developed a new benchmark dataset consisting of 500 diverse, high-quality non-Manhattan indoor environments. They employed a two-stage training process: first, they fine-tuned a large language model to understand the specific format of their scene data; second, they used reinforcement learning to optimize the model’s ability to avoid collisions and adhere to geometric constraints.
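As a rough illustration of what a reward for the second stage might penalize, the sketch below scores a layout by total pairwise overlap plus a count of objects that leave the room. The summary does not give the actual reward, so the function names, the weights, and the use of axis-aligned boxes for the overlap term are all assumptions; rotated objects would need a polygon-intersection test instead.

```python
from itertools import combinations

def overlap_area(a, b):
    """Overlap area of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def layout_reward(boxes, num_out_of_bounds, w_collision=1.0, w_bounds=1.0):
    """Hypothetical scalar reward: 0 for a clean layout, negative otherwise.

    boxes: list of object footprints as axis-aligned boxes
    num_out_of_bounds: how many objects violate the room boundary
    """
    total_overlap = sum(overlap_area(a, b) for a, b in combinations(boxes, 2))
    return -(w_collision * total_overlap + w_bounds * num_out_of_bounds)

colliding = layout_reward([(0, 0, 2, 2), (1, 1, 3, 3)], num_out_of_bounds=0)
clean = layout_reward([(0, 0, 1, 1), (2, 2, 3, 3)], num_out_of_bounds=0)
```

A reinforcement learning loop would then prefer generations with higher reward, which in this sketch means fewer collisions and fewer boundary violations.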

Performance and Results

Extensive experiments show that SPG-Layout significantly outperforms existing state-of-the-art methods. It achieves higher physical fidelity and fewer geometric violations in both standard Manhattan rooms and complex, non-Manhattan spaces. By successfully breaking the "Manhattan Assumption," the researchers have created a more versatile tool that can handle the irregular, real-world architectural designs found in modern buildings. The team plans to release their code to the public to support further research in the field.
