IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing
Computer-Aided Design (CAD) is essential for modern manufacturing, yet most automated tools rely on "one-shot" generation, in which the system tries to produce a complete 3D model in a single pass. This approach does not reflect the iterative, trial-and-error nature of professional engineering. IterCAD introduces a unified agent framework that treats CAD design as a closed-loop, multi-turn interaction between an AI agent and a virtual CAD sandbox. By mimicking the human "generate–verify–refine" workflow, the agent can parse engineering drawings, generate parametric code, and autonomously correct geometric errors through repeated cycles of feedback.

A "Look and Loop" Philosophy

The core of IterCAD is its ability to move beyond static generation. The "Look" component uses multi-view engineering drawings as a constant reference, ensuring the agent stays aligned with the intended design. The "Loop" component integrates three types of feedback—compiler logs, execution results, and visual geometric analysis—to guide the agent. If the generated code contains errors or fails to meet the required dimensions, the agent uses this feedback to diagnose the problem and refine its code in subsequent turns, rather than starting from scratch.
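The loop described above can be pictured with a small sketch. This is not IterCAD's implementation: the `Feedback` fields, the `refine_loop` function, and the toy `toy_generate` / `toy_evaluate` stand-ins (which replace the real agent and CAD sandbox) are all hypothetical names chosen for illustration. The only point is the control flow, in which each turn receives the previous code plus feedback and refines it instead of restarting.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    """Hypothetical bundle of the three feedback types named in the text."""
    compiler_log: str       # compiler messages
    execution_ok: bool      # whether the code executed and produced a solid
    geometry_error: float   # scalar stand-in for visual geometric analysis


def refine_loop(
    generate: Callable[[Optional[str], Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_turns: int = 5,
    tolerance: float = 1e-3,
) -> Tuple[str, List[Tuple[str, Feedback]]]:
    """Run generate -> verify -> refine until the result is good enough."""
    code: Optional[str] = None
    feedback: Optional[Feedback] = None
    history: List[Tuple[str, Feedback]] = []
    for _ in range(max_turns):
        # The generator sees the previous code and its feedback, so it can
        # patch the earlier attempt rather than start from scratch.
        code = generate(code, feedback)
        feedback = evaluate(code)
        history.append((code, feedback))
        if feedback.execution_ok and feedback.geometry_error <= tolerance:
            break
    return code, history


TARGET_RADIUS = 5.0  # toy "drawing": a cylinder of radius 5


def _radius(code: str) -> float:
    # Parse the toy one-line "program" of the form cylinder(r=<float>).
    return float(code[len("cylinder(r="):-1])


def toy_generate(prev: Optional[str], fb: Optional[Feedback]) -> str:
    # Toy agent: first guess is too small, then it grows the radius by 1.
    r = 3.0 if prev is None else _radius(prev) + 1.0
    return f"cylinder(r={r})"


def toy_evaluate(code: str) -> Feedback:
    # Toy sandbox: always "executes", reports the radius deviation.
    return Feedback("", True, abs(_radius(code) - TARGET_RADIUS))
```

With these toy stand-ins, `refine_loop(toy_generate, toy_evaluate)` converges on the third turn, once the radius deviation drops to zero.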

Training for Precision and Self-Correction

To build these capabilities, the researchers employed a two-stage training process. First, they used a progressive supervised fine-tuning (SFT) stage to teach the agent how to translate engineering specifications into executable code. Second, they applied reinforcement learning (RL) to optimize the agent’s decision-making over long sequences of interactions. A key innovation here is "Geometry-Viable Prefix Masking," which prevents the agent from being penalized for downstream failures that were caused by earlier, unavoidable mistakes. This ensures the model learns to prioritize stable, correct steps during the design process.
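The summary does not spell out how Geometry-Viable Prefix Masking is computed, so the following is only one possible reading, not the authors' method. It assumes each turn in a trajectory carries a boolean flag saying whether the geometry was still viable after that turn, and it keeps the learning signal only for the leading run of viable turns. The names `viable_prefix_mask` and `masked_mean_loss` are hypothetical.

```python
from typing import List, Sequence


def viable_prefix_mask(viable: Sequence[bool]) -> List[float]:
    """Return 1.0 for the longest all-viable prefix of turns, 0.0 afterwards.

    Assumption: once one turn breaks the geometry, later turns are treated
    as downstream of that failure and are excluded from the loss.
    """
    mask: List[float] = []
    alive = True
    for v in viable:
        alive = alive and v
        mask.append(1.0 if alive else 0.0)
    return mask


def masked_mean_loss(per_turn_loss: Sequence[float], mask: Sequence[float]) -> float:
    """Average the per-turn loss over unmasked turns only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0  # nothing viable to learn from in this trajectory
    return sum(l * m for l, m in zip(per_turn_loss, mask)) / kept
```

For a four-turn trajectory with viability flags `[True, True, False, True]`, only the first two turns contribute under this reading, so large losses on turns three and four do not affect the update.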

A New Standard for Evaluation

Existing benchmarks for CAD generation often suffer from "survivor bias," where geometric accuracy is only measured for the few models that successfully execute, ignoring those that fail entirely. To fix this, the authors introduced IterCAD-Bench, which includes two specific tasks: Drawing-to-Code and Interactive Editing. They also developed the Chamfer Distance Tolerance-Recall (CD-TR) curve. This metric accounts for both code validity and geometric precision across all attempts, including failed ones. By calculating the Area Under the CD-TR Curve (AUC-TR), the researchers established a more rigorous way to compare how well different models handle the complexities of real-world engineering.
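To make the idea concrete, here is a minimal sketch of a tolerance-recall curve under stated assumptions. The exact definition used by IterCAD-Bench is not given in this summary. The sketch assumes that recall at a tolerance is the fraction of all samples whose Chamfer Distance is at or below that tolerance, that failed executions (represented as `None`) never count as hits, and that the area is a trapezoidal integral normalized by the tolerance range. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def tolerance_recall(
    chamfer: Sequence[Optional[float]], tolerances: Sequence[float]
) -> List[float]:
    """Fraction of ALL samples with Chamfer Distance <= each tolerance.

    None marks a sample whose code failed to execute; it stays in the
    denominator, so failures lower recall instead of being dropped.
    """
    n = len(chamfer)
    if n == 0:
        raise ValueError("need at least one sample")
    return [
        sum(1 for d in chamfer if d is not None and d <= t) / n
        for t in tolerances
    ]


def auc_tr(tolerances: Sequence[float], recalls: Sequence[float]) -> float:
    """Trapezoidal area under the curve, normalized to lie in [0, 1]."""
    area = 0.0
    for i in range(1, len(tolerances)):
        width = tolerances[i] - tolerances[i - 1]
        area += 0.5 * (recalls[i] + recalls[i - 1]) * width
    return area / (tolerances[-1] - tolerances[0])


# Four attempts: three executed, one failed outright (None).
chamfer = [0.01, 0.03, None, 0.2]
tolerances = [0.0, 0.02, 0.04, 0.1]
recalls = tolerance_recall(chamfer, tolerances)
score = auc_tr(tolerances, recalls)
```

In this toy example, a mean Chamfer Distance computed over the three executable samples alone would ignore the failed attempt entirely, whereas the recall curve can never exceed 0.75 because the failure remains in the denominator.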

Performance and Real-World Application

According to the authors' experiments, IterCAD outperforms existing methods in both code executability and geometric precision. By shifting from one-shot generation to an interactive, self-correcting agent, the framework handles complex industrial features such as fillets, shells, and chamfers more reliably. The results suggest that integrating iterative feedback loops is a vital step toward making AI-driven CAD tools reliable enough for professional engineering environments.
