IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

Key Takeaways

  • Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices.
  • In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive CAD generation and editing.
  • We formulate the task as a multi-turn interaction between a multimodal agent and an executable CAD sandbox, covering three tasks: Drawing-to-Code, Text-to-Code, and Interactive Editing.
  • We optimize the agent via progressive SFT followed by geometry-aware reinforcement learning with viable-prefix masking to enhance code executability and geometric fidelity.
Paper Abstract

Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices. In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive CAD generation and editing. We formulate the task as a multi-turn interaction between a multimodal agent and an executable CAD sandbox, covering three tasks: Drawing-to-Code, Text-to-Code, and Interactive Editing. To support this, we develop a data synthesis pipeline incorporating advanced industrial manufacturing features to generate standard-compliant multi-view engineering drawings, complex code-editing tasks, and high-fidelity interaction trajectories. We optimize the agent via progressive SFT followed by geometry-aware reinforcement learning with viable-prefix masking to enhance code executability and geometric fidelity. Finally, we introduce the IterCAD-Bench evaluation suite and propose the Chamfer Distance Tolerance-Recall (CD-TR) curve alongside its AUC-TR metric, establishing a survivor-bias-free standard that unifies code validity and geometric precision. Extensive experiments demonstrate that IterCAD achieves highly competitive performance across multiple benchmarks, significantly outperforming existing approaches in both code executability and geometric precision, while exhibiting superior capabilities in closed-loop iterative refinement.

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing
Computer-Aided Design (CAD) is essential for modern manufacturing, yet most automated tools rely on "one-shot" generation, where the system tries to produce a complete 3D model in a single pass. This approach does not match the iterative, trial-and-error nature of professional engineering. IterCAD introduces a unified agent framework that treats CAD design as a closed-loop, multi-turn interaction between an AI agent and an executable CAD sandbox. By mimicking the human "generate–verify–refine" workflow, the agent can parse engineering drawings, generate parametric code, and autonomously correct geometric errors through repeated cycles of feedback.

A "Look and Loop" Philosophy

The core of IterCAD is its ability to move beyond static generation. The "Look" component uses multi-view engineering drawings as a constant reference, ensuring the agent stays aligned with the intended design. The "Loop" component integrates three types of feedback (compiler logs, execution results, and visual geometric analysis) to guide the agent. If the generated code fails to run or does not meet the required dimensions, the agent uses this feedback to diagnose the problem and refine its code in subsequent turns, rather than starting from scratch.
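The generate, verify, refine cycle described above can be sketched as a short control loop. This is only an illustration under stated assumptions: the `Feedback` fields, the `refine_loop` function, the stopping tolerance, and the toy agent and sandbox are hypothetical names invented for this sketch, not the IterCAD implementation or its API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    """Hypothetical container for the three feedback channels."""
    compiler_log: str                 # messages from compiling/parsing the CAD code
    executed: bool                    # whether the code ran to completion
    geometry_error: Optional[float]   # distance to the target shape, None if nothing was built


History = List[Tuple[str, Feedback]]


def refine_loop(
    generate: Callable[[History], str],
    run_in_sandbox: Callable[[str], Feedback],
    max_turns: int = 4,
    tolerance: float = 0.01,
) -> History:
    """Run generate -> verify -> refine until the result is good enough or turns run out."""
    history: History = []
    for _ in range(max_turns):
        code = generate(history)       # the agent sees all earlier attempts and feedback
        feedback = run_in_sandbox(code)
        history.append((code, feedback))
        done = (
            feedback.executed
            and feedback.geometry_error is not None
            and feedback.geometry_error <= tolerance
        )
        if done:
            break
    return history


# Toy stand-ins (assumptions): the "agent" corrects its width on the second turn,
# and the "sandbox" scores a box against a target width of 10.
def toy_agent(history: History) -> str:
    return "box(width=8)" if not history else "box(width=10)"


def toy_sandbox(code: str) -> Feedback:
    width = float(code[len("box(width="):-1])
    return Feedback(compiler_log="ok", executed=True, geometry_error=abs(width - 10.0))


history = refine_loop(toy_agent, toy_sandbox)
```

In this toy run the first attempt is rejected on geometric error and the second is accepted, so the loop stops after two turns instead of using all four. A real agent would also condition on the reference drawings at every turn.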

Training for Precision and Self-Correction

To build these capabilities, the researchers employed a two-stage training process. First, they used a progressive supervised fine-tuning (SFT) stage to teach the agent how to translate engineering specifications into executable code. Second, they applied reinforcement learning (RL) to optimize the agent’s decision-making over long sequences of interactions. A key innovation here is "Geometry-Viable Prefix Masking," which prevents the agent from being penalized for downstream failures that were caused by earlier, unavoidable mistakes. This ensures the model learns to prioritize stable, correct steps during the design process.
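The summary does not spell out how the masking is computed, so the following is one possible reading rather than the authors' algorithm. It assumes each step of a trajectory carries a boolean "still geometrically viable" flag, and that training signal is kept only for the prefix of steps before the first non-viable one. The function names and the flag itself are hypothetical.

```python
from typing import List


def viable_prefix_mask(step_is_viable: List[bool]) -> List[int]:
    """Return 1 for steps inside the viable prefix, 0 from the first failure onward."""
    mask: List[int] = []
    alive = True
    for ok in step_is_viable:
        alive = alive and ok          # once a step fails, every later step is masked too
        mask.append(1 if alive else 0)
    return mask


def masked_advantages(advantages: List[float], step_is_viable: List[bool]) -> List[float]:
    """Zero out the learning signal for steps outside the viable prefix."""
    mask = viable_prefix_mask(step_is_viable)
    return [a if m else 0.0 for a, m in zip(advantages, mask)]


flags = [True, True, False, True]
mask = viable_prefix_mask(flags)
signal = masked_advantages([0.5, -0.2, -1.0, -1.0], flags)
```

Under this assumption, the fourth step contributes nothing to the update even though it is individually valid, because it builds on geometry that was already broken at the third step. Whether the failing step itself should stay in the loss is a design choice this sketch does not settle.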

A New Standard for Evaluation

Existing benchmarks for CAD generation often suffer from "survivor bias," where geometric accuracy is only measured for the few models that successfully execute, ignoring those that fail entirely. To fix this, the authors introduced IterCAD-Bench, which includes two specific tasks: Drawing-to-Code and Interactive Editing. They also developed the Chamfer Distance Tolerance-Recall (CD-TR) curve. This metric accounts for both code validity and geometric precision across all attempts, including failed ones. By calculating the Area Under the CD-TR Curve (AUC-TR), the researchers established a more rigorous way to compare how well different models handle the complexities of real-world engineering.
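To make the idea concrete, here is a minimal sketch of a tolerance-recall curve and its area, assuming that recall at a tolerance is the fraction of all test cases (failed executions included) whose Chamfer distance is within that tolerance. The exact Chamfer variant, tolerance grid, and normalisation used by IterCAD-Bench are not given in this summary, so those choices are assumptions, as are the function names.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def tolerance_recall_curve(
    distances: List[Optional[float]], tolerances: List[float]
) -> List[float]:
    """Recall per tolerance; None marks code that failed to execute and never counts as a hit."""
    total = len(distances)            # failed cases stay in the denominator
    return [
        sum(1 for d in distances if d is not None and d <= t) / total
        for t in tolerances
    ]


def auc_tr(distances: List[Optional[float]], tolerances: List[float]) -> float:
    """Trapezoidal area under the curve, divided by the tolerance range."""
    recall = tolerance_recall_curve(distances, tolerances)
    area = sum(
        (tolerances[i + 1] - tolerances[i]) * (recall[i] + recall[i + 1]) / 2
        for i in range(len(tolerances) - 1)
    )
    return area / (tolerances[-1] - tolerances[0])


# Four test cases: three executed, one failed (None).
distances = [0.0, 0.05, None, 0.2]
tolerances = [0.0, 0.1, 0.2]
curve = tolerance_recall_curve(distances, tolerances)
score = auc_tr(distances, tolerances)
```

Because the failed case stays in the denominator, recall can never reach 1.0 here, no matter how loose the tolerance. Averaging Chamfer distance over only the three executed cases would hide that failure, which is the survivor bias the metric is meant to remove.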

Performance and Real-World Application

Experiments demonstrate that IterCAD significantly outperforms existing methods in both code executability and geometric precision. By shifting from a one-shot generation model to an interactive, self-correcting agent, the framework shows superior capability in handling complex industrial features like fillets, shells, and chamfers. The results suggest that integrating iterative feedback loops is a vital step toward making AI-driven CAD tools reliable enough for professional engineering environments.
