Goedel-Architect: Streamlining Formal Theorem Provi... | AI Research

Key Takeaways

  • Goedel-Architect is an agentic framework for formal theorem proving in Lean 4, centered on blueprint generation and refinement.
  • A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem.
  • First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies.
  • This blueprint is optionally guided by a natural language proof.
Paper Abstract

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies. This blueprint is optionally guided by a natural language proof. Then, a tool-equipped Lean prover component closes each open lemma node in parallel using relevant dependencies. Failed lemmas in turn drive refinement of the global blueprint. This strategy contrasts with other mainstream approaches which use recursive lemma decomposition, and can inefficiently loop on dead-end strategies. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, we additionally close the remaining two MiniF2F-test problems (reaching 100%), lift PutnamBench to 88.8% (597/672), and solve 4/6 on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. This represents state-of-the-art performance for an open-source pipeline at a price point up to 500x less than comparable open-source pipelines.

Goedel-Architect is an agentic framework designed to streamline formal theorem proving in Lean 4. Instead of relying on recursive lemma decomposition, which can loop inefficiently on dead-end strategies, the system uses a "blueprint" approach. A blueprint is a dependency graph of definitions and lemmas that builds toward the main theorem. Working from this graph lets the framework attempt lemma proofs in parallel and gives it a structured way to refine its strategy when a proof fails.
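To make the idea concrete, a blueprint for a toy statement might look like the following Lean 4 sketch. Every name here (`double`, `double_eq_two_mul`, `two_mul_mod_two`, `double_mod_two`) is hypothetical and chosen for illustration; none of it comes from the paper. The two lemmas are left open with `sorry`, which is the state a prover component would be asked to close, and the main theorem is assembled by hand from them to show the declared dependencies.

```lean
-- Hypothetical miniature blueprint; not taken from Goedel-Architect.

/-- Definition node. -/
def double (n : Nat) : Nat := n + n

/-- Open lemma node: stated formally, proof still missing. -/
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  sorry

/-- Open lemma node. -/
theorem two_mul_mod_two (n : Nat) : (2 * n) % 2 = 0 := by
  sorry

/-- Main theorem node: depends on both lemmas above. -/
theorem double_mod_two (n : Nat) : double n % 2 = 0 := by
  rw [double_eq_two_mul]
  exact two_mul_mod_two n
```

Because the main theorem only uses the lemma statements, the two open lemmas can be attacked independently of each other.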

The Blueprint Strategy

The core of the system is the blueprint, which acts as a roadmap for the entire proof. The framework first generates a dependency graph of formally stated definitions and lemmas, along with their declared dependencies. Each open lemma node is then handed to a tool-equipped Lean prover, and the nodes are attempted in parallel, each using its relevant dependencies. If a lemma cannot be proven, the system does not simply restart. Instead, it uses the specific feedback from the failed attempt (such as a formal counterexample or a need for further decomposition) to refine the global blueprint. This allows the system to adjust its strategy holistically rather than getting trapped in dead-end recursive paths.
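The generate, prove-in-parallel, refine loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the framework's implementation: `Node`, `attempt_all`, `solve`, `toy_prove`, and `toy_refine` are hypothetical names, the prover and the refinement step are stand-in stubs, and the sketch assumes each lemma is attempted using only the statements of its dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """One blueprint node: a formally stated lemma plus its declared dependencies."""
    name: str
    statement: str                      # Lean statement, treated as an opaque string
    deps: list[str] = field(default_factory=list)
    proved: bool = False


def attempt_all(blueprint, prove):
    """Attempt every open node in parallel; return {name: feedback} for failures."""
    open_nodes = [n for n in blueprint.values() if not n.proved]

    def run(node):
        deps = [blueprint[d] for d in node.deps]
        return node.name, prove(node, deps)   # prove returns (ok, feedback)

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, open_nodes))

    failures = {}
    for name, (ok, feedback) in results:
        if ok:
            blueprint[name].proved = True
        else:
            failures[name] = feedback
    return failures


def solve(blueprint, prove, refine, max_rounds=3):
    """Alternate parallel proving and global refinement for up to max_rounds."""
    for _ in range(max_rounds):
        failures = attempt_all(blueprint, prove)
        if not failures:
            return True, blueprint
        blueprint = refine(blueprint, failures)   # failures drive the refinement
    return False, blueprint


def toy_prove(node, deps):
    # Stand-in for the Lean prover: a "hard" lemma fails unless it has helpers.
    if "hard" in node.statement and not deps:
        return False, "needs decomposition"
    return True, ""


def toy_refine(blueprint, failures):
    # Stand-in for refinement: add one helper lemma under each failed node.
    new = dict(blueprint)
    for name in failures:
        helper = Node(f"{name}_helper", "easy helper")
        old = blueprint[name]
        new[helper.name] = helper
        new[name] = Node(name, old.statement, old.deps + [helper.name])
    return new


blueprint = {
    "lemma_a": Node("lemma_a", "easy"),
    "lemma_b": Node("lemma_b", "hard"),
    "main": Node("main", "easy", deps=["lemma_a", "lemma_b"]),
}
solved, final = solve(blueprint, toy_prove, toy_refine)
```

In this toy run, `lemma_b` fails in the first round, the refinement step adds `lemma_b_helper` beneath it, and the second round closes the remaining open nodes. Nodes that were already proved are not retried, since the refinement here leaves their statements unchanged.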

Natural Language Guidance

While Goedel-Architect can operate independently, it also supports optional guidance from natural language proofs. When a problem is particularly difficult, the system can ingest an informal mathematical argument to help structure the initial blueprint. This does not mean the system is "cheating" by using informal logic; rather, it uses the natural language proof as a guide to organize the dependency graph, which the formal Lean prover then verifies and completes. This hybrid approach helps the system tackle complex competition-level mathematics more effectively.

Performance and Efficiency

Goedel-Architect achieves state-of-the-art results for an open-source pipeline, performing competitively with massive proprietary systems while being significantly more cost-effective. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as its backbone, it attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, it closes the remaining two MiniF2F-test problems (reaching 100%), lifts PutnamBench to 88.8% (597/672), and solves 4/6 problems on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. The abstract puts the pipeline's price point at up to 500 times lower than comparable open-source pipelines, which makes high-level formal theorem proving more accessible to researchers and enthusiasts.
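As a quick sanity check on the figures quoted in the abstract, the percentages can be recomputed from the counts. The PutnamBench count (597/672) is given in the source. The MiniF2F-test total of 244 problems is an assumption not stated on this page; it is consistent with the abstract's "remaining two" problems and the 99.2% figure.

```latex
\[
\frac{597}{672} \approx 0.888 \quad (88.8\%),
\qquad
\frac{244 - 2}{244} = \frac{242}{244} \approx 0.992 \quad (99.2\%)
\]
```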

Key Considerations

The framework's performance scales predictably with the amount of compute invested in the refinement loop. As the system iterates on the blueprint, the number of solved problems grows in a roughly log-linear fashion. By pairing an open-weight backbone model with open-source pipeline code, the researchers have created a transparent and reproducible tool that bridges the gap between the high-cost, closed-source frontier models and the lower-performing academic tools previously available.
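One way to read "roughly log-linear" is the following relation, offered as an interpretation rather than a fitted result from the paper. The symbols are assumptions introduced here: S(C) is the number of problems solved at compute budget C, and a and b are constants that would have to be estimated from data.

```latex
\[
S(C) \approx a + b \log C
\qquad\Longrightarrow\qquad
S(2C) - S(C) \approx b \log 2
\]
```

Under this reading, each doubling of compute adds about the same number of solved problems, so gains continue but at a diminishing rate per unit of compute.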
