AI Research

Verifiable Geometry Problem Solving: Solver-Driven... | AI Research

Key Takeaways

Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing Geometry problem solving requires a delicate balance between visua...
Geometry Problem Solving have increasingly adopt the neuro-symbolic paradigm, combining neural intuition with symbolic rigor.
To address these, we propose SD-GPS, a solver-driven framework that treats the symbolic solver as an execution oracle throughout both formalization and deduction.
Second, Verified Theorem Proposing introduces an impasse-aware agent that proposes local auxiliary lemmas from current proof states, ensuring soundness by filtering all proposals through symbolic verification.
Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing Geometry problem solving requires a delicate balance between visual intuition and logical rigor.

Paper AbstractExpand

Geometry Problem Solving have increasingly adopt the neuro-symbolic paradigm, combining neural intuition with symbolic rigor. However, current frameworks suffer from severe bottlenecks in two core stages: autoformalization, which treats multimodal translation as a static task decoupled from downstream solver compatibility, and theorem prediction, where solvers frequently hit a deductive impasse due to fixed rule libraries. To address these, we propose SD-GPS, a solver-driven framework that treats the symbolic solver as an execution oracle throughout both formalization and deduction. First, Solver-Driven Autoformalization unifies supervised formal-language adaptation and solvability-guided reinforcement learning into a single module built on QwenVL3-2B, making executability the central training signal. Second, Verified Theorem Proposing introduces an impasse-aware agent that proposes local auxiliary lemmas from current proof states, ensuring soundness by filtering all proposals through symbolic verification. Empirical evaluations on Geometry3K and PGPS9K demonstrate that SD-GPS consistently outperforms existing MLLM, neural, and neuro-symbolic methods across standard completion, multiple-choice, and cross-modal reference regimes, proving that closing the loop between multimodal perception and symbolic execution significantly improves geometric reasoning, offering profound insights into how neural agents can be grounded by formal systems to achieve verifiable problem-solving capabilities.

Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing
Geometry problem solving requires a delicate balance between visual intuition and logical rigor. While modern systems often combine neural networks to "see" diagrams with symbolic solvers to "reason" about them, they frequently struggle with two major issues: translating images into formal logic (autoformalization) and getting stuck when a problem requires a theorem not found in their pre-defined libraries. This paper introduces SD-GPS, a framework that treats the symbolic solver as an active partner throughout the entire reasoning process, ensuring that every step taken by the AI is grounded in verifiable mathematical execution.

Bridging the Gap Between Vision and Logic

Existing geometry solvers often treat the translation of a diagram and text into formal code as a static, one-way task. This leads to errors because the AI might generate a "correct" description that the solver cannot actually use. SD-GPS changes this by using a unified model built on QwenVL3-2B that reads diagrams and text together. Instead of just aiming for linguistic accuracy, the model is trained using "solvability-guided" feedback. If the solver cannot execute the formal code produced by the AI, the model receives a signal to improve, making the final output much more reliable for downstream reasoning.

Solving Deductive Impasses

Even with a perfect formal description, solvers often hit a "deductive impasse"—a point where they cannot reach the answer because they lack a specific auxiliary lemma or a rarely used geometric rule. SD-GPS introduces an "impasse-aware" agent that acts as a helper. When the solver gets stuck, this agent proposes potential auxiliary lemmas based on the current state of the proof. Crucially, the agent does not have the final say; every proposal is filtered through a symbolic verifier. If a proposal is mathematically unsound or cannot be applied to the current problem, it is rejected, ensuring that the system remains logically rigorous.

Proven Performance

The researchers tested SD-GPS on two major benchmarks, Geometry3K and PGPS9K, comparing it against a wide range of existing neural, MLLM, and neuro-symbolic methods. The results show that SD-GPS consistently outperforms these prior systems across standard completion and multiple-choice tasks. By closing the loop between multimodal perception and symbolic execution, the framework demonstrates that grounding neural agents in formal systems significantly boosts their ability to solve complex geometric problems accurately.

Comments (0)

No comments yet

Be the first to share your thoughts!