Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing
Geometry problem solving requires a delicate balance between visual intuition and logical rigor. While modern systems often combine neural networks to "see" diagrams with symbolic solvers to "reason" about them, they frequently struggle with two major issues: translating images into formal logic (autoformalization) and getting stuck when a problem requires a theorem not found in their pre-defined libraries. This paper introduces SD-GPS, a framework that treats the symbolic solver as an active partner throughout the entire reasoning process, ensuring that every step taken by the AI is grounded in verifiable mathematical execution.
Bridging the Gap Between Vision and Logic
Existing geometry solvers often treat the translation of a diagram and text into formal code as a static, one-way task. This leads to errors because the AI might generate a "correct" description that the solver cannot actually use. SD-GPS changes this by using a unified model built on QwenVL3-2B that reads diagrams and text together. Instead of just aiming for linguistic accuracy, the model is trained using "solvability-guided" feedback. If the solver cannot execute the formal code produced by the AI, the model receives a signal to improve, making the final output much more reliable for downstream reasoning.
Solving Deductive Impasses
Even with a perfect formal description, solvers often hit a "deductive impasse"—a point where they cannot reach the answer because they lack a specific auxiliary lemma or a rarely used geometric rule. SD-GPS introduces an "impasse-aware" agent that acts as a helper. When the solver gets stuck, this agent proposes potential auxiliary lemmas based on the current state of the proof. Crucially, the agent does not have the final say; every proposal is filtered through a symbolic verifier. If a proposal is mathematically unsound or cannot be applied to the current problem, it is rejected, ensuring that the system remains logically rigorous.
Proven Performance
The researchers tested SD-GPS on two major benchmarks, Geometry3K and PGPS9K, comparing it against a wide range of existing neural, MLLM, and neuro-symbolic methods. The results show that SD-GPS consistently outperforms these prior systems across standard completion and multiple-choice tasks. By closing the loop between multimodal perception and symbolic execution, the framework demonstrates that grounding neural agents in formal systems significantly boosts their ability to solve complex geometric problems accurately.
Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!