Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem
This paper explores the capabilities and limitations of AI-assisted theorem proving by attempting to formalize a challenging combinatorial problem from the 2009 International Mathematical Olympiad (IMO) using the Aristotle API and the Lean 4 proof assistant. The study highlights the distinction between AI-generated code that is fully verified by a computer and code that relies on unresolved placeholders, providing a clear example of how local proof search can succeed while global mathematical reasoning remains incomplete.
The Challenge of Formalizing Olympiad Mathematics
The Grasshopper problem is a well-known, difficult combinatorial puzzle. It asks whether, given a set of jump lengths and a set of "forbidden" landing positions, one can order the jumps so that a grasshopper never lands on a forbidden spot. While human mathematicians can approach this using complex combinatorial arguments, teaching an AI to construct a rigorous, machine-checked proof in Lean 4 is significantly more difficult. The author uses this problem as a test case to see how well the Aristotle API can bridge the gap between informal mathematical intuition and formal, verified code.
Verified Components vs. Unresolved Proofs
The study reveals that the AI was successful in formalizing the "local" parts of the proof. Specifically, the Aristotle API generated four verified helper lemmas that handle the mechanics of the problem:
Confirming that the final jump lands at the expected total sum.
Proving that swapping two adjacent jumps only affects the specific intermediate landing position between them.
Calculating the exact value of that intermediate position after a swap.
Using a "maximality" argument to show that if a certain jump order is optimal, then landing on a forbidden spot forces other related positions to also be forbidden.
Despite these successes, the main theorem remains unproven. In Lean, a proof can be "closed" using a command called sorry, which acts as a placeholder for an unfinished argument. The AI left the main theorem closed with this placeholder, meaning the computer has not actually verified the full solution.
The Global Reasoning Gap
The primary limitation identified in this case study is the gap between local calculations and global logic. To complete the proof, one would need to perform a "global counting argument"—a complex process of tracking how various forbidden values interact to show that they eventually contradict the problem's constraints.
The AI correctly identified that this counting step was the missing piece of the puzzle, but it was unable to translate that high-level strategy into the formal, step-by-step logic required by Lean. This illustrates a central challenge in modern AI-assisted mathematics: while AI can effectively handle small, isolated logical steps, it often struggles to synthesize these into the comprehensive, large-scale reasoning required to solve complex global problems.
Implications for Trustworthy AI
This research serves as a methodological warning for the field of AI-assisted formalization. It demonstrates that successful compilation of a Lean project does not automatically mean a theorem has been proven. Because the system allows for unresolved placeholders, researchers and users must carefully inspect the actual content of the generated code to distinguish between what has been rigorously verified and what remains an unproven assumption. The paper provides a transparent, reproducible example of this, showing that even when an AI generates a substantial amount of correct, verified code, the final result may still be incomplete.
Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!