Mistral AI Releases Leanstral 1.5 for Lean 4 Theorem Proving

Key Takeaways

  • Democratizes advanced formal verification by providing a high-performance, Apache 2.0 open-weights model for Lean 4.
  • Dramatically reduces the cost of automated theorem proving to ~$4 per problem, significantly undercutting existing specialized provers.
  • Enables practical software engineering applications, including automated bug detection in Rust and formal verification of complex algorithms.

Mistral AI has officially released Leanstral 1.5, a specialized code agent model engineered for the Lean 4 proof assistant. Designed to advance automated theorem proving and proof engineering, the model is available under an Apache 2.0 license. Leanstral 1.5 represents a significant evolution in the Mistral Small 4 family, offering a high-performance, open-weights solution for verifying logical steps in complex mathematical and software engineering tasks.

Architecture and Training Methodology

Leanstral 1.5 utilizes a mixture-of-experts (MoE) architecture, featuring a total of 119 billion parameters with 6.5 billion active parameters per token. The model employs 128 experts, routing each token to four specialized sub-networks to maintain high capacity while optimizing compute efficiency. It supports a context length of 256,000 tokens and accepts multimodal inputs, including both text and images.
The training process for Leanstral 1.5 occurred in three distinct stages: mid-training, supervised fine-tuning, and reinforcement learning using CISPO. The model’s agentic capabilities were refined through two primary environments. In the multiturn environment, the model submits proofs and iterates based on feedback from the Lean compiler. In the code agent environment, the model operates within a raw filesystem, utilizing the Lean language server to edit files, execute bash commands, and process real-time type information and error logs.

Benchmark Performance and Efficiency

The model demonstrates industry-leading performance, reportedly saturating the miniF2F benchmark with a 100% success rate on both validation and test sets. On the PutnamBench, Leanstral 1.5 successfully solved 587 out of 672 problems. Furthermore, it established new state-of-the-art results on the FATE-H and FATE-X algebra benchmarks, achieving 87% and 34% respectively. On FLTEval, the model achieved a pass@1 rate of 28.9 and a pass@8 rate of 43.2, surpassing the performance of Opus 4.6 at a fraction of the cost.
A defining characteristic of the model is its test-time scaling behavior. By increasing the token budget per attempt, the model shows a clear correlation with improved problem-solving success rates. This efficiency allows for complex theorem proving at approximately $4 per problem, providing a cost-effective alternative to other specialized provers that can cost significantly more per attempt.

Practical Applications in Software Verification

Beyond pure mathematics, Leanstral 1.5 has proven effective in verifying code and identifying vulnerabilities. In documented case studies, the model successfully proved the time complexity of an AVL tree implementation and identified 11 genuine bugs across 57 open-source repositories. Five of these bugs, including a critical overflow issue in the datrs/varinteger library, were previously unreported.
Developers can integrate Leanstral 1.5 into their workflows through the Mistral Vibe agent CLI or by self-hosting the model using vLLM. The model supports OpenAI-style tool calling, allowing for the execution of snippets and tighter integration with Lean language server protocols. These capabilities enable engineering teams to automate the generation of correctness properties, complete partial proofs, and stress-test Rust code by proving or disproving inferred invariants.

Comments (0)

No comments yet

Be the first to share your thoughts!