Mistral AI has officially released Leanstral 1.5, a specialized code agent model engineered for the Lean 4 proof assistant. Designed to advance automated theorem proving and proof engineering, the model is available under an Apache 2.0 license. Leanstral 1.5 represents a significant evolution in the Mistral Small 4 family, offering a high-performance, open-weights solution for verifying logical steps in complex mathematical and software engineering tasks.
Architecture and Training Methodology
Leanstral 1.5 utilizes a mixture-of-experts (MoE) architecture, featuring a total of 119 billion parameters with 6.5 billion active parameters per token. The model employs 128 experts, routing each token to four specialized sub-networks to maintain high capacity while optimizing compute efficiency. It supports a context length of 256,000 tokens and accepts multimodal inputs, including both text and images.
The training process for Leanstral 1.5 occurred in three distinct stages: mid-training, supervised fine-tuning, and reinforcement learning using CISPO. The model’s agentic capabilities were refined through two primary environments. In the multiturn environment, the model submits proofs and iterates based on feedback from the Lean compiler. In the code agent environment, the model operates within a raw filesystem, utilizing the Lean language server to edit files, execute bash commands, and process real-time type information and error logs.
Benchmark Performance and Efficiency
The model demonstrates industry-leading performance, reportedly saturating the miniF2F benchmark with a 100% success rate on both validation and test sets. On the PutnamBench, Leanstral 1.5 successfully solved 587 out of 672 problems. Furthermore, it established new state-of-the-art results on the FATE-H and FATE-X algebra benchmarks, achieving 87% and 34% respectively. On FLTEval, the model achieved a pass@1 rate of 28.9 and a pass@8 rate of 43.2, surpassing the performance of Opus 4.6 at a fraction of the cost.
A defining characteristic of the model is its test-time scaling behavior. By increasing the token budget per attempt, the model shows a clear correlation with improved problem-solving success rates. This efficiency allows for complex theorem proving at approximately $4 per problem, providing a cost-effective alternative to other specialized provers that can cost significantly more per attempt.
Practical Applications in Software Verification
Beyond pure mathematics, Leanstral 1.5 has proven effective in verifying code and identifying vulnerabilities. In documented case studies, the model successfully proved the time complexity of an AVL tree implementation and identified 11 genuine bugs across 57 open-source repositories. Five of these bugs, including a critical overflow issue in the datrs/varinteger library, were previously unreported.
Developers can integrate Leanstral 1.5 into their workflows through the Mistral Vibe agent CLI or by self-hosting the model using vLLM. The model supports OpenAI-style tool calling, allowing for the execution of snippets and tighter integration with Lean language server protocols. These capabilities enable engineering teams to automate the generation of correctness properties, complete partial proofs, and stress-test Rust code by proving or disproving inferred invariants.

Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!