ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
Translating natural language requirements into formal specifications is a major hurdle for engineers working with autonomous and cyber-physical systems. Signal Temporal Logic (STL) is the standard language for defining these requirements, but it is complex to write manually and difficult for standard AI models to generate accurately. ReasonSTL is a new framework designed to solve this by enabling local, open-source language models to reliably translate natural language into structured STL formulas without relying on expensive or privacy-risky third-party APIs.
A Structured Approach to Translation
Instead of asking a model to generate a complex formula in one go, ReasonSTL breaks the process into a series of manageable steps. The model acts as an orchestrator that performs explicit reasoning, calls deterministic tools for specific calculations, and then assembles the final formula. By using tools for tasks like unit conversion, arithmetic evaluation, and temporal normalization, the model avoids common errors related to numerical precision and physical units. This modular approach makes the entire translation process transparent and easier to verify.
Training with Process-Rewarded Learning
A key innovation in this framework is its training strategy, which supervises the model at every step of the process. Rather than only rewarding the final output, ReasonSTL uses "outcome-bounded process rewards." This means the model receives feedback on its intermediate reasoning and tool usage, but these rewards are strictly capped by the correctness of the final formula. This prevents the model from being rewarded for "plausible-sounding" but ultimately incorrect reasoning paths, ensuring that the model learns to prioritize accuracy throughout the entire construction of the STL formula.
The STL-Bench Benchmark
To evaluate performance, the researchers introduced STL-Bench, a comprehensive, bilingual (English and Chinese) dataset. It contains nearly 29,000 pairs of natural language requirements and their corresponding STL formulas across six engineering domains, such as aerospace, robotics, and automotive systems. Unlike previous datasets, STL-Bench is "computation-aware," meaning it specifically tests whether a model can handle the real-world complexities of CPS requirements, such as event-triggered conditions and nested temporal structures.
Performance and Practical Impact
Experiments show that a 4B parameter model trained with the ReasonSTL framework achieves state-of-the-art performance, outperforming existing methods in both automatic metrics and human evaluations. By providing a fully local, privacy-preserving, and low-cost alternative to black-box commercial APIs, ReasonSTL offers a scalable solution for industrial deployment. It allows engineers to draft formal specifications with greater confidence, knowing that the resulting formulas are both mathematically precise and grounded in the specific requirements of their systems.
Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!