Property-Guided LLM Program Synthesis for Planning

Property-Guided LLM Program Synthesis for Planning | AI Research

Key Takeaways

Property-Guided LLM Program Synthesis for Planning introduces a more efficient way to use Large Language Models (LLMs) for creating computer programs, specif...
LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions.
However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests.
Because a score offers no guidance on why a program failed, the system must generate and evaluate many candidates hoping some succeed, increasing LLM inference and evaluation costs.
We study a different approach: property-guided LLM program synthesis.

Paper AbstractExpand

LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests. Because a score offers no guidance on why a program failed, the system must generate and evaluate many candidates hoping some succeed, increasing LLM inference and evaluation costs. We study a different approach: property-guided LLM program synthesis. Instead of scoring programs after evaluation, we check whether a candidate satisfies a formally defined property. When the property is violated, we stop the evaluation early and provide the LLM with a concrete counterexample showing exactly how the program failed. This feedback drastically reduces both the number of program generations and the evaluation cost, and can guide the LLM to generate stronger programs. We evaluate this approach on PDDL planning domains, asking the LLM to synthesize direct heuristic functions: every state reachable by strictly improving transitions has a strictly improving successor. A heuristic with this property leads hill-climbing algorithm directly to a goal state. A counterexample-guided repair loop generates one candidate program, checks the property over a training set, and returns the first case that violates the property. We evaluate our approach on ten planning domains with an out-of-distribution test set. The synthesized heuristics are effectively direct on virtually all test tasks, and compared to the best prior generation method our approach generates seven times fewer programs per domain on average, solves more tasks without using search, and requires several orders of magnitude less computation to evaluate candidates. Whenever a problem admits a verifiable property, property-guided LLM synthesis can reduce cost and improve program quality.

Property-Guided LLM Program Synthesis for Planning introduces a more efficient way to use Large Language Models (LLMs) for creating computer programs, specifically focusing on "heuristic functions" used in automated planning. Traditionally, LLMs are asked to write code that is then tested against a simple numeric score, such as how many problems it solves. Because these scores don't explain why a program failed, developers often have to generate and test many versions of a program, which is slow and computationally expensive. This paper proposes a "property-guided" approach where the system checks if a program satisfies a specific, formally defined rule. If the program fails, the system provides the LLM with a concrete example of the error, allowing it to fix the code more effectively.

How the Approach Works

The researchers focus on "direct" heuristic functions, which are rules that guide a search algorithm (like hill climbing) directly toward a goal without getting stuck. The system uses a counterexample-driven repair loop: 1. The LLM generates a candidate heuristic in Python. 2. A validator checks this code against a set of training tasks to see if it follows the "direct" property. 3. If the code fails, the validator identifies the exact state where the logic broke down and provides this "counterexample" to the LLM. 4. The LLM uses this specific feedback to refine the code, repeating the process until the heuristic is verified as direct.

Why This Matters for Efficiency

This method significantly reduces the resources required for program synthesis. By stopping the evaluation as soon as a property is violated and providing targeted feedback, the system avoids wasting time on full evaluations of flawed programs. Compared to previous methods that generate a large, fixed number of candidates to find a working one, this approach generates roughly seven times fewer programs per domain. Furthermore, the computational cost of evaluating these candidates is reduced by several orders of magnitude, making the synthesis process much faster and more sustainable.

Results and Performance

The researchers tested this approach on ten different planning domains from the International Planning Competition. The synthesized heuristics proved to be highly effective, successfully guiding hill-climbing algorithms to solve tasks without needing complex combinatorial search. Importantly, the heuristics remained "direct" even when tested on larger, out-of-distribution tasks that were not part of the training set. This confirms that the property-guided approach not only saves time but also produces higher-quality, more generalizable programs.

Key Considerations

While this method is highly effective for tasks that admit a verifiable property, it relies on the ability to formally define what makes a program "correct." The researchers note that while the property-guided approach is powerful, it is specifically designed for scenarios where a failure can be traced back to a specific, actionable error. By shifting the focus from simple numeric ranking to formal verification, the study demonstrates that LLMs can be steered toward more reliable and efficient solutions in complex problem-solving domains.