Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers
This paper investigates how Large Language Models (LLMs) can best be used to solve complex combinatorial problems—tasks that require finding an optimal or valid configuration within a massive search space. While LLMs are powerful, they often struggle with the logical consistency required for these problems. The authors explore whether LLMs should write their own search algorithms or act as "translators" that formalize problems for specialized, verified solver software. By introducing a new benchmark called CP-SynC-XL, the study evaluates different ways to construct these solvers and warns against the common practice of asking LLMs to manually optimize the search process.
Three Ways to Build a Solver
The researchers tested three distinct paradigms for generating solvers:
Native Algorithmic Search: The LLM writes a standalone Python program to solve the problem from scratch, including all search logic.
Python + OR-Tools: The LLM writes a Python script that defines the problem's variables and constraints, then hands the actual search work to the robust OR-Tools CP-SAT solver.
MiniZinc + OR-Tools: The LLM writes a declarative model in MiniZinc, which is then processed by the same OR-Tools backend.
The study found a "representational divergence": even when using the same underlying solver (OR-Tools), the LLM’s performance depends heavily on the interface it uses. The Python + OR-Tools approach consistently achieved the highest correctness, suggesting that LLMs are more fluent in Python-based APIs than in specialized declarative modeling languages.
The Heuristic Trap
A key part of the research was testing whether prompting an LLM to "optimize" its search—by suggesting techniques like pruning, tighter bounds, or specific search strategies—actually improves performance. The results were disappointing. While these prompts occasionally offered minor speed gains, they frequently caused the model to fall into a "heuristic trap."
In this trap, the LLM attempts to be clever by injecting unverified bounds or replacing complete search methods with local approximations. These "optimizations" often backfire, leading to a sharp drop in correctness. The study shows that when an LLM is pushed to optimize, it often introduces errors that are difficult to fix, even with multiple rounds of refinement.
A Conservative Design Principle
The authors conclude that the most effective way to use LLMs for combinatorial problems is to keep them focused on formalization rather than optimization. Instead of asking an LLM to design complex search heuristics, developers should use the model to clearly define the variables, constraints, and objectives of the problem.
The core recommendation is to delegate the heavy lifting of the search to verified, specialized solver backends. If any search optimization is required, it should be handled by human-verified code rather than LLM-generated heuristics. By keeping the LLM’s role limited to formalizing the problem structure, developers can avoid the reliability issues that arise when models try to manage the search process themselves.
Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!