Back to AI Research

AI Research

Formalize, Don't Optimize: The Heuristic Trap i... | AI Research

Key Takeaways

  • Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers This paper investigates how Large Language Models (LLMs) can best be use...
  • Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers.
  • A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search.
  • We find a consistent representational divergence: Python + OR-Tools attains the highest correctness across LLMs, while MiniZinc + OR-Tools has lower absolute coverage despite using the same OR-Tools back-end.
  • Native Python is the most likely to return a schema-valid solution that fails verification, whereas solver-backed paths preserve higher conditional fidelity.
Paper AbstractExpand

Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We introduce CP-SynC-XL, a benchmark of 100 combinatorial problems (4,577 instances), and evaluate three solver-construction paradigms: native algorithmic search (Python), constraint modeling through a Python solver API (Python + OR-Tools), and declarative constraint modeling (MiniZinc + OR-Tools). We find a consistent representational divergence: Python + OR-Tools attains the highest correctness across LLMs, while MiniZinc + OR-Tools has lower absolute coverage despite using the same OR-Tools back-end. Native Python is the most likely to return a schema-valid solution that fails verification, whereas solver-backed paths preserve higher conditional fidelity. On the heuristic axis, prompting for search optimization yields only small median speed-ups (1.03-1.12x) and a strongly bimodal effect: many instances slow down, and correctness drops sharply on a long tail of problems. A paired code-level audit traces these regressions to a recurring heuristic trap. Under an efficiency-oriented prompt, the LLM may replace complete search with local approximations (Python), inject unverified bounds (Python + OR-Tools), or add redundant declarative machinery that overwhelms or over-constrains the model (MiniZinc + OR-Tools). These findings support a conservative design principle for LLM-generated combinatorial solvers: use the LLM primarily to formalize variables, constraints, and objectives for verified solvers, and separately check any LLM-authored search optimization before use.

Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

This paper investigates how Large Language Models (LLMs) can best be used to solve complex combinatorial problems—tasks that require finding an optimal or valid configuration within a massive search space. While LLMs are powerful, they often struggle with the logical consistency required for these problems. The authors explore whether LLMs should write their own search algorithms or act as "translators" that formalize problems for specialized, verified solver software. By introducing a new benchmark called CP-SynC-XL, the study evaluates different ways to construct these solvers and warns against the common practice of asking LLMs to manually optimize the search process.

Three Ways to Build a Solver

The researchers tested three distinct paradigms for generating solvers:

  • Native Algorithmic Search: The LLM writes a standalone Python program to solve the problem from scratch, including all search logic.

  • Python + OR-Tools: The LLM writes a Python script that defines the problem's variables and constraints, then hands the actual search work to the robust OR-Tools CP-SAT solver.

  • MiniZinc + OR-Tools: The LLM writes a declarative model in MiniZinc, which is then processed by the same OR-Tools backend.
    The study found a "representational divergence": even when using the same underlying solver (OR-Tools), the LLM’s performance depends heavily on the interface it uses. The Python + OR-Tools approach consistently achieved the highest correctness, suggesting that LLMs are more fluent in Python-based APIs than in specialized declarative modeling languages.

The Heuristic Trap

A key part of the research was testing whether prompting an LLM to "optimize" its search—by suggesting techniques like pruning, tighter bounds, or specific search strategies—actually improves performance. The results were disappointing. While these prompts occasionally offered minor speed gains, they frequently caused the model to fall into a "heuristic trap."
In this trap, the LLM attempts to be clever by injecting unverified bounds or replacing complete search methods with local approximations. These "optimizations" often backfire, leading to a sharp drop in correctness. The study shows that when an LLM is pushed to optimize, it often introduces errors that are difficult to fix, even with multiple rounds of refinement.

A Conservative Design Principle

The authors conclude that the most effective way to use LLMs for combinatorial problems is to keep them focused on formalization rather than optimization. Instead of asking an LLM to design complex search heuristics, developers should use the model to clearly define the variables, constraints, and objectives of the problem.
The core recommendation is to delegate the heavy lifting of the search to verified, specialized solver backends. If any search optimization is required, it should be handled by human-verified code rather than LLM-generated heuristics. By keeping the LLM’s role limited to formalizing the problem structure, developers can avoid the reliability issues that arise when models try to manage the search process themselves.

Comments (0)

No comments yet

Be the first to share your thoughts!