Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

Paper Abstract

Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective-equivalence signals such as differential testing and answer agreement, which a program can pass while adding spurious constraints or silently omitting required ones, whenever those constraints are non-binding on the tested instance. We propose constraint injection, which uses feasible probes to expose spurious over-constraint and one-constraint-violating probes to reveal silent constraint omission. Combined with differential testing, it forms a dual verifier. We instantiate and evaluate it on vehicle routing problems (VRPs), a representative constraint-dense combinatorial optimization testbed with coupled operational constraints. We develop VRPCoder, an 8B end-to-end model that translates natural-language VRP scenarios into Gurobi scripts, together with an expert-verified VRP benchmark suite covering 21 variants. The verifier is reused as a rejection-sampling filter during data synthesis and as a per-rollout reward in group relative policy optimization (GRPO). Across four VRP benchmarks, VRPCoder-GRPO reaches 93% average Pass@1, outperforms Gemini-3.1-Pro Preview on three benchmarks, exceeds Claude-Sonnet-4.5 by 28 average points, and surpasses prior OR-LLMs by 78 average points.

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems
Large language models (LLMs) are increasingly used to translate natural-language optimization problems into code for solvers such as Gurobi. Existing methods, however, often rely on "objective equivalence": checking whether the generated code produces the same optimal objective value as a reference solution. This paper identifies a critical flaw in that approach: a program can pass these tests while still being incorrect, either by adding unnecessary "spurious" constraints or by silently omitting required ones, provided the affected constraints are non-binding on the tested instance and so leave the optimal value unchanged. To close this gap, the authors introduce "constraint injection," which checks constraints individually instead of relying only on the final objective value.

The Problem with Current Verification

Current training pipelines for operations research LLMs typically use differential testing, which compares the output of a generated program against a known correct one. If both programs reach the same optimal objective value, the generated code is deemed correct. The authors argue this is a "blind spot." For example, if a model forgets to include a subtour-elimination constraint, the solver might still find the correct route for a specific instance because the optimal path naturally avoids subtours anyway. Because the result is correct, the model is rewarded for flawed code, which undermines the reliability of the system.
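The blind spot can be reproduced on a toy instance. The sketch below is illustrative and not taken from the paper: it brute-forces a tiny capacitated VRP in plain Python (no solver), and it uses a capacity constraint in place of the subtour-elimination example above because that is simpler to show. The `solve` function, the instance data, and the `enforce_capacity` flag are all hypothetical names introduced here.

```python
from itertools import permutations, product

def route_cost(route, dist):
    """Length of depot -> customers in order -> depot (the depot is node 0)."""
    path = [0, *route, 0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))

def solve(dist, demand, capacity, n_vehicles, enforce_capacity):
    """Brute-force a tiny CVRP and return the optimal total distance.

    enforce_capacity=False mimics a generated program that silently
    omits the capacity constraint.
    """
    customers = range(1, len(dist))
    best = float("inf")
    # Try every assignment of customers to vehicles.
    for assign in product(range(n_vehicles), repeat=len(dist) - 1):
        groups = [[c for c, v in zip(customers, assign) if v == k]
                  for k in range(n_vehicles)]
        if enforce_capacity and any(
            sum(demand[c] for c in g) > capacity for g in groups
        ):
            continue
        # Best visiting order for each vehicle's customers.
        total = sum(min(route_cost(p, dist) for p in permutations(g))
                    for g in groups)
        best = min(best, total)
    return best

# Depot at x=0 and three customers at x=1, 2, 3 on a line; unit demands.
xs = [0, 1, 2, 3]
dist = [[abs(a - b) for b in xs] for a in xs]
demand = {1: 1, 2: 1, 3: 1}

# Loose capacity: the constraint is non-binding, so both programs agree.
print(solve(dist, demand, 3, 2, True), solve(dist, demand, 3, 2, False))  # 6 6
# Tight capacity: the omission finally changes the objective.
print(solve(dist, demand, 2, 2, True), solve(dist, demand, 2, 2, False))  # 8 6
```

With capacity 3, one vehicle can serve all customers, so the faulty program matches the reference and passes differential testing. Only when capacity drops to 2 does the omission show up in the objective.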

How Constraint Injection Works

To fix this, the researchers developed a dual-verification method that tests the code's logic rather than just its final answer. They use two types of "probes":

  • Feasible Probes: These are solutions that satisfy every required constraint and should therefore be accepted by the generated program. If the program rejects one, it has added a "spurious" constraint that should not be there.

  • One-Constraint-Violating Probes: These are solutions that intentionally break exactly one specific rule (such as capacity or routing order). If the program accepts one as feasible, it has "silently omitted" that constraint.

Combined with standard differential testing, these probes check whether the generated code actually encodes the rules of the problem, rather than merely reaching the right objective value on the tested instance.
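The two probe types and the objective check can be combined roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say how probes are injected into a solver model, so each candidate program is abstracted here as a plain Python predicate that accepts or rejects a fixed solution. `dual_verify`, `Report`, the toy constraints, and the objective value 6.0 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Routes = List[List[int]]          # one customer list per vehicle, depot implicit
Check = Callable[[Routes], bool]  # candidate's verdict on a fixed solution

@dataclass
class Report:
    spurious: bool         # some feasible probe was rejected
    omitted: List[str]     # constraints whose violating probe was accepted
    objective_match: bool  # differential-testing result

    @property
    def passed(self) -> bool:
        return self.objective_match and not self.spurious and not self.omitted

def dual_verify(accepts: Check, feasible_probes: List[Routes],
                violating_probes: Dict[str, Routes],
                cand_obj: float, ref_obj: float, tol: float = 1e-6) -> Report:
    spurious = any(not accepts(p) for p in feasible_probes)
    omitted = [name for name, p in violating_probes.items() if accepts(p)]
    return Report(spurious, omitted, abs(cand_obj - ref_obj) <= tol)

# Toy rules (assumed): visit customers 1..3 exactly once, capacity 2 per route.
DEMAND = {1: 1, 2: 1, 3: 1}
CAPACITY = 2

def visits_all_once(routes: Routes) -> bool:
    return sorted(c for r in routes for c in r) == [1, 2, 3]

def within_capacity(routes: Routes) -> bool:
    return all(sum(DEMAND[c] for c in r) <= CAPACITY for r in routes)

def faithful(routes: Routes) -> bool:
    return visits_all_once(routes) and within_capacity(routes)

def omits_capacity(routes: Routes) -> bool:
    return visits_all_once(routes)

def over_constrained(routes: Routes) -> bool:
    # Spurious extra rule: at most one customer per route.
    return faithful(routes) and all(len(r) <= 1 for r in routes)

feasible_probes = [[[1], [2, 3]]]
violating_probes = {
    "capacity": [[1, 2, 3], []],  # breaks only the capacity rule
    "visit_once": [[1], [2]],     # breaks only the visit rule (3 is skipped)
}

# Objectives are assumed to agree, i.e. differential testing alone passes.
for cand in (faithful, omits_capacity, over_constrained):
    report = dual_verify(cand, feasible_probes, violating_probes, 6.0, 6.0)
    print(cand.__name__, report.passed, report.spurious, report.omitted)
```

Because each violating probe breaks exactly one rule, an accepted probe points directly at the missing constraint, which a bare objective mismatch cannot do.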

VRPCoder and Performance

The authors applied this method to vehicle routing problems (VRPs), a constraint-dense class of combinatorial optimization problems in which multiple coupled operational constraints (such as vehicle capacity, time windows, and depot rules) must all hold at once. They developed "VRPCoder," an 8B-parameter end-to-end model that translates natural-language VRP scenarios into Gurobi scripts. The dual verifier is used twice in training: as a rejection-sampling filter during data synthesis and as a per-rollout reward in group relative policy optimization (GRPO).

Across four VRP benchmarks, VRPCoder-GRPO reached a 93% average Pass@1. It outperformed Gemini-3.1-Pro Preview on three of the benchmarks, exceeded Claude-Sonnet-4.5 by 28 average points, and surpassed prior operations research-focused LLMs by 78 average points.
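The two uses of the verifier in training can be sketched as below. This is a sketch under assumptions, not the paper's code: it assumes a binary reward (1.0 if a rollout passes the dual verifier, 0.0 otherwise) and the commonly used GRPO advantage, which standardizes each reward against the mean and standard deviation of its own group. The reward shaping actually used for VRPCoder is not described in this summary, and `rejection_filter`, `group_relative_advantages`, and `eps` are names introduced here.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def rejection_filter(samples: Sequence[T],
                     passes_verifier: Callable[[T], bool]) -> List[T]:
    """Keep only synthesized samples that the verifier accepts."""
    return [s for s in samples if passes_verifier(s)]

def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Standardize per-rollout rewards within one prompt's rollout group.

    Uses the population standard deviation; implementations differ on
    this detail, so treat it as an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt; rollouts 0 and 3 pass the verifier (assumed).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))

# If every rollout gets the same reward, the group carries no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

Under this scheme, a rollout that matches the reference objective but fails a probe receives the same reward as any other failure, so it is pushed down relative to verified rollouts in its group.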

Key Takeaways

The research demonstrates that for complex, constraint-dense tasks, verifying the "what" (the final answer) is insufficient; one must also verify the "how" (the logic of the constraints). By shifting the focus from simple objective matching to constraint-level verification, the authors provide a more robust framework for building AI that can reliably handle real-world optimization challenges. The study also provides an expert-verified benchmark suite covering 21 different VRP variants to help future research in this area.
