Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems
Large language models (LLMs) are increasingly used to translate natural-language optimization problems into code for solvers like Gurobi. However, existing methods often rely on "objective equivalence": checking whether the generated code produces the same optimal objective value as a reference solution. This paper identifies a critical flaw in this approach: a program can pass these tests while still being incorrect, either by adding unnecessary "spurious" constraints or by silently omitting required ones, provided those errors don't change the optimal value on the tested instances. The authors introduce "constraint injection" to address this, creating a more rigorous verification process that checks whether each constraint is faithfully implemented.
The Problem with Current Verification
Current training pipelines for operations research LLMs typically use differential testing, which compares the output of a generated program against a known correct one. If both programs reach the same optimal objective value, the generated code is deemed correct. The authors argue this is a "blind spot." For example, if a model forgets to include a subtour-elimination constraint, the solver might still find the correct route for a specific instance because the optimal path naturally avoids subtours anyway. Because the result is correct, the model is rewarded for flawed code, which undermines the reliability of the system.
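The blind spot can be reproduced on a toy instance. The sketch below is not the authors' code; it brute-forces a four-node routing problem with and without a subtour check, and the distance matrices `dist_a` and `dist_b` and the helper names are hypothetical, chosen only for illustration.

```python
from itertools import permutations

def is_single_tour(succ):
    """Follow successors from node 0; a valid tour visits every node before returning."""
    steps, node = 0, 0
    while True:
        node = succ[node]
        steps += 1
        if node == 0:
            break
    return steps == len(succ)

def best_cost(dist, forbid_subtours):
    """Cheapest successor assignment, optionally requiring a single tour."""
    n = len(dist)
    best = None
    for succ in permutations(range(n)):
        if any(succ[i] == i for i in range(n)):
            continue  # no self-loops
        if forbid_subtours and not is_single_tour(succ):
            continue  # the subtour-elimination constraint
        cost = sum(dist[i][succ[i]] for i in range(n))
        if best is None or cost < best:
            best = cost
    return best

# Instance A: dropping the subtour constraint does not change the optimal value.
dist_a = [[0, 1, 5, 1],
          [1, 0, 1, 5],
          [5, 1, 0, 1],
          [1, 5, 1, 0]]

# Instance B: two cheap 2-node loops undercut every real tour.
dist_b = [[0, 1, 10, 10],
          [1, 0, 10, 10],
          [10, 10, 0, 1],
          [10, 10, 1, 0]]

print(best_cost(dist_a, True), best_cost(dist_a, False))  # 4 4
print(best_cost(dist_b, True), best_cost(dist_b, False))  # 22 4
```

On instance A the flawed formulation matches the reference objective, so a differential test on that instance alone would accept it. Only instance B exposes the missing constraint.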
How Constraint Injection Works
To fix this, the researchers developed a dual-verification method that tests the code's logic rather than just its final answer. They use two types of "probes":
Feasible Probes: These are solutions that should be accepted by the generated code. If the code rejects them, it indicates that a "spurious" constraint has been added that shouldn't be there.
One-Constraint-Violating Probes: These are solutions that intentionally break exactly one specific rule (like capacity or routing order). If the generated code accepts one of these as feasible, it indicates that the corresponding constraint has been "silently omitted."
By combining these probes with standard differential testing, the system can verify whether the generated code actually encodes the rules of the problem, rather than merely arriving at the right answer.
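The two probe types can be illustrated with a minimal sketch. The source does not describe how probes are injected into a solver model, so this example assumes the generated program can be reduced to a plain feasibility predicate over candidate routes. The demands, capacity, probe sets, and function names are all hypothetical.

```python
from typing import Callable, Dict, List

Routes = List[List[int]]  # each route lists customer ids; the depot is implicit

DEMAND = {1: 4, 2: 3, 3: 5}  # hypothetical customer demands
CAPACITY = 8                 # hypothetical vehicle capacity

def visits_each_once(routes: Routes) -> bool:
    return sorted(c for r in routes for c in r) == sorted(DEMAND)

def within_capacity(routes: Routes) -> bool:
    return all(sum(DEMAND[c] for c in r) <= CAPACITY for r in routes)

# Three stand-ins for generated programs.
def faithful(routes: Routes) -> bool:
    return visits_each_once(routes) and within_capacity(routes)

def omits_capacity(routes: Routes) -> bool:
    return visits_each_once(routes)  # capacity check silently missing

def adds_spurious(routes: Routes) -> bool:
    # Extra rule not in the problem statement: one customer per route.
    return faithful(routes) and all(len(r) <= 1 for r in routes)

# Probes: known-feasible solutions, and solutions breaking exactly one rule.
FEASIBLE_PROBES: List[Routes] = [[[1, 2], [3]], [[1], [2, 3]]]
VIOLATING_PROBES: Dict[str, Routes] = {
    "capacity": [[1, 3], [2]],  # load 9 > 8, every customer visited once
    "visit_once": [[1, 2]],     # customer 3 never visited, load within capacity
}

def run_probes(accepts: Callable[[Routes], bool]) -> dict:
    spurious = any(not accepts(p) for p in FEASIBLE_PROBES)
    omitted = [name for name, p in VIOLATING_PROBES.items() if accepts(p)]
    return {"spurious": spurious, "omitted": omitted}

print(run_probes(faithful))        # {'spurious': False, 'omitted': []}
print(run_probes(omits_capacity))  # {'spurious': False, 'omitted': ['capacity']}
print(run_probes(adds_spurious))   # {'spurious': True, 'omitted': []}
```

Because each violating probe breaks only one rule, an accepted probe points at the specific constraint that is missing, which an objective comparison alone cannot do.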
VRPCoder and Performance
The authors applied this method to the Vehicle Routing Problem (VRP), where a correct formulation must encode several interacting operational constraints at once, such as vehicle capacity, time windows, and depot rules. They developed "VRPCoder," an 8B-parameter model trained using this dual-verification process. Constraint injection serves both as a filter for training data and as a reward signal during reinforcement learning (GRPO), and the authors report significant improvements as a result.
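One way such a reward signal could be assembled is sketched below. The summary does not give the actual reward formula, so the all-or-nothing gate, the tolerance, and the function name `dual_verification_reward` are assumptions made for illustration only.

```python
import math
from typing import List, Optional

def dual_verification_reward(
    objective: Optional[float],   # optimal value from the generated program, None if it failed
    reference_objective: float,   # optimal value from the reference program
    spurious: bool,               # True if any feasible probe was rejected
    omitted: List[str],           # constraints whose violating probe was accepted
    tol: float = 1e-6,            # assumed numerical tolerance
) -> float:
    """Return 1.0 only when both the objective check and the probe checks pass."""
    objective_ok = objective is not None and math.isclose(
        objective, reference_objective, rel_tol=tol, abs_tol=tol
    )
    constraints_ok = not spurious and not omitted
    return 1.0 if objective_ok and constraints_ok else 0.0

print(dual_verification_reward(22.0, 22.0, False, []))            # 1.0
print(dual_verification_reward(22.0, 22.0, False, ["capacity"]))  # 0.0
```

The same predicate could serve as a data filter by keeping only training samples that score 1.0, though whether the authors use an identical rule for both purposes is not stated in the summary.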
Across four VRP benchmarks, VRPCoder-GRPO reached a 93% average Pass@1 rate. It outperformed several state-of-the-art models, including Gemini-3.1-Pro Preview and Claude-Sonnet-4.5, and surpassed previous operations research-focused LLMs by a wide margin.
Key Takeaways
The research demonstrates that for complex, constraint-dense tasks, verifying the "what" (the final answer) is insufficient; one must also verify the "how" (the logic of the constraints). By shifting the focus from simple objective matching to constraint-level verification, the authors provide a more robust framework for building AI that can reliably handle real-world optimization challenges. The study also provides an expert-verified benchmark suite covering 21 different VRP variants to help future research in this area.