VeriEvol is a framework designed to improve how AI models learn visual mathematical reasoning. As models are trained on increasingly large datasets, the quality of the "answer keys" used for training becomes a critical bottleneck. If these labels are incorrect or unreliable, the model learns to repeat those mistakes. VeriEvol addresses this by treating data construction as a two-part challenge: making questions more difficult through systematic evolution and ensuring answers are verified through a rigorous, independent falsification process before they are ever used to train a model.
Evolving Question Difficulty
To move beyond simple questions that models can answer using basic text knowledge, VeriEvol uses a "type-aware" evolution module. Instead of applying a generic "make this harder" command to every image, the system categorizes questions into specific families—such as geometry, charts, or OCR tasks. It then uses specialized operators to rewrite these seeds into more complex, image-grounded prompts. A strict filtering gate ensures that the new questions actually require the image to be solved, preventing the model from relying on text-based shortcuts.
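The routing-then-gating flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword router, the string-appending operators, and the `text_only_solvable` predicate are all hypothetical stand-ins for what would presumably be model-driven components in the real system.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical keyword router; the real classifier is not described in the summary.
FAMILY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "geometry": ("angle", "triangle", "circle", "area"),
    "chart": ("bar", "axis", "plot", "chart"),
    "ocr": ("text", "label", "sign", "read"),
}

# Hypothetical per-family operators; stand-ins for specialized rewrite prompts.
OPERATORS: Dict[str, Callable[[str], str]] = {
    "geometry": lambda q: q + " Use only the lengths marked in the figure.",
    "chart": lambda q: q + " Read the required values from the plotted data.",
    "ocr": lambda q: q + " Quote the exact string shown in the image.",
}


def classify(question: str) -> str:
    """Assign a question to the first family whose keywords it mentions."""
    q = question.lower()
    for family, keywords in FAMILY_KEYWORDS.items():
        if any(k in q for k in keywords):
            return family
    return "generic"


def evolve(question: str, text_only_solvable: Callable[[str], bool]) -> Optional[str]:
    """Rewrite a seed with its family operator, then apply the filtering gate.

    `text_only_solvable` is an assumed predicate that reports whether the
    question can be answered without looking at the image.
    """
    operator = OPERATORS.get(classify(question))
    if operator is None:
        return None  # no specialized operator for this family
    candidate = operator(question)
    if text_only_solvable(candidate):
        return None  # gate: the image is not actually needed, so discard
    return candidate
```

In this sketch a rejected seed simply yields `None`, so a caller can filter a batch with an ordinary comprehension and keep only the image-grounded rewrites.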
The HTV-Agent Verification Process
The core innovation of the framework is the HTV-Agent, a "hypothesis-test" verifier. Rather than trusting an initial answer, the system treats every generated answer as a hypothesis to be tested. Multiple independent solvers generate candidate answers, and a series of "refutation" channels then uses code-based logic and visual analysis (such as checking bounding boxes or pixel-level data) to actively look for reasons why an answer might be wrong. Only an answer that survives these refutation attempts and passes a final, deterministic consensus check is accepted into the training data.
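A rough sketch of this hypothesis-test loop is shown below. It rests on several assumptions that the summary does not confirm: solvers and refutation channels are modeled as plain callables, and the "deterministic consensus check" is approximated by a minimum vote count with alphabetical tie-breaking. The names `verify`, `Verdict`, and `min_agreement` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Solver = Callable[[str], str]
# A refuter returns a counter-evidence message, or None if it finds no objection.
Refuter = Callable[[str, str], Optional[str]]


@dataclass
class Verdict:
    accepted: bool
    answer: Optional[str]
    hypotheses: List[str]
    counter_evidence: List[str] = field(default_factory=list)


def verify(question: str, solvers: List[Solver], refuters: List[Refuter],
           min_agreement: int = 2) -> Verdict:
    """Collect hypotheses, try to refute each one, then run a consensus check."""
    hypotheses = [solve(question) for solve in solvers]
    evidence: List[str] = []
    survivors: List[str] = []
    for h in hypotheses:
        objections = [msg for refute in refuters
                      if (msg := refute(question, h)) is not None]
        if objections:
            evidence.extend(f"{h}: {msg}" for msg in objections)
        else:
            survivors.append(h)
    if not survivors:
        return Verdict(False, None, hypotheses, evidence)
    # Sorting first makes tie-breaking deterministic (assumed consensus rule).
    answer, votes = max(sorted(Counter(survivors).items()), key=lambda kv: kv[1])
    accepted = votes >= min_agreement
    return Verdict(accepted, answer if accepted else None, hypotheses, evidence)
```

Note the asymmetry in this design: a single objection from any channel is enough to discard a hypothesis, while acceptance additionally requires agreement among the surviving solvers. That keeps the loop biased toward rejecting doubtful labels rather than admitting them.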
Scaling and Performance
The researchers found that this approach scales effectively. By increasing the volume of verified data, they observed consistent improvements in model performance across five different visual-math benchmarks. When keeping the model architecture and training recipe constant, the VeriEvol approach provided a significant boost in accuracy compared to un-evolved baselines. This gain was attributed to both the higher quality of the evolved prompts and the reliability provided by the HTV-Agent verifier.
Transparency and Traceability
A key feature of VeriEvol is its commitment to transparency. The authors have released the full "verifier trace" for every sample, which includes the original solver hypotheses, the counter-evidence reports, and the final decision-making rationale. By providing this level of detail, the researchers aim to allow other developers to audit the construction process, understand why specific data points were included or rejected, and extend the pipeline for future research rather than simply using the final model outputs.
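To illustrate how such traces might be audited, the sketch below assumes each trace is stored as one JSON object per line with the three components named above. The actual release format is not specified in this summary, so the `VerifierTrace` class and every field name here are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class VerifierTrace:
    # All field names are assumptions, not the released schema.
    sample_id: str
    solver_hypotheses: List[str]
    counter_evidence: List[str]
    decision: str  # "accepted" or "rejected"
    rationale: str


def audit_rejections(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sample_id, rationale) for each rejected sample in JSON Lines text."""
    for line in lines:
        record = VerifierTrace(**json.loads(line))
        if record.decision == "rejected":
            yield record.sample_id, record.rationale
```

Passing an open file handle as `lines` would stream through a trace file one record at a time, which matters if traces are published for every sample in a large dataset.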