Large Reasoning Models (LRMs) are powerful, but they often struggle with the most challenging problems. While techniques like repeated sampling or tree search can help, they typically waste compute by applying the same heavy-duty strategy to every problem, regardless of its difficulty. This paper introduces a training-free framework that treats test-time scaling as a routing problem, dynamically selecting the most efficient strategy based on how much the model disagrees with itself.
Identifying Difficulty Through Disagreement
The core insight of this research is that "output disagreement"—how often a model produces different answers for the same problem—is a reliable indicator of both problem difficulty and the likelihood of an incorrect prediction. When a model consistently produces the same answer, the problem is likely easy, and no extra work is needed. When the model produces conflicting answers, it signals that the problem is ambiguous or difficult, requiring a more sophisticated approach.
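This disagreement signal can be made concrete as a simple statistic over sampled answers. The sketch below (an illustration, not the paper's exact formulation) measures how far the sampled answers deviate from the most common one:

```python
from collections import Counter

def disagreement_rate(answers):
    """Fraction of sampled answers that deviate from the modal answer.

    0.0 means the model is fully consistent (likely an easy problem);
    higher values flag harder, more error-prone instances.
    """
    _, top_count = Counter(answers).most_common(1)[0]
    return 1 - top_count / len(answers)
```

A run of identical answers yields a rate of 0.0 and can be accepted as-is, while a split pool signals that more compute is warranted.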
A Three-Stage Routing Strategy
Instead of blindly applying one method to every task, the framework routes instances through three stages based on the level of disagreement detected:
Disagreement Filter: The model draws two initial samples. If the answers match, the problem is treated as easy and the result is accepted immediately, saving compute.
Vote Resolve: If the two answers disagree, the model draws additional samples and uses majority voting to select the most reliable answer from the combined pool.
Rewrite & Rethink: For instances with severe, persistent disagreement, the model reformulates the problem statement. Changing the surface expression of the question while preserving its meaning often lets the model escape the incorrect reasoning path behind the initial confusion.
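The three stages above can be sketched as a single routing function. This is a minimal illustration under assumptions: `sample_answer` and `rewrite_problem` are hypothetical stand-ins for the model's sampling and question-reformulation calls, and the majority threshold is a placeholder, not a value from the paper:

```python
from collections import Counter

def route_and_solve(problem, sample_answer, rewrite_problem, k_extra=6):
    # Stage 1: Disagreement Filter — two initial samples.
    a1, a2 = sample_answer(problem), sample_answer(problem)
    if a1 == a2:
        return a1  # agreement: accept immediately, no extra compute

    # Stage 2: Vote Resolve — draw extra samples and take the majority
    # answer if it wins a clear share of the pool (threshold assumed).
    pool = [a1, a2] + [sample_answer(problem) for _ in range(k_extra)]
    answer, count = Counter(pool).most_common(1)[0]
    if count / len(pool) >= 0.5:
        return answer

    # Stage 3: Rewrite & Rethink — reformulate the question, sample again.
    return sample_answer(rewrite_problem(problem))
```

Because most instances exit at stage 1 or 2, the expensive reformulation path is only reached for the small slice of problems with persistent disagreement.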
Efficiency and Performance Gains
By intelligently routing problems, the framework avoids redundant computation on simple tasks and focuses resources where they are most needed. Experiments across seven mathematical benchmarks and three different models show that this approach improves accuracy by 3% to 7% compared to traditional methods. Notably, these gains are achieved while using fewer total samples, making the process both more accurate and more efficient.
Broader Applicability
The researchers also tested their framework on code generation tasks, where they measured disagreement based on whether different code snippets produced the same functional output. The results suggest that this strategy-routing approach is not limited to math: it also improves performance in other reasoning-heavy domains, indicating that a flexible, uncertainty-aware strategy is often more effective than a one-size-fits-all approach to test-time scaling.
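For code, exact string matching is too strict, so agreement is instead defined over behavior. The sketch below is one plausible reading of this idea, not the paper's implementation: two snippets "agree" when they produce identical outputs on a shared set of test inputs, and `run` is a hypothetical executor (real use would sandbox it):

```python
from itertools import combinations

def code_disagreement(snippets, test_inputs, run):
    """Fraction of snippet pairs whose outputs differ on the test inputs.

    `run(snippet, x)` is a hypothetical executor returning the snippet's
    output on input x. Each snippet's behavior is summarized as a tuple
    of outputs (a functional signature), then signatures are compared
    pairwise.
    """
    sigs = [tuple(run(s, x) for x in test_inputs) for s in snippets]
    pairs = list(combinations(sigs, 2))
    if not pairs:
        return 0.0  # fewer than two snippets: nothing to compare
    return sum(a != b for a, b in pairs) / len(pairs)
```

Syntactically different snippets that compute the same function (e.g. `2*x` versus `x+x`) count as agreeing, which is exactly the behavior-level signal the routing needs.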