Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization
This research addresses a practical problem in AI: how to tell whether a model is "confident" in its reasoning when only a text-only API is available. The standard approach is "self-consistency," which samples many answers and measures how often they agree. This is expensive, and it ignores the content of the reasoning itself. The authors propose a cheaper way to measure confidence by analyzing the "geometry" of the reasoning trace: they embed the text of the trace and track how it moves toward candidate answers in an embedding space, without needing access to the model's logits or hidden states.
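For reference, the self-consistency baseline described above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function name `self_consistency` and the assumption that sampled answers arrive as plain strings are hypothetical.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority answer, fraction of samples agreeing with it).

    `answers` holds the final answers extracted from independently
    sampled generations (assumed here to be plain strings).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Four samples, three of which agree on "A".
answer, agreement = self_consistency(["A", "B", "A", "A"])
```

The cost concern is visible in the signature: every entry in `answers` is a full generation, and the reasoning text that produced each answer is never inspected.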
Tracking Reasoning Geometry
Instead of just looking at the final answer, the researchers treat a chain-of-thought (CoT) as a series of sliding windows. They embed these windows into a high-dimensional space and measure their distance to known "answer anchors." By applying a one-parameter softmax model, they can calculate a continuous confidence score. This approach reveals that the model’s reasoning trajectory often points toward the correct answer before it explicitly states it. By focusing on the "penultimate" (second-to-last) window of the reasoning process, the method avoids the noise of the final sentence, where a model might simply repeat its chosen answer regardless of whether that answer is actually correct.
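The pipeline described above (window the trace, embed each window, compare against answer anchors, apply a softmax, read off the penultimate window) might look roughly like the sketch below. Several details are assumptions rather than facts from the paper: cosine similarity stands in for the distance measure, the single parameter is taken to be an inverse temperature `beta`, and window embeddings are assumed to be supplied by some external text-embedding model. All function names are hypothetical.

```python
import math


def sliding_windows(tokens, size, stride):
    """Split a token list into overlapping windows.

    A trace shorter than `size` yields one window; a tail that does not
    fill a whole stride is dropped.
    """
    if len(tokens) <= size:
        return [tokens[:]]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, stride)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def anchor_softmax(window_vec, anchors, beta):
    """Softmax over anchor similarities with one free parameter, beta."""
    scores = {label: beta * cosine(window_vec, vec) for label, vec in anchors.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def geometry_confidence(window_vecs, anchors, beta=5.0):
    """Score the penultimate window (or the only window if there is one)."""
    index = -2 if len(window_vecs) >= 2 else -1
    probs = anchor_softmax(window_vecs[index], anchors, beta)
    label = max(probs, key=probs.get)
    return label, probs[label]


# Toy 2-D embeddings: the penultimate window leans toward anchor "A",
# while the final window leans toward "B" and is ignored.
anchors = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
window_vecs = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
label, confidence = geometry_confidence(window_vecs, anchors)
```

In practice `beta` would be fitted on held-out data; the default of 5.0 here is arbitrary.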
A Three-Channel Approach
The researchers decompose confidence into three distinct signals:
Coverage (C): A judge-mediated prior that accounts for the inherent difficulty of the question.
Geometry (G): The spatial movement of the reasoning trace toward the correct answer.
Verbalization (V): The model's own stated confidence.
By combining these three channels, the team found they could achieve better performance than the traditional self-consistency method while using fewer samples. This fusion of signals proved robust across different benchmarks, such as MedQA and GPQA, and different models, including Gemini 3.1 Pro and Claude Sonnet 4.6.
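The summary does not say how the three channels are fused, so the sketch below uses one common choice as a stand-in: a logistic model over the log-odds of each channel. The function `fuse_channels`, its weights, and its bias are hypothetical and would need to be fitted on labeled data; this is not a description of the authors' fusion rule.

```python
import math


def fuse_channels(coverage, geometry, verbalization,
                  weights=(1.0, 1.0, 1.0), bias=0.0):
    """Combine three channel scores in [0, 1] into a single confidence.

    Each score is mapped to log-odds, weighted, summed, and passed through
    a sigmoid. The weights and bias are placeholders, not values from the paper.
    """
    def logit(p, eps=1e-6):
        p = min(max(p, eps), 1.0 - eps)  # keep the log finite at 0 and 1
        return math.log(p / (1.0 - p))

    channels = (coverage, geometry, verbalization)
    z = bias + sum(w * logit(p) for w, p in zip(weights, channels))
    return 1.0 / (1.0 + math.exp(-z))


# Three moderately confident channels reinforce one another.
fused = fuse_channels(coverage=0.9, geometry=0.8, verbalization=0.7)
```

Working in log-odds means that agreeing channels reinforce one another, and a channel reporting 0.5 contributes nothing, which is one reasonable way to treat an uninformative signal.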
Key Findings and Reliability
The study reports that the geometric signal is a strong, independent predictor of accuracy. In tests across 18 different settings, the Geometry and Coverage channels each contributed unique, reliable information in almost every case. A notable mechanistic finding is the "terminal flip": on difficult benchmarks like GPQA Diamond, the trajectory's alignment with the correct answer often inverts in the final window. This is consistent with the model committing to a specific (and sometimes incorrect) answer at the very end of its output, and it supports the researchers' decision to read the signal from the penultimate window rather than the final one.
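One way to make the "terminal flip" concrete is to check whether the per-window alignment changes sign between the penultimate and final windows. The definition below is a hypothetical operationalization, not the paper's: `margins[i]` is assumed to be the similarity of window i to the reference anchor minus its best similarity to any other anchor.

```python
def has_terminal_flip(margins):
    """Return True if alignment changes sign between the last two windows.

    margins[i] > 0 means window i sits closer to the reference anchor
    than to any competing anchor (an assumed definition).
    """
    if len(margins) < 2:
        return False
    return margins[-2] * margins[-1] < 0


# Alignment builds over the trace, then inverts in the final window.
flipped = has_terminal_flip([0.1, 0.3, -0.2])
```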
Practical Implications
This method offers a way to perform "selective classification," where a system can decide whether to trust its own reasoning or abstain from answering based on a calculated confidence score. Because this approach does not require access to a model's internal logits or hidden states, it is highly practical for developers working with commercial, text-only AI APIs. By shifting the focus from simple answer-voting to the geometric path of the reasoning itself, the authors provide a more nuanced and cost-effective tool for ensuring the reliability of AI-generated chains of thought.
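Selective classification as described above reduces to a threshold on the confidence score. The sketch below is illustrative only; the function names and the threshold of 0.8 are assumptions. Note that "coverage" in this snippet has its usual selective-classification meaning (the fraction of questions answered), which is distinct from the Coverage channel.

```python
def selective_predict(answer, confidence, threshold=0.8):
    """Return the answer if confidence clears the threshold, else abstain (None)."""
    return answer if confidence >= threshold else None


def coverage_and_accuracy(records, threshold):
    """Summarize a threshold on a labeled evaluation set.

    `records` is a list of (confidence, is_correct) pairs. Returns the
    fraction of questions answered and the accuracy on those answered
    (None if the system abstained on everything).
    """
    kept = [is_correct for confidence, is_correct in records if confidence >= threshold]
    coverage = len(kept) / len(records) if records else 0.0
    accuracy = sum(kept) / len(kept) if kept else None
    return coverage, accuracy


# Toy evaluation set: raising the threshold trades coverage for accuracy.
records = [(0.9, True), (0.85, True), (0.6, False), (0.4, True)]
strict = coverage_and_accuracy(records, threshold=0.8)
lenient = coverage_and_accuracy(records, threshold=0.5)
```

Sweeping the threshold over an evaluation set traces out the trade-off between how often the system answers and how often those answers are right.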