Back to AI Research

AI Research

Theoria: Rewrite-Acceptability Verification over In... | AI Research

Key Takeaways

  • Theoria is a verification architecture designed to determine when an AI system’s reasoning can be trusted.
  • When should an AI system's answer be trusted?
  • We present Theoria, a verification architecture that closes this gap.
  • The foundational invariant is completeness of change: every difference between consecutive proof states must be accounted for, so hidden premises surface as unlicensed mutations rather than passing silently.
  • On HLE-Verified Gold (185 text-only expert problems), Theoria certifies 105 at 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]).
Paper AbstractExpand

When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the fact and are subject to the same coherence issues as any LLM. We present Theoria, a verification architecture that closes this gap. A candidate solution is rewritten into a sequence of typed state transitions, each licensed by an explicit justification, whether that be a citation, computation, or problem-given fact, and every transition is independently auditable. The foundational invariant is completeness of change: every difference between consecutive proof states must be accounted for, so hidden premises surface as unlicensed mutations rather than passing silently. On HLE-Verified Gold (185 text-only expert problems), Theoria certifies 105 at 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]). Every certification produces a human readable proof trace in which each step can be independently challenged. Holistic LLM judges achieve comparable precision at matched coverage but fail on different problems (Jaccard 0.14-0.36), making the approaches complementary. On 95 adversarial poisoned proofs across 15 domains, structured judges catch 94.7% versus 83.2% for holistic judging (p= 0.0017). The overall 11.5 pp gap concentrates in hidden premises (90.6% vs. 62.5%, a 28 pp difference) and fabricated citations (100% vs. 90%), the error classes where the formal analysis predicts an advantage; performance is identical on arithmetic and theorem-misapplication errors, where no advantage is predicted. On GPQA Diamond (n= 65), certified precision is 97.1% (Wilson CI [85.1%, 99.5%]).

Theoria is a verification architecture designed to determine when an AI system’s reasoning can be trusted. While formal proof assistants provide high certainty, they struggle with the complexity of natural-language problems. Conversely, standard AI judges often provide opaque scores that are difficult to audit. Theoria bridges this gap by requiring AI systems to rewrite their solutions into a sequence of "state transitions," where every change is backed by an explicit, auditable justification. This ensures that the reasoning process is transparent and that any hidden assumptions are surfaced as visible errors.

How Theoria Works

The core of Theoria is the "completeness-of-change" invariant. Every step in a proof must account for all differences between the previous state and the new state. A step is only accepted if it is supported by one of three specific justification types: a citation (theorems or definitions), a computation (math or logic), or a problem-given fact. By forcing the AI to explicitly label its reasoning, the system prevents "silent" assumptions—such as importing an unstated premise—from passing through the verification process unnoticed.

The Verification Process

The system operates through a loop of solving, formalizing, and judging. Once a solver proposes an answer, a formalizer converts it into a structured witness. Specialized judges then audit these steps in parallel, acting as adversaries tasked with finding errors. If a judge rejects a step, a "pedantry filter" determines if the rejection is based on a substantive error or merely a disagreement over formatting. If the rejection is substantive, the system can either trigger a repair loop or decline to certify the answer entirely, ensuring that only verified, high-confidence outputs are provided.

Key Empirical Results

Theoria was tested on 185 expert-level problems, where it certified 105 solutions with a 91.4% precision rate. In adversarial tests involving "poisoned" proofs, Theoria’s structured approach significantly outperformed holistic LLM judges, particularly in catching hidden premises and fabricated citations. The research confirms that this architecture is highly effective at identifying logical gaps that traditional judges often miss, while performing similarly to other methods on straightforward arithmetic or theorem-application tasks.

Limitations and Boundaries

Theoria is not a universal solution for all reasoning errors. The authors note that the system cannot detect "errors of interpretation," such as a subtle misunderstanding of the problem statement that occurs before the reasoning steps begin. If a reasoning error does not produce a detectable change in the state or violate the rules of the transition, it remains invisible to the verifier. Consequently, Theoria defines a clear boundary for trust: it provides a rigorous audit of the derivation process, but it remains dependent on the initial alignment between the problem statement and the starting state.

Comments (0)

No comments yet

Be the first to share your thoughts!