Theoria: Rewrite-Acceptability Verification over In...

Theoria is a verification architecture designed to determine when an AI system’s reasoning can be trusted. Formal proof assistants provide high certainty but struggle with the complexity of natural-language problems, while standard AI judges often return opaque scores that are difficult to audit. Theoria bridges this gap by requiring AI systems to rewrite their solutions as a sequence of "state transitions," where every change is backed by an explicit, auditable justification. This makes the reasoning process transparent and is meant to surface hidden assumptions as visible errors.

How Theoria Works

The core of Theoria is the "completeness-of-change" invariant: every step in a proof must account for all differences between the previous state and the new state. A step is accepted only if each change is supported by one of three justification types: a citation (a theorem or definition), a computation (a mathematical or logical calculation), or a fact given in the problem. By forcing the AI to label its reasoning explicitly, the system prevents "silent" assumptions, such as an imported unstated premise, from passing through verification unnoticed.
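A minimal Python sketch of the completeness-of-change check, assuming a proof state can be modeled as a set of fact strings and that each justification lists the facts it accounts for. The names `Kind`, `Justification`, `Step`, `unjustified_changes`, and `accept` are hypothetical; the source does not describe Theoria's actual data structures.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The three justification types (labels are illustrative assumptions)."""
    CITATION = "citation"        # a theorem or definition
    COMPUTATION = "computation"  # a mathematical or logical calculation
    GIVEN = "given"              # a fact stated in the problem


@dataclass(frozen=True)
class Justification:
    kind: Kind
    covers: frozenset[str]  # the facts this justification accounts for


@dataclass(frozen=True)
class Step:
    before: frozenset[str]  # facts in the previous state
    after: frozenset[str]   # facts in the new state
    justifications: tuple[Justification, ...]


def unjustified_changes(step: Step) -> set[str]:
    """Return every fact added or removed by the step that no justification covers."""
    changed = set(step.before ^ step.after)  # symmetric difference = the full change
    covered: set[str] = set()
    for j in step.justifications:
        covered |= j.covers
    return changed - covered


def accept(step: Step) -> bool:
    """Completeness of change: accept only if nothing changed silently."""
    return not unjustified_changes(step)


if __name__ == "__main__":
    before = frozenset({"a*x = a"})
    after = frozenset({"a*x = a", "a != 0", "x = 1"})
    # Dividing both sides by a silently imports the premise a != 0.
    sneaky = Step(before, after,
                  (Justification(Kind.COMPUTATION, frozenset({"x = 1"})),))
    print(accept(sneaky))               # False
    print(unjustified_changes(sneaky))  # {'a != 0'}
```

Treating the change as a symmetric difference means that deleting a fact from the state needs a justification just as much as adding one; the step above would pass only if `a != 0` were also covered, for example by a `GIVEN` justification when the problem states it.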

The Verification Process

The system operates as a loop of solving, formalizing, and judging. Once a solver proposes an answer, a formalizer converts it into a structured witness. Specialized judges then audit the witness's steps in parallel, acting as adversaries tasked with finding errors. When a judge rejects a step, a "pedantry filter" decides whether the rejection reflects a substantive error or merely a disagreement over formatting. If the rejection is substantive, the system either triggers a repair loop or declines to certify the answer, so that only answers that survive the audit are certified.
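The loop can be sketched in Python as follows. Every component (`solve`, `formalize`, the judges, `is_substantive` standing in for the pedantry filter, and `repair`) is a hypothetical caller-supplied function, and the repair budget `max_repairs` is an assumption; the source does not say how many repair rounds Theoria allows or how its judges are implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    step_index: int
    accepted: bool
    reason: str = ""


Witness = list[str]  # hypothetical: one string per state transition
Judge = Callable[[Witness], list[Verdict]]


def verify(problem: str,
           solve: Callable[[str], str],
           formalize: Callable[[str, str], Witness],
           judges: list[Judge],
           is_substantive: Callable[[Verdict], bool],
           repair: Callable[[Witness, list[Verdict]], Witness],
           max_repairs: int = 2) -> Optional[Witness]:
    """Return a certified witness, or None to decline certification."""
    witness = formalize(problem, solve(problem))
    for attempt in range(max_repairs + 1):
        # Judges audit the same witness independently and in parallel.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda judge: judge(witness), judges))
        rejections = [v for verdicts in results for v in verdicts if not v.accepted]
        # Pedantry filter: keep only rejections that point at a real error.
        substantive = [v for v in rejections if is_substantive(v)]
        if not substantive:
            return witness  # certify
        if attempt < max_repairs:
            witness = repair(witness, substantive)
    return None  # decline rather than certify an unresolved error


if __name__ == "__main__":
    # Toy stand-ins: "??" marks an unjustified transition.
    def strict(w):
        return [Verdict(i, "??" not in s, "gap") for i, s in enumerate(w)]

    def picky(w):
        return [Verdict(0, False, "formatting")]  # always complains about style

    print(verify("2 + 2",
                 solve=lambda p: "4",
                 formalize=lambda p, a: ["2 + 2 ?? 4"],
                 judges=[strict, picky],
                 is_substantive=lambda v: not v.reason.startswith("formatting"),
                 repair=lambda w, errs: ["2 + 2 = 4 [computation]"]))
```

In the toy run, the `picky` judge's formatting complaint is filtered out on every round, while the `strict` judge's "gap" rejection triggers one repair, after which the repaired witness is certified. Returning `None` when the budget runs out models the choice to decline certification rather than pass an answer with an open error.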

Key Empirical Results

Theoria was tested on 185 expert-level problems and certified 105 solutions with 91.4% precision. In adversarial tests with "poisoned" proofs, Theoria’s structured approach significantly outperformed holistic LLM judges, particularly at catching hidden premises and fabricated citations. The results indicate that the architecture is most effective at identifying the logical gaps that traditional judges often miss, while it performs similarly to other methods on straightforward arithmetic or theorem-application tasks.
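The headline figures can be unpacked with a short calculation, assuming precision here means the fraction of certified solutions that are actually correct (the source does not spell out the definition, and "coverage" is a label introduced here for the certified fraction). Since 95/105 ≈ 0.905 and 97/105 ≈ 0.924, 96 is the only count that rounds to 91.4%:

```latex
\begin{aligned}
\text{precision} &= \frac{N_{\text{correct}}}{N_{\text{certified}}}
  = \frac{N_{\text{correct}}}{105} \approx 0.914
  \;\Rightarrow\; N_{\text{correct}} = 96 \quad \left(\tfrac{96}{105} \approx 0.9143\right) \\
\text{coverage} &= \frac{N_{\text{certified}}}{N_{\text{problems}}}
  = \frac{105}{185} \approx 0.568
\end{aligned}
```

On that reading, roughly 9 of the 105 certificates were wrong, and 80 of the 185 problems (about 43%) were left uncertified.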

Limitations and Boundaries

Theoria is not a solution for every reasoning error. The authors note that the system cannot detect "errors of interpretation," such as a subtle misunderstanding of the problem statement that occurs before the reasoning steps begin. If a reasoning error neither produces a detectable change in the state nor violates the transition rules, it remains invisible to the verifier. Theoria therefore draws a clear boundary for trust: it rigorously audits the derivation, but it depends on the initial alignment between the problem statement and the starting state.
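The boundary can be illustrated with a deliberately simplified Python sketch, assuming an audit that validates each step only against the start state produced by formalization. The `audit` function and its `(fact, kind)` step format are hypothetical; the sketch checks only the "given" label and is not how Theoria's judges are implemented.

```python
def audit(start_state: frozenset[str], steps: list[tuple[str, str]]) -> bool:
    """Hypothetical audit of a derivation relative to its start state.

    Each step is (fact, kind). A "given" fact must already be in the start
    state; "computation" and "citation" steps are assumed to be checked by a
    judge elsewhere. The original problem text is never consulted.
    """
    allowed = {"given", "computation", "citation"}
    for fact, kind in steps:
        if kind not in allowed:
            return False  # unlabeled or unknown justification
        if kind == "given" and fact not in start_state:
            return False  # claims to be given but is not in the start state
    return True


if __name__ == "__main__":
    # Problem: "Let x be a positive integer with x < 3."
    # The formalizer misreads "positive" as "non-negative" before any step runs.
    misread = frozenset({"x >= 0", "x < 3"})
    faithful = frozenset({"x > 0", "x < 3"})
    steps = [("x >= 0", "given"), ("x in {0, 1, 2}", "computation")]
    print(audit(misread, steps))   # True: the misreading is invisible
    print(audit(faithful, steps))  # False: "x >= 0" is not actually given
```

Because the start state, not the problem text, is the reference point, a misreading that happens before the first step yields a derivation that passes every check; the same steps are rejected only when the start state faithfully reflects the problem.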
