"Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain" argues that the frequent failures of AI in legal settings, such as fabricated citations or outdated information, are not just technical glitches to be fixed by better training. Instead, they result from a fundamental mismatch between how modern AI retrieves information and the rigid, rule-based nature of the legal system. The paper proposes that to make legal AI reliable, we must move away from systems that rely on "probabilistic similarity" and toward architectures that respect the formal, institutional structure of the law.
The Mismatch Between AI and Legal Logic
Current AI systems, particularly those using Retrieval-Augmented Generation (RAG), are designed to find information that "looks" similar to a user's query. While this works for general internet searches, it fails in law because legal validity is not a matter of semantic similarity. A legal rule is valid only if it was created through a specific, authorized process and remains in force at a particular time. Because standard AI treats legal texts like any other data, it often ignores the hierarchical structure of laws and the specific dates when rules were enacted or repealed, leading to "hallucinations" that are technically plausible but legally incorrect.
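The gap between similarity and validity can be illustrated with a minimal sketch. It is not taken from the paper: the `Provision` type, its field names, the identifiers, and the similarity scores are all hypothetical, and the repeal date is assumed to be the first day on which a provision is no longer in force.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Provision:
    ident: str                    # hypothetical identifier
    in_force_from: date           # first day the version is in force
    repealed_on: Optional[date]   # first day it is no longer in force; None if still valid
    similarity: float             # stand-in for a retriever's similarity score


def in_force(p: Provision, on: date) -> bool:
    """True if the provision version is legally in force on the given date."""
    return p.in_force_from <= on and (p.repealed_on is None or on < p.repealed_on)


def top_by_similarity(candidates: List[Provision]) -> Provision:
    """What a purely similarity-driven retriever would return."""
    return max(candidates, key=lambda p: p.similarity)


def top_valid(candidates: List[Provision], on: date) -> Optional[Provision]:
    """Filter by validity first, then rank the survivors by similarity."""
    valid = [p for p in candidates if in_force(p, on)]
    return max(valid, key=lambda p: p.similarity) if valid else None


candidates = [
    Provision("art-5-v1", date(2010, 1, 1), date(2020, 1, 1), similarity=0.93),
    Provision("art-5-v2", date(2020, 1, 1), None, similarity=0.88),
]

print(top_by_similarity(candidates).ident)
print(top_valid(candidates, date(2023, 6, 1)).ident)
```

In this toy data the repealed version scores higher on similarity, so ranking alone would surface it. Applying the validity check before ranking returns the version that is actually in force on the query date.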
Three Pathologies of Legal Retrieval
The paper identifies three specific "pathologies" that cause current AI systems to fail in the legal domain:
Mereological Blindness: The failure to understand the "part-whole" structure of law. Legal provisions often depend on their position within a larger document (like a specific chapter or article). When AI retrieves a fragment of text in isolation, it loses the context necessary to understand what the law actually means.
Diachronic Blindness: The failure to track how laws change over time. Legal systems are dynamic, with rules being amended, suspended, or reinterpreted by courts. AI often struggles to reconstruct the exact state of a law on a specific past date, leading to the use of outdated or "anachronistic" information.
Causal Opacity: The failure to provide the "institutional provenance" of a rule. In law, it is not enough to know what a rule says; one must be able to trace the official chain of acts that makes that rule valid. Current systems rarely explain the authority behind the information they provide.
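As a rough illustration of the first and third pathologies, the sketch below keeps each provision attached to its parent unit and to the acts that enacted or amended it, so a retrieved fragment can always be returned with its structural path and its provenance. The `Node` type, the `enacted_by` field, and the act names are hypothetical and are not drawn from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                           # e.g. "Chapter II", "Article 5"
    parent: Optional["Node"] = None                      # enclosing structural unit
    enacted_by: List[str] = field(default_factory=list)  # hypothetical act references


def ancestors_and_self(node: Node) -> List[Node]:
    """Return the chain from the root document down to the given node."""
    chain = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    return list(reversed(chain))


def structural_path(node: Node) -> List[str]:
    """Part-whole context: where the fragment sits in the document."""
    return [n.label for n in ancestors_and_self(node)]


def provenance(node: Node) -> List[str]:
    """Acts attached to the node and its ancestors, from the root downward."""
    acts: List[str] = []
    for n in ancestors_and_self(node):
        acts.extend(n.enacted_by)
    return acts


statute = Node("Example Act", enacted_by=["Act No. 1 (hypothetical)"])
chapter = Node("Chapter II", parent=statute)
article = Node("Article 5", parent=chapter, enacted_by=["Amending Act No. 7 (hypothetical)"])

print(" > ".join(structural_path(article)))
print(provenance(article))
```

A retriever that stores only flat text chunks has no equivalent of `parent` or `enacted_by`, which is one way to picture what is lost when a fragment is retrieved in isolation.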
A New Architectural Direction
To solve these problems, the author proposes a "deterministic-by-design" approach. This does not mean abandoning AI, but rather changing how it interacts with legal data. The author suggests four core commitments for future legal AI systems:
1. Ontological Primacy: Prioritizing the formal structure of the legal system over statistical patterns.
2. Event Reification: Treating legal changes (like amendments or court rulings) as distinct, trackable events.
3. Bitemporal Correctness: Independently tracking when a law was recorded and when it actually became legally effective.
4. Deterministic Interaction Protocols: Ensuring that critical tasks, such as finding the current version of a law or tracing its history, are handled by explicit, auditable mechanisms rather than by the unpredictable "guessing" of a language model.
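One way to picture how reified events, bitemporal tracking, and a deterministic lookup fit together is the sketch below. It is an illustration under stated assumptions, not the author's design: the `LegalEvent` type, its field names, the event kinds, and the sample data are hypothetical. Each event carries two independent dates, the date it takes legal effect and the date it was recorded, and the lookup is an ordinary function whose result can be audited.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class LegalEvent:
    provision: str
    kind: str              # "enact", "amend", or "repeal" (hypothetical vocabulary)
    text: Optional[str]    # wording after the event; None for a repeal
    effective_on: date     # when the change takes legal effect
    recorded_on: date      # when the change was entered into the system
    source_act: str        # hypothetical reference to the authorizing act


def state_as_of(events: List[LegalEvent], provision: str,
                valid_on: date, known_on: date) -> Optional[LegalEvent]:
    """Return the event that fixes the provision's wording on `valid_on`,
    using only what had been recorded by `known_on`. None if no wording applies."""
    visible = [
        e for e in events
        if e.provision == provision
        and e.effective_on <= valid_on
        and e.recorded_on <= known_on
    ]
    if not visible:
        return None
    latest = max(visible, key=lambda e: (e.effective_on, e.recorded_on))
    return None if latest.kind == "repeal" else latest


events = [
    LegalEvent("art-5", "enact", "original wording",
               date(2010, 1, 1), date(2009, 12, 1), "Act A (hypothetical)"),
    LegalEvent("art-5", "amend", "amended wording",
               date(2020, 1, 1), date(2019, 11, 15), "Act B (hypothetical)"),
    # A retroactive amendment: recorded in 2021, effective from mid-2020.
    LegalEvent("art-5", "amend", "retroactively amended wording",
               date(2020, 7, 1), date(2021, 3, 1), "Act C (hypothetical)"),
]

before = state_as_of(events, "art-5", date(2020, 8, 1), known_on=date(2020, 12, 31))
after = state_as_of(events, "art-5", date(2020, 8, 1), known_on=date(2021, 6, 1))
print(before.text, "|", before.source_act)
print(after.text, "|", after.source_act)
```

The two queries ask about the same legal date but differ in what the system knew at the time, which is the distinction a single timestamp cannot express. Because the function returns the event itself, the answer also carries the act that produced it.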
Scope and Limitations
This framework is specifically designed to address the quaestio juris, that is, the question of which legal norms apply and in what state. It does not attempt to solve every problem in legal AI, such as judicial discretion or the evaluation of factual evidence in a trial. The primary focus is on legislative and constitutional retrieval, providing a foundation that ensures the AI is working with the correct, valid, and properly sourced legal rules before any further reasoning or analysis takes place.