From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

Key Takeaways

  • Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later.
  • These operations require memory to behave less like search and more like a system of record.
  • This paper argues that reliable external AI memory must be schema-grounded.
  • Schemas define what must be remembered, what may be ignored, and which values must never be inferred.
Paper Abstract

Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns. These operations require memory to behave less like search and more like a system of record. This paper argues that reliable external AI memory must be schema-grounded. Schemas define what must be remembered, what may be ignored, and which values must never be inferred. We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control. The result shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose. We evaluate this design on structured extraction and end-to-end memory benchmarks. On the extraction benchmark, the judge-in-the-loop configuration reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines. On our end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines. On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses. The results show that, for memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.

From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
This paper addresses a fundamental limitation in current AI memory systems, which typically rely on storing interactions as unstructured text and retrieving them via semantic search. While this approach works for broad thematic recall, it fails when agents need to perform precise tasks like tracking state, managing updates, or answering factual questions. The authors argue that for production-grade AI, memory must function like a system of record rather than a search engine. They propose "schema-grounded memory," an architecture that shifts the burden of interpretation from the read path to the write path, ensuring that stored information is verified, structured, and reliable.

The Problem with Unstructured Memory

Most existing AI memory systems use embeddings to find "similar" text. The authors point out that this is a heuristic, not a guarantee of truth. Because these systems do not distinguish between facts and narrative, they are prone to "substitution failures," where the model guesses a plausible-looking answer when the actual fact is missing or ambiguous. Furthermore, semantic search cannot natively handle computational tasks like aggregation, negation (e.g., "what was never discussed"), or tracking state changes over time. When memory is treated as a collection of unstructured text, the model must re-interpret the data every time it is queried, which leads to compounding errors and long-term drift.
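To make the contrast concrete, here is a minimal sketch of the operations that similarity search cannot answer directly but a store of structured records can. The record type, its field names, and the helper functions are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record type; the field names are illustrative assumptions.
@dataclass
class Subscription:
    customer: str
    plan: str
    monthly_fee: Optional[float]  # None means "explicitly unknown", never a guess
    active: bool


records = [
    Subscription("alice", "pro", 30.0, True),
    Subscription("bob", "basic", 10.0, True),
    Subscription("carol", "pro", None, True),    # fee was never stated
    Subscription("dave", "basic", 10.0, False),  # cancelled: a state update
]


def total_known_fees(rows):
    """Aggregation: sum the fees of active subscriptions whose fee is known."""
    return sum(r.monthly_fee for r in rows if r.active and r.monthly_fee is not None)


def customers_with_unknown_fee(rows):
    """Explicit unknowns: report missing values instead of inferring them."""
    return [r.customer for r in rows if r.monthly_fee is None]


def plans_never_used(rows, catalog):
    """Negative query: which catalog plans appear in no record at all?"""
    used = {r.plan for r in rows}
    return sorted(set(catalog) - used)


print(total_known_fees(records))            # 40.0
print(customers_with_unknown_fee(records))  # ['carol']
print(plans_never_used(records, ["basic", "pro", "enterprise"]))  # ['enterprise']
```

Each answer is computed over the whole record set, which is why it cannot be recovered reliably from a handful of "most similar" text chunks.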

A Schema-Grounded Approach

To solve these issues, the authors introduce a system called xmemory. Instead of storing raw text, the system uses a schema—a formal contract—that defines exactly what information must be captured, what can be ignored, and which values must never be inferred. The ingestion process is broken down into an iterative write path:

  • Object and Field Detection: The system identifies relevant entities and their associated attributes.

  • Extraction and Validation: The system extracts specific values and runs them through validation gates to ensure they meet the schema’s requirements.

  • Stateful Control: By using local retries and structured prompts, the system ensures that if a piece of information is missing or unclear, it is flagged rather than guessed.

This architecture ensures that when an agent queries the memory, it performs a constrained, deterministic lookup over verified records rather than attempting to infer facts from retrieved prose.
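The extraction and validation steps described above can be sketched as follows. This is only an illustration under stated assumptions: it covers field-value extraction with a validation gate and per-field retries, and it omits object and field detection. The schema, `FieldSpec`, `write_record`, and the regex-based `regex_extractor` (a stand-in for a model call, with the attempt number as a rough proxy for stateful prompt control) are all hypothetical and are not xmemory's actual interface.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FieldSpec:
    name: str
    required: bool                   # must be captured for the record to be complete
    validate: Callable[[str], bool]  # validation gate for a candidate value


# Hypothetical schema for one object type; anything not listed here is ignored.
ORDER_SCHEMA = [
    FieldSpec("order_id", True, lambda v: re.fullmatch(r"[A-Z]\d{4}", v) is not None),
    FieldSpec("quantity", True, lambda v: v.isdigit() and int(v) > 0),
    FieldSpec("delivery_date", False,
              lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None),
]

UNKNOWN = None  # explicit unknown: stored as-is, never replaced with a guess


def regex_extractor(field, text, attempt):
    """Stand-in for a model call; later attempts use a stricter pattern."""
    patterns = {
        "order_id": [r"order (\w+)", r"order ([A-Z]\d{4})"],
        "quantity": [r"(\d+) units"],
        "delivery_date": [r"deliver(?:y|ed)? on (\S+)"],
    }
    options = patterns.get(field, [])
    if not options:
        return None
    match = re.search(options[min(attempt, len(options) - 1)], text)
    return match.group(1) if match else None


def extract_field(spec, text, extractor, max_retries=2):
    """Local retry: re-run extraction for this field only until it passes its gate."""
    for attempt in range(max_retries + 1):
        candidate = extractor(spec.name, text, attempt)
        if candidate is not None and spec.validate(candidate):
            return candidate
    return UNKNOWN  # flag as unknown rather than keep an unverified value


def write_record(text, schema, extractor):
    """Return the verified record plus the names of required fields still missing."""
    record = {spec.name: extract_field(spec, text, extractor) for spec in schema}
    missing = [s.name for s in schema if s.required and record[s.name] is UNKNOWN]
    return record, missing


text = "Regarding order ref, I mean order A1234: ship 12 units, deliver on Friday."
record, missing = write_record(text, ORDER_SCHEMA, regex_extractor)
print(record)   # {'order_id': 'A1234', 'quantity': '12', 'delivery_date': None}
print(missing)  # []
```

In this run the first `order_id` candidate ("ref") fails its gate and is corrected on retry, while "Friday." never validates as a date, so `delivery_date` is stored as unknown instead of being guessed.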

Performance and Reliability

The authors evaluated their approach on structured extraction and end-to-end memory benchmarks. On the extraction benchmark, the judge-in-the-loop configuration reached 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines. On their end-to-end memory benchmark, xmemory achieved 97.10% F1, compared to 80.16%–87.24% for third-party baselines. On the application-level task, the system reached 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses. These findings suggest that for workloads requiring stable facts and stateful computation, the underlying architecture, specifically the move toward structured, schema-governed records, is more critical than simply increasing the scale of retrieval or the strength of the model.
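For readers unfamiliar with the F1 metric quoted above, the sketch below computes it as the harmonic mean of precision and recall over sets of records. Scoring by exact set membership is an assumption made for illustration; this summary does not describe the paper's actual scoring protocol, and the example triples are invented.

```python
def f1_score(expected: set, predicted: set) -> float:
    """Harmonic mean of precision and recall over exact-match items."""
    if not expected and not predicted:
        return 1.0  # nothing to find and nothing returned
    true_positives = len(expected & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of returned items that are correct
    recall = true_positives / len(expected)      # share of correct items that were returned
    return 2 * precision * recall / (precision + recall)


# Invented example: two of three expected facts recovered, plus one spurious fact.
expected = {("alice", "plan", "pro"), ("bob", "plan", "basic"), ("carol", "plan", "pro")}
predicted = {("alice", "plan", "pro"), ("bob", "plan", "basic"), ("dave", "plan", "basic")}
score = f1_score(expected, predicted)
```

Here precision and recall are both 2/3, so the F1 score is also 2/3.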

Limitations and Considerations

The authors acknowledge that schema-grounded memory is not a universal solution. It is designed specifically for scenarios where correctness, exactness, and state management are paramount. For exploratory search or broad thematic recall, traditional unstructured retrieval remains useful. Additionally, the system requires a well-defined schema, which introduces more complexity during the initial write phase compared to simple text storage. The authors emphasize that this is a deliberate architectural trade-off: by increasing the complexity of the write path, they gain the stability, inspectability, and reliability required for production-level AI agents.
