Paper Abstract

Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval index. When facts are written into the model instead, the standard recipe is the per-user LoRA adapter, which does the opposite of the brain, folding content and skill into one global weight delta. Writing a user's facts as a LoRA contaminates text unrelated to them; writing the same facts as local Engram rows leaves that text mathematically untouched, with a roughly 33,000x smaller memory footprint. We therefore propose User as Engram: store a user's content as surgical edits to the hash-keyed memory table of an Engram model, and carry the reasoning skill in one shared adapter. This layered design matches per-user LoRA's direct recall while delivering 5.6x higher indirect-reasoning accuracy on average, and never makes a single user worse at reasoning than the untouched base. The edit is a glass box: writing a fact switches on its lookup at exactly the trigger, adds the value the answer needs, leaves every other position unchanged to the last bit, and fails if written into the wrong layer. Because different users' facts land in disjoint hash slots, their edits compose: many users live in one shared table at once, stacking additively and losslessly, whereas a per-user LoRA, being a single global weight delta, admits only one. Compared with retrieval, a per-user Engram table does not grow with the population the retriever must search, so past ~100 facts it overtakes a retrieval pipeline running on a 2.5x larger model.

User as Engram: Internalizing Per-User Memory as Local Parametric Edits
This paper addresses the challenge of personalizing language models for millions of users. Current methods typically fall into two categories: storing facts outside the model (in a memory file or a retrieval index) or training a per-user LoRA adapter. The paper argues that the first approach incurs high context costs, while the second "contaminates" the model: it folds personal facts and reasoning skill into one global weight delta that alters even text unrelated to the user and can degrade reasoning. The proposed solution, "User as Engram," splits memory into two layers: a shared adapter that carries reasoning skill, and per-user edits to a hash-keyed memory table that store individual facts. This design mirrors the brain's division of labor, where specific episodes are stored as sparse, local traces while general reasoning skills remain separate.
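
The contrast between the two write paths can be stated in a few lines. The notation is ours, not the paper's, and the sketch is simplified: W is a frozen weight matrix, B_u A_u are user u's LoRA factors, E is the Engram table, sigma is the hash, g_t is the N-gram ending at position t, and Delta_u is user u's sparse edit, nonzero only on the slot set S_u. Any gating or multiple hash functions a real Engram layer may use are ignored.

```latex
% Hypothetical notation for illustration only; not taken from the paper.
\begin{aligned}
&\text{LoRA:} && h_t = (W + B_u A_u)\,x_t
  \;\Rightarrow\; h_t - W x_t = B_u A_u x_t \neq 0 \text{ for almost every } x_t \quad (B_u A_u \neq 0)\\
&\text{Engram:} && m_t = E[\sigma(g_t)] + \Delta_u[\sigma(g_t)],\quad \Delta_u[s] = 0 \ \ \forall s \notin S_u
  \;\Rightarrow\; m_t = E[\sigma(g_t)] \text{ whenever } \sigma(g_t) \notin S_u\\
&\text{Composition:} && \Delta = \textstyle\sum_u \Delta_u,\quad S_u \cap S_v = \varnothing \ (u \neq v)
  \;\Rightarrow\; \Delta[s] = \Delta_u[s] \ \ \forall s \in S_u
\end{aligned}
```

The LoRA line holds for almost every input because the inputs a nonzero B_u A_u maps to zero form its null space, a proper subspace. The Engram and composition lines hold exactly, by construction.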

A Two-Layered Approach to Memory

The core of this method is the separation of content and reasoning. The reasoning skill is held in a single shared adapter that is trained once and used by every user. Personal facts, by contrast, are written as surgical edits to the memory table of an Engram model. Because each fact is stored in a specific hash-keyed slot, it is read only when the model encounters the relevant trigger N-gram. This confines a user's data to its own rows, so it does not interfere with the model's general ability to reason.
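
A minimal Python sketch of this read/write path follows. It rests on simplifying assumptions: the names `EngramTable`, `slot_of`, `write`, and `lookup` are hypothetical, keys are bigrams hashed with SHA-256 into a fixed number of slots, the pretrained rows are replaced by a deterministic stand-in, and the retrieved vector is returned directly rather than gated into a hidden state. It illustrates the locality property, not the paper's implementation.

```python
import hashlib
from typing import Dict, List, Tuple


def slot_of(ngram: Tuple[str, ...], num_slots: int) -> int:
    """Map an N-gram to a table row (assumed scheme: SHA-256 modulo the table size)."""
    digest = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


class EngramTable:
    """Hypothetical fixed-size, hash-keyed memory table with a sparse additive edit layer."""

    def __init__(self, num_slots: int, dim: int, n: int = 2) -> None:
        self.num_slots, self.dim, self.n = num_slots, dim, n
        self.edits: Dict[int, List[float]] = {}  # slot -> delta written for this user

    def base_row(self, slot: int) -> List[float]:
        # Stand-in for the shared pretrained row; deterministic so runs are reproducible.
        return [((slot * 31 + d) % 7) / 7.0 for d in range(self.dim)]

    def write(self, trigger: Tuple[str, ...], value: List[float]) -> int:
        """Add `value` to the row keyed by `trigger`; no other row is touched."""
        s = slot_of(trigger, self.num_slots)
        prev = self.edits.get(s, [0.0] * self.dim)
        self.edits[s] = [a + b for a, b in zip(prev, value)]
        return s

    def lookup(self, tokens: List[str]) -> List[List[float]]:
        """Read one memory vector per position, keyed by the N-gram ending at that position."""
        out = []
        for t in range(len(tokens)):
            ngram = tuple(tokens[max(0, t - self.n + 1): t + 1])
            s = slot_of(ngram, self.num_slots)
            row = self.base_row(s)
            delta = self.edits.get(s)
            # Unedited slots return the base row unchanged, bit for bit.
            out.append(row if delta is None else [a + b for a, b in zip(row, delta)])
        return out


if __name__ == "__main__":
    table = EngramTable(num_slots=4096, dim=4)
    tokens = "my dog is named rex".split()
    before = table.lookup(tokens)
    table.write(("named", "rex"), [1.0, 0.0, 0.0, 0.0])  # hypothetical fact value
    after = table.lookup(tokens)
    print([b == a for b, a in zip(before, after)])
```

Writing the fact changes only the vector read at the position whose bigram is `("named", "rex")`. Every other position reads exactly the same floats as before, unless its N-gram happens to hash into the same slot, a collision this toy makes no attempt to avoid.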

Surgical Edits vs. Global Changes

Unlike a LoRA adapter, whose single global weight delta applies at every position and therefore changes the model's behavior on unrelated text, the Engram edit is highly precise: writing a fact touches only the memory row keyed by that fact's trigger. The paper shows that this "glass box" edit is exact: every position other than the trigger is unchanged to the last bit, and the edit fails if it is written into the wrong layer. Because different users' facts land in disjoint hash slots, this locality also makes edits composable: many users' facts can be stacked additively and losslessly in one shared table, whereas a per-user LoRA admits only one user at a time.
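
The composition property can be sketched the same way. In this hypothetical model, `make_edit` turns each user's facts into a sparse `{slot: delta}` edit, `compose` builds the shared table as the sum of the edits, and `read` returns the delta seen at a trigger. When the supports are disjoint (which `compose` checks rather than assumes), every user reads back exactly their own values from the shared table. Summing two dense LoRA deltas offers no such separation, since both apply at every input. The hashing scheme and function names are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

Edit = Dict[int, List[float]]  # sparse per-user edit: slot -> additive delta


def slot_of(trigger: str, num_slots: int) -> int:
    """Deterministically hash a trigger into a table row (assumed scheme: SHA-256 mod size)."""
    digest = hashlib.sha256(trigger.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def make_edit(facts: Dict[str, List[float]], num_slots: int) -> Edit:
    """Turn one user's trigger -> value facts into a sparse additive edit."""
    edit: Edit = {}
    for trigger, value in facts.items():
        s = slot_of(trigger, num_slots)
        prev = edit.get(s, [0.0] * len(value))
        edit[s] = [a + b for a, b in zip(prev, value)]
    return edit


def compose(edits: List[Edit]) -> Tuple[Edit, bool]:
    """Sum many users' edits into one shared table; report whether no slot is shared across edits."""
    merged: Edit = {}
    disjoint = True
    for edit in edits:
        for s, delta in edit.items():
            if s in merged:
                disjoint = False  # two edits share a slot, so their values are no longer separable
                merged[s] = [a + b for a, b in zip(merged[s], delta)]
            else:
                merged[s] = list(delta)
    return merged, disjoint


def read(edit: Edit, trigger: str, num_slots: int, dim: int) -> List[float]:
    """Delta seen at `trigger`; slots that nobody wrote contribute exactly zero."""
    return edit.get(slot_of(trigger, num_slots), [0.0] * dim)


if __name__ == "__main__":
    N, D = 1 << 16, 2
    alice = make_edit({"alice dog": [1.0, 0.0], "alice city": [0.0, 2.0]}, N)
    bob = make_edit({"bob dog": [3.0, 0.0]}, N)
    shared, disjoint = compose([alice, bob])
    print("disjoint supports:", disjoint)
    print("bob reads back:", read(shared, "bob dog", N, D))
```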

Performance and Scalability

The layered design provides significant advantages over existing methods. It matches the direct-recall performance of per-user LoRA adapters while delivering 5.6x higher accuracy on indirect-reasoning tasks on average. It also scales differently from retrieval: a retriever must search a pool that grows with every stored fact, whereas a per-user Engram table does not grow with that pool. The paper shows that once a user has more than about 100 facts, this approach outperforms a retrieval-augmented generation (RAG) pipeline, even when that pipeline runs on a model 2.5x larger.
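
The scaling argument concerns work per query rather than accuracy, and a toy counter makes it concrete. Both classes below are hypothetical stand-ins: `BruteForceRetriever` scores the query against every stored fact by word overlap, while `FixedHashTable` preallocates its rows and answers with a single hashed probe (collisions are not handled). The sketch shows only why the table's cost does not grow with the number of stored facts; it does not reproduce the paper's accuracy crossover at roughly 100 facts.

```python
import hashlib
from typing import List, Optional, Set, Tuple


class BruteForceRetriever:
    """Hypothetical retriever: scores a query against every stored fact by word overlap."""

    def __init__(self) -> None:
        self.facts: List[Set[str]] = []
        self.comparisons = 0  # running count of query-vs-fact scorings

    def add(self, fact: str) -> None:
        self.facts.append(set(fact.split()))

    def search(self, query: str) -> int:
        """Return the index of the best-overlapping fact (-1 if none are stored)."""
        q = set(query.split())
        best, best_score = -1, -1
        for i, fact in enumerate(self.facts):
            self.comparisons += 1  # one scoring per stored fact: cost grows with the pool
            score = len(q & fact)
            if score > best_score:
                best, best_score = i, score
        return best


class FixedHashTable:
    """Hypothetical fixed-size table: rows are preallocated and each read is one hashed probe."""

    def __init__(self, num_slots: int) -> None:
        self.rows: List[Optional[str]] = [None] * num_slots
        self.probes = 0  # running count of table reads

    def _slot(self, trigger: str) -> int:
        digest = hashlib.sha256(trigger.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.rows)

    def write(self, trigger: str, value: str) -> None:
        self.rows[self._slot(trigger)] = value  # collisions overwrite; not handled here

    def read(self, trigger: str) -> Optional[str]:
        self.probes += 1
        return self.rows[self._slot(trigger)]


def cost_per_query(num_facts: int, num_slots: int = 1 << 16) -> Tuple[int, int]:
    """Store `num_facts` facts in both structures, answer one query, return (comparisons, probes)."""
    retriever, table = BruteForceRetriever(), FixedHashTable(num_slots)
    for i in range(num_facts):
        retriever.add(f"fact {i} value {i}")
        table.write(f"fact {i}", f"value {i}")
    retriever.search("fact 0")
    table.read("fact 0")
    return retriever.comparisons, table.probes


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, cost_per_query(n))
```

For 10, 100, and 1,000 stored facts, the retriever performs that many comparisons per query while the table performs one probe, and the table's storage stays at its preallocated size.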

Key Considerations

The effectiveness of this method depends on the backbone it is applied to. On a base (non-instruction-tuned) language model, global weight edits like LoRA can disrupt performance, whereas the Engram approach remains stable. On instruction-tuned models, the shared reasoning adapter makes effective use of the information read from the Engram table. By keeping the personal memory store separate from the reasoning backbone, the system avoids the "recall-without-reasoning" gap often seen in other personalization techniques: the model can not only remember facts but also reason over them.