User as Engram: Internalizing Per-User Memory as Local Parametric Edits
This paper addresses the challenge of personalizing language models for millions of users. Current methods typically fall into two categories: storing facts in external files (retrieval) or training per-user LoRA adapters. The author argues that retrieval incurs high context costs, while per-user adapters "contaminate" the model by folding personal facts into global weights, which degrades reasoning skills. The proposed solution, "User as Engram," splits memory into two layers: a shared adapter for reasoning and a hash-keyed memory table for storing individual facts. This design mimics the brain’s structure, where specific episodes are stored as sparse, local traces while general reasoning skills remain separate.
A Two-Layered Approach to Memory
The core of this method is the separation of content and reasoning. The reasoning skill is held in a single, shared LoRA adapter that is trained once and used by everyone. Personal facts, however, are written as surgical edits to a memory table within an Engram-based model. Because these facts are stored in specific, hash-keyed memory slots, they are only accessed when the model encounters the relevant trigger N-gram. This ensures that a user's private data is isolated and does not interfere with the model’s general ability to reason.
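The hash-keyed lookup described above can be illustrated with a minimal sketch. This is not the paper's implementation: the table size, vector width, trigger length, hash function, and the names `slot_for` and `MemoryTable` are all assumptions chosen for illustration. It only shows how an N-gram can address a single row, so that a stored fact is read back only at the position where its trigger ends.

```python
import hashlib

NUM_SLOTS = 1024  # hypothetical table size
DIM = 4           # hypothetical vector width
NGRAM = 2         # hypothetical trigger length, in tokens


def slot_for(ngram):
    """Map an N-gram (tuple of token strings) to a table row via a stable hash."""
    key = "\x1f".join(ngram).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLOTS


class MemoryTable:
    def __init__(self):
        # All rows start at zero, so an unwritten slot contributes nothing.
        self.rows = [[0.0] * DIM for _ in range(NUM_SLOTS)]

    def write(self, trigger, vector):
        """Store a fact vector in the single row addressed by the trigger."""
        assert len(trigger) == NGRAM and len(vector) == DIM
        self.rows[slot_for(trigger)] = list(vector)

    def read(self, tokens):
        """Return one vector per position, looked up by the N-gram ending there."""
        out = []
        for i in range(len(tokens)):
            if i + 1 < NGRAM:
                out.append([0.0] * DIM)  # not enough left context yet
            else:
                ngram = tuple(tokens[i + 1 - NGRAM : i + 1])
                out.append(list(self.rows[slot_for(ngram)]))
        return out


table = MemoryTable()
table.write(("favorite", "tea"), [1.0, 0.0, 0.0, 0.0])
vectors = table.read(["my", "favorite", "tea", "is"])
# vectors[2] is the written row, because the trigger ends at position 2.
```

In any hashed table, two different N-grams can land on the same row; how the paper's table handles such collisions is not covered by this sketch.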
Surgical Edits vs. Global Changes
Unlike a LoRA adapter, which changes weights that every input passes through and so also affects unrelated text, the Engram approach is highly precise. Writing a fact affects only the memory rows associated with that fact's trigger. The paper demonstrates that this "glass box" locality is mathematically exact: every position in the model remains unchanged to the last bit, except for the specific trigger point. Because edits are this local, they compose: many users can store their facts in the same shared table without any cross-user leakage.
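The contrast between a local row edit and a global weight change can be checked on a toy model. The sketch below is an assumption-laden illustration, not the paper's experiment: `forward` is a hypothetical per-position function with no attention, `token_feature` stands in for a frozen backbone, and `shared_scale` stands in for a globally shared weight. It compares outputs before and after each kind of edit using exact floating-point equality.

```python
import hashlib

NUM_SLOTS = 4096  # hypothetical table size


def slot_for(ngram):
    """Stable hash of a bigram (tuple of two token strings) to a row index."""
    digest = hashlib.sha256("\x1f".join(ngram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLOTS


def token_feature(token):
    """Deterministic, strictly positive stand-in for a frozen backbone feature."""
    return (hashlib.sha256(token.encode("utf-8")).digest()[0] + 1) / 256.0


def forward(tokens, table, shared_scale=1.0):
    """Toy per-position output: shared path plus the memory row for the bigram."""
    outputs = []
    for i, tok in enumerate(tokens):
        mem = table.get(slot_for(tuple(tokens[i - 1 : i + 1])), 0.0) if i >= 1 else 0.0
        outputs.append(shared_scale * token_feature(tok) + mem)
    return outputs


tokens = ["my", "favorite", "tea", "is", "green"]
trigger = ("favorite", "tea")

before = forward(tokens, {})

# Local edit: write a single row of the (sparse) memory table.
after_row_edit = forward(tokens, {slot_for(trigger): 0.5})
row_changed = [i for i in range(len(tokens)) if before[i] != after_row_edit[i]]

# Global edit: perturb a weight shared by every position.
after_global_edit = forward(tokens, {}, shared_scale=1.01)
global_changed = [i for i in range(len(tokens)) if before[i] != after_global_edit[i]]

# Positions whose bigram addresses the edited row (the trigger, plus any collision).
addressed = [
    i for i in range(1, len(tokens))
    if slot_for(tuple(tokens[i - 1 : i + 1])) == slot_for(trigger)
]
```

Here `row_changed` equals `addressed`, the positions whose bigram maps to the edited row, and every other output compares equal exactly, while `global_changed` covers all five positions. In a real transformer the picture depends on where the memory is injected, so this toy only illustrates the locality argument rather than reproducing the paper's result.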
Performance and Scalability
The layered design provides significant advantages over existing methods. It matches the direct recall performance of per-user LoRA adapters while delivering 5.6x higher accuracy on indirect reasoning tasks. Furthermore, because the memory table does not grow with the number of users, it remains efficient as the population scales. The paper shows that once a user has more than about 100 facts, this approach outperforms traditional retrieval-augmented generation (RAG) pipelines, even when those pipelines are running on models 2.5x larger.
Key Considerations
The effectiveness of this method depends on the strength of the base model. On a base language model, global weight edits such as LoRA can disrupt performance, whereas the Engram approach remains stable. On instruction-tuned models, the shared reasoning adapter can absorb the information from the Engram table effectively. Because the personal memory store is kept separate from the reasoning backbone, the system avoids the "recall-without-reasoning" gap often seen in other personalization techniques: the model can both remember facts and reason over them accurately.