Large language models (LLMs) are increasingly used as long-term assistants, but they often struggle to retain and reuse historical information effectively. Simply enlarging the context window is expensive and does not guarantee that the model will actually use the stored information. To address this, the paper introduces $\delta$-mem, a lightweight memory mechanism that lets a frozen LLM maintain a compact associative memory state, updated online, without full fine-tuning or replacing the model's backbone.
How $\delta$-mem Works
Instead of expanding the context window, $\delta$-mem adds a small, fixed-size state matrix to the model. This matrix acts as an associative memory that compresses past information, and it is updated in real time with delta-rule learning. During generation, the model reads from this state to produce low-rank corrections, which are applied directly to the backbone’s attention computation. This lets the model incorporate historical context without explicit context extension.
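To make the mechanism concrete, here is a minimal sketch of a delta-rule associative memory in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the update form $S \leftarrow S + \beta\,(v - Sk)\,k^\top$ is the classic delta rule, while the step size `beta`, the key normalization, the projection matrices `W_down` and `W_up`, the hidden size `D_MODEL`, and the way the correction would be added to attention are all hypothetical choices that the summary does not specify.

```python
import math
import random

D_MEM = 8     # the state is D_MEM x D_MEM, matching the 8x8 size reported in this summary
D_MODEL = 16  # hypothetical hidden size, chosen only for the demo


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def normalize(x):
    """Scale x to unit L2 norm; an assumed choice that keeps the update stable."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x] if norm > 0 else list(x)


def delta_update(S, k, v, beta=1.0):
    """Classic delta rule: S <- S + beta * (v - S k) k^T.

    The state only changes by the prediction error, so writing a new value
    under an existing key overwrites the old association.
    """
    pred = matvec(S, k)
    err = [vi - pi for vi, pi in zip(v, pred)]
    return [[s + beta * e * kj for s, kj in zip(row, k)]
            for row, e in zip(S, err)]


def read_memory(S, q):
    """Read the value associated with query q."""
    return matvec(S, q)


def low_rank_correction(S, h, W_down, W_up):
    """Hypothetical read path: project hidden state h down to D_MEM dims,
    query the state, and project back up. The result has rank at most D_MEM
    and would be added somewhere inside the attention computation."""
    q = normalize(matvec(W_down, h))
    return matvec(W_up, read_memory(S, q))


# Demo: store two associations under orthogonal unit keys, then overwrite one.
S = [[0.0] * D_MEM for _ in range(D_MEM)]
k1 = [1.0] + [0.0] * (D_MEM - 1)
k2 = [0.0, 1.0] + [0.0] * (D_MEM - 2)
v1 = [float(i) for i in range(D_MEM)]
v2 = [1.0] * D_MEM

S = delta_update(S, k1, v1)
S = delta_update(S, k2, v2)
recalled = read_memory(S, k1)      # recovers v1

v1_new = [-1.0] * D_MEM
S = delta_update(S, k1, v1_new)    # same key, new value
overwritten = read_memory(S, k1)   # now recovers v1_new
untouched = read_memory(S, k2)     # v2 is unaffected

# Demo of the hypothetical low-rank read path with random projections.
rng = random.Random(0)
W_down = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(D_MEM)]
W_up = [[rng.gauss(0, 1) for _ in range(D_MEM)] for _ in range(D_MODEL)]
h = [rng.gauss(0, 1) for _ in range(D_MODEL)]
correction = low_rank_correction(S, h, W_down, W_up)
```

The state stays at 64 numbers no matter how many associations are written, which is the property that lets this kind of memory stand in for a longer context. With non-orthogonal keys the stored values interfere with one another, so recall is approximate rather than exact.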
Performance and Efficiency
The researchers found that $\delta$-mem is highly efficient: an $8\times8$ online memory state is enough to yield significant improvements. By coupling this compact memory directly with the attention mechanism, the model outperforms both the original frozen backbone and other memory baselines. Averaged across benchmarks, its score reaches $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\delta$-mem memory baseline.
Impact on Specialized Tasks
The benefits of $\delta$-mem are particularly noticeable in memory-intensive scenarios. On the MemoryAgentBench benchmark, the model reached $1.31\times$ the performance of the baseline, and on LoCoMo, it reached $1.20\times$. Importantly, these gains were achieved while largely preserving the model's general capabilities. These results suggest that effective long-term memory can be integrated into existing models through a compact, online state, avoiding the high costs associated with full fine-tuning or architectural overhauls.