$\delta$-mem: Efficient Online Memory for Large Language...

Key Takeaways

  • Large language models (LLMs) increasingly need to accumulate and reuse historical information in long-term assistants and agent systems, but they often struggle to do so effectively.
  • Simply expanding the context window is costly and often fails to ensure effective context utilization.
  • We propose $\delta$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory.
  • $\delta$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation.
Paper Abstract

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $\delta$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. $\delta$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an $8\times8$ online memory state, $\delta$-mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\delta$-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.

Large language models (LLMs) are increasingly used as long-term assistants, but they often struggle to retain and reuse historical information effectively. Simply enlarging the context window is expensive and does not guarantee that the model will actually make use of the information it contains. To address this, the paper introduces $\delta$-mem, a lightweight memory mechanism that lets a frozen LLM maintain a compact online state of associative memory without full fine-tuning or replacing the model's backbone.

How $\delta$-mem Works

Instead of expanding the context window, $\delta$-mem attaches a small, fixed-size state matrix to the frozen model. This matrix acts as an associative memory that compresses past information. The state is updated online with delta-rule learning: each write adjusts the matrix in proportion to the error between what the memory currently recalls for a key and the value that should be stored. During generation, the model reads from this state and turns the readout into low-rank corrections that are applied directly to the backbone’s attention computation. This lets the model incorporate historical context without explicit context extension.
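The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the projection matrices (`w_k`, `w_v`, `w_q`, `w_up`), the fixed step size `beta`, the unit-norm keys and queries, and the choice to apply the correction to the attention query are all hypothetical. The delta-rule write is $S \leftarrow S + \beta\,(v - S k)\,k^\top$, so each update shrinks the recall error for the current key rather than simply adding to the state.

```python
import numpy as np


class DeltaMemory:
    """Hypothetical fixed-size associative memory updated by the delta rule.

    Names, initialisation, and the query-correction hook are illustrative
    assumptions, not the paper's exact design.
    """

    def __init__(self, d_model: int, d_mem: int = 8, beta: float = 0.5, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Hypothetical down-projections from the backbone hidden size to d_mem.
        self.w_k = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_mem))
        self.w_v = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_mem))
        self.w_q = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_mem))
        # Hypothetical up-projection; its d_mem rows bound the rank of any correction.
        self.w_up = rng.normal(0.0, 1.0 / np.sqrt(d_mem), (d_mem, d_model))
        self.beta = beta  # delta-rule step size, assumed to lie in (0, 1]
        # The online memory state: d_mem x d_mem (8 x 8 by default), never grows.
        self.state = np.zeros((d_mem, d_mem))

    @staticmethod
    def _unit(x: np.ndarray) -> np.ndarray:
        return x / (np.linalg.norm(x) + 1e-8)

    def key(self, h: np.ndarray) -> np.ndarray:
        return self._unit(h @ self.w_k)

    def value(self, h: np.ndarray) -> np.ndarray:
        return h @ self.w_v

    def update(self, h: np.ndarray) -> None:
        """Delta-rule write: S <- S + beta * (v - S k) k^T."""
        k, v = self.key(h), self.value(h)
        error = v - self.state @ k  # target value minus what S currently returns for k
        self.state += self.beta * np.outer(error, k)

    def read(self, h: np.ndarray) -> np.ndarray:
        """Readout: query the state with a unit-norm memory query."""
        return self.state @ self._unit(h @ self.w_q)

    def correct_query(self, h: np.ndarray, query: np.ndarray) -> np.ndarray:
        """Add a low-rank, memory-derived correction to a backbone attention query."""
        return query + self.read(h) @ self.w_up


# Usage: write 100 past hidden states, then correct the current token's query.
d_model = 64
mem = DeltaMemory(d_model)
rng = np.random.default_rng(1)
for h_past in rng.normal(size=(100, d_model)):
    mem.update(h_past)
h_now = rng.normal(size=d_model)
q_backbone = rng.normal(size=d_model)  # stand-in for the frozen model's query
q_corrected = mem.correct_query(h_now, q_backbone)
print(mem.state.shape, q_corrected.shape)  # (8, 8) (64,)
```

Because every correction is an 8-dimensional readout multiplied by `w_up`, all corrections lie in a subspace of dimension at most 8, which is what makes them low-rank; the state itself stays $8\times8$ however many tokens are written.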

Performance and Efficiency

The researchers found that $\delta$-mem needs only an $8\times8$ online memory state (64 entries) to achieve significant improvements. By coupling this compact memory directly with the attention computation, the model outperforms both the original frozen backbone and the non-$\delta$-mem memory baselines: on average, it raised the score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\delta$-mem memory baseline.
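The two average-score ratios can also be combined. Assuming both are computed on the same average score (the summary does not detail how that average is formed), and writing $S_\delta$, $S_{\text{frozen}}$ and $S_{\text{base}}$ for the averages of $\delta$-mem, the frozen backbone and the strongest non-$\delta$-mem baseline:

$$
S_\delta = 1.10\,S_{\text{frozen}}, \qquad S_\delta = 1.15\,S_{\text{base}}
\;\Longrightarrow\;
\frac{S_{\text{base}}}{S_{\text{frozen}}} = \frac{1.10}{1.15} \approx 0.96
$$

Under that reading, the strongest memory baseline averaged slightly below the frozen backbone itself; because the reported ratios are rounded, the $0.96$ figure is only approximate.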

Impact on Specialized Tasks

The benefits of $\delta$-mem are most pronounced in memory-intensive scenarios. On MemoryAgentBench the model's score reached $1.31\times$ that of the baseline, and on LoCoMo it reached $1.20\times$. Importantly, these gains came while largely preserving the model's general capabilities. The results suggest that effective long-term memory can be added to existing models through a compact online state coupled directly with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
