AutoMem: Automated Learning of Memory as a Cognitive Skill
Large Language Models (LLMs) often struggle with long-horizon tasks because their "working memory," the context window, is limited. While humans use external tools like notes and files to extend their memory, LLMs typically rely on fixed, pre-designed memory systems. This paper introduces AutoMem, a framework that treats memory management as a trainable skill. Instead of building a static memory module, AutoMem exposes file-system operations (such as reading, writing, and searching) as first-class actions, allowing the model to decide for itself what to remember and how to organize it.
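To make the idea of file operations as actions concrete, here is a minimal Python sketch. It is not AutoMem's actual interface: the `FileMemory` class, the `apply_action` dispatcher, and the action dictionary format (`op`, `path`, `text`, `query`) are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical file-backed memory store (an illustration, not AutoMem's API)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, name: str) -> Path:
        # Reject any path that would land outside the memory directory.
        path = (self.root / name).resolve()
        if self.root not in path.parents:
            raise ValueError(f"path escapes memory root: {name}")
        return path

    def write(self, name: str, text: str) -> None:
        path = self._resolve(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def read(self, name: str) -> str:
        return self._resolve(name).read_text(encoding="utf-8")

    def search(self, query: str) -> list[tuple[str, str]]:
        # Case-insensitive substring search over every stored file.
        hits = []
        for path in sorted(self.root.rglob("*")):
            if not path.is_file():
                continue
            for line in path.read_text(encoding="utf-8").splitlines():
                if query.lower() in line.lower():
                    hits.append((path.relative_to(self.root).as_posix(), line))
        return hits


def apply_action(memory: FileMemory, action: dict) -> str:
    """Run one model-issued memory action and return text to feed back to the model."""
    op = action["op"]
    if op == "write":
        memory.write(action["path"], action["text"])
        return "ok"
    if op == "read":
        return memory.read(action["path"])
    if op == "search":
        hits = memory.search(action["query"])
        return "\n".join(f"{name}: {line}" for name, line in hits)
    raise ValueError(f"unknown memory op: {op}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mem = FileMemory(Path(tmp))
        apply_action(mem, {"op": "write", "path": "notes/crafting.md",
                           "text": "Pickaxe needs wood and stone.\n"})
        print(apply_action(mem, {"op": "search", "query": "stone"}))
```

In this sketch the model, not the framework, chooses file names and contents, which is the point of the approach: what to store and how to organize it are decisions the model makes as part of acting.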
Automating Memory Improvement
AutoMem optimizes memory through two automated loops, both driven by a "meta-LLM" that reviews entire episode logs, a task that would be impractical for humans to do by hand. In the first loop, the meta-LLM acts as a code reviewer, analyzing the agent's performance and iteratively revising the "scaffold": the prompts, file schemas, and rules that govern how the agent interacts with its memory. In the second loop, the meta-LLM acts as a training engine, identifying the agent's most successful memory decisions from past episodes and using them to finetune a dedicated "memory specialist" model.
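The two loops can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the `Episode` record, the function names, the rule of keeping a scaffold revision only when the mean score improves, and the rating threshold are all hypothetical. The `propose_revision` and `rate_decision` callables stand in for calls to the meta-LLM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Episode:
    score: float                  # task score achieved in this episode
    memory_decisions: list[str]   # logged memory actions, e.g. "write map.md"
    log: str = ""                 # full episode log the meta-LLM would review


def mean_score(episodes: list[Episode]) -> float:
    return sum(e.score for e in episodes) / len(episodes)


def revise_scaffold(
    scaffold: str,
    run_episodes: Callable[[str], list[Episode]],
    propose_revision: Callable[[str, list[Episode]], str],
    rounds: int = 3,
) -> str:
    """Loop 1 (sketch): the meta-LLM reviews logs and proposes a new scaffold.

    Assumption: a revision is kept only if it raises the mean episode score.
    """
    best, best_episodes = scaffold, run_episodes(scaffold)
    for _ in range(rounds):
        candidate = propose_revision(best, best_episodes)
        candidate_episodes = run_episodes(candidate)
        if mean_score(candidate_episodes) > mean_score(best_episodes):
            best, best_episodes = candidate, candidate_episodes
    return best


def select_memory_examples(
    episodes: list[Episode],
    rate_decision: Callable[[Episode, str], float],
    threshold: float = 0.5,
) -> list[str]:
    """Loop 2 (sketch): keep memory decisions the meta-LLM rates highly.

    The returned list would become finetuning data for the memory specialist.
    """
    return [
        decision
        for episode in episodes
        for decision in episode.memory_decisions
        if rate_decision(episode, decision) >= threshold
    ]
```

Passing the meta-LLM in as a plain callable keeps the sketch testable with deterministic stubs; a real system would replace those stubs with model calls and an actual finetuning step.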
Separating Memory from Action
A key design choice in AutoMem is the separation of concerns. The framework uses two model instances: a "memory specialist" that handles file operations and a "gameplay model" that executes world actions. Because the gameplay model remains unmodified, the agent retains its original task competence while its ability to manage information is sharpened. This separation ensures that improvements in memory proficiency do not interfere with the model’s ability to perform tasks, allowing the two skills to stack and provide a cumulative performance boost.
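A single agent step under this split might look like the sketch below. Everything here is an assumption made for illustration: the function name `agent_step`, the two callable signatures, the in-memory dictionary standing in for the file system, and the ordering (memory update first, world action second) are not taken from the paper.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical signatures: the specialist returns (file name, new contents) pairs;
# the gameplay model maps an observation plus memory context to a world action.
MemorySpecialist = Callable[[str, dict[str, str]], list[tuple[str, str]]]
GameplayModel = Callable[[str, str], str]


def agent_step(
    observation: str,
    files: dict[str, str],
    memory_specialist: MemorySpecialist,
    gameplay_model: GameplayModel,
) -> str:
    """One step with memory handling and world actions kept in separate models."""
    # 1. Only the memory specialist touches the files.
    for name, text in memory_specialist(observation, files):
        files[name] = text

    # 2. The unmodified gameplay model sees the observation and the current notes.
    context = "\n".join(f"{name}: {text}" for name, text in sorted(files.items()))
    return gameplay_model(observation, context)
```

Because the gameplay model is only ever called, never trained, any finetuning applied to the specialist cannot alter how world actions are chosen for a given observation and context.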
High-Leverage Results
The researchers tested AutoMem on three complex, procedurally generated games: Crafter, MiniHack, and NetHack. By optimizing memory management alone, without changing the base model’s task-action weights, the framework improved performance by 2x to 4x. This approach allowed a 32B open-weight model to reach performance levels comparable to frontier proprietary systems like Claude Opus 4.5 and Gemini 3.1 Pro Thinking. These results suggest that teaching an LLM how to manage its own memory is a highly effective way to solve long-horizon tasks, often proving more impactful than simply increasing the model's scale.
Why This Matters
The success of AutoMem demonstrates that memory management is an independently learnable skill. By providing the model with a traceable, file-based memory system and using meta-LLMs to automate the refinement of that system, the framework overcomes the "bottleneck" of the context window. This research highlights that for long-term tasks, the ability to organize and retrieve information is just as critical as the model's underlying reasoning capabilities.