TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management
Large language models often struggle with long-running tasks because their context window, the amount of text they can attend to at once, is finite. When a work session exceeds this limit, the model loses important details such as architectural decisions, file histories, and task statuses. Existing solutions typically treat conversation history as a flat stream of text, which often loses critical structured information. TokenMizer addresses this by acting as a transparent proxy that organizes session history into a typed knowledge graph, allowing the model to retain the structure and rationale of a project rather than just the raw text.
How TokenMizer Works
Instead of storing history as a simple list of messages, TokenMizer models a session as a knowledge graph with 14 node types (such as tasks, files, and decisions) and 7 edge types (such as "implements" or "fixes"). A hybrid extraction pipeline identifies these elements in real time. When a session approaches its context limit, a three-tier checkpoint process produces a "resume block": a compact, structured summary of the graph. This allows the model to maintain continuity across long sessions without needing to store every previous interaction.
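The idea of a typed session graph that renders to a resume block can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TokenMizer's implementation: it covers just the three node types and two edge types named above, and the class names, the rationale field, the edge direction, and the output layout are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    # Only the three node types named in the text; the remaining ones are not listed there.
    TASK = "task"
    FILE = "file"
    DECISION = "decision"


class EdgeType(Enum):
    # Only the two edge types named in the text; the remaining ones are not listed there.
    IMPLEMENTS = "implements"
    FIXES = "fixes"


@dataclass
class Node:
    node_id: str
    type: NodeType
    label: str
    rationale: str = ""  # hypothetical field holding the "why" behind a node


@dataclass
class SessionGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, EdgeType, target_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source_id, edge_type, target_id):
        # Reject edges that point at nodes the graph has never seen.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source_id, edge_type, target_id))

    def resume_block(self):
        """Render a compact text summary, grouped by node type (layout is an assumption)."""
        lines = []
        for node_type in NodeType:
            members = [n for n in self.nodes.values() if n.type is node_type]
            if not members:
                continue
            lines.append(f"[{node_type.value}]")
            for n in members:
                suffix = f" (why: {n.rationale})" if n.rationale else ""
                lines.append(f"- {n.label}{suffix}")
        if self.edges:
            lines.append("[relations]")
            for src, edge_type, dst in self.edges:
                lines.append(
                    f"- {self.nodes[src].label} {edge_type.value} {self.nodes[dst].label}"
                )
        return "\n".join(lines)


# Illustrative session content; the rationale text is made up for the example.
graph = SessionGraph()
graph.add_node(Node("d1", NodeType.DECISION, "Use Redis for caching", "low-latency reads"))
graph.add_node(Node("t1", NodeType.TASK, "Add cache layer"))
graph.add_node(Node("f1", NodeType.FILE, "cache.py"))
graph.add_edge("t1", EdgeType.IMPLEMENTS, "d1")
print(graph.resume_block())
```

Because the summary is generated from typed nodes rather than from raw messages, its size grows with the number of distinct entities in the session, not with the length of the conversation.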
Efficiency and Performance
TokenMizer achieves a 47.3% reduction in token usage through a heuristic compression pipeline that requires no external dependencies. In tests across 21 sessions spanning five domains, including software engineering, data science, and debugging, TokenMizer produced resume blocks half the size of those generated by standard baseline methods. Despite the smaller size, it achieved higher recall for key information, such as why a specific technology was chosen or how a task was completed.
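To make the reduction figure concrete, the sketch below shows how a token reduction percentage is computed, alongside a toy dependency-free compressor. The two heuristics used here (whitespace collapsing and dropping repeated lines) and the whitespace tokenizer are assumptions chosen for illustration; the summary does not describe which heuristics TokenMizer actually applies.

```python
import re


def count_tokens(text):
    # Stand-in tokenizer: whitespace-separated words.
    # A real system would count tokens with the target model's tokenizer.
    return len(text.split())


def compress(text):
    """Apply two illustrative heuristics: collapse whitespace runs, drop repeated lines."""
    seen = set()
    kept = []
    for line in text.splitlines():
        normalized = re.sub(r"\s+", " ", line).strip()
        if not normalized or normalized in seen:
            continue  # skip blank lines and exact repeats
        seen.add(normalized)
        kept.append(normalized)
    return "\n".join(kept)


def reduction_percent(before, after):
    """Percentage of tokens removed, relative to the original count."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


history = "run tests\nrun tests\n\nfix   bug"
smaller = compress(history)
print(reduction_percent(count_tokens(history), count_tokens(smaller)))
```

Read this way, a 47.3% reduction means that a 10,000-token history would shrink to 5,270 tokens, since reduction_percent(10000, 5270) evaluates to 47.3.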
Key Advantages
The primary strength of TokenMizer is its ability to preserve the "why" behind a decision. While traditional methods might only note that a technology like "Redis" was mentioned, TokenMizer captures the rationale behind that choice, which is vital for complex, multi-step tasks. By using a transparent proxy, the system can be integrated into existing workflows without requiring changes to the underlying application code. It also includes a semantic cache to reduce latency for repeated queries, further optimizing the performance of long-horizon tasks.
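A semantic cache of the kind mentioned above returns a stored response when a new query is close enough in meaning to an earlier one. The following is a minimal sketch under loud assumptions: a bag-of-words vector stands in for a real embedding model, and the class name, method names, and the 0.8 similarity threshold are hypothetical rather than taken from TokenMizer.

```python
import math
import re
from collections import Counter


def embed(text):
    # Stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical cutoff for a cache hit
        self.entries = []  # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        """Return the response of the most similar stored query, or None on a miss."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("why did we choose redis", "cached answer")
print(cache.get("Why did we choose Redis?"))  # same words, so this is a hit
print(cache.get("list the open tasks"))       # no shared words, so this is a miss
```

The linear scan keeps the sketch short; a cache holding many entries would more plausibly use an approximate nearest-neighbor index over real embeddings.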
Limitations and Future Work
While TokenMizer performs well in structured environments like software engineering, its effectiveness can vary depending on the domain. Sessions that rely on implicit reasoning or planning are more difficult to capture than those using explicit, imperative language. Additionally, the current results are based on a controlled, synthetic benchmark; the author notes that testing the system on live, real-world developer sessions is the most important next step for future research. Currently, the system relies on a heuristic-based extraction method, with more advanced LLM-based extraction reserved for future development.