LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
Long-horizon search agents often struggle as they perform complex tasks. As these agents reason, use tools, and gather information, their "working memory" becomes cluttered with redundant or irrelevant data. This accumulation increases costs, slows down performance, and raises the risk of errors or hallucinations. This paper introduces LongSeeker, an agent designed to manage its own context dynamically. By using a new paradigm called Context-ReAct, the agent can actively reshape its memory, keeping only what is necessary to solve a task effectively.
A New Way to Manage Memory
The core of this research is the Context-ReAct paradigm, which allows an agent to treat its memory as an elastic, adjustable space. Instead of simply recording every step of a search, the agent is trained to perform specific "meta-operations" alongside its reasoning and tool use. These operations allow the agent to decide in real-time how to organize its history. By co-generating these decisions with its standard reasoning, the agent learns to maintain a high-quality, relevant context throughout the entire duration of a long-horizon task.
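The co-generation idea can be sketched in a few lines. This is a minimal, hypothetical illustration assuming a policy that emits its reasoning, the next tool action, and a memory meta-operation in one generation; `DummyAgent` and the encoding of an operation as a plain function over the history are assumptions for illustration, not the paper's API.

```python
def context_react_step(agent, history, observation):
    # One generation yields the reasoning, the next tool action,
    # and a context operation (co-generation).
    thought, action, ctx_op = agent.generate(history, observation)
    history = ctx_op(history)  # reshape memory before recording the step
    return history + [(thought, action, observation)], action

class DummyAgent:
    """Stand-in policy: drops the oldest step once the history exceeds 2."""
    def generate(self, history, observation):
        if len(history) > 2:
            ctx_op = lambda h: h[1:]   # acts like a "delete oldest" op
        else:
            ctx_op = lambda h: h       # acts like a "skip" op
        return "reasoning...", "search(query)", ctx_op
```

The key design point is that the memory operation is part of the agent's output, not a separate background process, so the policy learns when to reshape its context as part of ordinary decision-making.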
The Five Atomic Operations
To provide precise control over memory, the researchers developed five specific actions that the agent can perform on its own history:
Skip: Keeps the current context as is when it is already efficient.
Compress: Summarizes a range of past steps into a concise abstract, helping to clear out clutter.
Rollback: Abandons a failed or unproductive line of reasoning and returns to an earlier, more promising state.
Snippet: Extracts a specific, exact piece of information (like a number or code) to ensure accuracy without needing to summarize or rewrite it.
Delete: Completely removes a step that provides no value, reducing noise.
The researchers proved that these operations are "expressively complete," meaning that any context history can be transformed into any target state by composing them.
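The five operations can be sketched as simple transformations over a list of steps. This is an illustrative sketch, not the paper's implementation; the `Step` record and the tag prefixes are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    content: str

def skip(history):
    """Skip: leave the context unchanged."""
    return list(history)

def compress(history, start, end, summary):
    """Compress: replace steps [start, end) with one summary step."""
    return history[:start] + [Step(f"[summary] {summary}")] + history[end:]

def rollback(history, checkpoint):
    """Rollback: discard everything after an earlier, more promising state."""
    return history[:checkpoint]

def snippet(history, index, exact_text):
    """Snippet: keep only an exact extract (a number, a code fragment)."""
    new = list(history)
    new[index] = Step(f"[snippet] {exact_text}")
    return new

def delete(history, index):
    """Delete: remove a step that provides no value."""
    return history[:index] + history[index + 1:]
```

Expressive completeness is intuitive from this view: Delete and Snippet can strip or pin individual steps, Compress can merge any contiguous range, and Rollback can truncate, so compositions of these reach any target history.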
Performance and Results
The team developed LongSeeker by fine-tuning a Qwen3-30B-A3B model on 10,000 synthesized search trajectories. When tested against several benchmarks, including BrowseComp and BrowseComp-ZH, LongSeeker demonstrated significant improvements over existing search agents. For example, it achieved a 61.5% score on BrowseComp, notably outperforming other models like Tongyi DeepResearch and AgentFold. These results suggest that by actively managing its own memory, an agent can achieve more reliable and efficient reasoning, moving context management from a background task to a central part of how AI agents think.