AI Research

SearchSwarm: Towards Delegation Intelligence in Age... | AI Research

Key Takeaways

SearchSwarm addresses a fundamental challenge in artificial intelligence: how to enable large language models (LLMs) to perform complex, long-horizon researc...
Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite.
Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget.
However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow.
To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task.

Paper AbstractExpand

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent's workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.

SearchSwarm addresses a fundamental challenge in artificial intelligence: how to enable large language models (LLMs) to perform complex, long-horizon research tasks without exceeding their finite memory, or "context window." When an AI attempts to solve a massive, multi-step problem, it often runs out of space to track its own reasoning. This paper introduces "delegation intelligence," a method where a main agent acts as a project manager, breaking down large tasks into smaller, manageable pieces and assigning them to subagents. By receiving only summarized reports back from these subagents, the main agent can maintain focus on the overall goal without getting overwhelmed by excessive data.

The Power of Delegation

The core of the SearchSwarm approach is a specialized "harness"—a set of instructions and tools that guides the model on how to manage its workload. Instead of trying to do everything at once, the main agent uses a tool called call_sub_agent to dispatch specific research tasks. The harness ensures the main agent provides a "comprehensive brief" to the subagent, explaining not just what to do, but why it matters and what has already been discovered. This prevents the subagent from wasting time on redundant searches. Once the subagent finishes, it returns a condensed report with verified citations, allowing the main agent to synthesize the final answer while keeping its own memory clear for high-level decision-making.

Training for Intelligence

The researchers found that simply giving a model a delegation tool is not enough; the model must be trained to understand when and how to delegate. To achieve this, they created a dataset of high-quality "delegation trajectories." By using their harness to guide models through research tasks, they recorded successful examples of task decomposition and result integration. They then used this data to fine-tune the model, effectively teaching it to internalize the logic of a project manager. This training process ensures the model learns to prioritize its own limited "attention" for complex reasoning while offloading repetitive data gathering to subagents.

Performance and Impact

The resulting model, SearchSwarm-30B-A3B, demonstrates that a smaller, more efficient model can compete with much larger systems when equipped with the right delegation strategy. It achieved top-tier results on several research-heavy benchmarks, such as BrowseComp and GAIA, outperforming other models of a similar size. The researchers noted that without this specific training, models often fail to use delegation tools effectively, even when they are available. By open-sourcing their harness, training data, and model weights, the team aims to provide a foundation for future research into how AI agents can better coordinate to solve increasingly complex, real-world problems.

Comments (0)

No comments yet

Be the first to share your thoughts!