DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Key Takeaways

This technical report presents DuMate-DeepResearch, a multi-agent DR framework built on the Qianfan Agent Foundry.
Deep Research (DR) systems are designed to handle complex, open-ended inquiries by autonomously planning, gathering evidence, and writing detailed reports.
However, existing systems often struggle with long-term planning, the risk of hallucination, and a lack of transparency in how they reach their conclusions.

Paper Abstract

Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice, however, current DR systems are constrained by four interrelated limitations: long-horizon planning over an underspecified scope, the bottleneck of decomposing and scheduling such tasks within a single agent, hallucination risk in long-form synthesis, and limited process auditability. This technical report presents DuMate-DeepResearch, a multi-agent DR framework built on the Qianfan Agent Foundry. The framework decouples the Agent Core, which handles task understanding, planning, and scheduling, from an extensible Tool Ecosystem for retrieval, evidence acquisition, and report rendering, making every intermediate decision and tool invocation explicitly traceable. Building on this infrastructure, DuMate-DeepResearch further introduces three mechanisms: (i) a graph-based dynamic planning strategy expands the research roadmap coarse-to-fine and continuously revises it through reflection, re-planning, backtracking, and parallel branching; (ii) a recursive two-level execution design delegates each complex search sub-task to an inner Search Agent that runs its own planning loop, isolating noisy retrieval and stabilizing long-horizon execution; (iii) a rubric-based test-time optimization mechanism dynamically generates task-specific quality criteria and uses them as live reasoning scaffolds for evidence-grounded synthesis and adaptive stopping. Across two deep research benchmarks, DuMate-DeepResearch establishes new state-of-the-art results: the best overall score (58.03%) on DeepResearch Bench, and the best overall score (61.95%) on DeepResearch Bench II while ranking first in information recall and analysis.

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning
Deep Research (DR) systems are designed to handle complex, open-ended inquiries by autonomously planning, gathering evidence, and writing detailed reports. However, existing systems often struggle with long-term planning, the risk of hallucination, and a lack of transparency in how they reach their conclusions. DuMate-DeepResearch introduces a multi-agent framework built on the Qianfan Agent Foundry that addresses these issues by decoupling the system's "brain" (the Agent Core) from its "hands" (the Tool Ecosystem). This structure allows the system to remain highly organized, traceable, and capable of handling complex research tasks through a recursive, multi-layered approach.

A Decoupled and Transparent Architecture

The framework's central design choice is the separation of the Agent Core, which manages task understanding, planning, and scheduling, from the Tool Ecosystem, which handles retrieval, evidence acquisition, and report rendering. Because the two are decoupled, every intermediate decision and every tool invocation is explicitly traceable. Users can therefore audit the entire research process, not just the final output, which matters most for high-stakes research.
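The separation described above can be illustrated with a minimal sketch. The report does not publish the framework's interfaces, so every name here (`AgentCore`, `ToolEcosystem`, `TraceEvent`, `fake_search`) is a hypothetical stand-in. The sketch only shows the general idea: tools live in their own registry, and the core logs each decision and tool call as an inspectable record.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TraceEvent:
    """One auditable record: a planning decision or a tool invocation."""
    step: int
    kind: str  # "decision" or "tool_call"
    name: str
    payload: Dict[str, Any]


@dataclass
class ToolEcosystem:
    """Hypothetical tool registry, kept separate from the planning logic."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn


@dataclass
class AgentCore:
    """Hypothetical core: decides what to do and logs everything it does."""
    ecosystem: ToolEcosystem
    trace: List[TraceEvent] = field(default_factory=list)

    def _log(self, kind: str, name: str, payload: Dict[str, Any]) -> None:
        self.trace.append(TraceEvent(len(self.trace), kind, name, payload))

    def decide(self, name: str, reason: str) -> None:
        """Record a planning decision together with its stated reason."""
        self._log("decision", name, {"reason": reason})

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        """Invoke a registered tool and record its arguments and result."""
        result = self.ecosystem.tools[name](**kwargs)
        self._log("tool_call", name, {"args": kwargs, "result": result})
        return result


def fake_search(query: str) -> List[str]:
    """Stand-in for a retrieval tool; returns a canned snippet."""
    return [f"snippet about {query}"]


ecosystem = ToolEcosystem()
ecosystem.register("search", fake_search)
core = AgentCore(ecosystem)
core.decide("plan", reason="query needs background evidence")
snippets = core.call_tool("search", query="solid-state batteries")

# The trace can be inspected after the run, independently of the final report.
for event in core.trace:
    print(event.step, event.kind, event.name)
```

Keeping the trace as plain data means it can be stored or reviewed without re-running the agent. That is one plausible way to obtain the auditability the report describes.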

Dynamic Planning and Recursive Execution

To manage the complexity of long-horizon research, the system employs two key strategies. First, a graph-based dynamic planner treats the research roadmap as an evolving graph that is expanded from coarse to fine. Instead of following a rigid, step-by-step path, the system can reflect on its progress, re-plan, backtrack when a search hits a dead end, and branch out into parallel lines of inquiry. Second, a recursive two-level execution design delegates each complex search sub-task to an inner Search Agent. This inner agent runs its own planning loop, which isolates noisy retrieval from the high-level research strategy and stabilizes long-horizon execution.
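A toy sketch can make the two strategies concrete. It is not the authors' implementation: `PlanNode`, `inner_search_agent`, `execute`, the retry queries, and the `max_depth` budget are all assumptions made for illustration. Branches also run sequentially here, whereas the report describes parallel branching. The outer loop walks a plan graph and re-plans at dead ends, while the inner loop keeps its failed queries to itself and returns only a clean result.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanNode:
    """One node in the research roadmap."""
    goal: str
    status: str = "pending"  # pending | done | failed
    children: List["PlanNode"] = field(default_factory=list)
    finding: Optional[str] = None


def inner_search_agent(goal: str,
                       search: Callable[[str], Optional[str]]) -> Optional[str]:
    """Inner loop: try a few query reformulations, return only the first hit.

    Failed attempts stay local to this function, so the outer plan never
    sees the noisy intermediate retrievals.
    """
    for query in (goal, f"{goal} overview", f"{goal} survey"):
        hit = search(query)
        if hit is not None:
            return hit
    return None


def execute(node: PlanNode,
            search: Callable[[str], Optional[str]],
            replan: Callable[[str], List[str]],
            depth: int = 0,
            max_depth: int = 2) -> None:
    """Outer loop: delegate leaves to the inner agent, re-plan on dead ends."""
    if not node.children:
        node.finding = inner_search_agent(node.goal, search)
        if node.finding is not None:
            node.status = "done"
            return
        # Dead end: backtrack by asking the planner for alternative sub-goals.
        if depth < max_depth:
            node.children = [PlanNode(g) for g in replan(node.goal)]
        if not node.children:
            node.status = "failed"
            return
    for child in node.children:
        execute(child, search, replan, depth + 1, max_depth)
    # Simplifying assumption: one successful branch is enough for the parent.
    succeeded = any(c.status == "done" for c in node.children)
    node.status = "done" if succeeded else "failed"


# Toy corpus and planner, purely for demonstration.
CORPUS = {
    "battery cost trends": "costs fell over the last decade",
    "sodium-ion cells": "sodium-ion is an emerging chemistry",
}


def toy_search(query: str) -> Optional[str]:
    return CORPUS.get(query)


def toy_replan(goal: str) -> List[str]:
    return ["sodium-ion cells"] if goal == "battery alternatives" else []


root = PlanNode("battery market", children=[
    PlanNode("battery cost trends"),
    PlanNode("battery alternatives"),
])
execute(root, toy_search, toy_replan)
```

In this run the second branch finds nothing under its original goal, so the planner adds an alternative child node, and only that child's finding reaches the outer graph. The `max_depth` budget is a hypothetical guard against endless re-planning.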

Rubric-Grounded Reasoning

To improve the quality and factual grounding of the final report, the system uses a rubric-based test-time optimization mechanism. It dynamically generates task-specific quality criteria and uses them as live "reasoning scaffolds" that guide evidence gathering, ground the synthesis in retrieved evidence, and signal when enough information has been collected to stop searching. Grounding the report in these rubrics is intended to reduce the risk of hallucination and keep the final synthesis tied to the evidence the system has retrieved.
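Rubric-driven adaptive stopping can be sketched as a simple coverage loop. This is an illustration under stated assumptions, not the mechanism from the report: the `Criterion` structure, the keyword-matching check, the `threshold` value, and the `fetch` callback are all hypothetical. A real system would presumably generate the rubric with a model and judge each criterion with a model rather than with keywords.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One task-specific quality criterion (hypothetical structure)."""
    description: str
    keywords: List[str]


def is_satisfied(criterion: Criterion, evidence: List[str]) -> bool:
    """Toy check: met if any evidence snippet mentions one of the keywords."""
    return any(
        keyword.lower() in snippet.lower()
        for snippet in evidence
        for keyword in criterion.keywords
    )


def coverage(rubric: List[Criterion], evidence: List[str]) -> float:
    """Fraction of rubric criteria currently supported by evidence."""
    if not rubric:
        return 1.0
    return sum(is_satisfied(c, evidence) for c in rubric) / len(rubric)


def research_loop(rubric: List[Criterion],
                  fetch: Callable[[Criterion], str],
                  threshold: float = 0.8,
                  max_rounds: int = 5) -> List[str]:
    """Gather evidence for unmet criteria until coverage reaches the threshold."""
    evidence: List[str] = []
    for _ in range(max_rounds):
        if coverage(rubric, evidence) >= threshold:
            break  # adaptive stop: the rubric is sufficiently covered
        gaps = [c for c in rubric if not is_satisfied(c, evidence)]
        if not gaps:
            break
        # The first unmet criterion steers the next retrieval step.
        evidence.append(fetch(gaps[0]))
    return evidence


rubric = [
    Criterion("covers cost", ["cost"]),
    Criterion("covers safety", ["safety"]),
]


def toy_fetch(criterion: Criterion) -> str:
    """Stand-in retrieval: returns a snippet about the criterion's first keyword."""
    return f"report on {criterion.keywords[0]}"


evidence = research_loop(rubric, toy_fetch, threshold=1.0)
print(evidence, coverage(rubric, evidence))
```

The same rubric serves two roles in this sketch: unmet criteria choose what to retrieve next, and the coverage score decides when to stop.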

Performance and Results

DuMate-DeepResearch was evaluated on two deep research benchmarks. It reports new state-of-the-art results, with the best overall score on both DeepResearch Bench (58.03%) and DeepResearch Bench II (61.95%). On DeepResearch Bench II it also ranked first in information recall and analysis. These results suggest that the combination of auditable infrastructure, adaptive planning, and rubric-guided reasoning yields higher-quality, more reliable research outcomes.