Key Takeaways

  • Large language model (LLM) systems often struggle to balance performance with cost.
  • Typically, these systems either route every query to a single model—which can be wasteful for simple tasks—or use complex, hand-engineered workflows that are difficult to optimize.
  • Uno-Orchestra introduces a unified approach that learns to decide when a task needs to be broken down into smaller pieces and which specific model or tool is best suited to handle each piece.
Paper Abstract

Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL trajectories grounded in real worker interactions. Against 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.

Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation
Large language model (LLM) systems often struggle to balance performance with cost. Typically, these systems either send every query, whole, to a single model, which can be wasteful for simple tasks, or rely on complex, hand-engineered workflows that are difficult to optimize. In both cases, how deeply to decompose a task, which worker to use, and how much inference budget to spend are not optimized together. Uno-Orchestra introduces a unified approach that learns when a task needs to be broken down into smaller pieces and which specific model or tool is best suited to handle each piece. By training a single policy to handle task decomposition and routing together, the system achieves higher accuracy while substantially reducing the cost per query.

A Unified Approach to Task Delegation

The core innovation of Uno-Orchestra is the integration of task planning and worker assignment into a single causal language model. Instead of using separate modules for planning and routing, the model emits a plan and assigns an admissible (model, primitive) pair to each subtask in a single output. A "primitive" is the kind of action a worker performs, such as a cognitive operation, a tool call, or a skill invocation. Because planning and routing happen in one model, there is no hand-off between a separate planner and router, and the system can be "selective": it performs multi-step orchestration only when a task actually requires it, so simple queries stay fast and inexpensive.
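To make the (model, primitive) assignment concrete, here is a minimal Python sketch of what a plan with per-subtask assignments might look like, assuming a hypothetical admissibility table and hypothetical relative costs. The names `Subtask`, `ADMISSIBLE`, `COST`, `validate_plan`, `is_delegated`, and `plan_cost`, as well as the worker identifiers, are illustrative assumptions rather than the paper's interface; the sketch only shows how an admissibility check and selective (one-step versus multi-step) plans fit together.

```python
from dataclasses import dataclass
from enum import Enum


class Primitive(Enum):
    """Kinds of worker action mentioned in the summary."""
    COGNITIVE = "cognitive"   # e.g. reasoning or answering directly
    TOOL = "tool"             # e.g. calling a code interpreter or search tool
    SKILL = "skill"           # e.g. invoking a packaged skill


@dataclass(frozen=True)
class Subtask:
    description: str
    model: str            # hypothetical worker-model identifier
    primitive: Primitive


# Hypothetical admissibility table: which primitives each worker model may run.
ADMISSIBLE: dict[str, set[Primitive]] = {
    "small-lm": {Primitive.COGNITIVE},
    "large-lm": {Primitive.COGNITIVE, Primitive.SKILL},
    "code-lm": {Primitive.COGNITIVE, Primitive.TOOL},
}

# Hypothetical relative cost per call for each worker model.
COST: dict[str, float] = {"small-lm": 1.0, "large-lm": 10.0, "code-lm": 4.0}


def validate_plan(plan: list[Subtask]) -> list[Subtask]:
    """Raise if any subtask uses a (model, primitive) pair that is not admissible."""
    for step in plan:
        allowed = ADMISSIBLE.get(step.model, set())
        if step.primitive not in allowed:
            raise ValueError(
                f"inadmissible pair: ({step.model}, {step.primitive.value})"
            )
    return plan


def is_delegated(plan: list[Subtask]) -> bool:
    """A one-step plan means the query is handled directly, without decomposition."""
    return len(plan) > 1


def plan_cost(plan: list[Subtask]) -> float:
    """Total hypothetical cost of executing every subtask in the plan."""
    return sum(COST[step.model] for step in plan)
```

For example, a plan with a single `Subtask("answer", "small-lm", Primitive.COGNITIVE)` is answered directly at the lowest cost, while a plan that pairs a `code-lm` tool call with a `large-lm` summary is delegated and costs more; selective delegation aims to pay that extra cost only when the task needs it.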

Training Through Real-World Interaction

The system is trained in two stages. First, it undergoes supervised fine-tuning on a "verifier-gated" curriculum: it learns from teacher-distilled trajectories that have been verified for correctness, so the training examples are of high quality. Second, the model is refined with Agentic-GRPO, a reinforcement learning objective designed for multi-turn agentic tasks, in which feedback usually arrives only at the end of an episode. By adding intermediate rewards and structured credit assignment, Agentic-GRPO helps the model learn which decisions throughout a task contributed to the outcome, rather than learning only from whether the final result was correct.
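The summary does not spell out the Agentic-GRPO objective. As a rough reference point, the sketch below shows the group-relative advantage used in standard GRPO (Group Relative Policy Optimization), where each rollout's return is normalized against the other rollouts sampled for the same query. It combines this with an assumed reward shaping that adds down-weighted intermediate rewards to the terminal reward. The shaping rule, `step_weight`, and all names here are hypothetical; Agentic-GRPO's actual intermediate rewards and credit assignment may differ.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Trajectory:
    step_rewards: list[float]   # hypothetical intermediate (per-turn) rewards
    final_reward: float         # e.g. 1.0 if a verifier accepts the final answer


def shaped_return(traj: Trajectory, step_weight: float = 0.1) -> float:
    """Assumed shaping: terminal reward plus a down-weighted sum of intermediate rewards."""
    return traj.final_reward + step_weight * sum(traj.step_rewards)


def group_relative_advantages(group: list[Trajectory], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style normalization within a group of rollouts for one query.

    Each rollout's advantage is its return minus the group mean, divided by the
    group's (population) standard deviation; eps avoids division by zero.
    """
    returns = [shaped_return(t) for t in group]
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return [(r - mean) / (std + eps) for r in returns]
```

If every rollout in a group earns the same return, all advantages are zero and the group contributes no learning signal. In this toy shaping, intermediate rewards can separate two rollouts that both reach a correct final answer, which is one simple way to give earlier decisions their own credit.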

Significant Gains in Efficiency and Accuracy

When evaluated against 22 baselines on a suite of 13 benchmarks spanning math, code, knowledge, long-context, and agentic tool use, Uno-Orchestra shows a clear advantage over existing methods. It reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, while operating at roughly an order of magnitude lower cost per query. These results suggest that a learned, unified policy can outperform rigid, hand-engineered orchestration on both accuracy and efficiency.
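Macro pass@1 averages each benchmark's pass@1 with equal weight, so a small benchmark counts as much as a large one. The sketch below contrasts it with a micro (pooled) average, assuming a single attempt per problem; the paper's exact sampling protocol is not described in this summary, and the function names are illustrative.

```python
def pass_at_1(outcomes: list[bool]) -> float:
    """Fraction of problems solved with a single attempt each."""
    return sum(outcomes) / len(outcomes)


def macro_pass_at_1(per_benchmark: dict[str, list[bool]]) -> float:
    """Unweighted mean of per-benchmark pass@1 scores: every benchmark counts equally."""
    scores = [pass_at_1(outcomes) for outcomes in per_benchmark.values()]
    return sum(scores) / len(scores)


def micro_pass_at_1(per_benchmark: dict[str, list[bool]]) -> float:
    """Pooled pass@1 over all problems: larger benchmarks carry more weight."""
    pooled = [o for outcomes in per_benchmark.values() for o in outcomes]
    return pass_at_1(pooled)
```

For instance, with toy outcomes (not data from the paper) where one benchmark solves 4 of 4 problems and another solves 1 of 2, the macro score is 0.75 while the micro score is 5/6, about 0.83.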
