Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation
Large language model (LLM) systems often struggle to balance performance with cost. They typically either route every query to a single model, which is wasteful for simple tasks, or rely on complex, hand-engineered workflows that are difficult to optimize. Uno-Orchestra takes a unified approach: it learns to decide when a task needs to be broken into smaller pieces and which model or tool is best suited to handle each piece. By training a single policy to handle task decomposition and routing jointly, the system achieves higher accuracy while significantly reducing the cost per query.
A Unified Approach to Task Delegation
The core innovation of Uno-Orchestra is the integration of task planning and worker assignment into a single causal language model. Instead of using separate modules for planning and routing, the system emits a plan and assigns a specific (model, primitive) pair to each subtask in a single pass. A "primitive" here is the specific action a worker performs, such as a cognitive operation, a tool call, or a skill invocation. This design eliminates redundant data passing between modules and lets the system be "selective": it performs complex, multi-step orchestration only when the task actually requires it, keeping simple queries fast and inexpensive.
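To make the (model, primitive) assignment concrete, here is a minimal sketch of what a plan could look like as a data structure. It is an illustration only: the class names, the worker names ("small-model", "large-model"), the primitive labels, and the per-call prices are all hypothetical and do not come from the Uno-Orchestra system itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Subtask:
    description: str
    model: str      # hypothetical worker name
    primitive: str  # hypothetical label, e.g. "reason", "tool_call", "skill"


@dataclass(frozen=True)
class Plan:
    subtasks: List[Subtask]

    @property
    def is_direct(self) -> bool:
        # A single-subtask plan means the query was not decomposed.
        return len(self.subtasks) == 1


def plan_cost(plan: Plan, price_per_call: Dict[str, float]) -> float:
    # Toy cost model (an assumption): one flat price per worker call.
    return sum(price_per_call[s.model] for s in plan.subtasks)


# Hypothetical prices in arbitrary units.
PRICES = {"small-model": 1.0, "large-model": 10.0}

# A simple query is answered directly by one inexpensive worker.
simple = Plan([Subtask("What is 2 + 2?", "small-model", "reason")])

# A harder query is decomposed, and each piece gets its own pair.
complex_ = Plan([
    Subtask("Find the population figures", "small-model", "tool_call"),
    Subtask("Compute the growth rate", "small-model", "reason"),
    Subtask("Write the final explanation", "large-model", "reason"),
])
```

In this toy setup, the decomposed plan costs more than the direct one, which is why a selective policy would reserve decomposition for queries that need it.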
Training Through Real-World Interaction
The system is trained in two stages. First, it undergoes supervised fine-tuning with a "verifier-gated" curriculum: teacher-distilled trajectories are kept only after they have been verified for correctness, so the model learns from high-quality examples. Second, the model is refined with a technique called Agentic-GRPO. This reinforcement learning objective is designed for multi-turn agentic tasks, where feedback otherwise arrives only at the end of a process. By providing intermediate rewards and structured credit assignment, it teaches the model to make better decisions throughout the entire lifecycle of a task, rather than focusing only on the final result.
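The two stages can be sketched in a few lines. The first function shows verifier gating as a plain filter over candidate trajectories. The second shows the group-relative advantage used in standard GRPO, where each sampled rollout is scored against the mean and standard deviation of its own group. This is not the Agentic-GRPO objective itself, whose intermediate rewards and credit assignment are not specified here; the function names and the `eps` parameter are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def verifier_gate(trajectories: Sequence[T],
                  verifier: Callable[[T], bool]) -> List[T]:
    # Keep only the trajectories that the verifier accepts.
    return [t for t in trajectories if verifier(t)]


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    # Standard GRPO-style normalization within one group of rollouts
    # sampled for the same prompt: (reward - group mean) / group std.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumption) avoids division by zero when all rewards match.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, a group with rewards `[1.0, 0.0, 0.0, 1.0]` yields positive advantages for the two successful rollouts and negative ones for the others, while a group in which every rollout receives the same reward yields all-zero advantages and therefore no learning signal.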
Significant Gains in Efficiency and Accuracy
When tested across a suite of 13 benchmarks covering math, coding, knowledge retrieval, and tool use, Uno-Orchestra shows a clear advantage over existing methods. It achieves a 77.0% macro pass@1 rate, approximately 16% higher than the strongest workflow baseline. Perhaps most notably, it does so at roughly an order of magnitude lower cost per query. By navigating the trade-off between accuracy and efficiency, Uno-Orchestra indicates that a learned, unified policy can outperform rigid, hand-engineered orchestration systems.
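As a reading aid for the headline metric, the sketch below computes a macro pass@1 under the usual interpretation of "macro": the pass rate is computed per benchmark and then averaged with equal weight, so a small benchmark counts as much as a large one. That interpretation is an assumption, and the benchmark names and outcomes are made-up toy data, not results from the evaluation.

```python
from typing import Dict, List


def macro_pass_at_1(results: Dict[str, List[bool]]) -> float:
    # Pass rate per benchmark: fraction of queries solved on the first try.
    per_benchmark = [sum(outcomes) / len(outcomes)
                     for outcomes in results.values()]
    # Unweighted mean across benchmarks (the "macro" average).
    return sum(per_benchmark) / len(per_benchmark)


# Toy data: two hypothetical benchmarks of different sizes.
toy = {
    "math": [True, True, False, True],  # 3 of 4 solved
    "coding": [True, False],            # 1 of 2 solved
}
```

On this toy data the macro average is the mean of 0.75 and 0.5, whereas pooling all six queries together would give 4 of 6. The two numbers differ whenever benchmarks vary in size.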