MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
Multi-agent systems (MAS) use multiple specialized AI agents to solve complex tasks, but they often struggle because individual agents are optimized in isolation. This leads to a "local-global misalignment," where an agent performs its specific task well but provides output that confuses the next agent in the chain, causing the entire system to fail. MASPO is a new framework designed to solve this by optimizing prompts across the entire system simultaneously, ensuring that every agent’s instructions are tuned to support the success of the whole group.
Bridging the Gap Between Agents
The core innovation of MASPO is its joint evaluation mechanism. Instead of judging an agent’s prompt based only on its immediate output, MASPO looks at how that output affects the rest of the system. It uses three metrics: "Local Validity" (did the agent follow its instructions?), "Lookahead Potential" (did the output help the next agent in the chain?), and "Global Alignment" (did the system reach the correct final answer?). By evaluating prompts based on their contribution to the entire causal chain, the system can identify and fix coordination breakdowns that traditional methods miss.
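The three-metric evaluation described above can be sketched as a single weighted score. The weights and function name below are illustrative assumptions, not the paper's exact formulation:

```python
def joint_score(local_validity, lookahead_potential, global_alignment,
                weights=(0.3, 0.3, 0.4)):
    """Combine MASPO's three signals into one score for a candidate prompt.

    local_validity      -- did the agent follow its own instructions? (0..1)
    lookahead_potential -- did its output help the next agent?        (0..1)
    global_alignment    -- did the system reach the right final answer? (0..1)

    The weights are hypothetical; any convex combination illustrates the
    idea that a prompt is judged by its whole causal chain, not just
    its immediate output.
    """
    w_local, w_look, w_global = weights
    return (w_local * local_validity
            + w_look * lookahead_potential
            + w_global * global_alignment)
```

Note how a prompt that scores perfectly on local validity but zero on global alignment is ranked below one with the opposite profile, which is exactly the local-global misalignment the framework targets.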
Learning from Failure
MASPO uses a technique called "Misalignment-Aware Sampling" to improve performance. When the system detects a scenario where an agent succeeds locally but the system fails globally, it saves these "misalignment cases" into a memory buffer. During optimization, these failure examples are injected back into the prompt optimizer, forcing it to generate new, improved prompts that specifically address and repair these recurring coordination errors.
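The buffer logic above can be sketched in a few lines. The record fields, buffer size, and method names here are assumptions for illustration, not MASPO's actual interface:

```python
from collections import deque


class MisalignmentBuffer:
    """Keeps cases where an agent succeeded locally but the system failed
    globally, so the optimizer can condition on them later. (Sketch only;
    field names and capacity are hypothetical.)"""

    def __init__(self, maxlen=64):
        self._buffer = deque(maxlen=maxlen)

    def record(self, agent_id, agent_input, agent_output, local_ok, global_ok):
        # Keep only the local-success / global-failure pattern that
        # signals a coordination breakdown rather than a local error.
        if local_ok and not global_ok:
            self._buffer.append({
                "agent": agent_id,
                "input": agent_input,
                "output": agent_output,
            })

    def sample_for_optimizer(self, k=3):
        # Return the most recent misalignment cases, to be injected into
        # the optimizer as concrete failures the new prompt must repair.
        return list(self._buffer)[-k:]
```

A bounded deque is a natural choice here: old failure modes age out once newer prompts have been tuned against them.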
Adaptive Optimization
Because agents in a multi-agent system are functionally linked, changing one agent’s prompt can change the input for others, creating a moving target. To handle this, MASPO uses a coordinate ascent-style scheduling protocol that updates agents in a specific topological order. It also includes a "Beam Refresh" mechanism, which periodically discards outdated performance scores. This ensures that the system doesn't rely on stale data and that every agent is constantly adapting to the current, evolving behaviors of its peers.
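The scheduling protocol above can be sketched as a loop that sweeps agents in topological order and periodically clears its score cache. The `propose` and `evaluate` callbacks, round counts, and beam size are assumptions for illustration:

```python
def optimize_system(agents, propose, evaluate, rounds=4, beam_size=4,
                    refresh_every=2):
    """Coordinate-ascent style prompt optimization (sketch).

    agents   -- agent ids in topological (upstream-first) order
    propose  -- fn(agent_id, best_prompt) -> list of candidate prompts
    evaluate -- fn(agent_id, prompt) -> joint score under the CURRENT
                configuration of all peer agents
    """
    beams = {a: ["initial prompt"] for a in agents}
    cache = {a: {} for a in agents}  # prompt -> cached joint score

    for r in range(rounds):
        for agent in agents:  # update one agent at a time, upstream first
            if r > 0 and r % refresh_every == 0:
                # Beam refresh: peer prompts have changed since these
                # scores were computed, so discard them as stale.
                cache[agent].clear()
            # Pool surviving beam entries with fresh proposals (deduped).
            pool = list(dict.fromkeys(beams[agent]
                                      + propose(agent, beams[agent][0])))
            for p in pool:
                if p not in cache[agent]:
                    cache[agent][p] = evaluate(agent, p)
            beams[agent] = sorted(pool, key=lambda p: cache[agent][p],
                                  reverse=True)[:beam_size]

    return {a: beams[a][0] for a in agents}
```

Re-scoring after a refresh is what keeps each agent adapted to the current behavior of its peers rather than to a snapshot from an earlier round.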
Proven Performance
In extensive testing across six diverse domains—including complex mathematics, reasoning, and code generation—MASPO consistently outperformed existing state-of-the-art prompt optimization methods. By focusing on the interactions between agents rather than just individual performance, the framework achieved an average accuracy improvement of 2.9 points. These results demonstrate that optimizing the collaborative dynamics of a multi-agent system is a more effective path to solving complex, multi-stage problems than tuning agents one by one.