Key Takeaways

  • Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts.
  • To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system.
  • A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents.
  • This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels.

Paper Abstract

Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9 points. We release our code at this https URL.

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
Multi-agent systems (MAS) use multiple specialized AI agents to solve complex tasks, but they often struggle because individual agents are optimized in isolation. This leads to a "local-global misalignment," where an agent performs its specific task well but produces output that confuses the next agent in the chain, causing the entire system to fail. MASPO is a framework designed to address this by optimizing prompts jointly across the entire system, so that each agent’s instructions are tuned to support the success of the whole group.

Bridging the Gap Between Agents

The core innovation of MASPO is its joint evaluation mechanism. Instead of judging an agent’s prompt based only on its immediate output, MASPO looks at how that output affects the rest of the system. It uses three metrics: "Local Validity" (did the agent follow its instructions?), "Lookahead Potential" (did the output help the next agent in the chain?), and "Global Alignment" (did the system reach the correct final answer?). By evaluating prompts based on their contribution to the entire causal chain, the system can identify and fix coordination breakdowns that traditional methods miss.
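
As a rough illustration, the three signals could be folded into one score per candidate prompt. The sketch below is not MASPO's actual scoring rule: the `JointScore` name, the weighted-average form, and the specific weights are all assumptions made for this example, and each signal is assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointScore:
    """Three per-prompt signals, each assumed to lie in [0, 1]."""
    local_validity: float       # did the agent follow its own instructions?
    lookahead_potential: float  # did the output help the successor agent?
    global_alignment: float     # did the system as a whole succeed?

    def combined(self, w_local: float = 0.2, w_lookahead: float = 0.4,
                 w_global: float = 0.4) -> float:
        """Weighted average of the signals (the weights are hypothetical)."""
        total = w_local + w_lookahead + w_global
        weighted = (w_local * self.local_validity
                    + w_lookahead * self.lookahead_potential
                    + w_global * self.global_alignment)
        return weighted / total


# A prompt that looks perfect locally but derails the rest of the chain...
isolated = JointScore(local_validity=1.0, lookahead_potential=0.2,
                      global_alignment=0.0)
# ...versus one that is slightly weaker locally but helps downstream agents.
cooperative = JointScore(local_validity=0.8, lookahead_potential=0.9,
                         global_alignment=1.0)

ranked = sorted([isolated, cooperative], key=JointScore.combined, reverse=True)
```

Under these assumed weights, the cooperative prompt ranks first even though its local validity is lower, which is the behavior the joint evaluation is meant to encourage.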

Learning from Failure

MASPO uses a technique called "Misalignment-Aware Sampling" to improve performance. When the system detects a scenario where an agent succeeds locally but the system fails globally, it saves these "misalignment cases" into a memory buffer. During optimization, these failure examples are fed back to the prompt optimizer, which pushes it to generate new prompts that specifically address these recurring coordination errors.
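
The buffer idea can be sketched in a few lines. Everything named here (`Trace`, `MisalignmentBuffer`, `build_optimizer_prompt`, the capacity, and the text template) is hypothetical and chosen for illustration; the summary does not describe how MASPO stores or formats these cases.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trace:
    agent: str
    agent_input: str
    agent_output: str
    local_ok: bool   # the agent satisfied its own instructions
    global_ok: bool  # the system-level outcome was judged successful


class MisalignmentBuffer:
    """Keeps only traces where local success coincided with global failure."""

    def __init__(self, capacity: int = 100) -> None:
        # Bounded memory: the oldest cases are dropped first (an assumption).
        self._cases = deque(maxlen=capacity)

    def observe(self, trace: Trace) -> None:
        if trace.local_ok and not trace.global_ok:
            self._cases.append(trace)

    def sample(self, k: int, rng: random.Random) -> List[Trace]:
        return rng.sample(list(self._cases), min(k, len(self._cases)))

    def __len__(self) -> int:
        return len(self._cases)


def build_optimizer_prompt(current_prompt: str, cases: List[Trace]) -> str:
    """Append sampled failure cases to the request sent to the optimizer."""
    lines = ["Current prompt:", current_prompt, "",
             "Cases where this agent looked correct but the system failed:"]
    for case in cases:
        lines.append(f"- input: {case.agent_input!r} -> output: {case.agent_output!r}")
    lines.append("Rewrite the prompt so these cases no longer break downstream agents.")
    return "\n".join(lines)


buffer = MisalignmentBuffer(capacity=2)
buffer.observe(Trace("planner", "q1", "a1", local_ok=True, global_ok=True))    # ignored
buffer.observe(Trace("planner", "q2", "a2", local_ok=True, global_ok=False))   # stored
buffer.observe(Trace("planner", "q3", "a3", local_ok=False, global_ok=False))  # ignored

request = build_optimizer_prompt("You are a planner.",
                                 buffer.sample(2, random.Random(0)))
```

Only the second trace is stored, because it is the only one that pairs a local success with a global failure.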

Adaptive Optimization

Because agents in a multi-agent system are functionally linked, changing one agent’s prompt can change the input for others, creating a moving target. To handle this, MASPO uses a coordinate ascent-style scheduling protocol that updates agents in a specific topological order. It also includes a "Beam Refresh" mechanism, which periodically discards outdated performance scores. This ensures that the system doesn't rely on stale data and that every agent is constantly adapting to the current, evolving behaviors of its peers.
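
The scheduling idea can be sketched as a loop that visits agents in topological order, keeps a small beam of candidate prompts per agent, and clears cached scores so candidates are re-evaluated against the peers' current prompts. This is a toy sketch under stated assumptions, not MASPO's implementation: `coordinate_ascent`, `propose`, `score`, `refresh_every`, and the two-agent example are all invented for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set, Tuple

Prompts = Dict[str, str]


def coordinate_ascent(
    predecessors: Dict[str, Set[str]],
    prompts: Prompts,
    propose: Callable[[str, str], List[str]],
    score: Callable[[str, str, Prompts], float],
    rounds: int = 3,
    beam_width: int = 2,
    refresh_every: int = 1,
) -> Prompts:
    """Update one agent at a time, upstream agents first."""
    order = list(TopologicalSorter(predecessors).static_order())
    best = dict(prompts)
    beams: Dict[str, List[str]] = {a: [p] for a, p in best.items()}
    cache: Dict[Tuple[str, str], float] = {}

    for round_index in range(rounds):
        if round_index % refresh_every == 0:
            cache.clear()  # beam refresh: drop scores measured against old peers
        for agent in order:
            candidates = list(beams[agent])
            for parent in beams[agent]:
                candidates.extend(propose(agent, parent))
            candidates = list(dict.fromkeys(candidates))  # dedupe, keep order
            for cand in candidates:
                if (agent, cand) not in cache:
                    cache[(agent, cand)] = score(agent, cand, best)
            candidates.sort(key=lambda c: cache[(agent, c)], reverse=True)
            beams[agent] = candidates[:beam_width]
            best[agent] = beams[agent][0]
    return best


# Toy setup: the solver's ideal prompt depends on the planner's current prompt.
def propose(agent: str, prompt: str) -> List[str]:
    return [prompt + "!"]


def score(agent: str, prompt: str, current: Prompts) -> float:
    target = 2 if agent == "planner" else current["planner"].count("!")
    return -abs(prompt.count("!") - target)


result = coordinate_ascent(
    predecessors={"planner": set(), "solver": {"planner"}},
    prompts={"planner": "plan", "solver": "solve"},
    propose=propose,
    score=score,
)
```

In this toy, the solver's score is defined relative to the planner's current prompt, so a score cached in an earlier round would be stale once the planner changes. Clearing the cache is the stand-in here for the "Beam Refresh" step.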

Proven Performance

In testing across six diverse tasks, including complex mathematics, reasoning, and code generation, MASPO consistently outperformed existing state-of-the-art prompt optimization methods, with an average accuracy improvement of 2.9 points. The framework attributes this gain to scoring prompts by how agents interact rather than by individual performance alone. These results suggest that optimizing the collaborative dynamics of a multi-agent system is a more effective path to solving complex, multi-stage problems than tuning agents one by one.
