Paper Abstract

Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. Therefore we propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24, 20.2% on GPQA-Diamond, and consistent gains across reasoning benchmarks.

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
Multi-agent systems, in which several AI agents collaborate on a complex problem, have become a standard way to improve reasoning in large language models. Typically these agents communicate by writing text to one another, and that text acts as a fixed interface. This paper introduces DiffMAS, a framework that replaces text messages with a latent (internal) channel: agents share their key-value (KV) caches instead of generated text. Because this channel is continuous, the whole multi-agent pipeline becomes differentiable and can be trained end to end as a single, unified model.

Moving Beyond Textual Communication

In most multi-agent systems, agents must translate their internal reasoning into human-readable text before the next agent can use it. This creates a bottleneck: because each message is a sequence of discrete tokens, the system cannot use gradients to optimize how information is shared. DiffMAS removes this barrier by using the model's KV cache as the communication medium. The KV cache holds the attention keys and values the model has computed at every layer for the tokens it has processed, so it is a continuous representation of the agent's intermediate state. A downstream agent can attend over these cached entries directly, and because the path from one agent's cache to the next agent's output is differentiable, gradient-based training can optimize how agents encode and interpret information across the entire chain of interaction.
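To make the idea concrete, here is a toy, pure-Python sketch (not the DiffMAS implementation) of one agent reading another agent's KV cache. It assumes a single attention head and made-up two-dimensional key and value vectors; a real model keeps separate caches for every layer and head. The receiver prepends the sender's cached entries to its own before attending, and a finite-difference probe shows that nudging one of the sender's values changes the receiver's output smoothly, which is what allows a loss on the receiver to send gradient back to the sender.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product scores of one query against every cached key, normalized by softmax."""
    scale = math.sqrt(len(query))
    return softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])


def attend(query, keys, values):
    """Single-head attention: a weighted sum of cached value vectors."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Hypothetical toy caches: one (key, value) row per cached token.
SENDER_KEYS = [[1.0, 0.0], [0.0, 1.0]]
SENDER_VALUES = [[2.0, 0.0], [0.0, 2.0]]
RECEIVER_KEYS = [[0.5, 0.5]]
RECEIVER_VALUES = [[1.0, 1.0]]
QUERY = [1.0, 0.0]  # the receiver's current query vector


def receiver_output(sender_values):
    # Latent communication: the receiver attends over the sender's cache prepended to its own.
    keys = SENDER_KEYS + RECEIVER_KEYS
    values = sender_values + RECEIVER_VALUES
    return attend(QUERY, keys, values)


# Finite-difference probe: perturb one value in the sender's cache and watch the receiver's output.
eps = 1e-6
base = receiver_output(SENDER_VALUES)
nudged_values = [row[:] for row in SENDER_VALUES]
nudged_values[0][0] += eps
grad_estimate = (receiver_output(nudged_values)[0] - base[0]) / eps

# The output is linear in the cached values, so this gradient equals the attention weight on that entry.
weight_on_entry = attention_weights(QUERY, SENDER_KEYS + RECEIVER_KEYS)[0]
print(f"gradient ~ {grad_estimate:.4f}, attention weight = {weight_on_entry:.4f}")
```

The estimated gradient matches the attention weight the receiver places on the sender's first entry. With a text message, the same path would run through a discrete token choice, which provides no useful gradient.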

How DiffMAS Works

The DiffMAS framework operates in two stages. In the first stage, a series of agents runs sequentially, each building on a shared, growing latent trace of KV states. Instead of overwriting earlier information, each agent appends its own contribution to the trace, so the trace becomes a cumulative record of the reasoning so far. In the second stage, the final agent conditions on the accumulated trace to complete the reasoning and generate an answer. Training applies parameter-efficient supervised fine-tuning to this whole process: the loss on the final output is backpropagated through the trace, so the agents learn both what to write into their shared latent memory and how to read what the others wrote.
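A structural sketch of the two stages follows, under loud assumptions: `LatentTrace`, `run_stage_one`, `answer_only_nll`, and the toy `planner` and `solver` agents are hypothetical names; each latent entry is a pair of numbers standing in for real KV tensors; and the supervised loss is assumed to be the usual next-token negative log-likelihood masked to the final agent's answer tokens. The point is the append-only trace: each agent sees everything written before it, and the final agent sees every contribution.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LatentTrace:
    """Hypothetical append-only trace of (agent_id, key, value) entries shared across agents."""
    entries: list = field(default_factory=list)

    def append(self, agent_id, key, value):
        self.entries.append((agent_id, key, value))


def run_stage_one(agents, trace, steps_per_agent):
    """Stage 1: agents run in order; each reads the whole trace so far and appends its own entries."""
    for agent_id, make_entry in agents:
        for step in range(steps_per_agent):
            key, value = make_entry(trace.entries, step)
            trace.append(agent_id, key, value)  # append, never overwrite
    return trace


def answer_only_nll(token_log_probs, is_answer):
    """Stage 2 training signal (assumed): mean negative log-likelihood over answer tokens only."""
    picked = [-lp for lp, keep in zip(token_log_probs, is_answer) if keep]
    return sum(picked) / len(picked)


# Toy agents: each "latent" entry is a pair of numbers derived from the context the agent saw.
def planner(context, step):
    return float(len(context)), float(step)


def solver(context, step):
    return float(len(context)), sum(value for _, _, value in context)


trace = run_stage_one([("planner", planner), ("solver", solver)], LatentTrace(), steps_per_agent=3)

# The final agent would attend over every entry; here we only check what it can see.
visible_agents = sorted({agent_id for agent_id, _, _ in trace.entries})
print(len(trace.entries), visible_agents)

# Log-probabilities the final agent assigns to the reference tokens; only the last two are answer tokens.
loss = answer_only_nll(
    [math.log(0.5), math.log(0.9), math.log(0.25), math.log(0.5)],
    [False, False, True, True],
)
print(f"answer loss = {loss:.4f}")
```

In an actual model the entries would be tensors produced by differentiable forward passes, so minimizing the answer loss would update the trainable parameters of every agent that wrote into the trace, not only the final one.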

Performance and Results

The researchers evaluated DiffMAS on a range of challenging tasks, including competition mathematics (AIME24/25), scientific question answering (GPQA-Diamond), code generation (HumanEval+), and commonsense benchmarks. DiffMAS consistently improved reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods. With the Qwen3-8B model, for example, it reached 26.7% accuracy on AIME24 and 20.2% accuracy on GPQA-Diamond. Gains were observed across model sizes, suggesting that the approach is effective for both mid-scale and larger language models.
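The abstract reports these figures as accuracy scores. As a rough sanity check, assume they are single-run scores over the standard 30-problem AIME 2024 set and the 198-question GPQA-Diamond set (the paper may instead average several runs). Under that assumption they correspond to whole numbers of correct answers:

```latex
\frac{8}{30} \approx 0.2667 \;\Rightarrow\; 26.7\%, \qquad \frac{40}{198} \approx 0.2020 \;\Rightarrow\; 20.2\%
```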

Key Takeaways

The primary advantage of DiffMAS is that it avoids the gradient attenuation found in systems that pass information through fixed, overwriting interfaces. When each agent replaces the previous message, the training signal for an early agent has to pass through every later agent, and it can shrink at each hop. Because DiffMAS concatenates contributions, every intermediate state stays directly accessible to the final agent, which gives each earlier agent a short gradient path and keeps the training signal strong throughout the pipeline. The cost is memory: the latent trace, and therefore the KV cache, grows with every agent's contribution. Even so, the approach offers a more robust way to optimize multi-agent collaboration and moves the field closer to truly end-to-end, learnable reasoning systems.
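Both trade-offs can be illustrated with a toy sketch; it is not the paper's analysis. For gradient attenuation, assume each agent scales a scalar message by a hypothetical factor `W = 0.5`. In the overwrite variant the final agent sees only the last message; in the concatenative variant it averages uniformly over every trace entry, a crude stand-in for attention. For memory, `kv_cache_bytes` uses the standard KV-cache size formula (keys plus values for every layer and KV head); the model configuration (36 layers, 8 KV heads, head dimension 128, 16-bit values) and the 256 latent tokens per agent are assumptions chosen for illustration.

```python
def overwrite_readout(m0, w, num_agents):
    """Each agent replaces the single message slot; the final agent sees only the last message."""
    m = m0
    for _ in range(num_agents):
        m = w * m
    return m


def concat_readout(m0, w, num_agents):
    """Each agent appends to the trace; the final agent averages every entry (uniform attention, assumed)."""
    trace = [m0]
    for _ in range(num_agents):
        trace.append(w * trace[-1])
    return sum(trace) / len(trace)


def grad_wrt_first_message(readout, w, num_agents, eps=1e-6):
    """Central finite difference of the final readout with respect to the first agent's message."""
    plus = readout(1.0 + eps, w, num_agents)
    minus = readout(1.0 - eps, w, num_agents)
    return (plus - minus) / (2 * eps)


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem=2):
    """Keys plus values for every layer and KV head: 2 * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


NUM_AGENTS = 8
W = 0.5  # hypothetical per-agent scaling of the message

g_overwrite = grad_wrt_first_message(overwrite_readout, W, NUM_AGENTS)
g_concat = grad_wrt_first_message(concat_readout, W, NUM_AGENTS)
print(f"overwrite: {g_overwrite:.6f}, concatenate: {g_concat:.6f}")

# Assumed configuration: 36 layers, 8 KV heads, head_dim 128, 16-bit values.
per_token = kv_cache_bytes(36, 8, 128, num_tokens=1)
trace_tokens = 4 * 256  # hypothetical: 4 agents x 256 latent tokens each
trace_bytes = kv_cache_bytes(36, 8, 128, num_tokens=trace_tokens)
print(f"{per_token / 1024:.0f} KiB per token, {trace_bytes / 2**20:.0f} MiB for the trace")
```

With eight agents, the overwrite variant leaves the first message a gradient of 0.5^8, about 0.004, while the averaged trace keeps it near 0.22 because the first message also enters the readout directly. The memory cost, by contrast, grows linearly with the trace: under the assumed configuration each token costs 144 KiB, so four agents contributing 256 latent tokens each need 144 MiB of cache.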
