Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents
This paper investigates a fundamental reliability issue in AI systems that use multiple specialized LLMs to solve complex problems. Even when each individual component is well-calibrated and logically consistent on its own, the combined output of these agents can violate basic probability rules. This "locally coherent, globally incoherent" failure occurs because individual components often lack visibility into the constraints governing the entire system, leading to aggregated beliefs that are mathematically impossible. The research introduces a formal method to detect these errors at runtime and provides a way to repair them.
Measuring Incoherence
The author introduces the "compositional residual," denoted $\varepsilon^{\star}$, which serves as a mathematical certificate of system-level failure. It measures the distance between the agent's combined output and the "joint coherent polytope," the set of all combined probability assignments that satisfy the system's logical constraints. A nonzero residual means the agent's combined claims cannot all hold under any single probability distribution, so developers can use it to flag logically invalid outputs. The paper demonstrates that this residual is computable at runtime using only the system's output and the known cross-component constraints.
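To make the idea of a distance to the coherent polytope concrete, here is a minimal sketch on a toy case. It assumes a Euclidean distance and describes the polytope as the convex hull of the logically consistent "worlds"; this summary does not state the paper's norm or solver, so the function name `residual_to_polytope` and the Frank-Wolfe solver are illustrative choices, not the paper's method. In the toy case one component reports P(X) = 0.7, another reports P(Y) = 0.4, and the cross-component constraint is "X implies Y".

```python
import math
from typing import List, Sequence, Tuple


def residual_to_polytope(
    q: Sequence[float],
    vertices: Sequence[Sequence[float]],
    max_iter: int = 5000,
    tol: float = 1e-12,
) -> Tuple[float, List[float]]:
    """Euclidean distance from q to conv(vertices), via Frank-Wolfe.

    Hypothetical helper: the norm and the solver are assumptions made
    for illustration, not taken from the paper.
    """
    p = [float(x) for x in vertices[0]]  # start at an arbitrary vertex
    for _ in range(max_iter):
        grad = [2.0 * (pi - qi) for pi, qi in zip(p, q)]
        # Vertex that minimizes the linearized objective.
        v = min(vertices, key=lambda u: sum(g * ui for g, ui in zip(grad, u)))
        d = [vi - pi for vi, pi in zip(v, p)]
        gap = -sum(g * di for g, di in zip(grad, d))  # Frank-Wolfe duality gap
        if gap <= tol:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        step = min(1.0, gap / (2.0 * sum(di * di for di in d)))
        p = [pi + step * di for pi, di in zip(p, d)]
    dist = math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
    return dist, p


# Worlds (X, Y) consistent with "X implies Y": the world (1, 0) is excluded.
worlds = [(0, 0), (0, 1), (1, 1)]

# Each report is a valid probability on its own, yet P(X) > P(Y).
residual, nearest = residual_to_polytope([0.7, 0.4], worlds)
print(round(residual, 4), [round(x, 4) for x in nearest])
```

Here the nearest coherent point equalizes the two probabilities, and the residual is strictly positive even though each component is coherent on its own.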
Repairing Agent Logic
To fix these errors, the paper proposes a hierarchical projection method based on the Boyle–Dykstra algorithm. This technique acts as a deterministic repair mechanism, adjusting the agent's combined output so that it aligns with the necessary logical constraints without discarding the individual contributions of the specialist models. The research also provides a sequential monitoring tool, known as an e-process, which allows for continuous, anytime-valid testing of the system's coherence.
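As a rough illustration of projection-based repair, the sketch below runs plain Dykstra alternating projections on a three-claim toy case. It is not the paper's hierarchical variant, whose details are not given in this summary; the helper names and the example constraint (A and B mutually exclusive, so P(A) + P(B) = P(A or B)) are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Projector = Callable[[Sequence[float]], List[float]]


def project_box(x: Sequence[float]) -> List[float]:
    """Project onto the unit box [0, 1]^n."""
    return [min(1.0, max(0.0, xi)) for xi in x]


def make_hyperplane_projector(normal: Sequence[float], offset: float) -> Projector:
    """Return a projector onto the hyperplane {x : normal . x = offset}."""
    norm_sq = sum(n * n for n in normal)

    def project(x: Sequence[float]) -> List[float]:
        r = (sum(n * xi for n, xi in zip(normal, x)) - offset) / norm_sq
        return [xi - r * n for xi, n in zip(x, normal)]

    return project


def dykstra(
    q: Sequence[float],
    projectors: Sequence[Projector],
    sweeps: int = 200,
    tol: float = 1e-12,
) -> List[float]:
    """Approximate the Euclidean projection of q onto an intersection of convex sets."""
    x = list(q)
    corrections = [[0.0] * len(x) for _ in projectors]
    for _ in range(sweeps):
        x_prev = x
        for i, proj in enumerate(projectors):
            shifted = [xi + ci for xi, ci in zip(x, corrections[i])]
            y = proj(shifted)
            # Remember what this set removed so later sweeps can restore it.
            corrections[i] = [si - yi for si, yi in zip(shifted, y)]
            x = y
        # Simple stopping rule for a sketch: no movement over a full sweep.
        if max(abs(a - b) for a, b in zip(x, x_prev)) < tol:
            break
    return x


# Reported (P(A), P(B), P(A or B)) with A and B mutually exclusive.
reported = [0.5, 0.4, 0.7]
sum_rule = make_hyperplane_projector([1.0, 1.0, -1.0], 0.0)
repaired = dykstra(reported, [sum_rule, project_box])
print([round(v, 4) for v in repaired])
```

The correction terms are what distinguish Dykstra's method from naive alternating projection: they steer the iterates toward the nearest point of the intersection rather than an arbitrary feasible point, which is why the repair changes the specialists' reports as little as possible in the Euclidean sense.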
Empirical Findings
The study evaluated 1,876 ensemble cliques across four different LLMs and found that compositional incoherence is a widespread problem, appearing in 33% to 94% of cases depending on the complexity of the logical relations. This incoherence translates into measurable financial risk, or "regret," when the agent's outputs are used for betting or forecasting. Interestingly, the research found that common, intuitive mitigation strategies—such as using retrieval, partition-aware prompting, or an aggregator-LLM—often fail to solve the problem and can sometimes even make the system's performance worse.
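The link between incoherence and betting losses can be illustrated with a classic Dutch book argument. The sketch below is a toy calculation, not the paper's regret definition (which this summary does not give): it assumes the agent trades unit-stake tickets at its own stated probabilities, with hypothetical reports P(X) = 0.7 and P(Y) = 0.4 under the constraint "X implies Y".

```python
from typing import Dict, Tuple

World = Tuple[int, int]  # (X occurred, Y occurred)


def agent_payoff(world: World, prices: Dict[str, float]) -> float:
    """Agent's net payoff after buying a ticket on X and selling one on Y.

    Each ticket pays 1 if its event occurs and is traded at the agent's
    own stated probability (an assumption made for this illustration).
    """
    x, y = world
    buy_x = x - prices["X"]   # pay the price, collect 1 if X occurs
    sell_y = prices["Y"] - y  # collect the price, pay 1 if Y occurs
    return buy_x + sell_y


prices = {"X": 0.7, "Y": 0.4}

# Worlds consistent with "X implies Y": X cannot occur without Y.
consistent_worlds = [(0, 0), (0, 1), (1, 1)]

payoffs = {w: agent_payoff(w, prices) for w in consistent_worlds}
best_case = max(payoffs.values())
for world, payoff in payoffs.items():
    print(world, round(payoff, 2))
```

In every consistent world the agent's payoff is negative, and its best case is a loss of about 0.3, which equals the gap P(X) - P(Y) between the two reports.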
Understanding the Limits
The paper establishes a "product-structure dichotomy," which explains exactly when local coherence is sufficient to guarantee global coherence. It shows that if the system's constraints are simple enough to be treated as independent parts, local coherence works. However, in any system with tighter coupling—where components must agree on shared information—local coherence is insufficient. The research concludes that while these failures are predictable and measurable, they are structural in nature, meaning they cannot be solved by simply prompting individual models to be more accurate; they require formal, system-wide mathematical corrections.
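One way to picture the dichotomy is to check whether the set of jointly feasible worlds factorizes into a product of per-component feasible sets. The sketch below is an illustrative test under that reading, not the paper's formal statement; the helper names and the two-variable examples are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple

World = Tuple[int, ...]


def feasible_worlds(n_vars: int, constraint: Callable[[World], bool]) -> Set[World]:
    """Enumerate the 0/1 truth assignments that satisfy the constraint."""
    return {w for w in product((0, 1), repeat=n_vars) if constraint(w)}


def has_product_structure(worlds: Set[World], blocks: Sequence[Sequence[int]]) -> bool:
    """True if the worlds equal the product of their per-block projections.

    `blocks` must partition the variable indices, one block per component.
    """
    n_vars = sum(len(block) for block in blocks)
    # What each component can see on its own variables.
    local_sets = [{tuple(w[i] for i in block) for w in worlds} for block in blocks]
    recombined = set()
    for combo in product(*local_sets):
        world = [0] * n_vars
        for block, local in zip(blocks, combo):
            for i, value in zip(block, local):
                world[i] = value
        recombined.add(tuple(world))
    return recombined == worlds


blocks = [(0,), (1,)]  # component A owns X, component B owns Y

independent = feasible_worlds(2, lambda w: True)      # no cross-component constraint
coupled = feasible_worlds(2, lambda w: w[0] <= w[1])  # X implies Y

print(has_product_structure(independent, blocks))
print(has_product_structure(coupled, blocks))
```

In the independent case every combination of locally valid reports is jointly valid. In the coupled case the world (1, 0) is excluded, so some locally coherent reports, such as P(X) = 0.7 with P(Y) = 0.4, fall outside the joint polytope.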