Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

Key Takeaways

  • Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent.
  • A product-structure dichotomy characterises when local coherence suffices, and a Rayleigh-quotient prediction matches the observed residual within 7% on three of four relation classes.
  • A hierarchical Boyle-Dykstra projection repairs the composition deterministically; an anytime-valid e-process gives sequential coherence monitoring.
  • Three intuitive LLM-side mitigations (retrieval, partition-aware prompting, aggregator-LLM) each fail or regress.
Paper Abstract

Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent, globally incoherent failure via the compositional residual eps*, the L2 distance from the composed quote to the joint coherent polytope, computable at runtime from system output and the declared cross-component coupling constraints. A product-structure dichotomy characterises when local coherence suffices, and a Rayleigh-quotient prediction matches the observed residual within 7% on three of four relation classes. A hierarchical Boyle-Dykstra projection repairs the composition deterministically; an anytime-valid e-process gives sequential coherence monitoring. Across 1,876 ensemble cliques on a four-LLM mid-tier panel (frontier-panel rerun in Section 5.5), eps* > 0 on 33-94% of cliques, translating to +0.115 nats per bet of regret on 1,770 resolved bets under the proportional allocation rule (the gain collapses to +0.006 under bettors that themselves coherentise). Three intuitive LLM-side mitigations (retrieval, partition-aware prompting, aggregator-LLM) each fail or regress.

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents
This paper investigates a fundamental reliability issue in AI systems that combine multiple specialized LLM components to solve a joint problem. Even when each component is locally coherent, meaning its own probabilistic claims are internally consistent, the composed output can violate basic probability axioms. This "locally coherent, globally incoherent" failure occurs because each component sees only part of the joint problem and has no visibility into the constraints that couple it to the others, so the aggregated beliefs can be ones that no single probability distribution supports. The research introduces a formal method to detect these errors at runtime and a deterministic procedure to repair them.

Measuring Incoherence

The author introduces the "compositional residual," denoted $\varepsilon^{\star}$, which serves as a runtime certificate of system-level incoherence. It is the L2 distance from the agent's composed quote (the probabilities assembled from all components) to the "joint coherent polytope": the set of probability assignments that satisfy every declared cross-component coupling constraint. A residual of zero means the combined claims are jointly coherent; a positive residual means they violate at least one constraint, and its size says by how much. The paper reports that this residual is computable at runtime using only the system's output and the declared cross-component constraints.
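To make the definition concrete, here is a minimal sketch of the residual for one toy coupling, not the paper's implementation. It assumes three components that each quote the probability of one of three mutually exclusive, exhaustive outcomes, so the coherent set is the probability simplex. The function names `project_to_simplex` and `compositional_residual` are illustrative, and the projection uses the standard sort-based simplex algorithm rather than anything taken from the paper.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The condition holds for a prefix of the sorted values;
        # the last index where it holds determines the shift theta.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def compositional_residual(quotes):
    """L2 distance from the composed quote to the toy coherent set."""
    nearest = project_to_simplex(quotes)
    return math.sqrt(sum((q - n) ** 2 for q, n in zip(quotes, nearest)))


# Each quote lies in [0, 1], yet together they sum to 1.3.
quotes = [0.6, 0.4, 0.3]
print(project_to_simplex(quotes))      # close to [0.5, 0.3, 0.2]
print(compositional_residual(quotes))  # close to 0.1 * sqrt(3), about 0.173
```

Each quote on its own is a valid probability, so the failure only becomes visible once the sum-to-one coupling is checked on the assembled vector.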

Repairing Agent Logic

To fix these errors, the paper proposes a hierarchical projection method based on the Boyle–Dykstra algorithm. This technique acts as a deterministic repair mechanism, adjusting the agent's combined output so that it aligns with the necessary logical constraints without discarding the individual contributions of the specialist models. The research also provides a sequential monitoring tool, known as an e-process, which allows for continuous, anytime-valid testing of the system's coherence.
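As an illustration of the repair step, the following is a sketch of plain (non-hierarchical) Dykstra projection onto the intersection of two convex sets. The paper's hierarchical variant is not reproduced here. The toy constraint sets (quotes sum to one, quotes are non-negative) and all function names are assumptions chosen for the example.

```python
def project_sum_to_one(x):
    """Projection onto the hyperplane {x : sum(x) = 1}."""
    shift = (sum(x) - 1.0) / len(x)
    return [xi - shift for xi in x]


def project_nonnegative(x):
    """Projection onto the non-negative orthant."""
    return [max(xi, 0.0) for xi in x]


def dykstra(x0, projections, iterations=200):
    """Dykstra's algorithm: nearest point to x0 in the intersection."""
    x = list(x0)
    # One correction vector per constraint set, initially zero.
    corrections = [[0.0] * len(x) for _ in projections]
    for _ in range(iterations):
        for k, project in enumerate(projections):
            # Add back what this set's previous projection removed.
            y = [xi + ci for xi, ci in zip(x, corrections[k])]
            x = project(y)
            corrections[k] = [yi - xi for yi, xi in zip(y, x)]
    return x


# Three quotes on mutually exclusive, exhaustive outcomes (toy data).
raw = [0.9, 0.9, 0.0]
repaired = dykstra(raw, [project_sum_to_one, project_nonnegative])
print(repaired)  # approaches [0.5, 0.5, 0.0]
```

The correction vectors are what distinguish Dykstra's method from simple alternating projection: they make the limit the nearest coherent point to the raw quote rather than merely some feasible point, which is why the repair is deterministic for a given input.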

Empirical Findings

The study evaluated 1,876 ensemble cliques on a panel of four mid-tier LLMs (with a frontier-panel rerun reported in Section 5.5 of the paper) and found that compositional incoherence is widespread: $\varepsilon^{\star} > 0$ on 33% to 94% of cliques, depending on the complexity of the logical relations. This incoherence has a measurable cost when the agent's outputs are used for betting or forecasting. Across 1,770 resolved bets under a proportional allocation rule, it translates to +0.115 nats of regret per bet, although the gain collapses to +0.006 when the bettors themselves coherentise. The three intuitive LLM-side mitigations tested (retrieval, partition-aware prompting, and an aggregator-LLM) each fail or regress.
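One way to see why incoherent quotes cost money is the classical sure-loss (Dutch book) argument, sketched below. This is an illustration only and is not the paper's regret metric, which is measured in nats under a proportional allocation rule. The example assumes the agent is willing to buy or sell a ticket paying 1 unit on each of a set of mutually exclusive, exhaustive outcomes at its quoted price; the function name `counterparty_payoffs` is hypothetical.

```python
def counterparty_payoffs(quotes):
    """Payoff to a counterparty in each possible outcome.

    If the quotes sum to more than 1, the counterparty sells the agent
    one ticket per outcome; otherwise it buys one ticket per outcome.
    """
    total = sum(quotes)
    side = 1.0 if total > 1.0 else -1.0  # +1 = sell to agent, -1 = buy
    payoffs = []
    for winner in range(len(quotes)):
        # Exactly one ticket pays out 1 unit, whichever outcome occurs.
        payout = sum(1.0 if i == winner else 0.0 for i in range(len(quotes)))
        payoffs.append(side * (total - payout))
    return payoffs


print(counterparty_payoffs([0.6, 0.4, 0.3]))  # same positive payoff in every outcome
print(counterparty_payoffs([0.5, 0.3, 0.2]))  # no guaranteed profit when quotes are coherent
```

The counterparty's payoff does not depend on which outcome occurs, so any deviation of the quoted total from 1 is a guaranteed transfer away from the agent.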

Understanding the Limits

The paper establishes a "product-structure dichotomy" that characterises when local coherence is sufficient to guarantee global coherence. If the system's constraints decompose into independent parts, one per component, local coherence suffices. In any system with tighter coupling, where components must agree on shared information, local coherence is insufficient. A Rayleigh-quotient prediction of the residual matches the observed value within 7% on three of four relation classes. The research concludes that these failures, while predictable and measurable, are structural in nature: they cannot be solved by simply prompting individual models to be more accurate, and instead require formal, system-wide mathematical corrections.
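The two sides of the dichotomy can be illustrated with a small worked example. This is a sketch under assumed notation, not the paper's theorem statement: $q$ is the composed quote, $\mathcal{C}$ the joint coherent set, and $\mathcal{C}_1, \mathcal{C}_2$ the local coherent sets of two components. The numbers are made up for illustration.

```latex
% Residual: L2 distance from the composed quote q to the coherent set C.
\[
  \varepsilon^{\star}(q) = \min_{c \in \mathcal{C}} \lVert q - c \rVert_2
\]

% Product case: C factorises, so local membership implies joint membership.
\[
  \mathcal{C} = \mathcal{C}_1 \times \mathcal{C}_2,\quad
  q_1 \in \mathcal{C}_1,\ q_2 \in \mathcal{C}_2
  \;\Longrightarrow\; (q_1, q_2) \in \mathcal{C}
  \;\Longrightarrow\; \varepsilon^{\star}(q) = 0
\]

% Coupled case: q_1 = P(A), q_2 = P(A \wedge B), with coupling q_2 \le q_1.
\[
  \mathcal{C} = \{(c_1, c_2) \in [0,1]^2 : c_2 \le c_1\},\quad
  q = (0.3,\ 0.5)
  \;\Longrightarrow\;
  \varepsilon^{\star}(q) = \lVert (0.3,\ 0.5) - (0.4,\ 0.4) \rVert_2
  = \sqrt{0.02} \approx 0.141
\]
```

In the coupled case both quotes are valid probabilities on their own, but together they assert that a conjunction is more likely than one of its conjuncts, which no probability distribution allows.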
