
Paper Abstract

This position paper argues that enforcing LLM agent safety within a single abstraction layer is not merely suboptimal but categorically insufficient for deployed LLM agents -- a structural consequence of how agent execution works, not a contingent limitation of current systems. The three dimensions that jointly constitute safe operation -- semantic intent and policy compliance, environmental validity, and dynamical feasibility -- each depend on a strictly distinct set of information that becomes available at different stages of execution. No single guardrail can certify all three. We argue that the community must respond with a contract-based architecture in which each safety dimension is enforced by an independently certified layer whose probabilistic guarantee satisfies the next layer's assumption. We sketch such an architecture and derive the compositional system-level safety bounds it admits via the chain rule of probability. Three open problems stand between this and a deployable standard: bound estimation from non-i.i.d. traces, graceful degradation of contracts under deployment drift, and extension to multi-agent settings -- the most important unfinished business in LLM agent runtime assurance.

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment
This paper argues that the current approach to securing LLM agents, relying on a single guardrail or filter, is fundamentally flawed. The authors contend that the dimensions of safe operation depend on information that becomes available at different stages of execution, so no single layer can guarantee safety. Instead, they propose a three-layer, contract-based architecture in which each layer is independently certified for one dimension of safety and each layer's probabilistic guarantee satisfies the assumption of the next.

The Structural Necessity of Three Layers

The authors identify three distinct dimensions of safe operation: semantic intent and policy compliance (what the user wants and what policy permits), environmental validity (whether the environment the agent is operating in is suitable), and dynamical feasibility (how the agent physically moves or acts). Each dimension relies on information that becomes available at a different stage of execution. Because that information is never available all at once, a single-layer system will always leave at least one dimension uncertified. The authors argue that this is a structural limitation of agent execution, not a temporary issue that better prompt engineering or model tuning can solve.

How the Architecture Works

The proposed framework uses a "probabilistic assume-guarantee" (A/G) model to link three specific layers:

  • User Assurance Layer: Operates before any world observation to verify that the agent’s plan aligns with user intent, ethical policies, and regulatory rules.

  • Operational Assurance Layer: Operates after assessing the world state to ensure the agent is within its "Operational Design Domain" (ODD). This confirms that the environment is suitable for the planned actions.

  • Functional Assurance Layer: Operates during the actual execution of the plan, using real-time sensor data and control-loop monitoring to ensure the agent’s physical actions remain safe.

These layers are connected in a chain: the guarantee provided by the user layer becomes the assumption for the operational layer, whose guarantee in turn becomes the assumption for the functional layer. Because each guarantee is probabilistic and conditioned on the previous one, the chain rule of probability yields a system-level safety probability, so the system can be certified modularly, one layer at a time.
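
The chain-rule composition described above can be written out as a short derivation. This is a sketch under assumed notation, not the paper's own formulation: the events S_U, S_O, S_F (each layer's guarantee holds) and the per-layer failure bounds ε_U, ε_O, ε_F are symbols introduced here for illustration.

```latex
% Notation assumed for illustration (not taken from the paper):
%   S_U, S_O, S_F : events that the user, operational, and functional
%                   layers' guarantees hold
%   \epsilon_U, \epsilon_O, \epsilon_F : certified per-layer failure bounds,
%                   each in [0, 1], with
%   P(S_U) \ge 1-\epsilon_U,  P(S_O \mid S_U) \ge 1-\epsilon_O,
%   P(S_F \mid S_U \cap S_O) \ge 1-\epsilon_F
\begin{align*}
P(S_U \cap S_O \cap S_F)
  &= P(S_U)\, P(S_O \mid S_U)\, P(S_F \mid S_U \cap S_O) \\
  &\ge (1-\epsilon_U)(1-\epsilon_O)(1-\epsilon_F) \\
  &\ge 1 - (\epsilon_U + \epsilon_O + \epsilon_F).
\end{align*}
```

With hypothetical bounds of 0.01, 0.02, and 0.005, the product form gives a system-level safety probability of at least 0.965349, and the looser additive form gives at least 0.965. Each factor is conditioned on the preceding layers' guarantees holding, which is what lets one layer's guarantee serve as the next layer's assumption.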

Practical Implications and Challenges

The paper highlights that current single-layer defenses are insufficient, noting that popular agents often fail in physical and environmental contexts even when they pass semantic content checks. By moving to a multi-layer, information-driven design, the authors aim to create a more robust standard for safety-critical deployments. However, they identify three significant open problems that must be solved to make this architecture a standard:

  • Bound Estimation: Developing ways to estimate safety bounds from execution traces that are not independent and identically distributed (non-i.i.d.).

  • Deployment Drift: Ensuring that safety contracts degrade gracefully as deployment conditions drift away from those under which they were certified.

  • Multi-Agent Settings: Extending the framework to settings where multiple agents interact.

The authors describe these three problems as the most important unfinished business in LLM agent runtime assurance.
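
To make the bound-estimation problem concrete, the following minimal sketch shows the i.i.d. baseline that the open problem asks to move beyond. It uses a one-sided Hoeffding bound, a standard technique chosen here for illustration and not taken from the paper, to turn per-layer failure counts into upper confidence bounds, then composes them with the product rule. All function names, layer names, and counts are hypothetical.

```python
import math
from typing import Iterable


def hoeffding_upper_bound(failures: int, trials: int, delta: float) -> float:
    """Upper confidence bound on a failure probability.

    Assumes the trials are i.i.d. Under that assumption, with probability at
    least 1 - delta the true failure rate is at most
    failures / trials + sqrt(ln(1 / delta) / (2 * trials)).
    The guarantee does NOT hold for correlated (non-i.i.d.) traces.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures must be between 0 and trials")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    empirical_rate = failures / trials
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * trials))
    return min(1.0, empirical_rate + slack)


def compose_lower_bound(epsilons: Iterable[float]) -> float:
    """Product lower bound on system safety from per-layer failure bounds.

    Each epsilon must bound a layer's failure probability conditioned on all
    earlier layers' guarantees holding.
    """
    bound = 1.0
    for eps in epsilons:
        if not 0.0 <= eps <= 1.0:
            raise ValueError("each epsilon must be in [0, 1]")
        bound *= 1.0 - eps
    return bound


if __name__ == "__main__":
    # Hypothetical (failures, trials) per layer. For the conditioning to be
    # valid, each layer's trials should be runs in which all earlier layers'
    # guarantees held.
    layer_counts = {
        "user": (3, 10_000),
        "operational": (5, 10_000),
        "functional": (8, 10_000),
    }
    delta = 0.01  # per-layer confidence parameter (assumed)
    epsilons = {
        name: hoeffding_upper_bound(f, n, delta)
        for name, (f, n) in layer_counts.items()
    }
    # By a union bound, all three per-layer bounds hold simultaneously with
    # probability at least 1 - 3 * delta.
    print(epsilons)
    print(compose_lower_bound(epsilons.values()))
```

Agent execution traces are typically correlated across steps and episodes, so the i.i.d. assumption behind this bound is exactly what fails in practice. That gap is why estimating bounds from non-i.i.d. traces is listed as an open problem.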
