Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment
This paper argues that the prevailing approach to securing LLM agents, which relies on a single guardrail or filter, is fundamentally flawed. The authors contend that because LLM agents perform multi-step reasoning and interact with dynamic environments, no single layer can guarantee their safety. Instead, they propose a three-layer, contract-based architecture. Each layer is independently certified for one dimension of safety, and each layer's guarantee serves as the assumption of the next.
The Structural Necessity of Three Layers
The authors identify three distinct dimensions of safe operation: semantic intent (what the user wants), environmental validity (where the agent is operating), and dynamical feasibility (how the agent physically moves or acts). The information needed to verify each dimension becomes available at a different stage of execution. The plan exists before the world is observed, the world state is known only after observation, and sensor feedback arrives only during execution. A single layer that runs at one stage therefore lacks the information to certify at least one dimension. The authors argue that this is a structural limitation of agent execution, not a temporary gap that better prompt engineering or model tuning can close.
How the Architecture Works
The proposed framework uses a "probabilistic assume-guarantee" (A/G) model to link three specific layers:
User Assurance Layer: Operates before any world observation to verify that the agent’s plan aligns with user intent, ethical policies, and regulatory rules.
Operational Assurance Layer: Operates once the world state has been observed, checking that the agent is within its "Operational Design Domain" (ODD), that is, that the environment is suitable for the planned actions.
Functional Assurance Layer: Operates during the actual execution of the plan, using real-time sensor data and control-loop monitoring to ensure the agent’s physical actions remain safe.
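The staging of the three layers can be sketched as a pipeline in which each check runs only at the point where its information exists. This is a minimal illustration, not the authors' implementation. All names are hypothetical and not taken from the paper: `run_agent`, `user_check`, `observe_world`, `operational_check`, `read_sensors`, `functional_check`, `execute`, and `Verdict`. The sketch also makes a simplifying assumption: the functional layer is modeled as a per-step check before each action rather than continuous control-loop monitoring.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types and interfaces; the paper does not specify signatures.
Plan = list[str]  # a plan is modeled as a sequence of action names


@dataclass
class Verdict:
    ok: bool
    layer: Optional[str] = None  # name of the layer that rejected, if any
    reason: str = ""


def run_agent(
    plan: Plan,
    user_check: Callable[[Plan], bool],
    observe_world: Callable[[], dict],
    operational_check: Callable[[Plan, dict], bool],
    read_sensors: Callable[[str], dict],
    functional_check: Callable[[str, dict], bool],
    execute: Callable[[str], None],
) -> Verdict:
    # User layer: runs before any world observation; only the plan is available.
    if not user_check(plan):
        return Verdict(False, "user", "plan violates intent or policy")
    # Operational layer: runs after the world state is observed (ODD check).
    world = observe_world()
    if not operational_check(plan, world):
        return Verdict(False, "operational", "world state outside ODD")
    # Functional layer: runs during execution, checking live readings
    # before each step is carried out.
    for action in plan:
        readings = read_sensors(action)
        if not functional_check(action, readings):
            return Verdict(False, "functional", f"unsafe execution of {action!r}")
        execute(action)
    return Verdict(True)


# Toy usage: the plan passes the user and operational layers, but the
# "move" step exceeds a hypothetical speed limit and is stopped mid-execution.
executed: list[str] = []
v = run_agent(
    ["pick", "move", "place"],
    user_check=lambda p: "harm" not in p,
    observe_world=lambda: {"floor": "dry"},
    operational_check=lambda p, w: w["floor"] == "dry",
    read_sensors=lambda a: {"speed": 2.5 if a == "move" else 0.5},
    functional_check=lambda a, r: r["speed"] <= 2.0,
    execute=executed.append,
)
```

In this toy run, the unsafe step is caught only by the functional layer, after "pick" has already executed. Neither earlier check could have seen the speed reading, which is the information-availability argument in miniature.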
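These layers are connected in a chain. The user layer's guarantee becomes the operational layer's assumption, and the operational layer's guarantee becomes the functional layer's assumption. Because each link is a conditional probability, the chain rule of probability combines the per-layer results into an end-to-end safety probability. Each layer can therefore be certified separately and the results composed into a certificate for the whole system.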
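The composition step can be made concrete. This is a sketch under a stated assumption, not the paper's exact formulation. Write $G_U, G_O, G_F$ for the events that each layer's guarantee holds. Assume further that each contract certifies a lower bound on the probability of its guarantee given the conjunction of the upstream guarantees. The chain rule gives $P(G_U \wedge G_O \wedge G_F) = P(G_U)\,P(G_O \mid G_U)\,P(G_F \mid G_U \wedge G_O)$. Because every factor is non-negative, replacing each factor with its certified lower bound yields a lower bound on the product. The function name `chained_safety_bound` and the example numbers are hypothetical.

```python
import math


def chained_safety_bound(layer_bounds: dict[str, float]) -> float:
    """Lower bound on P(all layer guarantees hold), via the chain rule.

    Assumption: layer_bounds maps each layer, in chain order, to a certified
    lower bound on P(its guarantee | all upstream guarantees).
    """
    for name, p in layer_bounds.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"bound for {name!r} must be in [0, 1], got {p}")
    # P(G_U, G_O, G_F) = P(G_U) * P(G_O | G_U) * P(G_F | G_U, G_O);
    # the product of per-factor lower bounds is a lower bound on the product.
    return math.prod(layer_bounds.values())


# Hypothetical per-layer certified bounds.
bound = chained_safety_bound(
    {"user": 0.999, "operational": 0.995, "functional": 0.99}
)
```

A consequence of this design is that the composed bound can never exceed the weakest layer's bound. A very strong functional layer therefore cannot compensate for a weak user or operational layer.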
Practical Implications and Challenges
The paper highlights that current single-layer defenses are insufficient, noting that popular agents often fail in physical and environmental contexts even when they pass semantic content checks. By moving to a multi-layer, information-driven design, the authors aim to create a more robust standard for safety-critical deployments. However, they identify three significant open problems that must be solved to make this architecture a standard:
Bound Estimation: Developing ways to estimate safety bounds from execution traces that are not independent and identically distributed (non-i.i.d.).
Deployment Drift: Ensuring that safety contracts remain valid even as environments change over time.
Multi-Agent Settings: Extending this framework to handle complex scenarios where multiple agents interact, which the authors describe as the most important unfinished business in the field.
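The bound-estimation problem is easier to see against the standard i.i.d. baseline that it must move beyond. The following sketch is not the authors' method. It shows a one-sided Hoeffding lower confidence bound on a per-layer success probability. The bound is valid only when the n trials are independent Bernoulli draws with a common success probability. Under that assumption, with probability at least 1 - δ, the true probability is at least p̂ - sqrt(ln(1/δ) / (2n)). Steps within a single agent trace are typically correlated, so treating them as independent overstates the effective sample size and can make this bound overconfident. The function name `hoeffding_lower_bound` and the example counts are hypothetical.

```python
import math


def hoeffding_lower_bound(successes: int, trials: int, delta: float) -> float:
    """One-sided Hoeffding lower confidence bound on a success probability.

    Valid ONLY if the trials are i.i.d. Bernoulli: with probability at least
    1 - delta, the true probability is >= the returned value.
    """
    if trials <= 0 or not 0 <= successes <= trials or not 0 < delta < 1:
        raise ValueError("need trials > 0, 0 <= successes <= trials, 0 < delta < 1")
    p_hat = successes / trials
    margin = math.sqrt(math.log(1 / delta) / (2 * trials))
    return max(0.0, p_hat - margin)  # clip: a probability bound cannot be negative


# Hypothetical data: 990 safe outcomes in 1,000 independent episodes, delta = 0.05.
lb = hoeffding_lower_bound(990, 1000, 0.05)  # roughly 0.951
```

The gap between the empirical rate (0.99) and the certified bound shrinks only as 1/sqrt(n). Any correction for dependence between steps would further reduce the effective n and widen that gap.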