
Key Takeaways

  • Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change.
  • Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.
  • Because these agents can drift without any code changes, traditional static security measures—which only check if an action is permitted—are insufficient.
Paper Abstract

Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) + RG(x)$ and allowing an action only when its capacity $S(x)$ exceeds $\hat{B}(x)$ by a safety margin. The Agent Viability Framework, grounded in Aubin's viability theory, establishes three properties -- monitoring (P1), anticipation (P2), and monotonic restriction (P3) -- as individually necessary and collectively sufficient for documented failure modes. RiskGate instantiates the framework with dedicated statistical estimators (KL divergence, segment-vs-rest $z$-tests, sequential pattern matching), a fail-secure monotonic pipeline, and a closed-loop Autopilot formalised as an instance of Aubin's regulation map with kill-switch-as-last-resort; a scalar Viability Index $VI(t) \in [-1,+1]$ with first-order $t^*$ prediction transforms governance from reactive to predictive. Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents
Autonomous AI agents are increasingly common in production, but they face a critical vulnerability: they can remain fully authorized while becoming unsafe due to silent behavioral shifts, adversarial manipulation, or changing decision patterns. Because these agents can drift without any code changes, traditional static security measures—which only check if an action is permitted—are insufficient. This paper introduces the Informational Viability Principle, a new framework that shifts governance from reactive, rule-based checks to predictive, adaptive oversight by continuously estimating the "unobserved risk" inherent in an agent's operations.

The Informational Viability Principle

The core of the proposed framework is the distinction between what a system can observe and what it cannot. The authors define a decision rule where an action is allowed only if the agent's observed capacity, $S(x)$, exceeds the estimated bound on unobserved risk, $\hat{B}(x)$, by a specific safety margin. This unobserved risk is broken down into three distinct components:

  • Uncertainty ($U$): Risks arising from behavioral drift, where the agent's actions shift away from its original baseline.
  • Structural Bias ($SB$): Risks from systematic discrimination or unfair treatment of specific population segments.
  • Reality Gap ($RG$): Risks that appear only when looking at a sequence of actions, such as "structuring" fraud, where individual steps seem benign but the total pattern is malicious.

The Agent Viability Framework (AVF)

To manage these risks, the authors propose the Agent Viability Framework, which is grounded in mathematical viability theory. This framework relies on three essential properties: continuous monitoring (P1), anticipation of future risks (P2), and monotonic restriction (P3), which ensures that governance can only tighten restrictions, never relax them. These properties are implemented through "RiskGate," a system that uses statistical tools like KL divergence and pattern matching to calculate a scalar Viability Index. This index allows the system to predict when an agent is likely to cross a safety threshold, moving governance from a reactive "kill switch" approach to a predictive, closed-loop autopilot.
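Two of these mechanics can be sketched together: a KL-divergence estimate of behavioral drift between a baseline and a current action distribution, and a gate whose permitted budget can only tighten (property P3). This is a hedged sketch under our own assumptions; the class name, the budget representation, and the drift threshold are illustrative, not RiskGate's actual interface.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions,
    smoothed with eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

class MonotonicGate:
    """Monotonic restriction (P3): once the permitted-action budget is
    tightened in response to drift, it is never relaxed."""
    def __init__(self, budget=1.0):
        self.budget = budget

    def tighten(self, drift, threshold=0.1):
        if drift > threshold:
            # min() guarantees the budget can only decrease.
            self.budget = min(self.budget, max(0.0, 1.0 - drift))
        return self.budget

baseline = [0.7, 0.2, 0.1]   # agent's original action mix
current  = [0.4, 0.3, 0.3]   # observed mix after drift
gate = MonotonicGate()
gate.tighten(kl_divergence(current, baseline))
```

Even if a later measurement shows less drift, `tighten` leaves the budget where it was, which is the fail-secure behavior the framework requires.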

Governance as a Predictive Process

By treating the agent’s health as a dynamic state, the framework can predict the "exit time"—the moment an agent is likely to violate safety constraints. This is particularly important because the authors observe that agents in non-stationary environments often hit failure points within a predictable window of operations. By using an "Autopilot" regulation map, the system can proactively manage the agent’s behavior, applying a kill-switch only as a last resort. This approach ensures that the governance system remains functional and effective even as the environment around the AI agent evolves.
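A first-order exit-time prediction of the kind described can be sketched by extrapolating the Viability Index's most recent slope to a threshold. This is our own minimal illustration: the paper defines $VI(t) \in [-1,+1]$, but the sampling scheme, the threshold of 0, and the two-point slope estimate here are assumptions.

```python
def predict_exit_time(vi_history, threshold=0.0):
    """First-order prediction of the exit time t*: extrapolate the slope
    of the last two Viability Index samples to the safety threshold.
    Returns the predicted sample index, or None if VI is not declining."""
    if len(vi_history) < 2:
        return None
    slope = vi_history[-1] - vi_history[-2]
    if slope >= 0:
        return None  # VI stable or improving: no predicted crossing
    steps_remaining = (vi_history[-1] - threshold) / -slope
    return len(vi_history) - 1 + steps_remaining

# VI falling by 0.25 per step from 0.25 crosses 0 one step ahead (t* = 3)
print(predict_exit_time([0.75, 0.5, 0.25]))  # 3.0
```

A governance loop in this style would act (restrict, escalate, or ultimately kill-switch) while the predicted $t^*$ is still comfortably in the future, rather than waiting for the crossing itself.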

Considerations and Future Work

The authors emphasize that this framework is designed to complement, not replace, existing production platforms like Amazon Bedrock AgentCore. While static platforms handle identity-centric authorization, RiskGate provides the adaptive, context-aware monitoring that static engines lack. The current paper provides the theoretical foundation, a reference implementation, and analytical validation against known failure taxonomies. The authors note that while the framework is theoretically sound, quantitative empirical evaluation and formal completeness proofs remain topics for future research.
