
Key Takeaways

  • Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change.
  • Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.
  • Because these agents can drift without any code changes, traditional static security measures—which only check if an action is permitted—are insufficient.
Paper Abstract

Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) + RG(x)$ and allowing an action only when its capacity $S(x)$ exceeds $\hat{B}(x)$ by a safety margin. The Agent Viability Framework, grounded in Aubin's viability theory, establishes three properties -- monitoring (P1), anticipation (P2), and monotonic restriction (P3) -- as individually necessary and collectively sufficient for documented failure modes. RiskGate instantiates the framework with dedicated statistical estimators (KL divergence, segment-vs-rest $z$-tests, sequential pattern matching), a fail-secure monotonic pipeline, and a closed-loop Autopilot formalised as an instance of Aubin's regulation map with kill-switch-as-last-resort; a scalar Viability Index $VI(t) \in [-1,+1]$ with first-order $t^*$ prediction transforms governance from reactive to predictive. Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents
Autonomous AI agents are increasingly common in production, but they face a critical vulnerability: they can remain fully authorized while becoming unsafe due to silent behavioral shifts, adversarial manipulation, or changing decision patterns. Because these agents can drift without any code changes, traditional static security measures—which only check if an action is permitted—are insufficient. This paper introduces the Informational Viability Principle, a new framework that shifts governance from reactive, rule-based checks to predictive, adaptive oversight by continuously estimating the "unobserved risk" inherent in an agent's operations.

The Informational Viability Principle

The core of the proposed framework is the distinction between what a system can observe and what it cannot. The authors define a decision rule where an action is allowed only if the agent's observed capacity, $S(x)$, exceeds the estimated bound on unobserved risk, $\hat{B}(x)$, by a specific safety margin. This unobserved risk is broken down into three distinct components:

  • Uncertainty ($U$): Risks arising from behavioral drift, where the agent's actions shift away from its original baseline.
  • Structural Bias ($SB$): Risks from systematic discrimination or unfair treatment of specific population segments.
  • Reality Gap ($RG$): Risks that appear only when looking at a sequence of actions, such as "structuring" fraud, where individual steps seem benign but the total pattern is malicious.

The Agent Viability Framework (AVF)

To manage these risks, the authors propose the Agent Viability Framework, which is grounded in mathematical viability theory. This framework relies on three essential properties: continuous monitoring (P1), anticipation of future risks (P2), and monotonic restriction (P3), which ensures that governance can only tighten restrictions, never relax them. These properties are implemented through "RiskGate," a system that uses statistical tools like KL divergence and pattern matching to calculate a scalar Viability Index. This index allows the system to predict when an agent is likely to cross a safety threshold, moving governance from a reactive "kill switch" approach to a predictive, closed-loop autopilot.
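Two of these mechanics can be sketched together: a KL-divergence estimate of behavioral drift between a baseline and a current action distribution, and a gate whose permitted budget can only tighten (property P3). This is a hedged sketch under our own assumptions; the class name, the budget representation, and the drift threshold are illustrative, not RiskGate's actual interface.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions,
    smoothed with eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

class MonotonicGate:
    """Monotonic restriction (P3): once the permitted-action budget is
    tightened in response to drift, it is never relaxed."""
    def __init__(self, budget=1.0):
        self.budget = budget

    def tighten(self, drift, threshold=0.1):
        if drift > threshold:
            # min() guarantees the budget can only decrease.
            self.budget = min(self.budget, max(0.0, 1.0 - drift))
        return self.budget

baseline = [0.7, 0.2, 0.1]   # agent's original action mix
current  = [0.4, 0.3, 0.3]   # observed mix after drift
gate = MonotonicGate()
gate.tighten(kl_divergence(current, baseline))
```

Even if a later measurement shows less drift, `tighten` leaves the budget where it was, which is the fail-secure behavior the framework requires.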

Governance as a Predictive Process

By treating the agent’s health as a dynamic state, the framework can predict the "exit time"—the moment an agent is likely to violate safety constraints. This is particularly important because the authors observe that agents in non-stationary environments often hit failure points within a predictable window of operations. By using an "Autopilot" regulation map, the system can proactively manage the agent’s behavior, applying a kill-switch only as a last resort. This approach ensures that the governance system remains functional and effective even as the environment around the AI agent evolves.
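A first-order exit-time prediction of the kind described can be sketched by extrapolating the Viability Index's most recent slope to a threshold. This is our own minimal illustration: the paper defines $VI(t) \in [-1,+1]$, but the sampling scheme, the threshold of 0, and the two-point slope estimate here are assumptions.

```python
def predict_exit_time(vi_history, threshold=0.0):
    """First-order prediction of the exit time t*: extrapolate the slope
    of the last two Viability Index samples to the safety threshold.
    Returns the predicted sample index, or None if VI is not declining."""
    if len(vi_history) < 2:
        return None
    slope = vi_history[-1] - vi_history[-2]
    if slope >= 0:
        return None  # VI stable or improving: no predicted crossing
    steps_remaining = (vi_history[-1] - threshold) / -slope
    return len(vi_history) - 1 + steps_remaining

# VI falling by 0.25 per step from 0.25 crosses 0 one step ahead (t* = 3)
print(predict_exit_time([0.75, 0.5, 0.25]))  # 3.0
```

A governance loop in this style would act (restrict, escalate, or ultimately kill-switch) while the predicted $t^*$ is still comfortably in the future, rather than waiting for the crossing itself.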

Considerations and Future Work

The authors emphasize that this framework is designed to complement, not replace, existing production platforms like Amazon Bedrock AgentCore. While static platforms handle identity-centric authorization, RiskGate provides the adaptive, context-aware monitoring that static engines lack. The current paper provides the theoretical foundation, a reference implementation, and analytical validation against known failure taxonomies. The authors note that while the framework is theoretically sound, quantitative empirical evaluation and formal completeness proofs remain topics for future research.
