Prediction and Empowerment: A Theory of Agency thro...

Key Takeaways

Prediction and Empowerment: A Theory of Agency through Bridge Interfaces introduces a new framework for understanding how AI agents interact with the world.
We study agency under partial observability in deterministic physical or simulated worlds, where apparent randomness arises from uncertainty over initial conditions, fixed law bits, and unrolled exogenous noise.
Within this framework, we prove a separation between prediction, compression, and empowerment.
Perfect prediction can be achieved either by identifying the hidden quotient relevant to the target family or by overwrite control that makes the future target action-determined; high empowerment alone is insufficient.
Human-AI alignment is partly an interface-design problem, where the relevant bridge is between human intent, agent internal state, external tools, and world-side channel conditions.

Paper Abstract

We study agency under partial observability in deterministic physical or simulated worlds, where apparent randomness arises from uncertainty over initial conditions, fixed law bits, and unrolled exogenous noise. We model sensing and actuation as bridge interfaces split between agent-controlled parameters and environment-controlled channel state, inducing a deterministic POMDP through a prior over latent microstates and many-to-one observation coarsening. Within this framework, we prove a separation between prediction, compression, and empowerment. Perfect prediction can be achieved either by identifying the hidden quotient relevant to the target family or by overwrite control that makes the future target action-determined; high empowerment alone is insufficient. Under refinable interfaces and sufficient memory, action-conditioned observation-compression progress reduces posterior uncertainty about the latent quotient, and when refinement requires steering world-side channel conditions, this creates target-conditioned interface empowerment. A bit-string specialization with a conserved information budget makes the resulting tradeoff explicit: prediction by identification requires internal capacity at least the relevant latent entropy, whereas overwrite control requires terminal action capacity over the controlled quotient. For modern AI agents, the results suggest a design principle rather than a theorem of inevitability: objectives should distinguish hidden-state identification, interface refinement, task-relevant controllability, and mere overwrite or distractor control. Human-AI alignment is partly an interface-design problem, where the relevant bridge is between human intent, agent internal state, external tools, and world-side channel conditions. This is a working draft: feedback and criticism are most welcome.
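The two routes to perfect prediction named in the abstract can be illustrated with a toy bit-string world. The sketch below is an illustration under stated assumptions, not the paper's construction: the hidden state is a uniformly distributed k-bit string, the target is either that string or a value the agent writes over it, and the function names (`residual_target_entropy`, `passive`, `identify`, `partial`, `overwrite`) are hypothetical. It computes how many bits of uncertainty about the target remain given what the agent has stored.

```python
import math
from itertools import product

def entropy_bits(counts):
    """Shannon entropy in bits of a dict {outcome: count}."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def residual_target_entropy(k, policy):
    """Expected entropy (bits) of the target given the agent's memory.

    Assumption: the hidden state h is a k-bit tuple with a uniform prior.
    policy(h) returns (memory, target): what the agent retains about h
    and the value the target ends up taking.
    """
    groups = {}
    for h in product((0, 1), repeat=k):
        memory, target = policy(h)
        counts = groups.setdefault(memory, {})
        counts[target] = counts.get(target, 0) + 1
    n = 2 ** k
    return sum(sum(c.values()) / n * entropy_bits(c) for c in groups.values())

K = 3  # hypothetical number of relevant hidden bits

# Four toy policies (all illustrative):
passive = lambda h: ((), h)              # stores nothing, target follows h
identify = lambda h: (h, h)              # stores all K bits of h
partial = lambda h: (h[:1], h)           # memory holds only 1 of the K bits
overwrite = lambda h: ((), (0,) * K)     # writes K chosen bits onto the target

residual = {
    "passive": residual_target_entropy(K, passive),      # all K bits remain
    "identify": residual_target_entropy(K, identify),    # nothing remains
    "partial": residual_target_entropy(K, partial),      # K - 1 bits remain
    "overwrite": residual_target_entropy(K, overwrite),  # nothing remains
}
```

In this toy setting, identification drives the residual to zero only when memory holds all K bits, while overwrite reaches zero with no memory but needs K bits of action capacity at the final step, which mirrors the tradeoff the abstract describes.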

The paper argues that standard models of agency often leave the "bridge" (the interface between an agent and its environment) too vague. By explicitly modeling how agents sense and act through these interfaces, the author provides a way to measure and diagnose why agents sometimes fail to learn or control the right things, even when they appear to be performing well.

Understanding the Bridge Interface

The paper models agency as a "bridge" consisting of two sides: the agent-owned settings (like prompts, queries, or actuator modes) and the environment-owned channel conditions (like authorization, sensor occlusion, or physical geometry). Because these interfaces are often limited, an agent’s ability to predict the future or control the world is constrained by the quality of this bridge. The author defines a "bridge gap" to quantify these limitations, showing that perfect prediction and high empowerment are not the same thing. An agent might be highly empowered to move a distractor object while remaining completely ignorant of the actual task-relevant information.
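The distractor scenario can be made concrete with a small deterministic toy world. This is a minimal sketch under assumptions that are not from the paper: the world has `K_HIDDEN` task-relevant bits the agent cannot touch or see, and `M_DISTRACTOR` distractor bits that the agent can set and that are the only thing its sensor shows. For a deterministic channel, empowerment reduces to the log of the number of distinct reachable outcomes, which is what `empowerment_bits` computes.

```python
import math
from itertools import product

K_HIDDEN = 2      # hypothetical: bits of task-relevant hidden state
M_DISTRACTOR = 3  # hypothetical: distractor bits the agent can set

def step(hidden, action):
    """Deterministic dynamics: the action overwrites the distractor,
    the hidden state is untouched, and the sensor shows only the distractor."""
    observation = action
    return hidden, observation

def empowerment_bits(hidden):
    """Capacity of a deterministic action-to-observation channel:
    log2 of the number of distinct observations the agent can produce."""
    actions = product((0, 1), repeat=M_DISTRACTOR)
    outcomes = {step(hidden, a)[1] for a in actions}
    return math.log2(len(outcomes))

def hidden_posterior_entropy_bits(action):
    """Entropy (bits) of the hidden state given the observation,
    assuming a uniform prior over hidden states."""
    by_obs = {}
    for h in product((0, 1), repeat=K_HIDDEN):
        _, obs = step(h, action)
        by_obs.setdefault(obs, []).append(h)
    n = 2 ** K_HIDDEN
    return sum(len(hs) / n * math.log2(len(hs)) for hs in by_obs.values())

emp = empowerment_bits(hidden=(0, 1))
residuals = [hidden_posterior_entropy_bits(a)
             for a in product((0, 1), repeat=M_DISTRACTOR)]
```

Here the agent has the full `M_DISTRACTOR` bits of empowerment, yet every action leaves all `K_HIDDEN` bits of the hidden state unresolved, because no observation depends on it.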

The Problem with Surrogate Objectives

A central contribution of this work is the "tight uniform regret-transfer theorem." It explains why common AI training objectives, such as maximizing information gain or empowerment, often fail to align with the actual goals of a task. The paper proves that if the "bridge gap" is large, an agent might optimize for a surrogate objective (like predicting a display screen) while failing to identify the underlying hidden state (like the actual object being displayed). The author demonstrates that these failures are not just theoretical; they are measurable deficits in bits of information that can be tracked during training.
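One way to picture a deficit measured in bits is the conditional entropy of the hidden state given a coarse display. The sketch below uses that quantity as a stand-in; it is an assumption made for illustration and not the paper's definition of the bridge gap. The object names and the `coarse_display` and `refined_display` functions are hypothetical.

```python
import math
from collections import defaultdict

def conditional_entropy_bits(prior, coarsen):
    """H(hidden | coarsen(hidden)) in bits, for a prior dict {hidden: prob}."""
    by_obs = defaultdict(dict)
    for hidden, p in prior.items():
        by_obs[coarsen(hidden)][hidden] = p
    gap = 0.0
    for members in by_obs.values():
        mass = sum(members.values())  # probability of seeing this display
        for p in members.values():
            if p > 0:
                gap -= p * math.log2(p / mass)
    return gap

# Hypothetical hidden objects with a uniform prior.
prior = {"cat": 0.25, "dog": 0.25, "owl": 0.25, "crow": 0.25}

def coarse_display(obj):
    """Many-to-one display: shows only a silhouette class."""
    return "four legs" if obj in ("cat", "dog") else "two legs"

def refined_display(obj):
    """A refined interface that shows the object itself."""
    return obj

gap_coarse = conditional_entropy_bits(prior, coarse_display)    # 1 bit left
gap_refined = conditional_entropy_bits(prior, refined_display)  # 0 bits left
```

An agent that predicts the coarse display perfectly still carries one bit of uncertainty about which object is behind it; that residual is the kind of quantity that could be logged during training.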

Bridge-Gap Pursuit

To address these failures, the paper proposes an algorithm called Bridge-Gap Pursuit (BGP). Unlike standard reinforcement learning, which might reward an agent simply for "doing something," BGP uses a "bridge potential" to reward the agent for closing the gap between its current knowledge and the information required to solve the task. It specifically penalizes the agent for spending its limited control budget on irrelevant distractors or on "overwrite" strategies, in which an agent forces a predictable outcome rather than learning the true state of the environment.
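The summary does not give BGP's update rule, so the following is only a hedged sketch of the general idea, not the author's algorithm: a greedy loop whose reward is the reduction in a toy gap (the count of relevant hidden bits still unknown) minus a penalty for control bits spent on distractors or overwrite. The `Action` type, `bridge_gap`, `shaped_reward`, `pursue`, and the `penalty` weight are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    reveals: frozenset   # indices of hidden bits this action would reveal
    wasted_bits: float   # control bits spent on distractors or overwrite

def bridge_gap(known, relevant):
    """Toy gap in bits: relevant hidden bits not yet identified
    (assumes independent, uniformly distributed bits)."""
    return len(relevant - known)

def shaped_reward(known, action, relevant, penalty=1.0):
    """Gap reduction achieved by the action, minus a penalty on wasted control."""
    before = bridge_gap(known, relevant)
    after = bridge_gap(known | action.reveals, relevant)
    return (before - after) - penalty * action.wasted_bits

def pursue(actions, relevant, max_steps=10):
    """Greedily pick the action with the best shaped reward until the gap closes
    or no action has positive reward."""
    known, trace = frozenset(), []
    for _ in range(max_steps):
        if bridge_gap(known, relevant) == 0:
            break
        best = max(actions, key=lambda a: shaped_reward(known, a, relevant))
        if shaped_reward(known, best, relevant) <= 0:
            break
        known = known | best.reveals
        trace.append(best.name)
    return known, trace

relevant = frozenset({0, 1, 2})
actions = [
    Action("push_distractor", frozenset(), wasted_bits=2.0),
    Action("probe_irrelevant", frozenset({5}), wasted_bits=0.0),
    Action("probe_2", frozenset({2}), wasted_bits=0.0),
    Action("probe_01", frozenset({0, 1}), wasted_bits=0.0),
]
known, trace = pursue(actions, relevant)
```

In this toy run the loop selects the two informative probes and never touches the distractor, since moving it reduces no gap and only incurs the penalty. A learned agent would estimate the gap from a belief state instead of reading it from a known set of bit indices.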

Implications for AI Alignment

The author suggests that human-AI alignment is, in part, an interface-design problem. If an agent’s internal state, tools, and communication channels are not properly aligned with human intent, the agent may find "shortcuts" that satisfy its objective function without actually understanding the task. By distinguishing between hidden-state identification, interface refinement, and task-relevant controllability, the paper provides a design principle for building more robust agents that prioritize the information and control necessary for genuine goal achievement.