Against Proxy Optimization

This paper investigates the conditions under which relying on simplified metrics—or "proxies"—to make decisions leads to harmful outcomes. While we often use proxies like test scores for education or GDP for welfare because true values are difficult to measure, this practice can lead to "proxy failure," where the pursuit of the proxy eventually degrades the very values we intended to improve. The author evaluates a formal model of this phenomenon to determine when and why optimizing for a proxy becomes counterproductive.

The Mechanics of Proxy Failure

The paper utilizes a model where the world is represented by various features, such as human well-being or global resources. A "true utility function" captures the complex, overall value of a state, while a "proxy utility function" focuses only on a subset of these features. The model assumes that resources are finite, meaning we cannot maximize every feature simultaneously. The core finding is that as an agent maximizes a proxy, it will inevitably drive all "unmentioned" features—those not tracked by the proxy—to their lowest possible levels. If these unmentioned features are important to our true goals, the resulting state will be highly undesirable, even if the proxy metric itself looks successful.
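The mechanism described above can be illustrated with a small toy model. This is a minimal sketch under assumptions that are not in the paper: the three feature names, the budget of 6 units, the integer allocations, and the square-root utilities are all hypothetical choices made to keep the example small.

```python
from itertools import product
from math import sqrt

# Hypothetical setup: three features share a finite budget of resources.
FEATURES = ("health", "income", "leisure")
MENTIONED = ("health", "income")  # the proxy only tracks these
BUDGET = 6                        # total resources to allocate
MINIMUM = 0                       # lowest possible level of any feature


def feasible_states():
    """Yield every integer allocation that spends the whole budget."""
    for levels in product(range(MINIMUM, BUDGET + 1), repeat=len(FEATURES)):
        if sum(levels) == BUDGET:
            yield dict(zip(FEATURES, levels))


def true_utility(state):
    """Assumed true utility: every feature matters, with diminishing returns."""
    return sum(sqrt(level) for level in state.values())


def proxy_utility(state):
    """Assumed proxy utility: same shape, but only the mentioned features count."""
    return sum(sqrt(state[name]) for name in MENTIONED)


def maximizers(utility):
    """Return all feasible states that maximize the given utility function."""
    states = list(feasible_states())
    best = max(utility(s) for s in states)
    return [s for s in states if utility(s) >= best - 1e-9]


proxy_optimum = maximizers(proxy_utility)[0]
true_optimum = maximizers(true_utility)[0]

print("proxy optimum:", proxy_optimum, "true utility:", round(true_utility(proxy_optimum), 3))
print("true optimum: ", true_optimum, "true utility:", round(true_utility(true_optimum), 3))
```

In this toy setup the proxy maximizer spends nothing on the unmentioned feature, while the true optimum spreads resources across all three, so the proxy-optimal state scores lower on true utility even though the proxy itself is at its peak.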

Evaluating Theoretical Conditions

The author examines a condition called "Compactness," which was previously proposed to explain why proxy failure occurs. Compactness suggests that as we pursue higher utility, we eventually hit a boundary where we can no longer improve without sacrificing something else. However, the author argues that this condition is problematic. It is both too weak, as it fails to guarantee that proxy optimization will actually decrease true utility in all cases, and too strong, as it rules out many reasonable utility functions that are bounded below. Because of these flaws, the author concludes that Compactness does not fully capture the nature of proxy failure.

A New Perspective: Minimal Balance

To better understand when proxy failure happens, the author proposes an alternative condition called "Minimal Balance." This condition states that in any state where true utility is maximized, at least one unmentioned feature sits above its minimum possible level. If the condition holds, it follows that maximizing a proxy never reaches the best possible outcome for true utility: proxy maximization pushes every unmentioned feature to its minimum, and Minimal Balance says that no true optimum looks like that. This provides a more robust logical foundation for why proxy optimization is inherently risky: it forces a trade-off that ignores the balance required to maintain the things we truly care about.
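As a rough illustration, the condition can be checked by brute force on a small discrete model. The sketch below is not the paper's formalism: the function names, the integer allocation grid, and the two example utility functions are hypothetical, chosen only to show one case where the condition holds and one where it fails.

```python
from itertools import product
from math import sqrt


def allocations(n_features, budget):
    """All integer allocations of `budget` units across `n_features` features."""
    return [
        levels
        for levels in product(range(budget + 1), repeat=n_features)
        if sum(levels) == budget
    ]


def satisfies_minimal_balance(states, true_utility, unmentioned, minimum=0):
    """Check that every true-utility maximizer keeps at least one
    unmentioned feature above its minimum level."""
    best = max(true_utility(s) for s in states)
    optimal = [s for s in states if true_utility(s) >= best - 1e-9]
    return all(any(s[i] > minimum for i in unmentioned) for s in optimal)


states = allocations(n_features=3, budget=6)


def balanced(state):
    """Assumed true utility that values all three features."""
    return sum(sqrt(x) for x in state)


def indifferent(state):
    """Assumed true utility that ignores the third feature entirely."""
    return sqrt(state[0]) + sqrt(state[1])


# Feature index 2 is treated as the only unmentioned feature.
holds = satisfies_minimal_balance(states, balanced, unmentioned=[2])
fails = satisfies_minimal_balance(states, indifferent, unmentioned=[2])
print(holds, fails)
```

When the true utility values the unmentioned feature, its optimum keeps that feature above the minimum and the check passes. When the true utility is indifferent to it, the optimum can sit at the minimum and the check fails, which is the case where proxy optimization would do no harm.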

Practical Implications

The research highlights that proxy failure is a structural problem rather than just a technical error. Because proxies are by definition incomplete, they naturally encourage the neglect of unmentioned features. While the author notes that proxies remain a pragmatic necessity when true values are too complex to measure, the findings serve as a warning. To avoid the "genie in the lamp" scenario—where we get exactly what we asked for but not what we wanted—decision-makers must be aware that long-term optimization of a simplified metric will eventually come at the expense of the broader, unmeasured values that define success.
