Off-Policy Evaluation with Strategic Agents via Local Disclosure

Paper Abstract

We study off-policy evaluation (OPE) under strategic behavior where decision subjects (or agents) respond to a decision maker's policy by strategically modifying their covariates. Such behavior induces a policy-dependent covariate shift, breaking the standard assumption in existing methods that covariates are exogenous to the policy. Related work addresses this challenge by imposing strong assumptions such as repeated interactions or full knowledge of agents' response behavior, substantially limiting its applicability to OPE. In contrast, we consider a one-shot OPE setting where the decision maker has only partial knowledge of the agents' response behavior. Our key insight is that disclosing local information through post-hoc explanations reveals agents' pre-strategic covariates prior to adaptation, mitigating the information loss induced by strategic behavior. Leveraging this structure, we estimate a statistical model for the agents' responses and construct a doubly robust estimator for policy value. By assuming that the agents' cost sensitivity follows a conditional log-normal distribution, we establish consistency of the proposed estimator and validate our approach empirically. More broadly, our results highlight how interaction design can mitigate information asymmetry by revealing otherwise hidden structure in agents' strategic responses.

Off-Policy Evaluation with Strategic Agents via Local Disclosure addresses a fundamental challenge in data-driven decision-making: how to accurately evaluate a new policy when the people affected by it change their behavior in response. In fields like lending or education, individuals often modify their observable characteristics, such as credit scores or test results, to secure better outcomes. The result is a policy-dependent covariate shift: the distribution of observed characteristics depends on which policy is in force, so estimates that treat the logged population as fixed are biased. This paper introduces a framework that accounts for these strategic responses in a one-shot evaluation setting, even when the decision-maker has only partial knowledge of how agents respond.

The Problem of Strategic Adaptation

When a decision-maker implements a policy, they often observe individuals only after they have already adapted their behavior to meet the policy's criteria. This creates an information asymmetry: the decision-maker sees the "post-strategic" data but lacks access to the "pre-strategic" baseline. Because different individuals with different underlying characteristics might adapt to the same final state, it becomes impossible to distinguish between someone who naturally met the criteria and someone who strategically manipulated their data to get there. Without this distinction, the decision-maker cannot accurately predict how a new, different policy would affect the population.
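To see why treating logged covariates as fixed misleads, consider a minimal simulation. Everything in it is an illustrative assumption rather than the paper's model: a one-dimensional covariate, a threshold policy, a hypothetical `respond` rule in which an agent moves exactly to the threshold when the gap fits within a private effort budget, and the names `old_t` and `new_t`.

```python
import random


def respond(x_pre, threshold, max_effort):
    """Hypothetical best response: move to the threshold if the gap is affordable."""
    gap = threshold - x_pre
    if 0 < gap <= max_effort:
        return threshold
    return x_pre


def acceptance_rate(xs, threshold):
    """Share of agents whose covariate meets the threshold."""
    return sum(x >= threshold for x in xs) / len(xs)


rng = random.Random(0)
n = 10_000
x_pre = [rng.gauss(0.0, 1.0) for _ in range(n)]      # pre-strategic covariates
effort = [rng.uniform(0.0, 1.0) for _ in range(n)]   # private budgets, never observed

old_t, new_t = 0.5, 1.0

# What the decision maker logs under the old policy: post-strategic covariates only.
x_logged = [respond(x, old_t, e) for x, e in zip(x_pre, effort)]

# Naive evaluation: apply the new threshold to the logged covariates as if they were fixed.
naive = acceptance_rate(x_logged, new_t)

# Ground truth: agents respond afresh to the new threshold from their pre-strategic state.
truth = acceptance_rate(
    [respond(x, new_t, e) for x, e in zip(x_pre, effort)], new_t
)

# Two different agents can be indistinguishable after adaptation.
same_final_state = respond(0.2, old_t, 0.4) == respond(0.5, old_t, 0.0)

print(f"naive={naive:.3f} truth={truth:.3f} indistinguishable={same_final_state}")
```

Under these assumptions the naive estimate can only undercount acceptances, because agents who would adapt to the new threshold are invisible in data logged under the old one.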

Using Local Information Disclosure

The authors propose a solution called Local Information Disclosure (LID). Instead of making policy information public globally, the decision-maker provides personalized feedback—specifically, action recommendation-based explanations (ARexes)—to individuals who do not initially qualify for a positive outcome. By providing these specific recommendations, the decision-maker can observe the agent's original, pre-strategic covariates before they decide whether to adapt. This interaction design acts as a bridge, revealing the hidden structure of how agents respond to incentives. By observing these pre-strategic states, the decision-maker can better model the agents' behavior and anticipate how they will react to future policy changes.
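The interaction described above can be sketched as a logging loop. This is a toy illustration, not the paper's protocol: the `DisclosureRecord` type, the `disclose` function, the threshold policy, and the decision rule (adapt when sensitivity times gap does not exceed a fixed benefit) are all hypothetical. The log-normal draw only mirrors the distributional assumption mentioned in the abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class DisclosureRecord:
    x_pre: float            # covariate observed before any adaptation
    recommended_gap: float  # personalized recommendation: distance to the threshold
    adapted: bool           # whether the agent followed the recommendation


def disclose(x_pre, threshold, benefit, mu, sigma, rng):
    """Log one agent's pre-strategic state, then the response to a recommendation."""
    gap = threshold - x_pre
    if gap <= 0:
        # Agent already qualifies, so no recommendation is issued.
        return DisclosureRecord(x_pre, 0.0, False)
    # Assumed decision rule: adapt when the perceived cost is within the benefit.
    sensitivity = rng.lognormvariate(mu, sigma)
    return DisclosureRecord(x_pre, gap, sensitivity * gap <= benefit)


def adaptation_rate(records, lo, hi):
    """Empirical adaptation rate among agents whose recommended gap lies in (lo, hi]."""
    group = [r for r in records if lo < r.recommended_gap <= hi]
    return sum(r.adapted for r in group) / len(group)


rng = random.Random(1)
records = [
    disclose(rng.gauss(0.0, 1.0), 1.0, 1.0, 0.0, 1.0, rng) for _ in range(5000)
]

rate_small_gap = adaptation_rate(records, 0.0, 0.5)
rate_large_gap = adaptation_rate(records, 1.5, 10.0)
print(f"small gap: {rate_small_gap:.2f}, large gap: {rate_large_gap:.2f}")
```

Because each record pairs a pre-strategic covariate with the recommendation and the resulting decision, adaptation rates can be tabulated as a function of the gap, which is the kind of information a response model would be fitted to.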

Estimating Policy Value

To turn these insights into a reliable evaluation, the authors estimate a statistical model of agent responses. They assume that the cost of modifying one's characteristics is common to all agents, while each individual's sensitivity to that cost varies and follows a conditional log-normal distribution. Using this model and the data collected through LID, they construct a doubly robust estimator of policy value. Doubly robust estimators combine a fitted outcome model with importance weighting and, in their standard form, remain consistent when either component is correctly specified; here the estimator is also designed to adjust for the strategic shift in the population. The authors prove that the estimator is consistent under their stated conditions, meaning it converges to the true policy value as the sample size grows.
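For orientation, the textbook doubly robust form for a deterministic target policy can be sketched as follows. This is the standard estimator for exogenous covariates, not the paper's construction, which additionally has to account for covariates that depend on the policy through the fitted response model. The names `doubly_robust_value`, `target_policy`, and `reward_model`, and the toy data, are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# (covariate, logged action, observed reward, behavior-policy propensity of that action)
LoggedSample = Tuple[float, int, float, float]


def doubly_robust_value(
    samples: Sequence[LoggedSample],
    target_policy: Callable[[float], int],
    reward_model: Callable[[float, int], float],
) -> float:
    """Standard doubly robust estimate of a deterministic target policy's value."""
    total = 0.0
    for x, a, r, p in samples:
        a_target = target_policy(x)
        # Direct-method term: model prediction for the action the target policy takes.
        direct = reward_model(x, a_target)
        # Importance weight is nonzero only when the logged action matches the target.
        weight = 1.0 / p if a == a_target else 0.0
        # Correction term: weighted residual of the model on the logged action.
        total += direct + weight * (r - reward_model(x, a))
    return total / len(samples)


# Toy log: uniform behavior policy (propensity 0.5), reward equals the action taken.
samples = [(0.2, 1, 1.0, 0.5), (0.8, 0, 0.0, 0.5), (0.6, 1, 1.0, 0.5), (0.1, 0, 0.0, 0.5)]


def target_policy(x: float) -> int:
    return 1 if x >= 0.5 else 0


# With a perfect reward model the residuals vanish and only the direct term remains.
dr_perfect_model = doubly_robust_value(samples, target_policy, lambda x, a: float(a))
# With an all-zero reward model the estimator reduces to inverse propensity scoring.
dr_zero_model = doubly_robust_value(samples, target_policy, lambda x, a: 0.0)
print(dr_perfect_model, dr_zero_model)
```

The two calls illustrate the double robustness idea on a small scale: the estimate is driven by the outcome model when that model is accurate, and by the importance weights when it is not.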

Key Considerations

This approach shifts the focus of off-policy evaluation from a purely passive observation task to one that incorporates interaction design. By choosing how to disclose information to agents, the decision-maker can actively reduce the uncertainty caused by strategic behavior. However, this method relies on specific assumptions, such as the agents' cost sensitivity distribution and the requirement that the decision-maker can provide personalized recommendations. The research highlights that the way a system is designed to interact with its users is not just a matter of user experience, but a critical component of the system's ability to learn and improve through data.
