AI Exposure Scores: what they measure, what they mi...

Paper Abstract

A set of exposure scores calculated in 2023 has become a central empirical input to the future of work debate. Produced by Eloundou et al. (2023) and referred to here as the "GPTs are GPTs" scores, they define exposure as the share of occupational tasks a large language model can assist with. This work is a genuine methodological contribution, but as the scores travel from the time and place in which they were produced, the limitations the authors named do not always travel with them. Two gaps have widened as a result. The first is structural, between what static exposure scores measure and what policy questions actually require. Taking the diffusion of these scores as a case study, we show how their temporal, geographic, and ontological limitations compound in policy-facing analyses, and we survey five families of research responding to these limits: dynamic and benchmark-based measures, ensemble methods, task-framework extensions, worker-centered metrics, and adoption and usage data. The second gap is the one we argue needs more attention: the coordination between researchers and policymakers. Policy-relevant work, which asks who is harmed, who benefits, how, and when, continues to reference the static "GPTs are GPTs" scores without engaging with the methodological updates that would let these questions be answered more reliably. We then ask what additional steps towards navigating uncertainty remain: ex-post frameworks and the deliberate, political work of reimagining what futures are worthy of building towards. Closing the research-policy gap is a shared task: policymakers must widen their evidence base, engage workers as epistemic partners, and shift from prediction to preparedness; researchers must build data infrastructure, adopt participatory methods, and write with policymakers in mind. Better measurement matters, but it will not close the second gap alone.

The paper "AI Exposure Scores: what they measure, what they miss, and what comes next" examines the reliance of current labor policy debates on static "exposure scores", metrics that estimate what share of an occupation’s tasks can be assisted by large language models. While acknowledging the original 2023 "GPTs are GPTs" research as a significant methodological contribution, the authors argue that these scores are being used in ways that ignore their inherent limitations. The paper aims to bridge the gap between static data and the complex, evolving needs of policy, calling for a more nuanced approach to understanding how AI affects the workforce.
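To make the definition concrete, here is a minimal sketch of an exposure score as a task share, assuming each task carries a simple yes/no label for whether a large language model can assist with it. The `Task` class, the `exposure_share` function, and the example tasks and labels are all hypothetical illustrations; the original study's labeling rubric is more detailed than a binary flag, and none of the values below come from its data.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    llm_can_assist: bool  # hypothetical binary label, not the paper's rubric


def exposure_share(tasks: list[Task]) -> float:
    """Return the fraction of an occupation's tasks labeled LLM-assistable."""
    if not tasks:
        raise ValueError("occupation has no tasks")
    return sum(t.llm_can_assist for t in tasks) / len(tasks)


# Invented occupation with invented labels, for illustration only.
example_tasks = [
    Task("draft routine correspondence", True),
    Task("summarize meeting notes", True),
    Task("greet visitors in person", False),
    Task("maintain office equipment", False),
]

score = exposure_share(example_tasks)
print(score)  # 0.5
```

The score is a snapshot of the labels at the moment they were assigned: nothing in it records when the labels were produced, which labor market the task list describes, or whether assistable tasks are actually being assisted.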

The Problem with Static Metrics

The core issue identified by the authors is a structural gap between what these static scores measure and what policymakers actually need to know. Because the original scores were calculated at a specific time and in a specific place, they carry temporal, geographic, and ontological limitations. When the scores are used in policy analysis without accounting for these constraints, the results can be misleading. The authors highlight that while the original researchers were transparent about these limitations, those warnings often fail to travel alongside the data as it is cited in broader policy discussions.

Expanding the Research Toolkit

To address the shortcomings of static exposure scores, the authors survey five emerging families of research that offer more dynamic alternatives:

Dynamic and benchmark-based measures: Moving beyond static snapshots to account for rapid technological change.
Ensemble methods: Combining multiple data sources to create a more robust picture of AI impact.
Task-framework extensions: Refining how we categorize and measure the specific tasks that make up different jobs.
Worker-centered metrics: Shifting the focus toward the actual experiences and needs of the workforce.
Adoption and usage data: Looking at real-world implementation rather than just theoretical potential.
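
As one illustration of the ensemble idea in the list above, the sketch below standardizes several exposure measures and averages them per occupation, while also reporting how much the measures disagree. This is an assumed, simplified approach, not the method of any study the paper surveys; the function names, measure names, and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values: dict[str, float]) -> dict[str, float]:
    """Standardize one measure so that different scales become comparable."""
    mu = mean(values.values())
    sd = pstdev(values.values())
    if sd == 0:
        return {k: 0.0 for k in values}
    return {k: (v - mu) / sd for k, v in values.items()}


def ensemble(measures: dict[str, dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Map each shared occupation to (mean z-score, spread across measures)."""
    # Only occupations scored by every measure can be compared.
    shared = set.intersection(*(set(m) for m in measures.values()))
    standardized = [
        zscores({occ: m[occ] for occ in shared}) for m in measures.values()
    ]
    result = {}
    for occ in sorted(shared):
        vals = [s[occ] for s in standardized]
        result[occ] = (mean(vals), max(vals) - min(vals))
    return result


# Invented scores from three hypothetical measures, for illustration only.
hypothetical_measures = {
    "static_task_share": {"clerk": 0.70, "welder": 0.10, "analyst": 0.60},
    "benchmark_based": {"clerk": 0.55, "welder": 0.20, "analyst": 0.80},
    "usage_based": {"clerk": 0.30, "welder": 0.05, "analyst": 0.65},
}

for occupation, (avg, spread) in ensemble(hypothetical_measures).items():
    print(f"{occupation}: mean z = {avg:+.2f}, spread = {spread:.2f}")
```

Reporting the spread alongside the average is a deliberate choice here: a large spread flags occupations where the measures disagree, which is the situation in which relying on any single score is most likely to mislead.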

Bridging the Research-Policy Gap

The authors argue that better measurement is necessary but insufficient on its own. A second, deeper gap exists between the research community and policymakers. Current policy discussions often focus on predicting the future of work using outdated metrics, rather than preparing for various potential outcomes. To close this gap, the authors propose a collaborative path forward: policymakers should broaden their evidence base and treat workers as "epistemic partners," while researchers must prioritize building better data infrastructure, adopting participatory methods, and ensuring their work is accessible and relevant to those making policy decisions.

Navigating Future Uncertainty

Beyond improving data, the paper emphasizes the need for a shift in mindset. The authors advocate for the use of "ex-post" frameworks—which evaluate outcomes after the fact—and the deliberate, political work of deciding what kind of future we want to build. By moving away from a sole focus on prediction and toward a focus on preparedness, the authors suggest that society can better navigate the uncertainties introduced by AI, ensuring that the debate over the future of work is grounded in both reliable evidence and shared values.