Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
Current safety protocols for frontier AI models often rely on monitoring "chain-of-thought" (CoT) reasoning, the explicit steps a model writes out while solving a problem. However, as models become more advanced, they may become able to perform complex reasoning internally without generating these visible thinking tokens. This paper investigates the risks posed by this "no-CoT" reasoning capability by measuring how effectively frontier models solve complex tasks without explicit reasoning steps and projecting how these capabilities might evolve.
Measuring Internal Reasoning
To understand the limits of no-CoT reasoning, the researchers evaluated frontier models on more than 30,000 questions drawn from 43 benchmarks, covering domains such as mathematics, coding, puzzles, causality, theory-of-mind, and strategic reasoning. By measuring the models' success rates without CoT, the team aimed to determine how much reasoning a model can carry out without producing visible steps that CoT-based oversight could inspect.
Defining the Time Horizon
The study introduces two key metrics to quantify model performance:
50% Task-Completion Time Horizon (TH): This measures the amount of time a human would typically need to complete tasks that the AI model solves with a 50% success rate.
50% Reasoning Token Horizon: This measures the minimum number of reasoning tokens required by a model like o3-mini to achieve a 50% success rate on the same tasks.
These metrics allow researchers to compare the "internal" cognitive effort of an AI against human-equivalent time, providing a standardized way to track how much complex reasoning a model can perform silently.
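As a rough illustration of how a 50% time horizon can be estimated, the sketch below fits a logistic curve of success probability against the logarithm of human completion time and reads off the time at which the fitted probability crosses 50%. This is a common approach for this kind of metric and not necessarily the authors' exact procedure. The function names and the per-task data are hypothetical, made up purely for illustration.

```python
import math


def fit_logistic(xs, ys, steps=2000, lr=0.5):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def time_horizon_50(human_minutes, solved):
    """Human task time (minutes) at which the fitted success probability is 50%."""
    xs = [math.log2(t) for t in human_minutes]
    a, b = fit_logistic(xs, solved)
    if b >= 0:
        raise ValueError("success rate does not fall with task length; no horizon")
    # sigmoid(a + b*x) = 0.5 exactly when a + b*x = 0, i.e. x = -a / b.
    return 2.0 ** (-a / b)


# Hypothetical data: four tasks at each human time, 1 = solved without CoT.
human_minutes = [0.5] * 4 + [1.0] * 4 + [2.0] * 4 + [4.0] * 4 + [8.0] * 4
solved = [1, 1, 1, 1,  1, 1, 1, 0,  1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0]

horizon = time_horizon_50(human_minutes, solved)
print(round(horizon, 2))  # about 2.0 minutes for this toy data
```

The toy success rates are symmetric around the 2-minute tasks (100%, 75%, 50%, 25%, 0%), so the fitted curve crosses 50% at about 2 minutes. Real benchmark data would be noisier, and the estimate would normally be reported with a confidence interval.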
Rapid Growth and Future Projections
The findings indicate that the no-CoT capabilities of frontier models are advancing quickly. The data shows that the 50% TH has been doubling roughly every year for the past six years. For instance, GPT-5.5 has already reached a TH of over 3 minutes, with a reasoning token horizon exceeding 1,500 tokens.
Based on these trends, the researchers project that frontier no-CoT time horizons could exceed 7 minutes by 2028 and reach 25 minutes by 2030. While these projections involve substantial uncertainty, the authors emphasize that the rapid growth of internal reasoning capabilities is a critical development. Consequently, they recommend that frontier AI developers explicitly track these no-CoT horizons to ensure that safety oversight remains effective as models become more capable of "thinking" without showing their work.
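The projections above can be checked against the stated doubling rate with a little arithmetic. The sketch below assumes pure exponential growth, which is a simplification and not necessarily the authors' forecasting model. The function names are illustrative.

```python
import math


def implied_doubling_time(h0, year0, h1, year1):
    """Years per doubling implied by two horizon values under exponential growth."""
    return (year1 - year0) / math.log2(h1 / h0)


def extrapolate(h0, year0, year, doubling_years):
    """Horizon at `year`, starting from h0 at year0 and doubling every `doubling_years`."""
    return h0 * 2.0 ** ((year - year0) / doubling_years)


# Projected horizons from the text: 7 minutes in 2028, 25 minutes in 2030.
d = implied_doubling_time(7.0, 2028, 25.0, 2030)
print(round(d, 2))  # about 1.09 years per doubling

# An exact one-year doubling from the 2028 figure, for comparison.
print(extrapolate(7.0, 2028, 2030, 1.0))  # 28.0 minutes
```

The two projected points imply a doubling time of roughly 1.09 years, consistent with the "roughly every year" trend. An exact one-year doubling would give 28 minutes by 2030, slightly above the projected 25.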