Back to AI Research

AI Research

Thinking with Reasoning Skills: Fewer Tokens, More... | AI Research

Key Takeaways

  • Thinking with Reasoning Skills: Fewer Tokens, More Accuracy Reasoning-focused AI models often solve complex math and coding problems by "thinking from scratc...
  • Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems.
  • We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning.
  • Unlike the prevailing \emph{reasoning from scratch} paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths.
  • We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance.
Paper AbstractExpand

Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing \emph{reasoning from scratch} paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance. The resulting lower per-request cost indicates strong practical and economic potential for real-world deployment.

Thinking with Reasoning Skills: Fewer Tokens, More Accuracy
Reasoning-focused AI models often solve complex math and coding problems by "thinking from scratch," a process that generates long, redundant chains of thought. This approach is not only slow but also expensive, as commercial models charge based on the number of tokens generated. This paper introduces Thinking with Reasoning Skills (TRS), a framework that replaces this repetitive trial-and-error process with a library of reusable, distilled reasoning strategies. By retrieving and applying these pre-learned skills, the model can navigate directly to a solution, reducing both the time and cost of inference without sacrificing accuracy.

How the Approach Works

TRS operates in two distinct phases: an offline distillation phase and an online retrieval phase. During the offline phase, the system takes long, successful reasoning traces and distills them into compact "skill cards." These cards contain structured, actionable advice, such as specific patterns to look for, common pitfalls to avoid, and verification steps.
When a new query arrives, the system searches its library for the most relevant skill cards and injects them into the model's prompt. This acts as a "navigation map," guiding the model toward an effective solution path. Because the model no longer needs to rediscover the logic for every problem, it avoids the redundant detours that typically inflate token counts.

Breaking the Efficiency-Accuracy Trade-off

Previous attempts to make AI reasoning more efficient—such as enforcing strict token budgets or compressing thoughts—often resulted in a sharp drop in accuracy, especially on difficult problems. The authors found that TRS avoids this trade-off. By providing the model with high-quality, distilled procedural memory rather than just forcing it to be brief, the system maintains or even improves accuracy. Across various math and coding benchmarks, TRS consistently reduced the number of reasoning tokens required, leading to lower per-request costs for real-world deployment.

Key Findings and Transferability

The researchers discovered that these reasoning skills are highly transferable. Skills distilled from one model can be effectively used to guide another, allowing developers to use a powerful model to create a library that improves the performance and efficiency of smaller, more cost-effective models.
The benefits of TRS are most pronounced on harder problems. While simple tasks may not require much guidance, the framework excels when the baseline model would otherwise struggle with complex, multi-step reasoning. The authors note that while TRS is a powerful tool for optimizing performance, its effectiveness can vary depending on the specific model and the quality of the skill library, suggesting that it is best used as a modular, adaptable component in a production pipeline.

Comments (0)

No comments yet

Be the first to share your thoughts!