Thinking with Reasoning Skills: Fewer Tokens, More Accuracy
Reasoning-focused AI models often solve complex math and coding problems by "thinking from scratch," a process that generates long, redundant chains of thought. This approach is not only slow but also expensive, as commercial models charge based on the number of tokens generated. This paper introduces Thinking with Reasoning Skills (TRS), a framework that replaces this repetitive trial-and-error process with a library of reusable, distilled reasoning strategies. By retrieving and applying these pre-learned skills, the model can navigate directly to a solution, reducing both the time and cost of inference without sacrificing accuracy.
How the Approach Works
TRS operates in two distinct phases: an offline distillation phase and an online retrieval phase. During the offline phase, the system takes long, successful reasoning traces and distills them into compact "skill cards." These cards contain structured, actionable advice, such as specific patterns to look for, common pitfalls to avoid, and verification steps.
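To make the idea of a skill card concrete, here is a minimal sketch of what one might look like as a data structure. The summary above does not give the paper's actual card schema, so the class name `SkillCard`, its field names, and the example contents are all assumptions chosen for illustration. The distillation step itself (turning a long trace into a card) is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class SkillCard:
    """Hypothetical container for one distilled skill (field names are assumptions)."""
    title: str
    patterns: list[str] = field(default_factory=list)   # cues that the skill applies
    pitfalls: list[str] = field(default_factory=list)   # common mistakes to avoid
    checks: list[str] = field(default_factory=list)     # verification steps

    def render(self) -> str:
        """Format the card as compact text suitable for inclusion in a prompt."""
        lines = [f"Skill: {self.title}"]
        for label, items in (("Look for", self.patterns),
                             ("Avoid", self.pitfalls),
                             ("Verify", self.checks)):
            lines.extend(f"- {label}: {item}" for item in items)
        return "\n".join(lines)


# Illustrative card; the content is invented for this example.
card = SkillCard(
    title="Counting with complementary events",
    patterns=["'at least one' phrasing in a probability question"],
    pitfalls=["summing overlapping cases directly"],
    checks=["confirm the result lies between 0 and 1"],
)
print(card.render())
```

Keeping the card as structured fields rather than free text makes it easy to render the same skill in different prompt formats.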
When a new query arrives, the system searches its library for the most relevant skill cards and injects them into the model's prompt. This acts as a "navigation map," guiding the model toward an effective solution path. Because the model no longer needs to rediscover the logic for every problem, it avoids the redundant detours that typically inflate token counts.
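The retrieve-and-inject step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the summary does not say how TRS measures relevance, so a simple bag-of-words cosine similarity stands in for the retriever, and the names `retrieve`, `build_prompt`, and `LIBRARY`, along with the prompt layout, are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, library: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k cards most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(library,
                    key=lambda name: cosine(q, bag_of_words(library[name])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, library: dict[str, str], k: int = 1) -> str:
    """Prepend the retrieved skill text to the problem statement."""
    skills = "\n\n".join(library[name] for name in retrieve(query, library, k))
    return f"Relevant reasoning skills:\n{skills}\n\nProblem:\n{query}"


# Toy library; card names and contents are invented for this example.
LIBRARY = {
    "complement-counting": "Probability of at least one event: "
                           "compute one minus the probability of none.",
    "two-pointer-scan": "Sorted array pair sum: "
                        "move two pointers inward from both ends.",
}

query = "What is the probability of at least one six in four dice rolls?"
prompt = build_prompt(query, LIBRARY)
print(prompt)
```

A production retriever would more plausibly use dense embeddings, but the control flow (score every card, keep the top few, prepend them to the query) stays the same.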
Breaking the Efficiency-Accuracy Trade-off
Previous attempts to make AI reasoning more efficient, such as enforcing strict token budgets or compressing thoughts, often caused a sharp drop in accuracy, especially on difficult problems. The authors found that TRS avoids this trade-off. Because it supplies the model with high-quality, distilled procedural memory rather than simply forcing it to be brief, the system maintains or even improves accuracy. Across various math and coding benchmarks, TRS consistently reduced the number of reasoning tokens required, which lowers per-request costs in real-world deployment.
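The link between reasoning tokens and per-request cost is simple arithmetic, sketched below. Every number here is a placeholder chosen for illustration, not a figure from the paper or a real provider's price list, and the function name `request_cost` is hypothetical. The sketch also counts input tokens, since injected skill cards make the prompt longer.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request; prices are per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Placeholder prices and token counts (assumptions, not measured values).
INPUT_PRICE, OUTPUT_PRICE = 2.0, 10.0

# Baseline: short prompt, long reasoning trace.
baseline = request_cost(200, 8_500, INPUT_PRICE, OUTPUT_PRICE)

# Skill-guided: longer prompt (cards injected), shorter reasoning trace.
guided = request_cost(500, 5_500, INPUT_PRICE, OUTPUT_PRICE)

saving = 1 - guided / baseline
print(f"baseline={baseline:.4f} guided={guided:.4f} saving={saving:.1%}")
```

Under these assumed numbers the extra prompt tokens are more than offset by the shorter output, but the balance depends on the actual prices and on how many cards are injected.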
Key Findings and Transferability
The researchers discovered that these reasoning skills are highly transferable. Skills distilled from one model can be effectively used to guide another, allowing developers to use a powerful model to create a library that improves the performance and efficiency of smaller, more cost-effective models.
The benefits of TRS are most pronounced on harder problems. While simple tasks may not require much guidance, the framework excels when the baseline model would otherwise struggle with complex, multi-step reasoning. The authors note that while TRS is a powerful tool for optimizing performance, its effectiveness can vary depending on the specific model and the quality of the skill library, suggesting that it is best used as a modular, adaptable component in a production pipeline.