Large language models (LLMs) are increasingly used to interact with external tools, but they often struggle to balance the need for deep reasoning with the requirement for precise, structured execution. The paper "Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use" introduces CAST, a framework that treats past tool-use experiences as "cases" to help models learn when to think deeply and how to avoid structural errors. By analyzing historical successes and failures, CAST enables models to autonomously adjust their reasoning effort and improve the accuracy of their tool invocations.
Learning from Past Execution
Rather than treating every task with the same level of effort, CAST organizes historical execution data into structured cases. Each case includes the original query, the reasoning steps taken, the tool call made, and the final outcome. From this data, the framework extracts two key signals: a "complexity profile" that estimates how much reasoning is necessary for a specific task, and a "failure profile" that identifies common structural pitfalls, such as incorrect function names or parameter mismatches.
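The case structure and the two profiles described above can be sketched in a few lines of Python. This is a minimal illustration and not the paper's implementation. The `Case` fields mirror the four components named in the text, while the `error_type` labels and both profile formulas (mean reasoning steps among successful cases, relative frequency of error types among failures) are assumptions chosen for clarity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Case:
    """One historical tool-use experience (field names are hypothetical)."""
    query: str                        # original user query
    reasoning: List[str]              # reasoning steps taken
    tool_call: Dict                   # e.g. {"name": ..., "arguments": {...}}
    success: bool                     # final outcome
    error_type: Optional[str] = None  # assumed label, e.g. "wrong_function_name"


def complexity_profile(cases: List[Case]) -> float:
    """Assumed proxy: mean number of reasoning steps among successful cases."""
    lengths = [len(c.reasoning) for c in cases if c.success]
    return sum(lengths) / len(lengths) if lengths else 0.0


def failure_profile(cases: List[Case]) -> Dict[str, float]:
    """Relative frequency of each structural error type among failed cases."""
    counts = Counter(c.error_type for c in cases if not c.success and c.error_type)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


if __name__ == "__main__":
    history = [
        Case("weather in Paris", ["pick tool"],
             {"name": "get_weather", "arguments": {"city": "Paris"}}, True),
        Case("convert 10 USD to EUR", ["pick tool"],
             {"name": "convert_curency", "arguments": {}}, False,
             "wrong_function_name"),
    ]
    print(complexity_profile(history), failure_profile(history))
```

In practice the profiles would presumably be computed per task type or per group of similar queries rather than over the whole history, so that a new query can be matched against the cases most like it.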
Adaptive Reasoning and Optimization
CAST uses these profiles to guide the model during reinforcement learning. For simpler tasks, the model is encouraged to be concise, reducing unnecessary deliberation. For more complex tasks, it is incentivized to maintain a longer reasoning process to ensure constraints are met and arguments are normalized. Simultaneously, the failure profile provides granular feedback on the structure of the tool calls. This dual approach allows the model to learn a more efficient and reliable policy, effectively internalizing the lessons from past experiences to perform better on new, unseen tasks.
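One way to picture how the two profiles could shape a reinforcement learning reward is the sketch below. The summary does not give the paper's reward formulation, so every name, weight, and formula here is an assumption. Complexity is taken to be a score in [0, 1], the token budgets are arbitrary, and `alpha` and `beta` are hypothetical trade-off weights.

```python
from typing import Dict, List


def length_penalty(tokens: int, complexity: float,
                   budget_lo: int = 64, budget_hi: int = 1024) -> float:
    """Penalty in [0, 1]: overthinking costs more on simple tasks,
    underthinking costs more on complex ones (assumed formulation)."""
    budget = budget_lo + complexity * (budget_hi - budget_lo)
    excess = max(0.0, tokens - budget)    # reasoning beyond the budget
    deficit = max(0.0, budget - tokens)   # reasoning short of the budget
    raw = (1.0 - complexity) * excess + complexity * deficit
    return min(raw / budget_hi, 1.0)


def structural_penalty(errors: List[str],
                       failure_profile: Dict[str, float],
                       default_weight: float = 0.1) -> float:
    """Penalty in [0, 1]: each structural error is weighted by how common
    it was in past failures; unseen error types get a small default."""
    total = sum(failure_profile.get(e, default_weight) for e in errors)
    return min(total, 1.0)


def shaped_reward(correct: bool, tokens: int, complexity: float,
                  errors: List[str], failure_profile: Dict[str, float],
                  alpha: float = 0.2, beta: float = 0.5) -> float:
    """Task reward minus the length and structure penalties."""
    base = 1.0 if correct else 0.0
    return (base
            - alpha * length_penalty(tokens, complexity)
            - beta * structural_penalty(errors, failure_profile))
```

With these assumed defaults, a correct call on a simple task (complexity 0.0) that stays within the 64-token budget keeps the full reward, while the same call with a very long trace loses up to `alpha`. A malformed call is penalized in proportion to how often that error type appeared in past failures.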
Performance Gains
Experiments on the BFCLv2 and ToolBench benchmarks show that CAST improves tool-use performance. Models trained with the framework achieved up to a 5.85 percentage point increase in overall execution accuracy while reducing the average length of reasoning traces by 26%. These results suggest that by shifting from a "one-size-fits-all" approach to a case-based, adaptive strategy, LLMs can become more efficient and less prone to high-impact structural errors when interacting with external tools.