GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents
Tool-augmented AI agents are increasingly capable of performing complex tasks with external tools for email, calendars, and file systems. However, these agents often struggle when a user’s request is ambiguous: asking an agent to "take care of this email," for instance, could mean anything from summarizing the message to deleting it. Existing systems use Causal Minimal Tool Filtering (CMTF) to expose only the tools needed to reach a specific goal, but this filtering cannot help if the agent has inferred the wrong goal in the first place. This paper introduces GIST-CMTF, a layer that runs before tool filtering to infer the user's intended goal and decide whether the agent has enough information to proceed or must first ask the user for clarification.
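This summary does not specify how CMTF itself selects tools. As a rough illustration of what "causally minimal" filtering can mean, the sketch below assumes a STRIPS-style schema in which each tool declares precondition and effect facts, and it backward-chains from the goal facts to collect only the tools that could produce them. The `Tool` class, the `minimal_tool_filter` function, and the email tool names are hypothetical, and the greedy selection is not guaranteed to be globally minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool schema: facts required before use, facts made true after."""
    name: str
    preconditions: frozenset[str]
    effects: frozenset[str]


def minimal_tool_filter(goal: set[str], state: set[str], tools: list[Tool]) -> list[str]:
    """Backward-chain from unmet goal facts to the tools that can produce them.

    Greedy: each unmet fact is assigned the first tool whose effects include it,
    and that tool's unmet preconditions become new subgoals. The result is a
    small causally relevant subset, not a guaranteed global minimum.
    """
    selected: list[str] = []
    achieved = set(state)
    pending = sorted(f for f in goal if f not in achieved)
    while pending:
        fact = pending.pop()
        if fact in achieved:
            continue
        producer = next((t for t in tools if fact in t.effects), None)
        if producer is None:
            raise ValueError(f"no tool can achieve {fact!r}")
        if producer.name not in selected:
            selected.append(producer.name)
        # Treat the producer's effects as available to later subgoals.
        achieved |= producer.effects
        pending.extend(sorted(p for p in producer.preconditions if p not in achieved))
    return selected


tools = [
    Tool("read_email", frozenset(), frozenset({"email_loaded"})),
    Tool("summarize_email", frozenset({"email_loaded"}), frozenset({"summary_ready"})),
    Tool("delete_email", frozenset({"email_loaded"}), frozenset({"email_deleted"})),
    Tool("create_event", frozenset(), frozenset({"event_created"})),
]
print(minimal_tool_filter({"summary_ready"}, set(), tools))
# ['summarize_email', 'read_email']
```

For a summarization goal, `delete_email` and `create_event` are never exposed. The filter is only as good as its goal input, though: called with `{"email_deleted"}`, it would just as faithfully expose `delete_email`, which is the gap GIST-CMTF targets.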
Solving the "Wrong-Goal" Problem
The authors identify a failure mode they call "wrong-goal execution": the agent follows a logically valid, step-by-step path to complete a task, but the task itself is not what the user wanted. Because the agent is acting on an incorrect assumption, it may take irreversible or hard-to-undo actions, such as deleting an email or scheduling an appointment, based on a misunderstanding. GIST-CMTF addresses this by treating goal inference as a critical upstream step, so that the user's intent is validated before any tool is exposed to the agent.
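The summary defines wrong-goal execution only informally. One hedged way to turn it into a metric, assuming each run is logged with the user's intended goal, the goal the agent actually acted on, and whether its plan ran to completion, is sketched below; the `Episode` record and both functions are hypothetical. The point it makes concrete is that a wrong-goal run is a completed plan, which is why plan-level success checks alone do not catch it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    """Hypothetical log record for one agent run."""
    intended_goal: str            # what the user actually wanted
    executed_goal: Optional[str]  # goal the agent acted on; None if it only asked for clarification
    plan_completed: bool          # the agent's tool calls ran to completion


def is_wrong_goal_execution(ep: Episode) -> bool:
    """True when a plan ran to completion for a goal the user did not intend."""
    return (
        ep.plan_completed
        and ep.executed_goal is not None
        and ep.executed_goal != ep.intended_goal
    )


def wrong_goal_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes classified as wrong-goal executions."""
    if not episodes:
        return 0.0
    return sum(is_wrong_goal_execution(ep) for ep in episodes) / len(episodes)


runs = [
    Episode("summarize", "summarize", True),  # correct goal, completed
    Episode("summarize", "delete", True),     # wrong-goal execution
    Episode("delete", None, False),           # agent asked for clarification instead
    Episode("archive", "archive", False),     # correct goal, but plan failed
]
print(wrong_goal_rate(runs))  # 0.25
```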
How GIST-CMTF Works
The system predicts candidate symbolic goals from the user's request and the current task state. Rather than assigning a free-form intent label, it maps the request onto the same symbolic state-transition vocabulary that the tool-filtering stage uses, so an inferred goal can be handed directly to causal filtering. If the system is confident in its prediction, it proceeds with standard causal filtering. If the request is ambiguous or missing key information, it treats clarification as a formal causal action. Because clarification is a step within the agent's reasoning rather than an ad-hoc fallback, the system can pause and ask the user for the specific details it needs to reach the correct goal.
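The summary does not give GIST-CMTF's actual decision rule. The sketch below assumes one plausible form: candidate goals are sets of state facts (the vocabulary a causal tool filter consumes), the agent clarifies when the top candidate's confidence is below a threshold or too close to the runner-up, and it also clarifies when a confidently inferred goal still lacks required inputs. Clarification is returned as a step with declared effects, mirroring how a tool call would be described. `GoalHypothesis`, `Step`, `next_step`, the slot names, and the `threshold` and `margin` values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoalHypothesis:
    """Hypothetical candidate goal: target state facts plus inputs needed to reach them."""
    facts: frozenset[str]
    confidence: float
    required_slots: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Step:
    """Next agent step. A clarify step declares effects, just like a tool call."""
    kind: str                              # "execute" or "clarify"
    goal: Optional[frozenset[str]] = None  # goal passed on to causal tool filtering
    effects: frozenset[str] = frozenset()  # facts a clarify step is meant to establish


def next_step(candidates: list[GoalHypothesis], known_slots: set[str],
              threshold: float = 0.8, margin: float = 0.2) -> Step:
    ranked = sorted(candidates, key=lambda h: h.confidence, reverse=True)
    if not ranked:
        return Step("clarify", effects=frozenset({"goal_known"}))
    best = ranked[0]
    runner_up = ranked[1].confidence if len(ranked) > 1 else 0.0
    # Ambiguous goal: top candidate is weak or too close to the runner-up.
    if best.confidence < threshold or best.confidence - runner_up < margin:
        return Step("clarify", effects=frozenset({"goal_known"}))
    # Clear goal but missing inputs: ask only for the slots still unknown.
    missing = best.required_slots - known_slots
    if missing:
        return Step("clarify", goal=best.facts,
                    effects=frozenset(f"known:{slot}" for slot in missing))
    # Confident and fully specified: hand the goal to causal minimal tool filtering.
    return Step("execute", goal=best.facts)


take_care = [GoalHypothesis(frozenset({"summary_ready"}), 0.45),
             GoalHypothesis(frozenset({"email_deleted"}), 0.40)]
forward = [GoalHypothesis(frozenset({"email_forwarded"}), 0.93, frozenset({"recipient"}))]

print(next_step(take_care, set()).kind)        # clarify
print(next_step(forward, set()).effects)       # frozenset({'known:recipient'})
print(next_step(forward, {"recipient"}).kind)  # execute
```

In this sketch the ambiguous "take care of this email" request yields a clarify step instead of a guess, while a hypothetical forwarding request proceeds to execution once the recipient is known. Giving clarification declared effects lets a planner reason about it alongside ordinary tools; how the paper scores or schedules that action is not covered in this summary.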
Performance and Results
The researchers evaluated GIST-CMTF on 120 controlled tasks, including email, calendar, and file-management scenarios, using seven different model backends. GIST-CMTF achieved a 97.0% task success rate, compared with 80.1% and 82.9% for two earlier filtering methods, and it reduced the rate of wrong-goal execution from 19.4% to 2.5%. The authors conclude that reliable AI agents must validate the user's goal before they are given the ability to execute external actions.
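To put the reported figures in concrete terms, the arithmetic below computes the success-rate gain in percentage points over each earlier method and the relative reduction in wrong-goal execution. It uses only the numbers stated above; the two earlier methods are unnamed in this summary, so their labels are placeholders.

```python
def point_gain(new: float, old: float) -> float:
    """Absolute difference between two percentages, in points, rounded to one decimal."""
    return round(new - old, 1)


def relative_reduction(old: float, new: float) -> float:
    """Fraction by which a rate fell, e.g. 0.5 means it was halved."""
    return (old - new) / old


# Figures as reported in this summary (percent). The earlier methods are unnamed here.
success = {"GIST-CMTF": 97.0, "earlier method 1": 80.1, "earlier method 2": 82.9}
wrong_goal_before, wrong_goal_after = 19.4, 2.5

for name in ("earlier method 1", "earlier method 2"):
    gain = point_gain(success["GIST-CMTF"], success[name])
    print(f"success gain over {name}: {gain} points")
print(f"wrong-goal reduction: {relative_reduction(wrong_goal_before, wrong_goal_after):.1%}")
```

The gains come to 16.9 and 14.1 percentage points, and wrong-goal execution falls by about 87% relative to its 19.4% starting point: roughly one wrong-goal run in forty instead of about one in five.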