GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents

Key Takeaways

  • Tool-augmented LLM agents rely on runtime filtering to decide which tools should be visible at each step.
  • Causal Minimal Tool Filtering (CMTF) reduces tool-choice confusion by exposing only the next causally necessary tool frontier, but it assumes that the user request has already been mapped to a symbolic goal state.
  • In practice, requests such as "handle my appointment" or "take care of this email" may correspond to multiple possible goals.
  • This creates wrong-goal execution, where an agent follows a valid causal tool path for an unintended objective.
Paper Abstract

Tool-augmented LLM agents rely on runtime filtering to decide which tools should be visible at each step. Causal Minimal Tool Filtering (CMTF) reduces tool-choice confusion by exposing only the next causally necessary tool frontier, but it assumes that the user request has already been mapped to a symbolic goal state. In practice, requests such as "handle my appointment" or "take care of this email" may correspond to multiple possible goals. This creates wrong-goal execution, where an agent follows a valid causal tool path for an unintended objective. We introduce GIST-CMTF, a goal-state inference layer that predicts candidate symbolic goals over the same state-transition vocabulary used by CMTF, estimates ambiguity, and either applies CMTF or exposes clarification as a causal action that produces missing goal or state variables. We evaluate GIST-CMTF across seven model backends, six filtering methods, and 120 controlled tool-use tasks. GIST-CMTF achieves 97.0% task success, compared with 80.1% for top-goal CMTF and 82.9% for semantic-goal CMTF. It reduces wrong-goal execution from 19.4% under top-goal CMTF to 2.5%, while preserving the one-tool exposure of causal filtering and using substantially fewer tokens than all-tools exposure. These results suggest that reliable tool-augmented agents should validate goal state, not only tool relevance, before exposing external actions.

GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents
Tool-augmented AI agents can increasingly carry out complex tasks by calling external tools such as email, calendars, and file systems. However, these agents often struggle when a user's request is ambiguous: a request to "take care of this email" could mean anything from summarizing the message to deleting it. Existing systems use Causal Minimal Tool Filtering (CMTF) to expose only the tools needed for a given goal, but CMTF assumes the goal is already known, so it fails when the agent guesses the wrong goal to begin with. This paper introduces GIST-CMTF, a layer that sits in front of the tool-filtering process, infers the user's intended goal, and decides whether the agent has enough information to proceed or must first ask the user for clarification.

Solving the "Wrong-Goal" Problem

The authors identify a specific failure mode called "wrong-goal execution." This happens when an agent follows a valid, step-by-step tool path to completion, but the task it completes is not the one the user wanted. Because the agent is acting on an incorrect assumption, it may take actions that are hard to undo, such as deleting an email or scheduling an appointment, based on a misunderstanding. GIST-CMTF addresses this by treating goal inference as an upstream step: the inferred goal is validated before any tool is exposed to the agent.

How GIST-CMTF Works

The system predicts candidate symbolic goals from the user's request and the current state of the task. Rather than guessing a free-form label, it expresses each candidate in the same symbolic state-transition vocabulary that CMTF uses, and it estimates how ambiguous the request is. If the system is confident in its prediction, it proceeds with standard causal filtering. If the request is ambiguous or missing key information, it exposes clarification as a causal action, one that produces the missing goal or state variables. Because clarification is a step within the agent's causal logic rather than an ad hoc fallback, the agent pauses and asks the user only for the details needed to reach the correct goal.
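The gating idea described above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not say how ambiguity is estimated or how the causal frontier is computed, so the score-margin test, the backward-chaining frontier, and every name here (`Tool`, `causal_frontier`, `select_tools`, the toy email tools, the `margin` value) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    requires: frozenset  # state variables that must already hold
    produces: frozenset  # state variables the tool makes true


# Hypothetical toy tool set over a symbolic state vocabulary.
TOOLS = [
    Tool("read_email", frozenset(), frozenset({"email_read"})),
    Tool("summarize_email", frozenset({"email_read"}), frozenset({"email_summarized"})),
    Tool("delete_email", frozenset({"email_read"}), frozenset({"email_deleted"})),
]

# Clarification modeled as an ordinary action that produces a missing goal variable.
CLARIFY = Tool("ask_user_to_clarify", frozenset(), frozenset({"goal_confirmed"}))


def causal_frontier(goal, state, tools):
    """Return the tools that can run now and lie on a path toward `goal`.

    Assumed mechanism: chain backward from unmet goal variables through
    tool preconditions until reaching tools whose preconditions all hold.
    """
    needed = set(goal) - set(state)
    seen = set()
    frontier = []
    while needed:
        var = needed.pop()
        if var in seen:
            continue
        seen.add(var)
        for tool in tools:
            if var not in tool.produces:
                continue
            missing = tool.requires - state
            if missing:
                needed |= missing  # keep chaining backward
            elif tool not in frontier:
                frontier.append(tool)
    return frontier


def select_tools(goal_scores, state, tools, margin=0.2):
    """Expose CLARIFY when the top two goals are too close, else the frontier.

    The margin test is an assumed stand-in for the paper's ambiguity estimate.
    """
    ranked = sorted(goal_scores.items(), key=lambda kv: kv[1], reverse=True)
    top_goal, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score - runner_up < margin:
        return [CLARIFY]
    return causal_frontier(top_goal, state, tools)


SUMMARIZE = frozenset({"email_summarized"})
DELETE = frozenset({"email_deleted"})

# "Take care of this email": two goals score almost equally.
ambiguous = select_tools({SUMMARIZE: 0.45, DELETE: 0.40}, set(), TOOLS)
print([t.name for t in ambiguous])  # ['ask_user_to_clarify']

# A clear request with the email already read: one tool is exposed.
clear = select_tools({SUMMARIZE: 0.90, DELETE: 0.05}, {"email_read"}, TOOLS)
print([t.name for t in clear])  # ['summarize_email']
```

In this sketch the clarification action sits in the same `Tool` type as every other action, which mirrors the idea of treating clarification as part of the causal vocabulary instead of a separate fallback path.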

Performance and Results

The researchers evaluated GIST-CMTF across seven model backends, six filtering methods, and 120 controlled tool-use tasks, including scenarios involving email, calendar, and file management. GIST-CMTF achieved a 97.0% task success rate, compared with 80.1% for top-goal CMTF and 82.9% for semantic-goal CMTF. Most notably, it reduced the rate of wrong-goal execution from 19.4% under top-goal CMTF to 2.5%, while preserving the one-tool exposure of causal filtering and using substantially fewer tokens than exposing all tools. These findings suggest that reliable AI agents should validate the user's goal, not only tool relevance, before external actions are exposed.
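To put the reported figures side by side, the short calculation below derives the percentage-point gains and the relative reduction in wrong-goal execution. The input numbers are taken from the abstract; the derived values are simple arithmetic on them, not results reported by the authors, and the helper names are hypothetical.

```python
# Figures as reported in the abstract, in percent.
success = {"GIST-CMTF": 97.0, "top-goal CMTF": 80.1, "semantic-goal CMTF": 82.9}
wrong_goal = {"GIST-CMTF": 2.5, "top-goal CMTF": 19.4}


def point_gain(ours, baseline):
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(ours - baseline, 1)


def relative_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage of `before`."""
    return round(100 * (before - after) / before, 1)


gain_vs_top = point_gain(success["GIST-CMTF"], success["top-goal CMTF"])
gain_vs_semantic = point_gain(success["GIST-CMTF"], success["semantic-goal CMTF"])
wrong_goal_drop = relative_reduction(wrong_goal["top-goal CMTF"], wrong_goal["GIST-CMTF"])

print(gain_vs_top)       # 16.9 percentage points over top-goal CMTF
print(gain_vs_semantic)  # 14.1 percentage points over semantic-goal CMTF
print(wrong_goal_drop)   # 87.1 percent fewer wrong-goal executions
```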
