Key Takeaways

  • Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies.
  • Task states consist of relevant facts, identifiers, constraints, and conditions observed through user interaction and tool calls.
  • In standard agents, task states are not represented separately.
  • Observations, tool returns, and policy instructions are placed in the prompt, leaving agents to reconstruct the relevant states from the prompt each time they decide what to do next.
Paper Abstract

Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies. Task states consist of relevant facts, identifiers, constraints, and conditions observed through user interaction and tool calls. In standard agents, task states are not represented separately. Observations, tool returns, and policy instructions are placed in the prompt, leaving agents to reconstruct the relevant states from the prompt each time they decide what to do next. This design makes state management implicit, creating two common failure modes. An agent may retrieve the right facts but later ground its decision in stale, missing, or incorrect information; and a syntactically valid tool call may still violate a domain policy that depends on the current task state. We introduce LedgerAgent, an inference-time method for tool-calling agents that maintains observed task states in a separate ledger and renders the states into the prompt. The ledger is also used to check state-dependent policy constraints before environment-changing tool calls are executed, blocking policy violations. Across four customer-service domains and a mixed panel of open- and closed-weight models, LedgerAgent improves average pass^k over a standard prompt-based tool-calling approach, with the largest gains under stricter multi-trial consistency metrics.

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
AI agents in customer service often struggle to maintain a clear picture of a task as it evolves over multiple turns. In standard setups, these agents rely on the conversation history to "remember" facts, such as order numbers or reservation details. Because this information is buried in a growing transcript, agents frequently make mistakes, either by acting on outdated information or by taking actions that violate company policies. This paper introduces LedgerAgent, a method that gives agents an explicit, organized "ledger" to track task states and a "policy gate" that checks environment-changing actions against domain rules before they are executed.

A Better Way to Track State

Instead of forcing the model to reconstruct the current situation from raw text, LedgerAgent maintains a structured, typed dictionary that stores key facts observed through tool calls. This ledger acts as a reliable, up-to-date reference point. Whenever the agent successfully retrieves information, that data is automatically added to the ledger in a standardized format. Before the agent makes its next move, the ledger is rendered into the prompt, so the model can look up specific identifiers or statuses directly rather than searching through past messages.
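The ledger idea can be illustrated with a minimal sketch. This is not the paper's implementation: the `Ledger` class, its `record` and `render` methods, the `[LEDGER]` header, and the example order fields are all hypothetical names chosen for illustration. It assumes that each successful tool return is reduced to a flat set of facts keyed by entity type and identifier, and that newer observations overwrite older ones.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Ledger:
    """Hypothetical ledger: facts keyed by (entity type, entity id)."""
    entries: dict[tuple[str, str], dict[str, Any]] = field(default_factory=dict)

    def record(self, entity_type: str, entity_id: str, facts: dict[str, Any]) -> None:
        # Newer observations overwrite older ones for the same field,
        # so each entry reflects the most recent tool return.
        self.entries.setdefault((entity_type, entity_id), {}).update(facts)

    def render(self) -> str:
        # Deterministic text block to place in the prompt before the next model call.
        lines = ["[LEDGER]"]
        for (etype, eid), facts in sorted(self.entries.items()):
            fact_str = ", ".join(f"{k}={v}" for k, v in sorted(facts.items()))
            lines.append(f"{etype} {eid}: {fact_str}")
        return "\n".join(lines)


# Hypothetical tool returns for one order, observed at two different turns.
ledger = Ledger()
ledger.record("order", "W123", {"status": "pending", "total": 59.90})
ledger.record("order", "W123", {"status": "delivered"})
rendered = ledger.render()
print(rendered)
```

In this sketch the second observation replaces the earlier `status` value, which is the behavior that guards against grounding a decision in stale information.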

Preventing Policy Violations

A major challenge for customer-service agents is ensuring that their actions, such as issuing a refund or changing a flight, follow specific business rules. LedgerAgent addresses this with a "policy gate" that acts as a final check before any environment-changing action is executed. This gate evaluates the agent's proposed action against a set of predefined logical rules. If an action violates a policy given the current ledger state, the gate can block the request, suggest a revision, or provide feedback to the agent. This prevents errors from reaching the external system in the first place.
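A policy gate of this kind can be sketched as a list of predicates over the current state and the proposed tool arguments. Everything named here is an assumption for illustration: the `Rule` and `gate` names, the `cancel_order` tool, the `"order:<id>"` state keys, and the rule that only pending orders may be cancelled do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, dict[str, Any]]  # hypothetical ledger view: "order:W123" -> facts
Args = dict[str, Any]              # arguments of the proposed tool call


@dataclass
class Rule:
    tool: str                              # tool name the rule applies to
    description: str                       # feedback returned when the rule fails
    holds: Callable[[State, Args], bool]   # predicate over state and arguments


def gate(tool: str, args: Args, state: State, rules: list[Rule]) -> list[str]:
    """Return descriptions of violated rules; an empty list means the call may run."""
    return [r.description for r in rules if r.tool == tool and not r.holds(state, args)]


# Hypothetical domain policy: only pending orders can be cancelled.
rules = [
    Rule(
        tool="cancel_order",
        description="only pending orders can be cancelled",
        holds=lambda state, args: state.get(f"order:{args['order_id']}", {}).get("status") == "pending",
    ),
]

state = {"order:W123": {"status": "delivered"}, "order:W456": {"status": "pending"}}
blocked = gate("cancel_order", {"order_id": "W123"}, state, rules)
allowed = gate("cancel_order", {"order_id": "W456"}, state, rules)
print(blocked, allowed)
```

Returning the violated descriptions rather than a bare boolean lets the caller choose between blocking the call and feeding the reason back to the agent. An order that is missing from the state also fails the predicate, so in this sketch the gate refuses to act on facts it has not observed.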

Proven Performance Gains

The researchers tested LedgerAgent across four customer-service domains (airline, retail, telecom, and telehealth) using a mix of open- and closed-weight models. The results show that LedgerAgent improves average performance over a standard prompt-based approach, particularly in tasks that require modifying external systems. Notably, the method improves "pass^k" scores, a metric that measures how reliably an agent solves the same task across k independent trials, with the largest gains under the stricter multi-trial settings. These gains are achieved without additional training or extra LLM calls, which keeps the cost of adopting the method low.
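To make the metric concrete, the sketch below computes pass^k under one commonly used estimator: for a task with c successes in n trials, the chance that k trials drawn without replacement all succeed is C(c, k) / C(n, k), averaged over tasks. The summary does not state the exact formula used in the paper, so this estimator, the `pass_hat_k` name, and the example counts are assumptions.

```python
from math import comb


def pass_hat_k(successes_per_task: list[int], n_trials: int, k: int) -> float:
    """Average over tasks of C(c, k) / C(n, k): the chance that k sampled trials all pass."""
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    per_task = [comb(c, k) / comb(n_trials, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Hypothetical results: successes out of 4 trials for three tasks.
results = [4, 3, 0]
p1 = pass_hat_k(results, n_trials=4, k=1)  # plain average success rate
p4 = pass_hat_k(results, n_trials=4, k=4)  # only the task solved in all 4 trials counts
print(round(p1, 3), round(p4, 3))
```

Under this estimator a task solved in three of four trials contributes 0.75 at k = 1 but nothing at k = 4, which is why larger k rewards consistency rather than occasional success.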

Key Considerations

LedgerAgent is designed as an inference-time scaffold, meaning it works alongside existing models without requiring changes to their underlying weights. While it significantly improves consistency and adherence to rules, it is specifically built for structured environments where domain policies can be defined as logical predicates. By separating state tracking from the model's generation process, the approach ensures that the agent remains grounded in the actual data retrieved from the environment, rather than relying on potentially flawed memory of the conversation history.
