Shepherd: A Runtime Substrate Empowering Meta-Agent...

Key Takeaways

  • Shepherd is a new runtime system designed to help "meta-agents"—AI agents that manage or supervise other agents—operate more effectively.
  • We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Lean.
  • Shepherd records every agent-environment interaction as a typed event in a Git-like execution trace, enabling any past state to be forked and replayed.
  • The system forks the agent process and its filesystem 5× faster than Docker and achieves over 95% prompt-cache reuse on replay.
  • We demonstrate the model through three applications: runtime intervention, counterfactual meta-optimization, and Tree-RL training.
Paper Abstract

We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Lean. Shepherd records every agent-environment interaction as a typed event in a Git-like execution trace, enabling any past state to be forked and replayed. The system forks the agent process and its filesystem 5× faster than Docker, achieving >95% prompt-cache reuse on replay. We demonstrate the model through three applications. First, in runtime intervention, a live supervisor increases pair coding pass rates from 28.8% to 54.7% on CooperBench. Second, in counterfactual meta-optimization, branching exploration outperforms baselines across four benchmarks by up to 11 points while reducing wall-clock time by up to 58%. Third, in Tree-RL training, forking rollouts at selected turns improves TerminalBench-2 performance from 34.2% to 39.4%. These results establish Shepherd as an efficient infrastructure for programming meta-agents. We open-source the system to support future research.

Shepherd is a new runtime system designed to help "meta-agents"—AI agents that manage or supervise other agents—operate more effectively. Currently, most agentic systems treat execution as a static, black-box process. Shepherd changes this by treating an agent’s entire execution as a first-class, inspectable object. By applying principles from functional programming, Shepherd allows meta-agents to observe, rewind, branch, and modify the behavior of other agents in real time, providing a structured foundation for more complex AI workflows.

A New Way to Manage Agents

Existing agentic runtimes often struggle because they are designed to help a single agent maintain its own state, rather than allowing an external supervisor to intervene. Shepherd introduces a "Git-like" execution trace where every action an agent takes is recorded as a typed event. Because these events are structured, a meta-agent can "subscribe" to an agent's activity, pause it, revert it to a previous state, or fork it into a new branch to test different outcomes. This allows for sophisticated control, such as stopping an agent before it makes a mistake or exploring multiple potential solutions simultaneously.
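The sketch below shows a Git-like trace in minimal Python form. It assumes that events are immutable records linked to their parent and that branches are named pointers to the latest event. The class and method names (`Trace`, `record`, `fork`, `revert`, `subscribe`) and the event fields are hypothetical illustrations, not Shepherd's actual API. Pausing an agent is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical typed event; these field names are illustrative assumptions,
# not Shepherd's real schema.
@dataclass(frozen=True)
class Event:
    kind: str               # e.g. "llm_reply", "tool_call", "file_write"
    payload: str
    parent: Optional[int]   # id of the previous event on this branch (None = root)

class Trace:
    """Append-only event log; branches are named pointers to event ids."""

    def __init__(self) -> None:
        self.events: list[Event] = []
        self.branches: dict[str, Optional[int]] = {"main": None}
        self.subscribers: list[Callable[[str, Event], None]] = []

    def subscribe(self, fn: Callable[[str, Event], None]) -> None:
        # A supervisor registers a callback that sees every new event.
        self.subscribers.append(fn)

    def record(self, branch: str, kind: str, payload: str) -> int:
        ev = Event(kind, payload, self.branches[branch])
        self.events.append(ev)
        eid = len(self.events) - 1
        self.branches[branch] = eid        # advance the branch head
        for fn in self.subscribers:        # notify observers
            fn(branch, ev)
        return eid

    def history(self, branch: str) -> list[Event]:
        # Walk parent pointers from the branch head back to the root.
        out, cur = [], self.branches[branch]
        while cur is not None:
            out.append(self.events[cur])
            cur = self.events[cur].parent
        return out[::-1]

    def fork(self, new_branch: str, at: Optional[int]) -> None:
        # Forking only creates a pointer; shared history is not copied.
        self.branches[new_branch] = at

    def revert(self, branch: str, to: Optional[int]) -> None:
        # Move the head back; later events are not deleted from self.events.
        self.branches[branch] = to

trace = Trace()
seen = []
trace.subscribe(lambda branch, ev: seen.append((branch, ev.kind)))
plan = trace.record("main", "llm_reply", "plan: edit foo.py")
trace.record("main", "file_write", "foo.py v1")
trace.fork("try-2", at=plan)               # branch off right after the plan
trace.record("try-2", "file_write", "foo.py v2")
print([e.payload for e in trace.history("try-2")])  # ['plan: edit foo.py', 'foo.py v2']
trace.revert("main", to=plan)              # roll main back past the v1 write
print([e.payload for e in trace.history("main")])   # ['plan: edit foo.py']
```

In this sketch, forking and reverting never copy or delete events. Both operations only move pointers, and that is what makes any past state cheap to revisit.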

How the System Works

The core of Shepherd is built on four functional programming primitives:

  • Tasks: Agents are defined as typed functions, making them modular and easy to substitute or compose.

  • Effects: Every action—such as a tool call or file modification—is recorded as a distinct, observable event. This separates the agent's intent from its actual execution.

  • Scopes: These are isolated environments where agents run. A meta-agent can fork a scope to create a "sandbox" branch, allowing it to test a path without affecting the parent agent. If the path fails, the meta-agent can simply discard the branch.

  • Execution Trace: This acts as a persistent history of the agent's work, allowing meta-agents to navigate to any point in time and replay the exact state of the environment.
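A small effect-handler pattern can illustrate how tasks, effects, and scopes divide the work. This is a hedged sketch. A task is a Python generator that yields effect descriptions instead of performing I/O itself, and a hypothetical `run` function interprets each effect against a scope while logging it. `ReadFile`, `WriteFile`, `Scope`, and `run` are illustrative assumptions. A plain dictionary copy stands in for Shepherd's real process and filesystem fork, whose mechanism this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Callable, Generator, Union

# Hypothetical effect types: the agent only *describes* what it wants done.
@dataclass(frozen=True)
class ReadFile:
    path: str

@dataclass(frozen=True)
class WriteFile:
    path: str
    text: str

Effect = Union[ReadFile, WriteFile]
# A task is a typed function: input -> generator of effects -> final result.
Task = Callable[[str], Generator[Effect, object, str]]

@dataclass
class Scope:
    """Isolated environment; here just an in-memory filesystem."""
    files: dict[str, str]

    def fork(self) -> "Scope":
        # The child gets its own copy, so its writes never reach the parent.
        return Scope(dict(self.files))

def run(task: Task, arg: str, scope: Scope, log: list) -> str:
    """Interpret each effect the task yields, recording it as an event."""
    gen = task(arg)
    reply = None
    try:
        while True:
            eff = gen.send(reply)
            log.append(eff)                  # every effect is observable
            if isinstance(eff, ReadFile):
                reply = scope.files.get(eff.path, "")
            elif isinstance(eff, WriteFile):
                scope.files[eff.path] = eff.text
                reply = None
    except StopIteration as done:
        return done.value

def append_note(note: str) -> Generator[Effect, object, str]:
    old = yield ReadFile("notes.txt")
    yield WriteFile("notes.txt", old + note)
    return "ok"

parent = Scope({"notes.txt": "a"})
child = parent.fork()                        # sandbox branch
log = []
result = run(append_note, "b", child, log)
print(result, child.files["notes.txt"], parent.files["notes.txt"])  # ok ab a
```

The task never touches the filesystem directly, so the runtime is free to record, redirect, or veto each effect. Discarding a failed branch amounts to dropping the forked `Scope`.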

Performance and Efficiency

Shepherd is designed to keep the overhead of managing many agent states low. It can fork an agent’s process and filesystem up to 5 times faster than standard Docker operations, regardless of the size of the environment. Furthermore, replay reproduces the agent’s previous interactions exactly, so the prompts sent to the model on replay share their prefix with the original run. As a result, the system achieves over 95% prompt-cache reuse when replaying tasks. This makes it practical to fork and replay agents frequently without repeatedly paying the full cost of rebuilding environments and re-processing long prompts.
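A little arithmetic shows why exact replay matters for caching. Assume the model provider caches prompts by exact token prefix, as common prompt-caching schemes do. Then the reusable fraction of a replayed prompt equals the length of its longest common prefix with the cached prompt, divided by the replayed prompt's length. The function names and token lists below are made up for illustration and are not measurements from the paper.

```python
def cached_prefix_len(cached: list[str], prompt: list[str]) -> int:
    """Length of the longest common prefix of a cached prompt and a new one."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n

def reuse_ratio(cached: list[str], prompt: list[str]) -> float:
    """Fraction of the new prompt's tokens a prefix cache could serve."""
    return cached_prefix_len(cached, prompt) / len(prompt) if prompt else 0.0

# Hypothetical 100-token history from the original run.
history = [f"t{i}" for i in range(100)]

# Exact replay: fork after token 97 and append 3 new tokens.
replay = history[:97] + ["x", "y", "z"]

# Non-exact replay: one nondeterministic token (e.g. a timestamp) early on.
drifted = list(history)
drifted[5] = "ts=12:00:01"

print(reuse_ratio(history, replay))   # 0.97
print(reuse_ratio(history, drifted))  # 0.05
```

In this example, forking after 97 of 100 cached tokens leaves 97% of the prompt reusable. A single nondeterministic token at position 5 drops reuse to 5%.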

Real-World Applications

The researchers demonstrated Shepherd’s versatility across three key areas:

  • Live Supervision: By monitoring two agents working on code simultaneously, a meta-agent supervisor was able to intervene and coordinate their tasks, increasing the pass rate on the CooperBench benchmark from 28.8% to 54.7%.

  • Meta-Optimization: By branching execution paths to test different workflow edits counterfactually, the system outperformed existing baselines across four benchmarks by up to 11 points while reducing wall-clock time by up to 58%.

  • Training: In Tree-RL training, rollouts were forked at selected turns, improving performance on the TerminalBench-2 benchmark from 34.2% to 39.4%.
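The article does not detail the Tree-RL algorithm, so the following is only a sketch under stated assumptions. Sibling rollouts are forked from a shared trajectory prefix at chosen turns. Each sibling's advantage is its reward minus the mean reward of its group, a GRPO-style baseline. `rollout` is a hypothetical random stand-in for the agent and environment, and its reward rule is invented for illustration.

```python
import random
from statistics import mean

def rollout(prefix: list[str], rng: random.Random, horizon: int = 3) -> tuple[list[str], float]:
    """Hypothetical agent stand-in: extend a trajectory and score it."""
    turns = list(prefix)
    for _ in range(horizon):
        turns.append(rng.choice(["ls", "cat", "make", "pytest"]))
    # Invented reward: 1.0 if a test run happens after the fork point.
    reward = 1.0 if "pytest" in turns[len(prefix):] else 0.0
    return turns, reward

def tree_rollouts(base: list[str], fork_turns: list[int], k: int, seed: int = 0) -> list[dict]:
    """Fork k sibling continuations at each selected turn of a base trajectory.

    Siblings share the same prefix, so each one is scored against its
    siblings' mean reward (an assumed group baseline).
    """
    rng = random.Random(seed)
    samples = []
    for t in fork_turns:
        prefix = base[:t]                      # state restored at turn t
        group = [rollout(prefix, rng) for _ in range(k)]
        baseline = mean(r for _, r in group)
        for turns, r in group:
            samples.append({"fork_at": t, "turns": turns, "advantage": r - baseline})
    return samples

base = ["ls", "cat", "make", "ls", "cat"]
samples = tree_rollouts(base, fork_turns=[1, 3], k=4)
print(len(samples))  # 8
```

Siblings share everything up to the fork turn. Any difference in their rewards therefore comes from actions taken after that turn, and cheap forking makes this kind of comparison affordable.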
