Shepherd is a new runtime system designed to help "meta-agents" (AI agents that manage or supervise other agents) operate more effectively. Most current agentic systems treat execution as a static, black-box process. Shepherd instead treats an agent's entire execution as a first-class, inspectable object. By applying principles from functional programming, it lets meta-agents observe, rewind, branch, and modify the behavior of other agents in real time, providing a structured foundation for more complex AI workflows.
A New Way to Manage Agents
Existing agentic runtimes are built to help a single agent maintain its own state, not to let an external supervisor intervene, which makes supervision difficult. Shepherd introduces a "Git-like" execution trace in which every action an agent takes is recorded as a typed event. Because these events are structured, a meta-agent can subscribe to an agent's activity, pause it, revert it to a previous state, or fork it into a new branch to test different outcomes. This enables fine-grained control, such as stopping an agent before it makes a mistake or exploring multiple candidate solutions simultaneously.
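To make the "Git-like" trace concrete, here is a minimal Python sketch of an append-only log of typed events that supports subscribing, reverting, and forking. The names Event, Trace, subscribe, revert, and fork are hypothetical and chosen for illustration; this is not Shepherd's actual API, and it models only the event log, not the agent's process or filesystem.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Event:
    """One recorded agent action (hypothetical shape)."""
    kind: str      # e.g. "tool_call" or "file_write"
    payload: str


@dataclass
class Trace:
    """Append-only event log that a supervisor can watch, rewind, or branch."""
    events: List[Event] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self.subscribers.append(callback)

    def append(self, event: Event) -> None:
        self.events.append(event)
        for callback in self.subscribers:
            callback(event)  # notify every watcher of the new event

    def revert(self, n: int) -> None:
        """Rewind so that only the first n events remain."""
        del self.events[n:]

    def fork(self, n: Optional[int] = None) -> "Trace":
        """Return an independent trace that starts from the first n events."""
        prefix = self.events if n is None else self.events[:n]
        return Trace(events=list(prefix))


# Usage: a supervisor watches the agent, branches after step 1, then rewinds it.
trace = Trace()
seen: List[str] = []
trace.subscribe(lambda e: seen.append(e.kind))
trace.append(Event("tool_call", "ls"))
trace.append(Event("file_write", "main.py"))

branch = trace.fork(1)                  # explore an alternative after "ls"
branch.append(Event("tool_call", "pytest"))
trace.revert(1)                         # undo the file write on the original
```

Copying the prefix on fork keeps the branch and the original independent, so discarding a failed branch never touches the parent's history.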
How the System Works
The core of Shepherd is built on four functional programming primitives:
Tasks: Agents are defined as typed functions, making them modular and easy to substitute or compose.
Effects: Every action—such as a tool call or file modification—is recorded as a distinct, observable event. This separates the agent's intent from its actual execution.
Scopes: These are isolated environments where agents run. A meta-agent can fork a scope to create a "sandbox" branch, allowing it to test a path without affecting the parent agent. If the path fails, the meta-agent can simply discard the branch.
Execution Trace: This acts as a persistent history of the agent's work, allowing meta-agents to navigate to any point in time and replay the exact state of the environment.
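The four primitives above can be sketched together in a few lines of Python. Everything here is an illustrative assumption rather than Shepherd's real interface: WriteFile, rename_task, and Scope are hypothetical names, the filesystem is an in-memory dictionary, and replay assumes the root scope started empty.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class WriteFile:
    """An effect: a description of an action, not the action itself."""
    path: str
    content: str


def rename_task(old: str, new: str) -> Iterator[WriteFile]:
    """A task: a typed function that states its intent by yielding effects."""
    yield WriteFile(path="notes.txt", content=f"renamed {old} to {new}")


@dataclass
class Scope:
    """An isolated environment that executes effects and records them."""
    files: Dict[str, str] = field(default_factory=dict)
    trace: List[WriteFile] = field(default_factory=list)

    def run(self, task: Iterator[WriteFile]) -> None:
        for effect in task:
            self.trace.append(effect)                  # record intent first
            self.files[effect.path] = effect.content   # then execute it

    def fork(self) -> "Scope":
        """Create a sandbox branch with its own copy of files and history."""
        return Scope(files=dict(self.files), trace=list(self.trace))

    def replay(self, n: int) -> Dict[str, str]:
        """Rebuild the file state as it was after the first n effects."""
        files: Dict[str, str] = {}
        for effect in self.trace[:n]:
            files[effect.path] = effect.content
        return files


# Usage: test a risky edit in a sandbox without touching the parent scope.
parent = Scope()
parent.run(rename_task("foo", "bar"))
sandbox = parent.fork()
sandbox.run(rename_task("bar", "baz"))
```

Because the task only yields effects and the scope decides how to carry them out, the same task can run in the parent or in a sandbox, and dropping the sandbox object discards the branch.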
Performance and Efficiency
Shepherd is designed to be highly efficient, addressing the overhead typically associated with managing multiple agent states. It can fork an agent’s process and filesystem up to 5 times faster than standard Docker operations, regardless of the size of the environment. Furthermore, because the system preserves the exact state of the agent’s previous interactions, it achieves over 95% prompt-cache reuse when replaying tasks. This makes it practical to run complex meta-agent operations without incurring massive computational costs.
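The link between exact replay and prompt-cache reuse can be illustrated with a small calculation. The sketch below assumes a simplified cache that reuses the longest shared prefix between a previously sent prompt and the replayed one; real provider caches have their own granularity and expiry rules, and cache_reuse_fraction is a hypothetical helper, not part of Shepherd.

```python
from typing import Sequence


def cache_reuse_fraction(cached: Sequence[str], replayed: Sequence[str]) -> float:
    """Fraction of the replayed prompt covered by the shared leading prefix."""
    if not replayed:
        return 0.0
    shared = 0
    for old, new in zip(cached, replayed):
        if old != new:
            break          # the cache stops matching at the first difference
        shared += 1
    return shared / len(replayed)


# Usage: a branch that diverges only at the last turn reuses most of the prompt.
original = ["system", "task", "turn1", "turn2", "turn3"]
branch = original[:4] + ["turn3-alt"]
reuse = cache_reuse_fraction(original, branch)      # 4 of 5 segments shared

# If an early segment changes, nothing after it can be reused.
early_edit = ["system-v2"] + original[1:]
no_reuse = cache_reuse_fraction(original, early_edit)
```

Under this model, reuse is high only when the replayed history matches the original exactly up to the branch point, which is why preserving the exact prior interactions matters.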
Real-World Applications
The researchers demonstrated Shepherd’s versatility across three key areas:
Live Supervision: By monitoring two agents working on code simultaneously, a meta-agent supervisor was able to intervene and coordinate their tasks, increasing the pass rate on the CooperBench benchmark from 28.8% to 54.7%.
Meta-Optimization: By branching execution paths to test different workflow edits, the system outperformed existing baselines by up to 11 points while reducing the time required to complete tasks by up to 58%.
Training: Using a technique called Tree-RL, meta-agents were able to fork rollouts at specific turns to better evaluate agent performance, significantly improving results on the TerminalBench-2 benchmark.
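The idea of forking rollouts at a specific turn can be illustrated with a toy example. This is a generic sketch of estimating the value of a partial trajectory by sampling several continuations from it; it is not the Tree-RL algorithm from the paper, and the environment, reward, and function names are all hypothetical.

```python
import random
from statistics import mean
from typing import List


def rollout(prefix: List[int], total_turns: int, rng: random.Random) -> float:
    """Toy episode: finish the trajectory with random 0/1 actions.

    The return is the fraction of turns on which the action was 1.
    """
    actions = list(prefix)
    while len(actions) < total_turns:
        actions.append(rng.randint(0, 1))
    return sum(actions) / total_turns


def branch_value(prefix: List[int], total_turns: int,
                 branches: int, rng: random.Random) -> float:
    """Fork `branches` continuations from the same prefix and average them."""
    returns = [rollout(prefix, total_turns, rng) for _ in range(branches)]
    return mean(returns)


# Usage: compare two choices at turn 2 by forking from each resulting state.
rng = random.Random(0)
value_after_good = branch_value([1, 1], total_turns=4, branches=8, rng=rng)
value_after_bad = branch_value([1, 0], total_turns=4, branches=8, rng=rng)
```

Averaging over several forks from the same turn gives a per-turn estimate that a single end-to-end rollout cannot provide, since one rollout yields only one outcome for the whole trajectory.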