MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems
Autonomous agentic systems often become static after deployment: they cannot learn from user interactions or fix recurring failures without manual intervention. While some systems allow agents to update text-based files like prompts or memory, they leave the core "harness" (the underlying code responsible for routing, state management, and system logic) untouched. MOSS is a new system designed to bridge this gap by enabling agents to rewrite themselves at the source-code level. By modifying its own code, an agent can resolve structural failures that prompt or configuration changes alone cannot reach.
Why Source-Level Adaptation Matters
The authors argue that source-level modification is a superior medium for agent evolution for four key reasons. First, it is Turing-complete, meaning it can represent any possible agentic structure, whereas text-mutable methods are limited to what the base model can interpret. Second, it is a strict superset of existing methods; any change possible via prompt editing can also be achieved through code. Third, code-based changes are deterministic, relying on logic rather than the base model’s ability to follow instructions. Finally, source-level changes do not suffer from "long-context drift," where an agent’s performance degrades over time as it is forced to re-read an ever-growing pile of prompt-based instructions.
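The determinism argument can be illustrated with a small contrast. This is a minimal sketch under stated assumptions, not code from MOSS or OpenClaw: the tool name `delete_file`, the function `route_tool_call`, and the `confirmed` flag are all hypothetical. A rule written into a prompt holds only if the base model follows it, while the same rule written into the harness is enforced on every call.

```python
from typing import Any, Dict

# Text-mutable version of the rule: the base model may or may not obey it.
PROMPT_RULE = "Always ask for confirmation before calling delete_file."


def route_tool_call(tool_name: str, args: Dict[str, Any], confirmed: bool) -> Dict[str, Any]:
    """Hypothetical harness routing step that enforces the rule in code."""
    # Source-level version of the rule: the check runs on every call,
    # independent of how well the model follows instructions.
    if tool_name == "delete_file" and not confirmed:
        return {"status": "blocked", "reason": "confirmation required"}
    return {"status": "dispatched", "tool": tool_name, "args": args}
```

In this sketch, an unconfirmed `delete_file` call is always blocked, whatever the model was told in its prompt.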
The MOSS Evolution Pipeline
MOSS operates through a structured, multi-stage pipeline that ensures changes are safe and effective. When a failure is identified—either through an automated scan of user sessions or a direct user report—MOSS creates a batch of evidence. It then enters an iterative loop consisting of seven distinct stages: locating the issue, planning a fix, reviewing the plan, implementing the code, reviewing the code, evaluating the task performance, and issuing a final verdict. To ensure quality, MOSS delegates the actual code editing to a pluggable external coding agent, while the system itself maintains control over the process, stage ordering, and final decision-making.
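The stage ordering described above can be sketched as a small control loop. This is an illustration only, not the MOSS implementation: the `Stage` names paraphrase the seven stages, while `StageResult`, `run_evolution`, the handler signature, the `max_iterations` cap, and the choice to restart from the first stage after a rejection are assumptions made for the example. In such a design, the handler for `IMPLEMENT` is where a pluggable external coding agent would be called, while the loop itself keeps control of ordering and the final decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Stage(Enum):
    # Order of definition is the order of execution.
    LOCATE = "locate"
    PLAN = "plan"
    PLAN_REVIEW = "plan_review"
    IMPLEMENT = "implement"
    CODE_REVIEW = "code_review"
    TASK_EVAL = "task_eval"
    VERDICT = "verdict"


@dataclass
class StageResult:
    ok: bool
    notes: str = ""


# Hypothetical handler type: receives the evidence batch and the history so far.
Handler = Callable[[List[str], List[str]], StageResult]


def run_evolution(
    evidence: List[str],
    handlers: Dict[Stage, Handler],
    max_iterations: int = 3,
) -> Tuple[bool, List[str]]:
    """Run the stages in fixed order; start a new iteration if any stage rejects."""
    history: List[str] = []
    for iteration in range(1, max_iterations + 1):
        accepted = True
        for stage in Stage:
            result = handlers[stage](evidence, history)
            outcome = "ok" if result.ok else "reject"
            history.append(f"{iteration}:{stage.value}:{outcome}")
            if not result.ok:
                accepted = False
                break  # abandon this iteration and try again
        if accepted:
            return True, history
    return False, history
```

A caller would supply one handler per stage and inspect the returned history to see which stage rejected a candidate and in which iteration.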
Verification and Deployment
Before any changes are applied to a live system, MOSS verifies them using ephemeral "trial workers." These are temporary, production-equivalent containers that replay the failure-inducing tasks to ensure the new code actually solves the problem without introducing regressions. Once a candidate is verified, the system presents the plan and logs to the user for audit. If the user provides consent, MOSS performs an in-place container swap. This process is designed to preserve the user’s persistent state, such as memory and credentials, and includes a health-probe-gated rollback mechanism to automatically revert if the new version fails to initialize correctly.
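The consent gate and health-probe-gated rollback can be sketched as follows. This is a simplified model under stated assumptions, not the actual MOSS deployment code: the container is represented as a plain dictionary, and `swap_with_rollback`, the `image` and `state` keys, and the probe signature are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict


def swap_with_rollback(
    runtime: Dict[str, Any],
    new_image: str,
    health_probe: Callable[[Dict[str, Any]], bool],
    user_consented: bool,
) -> str:
    """Swap the running image in place; revert if the health probe fails."""
    if not user_consented:
        return "skipped"  # nothing is changed without user consent

    previous_image = runtime["image"]
    # Only the image changes; persistent state (runtime["state"]) is left alone.
    runtime["image"] = new_image

    if health_probe(runtime):
        return "deployed"

    # The new version failed to initialize: restore the previous image.
    runtime["image"] = previous_image
    return "rolled_back"
```

In this model, user memory and credentials live under a key the swap never touches, so they survive both a successful deployment and a rollback.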
Performance and Results
The authors tested MOSS using OpenClaw, a production-grade agentic system. In a single evolution cycle, MOSS improved the agent's performance on a four-task benchmark, raising the mean grader score from 0.25 to 0.61 (an absolute gain of 0.36). This improvement was achieved entirely without human intervention, demonstrating that MOSS can handle the complexities of a production environment, such as managing large codebases and maintaining live user state, which are typically beyond the scope of simpler, research-oriented self-evolving agents.