AutoAgent: Open Source Library for Autonomous AI Agent Optimization

AI engineers often face a repetitive, time-consuming prompt-tuning cycle: writing system prompts, running agents against benchmarks, analyzing failure traces, and manually tweaking configurations. AutoAgent, a new open-source library developed by Kevin Gu at thirdlayer.inc, aims to remove this manual labor by letting an AI autonomously engineer and optimize its own agent harness. In a 24-hour run, AutoAgent took the number one spot on SpreadsheetBench with a score of 96.5% and posted the top GPT-5 score on TerminalBench, at 55.1%.

Automating the Agentic Loop

AutoAgent works much like Andrej Karpathy’s autoresearch, but it is designed specifically for agent engineering. Where autoresearch iterates in cycles to improve machine learning training, AutoAgent applies the same logic to the agent harness: the scaffolding of system prompts, tool definitions, routing logic, and orchestration strategies. By automating the propose-test-evaluate loop, the system lets an AI modify its own configuration, run benchmarks, and keep or discard each change depending on whether performance improves.
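
The keep-or-discard loop can be pictured as a simple hill climb. This is a minimal sketch, not AutoAgent's code: the harness is modeled as a dict of named text fields, and `optimize`, `propose`, and `evaluate` are hypothetical stand-ins for the meta-agent's edit step and the benchmark run. The toy benchmark merely rewards prompts that mention certain behaviors.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical harness model: named text fields such as a system prompt.
Harness = dict[str, str]


@dataclass
class LoopResult:
    best: Harness
    best_score: float
    history: list[tuple[int, float, bool]]  # (iteration, candidate score, kept?)


def optimize(
    harness: Harness,
    propose: Callable[[Harness], Harness],
    evaluate: Callable[[Harness], float],
    iterations: int,
) -> LoopResult:
    """Propose a change, benchmark it, and keep it only if the score strictly improves."""
    best = dict(harness)
    best_score = evaluate(best)
    history: list[tuple[int, float, bool]] = []
    for i in range(iterations):
        candidate = propose(dict(best))  # edit a copy so a discarded change leaves `best` intact
        score = evaluate(candidate)
        kept = score > best_score
        if kept:
            best, best_score = candidate, score
        history.append((i, score, kept))
    return LoopResult(best, best_score, history)


# Toy stand-ins (hypothetical): the "benchmark" rewards prompts that mention required behaviors.
REQUIRED = ["check formulas", "verify output", "cite cells"]
POOL = REQUIRED + ["be concise", "use emojis"]
rng = random.Random(0)


def toy_evaluate(h: Harness) -> float:
    """Fraction of required behaviors mentioned in the system prompt."""
    return sum(kw in h["system_prompt"] for kw in REQUIRED) / len(REQUIRED)


def toy_propose(h: Harness) -> Harness:
    """Append one randomly chosen instruction to the system prompt."""
    h["system_prompt"] += f" Always {rng.choice(POOL)}."
    return h


if __name__ == "__main__":
    result = optimize({"system_prompt": "You are a spreadsheet agent."}, toy_propose, toy_evaluate, 10)
    for i, score, kept in result.history:
        print(f"iter {i}: score={score:.2f} {'keep' if kept else 'discard'}")
    print("final prompt:", result.best["system_prompt"])
```

The strict `>` comparison discards changes that only tie the current best; a real setup might also account for run-to-run noise in benchmark scores before accepting a change.
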
The architecture relies on a clear separation of concerns between the human and the machine. The human defines the goal in a program.md file, which serves as the directive for the meta-agent. The meta-agent then inspects the agent.py file, which contains the harness under test, and iteratively rewrites it to improve performance. A results.tsv file tracks the history of these experiments, allowing the meta-agent to learn from past attempts and calibrate future iterations.
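
Logging each experiment and reading the log back is what lets the meta-agent learn from earlier attempts. The sketch below uses Python's standard `csv` module with a tab delimiter; the column layout (`iteration`, `score`, `status`, `description`) and the helper names are hypothetical assumptions, since the actual schema of results.tsv is not described here.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column layout; AutoAgent's real results.tsv schema may differ.
FIELDS = ["iteration", "score", "status", "description"]


def append_result(path: Path, iteration: int, score: float, status: str, description: str) -> None:
    """Append one experiment row, writing the header first if the file is new or empty."""
    new_file = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if new_file:
            writer.writeheader()
        writer.writerow(
            {"iteration": iteration, "score": f"{score:.4f}", "status": status, "description": description}
        )


def load_results(path: Path) -> list[dict[str, str]]:
    """Read every logged experiment back as a list of dicts keyed by column name."""
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def history_summary(rows: list[dict[str, str]], last_n: int = 5) -> str:
    """Render the best kept score plus the most recent experiments as plain text for a prompt."""
    kept = [r for r in rows if r["status"] == "keep"]
    best = max(kept, key=lambda r: float(r["score"]), default=None)
    lines = [f"best kept score: {best['score'] if best else 'n/a'}"]
    lines += [f"#{r['iteration']} {r['status']} {r['score']} {r['description']}" for r in rows[-last_n:]]
    return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "results.tsv"
        append_result(log, 0, 0.50, "keep", "baseline harness")
        append_result(log, 1, 0.42, "discard", "shorter system prompt")
        append_result(log, 2, 0.71, "keep", "add a self-check step")
        print(history_summary(load_results(log)))
```

Passing a compact summary instead of the full log is one way to keep the meta-agent's context short as experiments accumulate.
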

Domain-Agnostic Optimization

The library is built to be domain-agnostic and uses the Harbor format for benchmarks. Each task bundles a configuration file, instructions for the agent, and a test suite that scores the result with either deterministic checks or an LLM-as-judge. Because these tasks run in Docker containers and are scored by their own tests, AutoAgent can be applied to any domain whose outcomes can be scored, from spreadsheet manipulation to terminal command execution.
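
The two scoring styles mentioned above can be illustrated in plain Python. This is not the Harbor format or its API: `Task`, `exact_match_grader`, `llm_judge_grader`, and `score_suite` are hypothetical names, the judge is any callable that returns text (a stand-in for a real model call), and everything runs in-process rather than inside Docker.

```python
from dataclasses import dataclass
from typing import Callable

# A grader maps the agent's raw output to a score in [0, 1].
Grader = Callable[[str], float]


def exact_match_grader(expected: dict[str, str]) -> Grader:
    """Deterministic check: fraction of expected 'cell=value' pairs found in the output."""
    def grade(output: str) -> float:
        found: dict[str, str] = {}
        for line in output.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                found[key.strip()] = value.strip()
        hits = sum(found.get(cell) == value for cell, value in expected.items())
        return hits / len(expected)
    return grade


def llm_judge_grader(rubric: str, judge: Callable[[str], str]) -> Grader:
    """LLM-as-judge: ask a (hypothetical) judge model for a 0-10 rating and normalize it."""
    def grade(output: str) -> float:
        prompt = f"Rubric:\n{rubric}\n\nAnswer:\n{output}\n\nReply with a single integer from 0 to 10."
        try:
            rating = int(judge(prompt).strip())
        except ValueError:
            return 0.0  # an unparseable judgement counts as a failure
        return max(0, min(rating, 10)) / 10
    return grade


@dataclass
class Task:
    name: str
    instruction: str
    grader: Grader


def score_suite(tasks: list[Task], run_agent: Callable[[str], str]) -> float:
    """Mean score of the agent over all tasks: the single number a keep-or-discard loop compares."""
    scores = [task.grader(run_agent(task.instruction)) for task in tasks]
    return sum(scores) / len(scores)
```

Reducing each task to a number in [0, 1] and averaging is one simple way to let an optimization loop compare harness versions across very different domains.
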
This approach shifts the AI engineer's role from manual coder to director. Instead of editing the agent harness directly, the engineer provides high-level guidance and leaves the technical optimization to the meta-agent. Observations from the project suggest that same-family model pairing, such as using a Claude meta-agent to optimize a Claude task agent, may lead to more accurate failure diagnosis, which would make the choice of meta-agent relative to the target agent an important factor in the optimization process.
