Orchard: An Open-Source Agentic Modeling Framework
Orchard is an open-source framework designed to make training autonomous AI agents more scalable, affordable, and reproducible. While many current agentic systems rely on proprietary infrastructure that is difficult to modify or share, Orchard introduces a lightweight, standardized environment layer. By decoupling the environment from specific training recipes and agent designs, it allows researchers to reuse data, evaluation protocols, and training methods across different domains, such as software engineering, web navigation, and personal assistance.
The Core: Orchard Env
At the heart of the framework is Orchard Env, a service built on Kubernetes that manages the "sandboxes" where agents operate. These sandboxes are isolated environments in which an agent can safely execute code, browse the web, or perform other tasks. Orchard Env uses an "in-pod agent" injection method, which lets it work with any user-provided Docker image without modifying that image. Because commands are routed directly to the sandboxes, the system reports a latency of 0.28 seconds and can handle high-concurrency workloads, such as running 1,000 sandboxes simultaneously, at a significantly lower cost than managed alternatives.
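To make the idea of a thin environment layer concrete, here is a minimal sketch of what a sandbox interface with bounded concurrency might look like. It is not Orchard Env's API: the names `Sandbox`, `LocalProcessSandbox`, `ExecResult`, and `run_many` are hypothetical, and the stand-in backend runs local subprocesses rather than Kubernetes pods so that the example works without a cluster.

```python
import asyncio
import sys
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class ExecResult:
    exit_code: int
    stdout: str


class Sandbox(Protocol):
    """Hypothetical interface: anything that can run a command in isolation."""

    async def run(self, argv: Sequence[str]) -> ExecResult: ...


class LocalProcessSandbox:
    """Stand-in backend (an assumption for illustration only).

    A real deployment would route the command to a pod; this class just
    spawns a local subprocess so the sketch is runnable anywhere.
    """

    async def run(self, argv: Sequence[str]) -> ExecResult:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()
        return ExecResult(proc.returncode, out.decode())


async def run_many(
    sandboxes: Sequence[Sandbox],
    argv: Sequence[str],
    max_concurrency: int = 8,
) -> List[ExecResult]:
    """Run the same command in every sandbox, capping in-flight calls."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(sb: Sandbox) -> ExecResult:
        async with sem:
            return await sb.run(argv)

    # gather preserves the order of the input sandboxes
    return list(await asyncio.gather(*(one(sb) for sb in sandboxes)))


if __name__ == "__main__":
    boxes = [LocalProcessSandbox() for _ in range(4)]
    results = asyncio.run(run_many(boxes, [sys.executable, "-c", "print('ok')"]))
    print([r.stdout.strip() for r in results])
```

The point of the `Sandbox` protocol is that the caller never depends on how isolation is provided, so the same rollout code could target a different backend without changes.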
Agentic Modeling Recipes
The researchers demonstrated the framework's versatility by building three distinct agentic models:
Orchard-SWE: Focused on software engineering, this model uses a technique called "credit-assignment SFT" to learn from both successful and failed attempts at coding tasks. With reinforcement learning applied as well, it achieved a 67.5% success rate on the SWE-bench Verified benchmark, a new state of the art among open-source models of its size.
Orchard-GUI: This vision-language agent is designed for computer use. Despite using a smaller 4B-parameter model, it achieved an average success rate of 68.4% across three web-navigation benchmarks, suggesting that environment-grounded training can make smaller models competitive with much larger proprietary systems.
Orchard-Claw: This model targets personal assistant tasks. It demonstrates that agent skills can be transferred across different "harnesses" (the software interfaces that connect the agent to the environment), achieving a 73.9% pass rate when paired with a robust harness.
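The notion of a harness in the Orchard-Claw item can be illustrated with a small sketch. Everything here is an assumption made for illustration: `Harness`, `JsonHarness`, `LineHarness`, `Action`, `pass_rate`, and the toy policy are hypothetical names and are not taken from Orchard. The sketch shows one policy being scored through two harnesses that differ only in how they render the task and parse the reply.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple


@dataclass
class Action:
    tool: str
    arg: str


class Harness(Protocol):
    """Hypothetical harness: renders a task into a prompt, parses the reply."""

    def render(self, task: str) -> str: ...
    def parse(self, model_output: str) -> Action: ...


class JsonHarness:
    def render(self, task: str) -> str:
        return f'Task: {task}\nReply with JSON: {{"tool": ..., "arg": ...}}'

    def parse(self, model_output: str) -> Action:
        data = json.loads(model_output)
        return Action(data["tool"], data["arg"])


class LineHarness:
    def render(self, task: str) -> str:
        return f"Task: {task}\nReply as: <tool> <arg>"

    def parse(self, model_output: str) -> Action:
        tool, _, arg = model_output.strip().partition(" ")
        return Action(tool, arg)


def pass_rate(
    policy: Callable[[str], str],
    harness: Harness,
    tasks: List[Tuple[str, Action]],
) -> float:
    """Fraction of tasks where the parsed action equals the expected one."""
    passed = 0
    for task, expected in tasks:
        try:
            action = harness.parse(policy(harness.render(task)))
        except (ValueError, KeyError, TypeError):
            continue  # unparseable output counts as a failure
        passed += int(action == expected)
    return passed / len(tasks)


def toy_policy(prompt: str) -> str:
    """Toy stand-in for a model: echoes the task in the requested format."""
    task = prompt.splitlines()[0][len("Task: "):]
    if "JSON" in prompt:
        return json.dumps({"tool": "echo", "arg": task})
    return f"echo {task}"


tasks = [("hello", Action("echo", "hello")), ("bye", Action("echo", "bye"))]
json_rate = pass_rate(toy_policy, JsonHarness(), tasks)
line_rate = pass_rate(toy_policy, LineHarness(), tasks)
```

In this toy setup the policy passes under both harnesses because it adapts to the format hint in the prompt. A policy that handled only one format would score differently per harness, which is the kind of gap a cross-harness evaluation is meant to expose.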
Why This Matters for Research
The primary goal of Orchard is to remove the "foundational bottleneck" in agent research: the tendency for training data and methods to be locked into specific, rigid infrastructure. Because Orchard Env is a thin, standalone service, it acts as a common substrate. This means that a dataset collected for one project can be easily reused for another, and training recipes can be shared across the community without researchers needing to rebuild their entire backend. By providing an open, cost-effective, and portable infrastructure, the authors aim to accelerate innovation in how we train LLMs to interact with the world.