Orchard: An Open-Source Agentic Modeling Framework

Paper Abstract

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.

Orchard: An Open-Source Agentic Modeling Framework
Orchard is an open-source framework designed to make training autonomous AI agents more scalable, affordable, and reproducible. While many current agentic systems rely on proprietary infrastructure that is difficult to modify or share, Orchard introduces a lightweight, standardized environment layer. By decoupling the environment from specific training recipes and agent designs, it allows researchers to reuse data, evaluation protocols, and training methods across different domains, such as software engineering, web navigation, and personal assistance.

The Core: Orchard Env

At the heart of the framework is Orchard Env, a service built on Kubernetes that manages the "sandboxes" where agents operate: isolated environments in which an agent can safely execute code, browse the web, or perform other tasks. Orchard Env injects an "in-pod agent" into each sandbox, which lets it work with any user-provided Docker image without modification. Because commands are routed directly to the sandboxes, the authors report low latency (0.28 seconds) and support for high-concurrency workloads, such as running 1,000 sandboxes simultaneously, at a significantly lower cost than managed alternatives.
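To make the idea of reusable sandbox lifecycle primitives concrete, here is a minimal sketch in Python. It is not Orchard Env's API: the class `LocalSandboxService`, its `create`/`exec`/`destroy` methods, and the helpers `sandbox` and `run_in_fresh_sandbox` are hypothetical names chosen for illustration, and each "sandbox" is just a temporary directory on the local machine rather than an isolated Kubernetes pod.

```python
import contextlib
import shutil
import subprocess
import sys
import tempfile
import uuid


class LocalSandboxService:
    """Hypothetical stand-in for an environment service (not Orchard Env's API).

    Each "sandbox" is only a temporary directory on the local machine, so this
    gives no real isolation; it only illustrates a create/exec/destroy lifecycle.
    """

    def __init__(self) -> None:
        self._workdirs: dict[str, str] = {}

    def create(self) -> str:
        """Allocate a fresh working directory and return its sandbox id."""
        sandbox_id = uuid.uuid4().hex
        self._workdirs[sandbox_id] = tempfile.mkdtemp(prefix="sandbox-")
        return sandbox_id

    def exec(self, sandbox_id: str, argv: list[str], timeout: float = 30.0) -> tuple[int, str]:
        """Run a command inside the sandbox directory; return (exit code, stdout)."""
        result = subprocess.run(
            argv,
            cwd=self._workdirs[sandbox_id],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode, result.stdout

    def destroy(self, sandbox_id: str) -> None:
        """Forget the sandbox and delete its working directory."""
        shutil.rmtree(self._workdirs.pop(sandbox_id), ignore_errors=True)

    def active_count(self) -> int:
        """Number of sandboxes that have been created but not yet destroyed."""
        return len(self._workdirs)


@contextlib.contextmanager
def sandbox(service: LocalSandboxService):
    """Guarantee that a sandbox is destroyed even if the caller raises."""
    sandbox_id = service.create()
    try:
        yield sandbox_id
    finally:
        service.destroy(sandbox_id)


def run_in_fresh_sandbox(service: LocalSandboxService, argv: list[str]) -> tuple[int, str]:
    """Any harness (data collection, training rollout, evaluation) can call this."""
    with sandbox(service) as sandbox_id:
        return service.exec(sandbox_id, argv)


if __name__ == "__main__":
    service = LocalSandboxService()
    code, out = run_in_fresh_sandbox(service, [sys.executable, "-c", "print('hello from sandbox')"])
    print(code, out.strip(), service.active_count())
```

The point of the sketch is the separation of concerns: the caller only sees lifecycle primitives, so the same calling code could in principle sit behind a data-collection script, an RL rollout worker, or an evaluation harness. A real service would replace the temporary directory with an isolated container and route `exec` over the network.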

Agentic Modeling Recipes

The researchers demonstrate the framework's versatility by building three agentic modeling recipes on top of Orchard Env:

Orchard-SWE: Focused on software engineering, this recipe distills 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B and uses a technique called "credit-assignment SFT" to learn from the productive segments of unresolved trajectories. Starting from Qwen3-30B-A3B-Thinking, it reaches 64.3% on the SWE-bench Verified benchmark after SFT and 67.5% after adding reinforcement learning with Balanced Adaptive Rollout, which the authors report as a new state of the art among open-source models of comparable size.
Orchard-GUI: This vision-language agent is designed for computer use. Although it is a 4B-parameter model trained on only 0.4K distilled trajectories and 2.2K open-ended tasks, it achieves success rates of 74.1% on WebVoyager, 67.0% on Online-Mind2Web, and 64.0% on DeepShop, an average of 68.4%. The authors describe it as the strongest open-source model on these benchmarks while remaining competitive with proprietary systems.
Orchard-Claw: This recipe targets personal assistant tasks. Trained on only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with the stronger ZeroClaw harness, which suggests that agent skills can transfer across different "harnesses" (the software interfaces that connect the agent to the environment).
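The headline numbers can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the paper abstract; the variable names are illustrative, and the differences are simple percentage-point gaps, not statistics reported by the authors.

```python
# Figures as quoted in the paper abstract (percent).
gui_scores = {"WebVoyager": 74.1, "Online-Mind2Web": 67.0, "DeepShop": 64.0}

# Unweighted mean over the three web-navigation benchmarks.
gui_average = round(sum(gui_scores.values()) / len(gui_scores), 1)

# Orchard-SWE on SWE-bench Verified: after SFT alone vs. after SFT+RL.
swe_sft, swe_sft_rl = 64.3, 67.5
rl_gain = round(swe_sft_rl - swe_sft, 1)

# Orchard-Claw pass@3 on Claw-Eval: first reported figure vs. the ZeroClaw harness.
claw_base, claw_zeroclaw = 59.6, 73.9
harness_gain = round(claw_zeroclaw - claw_base, 1)

print(f"Orchard-GUI average success rate: {gui_average}%")    # 68.4%
print(f"Orchard-SWE gain from RL: {rl_gain} points")          # 3.2 points
print(f"Orchard-Claw gain from ZeroClaw: {harness_gain} points")  # 14.3 points
```

The 68.4% average matches the figure cited for Orchard-GUI, and the remaining two lines show that RL adds 3.2 points on top of SFT for Orchard-SWE, while switching to the ZeroClaw harness adds 14.3 points for Orchard-Claw.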

Why This Matters for Research

The primary goal of Orchard is to remove the "foundational bottleneck" in agent research: the tendency for training data and methods to be locked into specific, rigid infrastructure. Because Orchard Env is a thin, standalone service, it acts as a common substrate. This means that a dataset collected for one project can be easily reused for another, and training recipes can be shared across the community without researchers needing to rebuild their entire backend. By providing an open, cost-effective, and portable infrastructure, the authors aim to accelerate innovation in how we train LLMs to interact with the world.
