Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost
This paper explores a way to build AI agents without "orchestration frameworks", the popular tools that sit between a user and a model. Frameworks such as LangGraph and CrewAI manage agent behavior by injecting instructions and routing decisions at every step of a conversation. The authors propose a "subterranean" approach: instead of relying on an external manager, they compile the entire procedural workflow directly into the weights of a smaller, fine-tuned model. The model then "self-orchestrates", so no external orchestration logic is needed at runtime.
How the Subterranean Approach Works
The process begins by defining an agent’s workflow as a flowchart with specific nodes and decision points. The researchers then generate thousands of synthetic conversations that follow every possible path through that flowchart. By fine-tuning a smaller model on this data, the procedure becomes part of the model’s internal knowledge rather than a set of instructions it has to read repeatedly. At runtime, the user simply talks to the model, which has learned to follow the workflow through its own internal statistical patterns.
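The path-enumeration step can be pictured with a small sketch. This is not the authors' pipeline: the flowchart, its node names, and the `enumerate_paths` helper are all hypothetical, and the `max_visits` bound is an assumption added here so that loops in a flowchart yield a finite set of paths. Each resulting path would then seed one or more synthetic conversations.

```python
from typing import Dict, List

# Hypothetical travel-booking flowchart: node -> possible next nodes.
# Node names are illustrative only; they are not taken from the paper.
FLOWCHART: Dict[str, List[str]] = {
    "greet": ["collect_trip_details"],
    "collect_trip_details": ["search_flights"],
    "search_flights": ["offer_options", "no_availability"],
    "offer_options": ["confirm_booking", "collect_trip_details"],
    "no_availability": ["end"],
    "confirm_booking": ["end"],
    "end": [],
}


def enumerate_paths(
    graph: Dict[str, List[str]], start: str, max_visits: int = 1
) -> List[List[str]]:
    """Return every path from `start` to a terminal node.

    A node may appear at most `max_visits` times on a single path, which
    keeps enumeration finite when the flowchart contains loops. Branches
    that cannot reach a terminal node within that bound are dropped.
    """
    paths: List[List[str]] = []

    def walk(node: str, path: List[str], visits: Dict[str, int]) -> None:
        visits = {**visits, node: visits.get(node, 0) + 1}
        path = path + [node]
        if not graph[node]:  # terminal node: record the finished path
            paths.append(path)
            return
        for nxt in graph[node]:
            if visits.get(nxt, 0) < max_visits:
                walk(nxt, path, visits)

    walk(start, [], {})
    return paths


simple_paths = enumerate_paths(FLOWCHART, "greet")
looped_paths = enumerate_paths(FLOWCHART, "greet", max_visits=2)
```

Raising `max_visits` adds paths in which the user revises trip details before booking, which is one way training data could cover revisits to earlier steps.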
Performance and Quality
The researchers tested this method across three domains: travel booking, Zoom technical support, and insurance claims. They found that an 8B-parameter model trained this way achieves 87–98% of the quality of a "frontier" model (like Claude Sonnet 4.5) that has the entire procedure provided in its prompt. In many cases, the compiled model actually outperformed the standard orchestration frameworks, particularly in consistency and naturalness. Because the model has internalized the workflow, it avoids the common "routing errors" that occur when an external orchestrator tries to decide which step to take next.
Significant Cost and Speed Advantages
One of the most striking findings is the reduction in cost and latency. Compiled models are 128–462 times cheaper per conversation than the standard in-context baseline. This is because the model no longer needs to process long, repetitive procedural instructions in every API call, and it can be self-hosted on smaller hardware. Furthermore, the "recompile" cycle (the time it takes to update the model when a procedure changes) takes only 30–50 minutes. That is short enough for the approach to fit into a standard CI/CD pipeline rather than remain a slow, one-off research project.
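To see why resending the procedure on every call is expensive, consider a rough token count. The sketch below is illustrative only: the function name and all numbers are hypothetical, and it covers just the input-token component. It does not reproduce the 128–462x figure, which also reflects the move to a smaller self-hosted model.

```python
def input_tokens(turns: int, procedure_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens processed over a conversation.

    Assumes each call re-reads the procedure (if any) plus the full
    conversation history so far, with every turn adding a fixed number
    of tokens. Both assumptions are simplifications.
    """
    total = 0
    for t in range(1, turns + 1):
        total += procedure_tokens + t * tokens_per_turn
    return total


# Hypothetical figures: a 4,000-token procedure, 10 turns, 100 tokens per turn.
in_context = input_tokens(turns=10, procedure_tokens=4000, tokens_per_turn=100)
compiled = input_tokens(turns=10, procedure_tokens=0, tokens_per_turn=100)
ratio = in_context / compiled
```

Under these made-up numbers the in-context setup processes roughly eight times as many input tokens, before any difference in per-token price between a frontier API and a self-hosted small model is taken into account.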
Why This Matters
The paper argues that the industry's preference for orchestration frameworks rests on perceived barriers that are actually quite manageable. Developers often worry that fine-tuning is too rigid or that small models aren't smart enough, but this research shows that smaller models are highly effective for procedural tasks. The authors conclude that persistent structure (the "rules" of a task) belongs in the model's weights, while the transient details of a specific conversation belong in the prompt. By shifting the workflow into the model itself, developers can create faster, cheaper, and more reliable agents.