From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

Key Takeaways

  • Large language model systems are increasingly deployed as agentic workflows that interleave reasoning, tool use, memory, and iterative refinement.
  • The goal is not to make the model a better one-shot writer, but to make evolving AI-generated work maintainable under change.
  • We compare execution-lineage replay against loop-centric update baselines on two controlled policy-memo update tasks.
  • These results show that final answer quality and maintained-state quality are distinct.
Paper Abstract

Large language model systems are increasingly deployed as agentic workflows that interleave reasoning, tool use, memory, and iterative refinement. These systems are effective at producing answers, but they often rely on implicit conversational state, making it difficult to preserve stable work products, isolate irrelevant updates, or propagate changes through intermediate artifacts. We introduce execution lineage: an execution model in which AI-native work is represented as a directed acyclic graph (DAG) of artifact-producing computations with explicit dependencies, stable intermediate boundaries, and identity-based replay. The goal is not to make the model a better one-shot writer, but to make evolving AI-generated work maintainable under change. We compare execution-lineage replay against loop-centric update baselines on two controlled policy-memo update tasks. In an unrelated-branch update, DAG replay preserved the final memo exactly in all runs, with zero churn and zero unrelated-branch contamination, while loop baselines regenerated the memo and frequently imported unrelated context. In an intermediate-artifact edit, all systems reflected the new constraint in the final memo, but only DAG replay achieved perfect upstream preservation, downstream propagation, unaffected-artifact preservation, and cross-artifact consistency. These results show that final answer quality and maintained-state quality are distinct. Strong loop baselines can remain competitive at producing polished final outputs when the task is a bounded synthesis/update problem and all current sources fit in context, but immediate task success can mask partial state inconsistency that may compound over future revisions. Execution lineage provides stronger guarantees about what should change, what should remain stable, and how work evolves across revisions.

From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work
Modern AI systems often rely on "agent loops," in which a model iterates, calls tools, and appends to its conversational history until it reaches a final answer. These loops are effective at producing answers, but they leave dependencies implicit in the transcript, which makes it hard to track how work evolves. This paper introduces "execution lineage," a model that represents AI work as a directed acyclic graph (DAG) of artifact-producing computations. By treating intermediate steps as stable, identifiable artifacts rather than transient text, the approach aims to keep AI-generated work consistent, maintainable, and reproducible as requirements change.

Moving Beyond Conversational Loops

Current agentic workflows typically store state within a single, evolving conversational transcript. This makes it hard to isolate specific updates or understand exactly what a final answer depends on. When a small change is needed, these systems often perform "global recomputation," where the entire process is restarted. The authors argue that this is a structural failure: because the system lacks explicit boundaries between different stages of work, it cannot reliably distinguish between what should remain stable and what needs to be updated.

The Power of Execution Lineage

The proposed execution lineage model shifts the focus from prompt-based history to a structured graph of computations. Each node in this graph represents a specific unit of work with declared inputs and a clear output contract. By assigning each node an "execution identity," the system can determine exactly when a piece of work needs to be re-run and when it can be safely reused. This allows for "partial recomputation," where only the parts of the workflow affected by a change are updated, while the rest of the work remains preserved and consistent.
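The idea of execution identity and partial recomputation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` and `Graph` classes, the choice of SHA-256 over a node's name, parameters, and upstream identities, and the `step` stand-in for a model call are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple


class Node:
    """One unit of work with declared inputs (hypothetical structure)."""

    def __init__(self, name: str, deps: List[str], fn: Callable[..., str], params: str = ""):
        self.name = name
        self.deps = deps      # names of upstream nodes, in argument order
        self.fn = fn          # computation; stands in for a model or tool call
        self.params = params  # e.g. prompt or source text owned by this node


class Graph:
    def __init__(self, nodes: List[Node]):
        # Assumption: nodes are listed in topological order.
        self.nodes = nodes
        self.cache: Dict[str, str] = {}  # execution identity -> artifact

    def run(self) -> Tuple[Dict[str, str], List[str]]:
        """Return all artifacts plus the names of nodes that were recomputed."""
        artifacts: Dict[str, str] = {}
        identities: Dict[str, str] = {}
        recomputed: List[str] = []
        for node in self.nodes:
            # Identity covers the node itself and the identities of its inputs,
            # so a change anywhere upstream changes every downstream identity.
            payload = json.dumps([node.name, node.params, [identities[d] for d in node.deps]])
            ident = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            identities[node.name] = ident
            if ident not in self.cache:
                self.cache[ident] = node.fn(node.params, *[artifacts[d] for d in node.deps])
                recomputed.append(node.name)
            artifacts[node.name] = self.cache[ident]
        return artifacts, recomputed


def step(label: str) -> Callable[..., str]:
    # Deterministic stand-in for a model call: records what it consumed.
    def fn(params: str, *inputs: str) -> str:
        return f"{label}[{params}]({', '.join(inputs)})"
    return fn


source_a = Node("source_a", [], step("src"), params="transit report v1")
source_b = Node("source_b", [], step("src"), params="housing report v1")
notes_a = Node("notes_a", ["source_a"], step("notes"))
notes_b = Node("notes_b", ["source_b"], step("notes"))
memo = Node("memo", ["notes_a"], step("memo"))
graph = Graph([source_a, source_b, notes_a, notes_b, memo])

first, ran_first = graph.run()
source_b.params = "housing report v2"   # update on a branch the memo does not read
second, ran_second = graph.run()
print(ran_second)                        # only the affected branch is recomputed
print(second["memo"] == first["memo"])   # the memo is reused unchanged
```

In this toy graph the memo depends only on `notes_a`, so editing `source_b` re-runs `source_b` and `notes_b` and nothing else. Hashing upstream identities rather than upstream artifact text is one possible design choice; it lets reuse be decided before any computation runs.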

Results: Quality vs. Consistency

The authors compared DAG replay against loop-centric update baselines on two controlled policy-memo update tasks. In the first, an update to an unrelated branch, DAG replay preserved the final memo exactly in all runs, with zero churn and zero unrelated-branch contamination, while the loop baselines regenerated the memo and frequently imported unrelated context. In the second, an edit to an intermediate artifact, every system reflected the new constraint in the final memo, but only DAG replay achieved perfect upstream preservation, downstream propagation, unaffected-artifact preservation, and cross-artifact consistency. Strong loop baselines can still produce polished final outputs when the task is a bounded synthesis or update problem and all current sources fit in context. The difference lies in "maintained-state quality": immediate task success can mask partial state inconsistency that may compound over future revisions.
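This summary does not give the paper's exact metric definitions, so the following is only one plausible way to operationalize "churn" and "unrelated-branch contamination." The functions `churn` and `contamination`, the line-level diff, and the term-matching check are assumptions made for illustration, and the memo strings are invented toy inputs, not data from the study.

```python
import difflib
from typing import Iterable, List


def churn(before: str, after: str) -> float:
    """Line-level dissimilarity between two memo versions; 0.0 means identical."""
    matcher = difflib.SequenceMatcher(None, before.splitlines(), after.splitlines())
    return 1.0 - matcher.ratio()


def contamination(memo: str, unrelated_terms: Iterable[str]) -> List[str]:
    """Return the unrelated-branch terms that leaked into the memo text."""
    lowered = memo.lower()
    return [term for term in unrelated_terms if term.lower() in lowered]


# Toy memo versions (invented for this example).
before = "Recommend expanding bus service.\nFund it from the transit levy."
after_replay = before  # a replayed memo that was reused as-is
after_loop = (
    "Recommend expanding bus service.\n"
    "Fund it from the transit levy and note the new housing vacancy data."
)
unrelated = ["housing vacancy"]

print(churn(before, after_replay), contamination(after_replay, unrelated))
print(churn(before, after_loop), contamination(after_loop, unrelated))
```

A simple substring check like this would miss paraphrased leakage; a real evaluation would likely need a stronger detector.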

Key Takeaways for AI Systems

The core insight is that final answer quality and maintained-state quality are distinct. For long-lived, complex workflows, the system must be able to explain its dependencies and provide stable boundaries between stages. By adopting a graph-based structure, developers can move away from "prompt engineering tricks" and toward a systems-level approach that treats intermediate artifacts as first-class, addressable objects. The result is stronger guarantees about what should change and what should remain stable as AI-native work evolves.
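As a rough illustration of what "explaining dependencies" could look like once artifacts are addressable, the sketch below answers two questions over an explicit dependency map: what does an artifact depend on, and what must be refreshed after an edit? The artifact names and the `DEPS` mapping are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Hypothetical artifact graph: each artifact maps to the artifacts it reads.
DEPS: Dict[str, List[str]] = {
    "sources": [],
    "constraints": ["sources"],
    "options": ["sources"],
    "analysis": ["constraints", "options"],
    "memo": ["analysis"],
    "appendix": ["sources"],
}


def upstream(artifact: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Everything the artifact depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(deps[artifact])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def downstream(artifact: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Everything that must be refreshed when the artifact is edited."""
    return {name for name in deps if artifact in upstream(name, deps)}


edited = "constraints"
to_refresh = downstream(edited, DEPS)
to_preserve = set(DEPS) - to_refresh - {edited}
print(sorted(upstream("memo", DEPS)))  # what the memo depends on
print(sorted(to_refresh))              # where the edit must propagate
print(sorted(to_preserve))             # what should remain untouched
```

With a transcript-only system there is no comparable map to query, which is the structural gap the article describes.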
