Synthetic Computers at Scale for Long-Horizon Productivity Simulation
This paper introduces a new methodology for creating realistic, large-scale synthetic computer environments to train and evaluate AI agents. As AI moves from simple chat-based tasks to complex, multi-step productivity work, agents require deep context—such as file structures, project histories, and professional artifacts—to function effectively. By generating thousands of unique, user-specific virtual computers, the authors provide foundational environments where agents can practice long-horizon tasks like data analysis, document creation, and professional collaboration.
Building Realistic User Environments
The researchers create these synthetic environments by starting with a persona—a detailed profile of a professional, such as a financial advisor. This persona is expanded into a comprehensive user profile that includes career history, current projects, preferred software tools, and even specific document-handling habits. Using this profile, the system plans a complete filesystem, including directory hierarchies and a network of interconnected files. By establishing dependencies between files—where a final report might be derived from an earlier spreadsheet or a downloaded data set—the methodology ensures that the synthetic computer feels like a genuine, lived-in workspace rather than a collection of random files.
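The persona-to-filesystem pipeline can be pictured as building a small dependency graph over files, where derived artifacts point back to their sources. The sketch below is purely illustrative—the class names, file paths, and `lineage` helper are assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch: a synthetic computer as a filesystem with dependency edges.
from dataclasses import dataclass, field

@dataclass
class FileNode:
    path: str
    derived_from: list = field(default_factory=list)  # paths this file was built from

@dataclass
class SyntheticComputer:
    persona: str
    files: dict = field(default_factory=dict)

    def add_file(self, path, derived_from=()):
        # Register a file along with the artifacts it depends on.
        self.files[path] = FileNode(path, list(derived_from))

    def lineage(self, path):
        # Walk dependency edges back to the root artifacts.
        seen, stack = [], [path]
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.append(p)
                stack.extend(self.files[p].derived_from)
        return seen

# A "lived-in" workspace: the final report is derived from a spreadsheet,
# which was in turn derived from a downloaded dataset.
pc = SyntheticComputer(persona="financial advisor")
pc.add_file("Downloads/market_data.csv")
pc.add_file("Projects/q3/analysis.xlsx", derived_from=["Downloads/market_data.csv"])
pc.add_file("Projects/q3/client_report.docx", derived_from=["Projects/q3/analysis.xlsx"])

print(pc.lineage("Projects/q3/client_report.docx"))
# → ['Projects/q3/client_report.docx', 'Projects/q3/analysis.xlsx', 'Downloads/market_data.csv']
```

Tracking provenance this way is what distinguishes a coherent workspace from a pile of unrelated files: any artifact can be traced back through the documents it was plausibly built from.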
Simulating Long-Horizon Work
Once a synthetic computer is established, the researchers run long-horizon simulations using two distinct agents. A "setup agent" defines complex, month-long productivity objectives tailored to the specific user profile and the files already present on the computer. A "work agent" then takes over, navigating the filesystem, coordinating with simulated collaborators, and iteratively creating or revising professional deliverables like spreadsheets and presentations. These simulations are extensive, often spanning over 2,000 turns and requiring more than 8 hours of agent runtime, which allows the agents to learn how to plan, revise, and recover from failures in a realistic context.
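The setup-agent/work-agent division of labor can be sketched as a simple turn loop: one function proposes an objective grounded in the user profile, and another iterates until the deliverables exist. Everything below—the function names, objective schema, and deliverable list—is a hypothetical simplification of the simulation described above, not the paper's actual agent code:

```python
# Illustrative sketch of the two-agent simulation loop (all names are assumptions).

def setup_agent(profile):
    """Stand-in for the setup agent: derive a long-horizon objective
    tailored to the user profile and the files on the computer."""
    return {
        "goal": f"Quarterly client review for a {profile}",
        "deliverables": ["q3_summary.xlsx", "client_deck.pptx"],
    }

def work_agent_step(objective, state):
    """Stand-in for one work-agent turn: create or revise the next
    pending deliverable (a real agent would edit files, message
    simulated collaborators, and recover from failures here)."""
    pending = [d for d in objective["deliverables"] if d not in state["done"]]
    if pending:
        state["done"].append(pending[0])
    return state

objective = setup_agent("financial advisor")
state = {"done": [], "turns": 0}

# Loop until every deliverable is produced; real simulations run for
# thousands of such turns rather than one per deliverable.
while len(state["done"]) < len(objective["deliverables"]):
    state = work_agent_step(objective, state)
    state["turns"] += 1

print(state)
# → {'done': ['q3_summary.xlsx', 'client_deck.pptx'], 'turns': 2}
```

The point of the split is that objective-setting and execution are separate policies: the setup agent fixes the goal once, while the work agent's loop is where long-horizon behaviors like planning, revision, and failure recovery play out.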
Performance and Scalability
In preliminary experiments, the authors instantiated 1,000 synthetic computers to test their approach. The results indicate that these simulations provide valuable experiential learning signals, leading to measurable improvements in agent performance on both in-domain and out-of-domain productivity tasks. Because the methodology relies on a vast pool of potential personas, the authors argue that this approach can scale to millions or billions of synthetic user worlds. This scalability offers a promising path for future agent self-improvement and reinforcement learning, as it allows for the creation of diverse environments that cover a wide range of professions, roles, and productivity needs.