Synthetic Computers at Scale for Long-Horizon Productivity Simulation
This paper introduces a new methodology for creating realistic, large-scale synthetic computer environments to train and evaluate AI agents. As AI moves from simple chat-based tasks to complex, multi-step productivity work, agents require deep context—such as file structures, project histories, and professional artifacts—to function effectively. By generating thousands of unique, user-specific virtual computers, the authors provide foundational environments where agents can practice long-horizon tasks like data analysis, document creation, and professional collaboration.
Building Realistic User Environments
The researchers create these synthetic environments by starting with a persona—a detailed profile of a professional, such as a financial advisor. This persona is expanded into a comprehensive user profile that includes career history, current projects, preferred software tools, and even specific document-handling habits. Using this profile, the system plans a complete filesystem, including directory hierarchies and a network of interconnected files. By establishing dependencies between files—where a final report might be derived from an earlier spreadsheet or a downloaded data set—the methodology ensures that the synthetic computer feels like a genuine, lived-in workspace rather than a collection of random files.
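The persona-to-filesystem pipeline can be pictured as building a small dependency graph over files, where derived artifacts point back to their sources. The sketch below is purely illustrative—the class names, file paths, and `lineage` helper are assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch: a synthetic computer as a filesystem with dependency edges.
from dataclasses import dataclass, field

@dataclass
class FileNode:
    path: str
    derived_from: list = field(default_factory=list)  # paths this file was built from

@dataclass
class SyntheticComputer:
    persona: str
    files: dict = field(default_factory=dict)

    def add_file(self, path, derived_from=()):
        # Register a file along with the artifacts it depends on.
        self.files[path] = FileNode(path, list(derived_from))

    def lineage(self, path):
        # Walk dependency edges back to the root artifacts.
        seen, stack = [], [path]
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.append(p)
                stack.extend(self.files[p].derived_from)
        return seen

# A "lived-in" workspace: the final report is derived from a spreadsheet,
# which was in turn derived from a downloaded dataset.
pc = SyntheticComputer(persona="financial advisor")
pc.add_file("Downloads/market_data.csv")
pc.add_file("Projects/q3/analysis.xlsx", derived_from=["Downloads/market_data.csv"])
pc.add_file("Projects/q3/client_report.docx", derived_from=["Projects/q3/analysis.xlsx"])

print(pc.lineage("Projects/q3/client_report.docx"))
# → ['Projects/q3/client_report.docx', 'Projects/q3/analysis.xlsx', 'Downloads/market_data.csv']
```

Tracking provenance this way is what distinguishes a coherent workspace from a pile of unrelated files: any artifact can be traced back through the documents it was plausibly built from.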
Simulating Long-Horizon Work
Once a synthetic computer is established, the researchers run long-horizon simulations using two distinct agents. A "setup agent" defines complex, month-long productivity objectives tailored to the specific user profile and the files already present on the computer. A "work agent" then takes over, navigating the filesystem, coordinating with simulated collaborators, and iteratively creating or revising professional deliverables like spreadsheets and presentations. These simulations are extensive, often spanning over 2,000 turns and requiring more than 8 hours of agent runtime, which allows the agents to learn how to plan, revise, and recover from failures in a realistic context.
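The setup-agent/work-agent division of labor can be sketched as a simple turn loop: one function proposes an objective grounded in the user profile, and another iterates until the deliverables exist. Everything below—the function names, objective schema, and deliverable list—is a hypothetical simplification of the simulation described above, not the paper's actual agent code:

```python
# Illustrative sketch of the two-agent simulation loop (all names are assumptions).

def setup_agent(profile):
    """Stand-in for the setup agent: derive a long-horizon objective
    tailored to the user profile and the files on the computer."""
    return {
        "goal": f"Quarterly client review for a {profile}",
        "deliverables": ["q3_summary.xlsx", "client_deck.pptx"],
    }

def work_agent_step(objective, state):
    """Stand-in for one work-agent turn: create or revise the next
    pending deliverable (a real agent would edit files, message
    simulated collaborators, and recover from failures here)."""
    pending = [d for d in objective["deliverables"] if d not in state["done"]]
    if pending:
        state["done"].append(pending[0])
    return state

objective = setup_agent("financial advisor")
state = {"done": [], "turns": 0}

# Loop until every deliverable is produced; real simulations run for
# thousands of such turns rather than one per deliverable.
while len(state["done"]) < len(objective["deliverables"]):
    state = work_agent_step(objective, state)
    state["turns"] += 1

print(state)
# → {'done': ['q3_summary.xlsx', 'client_deck.pptx'], 'turns': 2}
```

The point of the split is that objective-setting and execution are separate policies: the setup agent fixes the goal once, while the work agent's loop is where long-horizon behaviors like planning, revision, and failure recovery play out.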
Performance and Scalability
In preliminary experiments, the authors instantiated 1,000 synthetic computers to test their approach. The results indicate that these simulations provide valuable experiential learning signals, leading to measurable improvements in agent performance on both in-domain and out-of-domain productivity tasks. Because the methodology relies on a vast pool of potential personas, the authors argue that this approach can scale to millions or billions of synthetic user worlds. This scalability offers a promising path for future agent self-improvement and reinforcement learning, as it allows for the creation of diverse environments that cover a wide range of professions, roles, and productivity needs.