
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Key Takeaways

  • Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond explores how AI systems can move beyond simple text generation to effectively navigate and interact with the world.
  • As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck.
  • Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities.
  • We introduce a "levels × laws" taxonomy organized along two axes.
  • The first defines three capability levels: L1 Predictor, L2 Simulator, and L3 Evolver.
  • The second identifies four governing-law regimes: physical, digital, social, and scientific.
Paper Abstract

As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We introduce a "levels x laws" taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence. The second identifies four governing-law regimes: physical, digital, social, and scientific. These regimes determine what constraints a world model must satisfy and where it is most likely to fail. Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific discovery. We analyze methods, failure modes, and evaluation practices across level-regime pairs, propose decision-centric evaluation principles and a minimal reproducible evaluation package, and outline architectural guidance, open problems, and governance challenges. The resulting roadmap connects previously isolated communities and charts a path from passive next-step prediction toward world models that can simulate, and ultimately reshape, the environments in which agents operate.

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond explores how AI systems can move beyond simple text generation to effectively navigate and interact with the world. As agents take on tasks like manipulating physical objects, navigating software, or conducting scientific experiments, they require "world models"—internal representations that allow them to predict the consequences of their actions. This paper provides a unified framework to organize the diverse and fragmented research currently happening across fields like robotics, reinforcement learning, and AI for science.

A New Taxonomy for AI Capabilities

The authors introduce a "levels × laws" framework to categorize how these models function. The capability levels define the maturity of an agent’s internal model:

  • L1 Predictor: The most basic level, where the agent learns to predict the next step in a sequence based on past observations.
  • L2 Simulator: A more advanced level where the agent performs multi-step "rollouts," simulating different future scenarios under candidate actions so it can plan and compare outcomes before committing to a decision.
  • L3 Evolver: The highest level, where the agent autonomously recognizes when its internal model is failing. Instead of merely re-planning, it revises the model itself based on new evidence, allowing it to adapt to changing environments.
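As a rough illustration of how the three levels nest, here is a toy Python sketch. It is not code from the paper: the class names, the scalar state, and the naive blending-based revision rule are all invented for illustration.

```python
from typing import Callable, List

class L1Predictor:
    """L1: learns a one-step local transition operator s' = f(s, a)."""
    def __init__(self, transition: Callable[[float, float], float]):
        self.transition = transition

    def predict(self, state: float, action: float) -> float:
        return self.transition(state, action)

class L2Simulator(L1Predictor):
    """L2: composes one-step predictions into action-conditioned rollouts."""
    def rollout(self, state: float, actions: List[float]) -> List[float]:
        trajectory = [state]
        for a in actions:
            state = self.predict(state, a)
            trajectory.append(state)
        return trajectory

class L3Evolver(L2Simulator):
    """L3: revises its own model when predictions fail against evidence."""
    def update_on_evidence(self, state: float, action: float,
                           observed: float, tol: float = 1e-3) -> float:
        error = abs(self.predict(state, action) - observed)
        if error > tol:
            # Crude illustrative revision: shift all predictions halfway
            # toward the observed outcome at the surprising transition.
            old = self.transition
            correction = 0.5 * (observed - old(state, action))
            self.transition = lambda s, a, old=old: old(s, a) + correction
        return error
```

The point of the nesting is that each level strictly extends the one below: an L2 system is still a usable L1 predictor, and an L3 system can fall back to pure simulation when its predictions match the evidence.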

Governing-Law Regimes

Beyond capability levels, the paper identifies four "governing-law regimes" that dictate the constraints an agent must respect. These regimes help researchers understand where a model is most likely to succeed or fail:

  • Physical World: Perception and interaction with physical objects, as in robotics and autonomous driving.
  • Digital World: Program semantics, as in web navigation and software tool use.
  • Social World: Human-centric dynamics, including social coordination, dialogue, and multi-agent interaction.
  • Scientific World: Latent mechanisms and experimental data, where agents must perform hypothesis-driven discovery.

Bridging Research Communities

The paper synthesizes over 400 works to show that while these fields often operate in isolation, they share the same fundamental goal: building a reliable predictive substrate for decision-making. By applying this common language, the authors aim to move the field away from passive next-step prediction toward more robust, agentic systems. The framework is designed to be diagnostic, helping researchers identify which constraints their models are trying to satisfy and which specific capabilities they need to improve.
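The decision-centric evaluation principles mentioned above amount to scoring a world model by the quality of the decisions it supports rather than by raw prediction error. The following toy sketch is an invented illustration of that distinction (the environment, greedy planner, and bias value are all assumptions, not the paper's benchmark): a model with a constant one-step bias has nonzero prediction error, yet still ranks actions correctly and so loses nothing in decision quality.

```python
def true_step(s: float, a: float) -> float:
    """Ground-truth environment: the state moves by the chosen action."""
    return s + a

def biased_model(s: float, a: float) -> float:
    """Learned model with a constant +0.1 one-step prediction bias."""
    return s + a + 0.1

def plan_return(model_step, start: float, actions, horizon: int,
                goal: float) -> float:
    """Greedy planner: at each step pick the action whose *predicted* next
    state is closest to the goal, but execute it in the *true* environment.
    Returns a decision-centric score: negative final distance to the goal."""
    s = start
    for _ in range(horizon):
        a = min(actions, key=lambda a: abs(model_step(s, a) - goal))
        s = true_step(s, a)
    return -abs(s - goal)

# Despite nonzero one-step error, the biased model ranks candidate actions
# the same way as the true dynamics, so its decision score is identical.
```

A purely prediction-error benchmark would penalize `biased_model`; a decision-centric one would not, which is exactly the kind of gap the proposed evaluation principles are meant to surface.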

Future Directions and Challenges

The authors emphasize that these levels are not static; a single agent might operate as an L1 predictor for simple tasks while escalating to an L3 evolver when it encounters complex, persistent errors. The paper outlines several open problems, including the need for better evaluation practices and the challenge of "meta-world modeling," where the governing laws themselves become learnable. Ultimately, the roadmap suggests that the future of AI lies in creating models that can not only simulate the world but also actively reshape it through informed, evidence-based action.
