Self-Evolving World Models for LLM Agent Planning

Key Takeaways

  • World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution.
  • However, unreliable foresight can be ignored, misused, or even degrade downstream decision-making.
  • In this paper, we introduce WorldEvolver, a self-evolving world model framework that revises its deployment-time context while keeping the downstream agent and all model parameters frozen.
  • We evaluate WorldEvolver on ALFWorld and ScienceWorld, measuring world model prediction accuracy on Word2World and downstream agent success rate on AgentBoard.
Paper Abstract

World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution. However, unreliable foresight can be ignored, misused, or even degrade downstream decision-making. In this paper, we introduce WorldEvolver, a self-evolving world model framework that revises its deployment-time context while keeping the downstream agent and all model parameters frozen. WorldEvolver integrates three modules: (i) Episodic Memory, which exploits real action transitions through retrieval-based simulation; (ii) Semantic Memory, which extracts persistent heuristic rules from prediction-observation mismatches; and (iii) Selective Foresight, which filters low-confidence predictions before integrating them into agent reasoning context. We evaluate WorldEvolver on ALFWorld and ScienceWorld, measuring world model prediction accuracy on Word2World and downstream agent success rate on AgentBoard. Extensive experiments show that WorldEvolver achieves the highest prediction accuracy across three backbones and leads other world model baselines on downstream agent success rate, demonstrating that test-time memory revision enhances both predictive fidelity and planning performance.

Self-Evolving World Models for LLM Agent Planning
Large Language Model (LLM) agents often struggle with long-horizon tasks because they lack reliable foresight—the ability to accurately predict the consequences of their actions before executing them. While some agents attempt to learn from past experiences, they often face "distribution shifts" where the environment changes, rendering their internal models outdated. This paper introduces WorldEvolver, a framework that allows an agent’s world model to continuously evolve and improve its predictions during deployment without requiring expensive, time-consuming updates to the model's underlying parameters.

How WorldEvolver Works

Instead of retraining the LLM, WorldEvolver uses a non-parametric memory system that updates the context provided to the model at inference time. It relies on three core mechanisms:

  • Episodic Memory: This module stores actual, realized transitions from the environment. By retrieving past experiences that are similar to the current situation, the agent can ground its predictions in concrete, historical data.

  • Semantic Memory: This module acts as an exploration tool. It identifies mismatches between what the model predicted and what actually happened in the environment. It then uses an LLM "critic" to turn these failures into persistent heuristic rules, which are stored as context to guide future predictions.

  • Selective Foresight: Because unreliable predictions can harm an agent’s performance, this module acts as a filter. It calculates a confidence score for each prediction and passes the foresight to the agent only when that score is high enough.
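
The retrieval and filtering steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Transition`, `EpisodicMemory`, `jaccard`, and `selective_foresight` are hypothetical, token-overlap similarity stands in for whatever retriever the paper uses, and the confidence score and its 0.7 threshold are assumed inputs, since this summary does not say how confidence is computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transition:
    state: str       # textual observation before the action
    action: str      # action the agent executed
    next_state: str  # observation the environment actually returned


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; an assumed stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class EpisodicMemory:
    """Stores realized transitions and returns the ones most similar to a query."""
    transitions: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.transitions.append(t)

    def retrieve(self, state: str, action: str, k: int = 2) -> List[Transition]:
        query = f"{state} {action}"
        ranked = sorted(
            self.transitions,
            key=lambda t: jaccard(query, f"{t.state} {t.action}"),
            reverse=True,
        )
        return ranked[:k]


def selective_foresight(
    prediction: str, confidence: float, threshold: float = 0.7
) -> Optional[str]:
    """Pass the prediction on only if confidence clears the (assumed) threshold."""
    return prediction if confidence >= threshold else None


memory = EpisodicMemory()
memory.add(Transition("you are in the kitchen", "open fridge", "the fridge is open"))
memory.add(Transition("you are in the garden", "pick flower", "you hold a flower"))

# The stored kitchen transition is the closest match to this query.
top = memory.retrieve("you are in the kitchen", "open fridge", k=1)
print(top[0].next_state)

# A low-confidence prediction is withheld from the agent (None).
print(selective_foresight("the fridge is open", confidence=0.3))
```

Returning `None` for a withheld prediction keeps the agent's prompt unchanged in that step, which matches the idea that unreliable foresight is better omitted than shown.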

Keeping the Agent Frozen

A key design choice in WorldEvolver is that the downstream agent and all model parameters remain frozen. By revising external memory rather than model weights, the framework avoids the computational cost and the risks, such as "catastrophic forgetting", that come with repeated parameter updates. This lets the system adapt to new environments at deployment time, narrowing the gap between static models and changing tasks.
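
To make the frozen-model idea concrete, here is a small Python sketch in which only the prompt context changes between two predictions. Everything in it is an illustrative assumption: `SemanticMemory`, `build_prompt`, `toy_model`, and `toy_critic` are hypothetical names, and the toy functions are hard-coded stand-ins for a frozen LLM world model and an LLM critic.

```python
from typing import Callable, List


class SemanticMemory:
    """Holds heuristic rules as plain text; this is the only state that changes."""

    def __init__(self) -> None:
        self.rules: List[str] = []

    def revise(
        self, predicted: str, observed: str, critic: Callable[[str, str], str]
    ) -> None:
        # Only a prediction-observation mismatch triggers a new rule.
        if predicted != observed:
            rule = critic(predicted, observed)
            if rule not in self.rules:
                self.rules.append(rule)


def build_prompt(rules: List[str], state: str, action: str) -> str:
    """Assemble the context handed to the frozen model at inference time."""
    lines = ["Rules:"] + [f"- {r}" for r in rules]
    lines += [f"State: {state}", f"Action: {action}", "Next state:"]
    return "\n".join(lines)


def toy_model(prompt: str) -> str:
    # Stand-in for a frozen LLM: its output depends only on the prompt text.
    if "locked" in prompt:
        return "the door stays closed"
    return "the door opens"


def toy_critic(predicted: str, observed: str) -> str:
    # Stand-in for an LLM critic that turns a mismatch into a reusable rule.
    return "the door is locked until a key is used"


semantic = SemanticMemory()
state, action = "you face a wooden door", "open door"

first = toy_model(build_prompt(semantic.rules, state, action))
semantic.revise(first, "the door stays closed", toy_critic)  # environment disagreed
second = toy_model(build_prompt(semantic.rules, state, action))

print(first)   # prediction before the rule was added
print(second)  # prediction after the context was revised
```

`toy_model` is never modified between the two calls. The second prediction differs only because the rule list, and therefore the prompt, was revised.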

Performance and Results

The researchers evaluated WorldEvolver on ALFWorld and ScienceWorld, measuring world model prediction accuracy on Word2World and downstream agent success rate on AgentBoard. WorldEvolver achieved the highest prediction accuracy across three model backbones and led the other world model baselines on downstream agent success rate. The authors take this as evidence that test-time memory revision improves both predictive fidelity and planning performance in LLM agents.
