Self-Evolving World Models for LLM Agent Planning

Large Language Model (LLM) agents often struggle with long-horizon tasks because they lack reliable foresight: the ability to accurately predict the consequences of an action before executing it. Some agents try to learn from past experience, but they run into distribution shift: when the environment changes, their internal models go stale. This paper introduces WorldEvolver, a framework that lets an agent's world model keep improving its predictions during deployment without costly, time-consuming updates to the model's underlying parameters.

How WorldEvolver Works

Instead of retraining the LLM, WorldEvolver uses a non-parametric memory system that updates the context provided to the model at inference time. It relies on three core mechanisms:

Episodic Memory: This module stores actual, realized transitions from the environment. By retrieving past experiences that are similar to the current situation, the agent can ground its predictions in concrete, historical data.
Semantic Memory: This module acts as an exploration tool. It identifies mismatches between what the model predicted and what actually happened in the environment. It then uses an LLM "critic" to turn these failures into persistent heuristic rules, which are stored as context to guide future predictions.
Selective Foresight: Because unreliable predictions can actively harm an agent's performance, this module acts as a filter. It computes a confidence score for each prediction and passes the foresight to the agent only when the model is sufficiently confident.
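
The three mechanisms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names WorldModelMemory, Transition, and selective_foresight are hypothetical, string similarity from difflib stands in for whatever retriever the authors use, the critic is any callable that returns a rule as text, and the 0.7 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Transition:
    state: str
    action: str
    next_state: str


@dataclass
class WorldModelMemory:
    # Hypothetical container; both stores live outside the model weights.
    episodes: list = field(default_factory=list)  # episodic: realized transitions
    rules: list = field(default_factory=list)     # semantic: critic-written heuristics

    def record(self, state, action, predicted, observed, critic):
        """Store what actually happened; on a mismatch, ask the critic for a rule."""
        self.episodes.append(Transition(state, action, observed))
        if predicted != observed:
            rule = critic(state, action, predicted, observed)
            if rule not in self.rules:
                self.rules.append(rule)

    def retrieve(self, state, action, k=3):
        """Return the k stored transitions most similar to the current situation."""
        query = f"{state} | {action}"

        def similarity(t):
            # Assumption: plain string similarity stands in for a real retriever.
            return SequenceMatcher(None, query, f"{t.state} | {t.action}").ratio()

        return sorted(self.episodes, key=similarity, reverse=True)[:k]

    def build_context(self, state, action, k=3):
        """Assemble the text that would be placed in the world model's prompt."""
        lines = [f"Rule: {r}" for r in self.rules]
        lines += [
            f"Seen: {t.state} + {t.action} -> {t.next_state}"
            for t in self.retrieve(state, action, k)
        ]
        return "\n".join(lines)


def selective_foresight(prediction, confidence, threshold=0.7):
    """Pass the prediction to the agent only if confidence clears the threshold."""
    return prediction if confidence >= threshold else None
```

In use, the agent would call build_context before each prediction, record after each environment step, and wrap every prediction in selective_foresight so that a low-confidence guess is withheld rather than shown.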

Keeping the Agent Frozen

A key design choice in WorldEvolver is that the parameters of both the downstream agent and the world model stay entirely frozen. By revising external memory rather than model weights, the framework avoids the high computational cost and the risks, such as catastrophic forgetting, that come with constant parameter updates. This lets the system adapt to new environments in real time, bridging the gap between static models and dynamic, evolving tasks.
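
To make the frozen-parameter idea concrete, here is a toy sketch in which the predictor never changes and only the memory passed to it grows. Everything in it is an assumption for illustration: frozen_world_model is a lookup stub standing in for a frozen LLM, and run_episode is a hypothetical deployment loop, not the paper's procedure.

```python
def frozen_world_model(context, state, action):
    """Stand-in for a frozen LLM: it can only answer from what is in its context."""
    key = f"{state} + {action} ->"
    for line in reversed(context):  # prefer the most recent matching memory
        if line.startswith(key):
            return line[len(key):].strip()
    return "unknown"


def run_episode(steps, memory):
    """Predict each step, then write the realized outcome back to memory.

    steps is a list of (state, action, observed_next_state) tuples.
    Returns how many predictions matched the observation.
    """
    correct = 0
    for state, action, observed in steps:
        predicted = frozen_world_model(memory, state, action)
        correct += int(predicted == observed)
        # The only thing that changes during deployment is the memory.
        memory.append(f"{state} + {action} -> {observed}")
    return correct
```

Running the same steps twice against one shared memory list scores higher on the second pass, even though frozen_world_model itself is untouched. That is the sense in which the world model evolves at test time.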

Performance and Results

The researchers evaluated WorldEvolver on benchmarks including ALFWorld and ScienceWorld. They report that the memory-centric approach improves both the accuracy of the world model's predictions and the agent's overall task success rate. Across multiple model backbones, WorldEvolver consistently outperformed existing baselines, which supports test-time memory revision as an effective strategy for improving predictive fidelity and planning performance in LLM agents.
