Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model
Sepsis management in the ICU is a high-stakes problem: clinicians must make sequential decisions about fluid and vasopressor treatment while the patient's physiology changes rapidly. Large Language Models (LLMs) are good at processing medical knowledge and guidelines, but they often struggle to predict how a specific patient will respond to a particular treatment over time. This paper introduces SepsisAgent, a system that pairs an LLM with a "Clinical World Model" that simulates patient responses, so the agent can test different treatment options before committing to a final recommendation.
The Propose–Simulate–Refine Workflow
The core innovation of SepsisAgent is that it moves beyond static decision-making. Instead of choosing an action from the patient's current state alone, the agent follows a three-step process: it proposes several candidate treatment actions, queries the Clinical World Model to see how the patient might respond to each, and then refines its final choice based on those simulated outcomes. This lets the agent compare counterfactual scenarios (essentially asking, "What would happen to this patient if I chose this dose versus that one?") before making a clinical decision.
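The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Action` fields, the `toy_world_model` dynamics, the `stability_score` function, and the fixed candidate list are all hypothetical stand-ins. In SepsisAgent the proposing and refining are done by the LLM and the simulation by a learned Clinical World Model; here a simple scoring rule replaces the LLM's refinement step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]  # hypothetical state, e.g. {"map": 55.0, "lactate": 4.0}


@dataclass(frozen=True)
class Action:
    """Hypothetical treatment action: a fluid bolus and a vasopressor rate."""
    fluid_ml: float
    vasopressor_mcg_kg_min: float


def toy_world_model(state: State, action: Action) -> State:
    """Made-up linear dynamics standing in for the learned world model."""
    new_map = (state["map"]
               + 0.01 * action.fluid_ml
               + 40.0 * action.vasopressor_mcg_kg_min)
    new_lactate = max(0.5, state["lactate"] - 0.001 * action.fluid_ml)
    return {"map": new_map, "lactate": new_lactate}


def stability_score(state: State) -> float:
    """Illustrative score: closer to a MAP of 65 and lower lactate is better."""
    return -(abs(state["map"] - 65.0) + 5.0 * state["lactate"])


def propose_candidates(state: State) -> List[Action]:
    """Propose step: a fixed candidate set (the LLM would generate these)."""
    return [
        Action(0.0, 0.0),
        Action(500.0, 0.0),
        Action(500.0, 0.1),
        Action(1000.0, 0.3),
    ]


def propose_simulate_refine(
    state: State,
    propose: Callable[[State], List[Action]],
    world_model: Callable[[State, Action], State],
    score: Callable[[State], float],
) -> Tuple[Action, State]:
    candidates = propose(state)                                   # propose
    simulated = [(a, world_model(state, a)) for a in candidates]  # simulate
    return max(simulated, key=lambda pair: score(pair[1]))        # refine


if __name__ == "__main__":
    patient = {"map": 55.0, "lactate": 4.0}
    action, predicted = propose_simulate_refine(
        patient, propose_candidates, toy_world_model, stability_score
    )
    print(action, predicted)
```

Each candidate is evaluated against the same starting state, which is what makes the comparison counterfactual: only the action differs between the simulated outcomes.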
A Three-Stage Training Curriculum
The researchers discovered that simply giving an LLM access to a simulator is not enough, as models can easily misinterpret or over-trust the simulated data. To solve this, they developed a three-stage training curriculum:
1. Supervised Fine-Tuning: The model is trained to predict patient outcomes, such as in-hospital mortality and the need for vasopressors, while learning to reason according to established medical guidelines.
2. Behavior Cloning: The agent learns how to interact with the Clinical World Model by imitating structured, multi-round reasoning traces that demonstrate how to use simulated feedback effectively.
3. Agentic Reinforcement Learning: The agent is further optimized through repeated interactions with the world model. By treating the simulator as a virtual environment, the agent learns to prioritize long-term patient stability over short-term, "greedy" physiological improvements.
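The contrast between long-term stability and greedy improvement in the reinforcement learning stage can be illustrated with discounted returns. The sketch below is an assumption-laden toy: the summary does not specify the paper's reward function or discount factor, and the per-step reward values are invented purely to show how a rollout with the best immediate reward can still have the worse cumulative return.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards, each discounted by gamma per step into the future."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-step rewards from two simulated rollouts (made-up numbers).
greedy_rollout = [1.0, -0.5, -2.0]  # large immediate gain, later deterioration
steady_rollout = [0.2, 0.4, 0.6]    # modest but sustained improvement

if __name__ == "__main__":
    print("greedy:", discounted_return(greedy_rollout))
    print("steady:", discounted_return(steady_rollout))
```

A policy that only maximizes the first-step reward would pick the greedy rollout, while one trained on the full return prefers the steady one.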
Performance and Safety Results
When tested on MIMIC-IV sepsis trajectories, SepsisAgent outperformed traditional reinforcement learning and standard LLM-based approaches. It achieved higher off-policy values and demonstrated a better safety profile, with closer adherence to clinical guidelines and a lower frequency of unsafe actions. Notably, the researchers found that the agent internalized the "regularities" of patient evolution during training. Even when the simulator was removed, the agent remained better at predicting patient mortality and vasopressor requirements than it was before training, suggesting that it had learned the underlying dynamics of sepsis rather than merely relying on the simulator.
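For readers unfamiliar with off-policy values: they estimate how well a new policy would perform using only trajectories logged under a different (clinician) policy. The summary does not say which estimator the paper uses, so the sketch below assumes weighted importance sampling, one common choice. The `Step` layout and all names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical per-step record:
# (probability of the logged action under the evaluated policy,
#  probability of the logged action under the behavior policy,
#  observed reward)
Step = Tuple[float, float, float]


def wis_value(trajectories: List[List[Step]], gamma: float = 0.99) -> float:
    """Weighted importance sampling estimate of the evaluated policy's value."""
    weights: List[float] = []
    returns: List[float] = []
    for trajectory in trajectories:
        rho, ret, discount = 1.0, 0.0, 1.0
        for p_eval, p_behavior, reward in trajectory:
            rho *= p_eval / p_behavior  # cumulative importance ratio
            ret += discount * reward    # discounted return of the trajectory
            discount *= gamma
        weights.append(rho)
        returns.append(ret)
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("evaluated policy has no support on the logged data")
    return sum(w * g for w, g in zip(weights, returns)) / total_weight
```

Normalizing by the sum of the weights, rather than by the number of trajectories, trades a small bias for lower variance, which matters when the evaluated policy differs substantially from clinician behavior.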