Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model

Key Takeaways

  • Sepsis management in the ICU requires sequential treatment decisions under rapidly evolving patient physiology.
  • Although large language models (LLMs) encode broad clinical knowledge and can reason over guidelines, they are not inherently grounded in action-conditioned patient dynamics.
  • We introduce SepsisAgent, a world model-augmented LLM agent for sepsis treatment recommendation.
  • SepsisAgent uses a learned Clinical World Model to simulate patient responses under candidate fluid–vasopressor interventions, and follows a propose–simulate–refine workflow before committing to a prescription.
Paper Abstract

Sepsis management in the ICU requires sequential treatment decisions under rapidly evolving patient physiology. Although large language models (LLMs) encode broad clinical knowledge and can reason over guidelines, they are not inherently grounded in action-conditioned patient dynamics. We introduce SepsisAgent, a world model-augmented LLM agent for sepsis treatment recommendation. SepsisAgent uses a learned Clinical World Model to simulate patient responses under candidate fluid–vasopressor interventions, and follows a propose–simulate–refine workflow before committing to a prescription. We first show that world-model access alone yields inconsistent LLM decision performance, motivating agent-specific training. We then train SepsisAgent through a three-stage curriculum: patient-dynamics supervised fine-tuning, propose–simulate–refine behavior cloning, and world-model-based agentic reinforcement learning. On MIMIC-IV sepsis trajectories, SepsisAgent outperforms all traditional RL and LLM-based baselines in off-policy value while achieving the best safety profile under guideline adherence and unsafe-action metrics. Further analysis shows that repeated interaction with the Clinical World Model enables the agent to learn regularities in patient evolution, which remain useful even when simulator access is removed.

Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model
Sepsis management in the ICU is a high-stakes challenge that requires clinicians to make sequential decisions about fluid and vasopressor treatments while patient physiology changes rapidly. Large Language Models (LLMs) handle medical knowledge and guidelines well, but they are not inherently grounded in how a specific patient will respond to a particular treatment over time. This paper introduces SepsisAgent, a system that pairs an LLM with a "Clinical World Model" to simulate patient responses, allowing the agent to test different treatment options before committing to a final recommendation.

The Propose–Simulate–Refine Workflow

The core innovation of SepsisAgent is that it does not choose an action from the patient's current state alone. Instead, the agent follows a three-step process: it proposes several candidate fluid and vasopressor actions, queries the Clinical World Model to see how the patient might respond to each, and then refines its final choice based on those simulated outcomes. This lets the agent compare counterfactual scenarios before making a clinical decision, in effect asking, "What would happen to this patient if I chose this dose versus that one?"
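The loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the paper's implementation: the dose bins, `PatientState`, `toy_world_model`, and `toy_score` are all hypothetical, the toy dynamics are invented and have no clinical validity, and a simple greedy search stands in for the LLM's proposing and refining.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical action encoding: (fluid_bin, vasopressor_bin), each in 0..MAX_BIN.
Action = Tuple[int, int]
MAX_BIN = 4


@dataclass(frozen=True)
class PatientState:
    map_mmhg: float  # mean arterial pressure
    lactate: float   # serum lactate, mmol/L


def toy_world_model(state: PatientState, action: Action) -> PatientState:
    """Invented stand-in for the learned Clinical World Model."""
    fluid, vaso = action
    return PatientState(
        map_mmhg=state.map_mmhg + 2.0 * fluid + 4.0 * vaso,
        lactate=max(0.0, state.lactate - 0.2 * fluid - 0.1 * vaso),
    )


def toy_score(state: PatientState, action: Action) -> float:
    """Invented preference: MAP near 65, low lactate, small doses."""
    fluid, vaso = action
    return -abs(state.map_mmhg - 65.0) - 5.0 * state.lactate - 0.5 * (fluid + vaso)


def propose(center: Action) -> List[Action]:
    """Propose the current action and its neighbouring dose bins."""
    fluid, vaso = center
    return sorted(
        (f, v)
        for f in range(max(0, fluid - 1), min(MAX_BIN, fluid + 1) + 1)
        for v in range(max(0, vaso - 1), min(MAX_BIN, vaso + 1) + 1)
    )


def propose_simulate_refine(
    state: PatientState,
    world_model: Callable[[PatientState, Action], PatientState],
    score: Callable[[PatientState, Action], float],
    start: Action = (0, 0),
    rounds: int = 3,
) -> Action:
    best = start
    best_score = score(world_model(state, best), best)
    for _ in range(rounds):
        improved = False
        for action in propose(best):                 # propose
            predicted = world_model(state, action)   # simulate
            candidate_score = score(predicted, action)
            if candidate_score > best_score:         # refine
                best, best_score, improved = action, candidate_score, True
        if not improved:
            break  # no candidate beat the current choice; commit to it
    return best


patient = PatientState(map_mmhg=55.0, lactate=3.0)
chosen = propose_simulate_refine(patient, toy_world_model, toy_score)
print(chosen, toy_world_model(patient, chosen))
```

The point of the sketch is the control flow: every candidate is scored on its simulated outcome rather than on the current state, and the agent commits only after a round in which no proposal improves on its current choice.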

A Three-Stage Training Curriculum

The researchers found that giving an LLM access to a simulator is not enough on its own: world-model access alone yields inconsistent decision performance, as models can misinterpret or over-trust the simulated data. To address this, they developed a three-stage training curriculum:

  1. Supervised Fine-Tuning: The model is trained to predict patient outcomes, such as in-hospital mortality and the need for vasopressors, while learning to reason according to established medical guidelines.
  2. Behavior Cloning: The agent learns how to interact with the Clinical World Model by imitating structured, multi-round reasoning traces that demonstrate how to use simulated feedback effectively.
  3. Agentic Reinforcement Learning: The agent is further optimized through repeated interactions with the world model. By treating the simulator as a virtual environment, the agent learns to prioritize long-term patient stability over short-term, "greedy" physiological improvements.
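To make the contrast in the third stage concrete, the sketch below compares a "greedy" policy with a steadier one by rolling both out in a simulator and summing their rewards. Everything here is an assumption for illustration: `toy_step`, its "burden" state, and the reward numbers are invented, and the source does not describe the paper's reward function or RL algorithm.

```python
from typing import Callable, List, Tuple

State = int  # hypothetical accumulated "treatment burden"


def toy_step(burden: State, action: str) -> Tuple[State, int]:
    """Invented simulator step: aggressive dosing pays off now but adds burden."""
    gain, added = (3, 2) if action == "aggressive" else (1, 0)
    reward = gain - burden  # earlier aggressive doses drag down later rewards
    return burden + added, reward


def rollout(policy: Callable[[State], str], state: State, horizon: int) -> List[int]:
    """Run a policy in the simulator and collect the per-step rewards."""
    rewards = []
    for _ in range(horizon):
        state, reward = toy_step(state, policy(state))
        rewards.append(reward)
    return rewards


def discounted_return(rewards: List[int], gamma: float = 1.0) -> float:
    """Sum of gamma**t * r_t, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


greedy_rewards = rollout(lambda s: "aggressive", 0, 4)
steady_rewards = rollout(lambda s: "moderate", 0, 4)

print("first-step reward:", greedy_rewards[0], "vs", steady_rewards[0])
print("4-step return:", discounted_return(greedy_rewards),
      "vs", discounted_return(steady_rewards))
```

In this toy setting the aggressive policy wins on the first step but loses over the full rollout, which is the kind of trade-off that only becomes visible when the objective is computed over simulated trajectories rather than single steps.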

Performance and Safety Results

When tested on MIMIC-IV sepsis trajectories, SepsisAgent outperformed all traditional reinforcement learning and LLM-based baselines in off-policy value. It also had the best safety profile, measured by guideline adherence and unsafe-action metrics. Notably, the researchers found that the agent internalized regularities of patient evolution during training: even with the simulator removed, it predicted patient mortality and vasopressor requirements better than it did before training. This suggests that what the agent learns about sepsis dynamics carries over to settings without simulator access.
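For readers unfamiliar with off-policy value, the sketch below shows one common way to estimate it from logged trajectories: weighted importance sampling. The source does not say which estimator the paper uses, so this is only an assumed example; the `Step` layout and the probabilities in the usage lines are made up.

```python
from typing import List, Sequence, Tuple

# Hypothetical logged step: (prob. under evaluated policy,
#                            prob. under behaviour policy, reward).
Step = Tuple[float, float, float]


def wis_value(trajectories: Sequence[Sequence[Step]], gamma: float = 1.0) -> float:
    """Weighted importance sampling estimate of the evaluated policy's value."""
    weights: List[float] = []
    returns: List[float] = []
    for trajectory in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for p_eval, p_behaviour, reward in trajectory:
            weight *= p_eval / p_behaviour  # cumulative importance ratio
            ret += discount * reward
            discount *= gamma
        weights.append(weight)
        returns.append(ret)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("evaluated policy never takes the logged actions")
    # Normalising by the weight sum (not the trajectory count) is what
    # distinguishes weighted from ordinary importance sampling.
    return sum(w * g for w, g in zip(weights, returns)) / total


logged = [
    [(0.8, 0.4, 0.0), (0.5, 0.5, 1.0)],   # trajectory the evaluated policy favours
    [(0.2, 0.4, 0.0), (0.5, 0.5, -1.0)],  # trajectory it would rarely follow
]
print(wis_value(logged))
```

Trajectories that the evaluated policy would have been likely to follow get more weight, so the estimate reflects how that policy would have fared on the logged patients without ever deploying it.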
