An Infectious Disease Spread Simulation Based on Large Language Model Decision Making
This paper introduces a new way to simulate how people make health decisions during an infectious disease outbreak. By replacing traditional, rigid rule-based systems with Large Language Models (LLMs), the researchers created a more realistic agent-based simulation. This framework allows individual digital "agents" to make complex, context-aware choices—such as whether to report symptoms—based on their specific demographic backgrounds and the social or informational environment they inhabit.
How the Simulation Works
The researchers built their framework on a "Patterns of Life" simulator that models daily human routines, such as going to work or visiting restaurants. They added a susceptible-exposed-infectious-recovered (SEIR) disease model to track how illnesses spread through physical contact. To make the agents act like real people, the team used census data to assign each agent a specific demographic profile, including age, income, education, and race.
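To make the SEIR idea concrete, here is a minimal sketch of an agent-level SEIR update in which infection can only pass between agents sharing a location. It is an illustration, not the paper's implementation: the names `Agent`, `State`, and `step`, and the parameters `p_transmit`, `incubation_days`, and `infectious_days`, are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    S = "susceptible"
    E = "exposed"
    I = "infectious"
    R = "recovered"


@dataclass
class Agent:
    agent_id: int
    location: str              # where the agent is during this time step
    state: State = State.S
    days_in_state: int = 0


def step(agents, rng, p_transmit=0.3, incubation_days=3, infectious_days=5):
    """Advance every agent by one simulated day (hypothetical parameters)."""
    # Locations hosting at least one infectious agent at the start of the day.
    hot_spots = {a.location for a in agents if a.state is State.I}
    for a in agents:
        a.days_in_state += 1
        if a.state is State.S and a.location in hot_spots:
            # Contact-based exposure: only co-located agents can be infected.
            if rng.random() < p_transmit:
                a.state, a.days_in_state = State.E, 0
        elif a.state is State.E and a.days_in_state >= incubation_days:
            a.state, a.days_in_state = State.I, 0
        elif a.state is State.I and a.days_in_state >= infectious_days:
            a.state, a.days_in_state = State.R, 0


# Usage: one infectious agent at a cafe, one susceptible agent there, one at home.
agents = [
    Agent(0, "cafe", State.I),
    Agent(1, "cafe"),
    Agent(2, "home"),
]
rng = random.Random(0)
step(agents, rng, p_transmit=1.0)
```

With `p_transmit=1.0` the co-located agent is exposed after a single step, while the agent at home is unaffected. In a full simulator the `location` field would change over the day according to each agent's routine.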
Instead of using simple math formulas to decide if an agent reports an illness, the team prompted LLMs with these demographic profiles and situational contexts. To keep the simulation fast and consistent, they pre-generated these decisions and stored them in a "decision bank." When an agent in the simulation becomes symptomatic, it retrieves a decision from this bank based on its unique profile.
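The decision bank described above can be sketched as a lookup table keyed by demographic profile. The sketch below is only one plausible arrangement: `Profile`, `build_decision_bank`, `retrieve_decision`, and `samples_per_profile` are hypothetical names, and `fake_llm` is a stand-in for a real model call, which the source does not specify. Drawing at random among several stored decisions per profile is also an assumption.

```python
import random
from typing import Callable, Dict, Iterable, List, NamedTuple


class Profile(NamedTuple):
    age_group: str
    income: str
    education: str
    race: str


def build_decision_bank(
    profiles: Iterable[Profile],
    ask_llm: Callable[[Profile], bool],
    samples_per_profile: int = 5,
) -> Dict[Profile, List[bool]]:
    """Query the model ahead of time and cache its decisions per profile."""
    bank: Dict[Profile, List[bool]] = {}
    for profile in set(profiles):  # each distinct profile is queried once
        bank[profile] = [ask_llm(profile) for _ in range(samples_per_profile)]
    return bank


def retrieve_decision(bank, profile: Profile, rng: random.Random) -> bool:
    """Called when an agent becomes symptomatic; no model call is needed."""
    return rng.choice(bank[profile])


def fake_llm(profile: Profile) -> bool:
    # Stand-in for a real LLM call; always answers "report" in this sketch.
    return True


profiles = [
    Profile("18-34", "low", "high school", "A"),
    Profile("35-64", "high", "college", "B"),
    Profile("18-34", "low", "high school", "A"),  # duplicate profile
]
bank = build_decision_bank(profiles, fake_llm, samples_per_profile=5)
decision = retrieve_decision(bank, profiles[0], random.Random(0))
```

Because all model queries happen before the simulation runs, the per-step cost of a decision is a dictionary lookup, and repeated runs can reuse the same bank.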
Testing Different Scenarios
The study explored three specific scenarios to see how different factors influence public health outcomes:
Independent Reasoning: Agents make decisions based solely on their own demographic background.
Household Influence: Agents are more likely to report symptoms if they know a family member has already reported an illness.
Message Framing: Agents receive different types of public health messages—such as those focused on personal risk, altruism, or statistical data—to see which approach is most effective at encouraging reporting.
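One way to picture the three scenarios is as variations on a single prompt template: the demographic profile is always present, and household context or a public health message is appended when the scenario calls for it. The sketch below assumes this layout. The function `build_prompt`, the `FRAMINGS` table, and every sentence of prompt wording are placeholders invented for illustration, not the prompts used in the study.

```python
from typing import Optional

# Placeholder framing texts; the study's actual messages are not reproduced here.
FRAMINGS = {
    "personal_risk": "Reporting early helps you get care sooner.",
    "altruism": "Reporting helps protect the people around you.",
    "statistics": "Health officials use symptom reports to track outbreaks.",
}


def build_prompt(
    profile: dict,
    household_member_reported: bool = False,
    framing: Optional[str] = None,
) -> str:
    """Assemble a prompt for one agent under one scenario (hypothetical wording)."""
    lines = [
        "You are a person with the following background:",
        *(f"- {key}: {value}" for key, value in profile.items()),
        "You have developed symptoms of an infectious disease.",
    ]
    if household_member_reported:
        # Household Influence scenario
        lines.append("A member of your household has already reported their illness.")
    if framing is not None:
        # Message Framing scenario
        lines.append(f"Public health message: {FRAMINGS[framing]}")
    lines.append("Do you report your symptoms? Answer YES or NO.")
    return "\n".join(lines)


profile = {"age": "18-34", "income": "low", "education": "high school"}
independent = build_prompt(profile)
household = build_prompt(profile, household_member_reported=True)
framed = build_prompt(profile, framing="altruism")
```

Keeping the profile block identical across scenarios means any difference in the model's answers can be attributed to the added context line rather than to changes in how the agent is described.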
Key Findings
The simulation results show that an agent's socioeconomic status is the most significant factor in whether they report an illness. Specifically, income and education levels were the primary drivers of variation in reporting rates. While factors like geography, the specific LLM used, and the way public health messages were framed also had an impact, their effects were smaller and more consistent. By capturing these social and geographic differences, the framework provides a way to study how systemic inequities might influence the accuracy of disease surveillance data.
Considerations for Future Research
The researchers note that while LLMs are not designed to perfectly replicate human health behavior, they are useful for exploring how different variables influence population-level dynamics. Because the agents are built using real-world census data, this framework helps researchers understand how demographic disparities can lead to "reporting bias," where certain groups are underrepresented in official health data. This tool is intended to support public health experts in designing better interventions and analyzing how different communication strategies might affect diverse communities.
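Reporting bias of this kind can be quantified by comparing each group's share of true cases with its share of reported cases. The following sketch uses toy records invented for illustration; the function names `reporting_rates` and `observed_share`, and the group labels, are assumptions and do not reflect the paper's data or results.

```python
from collections import defaultdict


def reporting_rates(records):
    """Fraction of symptomatic agents in each group who reported.

    `records` is an iterable of (group, did_report) pairs, one per
    symptomatic agent.
    """
    totals = defaultdict(int)
    reported = defaultdict(int)
    for group, did_report in records:
        totals[group] += 1
        reported[group] += int(did_report)
    return {group: reported[group] / totals[group] for group in totals}


def observed_share(records, group):
    """Share of all *reported* cases that come from `group`."""
    reported_total = sum(1 for _, did_report in records if did_report)
    if reported_total == 0:
        return 0.0
    in_group = sum(1 for g, did_report in records if did_report and g == group)
    return in_group / reported_total


# Toy data: both groups have four true cases, but they report at different rates.
records = [
    ("low_income", True), ("low_income", False),
    ("low_income", False), ("low_income", False),
    ("high_income", True), ("high_income", True),
    ("high_income", True), ("high_income", False),
]
rates = reporting_rates(records)
low_share = observed_share(records, "low_income")
```

In this toy example the low-income group accounts for half of the true cases but only a quarter of the reported ones, which is the kind of gap between actual and observed burden that the framework is meant to expose.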