Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search
Public health officials rely on accurate disease forecasting to manage outbreaks, but creating these models is traditionally a slow, manual process performed by expert teams. This bottleneck makes it difficult to scale forecasting to new regions or handle emerging pathogens quickly. This paper introduces an autonomous system that uses Large Language Models (LLMs) to automatically generate, test, and refine the software code needed for disease forecasting, effectively removing the need for constant manual intervention.
How the system works
The framework uses an LLM-guided tree search to explore different modeling possibilities. Instead of relying on human experts to write every line of code, the system iteratively builds and evaluates candidate forecasting models. To keep the generated code scientifically sound, the researchers implemented an "automated judge-in-the-loop." This mechanism acts as a quality control layer, checking that the machine-generated models remain structurally faithful to established epidemiological theories.
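The loop described above can be sketched as a best-first search over candidates. This is a minimal illustration, not the paper's implementation: the names tree_search, propose, judge, and evaluate are hypothetical, the "model" is reduced to a single number, and the LLM and the judge are replaced by toy stand-in functions so the example runs on its own.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(order=True)
class Node:
    score: float                                   # lower is better
    model: float = field(compare=False)            # toy stand-in for generated code
    depth: int = field(compare=False, default=0)


def tree_search(
    root_model: float,
    propose: Callable[[float], Iterable[float]],   # stand-in for the LLM proposing revisions
    judge: Callable[[float], bool],                # stand-in for the automated judge
    evaluate: Callable[[float], float],            # stand-in for forecast scoring
    budget: int = 50,
    max_depth: int = 5,
) -> Node:
    """Best-first search: always expand the most promising candidate so far."""
    root = Node(evaluate(root_model), root_model, 0)
    frontier = [root]
    best = root
    expansions = 0
    while frontier and expansions < budget:
        node = heapq.heappop(frontier)             # candidate with the lowest score
        expansions += 1
        if node.depth >= max_depth:
            continue
        for candidate in propose(node.model):
            if not judge(candidate):               # rejected candidates are never scored
                continue
            child = Node(evaluate(candidate), candidate, node.depth + 1)
            if child.score < best.score:
                best = child
            heapq.heappush(frontier, child)
    return best


# Toy stand-ins (assumptions for illustration only).
def propose(model: float) -> list:
    return [round(model - 0.1, 3), round(model + 0.1, 3)]


def judge(model: float) -> bool:
    return 0.0 <= model <= 1.0                     # e.g. reject implausible parameters


def evaluate(model: float) -> float:
    return abs(model - 0.7)                        # pretend 0.7 is the ideal value


best = tree_search(0.2, propose, judge, evaluate)
print(best.model, best.score, best.depth)
```

In a real system each node would hold generated forecasting code, evaluate would backtest that code against held-out surveillance data, and judge would be the quality control step described above. The search structure itself would stay much the same.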
Performance in real-world testing
The researchers tested the system during the 2025-2026 US respiratory season, tasking it with forecasting influenza, COVID-19, and respiratory syncytial virus (RSV). The system successfully created a diverse set of models for each pathogen. When these machine-generated models were combined into an ensemble, they consistently matched or outperformed the gold-standard forecasting models currently curated by human teams at the Centers for Disease Control and Prevention (CDC). Notably, the system also performed well in "cold start" scenarios, where data is scarce, such as the early stages of an RSV outbreak.
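The summary does not say how the individual models were combined. One common approach for probabilistic forecasts is a quantile-wise median, sketched below under that assumption. The function name median_ensemble and the numbers are illustrative only and are not results from the paper.

```python
from statistics import median


def median_ensemble(forecasts: list) -> dict:
    """Combine quantile forecasts by taking the median at each quantile level.

    Each forecast maps a quantile level (e.g. 0.5) to a predicted value.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    levels = sorted(forecasts[0])
    for forecast in forecasts:
        if sorted(forecast) != levels:
            raise ValueError("all forecasts must share the same quantile levels")
    return {q: median(f[q] for f in forecasts) for q in levels}


# Made-up weekly hospital admission forecasts from three hypothetical models.
model_a = {0.25: 100, 0.5: 150, 0.75: 220}
model_b = {0.25: 90, 0.5: 160, 0.75: 250}
model_c = {0.25: 120, 0.5: 140, 0.75: 230}

ensemble = median_ensemble([model_a, model_b, model_c])
print(ensemble)  # {0.25: 100, 0.5: 150, 0.75: 230}
```

A median is less sensitive to a single badly behaved model than a mean, which matters when the component models are generated automatically.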
Ensuring model reliability
A significant challenge in automated model generation is "reward hacking," where a system optimizes for a metric in a way that produces technically high scores but poor real-world results. The authors found that by optimizing for log-scale distance metrics, they could prevent this behavior. By translating epidemiological theory into transparent, executable code, the system provides a scalable way to deploy expert-level forecasting, potentially transforming how public health agencies respond to infectious diseases.
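The exact metric is not given in this summary. As one possible instance of a log-scale distance, the sketch below computes the mean absolute difference between log-transformed predictions and observations. The function name log_distance and the use of log(1 + x) to handle zero counts are assumptions.

```python
import math


def log_distance(predicted: list, observed: list) -> float:
    """Mean absolute error on the log(1 + x) scale (one possible log-scale distance)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(
        abs(math.log1p(p) - math.log1p(o)) for p, o in zip(predicted, observed)
    ) / len(predicted)


# Illustrative numbers only: each prediction is roughly double the observation.
observed = [9, 99, 999]
predicted = [19, 199, 1999]

distance = log_distance(predicted, observed)
print(distance)
```

On this scale, a forecast that is off by a factor of two costs about the same whether the true count is 10 or 1,000, so a model cannot earn a good score by fitting high-count weeks well while ignoring low-count ones.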