

Paper Abstract

Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we present an autonomous system using Large Language Model (LLM)-guided tree search to iteratively generate, evaluate, and optimize executable forecasting software. In a fully prospective, real-time evaluation during the 2025-2026 US respiratory season, the system autonomously discovered methodologically diverse models for influenza, COVID-19, and respiratory syncytial virus (RSV). Aggregating these machine-generated models yielded an ensemble that consistently matched or outperformed the gold-standard, human-curated Centers for Disease Control and Prevention (CDC) hub ensembles out-of-sample. The system successfully navigated data-scarce "cold start" scenarios for RSV. Moreover, controlled retrospective ablations revealed that optimizing log-scale distance metrics prevents reward hacking, while an automated judge-in-the-loop ensures structural fidelity to complex scientific theories. By autonomously translating epidemiological theory into accurate, transparent code, this framework overcomes the modeling labor bottleneck, enabling rapid deployment of expert-level disease forecasting at unprecedented scales.

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

Public health officials rely on accurate disease forecasting to manage outbreaks, but building these models is traditionally a slow, manual process carried out by expert teams. This bottleneck makes it difficult to extend forecasting to finer geographic resolutions or to respond quickly to emerging pathogens. This paper introduces an autonomous system that uses Large Language Models (LLMs) to generate, test, and refine executable forecasting code, reducing the need for manual model development.

How the system works

The framework uses an LLM-guided tree search to explore modeling options. Rather than having human experts write each model by hand, the system iteratively generates candidate forecasting programs, evaluates them, and refines the most promising ones. To keep the generated code scientifically sound, the researchers added an "automated judge-in-the-loop": a quality-control layer that keeps each machine-generated model structurally faithful to the epidemiological theory it is meant to implement.
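The summary does not describe the search algorithm, prompts, or scoring in detail. As a rough illustration only, the Python sketch below runs a best-first tree search in which each node is a candidate forecaster scored by a rolling backtest. Everything in it is an assumption: the LLM is replaced by a hypothetical `propose_children` function that randomly perturbs two parameters, the judge is a hypothetical rule check (`judge`), and the model family, data, and loss are invented for the example. In the real system, nodes would be LLM-written programs and the judge would assess fidelity to the intended epidemiological structure.

```python
import heapq
import math
import random


def forecast(params, history):
    # Hypothetical candidate "program": exponential smoothing of the level,
    # scaled by a constant growth factor for the next week.
    level = history[0]
    for y in history[1:]:
        level = params["alpha"] * y + (1 - params["alpha"]) * level
    return level * params["growth"]


def backtest_loss(params, series, start=4):
    # Rolling one-step-ahead backtest scored on the log1p scale (lower is better).
    errors = []
    for t in range(start, len(series)):
        pred = max(forecast(params, series[:t]), 0.0)
        errors.append(abs(math.log1p(pred) - math.log1p(series[t])))
    return sum(errors) / len(errors)


def judge(params):
    # Hypothetical judge: accept only candidates that respect structural rules.
    return 0.0 < params["alpha"] <= 1.0 and 0.5 <= params["growth"] <= 2.0


def propose_children(params, rng, k=3):
    # Hypothetical stand-in for the LLM: k randomly edited variants of a parent.
    return [
        {"alpha": params["alpha"] + rng.uniform(-0.2, 0.2),
         "growth": params["growth"] + rng.uniform(-0.1, 0.1)}
        for _ in range(k)
    ]


def tree_search(series, root, budget=30, seed=0):
    # Best-first tree search: repeatedly expand the lowest-loss open node.
    rng = random.Random(seed)
    root_loss = backtest_loss(root, series)
    frontier = [(root_loss, 0, root)]  # (loss, unique tie-breaker id, candidate)
    best, best_loss, next_id = root, root_loss, 1
    for _ in range(budget):
        if not frontier:
            break
        _, _, parent = heapq.heappop(frontier)
        for child in propose_children(parent, rng):
            if not judge(child):  # judge-in-the-loop: discard invalid candidates
                continue
            loss = backtest_loss(child, series)
            heapq.heappush(frontier, (loss, next_id, child))
            next_id += 1
            if loss < best_loss:
                best, best_loss = child, loss
    return best, best_loss


weekly_cases = [10, 12, 15, 19, 24, 30, 37, 45, 52, 58, 61, 60]  # synthetic data
best_params, best_loss = tree_search(weekly_cases, {"alpha": 0.5, "growth": 1.0})
print(best_params, round(best_loss, 4))
```

Because rejected candidates are never pushed onto the frontier, the judge constrains which branches can ever be expanded, not just which final model is reported.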

Performance in real-world testing

The researchers evaluated the system prospectively, in real time, during the 2025-2026 US respiratory season, tasking it with forecasting influenza, COVID-19, and respiratory syncytial virus (RSV). For each pathogen, the system produced a set of methodologically diverse models. When these machine-generated models were combined, the resulting ensemble consistently matched or outperformed the gold-standard, human-curated Centers for Disease Control and Prevention (CDC) hub ensembles on out-of-sample forecasts. The system also handled data-scarce "cold start" scenarios for RSV, where little historical data was available for building models.
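The summary does not say how the machine-generated models were aggregated or how they were scored against the CDC ensembles. One widely used approach, associated with the CDC-coordinated forecast hubs, is to combine quantile forecasts by taking the median across models at each quantile level and to evaluate them with the weighted interval score (WIS). The sketch below shows both under that assumption; the quantile levels, model names, and numbers are made up for illustration.

```python
from statistics import median

# Symmetric quantile levels plus the median (illustrative choice of levels).
LEVELS = [0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975]


def quantile_median_ensemble(forecasts):
    # At each quantile level, take the median of the models' predicted values.
    return {tau: median([f[tau] for f in forecasts]) for tau in LEVELS}


def wis(forecast, y):
    # Weighted interval score (lower is better). With symmetric levels plus the
    # median, WIS equals the mean over levels of 2 * (1{y < q} - tau) * (q - y).
    scores = []
    for tau in LEVELS:
        q = forecast[tau]
        indicator = 1.0 if y < q else 0.0
        scores.append(2 * (indicator - tau) * (q - y))
    return sum(scores) / len(scores)


# Hypothetical quantile forecasts from three machine-generated models.
model_a = {0.025: 80, 0.1: 90, 0.25: 100, 0.5: 110, 0.75: 120, 0.9: 130, 0.975: 145}
model_b = {tau: q + 20 for tau, q in model_a.items()}
model_c = {tau: q - 10 for tau, q in model_a.items()}
ensemble = quantile_median_ensemble([model_a, model_b, model_c])

observed = 115
for name, f in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(wis(f, observed), 2))
```

Because `model_b` and `model_c` are shifted copies of `model_a`, the quantile-wise median here reproduces `model_a` exactly; with real models, a median ensemble is less sensitive to a single outlying forecast than a mean would be.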

Ensuring model reliability

A significant challenge in automated model generation is "reward hacking," in which a system optimizes a metric in a way that yields high scores but poor real-world forecasts. In controlled retrospective ablations, the authors found that optimizing log-scale distance metrics prevented this behavior. By translating epidemiological theory into transparent, executable code, the system offers a scalable way to deploy expert-level forecasting, which could change how public health agencies respond to infectious diseases.
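The summary does not specify which log-scale metric was used or what form the reward hacking took. One plausible mechanism, offered only as an illustration: on the natural count scale, errors in high-incidence jurisdictions dominate, so a search that minimizes raw error can favor a model that fits large series and neglects small ones. The sketch compares a natural-scale mean absolute error with a log1p-scale version on two hypothetical candidates; the data and candidate names are invented.

```python
import math


def mae(preds, obs):
    # Mean absolute error on the natural (count) scale.
    return sum(abs(p - o) for p, o in zip(preds, obs)) / len(obs)


def log_mae(preds, obs):
    # Mean absolute error after a log1p transform: one kind of log-scale distance.
    return sum(abs(math.log1p(p) - math.log1p(o)) for p, o in zip(preds, obs)) / len(obs)


# Hypothetical weekly admissions in one large and one small jurisdiction.
observed = [10000, 20]

# Candidate that is about 5% too high everywhere.
careful = [10500, 21]
# Candidate that is exact on the large jurisdiction but predicts zero for the small one.
shortcut = [10000, 0]

for name, preds in [("careful", careful), ("shortcut", shortcut)]:
    print(name, round(mae(preds, observed), 1), round(log_mae(preds, observed), 3))
```

Natural-scale MAE ranks the shortcut first (10 versus 250.5), even though it completely misses the small jurisdiction, while the log-scale distance measures relative error and ranks the careful candidate first.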
