SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting
Epidemic forecasting is essential for public health, yet it remains a difficult task due to data that is often sparse, noisy, and constantly changing. While researchers have increasingly turned to spatiotemporal models—which analyze how diseases spread across interacting regions—there has been no standardized way to evaluate these tools. This paper introduces SpatialEpiBench, a comprehensive benchmark designed to test how well different forecasting models perform in realistic public-health settings, specifically focusing on their ability to predict outbreaks and handle the complexities of real-world epidemiological data.
A Standardized Approach to Evaluation
Current methods for evaluating epidemic forecasts are often inconsistent, frequently relying on simple chronological splits that do not reflect how public health officials actually use data in real time. SpatialEpiBench addresses this by providing a unified framework for 11 different epidemic datasets, including COVID-19 and influenza. The benchmark implements a "rolling-origin" evaluation protocol, which mimics the way models are retrained and re-scored as new information becomes available. By using standardized metrics and focusing on outbreak periods, the benchmark allows researchers to determine whether specific spatial information or epidemic-informed adjustments actually improve forecasting accuracy.
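The rolling-origin idea can be sketched in a few lines. This is an illustrative reconstruction, not the benchmark's actual API: the function name and the `initial_train`, `horizon`, and `step` parameters are hypothetical, chosen to mimic a weekly re-forecasting cycle.

```python
def make_rolling_origin_splits(n_timesteps, initial_train=60, horizon=7, step=7):
    """Yield (train_end, test_range) pairs: at each forecast origin, a model
    is fit on data up to train_end and scored on the next `horizon` steps."""
    origin = initial_train
    while origin + horizon <= n_timesteps:
        yield origin, range(origin, origin + horizon)
        origin += step  # advance the origin, mimicking periodic model updates

# For a 100-day series, this yields forecast origins at days 60, 67, 74, ...
splits = list(make_rolling_origin_splits(100))
```

The key property, unlike a single chronological split, is that every test window uses strictly later data than its training window, and the model is re-evaluated at many origins rather than one.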
Testing Epidemic Priors
A key feature of this research is the evaluation of "epidemic priors"—modular, model-agnostic patches that can be added to existing deep learning architectures. These patches are designed to help general-purpose models better understand infectious disease dynamics without requiring a complete redesign of the model. The study tests four specific types of priors: calendar-based temporal adjustments, filtered loss functions to handle noisy data, auxiliary regularization based on SIR (Susceptible-Infectious-Removed) models, and neural network-based transfer learning. These tests aim to see if adding domain-specific structure can help models better capture the mechanics of an outbreak.
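Of the four priors, the SIR-based auxiliary regularization is the easiest to make concrete. The sketch below is a hypothetical illustration of the general idea, penalizing a model's incidence forecast for deviating from discrete SIR dynamics; the function name, the Euler discretization, and the `beta`/`gamma` values are assumptions, not the paper's actual formulation.

```python
def sir_residual_penalty(pred, population, beta=0.3, gamma=0.1, i0=1.0):
    """Mean squared residual between a predicted new-case series `pred` and
    the incidence implied by a discrete SIR model rolled forward from an
    initial state with `i0` infectious individuals."""
    s, i = population - i0, i0
    residual = 0.0
    for new_cases in pred:
        sir_incidence = beta * s * i / population  # new infections under SIR
        residual += (new_cases - sir_incidence) ** 2
        s -= sir_incidence                          # discrete (Euler) update
        i += sir_incidence - gamma * i
    return residual / len(pred)
```

In training, a term like this would be added to the usual forecasting loss with a small weight, nudging a general-purpose network toward epidemiologically plausible trajectories without changing its architecture.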
Surprising Performance Gaps
The researchers found that, despite the complexity of modern spatiotemporal models, most of them struggle to outperform a simple "last-value" baseline—a method that simply predicts that the future will look exactly like the most recent observation. This gap persists at horizons from one day to one month ahead, even during active outbreaks, where a more sophisticated model should theoretically provide a significant advantage. The study identifies three primary reasons: models often fail to anticipate the start of an outbreak, they struggle with sparse or noisy data, and the geographic adjacency maps commonly used to represent spatial relationships may not carry enough meaningful information about how diseases actually spread between populations.
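The last-value baseline is trivially simple, which is what makes the finding striking. A minimal sketch (function and metric names here are illustrative, not the benchmark's API):

```python
def last_value_forecast(history, horizon):
    """Persistence baseline: repeat the final observed value for every
    one of the `horizon` future steps."""
    return [history[-1]] * horizon

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

daily_cases = [10, 12, 15, 20, 18]
forecast = last_value_forecast(daily_cases, 3)  # → [18, 18, 18]
```

A learned model earns its keep only by beating this forecast under the same rolling-origin protocol, and the paper reports that most spatiotemporal models fail to do so consistently.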
Implications for Future Development
The findings suggest that simply applying general spatiotemporal models to epidemic data is not enough to achieve reliable results. The limited utility of standard geographic adjacency indicates that future research may need to look beyond simple maps and incorporate more nuanced data, such as mobility or contact networks, to truly capture the spatial dynamics of an epidemic. By releasing the benchmark data, code, and protocols, the authors aim to provide a foundation for developing models that are not just theoretically interesting, but operationally useful for public health decision-making.