
Key Takeaways

  • Accurate epidemic forecasting is crucial for public health response, resource allocation, and outbreak intervention, but remains difficult with sparse, noisy, and highly non-stationary data.
  • Because epidemics unfold across interacting regions, spatiotemporal methods are natural candidates for improving forecasts.
  • Despite growing interest in spatial information, no standardized benchmark exists, and current evaluations often use simple chronological train-test splits that do not reflect real-time forecasting practice.
  • We address this gap with SpatialEpiBench, a challenging benchmark for spatiotemporal epidemic forecasting in realistic public-health settings.
Paper Abstract

Accurate epidemic forecasting is crucial for public health response, resource allocation, and outbreak intervention, but remains difficult with sparse, noisy, and highly non-stationary data. Because epidemics unfold across interacting regions, spatiotemporal methods are natural candidates for improving forecasts. Despite growing interest in spatial information, no standardized benchmark exists, and current evaluations often use simple chronological train-test splits that do not reflect real-time forecasting practice. We address this gap with SpatialEpiBench, a challenging benchmark for spatiotemporal epidemic forecasting in realistic public-health settings. SpatialEpiBench includes 11 epidemic datasets with standardized rolling evaluations and outbreak-specific metrics. We evaluate adjacency-informed forecasting models with widely used epidemic priors that adapt general models to epidemiology, but find that most methods underperform a simple last-value baseline from 1 day to 1 month ahead, even during outbreaks and with these priors. We identify three major failure modes: (1) poor outbreak anticipation, (2) difficulty handling sparsity and noise, and (3) limited utility of common geographic adjacency for epidemiological spatial information. We release benchmark data, code, and instructions at this https URL to support development of operationally useful epidemic forecasting models.

SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting
Epidemic forecasting is essential for public health, yet it remains a difficult task due to data that is often sparse, noisy, and constantly changing. While researchers have increasingly turned to spatiotemporal models—which analyze how diseases spread across interacting regions—there has been no standardized way to evaluate these tools. This paper introduces SpatialEpiBench, a comprehensive benchmark designed to test how well different forecasting models perform in realistic public-health settings, specifically focusing on their ability to predict outbreaks and handle the complexities of real-world epidemiological data.

A Standardized Approach to Evaluation

Current methods for evaluating epidemic forecasts are often inconsistent, frequently relying on simple chronological splits that do not reflect how public health officials actually use data in real-time. SpatialEpiBench addresses this by providing a unified framework for 11 different epidemic datasets, including COVID-19 and influenza. The benchmark implements a "rolling-origin" evaluation protocol, which mimics the way models are updated as new information becomes available. By using standardized metrics and focusing on outbreak periods, the benchmark allows researchers to determine whether specific spatial information or epidemic-informed adjustments actually improve forecasting accuracy.
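The rolling-origin protocol described above can be sketched in a few lines. This is an illustrative implementation, not the benchmark's actual code; the function name and parameters are assumptions for the example.

```python
import numpy as np

def rolling_origin_splits(series_len, initial_train, horizon, step=1):
    """Yield (train_end, test_indices) pairs for rolling-origin evaluation.

    At each origin, a model is fit on all observations up to `train_end`
    and scored on the next `horizon` points; the origin then advances by
    `step`, mimicking how forecasts are re-issued as new data arrive.
    """
    origin = initial_train
    while origin + horizon <= series_len:
        yield origin, np.arange(origin, origin + horizon)
        origin += step

# Example: 100 daily observations, 60 initial training days,
# 7-day-ahead forecasts, origin advanced weekly.
splits = list(rolling_origin_splits(100, initial_train=60, horizon=7, step=7))
print(splits[0][0])  # 60
```

In contrast to a single chronological split, every origin produces a fresh evaluation window, so a model is judged across many forecasting dates rather than one.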

Testing Epidemic Priors

A key feature of this research is the evaluation of "epidemic priors"—modular, model-agnostic patches that can be added to existing deep learning architectures. These patches are designed to help general-purpose models better understand infectious disease dynamics without requiring a complete redesign of the model. The study tests four specific types of priors: calendar-based temporal adjustments, filtered loss functions to handle noisy data, auxiliary regularization based on SIR (Susceptible-Infectious-Removed) models, and neural network-based transfer learning. These tests aim to see if adding domain-specific structure can help models better capture the mechanics of an outbreak.
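To make the SIR-based auxiliary regularization concrete, one common approach is to penalize forecasts whose day-over-day change in infections deviates from SIR dynamics, dI/dt ≈ βSI/N − γI. The sketch below is a hypothetical formulation for illustration, not the paper's exact prior; the function name, its signature, and the fixed β and γ values are assumptions.

```python
import numpy as np

def sir_residual_penalty(pred_I, S, beta, gamma, N, dt=1.0):
    """Auxiliary loss penalizing deviation from SIR infection dynamics.

    pred_I : predicted infectious counts over consecutive time steps
    S      : (estimated) susceptible counts over the same steps
    Compares the finite-difference dI/dt of the forecast against the
    SIR rate beta*S*I/N - gamma*I and returns the mean squared residual.
    """
    dI_pred = np.diff(pred_I) / dt
    dI_sir = beta * S[:-1] * pred_I[:-1] / N - gamma * pred_I[:-1]
    return np.mean((dI_pred - dI_sir) ** 2)
```

A trajectory that exactly follows SIR dynamics incurs zero penalty, while epidemiologically implausible jumps are discouraged; in training, this term would be added to the model's main forecast loss with a weighting coefficient.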

Surprising Performance Gaps

The researchers found that, despite the complexity of modern spatiotemporal models, most of them struggle to outperform a simple "last-value" baseline—a method that essentially predicts that the future will look exactly like the most recent observation. This trend persists from one day to one month ahead, even during active outbreaks where a more sophisticated model should theoretically provide a significant advantage. The study identifies three primary reasons for this performance gap: models often fail to anticipate the start of an outbreak, they struggle to process sparse or noisy data, and the geographic adjacency maps commonly used to represent spatial relationships may not provide enough meaningful information about how diseases actually spread between populations.
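The last-value (persistence) baseline that most models failed to beat is trivially simple, which is what makes the result striking. A minimal sketch:

```python
import numpy as np

def last_value_forecast(history, horizon):
    """Persistence baseline: predict the most recent observation
    for every step of the forecast horizon."""
    return np.full(horizon, history[-1])

daily_cases = np.array([5, 8, 12, 20, 18])
print(last_value_forecast(daily_cases, horizon=3))  # [18 18 18]
```

Any model with genuine predictive skill should produce lower error than this constant forecast, particularly at longer horizons and around outbreak onsets, where persistence is clearly wrong.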

Implications for Future Development

The findings suggest that simply applying general spatiotemporal models to epidemic data is not enough to achieve reliable results. The limited utility of standard geographic adjacency indicates that future research may need to look beyond simple maps and incorporate more nuanced data, such as mobility or contact networks, to truly capture the spatial dynamics of an epidemic. By releasing the benchmark data, code, and protocols, the authors aim to provide a foundation for developing models that are not just theoretically interesting, but operationally useful for public health decision-making.
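The contrast between geographic adjacency and richer spatial signals can be illustrated with a toy example. The matrices and flow counts below are invented for illustration; a mobility-weighted matrix is one way (suggested by the paper's findings, not prescribed by it) to encode more informative spatial structure than border contiguity.

```python
import numpy as np

# Binary geographic adjacency: 1 if two regions share a border.
geo_adj = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]], dtype=float)

# Hypothetical daily commuter flows between the same three regions.
flows = np.array([[  0, 500,  20],
                  [500,   0, 300],
                  [ 20, 300,   0]], dtype=float)

# Row-normalize so each region's incoming influence weights sum to 1;
# unlike geo_adj, this distinguishes strong links from weak ones.
mobility_adj = flows / flows.sum(axis=1, keepdims=True)
print(mobility_adj.round(2))
```

Here regions 0 and 2 are non-adjacent on the map yet weakly coupled by travel, while the 0–1 link carries most of region 0's weight; a binary contiguity matrix collapses both distinctions.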
