N. Shen and L. Bourouiba

Under review

The canonical susceptible-infected-recovered (SIR) epidemic model is widely used to assess epidemic severity and guide interventions. It is typically applied to hierarchically aggregated data from spatially distinct subregions. The heterogeneity introduced by this aggregation can lead to significant errors in estimated onset times and epidemic severity. Here, we develop three analytical methods to extract SIR parameters from epidemic data, focusing on the reproduction number $R_0$ that quantifies epidemic severity. The estimation methods are applied to synthetically aggregated incidence data, constructed by summing two independent SIR solutions with distinct reproduction numbers and separated by a relative delay. We assess the error that results from applying the canonical SIR model and find that $R_0$ estimates from the aggregated data can under- or overestimate the reproduction numbers of the constituent epidemic waves. These biases can occur even when the model prediction appears to agree well with the incidence data. We provide synthetic examples in which a single SIR model is insufficient to describe the aggregated epidemic dynamics, and show how to update the estimates for improved prediction ahead of the epidemic peak. We provide sensitivity analyses of the method with respect to noise perturbation of the data. Finally, we illustrate our approach using historical influenza data.
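
As an illustration of the synthetic aggregation described above, the following is a minimal sketch (not the authors' implementation) that sums the incidence of two independent SIR solutions with distinct $R_0$ and a relative delay. The function names `sir_incidence` and `aggregated_incidence`, the normalized population, the recovery rate `gamma`, the initial infected fraction `i0`, and the fixed-step RK4 integrator are all assumptions made for this example.

```python
import numpy as np

def sir_incidence(r0, gamma=1.0, i0=1e-4, t_max=40.0, dt=0.01):
    """Integrate a normalized SIR model (S + I + R = 1) with fixed-step RK4.

    Returns the time grid and the incidence beta*S*I (new infections per unit time).
    gamma, i0, t_max and dt are illustrative assumptions, not values from the paper.
    """
    beta = r0 * gamma
    n = int(round(t_max / dt)) + 1
    t = np.linspace(0.0, t_max, n)
    y = np.array([1.0 - i0, i0])  # state vector (S, I)
    incidence = np.empty(n)

    def rhs(state):
        s, i = state
        return np.array([-beta * s * i, beta * s * i - gamma * i])

    for k in range(n):
        incidence[k] = beta * y[0] * y[1]
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y = y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return t, incidence

def aggregated_incidence(r0_a, r0_b, delay, **kwargs):
    """Sum two independent SIR incidence curves, the second shifted by `delay`."""
    t, inc_a = sir_incidence(r0_a, **kwargs)
    _, inc_b = sir_incidence(r0_b, **kwargs)
    dt = t[1] - t[0]
    shift = int(round(delay / dt))
    inc_b_delayed = np.zeros_like(inc_b)  # second wave is zero before its onset
    if shift < len(inc_b):
        inc_b_delayed[shift:] = inc_b[:len(inc_b) - shift]
    return t, inc_a + inc_b_delayed

# Example: a mild wave (R0 = 1.5) followed 5 time units later by a stronger one (R0 = 3)
t, agg = aggregated_incidence(1.5, 3.0, delay=5.0)
```

Fitting a single SIR curve to `agg` is then one way to probe how far the recovered $R_0$ departs from the two constituent values; the fitting step itself is left out here.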