**[After reading this module, students should understand the Least Squares goodness-of-fit statistic. Students will be able to read an influenza data set from a comma-delimited file into R, and will understand the basic steps of a Monte Carlo parameter sweep method for fitting an SIR model to the data, estimating the R0 of the influenza strain by minimizing the Least Squares statistic. Students will also be aware that parameter estimates have uncertainties associated with them due to stochasticity (randomness) in the data.]**

**A really good reference for statistical data analysis (including fitting) is** Statistical Data Analysis, by G. Cowan.

Contents:

- Introduction
- Least squares goodness-of-fit statistic
- Finding the model parameters that minimize the Least Squares statistic: why we can’t just use linear regression methods for the models we usually use
- Monte Carlo parameter sweep method
- R code to fit to 2007-2008 confirmed influenza cases in Midwest
- Parameter estimates have uncertainties
- Potential pitfalls of using Least Squares

**Introduction**

When a new virus starts circulating in the population, one of the first questions that epidemiologists and public health officials want answered is the value of the reproduction number of the disease in the population.

The length of the infectious period can be roughly estimated from observational studies of infected people, but the reproduction number can only be estimated by examining the spread of the disease in the population. When early data in an epidemic are used to estimate the reproduction number, I usually refer to this as "real-time" parameter estimation (i.e., the epidemic is still ongoing at the time of estimation).