Epidemic modelling with compartmental models using R

[After reading through this module you should have an intuitive understanding of how infectious disease spreads in the population, and how that process can be described using a compartmental model with flow between the compartments.  You should be able to write down the differential equations of a simple disease model, and you will learn in this module how to numerically solve those differential equations in R to obtain the model estimate of the epidemic curve]

An excellent reference book with background material related to these lectures is Mathematical Epidemiology by Brauer et al. 



Models of disease spread can yield insights into the mechanisms and dynamics most important to the spread of disease (especially when the models are compared to epidemic data).  With this improved understanding, more effective disease intervention strategies can potentially be developed. Sometimes disease models are also used to forecast the course of an epidemic, and doing exactly that for the 2009 pandemic was my introduction to the field of computational epidemiology.

There are lots of different ways to model epidemics, and there are several modules on this site on the topic, but let’s begin with one of the simplest epidemic models for an infectious disease like influenza: the Susceptible, Infected, Recovered (SIR) model.

Continue reading

Compartmental modelling without calculus

In another module on this site I describe how an epidemic for certain kinds of infectious diseases (like influenza) can be modelled with a simple Susceptible, Infectious, Recovered (SIR) model. Readers who have not yet been exposed to calculus (such as junior or senior high school students) may have been daunted by the system of differential equations shown in that post.  However, with only a small amount of programming experience in R, students without calculus can still easily model epidemics, or any other system that can be described with a compartmental model.  In this post I will show how that is done.
Continue reading

SIR infectious disease model with age classes

[After reading through this module, students should have an understanding of contact dynamics in a population with age structure (eg; kids and adults). You should understand how population age structure can affect the spread of infectious disease. You should be able to write down the differential equations of a simple SIR disease model with age structure, and you will learn in this module how to solve those differential equations in R to obtain the model estimate of the epidemic curve]



In a previous module I discussed epidemic modelling with a simple Susceptible, Infected, Recovered (SIR) compartmental model.  The model presented had only a single age class (ie; it was homogenous with respect to age).  But in reality, when we consider disease transmission, age likely does matter because kids usually make more contacts during the day than adults. The differences in contact patterns between age groups can have quite a profound impact on the model estimate of the epidemic curve, and also have implications for development of optimal disease intervention strategies (like age-targeted vaccination, social distancing, or closing schools).
Continue reading

SIR modelling of influenza with a periodic transmission rate

[After going through this module, students will be familiar with time-dependent transmission rates in a compartmental SIR model, will have explored some of  the complex dynamics that can be created when the transmission is not constant, and will understand applications to the modelling of influenza pandemics.]




Influenza is a seasonal disease in temperate climates, usually peaking in the winter.  This implies that the transmission of influenza is greater in the winter (whether this is due to increased crowding and higher contact rates in winter, and/or due to higher transmissibility of the virus due to favorable environmental conditions in the winter is still being discussed in the literature).  What is very interesting about influenza is that sometimes summer epidemic waves can be seen with pandemic strains (followed by a larger autumn wave).  An SIR model with a constant transmission rate simply cannot replicate the annual dual wave nature of an influenza pandemic.

Continue reading

ASU AML 610: probability distributions important to modelling in the life and social sciences

[After reading this module, students should be familiar with probability distributions most important to modelling in the life and social sciences; Uniform, Normal, Poisson, Exponential, Gamma, Negative Binomial, and Binomial.]

Probability distributions in general
Probability density functions
Mean, variance, and moments of probability density functions
Mean, variance, and moments of a sample of random numbers
Uncertainty on sample mean and variance, and hypothesis testing
The Poisson distribution
The Exponential distribution
The memory-less property of the Exponential distribution
The relationship between the Exponential and Poisson distributions
The Gamma and Erlang distributions
The Negative Binomial distribution
The Binomial distribution


There are various probability distributions that are important to be familiar with if one wants to model the spread of disease or biological populations (especially with stochastic models).  In addition, a good understanding of these various probability distributions is needed if one wants to fit model parameters to data, because the data always have underlying stochasticity, and that stochasticity feeds into uncertainties in the model parameters.  It is important to understand what kind of probability distributions typically underlie the stochasticity in epidemic or biological data.
Continue reading

Fitting the parameters of an SIR model to influenza data using Least Squares and the Monte Carlo parameter sweep method

[After reading this module, students should understand the Least Squares goodness-of-fit statistic.   Students will be able to read an influenza data set from a comma delimited file into R, and understand the basic steps involved in a Monte Carlo parameter sweep method to fit an SIR model to the data to estimate the R0 of the influenza strain by minimizing the Least Squares statistic.  Students will be aware that parameter estimates have uncertainties associated with them due to stochasticity (randomness) in the data.]

A really good reference for statistical data analysis (including fitting) is Statistical Data Analysis, by G.Cowan.



When a new virus starts circulating in the population, one of the first questions that epidemiologists and public health officials want answered is the value of the reproduction number of the spread of the disease in the population (see, for instance, here and here).

The length of the infectious period can roughly be estimated from observational studies of infected people, but the reproduction number can only be estimated by examination of the spread of the disease in the population.  When early data in an epidemic is being used to estimate the reproduction number, I usually refer to this as “real-time” parameter estimation (ie; the epidemic is still ongoing at the time of estimation).

Continue reading

ASU AML 610 Module VIII: Fitting to initial exponential rise of epidemic curves

In this module students will compare the performance of several fitting methods (Least squares, Pearson chi-squared, and likelihood fitting methods) in estimating the rate of exponential rise in initial epidemic incidence data.  Students will learn about the properties of good estimators (bias and efficiency).

A good reference source for this material is Statistical Data Analysis, by G.Cowan

Another good reference source (in a very condensed format) for statistical data analysis methods can be found here.

Properties of good estimators
Generating simulated exponential rise data
Estimation of the rate of exponential rise: Least Squares
Estimation of the rate of exponential rise: Pearson chi-squared
The Poisson maximum likelihood method
Estimation of parameter confidence intervals: any maximum likelihood method
Estimation of the rate of exponential rise: Poisson maximum likelihood method
Testing for over- or under-dispersion.
Correcting for over- or under-dispersion
Better method for determination of parameter estimates and their covariance when using the Pearson chi-squared method

Continue reading

Numerical methods to solve ordinary differential equations

After going through this module, students will be familiar with the Euler and Runge-Kutta methods for numerical solution of systems of ordinary differential equations.  Examples are provided to show students how complementary R scripts can be written to help debug Runge-Kutta methods implemented in C++.


Continue reading

AML610 module XI: practical problems when connecting deterministic models to data

Some (potentially) useful utilities for random number generation and manipulating vectors in C++

I’ve written some C++ code mainly related to vectors; calculating the weighted mean, running sum, extracting every nth element, etc).   There are also utilities related to random number generation from various probability distributions, and methods to calculate the CDF of various probability distributions.

The file UsefulUtils.h and UsefulUtils.cpp contain source code of a class that contains these utilities that can be useful when performing compartmental modelling in C++. These utilities will be used extensively in the examples that will be presented in this, and later, modules.  The file example_useful_utils.cpp gives examples of the use of the class.  It can be compiled with the makefile makefile_use with the command

make -f makefile_use example_useful_utils

Homework #4, due April 3rd, 2013 at 6pm. The data for the homework can be found here.

Continue reading

AML 610 Module XIII: Canadian hare lynx data

Canadian Hare Lynx Data

The file hare_lynx.txt contains data on the number of arctic hare and lynx pelts collected by the Hudson’s Bay company in Canada over the course of many years (data obtained from this website).  Do you think the Lotka-Volterra model is an appropriate model to fit to these data?

The R script hare_lynx_plot.R plots the Hare Lynx data:


Continue reading

The difference between mathematical and statistical modelling (plus some more basics of R)

[In this module, we will discuss the difference between mathematical and statistical modelling, using pandemic influenza as an example.  Example R code that solves the differential equations of a compartmental SIR model with seasonal transmission (ie; a mathematical model) is presented.  Also provided are an example of how to download add-on library packages in R, plus more examples of reading data sets into R, and aggregating the data sets by some quantity (in this case, a time series of influenza data in Geneva in 1918, aggregated by weekday).

Delving into how to write the R code to solve systems of ODE’s related to a compartmental mathematical model is perhaps slightly off the topic of a statistical modelling course, but worthwhile to examine; as mathematical and computational modellers, usually your aim in performing statistical analyses will be to uncover potential relationships that can be included in a mathematical model to make that model better describe the underlying dynamics of the system]

Optimizing model parameters to data (aka inverse problems)

[This presentation discusses methods commonly used to optimize the parameters of a mathematical model for population or disease dynamics to pertinent data.  Parameter optimization of such models is complicated by the fact that usually they have no analytic solution, but instead must be solved numerically. Choice of an appropriate "goodness of fit" statistic will be discussed, as will the benefits and drawbacks of various fitting methods such as gradient descent, Markov Chain Monte Carlo, and Latin Hypercube and random sampling.  An example of the application of the some of the methods using simulated data from a simple model for the incidence of a disease in a human population will be presented]

Continue reading