ASU AML610 Spring 2013/Fall 2014 Syllabus

Posted on August 22, 2014 by Sherry Towers

Objectives:

This course is meant to provide students in applied mathematics with the broad skill set needed to fit model parameters to relevant biological or epidemic data. The course will be based almost entirely on material posted on this website.

AML 610 Module XIII: Canadian hare lynx data

Posted on April 1, 2013 by admin

Canadian Hare Lynx Data

The file hare_lynx.txt contains data on the number of snowshoe hare and lynx pelts collected by the Hudson’s Bay Company in Canada over the course of many years (data obtained from this website). Do you think the Lotka-Volterra model is an appropriate model to fit to these data?
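For reference, the classic Lotka-Volterra predator-prey model is written below with H the hare (prey) population and L the lynx (predator) population. The symbols alpha, beta, delta, and gamma follow a common textbook convention and are assumed here; they are not necessarily the notation used in the full module.

```latex
\begin{aligned}
\frac{dH}{dt} &= \alpha H - \beta H L, \\
\frac{dL}{dt} &= \delta H L - \gamma L .
\end{aligned}
```

Its solutions are closed orbits whose amplitude is fixed by the initial conditions (the equilibrium is neutrally stable), which is worth keeping in mind when deciding whether it is an appropriate model for these data.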

The R script hare_lynx_plot.R plots the hare and lynx data.


AML 610 Module XII: submitting jobs in batch to the ASU Saguaro distributed-computing system

Posted on March 18, 2013 by admin

The ASU Advanced Computing Center (A2C2) maintains the Saguaro distributed computing system, which currently has over 5,000 processor cores.

ASU students in the spring semester of AML610 should have already applied for and received an account on the Saguaro system (per the instructions of last month’s email describing how to apply for an account).

Saguaro allows you to run multiple jobs simultaneously in batch, directing standard output to a log file. For this course, we will be using Saguaro to solve a system of ODEs for hypothesized values of the parameters and initial conditions (either chosen in a parameter sweep, or randomly sampled within some range); the output of the ODEs will then be compared to a data set, and a goodness-of-fit statistic (such as Least Squares, Pearson chi-squared, or the negative log-likelihood) computed. The parameter values and goodness-of-fit statistics are then printed to standard output.

Access to high-performance computing resources, and knowledge of how to use them, has many potential applications in modelling. Learning how to use Saguaro as a tool for solving problems related to this course can thus open up many further avenues of research to you.

Homework #5, due Thurs April 18th, 2013 at 6pm. Data for the homework can be found here.


AML610 module XI: practical problems when connecting deterministic models to data

Posted on March 18, 2013 by admin

Some (potentially) useful utilities for random number generation and manipulating vectors in C++

I’ve written some C++ code, mainly related to vectors (calculating the weighted mean, running sum, extracting every nth element, etc.). There are also utilities for random number generation from various probability distributions, and methods to calculate the CDF of various probability distributions.

The files UsefulUtils.h and UsefulUtils.cpp contain the source code of a class with these utilities, which can be useful when performing compartmental modelling in C++. These utilities will be used extensively in the examples presented in this and later modules. The file example_useful_utils.cpp gives examples of the use of the class. It can be compiled with the makefile makefile_use using the command

make -f makefile_use example_useful_utils

Homework #4, due April 3rd, 2013 at 6pm. The data for the homework can be found here.


Numerical methods to solve ordinary differential equations

Posted on February 20, 2013 by admin

After going through this module, students will be familiar with the Euler and Runge-Kutta methods for numerical solution of systems of ordinary differential equations. Examples are provided to show students how complementary R scripts can be written to help debug Runge-Kutta methods implemented in C++.

Contents

Euler’s method (finite difference method)
Using the Euler method to solve dN/dt = rho*N
Implementing the Euler method in C++
Comparison of the output of the C++ and R programs implementing the Euler method to solve dN/dt=rho*N
Dynamic time step calculation in the Euler method
Runge-Kutta method
Example implementation of the Runge-Kutta method in C++: Lotka-Volterra predator-prey model


ASU AML 610 Module IX: Introduction to C++ for computational epidemiologists

Posted on February 11, 2013 by admin

After going through this module, students should be familiar with basic skills in C++ programming, including the structure of a basic program, variable types, scope, functions (and function overloading), control structures, and the standard template library.

Hello world
Variable types
Strings
Boolean logic
Scope of variables
Introduction to functions
Arguments to main
Control structures (if/then/else, while, for loops)
Functions, pass by value
Functions, pass by reference
Overloading functions
The standard template library: vectors
Vectors in multi-dimensions
Parsing data from comma delimited files
Makefiles
Introduction to classes and objects

So far in this course we have used R to explore methods related to fitting model parameters to data (in particular, we explored the Simplex method for parameter estimation). As we’ve shown, parameter estimation can be a very computationally intensive process.

When you use R, it gives you a prompt and waits for you to input commands, either directly at the command line or through an R script that you source. Because R is an interpreted language that executes code step by step, rather than compiling it ahead of time, it cannot optimize calculations by pre-processing the whole program the way a compiler does.

In contrast, compiled programming languages like C, Java, or C++ (to name just a few) use a compiler to translate the code before it is run, and the compiler can optimize the translated code. In fact, most compilers have optional arguments that set the level of optimization you desire (with the downside that higher optimization levels make compilation itself slower). Optimized code generally runs faster than non-optimized code.


ASU AML 610 Module VIII: Fitting to initial exponential rise of epidemic curves

Posted on February 7, 2013 by admin

In this module students will compare the performance of several fitting methods (Least squares, Pearson chi-squared, and likelihood fitting methods) in estimating the rate of exponential rise in initial epidemic incidence data. Students will learn about the properties of good estimators (bias and efficiency).

A good reference source for this material is Statistical Data Analysis, by G. Cowan.

Another good reference source (in a very condensed format) for statistical data analysis methods can be found here.

Contents:
Introduction
Properties of good estimators
Generating simulated exponential rise data
Estimation of the rate of exponential rise: Least Squares
Estimation of the rate of exponential rise: Pearson chi-squared
The Poisson maximum likelihood method
Estimation of parameter confidence intervals: any maximum likelihood method
Estimation of the rate of exponential rise: Poisson maximum likelihood method
Testing for over- or under-dispersion
Correcting for over- or under-dispersion
Better method for determination of parameter estimates and their covariance when using the Pearson chi-squared method


Fitting the parameters of an SIR model to influenza data using Least Squares and the graphical Monte Carlo method

Posted on January 29, 2013 by admin

[After reading this module, students should understand the Least Squares goodness-of-fit statistic. Students will be able to read an influenza data set from a comma delimited file into R, and understand the basic steps involved in the graphical Monte Carlo method to fit an SIR model to the data to estimate the R0 of the influenza strain by minimizing the Least Squares statistic. Students will be aware that parameter estimates have uncertainties associated with them due to stochasticity (randomness) in the data.]

A really good reference for statistical data analysis (including fitting) is Statistical Data Analysis, by G. Cowan.

Contents:

Introduction
Least squares goodness-of-fit statistic
Finding the model parameters that minimize the Least Squares statistic: why we can’t just use linear regression methods for the models we usually use
Monte Carlo parameter sweep method
R code to fit to 2007-2008 confirmed influenza cases in the Midwest
Parameter estimates have uncertainties
Potential pitfalls of using Least Squares

Introduction

When a new virus starts circulating in the population, one of the first questions that epidemiologists and public health officials want answered is the value of the reproduction number of the disease in the population (see, for instance, here and here).

The length of the infectious period can be roughly estimated from observational studies of infected people, but the reproduction number can only be estimated by examining the spread of the disease in the population. When early data in an epidemic are used to estimate the reproduction number, I usually refer to this as “real-time” parameter estimation (i.e., the epidemic is still ongoing at the time of estimation).


ASU AML 610: probability distributions important to modelling in the life and social sciences

Posted on January 28, 2013 by admin

[After reading this module, students should be familiar with the probability distributions most important to modelling in the life and social sciences: Uniform, Normal, Poisson, Exponential, Gamma, Negative Binomial, and Binomial.]

Contents:
Introduction
Probability distributions in general
Probability density functions
Mean, variance, and moments of probability density functions
Mean, variance, and moments of a sample of random numbers
Uncertainty on sample mean and variance, and hypothesis testing
The Poisson distribution
The Exponential distribution
The memory-less property of the Exponential distribution
The relationship between the Exponential and Poisson distributions
The Gamma and Erlang distributions
The Negative Binomial distribution
The Binomial distribution

Introduction

Several probability distributions are important to be familiar with if one wants to model the spread of disease or biological populations (especially with stochastic models). In addition, a good understanding of these distributions is needed to fit model parameters to data, because the data always have underlying stochasticity, and that stochasticity feeds into uncertainties in the model parameters. It is therefore important to understand what kinds of probability distributions typically underlie the stochasticity in epidemic or biological data.

Basic Unix

Posted on January 7, 2013 by admin

In the Arizona State University AML610 course “Computational and Statistical Methods in Applied Mathematics”, we will ultimately be using supercomputing resources at ASU and through the NSF XSEDE initiative to fit the parameters of a biological model to data. To do this, it is necessary to know basic Unix commands to copy, rename, and delete files and directories, and how to list directories and locate files. We will also be compiling all our C++ programs from the Unix shell, and redirecting the output of our programs to files from the command line.

SIR modelling of influenza with a periodic transmission rate

Posted on December 12, 2012 by admin

[After going through this module, students will be familiar with time-dependent transmission rates in a compartmental SIR model, will have explored some of the complex dynamics that can be created when the transmission is not constant, and will understand applications to the modelling of influenza pandemics.]

Contents:

Introduction
Periodic transmission rate
The time-of-introduction of the virus is a parameter of the model
Some conclusions we can draw from the model
R code for SIR model simulation with a harmonic transmission rate
Things to try
More things to ponder

Introduction

Influenza is a seasonal disease in temperate climates, usually peaking in the winter. This implies that the transmission of influenza is greater in the winter (whether this is due to increased crowding and higher contact rates in winter, and/or to higher transmissibility of the virus under favorable winter environmental conditions, is still being discussed in the literature). What is very interesting about influenza is that pandemic strains sometimes produce a summer epidemic wave (followed by a larger autumn wave). An SIR model with a constant transmission rate simply cannot replicate this two-wave pattern within a single year.


SIR infectious disease model with age classes

Posted on December 11, 2012 by admin

[After reading through this module, students should have an understanding of contact dynamics in a population with age structure (e.g., kids and adults). You should understand how population age structure can affect the spread of infectious disease. You should be able to write down the differential equations of a simple SIR disease model with age structure, and you will learn in this module how to solve those differential equations in R to obtain the model estimate of the epidemic curve.]

Contents:

Introduction
Population contact patterns
An example of a contact matrix: kids and adults
SIR model with age structure
The reproduction number of the age structured SIR model
R code for simulating an age structured SIR model
Other kinds of class structure
Things to try

Introduction

In a previous module I discussed epidemic modelling with a simple Susceptible, Infected, Recovered (SIR) compartmental model. The model presented had only a single age class (i.e., it was homogeneous with respect to age). But in reality, when we consider disease transmission, age likely does matter because kids usually make more contacts during the day than adults. The differences in contact patterns between age groups can have quite a profound impact on the model estimate of the epidemic curve, and also have implications for the development of optimal disease intervention strategies (like age-targeted vaccination, social distancing, or closing schools).

Epidemic modelling with compartmental models using R

Posted on December 11, 2012 by admin

[After reading through this module you should have an intuitive understanding of how infectious disease spreads in the population, and how that process can be described using a compartmental model with flow between the compartments. You should be able to write down the differential equations of a simple disease model, and you will learn in this module how to numerically solve those differential equations in R to obtain the model estimate of the epidemic curve.]

An excellent reference book with background material related to these lectures is Mathematical Epidemiology by Brauer et al.

Contents:

Introduction
Basic dynamics of infectious disease spread
The SIR compartmental model of disease spread
The SIR model system of equations
Numerically solving the SIR model system of equations in R
R code to model an influenza pandemic with an SIR model
Further things you can explore
Summary

Introduction

Models of disease spread can yield insights into the mechanisms and dynamics most important to the spread of disease (especially when the models are compared to epidemic data). With this improved understanding, more effective disease intervention strategies can potentially be developed. Sometimes disease models are also used to forecast the course of an epidemic, and doing exactly that for the 2009 pandemic was my introduction to the field of computational epidemiology.

There are lots of different ways to model epidemics, and there are several modules on this site on the topic, but let’s begin with one of the simplest epidemic models for an infectious disease like influenza: the Susceptible, Infected, Recovered (SIR) model.
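For reference, the SIR model is commonly written as follows, with S, I, and R the numbers of susceptible, infected, and recovered people, N = S + I + R the (constant) population size, beta the transmission rate, and 1/gamma the mean infectious period. This notation is assumed here and may differ from the module's.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad R_0 = \frac{\beta}{\gamma}.
```

Early in an epidemic, when S is approximately N, the number infected grows only if R0 > 1, that is, when an infected person causes on average more than one new infection before recovering.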


The basics of the R statistical programming language

Posted on December 11, 2012 by admin

[After you have read through this module, and have downloaded and worked through the provided R examples, you should be proficient enough in R to be able to download and run other R scripts that will be provided in other posts on this site. You should understand the basics of good programming practices (in any language, not just R). You will also have learned how to read data in a file into a table in R, and produce a plot.]

Contents:

Why use R for modelling?
How to download R
Some example R code with an overview of basic R commands
Advancing on: programming constructs
Good programming practices (in any language)
Reading data files into R

Why use R for modelling?

I have programmed in many different programming and scripting languages, but the ones I most commonly use on a day-to-day basis are C++, Fortran, Perl, and R (with some Python, Java, and Ruby on the side). In particular, I use R every day because it is not only a programming language, but also has graphics and a very large suite of statistical tools. Connecting models to data is a process that requires statistical tools, and R provides those tools, plus a lot more.

Unlike SAS, Stata, SPSS, and Matlab, R is free and open source (it is hard to beat a package that is more comprehensive than pretty much any other product out there and is free!).


Finding sources of data for computational, mathematical, or statistical modeling studies: free online data

Posted on April 3, 2012 by admin

[In this module we discuss methods for finding free sources of online data. We present examples of climate, population, and socio-economic data from a variety of online sources. Other sources of potentially useful data are also discussed. The data sources described here are by no means an exhaustive list of free online data that might be useful in a computational, statistical, or mathematical modeling study.]

Finding sources of data: extracting data from the published literature

Posted on April 2, 2012 by admin

Connecting mathematical models to reality usually involves comparing your model to data, and finding the model parameters that make the model most closely match the observations. And of course statistical models are developed entirely from sources of data.

Becoming adept at finding sources of data relevant to a model you are studying is a learned skill, but unfortunately one that isn’t taught in any textbook!

One thing to keep in mind is that any data that appears in a journal publication is fair game to use, even if it appears in graphical format only. If the data is in graphical format, there are free programs, such as DataThief, that can be used to extract the data into a numerical file.


Literature searches with Google Scholar

Posted on April 1, 2012 by admin

Google Scholar is a search engine that indexes the scholarly literature across an array of publishing formats and disciplines. It provides a very powerful means to find literature associated with pretty much any research topic you can think of.


AML 610 Fall 2014: List of modules

Posted on March 29, 2012 by admin

The syllabus for this course can be found here.

Final write-ups for the group projects are due Monday, December 1st, 2014. On Dec 2nd and 3rd, students will meet with Prof Towers to receive feedback on their project and write-up.

Project groups will give 20-minute in-class presentations on Monday, Dec 8th, 2014 and Wednesday, Dec 10th, 2014. By Dec 9th, all group members are to send Prof Towers a confidential email detailing their own contribution to the group project and the contributions of the other group members.

The list of modules for the Fall 2014 course in computational and statistical methods for mathematical biologists and epidemiologists:

Literature searches with Google Scholar
Extracting data from graphs in published literature
Online sources of free data
- Homework #1, Due Sep 2nd 2014 at noon.
Module I: The basics of the R statistical programming language
- Homework #2, Due Sep 10th 2014 at noon.
- as part of the homework, read the modules “How to write a good scientific paper”, and “How to download an R script from the internet and run it”.
Module II: Epidemic modelling with compartmental models
- Homework #3, Due Sep 17th 2014 at noon.
Module III: SIR disease model with age classes
- Homework #4, Due Sep 24th 2014 at noon.
Module IV: SIR modelling of influenza with a periodic transmission rate
Module V: fitting the parameters of an SIR model to influenza data using Least Squares and the Monte Carlo parameter sweep method
Module VI: an overview of goodness of fit statistics, and methods to fit parameters of mathematical models to data
Module VII: estimating parameter confidence intervals when using the Monte Carlo parameter sweep optimization method
- Homework #5, Due Wed Oct 8th 2014 at noon.
Module VIII: Basic Unix
Module IX: introduction to C++ for computational epidemiologists
- Homework #6, Due Mon Oct 20th 2014 at noon.
Module X: a C++ class to solve ordinary differential equations
- Homework #7, Due Mon Nov 3rd 2014 at noon. (assignment emailed)
- Getting started using the NSF XSEDE distributed computing system
- An example of submitting a batch job to XSEDE Stampede
Module XI: another example C++ program, fitting SIR model parameters to CDC Midwest 2007-08 B influenza data
- Homework #8, Due Mon Nov 10th 2014 at noon.
- Submitting jobs to the ASU A2C2 ASURE batch computing system
- Correcting the Pearson chi-squared statistic for over-dispersed data
- Homework #9, Due Wed Nov 19th 2014 at noon.
Module XII: another example of the parameter sampling model optimization method: using Negative Binomial likelihood
- How to ssh and scp between Unix machines without passwords
- R and C++ code related to our class publication project, and running the jobs in batch on the ASU A2C2 ASURE supercomputing system
- Homework #10, Due Mon Dec 1st 2014 at noon.
Module XIII: practical problems when connecting deterministic models to data
Module XIV: submitting jobs in batch to the ASU Saguaro distributed-computing system

Polymatheia

Category Archives: ASU AML610 Spring 2013 Lecture Series
