[After reading through this module you should have an intuitive understanding of how infectious disease spreads in a population, and how that process can be described using a compartmental model with flow between the compartments. You should be able to write down the differential equations of a simple disease model, and you will learn how to numerically solve those differential equations in R to obtain the model estimate of the epidemic curve.]
Models of disease spread can yield insights into the mechanisms and dynamics most important to the spread of disease (especially when the models are compared to epidemic data). With this improved understanding, more effective disease intervention strategies can potentially be developed. Sometimes disease models are also used to forecast the course of an epidemic, and doing exactly that for the 2009 pandemic was my introduction to the field of computational epidemiology.
There are lots of different ways to model epidemics, and there are several modules on this site on the topic, but let’s begin with one of the simplest epidemic models for an infectious disease like influenza: the Susceptible, Infected, Recovered (SIR) model.
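As a sketch of what such a model looks like when written down, the standard textbook form of the SIR equations is given here. The notation is an assumption for illustration and is not taken from the module itself: $S$, $I$, and $R$ are the numbers of susceptible, infected, and recovered individuals, $N = S + I + R$ is the (constant) population size, $\beta$ is the transmission rate, and $\gamma$ is the recovery rate.

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I .
\end{aligned}
$$

The three right-hand sides sum to zero, so individuals only flow between compartments (from $S$ to $I$, then from $I$ to $R$) and the total population $N$ is unchanged.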
[After you have read through this module, and have downloaded and worked through the provided R examples, you should be proficient enough in R to download and run the other R scripts that will be provided in other posts on this site. You should understand the basics of good programming practices (in any language, not just R). You will also have learned how to read data from a file into a table in R, and how to produce a plot.]
I have programmed in many different computing and scripting languages, but the ones I most commonly use on a day-to-day basis are C++, Fortran, Perl, and R (with some Python, Java, and Ruby on the side). In particular, I use R every day because it is not only a programming language, but also has graphics and a very large suite of statistical tools. Connecting models to data is a process that requires statistical tools, and R provides those tools, plus a lot more.
Unlike SAS, Stata, SPSS, and Matlab, R is free and open source (it is hard to beat a package that is more comprehensive than pretty much any other product out there and is free!).
I am a statistician, mathematician, and physicist with a passion for research… I often start research projects in my spare time just because I am curious about something.
I have a background in data mining, applied statistics, and high performance computing, and this toolbox of skills has happily been suited to many different endeavors. In recent years I have published papers and/or embarked on research in epidemiology, biology, physics, applied statistics, criminology, terrorism informatics, and archeoastronomy. Quite a few of my most recent papers have focused on the field of computational epidemiology; in particular computer modelling of influenza pandemics.
On this website I share information and computational tools related to my research endeavors, along with many links to helpful documents I have come across on the Internet. I will also be sharing a lot of material related to lectures and seminars I give in statistical and computational methods. Thus many of the posts on this site will have a “lecturing” tone… because they are part of a lecture. The material on this website isn’t meant to constitute a blog.
I hope that readers will find the information I post here useful.
[In this module we discuss methods for finding free sources of online data. We present examples of climate, population, and socio-economic data from a variety of online sources. Other sources of potentially useful data are also discussed. The data sources described here are by no means an exhaustive list of free online data that might be useful to use in a computational, statistical, or mathematical modeling study.]
Connecting mathematical models to reality usually involves comparing your model to data, and finding the model parameters that make the model most closely match the observations. And of course statistical models are wholly developed using sources of data.
Becoming adept at finding sources of data relevant to a model you are studying is a learned skill, but unfortunately one that isn’t taught in any textbook!
One thing to keep in mind is that any data that appears in a journal publication is fair game to use, even if it appears in graphical format only. If the data is in graphical format, there are free programs, such as DataThief, that can be used to extract the data into a numerical file.
Google Scholar is a search engine that indexes the scholarly literature across an array of publishing formats and disciplines. It provides a very powerful means to find literature associated with pretty much any research topic you can think of.
The final write-ups for final group projects are due Monday, December 1st, 2014. On Dec 2nd and 3rd students will meet with Prof Towers to receive feedback on their project and writeup.
Each of the project groups will perform an in-class 20 min presentation on Monday, Dec 8th, 2014 and Wed, Dec 10th, 2014. By Dec 9th, all group members are to submit to Prof Towers a confidential email, detailing their contribution to the group project, and detailing the contributions of the other group members.
The list of modules for the Fall 2014 course in computational and statistical methods for mathematical biologists and epidemiologists:
In this course students will be introduced to statistical modelling methods such as linear regression, factor regression, and time series analysis. All modelling and data analysis will be performed in the R statistical programming language. The course meets on Thursdays from 12:00-2:45 pm in PSA 546. The course syllabus can be found here.
The course will be structured in a series of modules covering various topics. Some modules may take more than one lecture to cover. Homework will be occasionally assigned throughout the course, usually after completion of a module.
The final project for the course will account for 50% of the grade, and must be based on a statistical analysis of one or more data sets from this page. The topic is to be chosen in discussion with me, and perhaps other faculty, to determine whether the analysis is novel. Students are encouraged to work together on the project in groups of up to three, but the contributions of each student in the group to the project must be clearly defined. The final project write-up will be due approximately two weeks before the end of classes (date to be announced). Oral presentations of the projects will take place in class during the last week of classes.
There is no one textbook that covers the material in this course. Since students in this course have varying backgrounds in statistics, I strongly recommend that you go to the library and take a look at the statistical texts there, and find one or two that cover linear regression and/or time series analysis that you think are at your level. As for texts I recommend, Statistical Data Analysis by Cowan is a good general text for various statistical methods. A standard textbook for linear regression is Applied Linear Statistical Models by Kutner et al. Two good textbooks that cover time series analysis are Time Series Analysis with Applications in R by Cryer, and Time Series Analysis and its Applications with R Examples by Shumway. The material covered in these texts is much more expansive in scope than the material covered in this course (because they cover material that would form the basis of at least three or four different courses). All of these texts are available off of libgen.info, but note that copyright infringement is a crime… one must never do such a terrible thing.