# Research in archaeoastronomy using computational, mathematical, and statistical methods, with free and open source software

[On this page I document and describe some of my work in archaeoastronomy, using mathematical, computational, and statistical methods to rigorously assess the probability that an archaeological site was used as an astronomical observatory.  I will give, as an example, all the programming code and other files I used to analyze a stone circle in the UK known as the "Merry Maidens".]

Visits: 993

# Archaeoastronomy: where on the horizon do the stars, Sun, Moon rise and set? (Part I)

[This post is part of a series of posts on archaeoastronomy using open source software]

In astronomy, the observer’s coordinate system for observing a star (or the Sun or Moon) involves two coordinates, azimuth and altitude.

Visits: 617

# Least Squares and Weighted Least Squares

[This is part of a series of modules on optimization methods]

The simplest, and often used, figure of merit for goodness of fit is the Least Squares statistic (aka Residual Sum of Squares), wherein the model parameters are chosen that minimize the sum of squared differences between the model prediction and the data.  For N data points, Y^data_i (where i=1,…,N), and model predictions at those points, Y^model_i, the statistic is calculated as (note that the model prediction depends on the model parameters):

Visits: 1112

# Markov Chain Monte Carlo parameter optimization method

In this module we will discuss the Markov Chain graphical Monte Carlo method for finding the best-fit parameters of a mathematical model when fitting the model predictions to a source of data.

Note that “Markov Chain” is an adjective that is applicable to a variety of contexts.    “Chain” means that you do many iterations of the Monte Carlo random sampling (ie: a “chain” of iterations).  In Markov Chain Monte Carlo, the distributions from which you sample depend on the current state of the system.

Markov Chain Monte Carlo methods can also be used (among other things) for stochastic compartmental modelling, but that is not what this module discusses (see here instead).  In this module we only talk about MCMC method specifically applied to parameter optimisation problems.  Much confusion often arises for graduate students when they read literature they have found that describes an analysis  employing Markov Chain methods, and they assume that it means compartmental modelling with Markov Chains.

Visits: 778

# Graphical parameter optimisation: Latin hypercube sampling

[This is part of a series of modules on optimisation methods]

Latin hypercube sampling model parameter optimisation is a computationally intensive algorithm that nonetheless have several advantages over other optimisation methods for application to problems in mathematical epidemiology/biology in that it doesn’t require computation of a gradient.

Visits: 844

# How to download an R script from the internet and run it

While you can input commands interactively into the R window, it is often more convenient to create a file (usually with a .R extension) that contains all the R code, and then ask R to run (aka: source) the commands in the file.

In the file short_test.R, I have put the R code to do a loop, and print out the numbers one through ten.  To run this script, first you need to create a folder on your computer that we will refer to as your working directory… Have this folder off of your root directory (C: directory in windows, and off of your base user directory in other platforms), and call this folder short_test_dir

Now, in R, you can use the setwd() command to change to that working directory (which tells R that from now on you want it to look only in that directory for files)

For windows, type at  the R command prompt

`setwd("C:/short_test_dir")`

and for Linux or Max OSX type

`setwd("~/short_test_dir")`

(the twiddle ~ means your home directory).  If you get an error at this point, it is because you did not properly create the folder short_test_dir under your home folder (or the C: directory in Windows), or you made a spelling mistake, or you forgot one or both quotes.

Now, a problem with Windows is that .R and data files downloaded from the internet tend to be saved with a .txt extension, and it is annoying to constantly have to remove it.  In addition, web browsers running on Windows seem to usually assume that any text file you are looking at on the internet must be HTML, so when downloading such files it puts HTML prefaces at the top, which prevent the files from being run in R.  To get around these problems, it is usually easiest to download the files using the R function download.file().

Thus, for windows, at the R command line type

`download.file("http://www.sherrytowers.com/short_test.R","short_test.R")`

and for Linux and Mac OSX type

`download.file("http://www.sherrytowers.com/short_test.R","short_test.R")`

If you get an error at this point, you’ve made a spelling mistake in the URL, or in the local directory name. Or you’ve forgotten one or more quotes.

Now, to run the file, type at the R command prompt

`source("short_test.R")`

You should see the numbers 1 through 10 being printed to the screen.

If you are creating your own .R file, you need to make sure that (particularly for Windows) a .txt is not appended to the end of it.

Now, using a text editor like Notepad (Windows) or TextEdit (Mac OSX), or whatever text editor you feel comfortable with, change the short_test.R file to print the numbers 11 to 100, but in a line, with each number separated by a space (rather than in a column).  Save the file, then make sure you can run the new file from the R command line.

Visits: 658

## Using the NSF XSEDE batch computing system

Logging in to NSF XSEDE Stampede

Before starting this module, all students in AML610 were asked to apply for an XSEDE user portal account by going to this link and filling in the form, and choosing a username.   Continue reading

Visits: 113

# A C++ class for numerically solving ODE’s

In previous modules, we have described how to use methods in the R deSolve library to numerically solve systems of ordinary differential equations, like the SIR model.  The default algorithm underlying the functions in the deSolve library is 4th order Runge-Kutta method, which involves an iterative process to obtain approximate numerical solutions to the differential equations.  Euler’s method is an even simpler method that can be used to estimate solutions to ODE’s, but 4th order Runge-Kutta is a higher order method that is more precise. Continue reading

Visits: 420

# An example C++ program to fit model parameters: using XSEDE

In Homework#7 of the 2014 fall AML610 course, we discussed an example where we had a time series of data for a disease vector, V, that can spread disease to a human population.  Once the  humans catch the disease, they recover after 1/gamma days and moved to the recovered compartment.

Visits: 240

# Another example C++ program to fit model parameters to data

In a past module, we examined how we could use methods in the R deSolve to fit the parameters of an SIR model to confirmed cases of influenza B in the Midwest region during the 2007-2008 flu season (the data were obtained from the CDC).  In that module, we used a Least Squares goodness-of-fit estimator. Continue reading

Visits: 461

# Submitting jobs to the ASU A2C2 ASURE batch computing system

The AML610  Fall 2014 course has received an allocation of 10,000 CPU hours on the A2C2 Asure batch computing system.  Students in the course have received an email from me describing how to sign up for an Asure account under this allocation.

Visits: 461

# Correcting for over-dispersion when using Pearson chi-squared

In this past module, we discussed the various merits and applicability of the Least Squares, Pearson chi-square, Poisson likelihood, and Negative Binomial likelihood statistics.

And in this past module we discussed how we can use the graphical Monte Carlo method (aka fmin plus a half method) to determine the one-std deviation confidence interval on our parameter hypotheses when using a likelihood statistic, and we also discussed how the Least Squares and Pearson chi-square statistics can be converted to likelihood statistics.

Visits: 337

Visits: 397

# Incorporating realistic probability distributions for compartment sojourns into a compartmental model

Page under construction…

Exponential probability distribution.

In reality, most likely time to leave the compartment is not at t=0!

Take, as an example, an SIR model (show equations)

The inherent assumption is that the sojourn time in the infectious stage is exponentially distributed (show proof); thus on average people recover after time t=1/gamma, but the most likely time to recover is at time t=0 after being infected!

But, say that we know, from observational studies of many people, that the probability of recovering after days 0, 1, 2, 3, etc looks like this: (show plot)

Obviously, the most likely day to recover is not on day 0!

Many probability distributions that look like “bumps”, defined in the range t=[0,inf] can be more or less adequately parameterized as a Gamma distribution.

Two parameters, shape factor (k) and scale factor (theta).  If shape factor an integer, then known as the Erlang distribution.  The mean of the distribution is mu=k*theta, and the variance is sigma^2=k*theta^2.

Now, conveniently, it happens that the Erlang distribution is the distribution of the sum of k exponentially distributed random variables with mean theta. (give R code)

This property provides a convenient trick to incorporate realistic sojourn probability distributions into a compartmental model.

The weighted average of that distribution is, and the weighted variance is. Doing the algebra yields that the shape factor is k~3, and theta~4.5

Why incorporate gamma distribution… does it make a difference to epidemic curve?

Visits: 169

# Mean, variance, covariance, and correlation of samples

[In this module we will discuss estimates of sample mean and variance, and also discuss the definition of covariance and correlation between two sets of random variables]

Visits: 1935

# Calculating the dynamic time step when using numerical methods to solve ODE’s

[In this module, students will learn how to calculate the time step needed when applying numerical solution methods, such as the Runge-Kutta or Euler's method, to solve systems of ODE's associated with compartmental models.  This is not only useful knowledge for obtaining deterministic solutions, but also for stochastic modelling methods like Markov Chain Monte Carlo (MCMC), Stochastic Differential Equations (SDE's), and Agent Based Models (ABM's)].

It’s easy!

For any compartmental model, to get the average time needed to obtain an average change of a total of one individual in the model compartments, simply sum up the flows out of each compartment at that point in time, and then take the inverse (note that births are included in this calculation as a flow “out” of a compartment, even though they are really flowing in).  Continue reading

Visits: 727

# Compartmental modelling with stochastic differential equations

[In this module, students will become familiar compartmental modelling with stochastic differential equations]

A good, but more mathematical, introduction to the material discussed here can be found in the paper Construction of Equivalent Stochastic Differential Equation Models by Allen et al (2008).

The content a previous module, formalism for preparing to do stochastic compartmental modelling, is imperative as a precursor to the content of this module.

Stochastic differential equation models can potentially be used for any dynamical system whose evolution can be described by a system of compartmental ODE’s, under certain provisos…

Visits: 287

# Stochastic compartmental modelling with Markov Chain Monte Carlo: Part II

[In this module, students will become familiar with Markov Chain Monte Carlo compartmental modelling by sampling from the Exponential distribution to simulate the time between state transitions.  Several examples will be given, including modelling of Aedes aegypti populations, the vector for Zika virus.]

In this past module, we discussed the Exponential and Poisson distributions.  In particular we noted the duality of these two distributions; the Exponential distribution describes the time between events in a Poisson process.

In this past module, we discussed how to perform simple Markov Chain Monte Carlo stochastic modelling of a compartmental model, by sampling at each small time step the change in compartments from the Poisson distribution.

Because of the duality of the Exponential and Poisson distributions we can just as easily perform MCMC stochastic compartmental modelling randomly Exponentially sampling the time between state transitions.

In order to do this, we make use of this special property of the Exponential distribution: If X~Exp(gamma) and Y~Exp(rho), then min(X,Y) is distributed as Exp(lambda) with lambda=gamma+rho
The proof of this property can be found here.

Thus, for example, given a model with two possible state transitions, with expected rate gamma and rho, respectively, we expect the rate for at least one of the transitions to take place to be gamma+rho.  Thus, we would randomly sample X~Exp(gamma) and Y~Exp(rho), and whichever is smallest is the state transition we allow to take place, and we increment the time by delta_t = min(X,Y).

Example: SIR model

Take, for example an SIR model:

The list of state transitions is:

Thus, infection is expected to occur at rate beta*S*I/N, and recovery is expected to occur at rate gamma*I.  Thus, we would randomly sample X~Exp(beta*S*I/N) and Y~Exp(gamma*I); if X is less than Y then an infection takes place, and we increment time by delta_t=X.  If Y is less than X, then a recovery takes place, and delta_t=Y.

The R script sir_markov_chain_iterated_exponential.R performs an MCMC stochastic simulation of this system, with options to sample from the Exponential distribution (lexponential=1 in the script), or the Poisson distribution (lexponential=0).  The script produces the plot:

Try running the script again with lexponential=0 to see how the plot changes.

Example: SEIR Model

The Susceptible, Exposed, Infected, Recovered model is described by the set of differential equations:

The table of possible transitions is

In this case, at each step of the simulation we randomly sample X~Exp(beta*S*I/N), Y~Exp(kappa*E), and W~Exp(gamma*I), and find the minimum, and select that as the transition that occurred.

The script seir_markov_chain_iterated_exponential.R performs 50 realisations of a MCMC stochastic SEIR model, sampling from the Exponential probability distribution to simulate the time between state transitions. The script produces the following plot:

Another example: Wolbachia in mosquito populations as a means to control Dengue, Chikungunya, and Zika virus outbreaks

Dengue virus is a mosquito borne arbovirus, primarily spread between humans by Aedes aegypti, a tropical/semi-tropical mosquito.  Because of the wide range of the mosquito across the world, approximately one third to half of the world’s population is at risk for DENV.

DENV typically causes a flu-like illness, but in some cases can cause haemorrhagic fever and death.  There are four serotypes of the virus, and infection with one will confer immunity to it, but not the other three.  Thus it is possible to get DENV several times.  There are currently no vaccines or specific treatment for DENV.

DENV does not spread directly from human-to-human.  Instead, a mosquito bites an infected person, and after a period of time (known as the extrinsic incubation period) the virus replicates in the mosquito’s gut, such that when it takes another blood meal it infects another person.

Chikungunya virus, CHIKV, is also spread Aedes aegypti mosquitoes.  Unlike DENV it only has one known serotype, so recovery from CHIKV confers lasting immunity.  However, new susceptible babies are always being born into the population, thus periodic outbreaks can occur when enough susceptibles build up, and an infected person is introduced to the population.  CHIKV rarely kills, but can cause crippling sequelae, including severe arthritis.  There is currently no vaccine or specific treatment for CHIKV.  In 2014/15 there was an explosive outbreak of CHIKV in the Americas; the entire population was susceptible at the beginning of the outbreak.

Zika virus is also spread by Aedes.  Like CHIKV, there is only one serotype, so recovery results in lasting immunity. There is currently no vaccine or specific treatment for Zika. Compared to DENV and CHIKV, Zika results in the most mild infection, and almost never kills.  In fact, it has been estimated that 80% of cases are asymptomatic.  However, there is recent evidence from Brazil and other countries that strongly suggests that infection with Zika virus early in pregnancy can cause significant birth defects in the fetus.  Thus, the most telling evidence of a large Zika outbreak is not the burden on the healthcare system of those who are actively infected, but the babies born with defects 6 to 9 months later.

It is for this reason that the WHO has declared a global health emergency due to Zika at the beginning of February, 2016.  Particularly worrisome is the fact that the 2016 summer Olympics are to be held in Brazil, meaning that there will likely be a huge influx of largely susceptible people to the region, and highly elevated probabilities of worldwide dispersal of the virus.  The only bright side is that the peak season for Aedes aegypti in Brazil is January to May...

It is worth noting here that Zika comes under the category of “neglected tropical diseases”, which are diseases primarily affecting low-income countries, and get very little research funding.  Zika has been so neglected, that very little study has been done of its epidemiology, including its incubation and infectious period, and the extrinsic incubation period in mosquitoes.

Currently, the only known means to control the spread of these viruses is to either prevent the mosquitoes from biting by wearing long sleeves or repellents (bed nets are not much use because Aedes primarily bite during the day), or to control the mosquito populations through the use of larvacides, insecticides, or removing breeding containers.  Almost complete eradication of Aedes aegypti in the Americas was achieved in the 1960′s with the latter methods by a man named Fred Soper.  Between the 1930′s to the 1960′s using DDT, and exhaustive house breeding-site searches, he managed to virtually eliminate Aedes aegypti throughout the Americas, and with it Dengue and Yellow Fever.

Aedes aegypti are “domestic” mosquitoes.  They had evolved along with humans, and prefer to breed in the nice clean water we store in tanks in our houses rather than in water in rivers or ponds.  We are also their favourite snack.  A tank of water can only hold so many larvae, however, because they are cannabilistic.  Perhaps largely due to this, the number of Aedes aegypti per household is usually quite modest… usually just a few per person. Note that only female mosquitoes bite… they need a blood meal in order to lay eggs.

It is believed that the spread of diseases like DENV, CHIKV and Zika is less due to dispersion of mosquitoes, and more due to the movement of humans between patches of mosquitoes.

As an aside, and related to dispersal distances; why do you think Aedes are more prevalent in cities?

A novel means of Aedes control involves Wolbachia bacteria. Wolbachia are bacteria that live within insect cells and are passed from one generation to the next through the insect’s eggs. Wolbachia is present in up to 60% of all the different species of insects around us including some mosquitoes that bite people, but not Aedes aegypti – the primary mosquito species involved in the transmission of DENV, CHIKV, and Zika.

We can produce Wolbachia-infected Aedes aegypti mosquitoes artificially in the laboratory (let’s refer to them as Wolbachia mosquitoes).  It turns out that Wolbachia mosquitoes have some interesting properties…. infection with Wolbachia impedes the the ability of DENV (and presumably CHIKV, and Zika) to replicate in the tissues of the mosquitoes.

Also, when a male mosquito infected with Wolbachia mates with a wild female mosquito, the wild female will not produce viable offspring.  When Wolbachia female mosquitoes mate with either Wolbachia males or wild males, they produce Wolbachia mosquitoes.

There is a hitch though; infection with Wolbachia appears to make Aedes mosquitoes live shorter lives. Also, it appears that their larvae aren’t quite as cannabilistic as wild larvaeAlso, their eggs are more likely to die if dessicated. This reduced fitness is almost certainly why we don’t see Wolbachia mosquitoes naturally in the wild.

However, the two very nice properties of Wolbachia mosquitoes (suppression of DENV and reduction in viable progeny of wild females) are so hopeful that the Bill and Melinda Gates Foundation has invested large amounts of money to examine the potential further.  The effect on wild populations of releases of all male Wolbachia mosquitoes have been studied, as well as the effect of releases of both male and female Wolbachia mosquitoes.  Note that the problem with releasing females is that even though they aren’t good transmitters of DENV, they still bite, and that annoys the populace.

Researchers in Australia did a test project where they did 10 releases, over 10 weeks, of male and female Wolbachia mosquitoes in the areas of about 1200 houses in Cairns, Australia.  Each week, approximately 20 to 25 mosquitoes per house were released.  Before the releases began, the researchers went door to door and got people to empty all sources of water in their yards.

As an aside here, Wolbachia isn’t always a good thing when it comes to human diseases; the disease filariasis is spread by a parastic worm that depend on Wolbachia to grow.  Controlling that disease requires reducing Wolbachia in the worms.

If we wish to use mathematical models to predict what will happen when Wolbachia mosquitoes are released, we need to take into account the inherently patchy nature of mosquito populations.  There may be, for instance, 50,000 houses in a city, with on average 10 wild Aedes per house.  But that doesn’t mean that every house has Aedes, and it certainly doesn’t mean that all 500,000 mosquitoes mix together and mate.  They are much more likely to mate with mosquitoes in near geographic proximity.  And things like roads, walls around houses, or industrial areas, can form boundaries of their patch that are not impossible to cross, but definitely represent barriers that impede their dispersal.

Because there are so few mosquitoes per house, and geographic patches like a city block or clusters of houses aren’t that large, stochastic models are appropriate. In the R file wolbach.R, I provide an example of a stochastic MCMC model for populations of mixed wild and Wolbachia mosquitoes.  The model I use has to take into account that there may be unequal numbers of males and females, particularly in the Wolbachia compartment, and that Wolbachia males and wild females cannot produce viable offspring.  The model I use in the script is (note the X’s are the wild mosquitoes, and the W’s are the Wolbachia mosquitoes):

Where A_Xf is the number of babies in the X compartment due to pairings of Xf, A_Xm is the number of babies in the X compartment due to pairings of Xm, and so on….
And where B_Xf is the number of babies in the W compartment due to pairings of Xf, B_Xm is the number of babies in the W compartment due to pairings of Xm, and so on….
Thus:

Where sigma is the fecundity of the X females, and rho is the fecundity of the W females.

A model similar to this one can be found here (warning: there are many typos in their model formulation).

In the wolbach.R script, I run the simulation for 90 days (about the length of a rainy season), and see how many X’s and W’s remain in the population.  With this model, we can examine the effect of releasing just male Wolbachia into the wild population, or a mix of male and female.  Running 100 realisations with an initial wild population of 100, and a Wolbachia population of 100 equally mixed between male and female, produces the following plot:

Questions: what are the assumptions of the model? How can we make it more realistic?

Visits: 1800

# Stochastic compartmental modelling with Stochastic Differential Equations

[In this module, students will become familiar with stochastic modelling with Stochastic Differential Equations for compartmental models and how they compare and contrast with Markov Chain Monte Carlo Methods.]

A good, but much more mathematical, introduction to the material discussed here can be found in the paper Construction of Equivalent Stochastic Differential Equation Models by Allen et al (2008).

Visits: 465