Polymatheia

Graphical parameter optimisation: Latin hypercube sampling

Posted on July 16, 2014 by Sherry Towers

[This is part of a series of modules on optimisation methods]

Latin hypercube sampling is a computationally intensive method of model parameter optimisation that nonetheless has several advantages over other optimisation methods for problems in mathematical epidemiology/biology, including that it doesn’t require computation of a gradient.
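To make the idea concrete, here is a minimal sketch of Latin hypercube sampling used for parameter optimisation. The function names `latin_hypercube` and `best_fit`, the `bounds` argument, and the choice of one uniform draw per stratum are assumptions made for illustration; they are not taken from the module itself.

```python
import random


def latin_hypercube(n_samples, bounds, rng=None):
    """Sample n_samples points so that, for every parameter, each of the
    n_samples equal-width strata of its range is used exactly once."""
    if rng is None:
        rng = random.Random()
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum of this parameter's range.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so the strata are paired at random across parameters.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def best_fit(gof, n_samples, bounds, rng=None):
    """Evaluate the goodness-of-fit function at every sampled point and
    return (best_point, best_value). No gradient is needed."""
    points = latin_hypercube(n_samples, bounds, rng)
    return min(((p, gof(p)) for p in points), key=lambda pair: pair[1])
```

Every evaluation of `gof` typically means solving the model numerically, which is where the computational cost comes from.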

Continue reading →

Markov Chain Monte Carlo parameter optimization method

Posted on July 15, 2014 by Sherry Towers

In this module we will discuss the Markov Chain Monte Carlo method for finding the best-fit parameters of a mathematical model when fitting the model predictions to a source of data.

Note that “Markov Chain” is an adjective that is applicable to a variety of contexts. “Markov” processes assume that the predictions for the future state of a system depend solely on the current state of the system. “Chain” means that you do many iterations of the Monte Carlo random sampling (i.e., a “chain” of iterations). In Markov Chain Monte Carlo, the distributions from which you sample depend on the current state of the system.
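As a rough illustration of sampling from a distribution that depends on the current state, here is a sketch of a random-walk Metropolis sampler, one common MCMC variant. The Gaussian proposal, the function name `metropolis`, and the `step_sizes` argument are assumptions for this example rather than the specific algorithm of this module.

```python
import math
import random


def metropolis(neg_log_like, start, step_sizes, n_iter, rng=None):
    """Random-walk Metropolis; returns (chain, best_point, best_nll)."""
    if rng is None:
        rng = random.Random()
    current = list(start)
    current_nll = neg_log_like(current)
    best, best_nll = list(current), current_nll
    chain = []
    for _ in range(n_iter):
        # The proposal is centred on the current state ("Markov" property).
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step_sizes)]
        proposal_nll = neg_log_like(proposal)
        # Accept with probability min(1, L_proposal / L_current).
        delta = current_nll - proposal_nll
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_nll = proposal, proposal_nll
            if current_nll < best_nll:
                best, best_nll = list(current), current_nll
        # Each iteration adds one link to the "chain".
        chain.append(list(current))
    return chain, best, best_nll
```

The best point visited gives an estimate of the best-fit parameters, while the spread of the chain reflects their uncertainty.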

Markov Chain Monte Carlo methods can also be used (among other things) for stochastic compartmental modelling, but that is not what this module discusses (see here instead). In this module we only talk about the MCMC method as specifically applied to parameter optimisation problems. Graduate students are often confused when they read literature describing an analysis that employs Markov Chain methods and assume that it means compartmental modelling with Markov Chains.

Continue reading →

Simplex method

Posted on July 14, 2014 by Sherry Towers

[This is part of a series of modules on optimization methods]

As we described in this module, gradient descent methods are commonly used to find the minimum of multivariate functions (for instance, the goodness-of-fit statistic, which depends on the model, and its parameters, and the data), but have limited applicability to model parameter fitting in mathematical epidemiology/biology due to the necessity of calculating the gradient at each step in the algorithm.

Simplex methods are an alternative way to optimize a goodness-of-fit (GoF) statistic: they do not require calculation of a gradient, just evaluation of the GoF statistic itself.
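For a feel of how a gradient-free simplex search works, here is a compact sketch in the style of the Nelder-Mead algorithm. The reflection, expansion, contraction and shrink coefficients (1, 2, 0.5, 0.5), the stopping rule, and the name `nelder_mead` are common textbook choices assumed here, not necessarily those used in the module.

```python
def nelder_mead(gof, start, step=0.5, tol=1e-8, max_iter=2000):
    """Minimise gof(theta) using only function values (no gradient)."""
    n = len(start)
    # Initial simplex: the start point plus one vertex offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    values = [gof(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest GoF) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        best, worst = simplex[0], simplex[-1]
        # Stop once the simplex has collapsed to a very small size.
        if max(abs(a - b) for v in simplex[1:] for a, b in zip(v, best)) < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = gof(reflected)
        if f_r < values[0]:
            # Reflection is the new best point: try going further (expansion).
            expanded = along(2.0)
            f_e = gof(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Contract toward the better of the worst and reflected points.
            contracted = along(0.5 if f_r < values[-1] else -0.5)
            f_c = gof(contracted)
            if f_c < min(f_r, values[-1]):
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [gof(v) for v in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]
```

Note that the only thing ever asked of the model is the value of the GoF statistic at a trial point.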

Continue reading →

Gradient descent parameter optimization method

Posted on July 13, 2014 by Sherry Towers

[This is part of a series of modules on optimization methods]

Once you have a figure-of-merit statistic for how well your model fits your data (for instance, a Least Squares or a Poisson negative log-likelihood statistic), you want to find the model parameters that minimize that statistic. Let’s call this goodness-of-fit statistic S(theta), where theta is the vector of parameters upon which the model depends (and thus on which the goodness-of-fit statistic depends).
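Here is a minimal sketch of minimizing S(theta) by gradient descent. Because models in this field usually have no analytic gradient, the sketch assumes the gradient is approximated by central finite differences; the fixed `learning_rate`, the step `h`, and the function names are illustrative assumptions.

```python
def numerical_gradient(S, theta, h=1e-6):
    """Approximate the gradient of S at theta by central differences."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        # Two evaluations of S (two model solutions) per parameter.
        grad.append((S(up) - S(down)) / (2.0 * h))
    return grad


def gradient_descent(S, theta0, learning_rate=0.1, n_steps=500):
    """Repeatedly step downhill along the negative gradient of S."""
    theta = list(theta0)
    for _ in range(n_steps):
        grad = numerical_gradient(S, theta)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta
```

A fixed learning rate is the simplest choice; it has to be tuned to the problem, since too large a value makes the iteration diverge.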

Continue reading →

Binomial likelihood

Posted on July 12, 2014 by Sherry Towers

[This is part of a series of modules on optimization methods]

The Binomial distribution is the probability distribution that describes the probability of getting k successes in n trials, if the probability of success at each trial is p. This distribution is appropriate for prevalence data where you know you had k positive results out of n samples. Your model estimates the probability of success, p, which depends on the model parameters.

$f(k;n,p) = \Pr(X = k) = {n\choose k}p^k(1-p)^{n-k}$

Continue reading →
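The Binomial probability above turns into a goodness-of-fit statistic by taking its negative logarithm. A small sketch, assuming the model supplies p and using log-gamma functions for the binomial coefficient to avoid overflow (the function name is hypothetical):

```python
import math


def binomial_neg_log_likelihood(k, n, p):
    """Negative log of the Binomial probability of k successes in n trials."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # log of (n choose k), computed with log-gamma functions.
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p))
```

For several independent samples, the statistic to minimize is the sum of this quantity over the samples.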

Negative Binomial likelihood

Posted on July 11, 2014 by Sherry Towers

[This is part of a series of modules on optimization methods]

The Wikipedia pages for almost all probability distributions are excellent and very comprehensive (see, for instance, the page on the Normal distribution). The Negative Binomial distribution is one of the few distributions for which (for application to epidemic/biological system modelling) I do not recommend reading the associated Wikipedia page. Instead, one of the best sources of information on the applicability of this distribution to epidemiology/population biology is this PLoS paper on the subject: Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases.
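As a sketch, here is the Negative Binomial log-probability written in terms of a mean and a dispersion parameter k (variance = mean + mean^2/k), which is the form commonly used for overdispersed count data. The assumption that this is the parameterisation intended here, and the function name, are mine.

```python
import math


def negbinom_log_pmf(x, mean, k):
    """Log-probability of observing count x, given the mean and dispersion k."""
    if mean <= 0.0 or k <= 0.0:
        raise ValueError("mean and k must be positive")
    return (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
            + k * math.log(k / (k + mean))
            + x * math.log(mean / (k + mean)))
```

Small k means strong overdispersion; as k grows large the distribution approaches a Poisson with the same mean.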

Continue reading →

Poisson likelihood

Posted on July 10, 2014 by Sherry Towers

[This is part of a series of modules on optimization methods]

When dealing with count data (for instance, the number of new cases of some disease per unit time, or the number of individuals that enter or leave a population due to death, birth or migration), the Poisson likelihood statistic is often used to assess how well a particular model describes the data. It is especially appropriate when there are just a few counts per bin, because in that case the Least Squares method is inappropriate. The Poisson likelihood statistic can in fact be applied to cases where some of the data bins have zero counts.
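A minimal sketch of the Poisson negative log-likelihood summed over data bins, assuming the model predicts a strictly positive expected count in every bin (the function name is hypothetical). Bins with zero observed counts pose no problem, since the y*log(mu) term is then simply zero.

```python
import math


def poisson_neg_log_likelihood(observed, predicted):
    """Sum over bins of -log Poisson(observed count | predicted mean)."""
    total = 0.0
    for y, mu in zip(observed, predicted):
        if mu <= 0.0:
            raise ValueError("predicted counts must be positive")
        # log P(y | mu) = y*log(mu) - mu - log(y!)
        total -= y * math.log(mu) - mu - math.lgamma(y + 1)
    return total
```

The log(y!) term does not depend on the model parameters, so it can be dropped when only the location of the minimum matters.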

Continue reading →

Least Squares and Weighted Least Squares

Posted on July 9, 2014 by Sherry Towers

[This is part of a series of modules on optimization methods]

The simplest figure of merit for goodness of fit, and one that is often used, is the Least Squares statistic (aka Residual Sum of Squares), wherein the model parameters are chosen to minimize the sum of squared differences between the model prediction and the data. For N data points, Y^data_i (where i=1,…,N), and model predictions at those points, Y^model_i, the statistic is calculated as (note that the model prediction depends on the model parameters):

Continue reading →
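As a sketch of the two statistics named in the title: the Least Squares statistic is the plain sum of squared residuals, and the weighted version shown here assumes each residual is divided by a per-point uncertainty `sigma`, which is one common weighting choice and not necessarily the one used in the module.

```python
def least_squares(y_data, y_model):
    """Residual sum of squares between the data and the model predictions."""
    return sum((d - m) ** 2 for d, m in zip(y_data, y_model))


def weighted_least_squares(y_data, y_model, sigma):
    """Sum of squared residuals, each scaled by its uncertainty sigma_i."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(y_data, y_model, sigma))
```

In a fit, `y_model` is recomputed from the model for each trial set of parameters, and the parameters giving the smallest statistic are kept.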

Optimizing model parameters to data (aka inverse problems)

Posted on July 1, 2014 by Sherry Towers

[This presentation discusses methods commonly used to optimize the parameters of a mathematical model for population or disease dynamics to pertinent data. Parameter optimization of such models is complicated by the fact that usually they have no analytic solution, but instead must be solved numerically. Choice of an appropriate “goodness of fit” statistic will be discussed, as will the benefits and drawbacks of various fitting methods such as gradient descent, Markov Chain Monte Carlo, and Latin Hypercube and random sampling. An example of the application of some of the methods using simulated data from a simple model for the incidence of a disease in a human population will be presented.]

Introduction
Some simulated disease data in a human population
Things to keep in mind when embarking on model parameter optimization
Choice of an appropriate goodness of fit statistic
- Least Squares
- Poisson likelihood
- Negative Binomial likelihood
- Binomial likelihood
Parameter optimization methods to optimize the goodness-of-fit statistic
- Gradient descent
- Simplex method
- Markov Chain Monte Carlo
- Graphical Monte Carlo: Latin hypercube sampling
- Graphical Monte Carlo: uniform random sampling

Continue reading →