Optimizing model parameters to data (aka inverse problems)

[This presentation discusses methods commonly used to optimize the parameters of a mathematical model for population or disease dynamics to pertinent data.  Parameter optimization of such models is complicated by the fact that they usually have no analytic solution and must instead be solved numerically.  Choice of an appropriate "goodness of fit" statistic will be discussed, as will the benefits and drawbacks of various fitting methods such as gradient descent, Markov Chain Monte Carlo, and Latin Hypercube and random sampling.  An example of the application of some of the methods, using simulated data from a simple model for the incidence of a disease in a human population, will be presented.]

Least Squares and Weighted Least Squares

[This is part of a series of modules on optimization methods]

The simplest, and an often used, figure of merit for goodness of fit is the Least Squares statistic (aka Residual Sum of Squares), in which the model parameters are chosen to minimize the sum of squared differences between the model predictions and the data.  For N data points, Y^data_i (where i=1,…,N), and model predictions at those points, Y^model_i (which depend on the model parameters), the statistic is calculated as:

 LS = \sum_{i=1}^{N} \left( Y^{\rm data}_i - Y^{\rm model}_i \right)^2
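
As a minimal sketch, the sum of squared differences between the data and the model predictions can be computed in Python as follows.  The function name least_squares and the example numbers are hypothetical, chosen only for illustration; in practice the model predictions would come from a numerical solution of the model for a given set of parameters.

```python
def least_squares(y_data, y_model):
    """Sum of squared differences between the data and the model predictions."""
    if len(y_data) != len(y_model):
        raise ValueError("data and model predictions must have the same length")
    return sum((d - m) ** 2 for d, m in zip(y_data, y_model))


# Hypothetical illustration values (not taken from any real data set)
y_data = [3.0, 5.0, 9.0]
y_model = [2.5, 5.5, 8.0]
ls_value = least_squares(y_data, y_model)
```

The parameters that give the smallest value of this statistic are the Least Squares best-fit parameters.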

Poisson likelihood

When dealing with count data (for instance, the number of new cases of some disease per unit time, or the number of individuals that enter or leave a population due to death, birth or migration), the Poisson likelihood statistic is often used to assess how well a particular model describes the data.  It is especially appropriate when there are just a few counts per bin, a situation in which the Least Squares method is inappropriate.  The Poisson likelihood statistic can also be applied when some of the data bins have zero counts.
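
The following is a minimal sketch of a Poisson negative log-likelihood statistic, assuming the model supplies a strictly positive expected count for each bin.  The function name and the example counts are hypothetical.  Note that a bin with zero observed counts poses no problem, because the term involving the logarithm is multiplied by the observed count.

```python
import math


def poisson_neg_log_likelihood(counts, expected):
    """Negative log-likelihood of observed counts given model-expected counts.

    Each bin contributes lam - k*log(lam) + log(k!), where k is the observed
    count and lam is the model expectation for that bin.
    """
    nll = 0.0
    for k, lam in zip(counts, expected):
        if lam <= 0:
            raise ValueError("expected counts must be strictly positive")
        nll += lam - k * math.log(lam) + math.lgamma(k + 1)
    return nll


# Hypothetical data with few counts per bin, including a bin with zero counts
counts = [0, 2, 1, 4]
nll_good = poisson_neg_log_likelihood(counts, [0.5, 2.0, 1.5, 3.5])
nll_poor = poisson_neg_log_likelihood(counts, [5.0, 5.0, 5.0, 5.0])
```

Smaller values indicate a better description of the data, so here the first set of model expectations is preferred over the second.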

Negative Binomial likelihood

The Wikipedia pages for almost all probability distributions are excellent and very comprehensive (see, for instance, the page on the Normal distribution).  The Negative Binomial distribution is one of the few exceptions: for applications to epidemic/biological system modelling, I do not recommend reading the associated Wikipedia page.  Instead, one of the best sources of information on the applicability of this distribution to epidemiology/population biology is this PLoS paper on the subject: Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases.

Binomial likelihood

The Binomial distribution describes the probability of getting k successes in n trials, if the probability of success on each trial is p.  This distribution is appropriate for prevalence data where you know you had k positive results out of n samples.  Your model estimates the probability of success, p, which depends on the model parameters.

 f(k;n,p) = \Pr(X = k) = {n\choose k}p^k(1-p)^{n-k}
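
As a minimal sketch, the Binomial probability and the corresponding negative log-likelihood for k positive results out of n samples can be written in Python as below.  The function names and the example values are hypothetical, and p is assumed to lie strictly between 0 and 1.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_neg_log_likelihood(k, n, p):
    """Negative log of the Binomial probability, computed on the log scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_choose + k * math.log(p) + (n - k) * math.log(1 - p))


# Hypothetical prevalence data: 3 positive results out of 10 samples
nll_at_03 = binomial_neg_log_likelihood(3, 10, 0.3)
nll_at_05 = binomial_neg_log_likelihood(3, 10, 0.5)
```

A model that predicts p = 0.3 gives a smaller negative log-likelihood for these data than one that predicts p = 0.5, so it is the better description of the two.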

Gradient descent parameter optimization method

Once we have a figure-of-merit statistic for how well our model fits our data (for instance, a Least Squares or a Poisson negative log-likelihood statistic), we want to find the model parameters that minimize that statistic.  Let's call this goodness-of-fit statistic S(theta), where theta is the vector of parameters upon which our model depends (and thus on which the goodness-of-fit statistic depends).
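
To make the idea concrete, here is a minimal sketch of gradient descent on S(theta).  It assumes the gradient is not available analytically and estimates it with central finite differences, which costs two evaluations of S per parameter at every step.  The function names, the fixed step size, and the quadratic stand-in for S are all hypothetical choices for illustration.

```python
def numerical_gradient(S, theta, h=1e-6):
    """Central finite-difference estimate of the gradient of S at theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grad.append((S(up) - S(down)) / (2.0 * h))
    return grad


def gradient_descent(S, theta0, step=0.1, n_iter=200):
    """Repeatedly move theta a fixed fraction of the way down the gradient."""
    theta = list(theta0)
    for _ in range(n_iter):
        grad = numerical_gradient(S, theta)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


def S_example(theta):
    """Hypothetical stand-in for a GoF statistic, minimized at (1.0, -0.5)."""
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 0.5) ** 2


theta_hat = gradient_descent(S_example, [0.0, 0.0])
```

For a real model, each evaluation of S requires a numerical solution of the model, so the cost of estimating the gradient grows quickly with the number of parameters.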

Simplex method

As we described in this module, gradient descent methods are commonly used to find the minimum of multivariate functions (for instance, the goodness-of-fit statistic, which depends on the model, its parameters, and the data), but have limited applicability to model parameter fitting in mathematical epidemiology/biology due to the necessity of calculating the gradient at each step in the algorithm.

Simplex methods are an alternative class of optimization methods for goodness-of-fit (GoF) statistics; they do not require calculation of a gradient, just of the GoF statistic itself.
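
The sketch below illustrates the idea with a simplified variant of the Nelder-Mead downhill simplex algorithm (an assumption on my part about which simplex method is meant here).  It uses only reflection, expansion, inside contraction and shrink steps, and it only ever evaluates the GoF statistic itself.  The function names, the coefficients, and the quadratic stand-in for the GoF statistic are hypothetical.

```python
def nelder_mead(S, theta0, step=0.5, n_iter=500):
    """Simplified Nelder-Mead minimization of S, using only values of S."""
    n = len(theta0)
    # Initial simplex: the start point plus one point offset along each axis
    simplex = [list(theta0)]
    for j in range(n):
        point = list(theta0)
        point[j] += step
        simplex.append(point)

    for _ in range(n_iter):
        simplex.sort(key=S)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every point except the worst one
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            """Point at centroid + t * (centroid - worst)."""
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if S(reflected) < S(best):
            expanded = along(2.0)
            simplex[-1] = expanded if S(expanded) < S(reflected) else reflected
        elif S(reflected) < S(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if S(contracted) < S(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other point halfway toward the best point
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=S)


def S_example(theta):
    """Hypothetical stand-in for a GoF statistic, minimized at (1.0, -0.5)."""
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 0.5) ** 2


theta_hat = nelder_mead(S_example, [0.0, 0.0])
```

A production fit would normally add a convergence criterion rather than a fixed number of iterations, and would likely rely on a tested library implementation.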

Latin hypercube and random sampling parameter optimization method

Latin hypercube and random sampling for model parameter optimization are computationally intensive algorithms that nonetheless have several advantages over other optimization methods for application to problems in mathematical epidemiology/biology.  One of the main advantages is that the algorithm (particularly random sampling) is easily parallelizable to make use of distributed computing resources.
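
As a minimal sketch of the idea, the code below draws a Latin hypercube sample of parameter sets, evaluates a goodness-of-fit statistic at each one, and keeps the best.  The function names, the parameter ranges, the random seed, and the quadratic stand-in for the GoF statistic are hypothetical.  Each evaluation is independent of the others, which is why the work is easy to split across many processors.

```python
import random


def latin_hypercube(bounds, n_samples, rng):
    """Draw n_samples parameter sets; bounds is a list of (low, high) pairs.

    Each parameter range is cut into n_samples equal-width strata, one value
    is drawn uniformly inside each stratum, and the values are then shuffled
    independently for each parameter.
    """
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]


def best_sample(S, samples):
    """Return the sampled parameter set with the smallest value of S."""
    return min(samples, key=S)


def S_example(theta):
    """Hypothetical stand-in for a GoF statistic, minimized at (1.0, -0.5)."""
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 0.5) ** 2


rng = random.Random(1)  # fixed seed so the sketch is reproducible
bounds = [(-2.0, 2.0), (-2.0, 2.0)]
samples = latin_hypercube(bounds, 200, rng)
theta_best = best_sample(S_example, samples)
```

Plain random sampling corresponds to replacing the stratified draw with an independent uniform draw over each full parameter range.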
