**[This is part of a series of modules on optimization methods]**

The Wikipedia pages for almost all probability distributions are excellent and very comprehensive (see, for instance, the page on the Normal distribution). The Negative Binomial distribution is one of the few distributions that (for application to epidemic/biological system modelling), I do **not** recommend reading the associated Wikipedia page. Instead, one of the best sources of information on the applicability of this distribution to epidemiology/population biology is this PLoS paper on the subject: Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases.

As the paper discusses, the Negative Binomial distribution is the distribution that underlies the stochasticity in *over-dispersed* count data. Over-dispersed count data means that the data have a greater degree of stochasticity than what one would expect from the Poisson distribution. In practice, this is frequently the case for count data arising in epidemic or population dynamics due to randomness in population movements or contact rates, and/or deficiencies in the model in capturing all intricacies of the population dynamics.

Recall that for count data with underlying stochasticity described by the Poisson distribution that the mean is mu=lambda, and the variance is sigma^2=lambda. In the case of the Negative Binomial distribution, the mean and variance are expressed in terms of two parameters, mu and alpha (note that in the PLoS paper above, m=mu, and k=1/alpha); the mean of the Negative Binomial distribution is mu=mu, and the variance is sigma^2=mu+alpha*mu^2. Notice that when alpha>0, the variance of the Negative Binomial distribution is always greater than the variance of the Poisson distribution.

There are several other formulations of the Negative Binomial distribution, but this is the one I’ve always seen used so far in analyses of biological and epidemic count data.

The probability of observing X counts with the Negative binomial distribution is (with m=mu and k=1/alpha):

Recall that m is the model prediction, and depends on the model parameters.

If we had N data points, we would take the product of the probabilities to get the overall likelihood for the model, and the best-fit parameters maximize this statistic. And just like we discussed with the Poisson likelihood, the negative of the sum of the logs of the individual probabilities (the negative log likelihood) is the statistic that is usually used, and minimized to determine the best-fit model parameters.

The drawback of the Negative Binomial likelihood statistic is that it is very long and messy to write down, and tends to be rather daunting to people new to likelihood methods. Use of the Poisson likelihood will generally give unbiased estimates of the model parameters…. the estimate of the parameter uncertainties will just be too low.