Binomial likelihood

[This is part of a series of modules on optimization methods]

The Binomial distribution describes the probability of getting k successes in n independent trials when the probability of success at each trial is p.  It is appropriate for prevalence data, where you know you had k positive results out of n samples. Your model estimates the probability of success, p, which depends on the model parameters. The probability of observing exactly k successes is

$f(k;n,p) = \Pr(X = k) = {n\choose k}p^k(1-p)^{n-k}$
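As a quick numerical check of the formula above, here is a minimal Python sketch that evaluates the Binomial probability directly. The function name `binomial_pmf` and the example numbers (3 positives out of 10 samples, p = 0.5) are assumptions chosen for illustration only; they do not come from any real data set.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 3 positive results out of 10 samples with p = 0.5
print(binomial_pmf(3, 10, 0.5))  # 120 / 1024 = 0.1171875
```

Summing `binomial_pmf(k, n, p)` over k = 0,…,n should give 1, which is a handy sanity check when writing your own version.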

If we have a set of N data samples k_i and n_i (where i=1,…,N) and model estimates p_i at those points, the overall likelihood is

$\mathcal{L} = \prod_{i=1}^{N} {n_i\choose k_i}p_i^{k_i}(1-p_i)^{n_i-k_i}$

and the negative log-likelihood is (note that we can ignore the n_i choose k_i term because it only depends on the data, not the model):

$-\log\mathcal{L} = -\sum_{i=1}^{N}\left[k_i\log p_i + (n_i-k_i)\log(1-p_i)\right]$

An example of an appropriate use of the Binomial likelihood:  I have a time series of the number of tests for avian influenza performed on chicken flocks each month, and the number of those tests that were positive.  In this case, I could use an SIR model to predict the prevalence, I (i.e., the probability of a positive test, p), and this prediction depends on my model parameters, such as the transmission rate and/or recovery rate.  The best-fit transmission rate and recovery rate minimize the Binomial negative log-likelihood.
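The Binomial negative log-likelihood described above, with the n_i choose k_i term dropped, can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the function `binomial_neg_log_likelihood`, the made-up monthly test counts, and the toy "model", which is just a single constant prevalence p scanned over a grid rather than an SIR model.

```python
import math

def binomial_neg_log_likelihood(k, n, p):
    """Sum over data points of -[k_i*log(p_i) + (n_i - k_i)*log(1 - p_i)].
    The n_i-choose-k_i term is dropped because it does not depend on the model."""
    total = 0.0
    for k_i, n_i, p_i in zip(k, n, p):
        # Keep p_i strictly inside (0, 1) so the logs stay finite
        # (a choice made for this sketch, not part of the likelihood itself)
        p_i = min(max(p_i, 1e-12), 1.0 - 1e-12)
        total -= k_i * math.log(p_i) + (n_i - k_i) * math.log(1.0 - p_i)
    return total

# Made-up illustrative data: k_i positive tests out of n_i tests in four months
k = [3, 5, 9, 4]
n = [50, 60, 80, 55]

# Toy model: one constant prevalence for every month, scanned over a grid
grid = [j / 1000 for j in range(1, 1000)]
best_p = min(grid, key=lambda p: binomial_neg_log_likelihood(k, n, [p] * len(k)))

print(best_p)           # grid value closest to the minimum
print(sum(k) / sum(n))  # for a constant p, the exact minimizer is sum(k)/sum(n)
```

For a real application, the list `p` would instead hold the model-predicted prevalence at each time point, and the grid scan would be replaced by a search over the model parameters (for example, the transmission and recovery rates).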