In this module, students will become familiar with least squares linear regression methods. Before proceeding with any regression analysis, it is important to first explore the data, both visually (with histograms, boxplots, and scatter plots) and with numerical summaries of the variables, such as means, standard deviations, maxima and minima, and correlations between variables. In this way, you can determine whether there are any unexpected “quirks” or problems with the data (and, more often than not, there are).
In this post we discuss the calculation of the correlation coefficient between two variables, X and Y, and of the partial correlation coefficient, which controls for the effect of a potential confounding variable, Z.
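As a minimal sketch of the idea, the first-order partial correlation can be computed from the three pairwise Pearson correlations as r_XY.Z = (r_XY - r_XZ r_YZ) / sqrt((1 - r_XZ^2)(1 - r_YZ^2)). The function names `pearson` and `partial_corr` and the small data set below are illustrative assumptions, not taken from the post itself.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Made-up illustrative data in which X and Y both track Z
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9]

print("r_XY   =", pearson(x, y))
print("r_XY.Z =", partial_corr(x, y, z))
```

When X and Y are correlated mainly because both depend on Z, the partial correlation is typically much closer to zero than the raw correlation.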
This page describes how to determine if count data are statistically consistent with some value. Count data are data counted in bins of some time span, for instance the number of influenza cases per day, or the number of murders per year.
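One common way to frame such a check, sketched here under the assumption that the counts are Poisson distributed (the page may use a different test), is to compute the probability of seeing a count at least as large as the observed one when the expected value is `lam`. The function names and the example numbers are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(k_obs, lam):
    """One-sided p-value P(K >= k_obs) under Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_obs))

# Hypothetical example: 18 cases observed in a day when 10 are expected
p_value = poisson_upper_tail(18, 10.0)
print("P(K >= 18 | lambda = 10) =", p_value)
```

A small p-value suggests the observed count is larger than would be expected by chance under the hypothesized rate. Real count data are often over-dispersed, in which case a Negative Binomial model may be a better assumption than the Poisson.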
Exploratory data analysis is essentially the process of getting to know your data by making plots and perhaps doing some simple statistical hypothesis tests. Getting to know your data is important before starting regression analysis or any kind of more advanced hypothesis testing because, more often than not, real data will have “issues” that complicate statistical analyses.
Example problems involving hypothesis testing
[In this module, students will become familiar with time series analysis methods, including lagged regression methods, Fourier spectral analysis, harmonic linear regression, and Lomb-Scargle spectral analysis.]
[This presentation discusses methods commonly used to fit the parameters of a mathematical model for population or disease dynamics to pertinent data. Parameter optimization of such models is complicated by the fact that they usually have no analytic solution and must instead be solved numerically. Choice of an appropriate "goodness of fit" statistic will be discussed, as will the benefits and drawbacks of various fitting methods such as gradient descent, Markov Chain Monte Carlo, and Latin hypercube and random sampling. An example applying some of these methods to simulated data from a simple model for the incidence of a disease in a human population will be presented.]
- Some simulated disease data in a human population
- Things to keep in mind when embarking on model parameter optimization
- Choice of an appropriate goodness of fit statistic
  - Least squares
  - Poisson likelihood
  - Negative Binomial likelihood
  - Binomial likelihood
- Parameter optimization methods to optimize the goodness-of-fit statistic
  - Gradient descent
  - Simplex method
  - Markov Chain Monte Carlo
  - Latin hypercube and random sampling
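
To make two of the outline items concrete, here is a minimal sketch that pairs a goodness-of-fit statistic (least squares or the Poisson negative log-likelihood) with uniform random sampling of the parameters. The exponential growth model, the parameter ranges, and the count data are all made-up assumptions for illustration and are not the model or data used in the presentation.

```python
import math
import random

def model(t, a, r):
    """Hypothetical expected count at time t: exponential growth (an assumption)."""
    return a * math.exp(r * t)

def least_squares(obs, pred):
    """Sum of squared differences between observed and predicted counts."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def poisson_neg_log_lik(obs, pred):
    """Poisson negative log-likelihood; every predicted value must be > 0."""
    return sum(p - o * math.log(p) + math.lgamma(o + 1) for o, p in zip(obs, pred))

def random_search(times, obs, stat, n_draws=5000, seed=1):
    """Draw (a, r) uniformly at random and keep the draw that minimizes stat."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_draws):
        a = rng.uniform(1.0, 20.0)  # assumed plausible range for the initial count
        r = rng.uniform(0.0, 1.0)   # assumed plausible range for the growth rate
        value = stat(obs, [model(t, a, r) for t in times])
        if best is None or value < best[0]:
            best = (value, a, r)
    return best

times = list(range(8))
counts = [5, 7, 9, 14, 18, 27, 36, 50]  # made-up incidence counts

print("Poisson fit (stat, a, r):      ", random_search(times, counts, poisson_neg_log_lik))
print("Least squares fit (stat, a, r):", random_search(times, counts, least_squares))
```

Because each draw only requires evaluating the model, the same loop works when the model has no analytic solution and has to be solved numerically. Swapping the `stat` argument changes the goodness-of-fit statistic without touching the sampling code.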