Example problems involving hypothesis testing
Monthly Archives: September 2013
Exploratory Data Analysis: examples
Exploratory data analysis is essentially the process of getting to know your data by making plots and perhaps running some simple statistical hypothesis tests. Getting to know your data is important before starting regression analysis or any more advanced hypothesis testing because, more often than not, real data have “issues” that complicate statistical analyses.
Hypothesis testing of count data (flowchart)
This page describes how to determine if count data are statistically consistent with some value. Count data are data counted in bins of some time span, for instance the number of influenza cases per day, or the number of murders per year.
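As a rough illustration of what "statistically consistent with some value" can mean for counts, the sketch below assumes the counts in a bin are Poisson distributed (an assumption made here, not one stated in the excerpt) and computes the one-sided probability of seeing a count at least as large as the one observed. The function names and the example numbers (18 cases observed when 10 are expected) are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(k_obs, lam):
    """One-sided p-value P(X >= k_obs) under a Poisson(lam) null hypothesis."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_obs))

# Hypothetical example: 18 cases observed in a day when 10 are expected.
expected_rate = 10.0
observed_count = 18
p_value = poisson_upper_tail(observed_count, expected_rate)
print(f"P(X >= {observed_count} | lambda = {expected_rate}) = {p_value:.4f}")
```

A small p-value suggests the observed count is higher than the hypothesized rate would plausibly produce. If the counts are over-dispersed relative to the Poisson distribution, this calculation would overstate the significance.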
Correlations, Partial Correlations, and Confounding Variables
In this post we discuss the calculation of the correlation coefficient between two variables, X and Y, and the partial correlation coefficient, which controls for the effect of a potential confounding variable, Z.
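A minimal sketch of both quantities follows. It uses the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)); the function names and the toy data are hypothetical and chosen only so that X and Y are related solely through Z.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data: X and Y both track the confounder Z, plus unrelated wiggles.
z = list(range(10))
wiggle_x = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
wiggle_y = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1]
x = [zi + w for zi, w in zip(z, wiggle_x)]
y = [zi + w for zi, w in zip(z, wiggle_y)]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"correlation of X and Y:        {r_xy:.3f}")
print(f"partial correlation given Z:   {r_xy_given_z:.3f}")
```

In this constructed example the raw correlation is strong while the partial correlation is essentially zero, which is the signature of an association driven by the confounder.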
Protected: Statistical Data Analysis Hall of Shame
Least Squares linear regression
In this module, students will become familiar with least squares linear regression methods. Note that before proceeding with any regression analysis, it is important to first explore the data, both visually, with histograms, boxplots, and scatter plots, and numerically, with summaries of the variables such as the means, standard deviations, maxima and minima, and correlations between variables. In this way, you can determine if there are any unexpected “quirks” or problems with the data (and, more often than not, there are).
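The sketch below shows the numerical half of that workflow on a tiny made-up data set: summary statistics first, then a simple least squares fit of a straight line using the closed-form slope and intercept. The data values and the function name are hypothetical, and the plotting step is omitted.

```python
import statistics

def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Step 1: numerical summaries, to spot quirks before fitting anything.
for name, values in (("x", x), ("y", y)):
    print(f"{name}: mean={statistics.mean(values):.2f} "
          f"sd={statistics.stdev(values):.2f} "
          f"min={min(values)} max={max(values)}")

# Step 2: the least squares fit itself.
slope, intercept = least_squares_line(x, y)
print(f"fit: y = {intercept:.2f} + {slope:.2f} x")
```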
Hypothesis testing of sample means (flowchart)
On this page we give the flowchart for testing means of independent samples. For instance, the set of temperature measurements over a 10 year period for all days in July is essentially independent of the set of temperature measurements over a 10 year period for all days in January. An example of non-independent samples is the measurement of cancer tumor size in 100 patients before and after some cancer treatment; the final tumor size will of course be somewhat (or a lot) correlated with the tumor size at the beginning of treatment.
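To make the independent versus non-independent distinction concrete, the sketch below computes two test statistics on the same made-up before/after measurements: a Welch two-sample t statistic, which treats the two sets as independent, and a paired t statistic, which works on the per-patient differences. The choice of Welch's statistic, the function names, and the data are assumptions for illustration, not necessarily what the flowchart prescribes.

```python
import math
import statistics

def welch_t(a, b):
    """t statistic for two independent samples with unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """t statistic for paired samples, based on per-pair differences."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Made-up tumor sizes for six patients, before and after treatment.
before = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0]
after = [9.0, 11.0, 8.5, 12.0, 10.0, 12.5]

t_independent = welch_t(before, after)
t_paired = paired_t(before, after)
print(f"treating samples as independent: t = {t_independent:.2f}")
print(f"treating samples as paired:      t = {t_paired:.2f}")
```

Because each patient's final size is strongly correlated with their starting size, the paired statistic is much larger here than the independent-samples one: treating correlated measurements as independent discards the information that makes the treatment effect visible.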