Exploratory Data Analysis: examples

Exploratory data analysis is essentially the process of getting to know your data by making plots and perhaps running some simple statistical hypothesis tests. Getting to know your data is important before starting a regression analysis or any more advanced hypothesis testing because, more often than not, real data have “issues” that complicate statistical analyses.


Least squares linear regression

In this module, students will become familiar with least squares linear regression methods. Before proceeding with any regression analysis, it is important to first explore the data, both visually (with histograms, boxplots, and scatter plots) and with numerical summaries of the variables, such as means, standard deviations, maxima and minima, and correlations between variables. In this way, you can determine whether there are any unexpected “quirks” or problems with the data (and, more often than not, there are).
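
As a rough illustration of the numerical side of that workflow, the sketch below computes the summaries mentioned above and then fits a straight line by ordinary least squares. It uses only the Python standard library; the function names `summarize` and `least_squares_fit` and the data values are made up for this example and do not come from the module itself.

```python
import math


def summarize(values):
    """Return basic numerical summaries for one variable."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return {"n": n, "mean": mean, "sd": sd,
            "min": min(values), "max": max(values)}


def least_squares_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(summarize(x))
print(summarize(y))
slope, intercept, r = least_squares_fit(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f}")
```

Looking at the summaries before the fit is the point of the exercise: an implausible minimum or maximum, or a correlation far from what you expected, is a cue to plot the data before trusting the regression.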


Hypothesis testing of sample means (flowchart)

On this page we give the flowchart for testing means of independent samples. For instance, the set of temperature measurements for all days in July over a 10-year period is essentially independent of the set of temperature measurements for all days in January over a 10-year period. An example of non-independent samples is the measurement of tumor size in 100 cancer patients before and after some treatment; the final tumor size will of course be somewhat (or strongly) correlated with the tumor size at the beginning of treatment.
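
To make the distinction concrete, here is a minimal sketch of the two test statistics involved: Welch's t statistic, one common choice for independent samples, and the paired t statistic for before/after measurements on the same subjects. This is not a reproduction of the flowchart; the function names `welch_t` and `paired_t` and the numbers are invented for illustration. Only the statistic and its degrees of freedom are computed; turning them into a p-value requires a t distribution from a statistics library.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _sample_var(values):
    """Sample variance (n - 1 in the denominator)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples (variances not assumed equal)."""
    va = _sample_var(a) / len(a)
    vb = _sample_var(b) / len(b)
    t = (_mean(a) - _mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def paired_t(before, after):
    """Paired t statistic and degrees of freedom, computed on the
    per-subject differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = _mean(diffs) / math.sqrt(_sample_var(diffs) / n)
    return t, n - 1


# Hypothetical before/after measurements on the same five subjects.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [8.0, 11.0, 7.0, 10.0, 10.0]

t_paired, df_paired = paired_t(before, after)
t_welch, df_welch = welch_t(after, before)
print(f"paired: t={t_paired:.3f}, df={df_paired}")
print(f"Welch (wrongly treating the samples as independent): "
      f"t={t_welch:.3f}, df={df_welch:.2f}")
```

With these made-up numbers the paired statistic is larger in magnitude than the Welch statistic, because differencing removes the subject-to-subject variation that the two measurements share. That is why it matters to establish whether samples are independent before choosing a test.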
