Introduction
In this course, students will be introduced to statistical modelling methods such as linear regression, factor regression, and time series analysis. All modelling and data analysis will be performed in the R statistical programming language. The course meets on Thursdays from 12:00-2:45 pm in PSA 546. The course syllabus can be found here.
The course will be structured as a series of modules covering various topics. Some modules may take more than one lecture to cover. Homework will be assigned occasionally throughout the course, usually after a module is completed.
The final project for the course will account for 50% of the grade. It must be based on a statistical analysis of one or more data sets from this page, and the topic must be chosen in discussion with me, and perhaps other faculty, to determine whether the analysis is novel. Students are encouraged to work together on the project in groups of up to three, but the contributions of each student in the group must be clearly defined. The final project write-up will be due approximately two weeks before the end of classes (date to be announced). Oral presentations of the projects will take place in class during the last week of classes.
There is no single textbook that covers the material in this course. Since students in this course have varying backgrounds in statistics, I strongly recommend that you go to the library, look through the statistical texts there, and find one or two covering linear regression and/or time series analysis that you think are at your level. Among the texts I recommend, Statistical Data Analysis by Cowan is a good general text on various statistical methods. A standard textbook for linear regression is Applied Linear Statistical Models by Kutner et al. Two good textbooks that cover time series analysis are Time Series Analysis with Applications in R by Cryer, and Time Series Analysis and its Applications with R Examples by Shumway. These texts are much broader in scope than this course (they cover material that would form the basis of at least three or four different courses). All of these texts are available from libgen.info, but note that copyright infringement is a crime… one must never do such a terrible thing.
- Module I: Introduction to the R statistical programming language
- Module II: The difference between mathematical and statistical modelling (plus some more basics of R)
- Finding sources of data: extracting data from the published literature
- Online sources of free data
- Hwk#2, due Thurs Sep 12th at noon (data set for the homework is chicago_crime_summary.txt)
- Hwk#2 solutions
- Module III: Probability distribution and statistics review
- Literature searches with Google Scholar
- Correlations, partial correlations, and confounding variables
- Hypothesis testing of count data: flowchart
- Hypothesis testing of means (not count data): flowchart
- Statistical Data Analysis Hall of Shame
- Hypothesis testing example problems
- Examples of exploratory data analysis
- Hwk#3, due Mon Sep 30th at noon (data set for the homework is chicago_pollution.txt)
- Module IV: Linear regression in R
- Module V: Time series analysis