**Statistical Methods for Students in the Life and Social Sciences**

**(aka: How to be a Data Boss)**

**Objectives:**

This course is meant to introduce students in the life and social sciences to the skill set needed to carry out well-executed and well-explicated statistical analyses. The course is aimed at students with little prior experience in statistical analyses, but prior exposure to a “stats 101”-type course is helpful. The course will be almost entirely based on material posted on this website. The course syllabus can be found here.

There is no textbook for this course, but recommended reading includes How to Lie with Statistics by Darrell Huff (illustrated by Irving Geis), Statistical Data Analysis by Glen Cowan, and Applied Linear Statistical Models by Kutner et al. (any edition is fine).

Upon completing this course:

- Students will have an understanding of basic statistical methods, including hypothesis testing, linear regression, and generalized regression methods. They will also understand common pitfalls in statistical analyses, and how to avoid them (and how to detect them when reviewing papers!).
- Students will have basic functionality in R: they will know how to read in, manipulate, and export data in R, and will be able to create publication-quality plots in R.

Methods for producing well-written scientific papers and giving good oral presentations are also heavily stressed throughout the course.

If students find it hard to keep up as the course progresses, they need to tell me, because it means that I need to adjust the pace and content of the course material.

**Dr. Towers’ Golden Rules for Statistical Data Analysis:**

- All (or nearly all) data have stochasticity (i.e., randomness) associated with them.
- A probability distribution underlies that stochasticity.
- Hypothesis tests are based on that probability distribution.
- Anything calculated from data (for example, statistics such as the mean or standard deviation) has stochasticity associated with it, because the data are stochastic.
- Every statistical analysis needs to start with a “meet and greet” with your data: calculate basic statistics (sample size, means, standard deviations, ranges, etc.) and make plots to explore the data and ensure no funny business is going on.
- When doing regression, you need two things: a model that describes how the data depend on the explanatory variables, and a goodness-of-fit statistic (such as least squares, the Binomial likelihood, or the Poisson likelihood).
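The course itself works in R; purely as an illustration, here is a minimal Python sketch (standard library only) that touches several of the golden rules at once. Everything in it is a hypothetical choice made for illustration, not course material: the straight-line "true" model and its parameters (`INTERCEPT`, `SLOPE`, `SIGMA`), the sample size of 50, the random seed, and the helper names `simulate`, `meet_and_greet`, and `least_squares_line`.

```python
import random
import statistics

# Hypothetical "true" model, chosen only for illustration (not from the course):
# y = INTERCEPT + SLOPE * x + Normal noise with standard deviation SIGMA.
INTERCEPT, SLOPE, SIGMA = 2.0, 0.5, 1.0


def simulate(n, rng):
    """Draw one stochastic data set of n (x, y) pairs from the model above."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [INTERCEPT + SLOPE * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def meet_and_greet(values):
    """Basic descriptive statistics to look at before any modelling."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def least_squares_line(xs, ys):
    """Fit y = a + b*x by minimizing the sum of squared residuals (closed form)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope estimate
    a = y_bar - b * x_bar  # intercept estimate
    return a, b


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# "Meet and greet" with one simulated data set, then fit it.
xs, ys = simulate(50, rng)
print(meet_and_greet(ys))
print("fitted (intercept, slope):", least_squares_line(xs, ys))

# Repeat the "experiment" many times: the fitted slope changes from data set
# to data set, because it is calculated from stochastic data.
slopes = [least_squares_line(*simulate(50, rng))[1] for _ in range(1000)]
print("mean of fitted slopes:", statistics.mean(slopes))
print("spread (sd) of fitted slopes:", statistics.stdev(slopes))
```

The average of the repeated slope fits should land close to the assumed true slope of 0.5, but with a nonzero spread: the slope estimate is itself a random quantity, just as the rule about statistics calculated from data says. The fit also needed exactly the two ingredients named in the regression rule: a model (the straight line) and a goodness-of-fit statistic (the sum of squared residuals).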

**List of course modules:**

- Good work habits, and requirements for homework
- Literature searches with Google Scholar
- Elements of scientific papers
- The basics of the R statistical programming language
- Difference between statistical and mathematical models
- Probability distributions important to modelling in the life and social sciences
- Descriptive statistics: mean, covariance, variance, and correlation
- Online sources of free data
- Extracting data from graphs in the published literature
- Bringing together disparate sources of data
- Correlations, partial correlations, and confounding variables
- Exploratory data analysis examples
- Least squares linear regression
- Producing well-written manuscripts in a timely fashion
- Giving a good presentation
- t-tests and z-tests of means, and ANOVA
- Poisson regression
- Logistic regression
- Population standardized Poisson regression for data expressed as per capita rates
- Kolmogorov-Smirnov test to compare the shape of two distributions
- Negative Binomial likelihood fits for over-dispersed count data
**Homework #8, due Friday, April 20th at noon (in-class presentations the week of April 23rd).**

- Numerical methods for propagation of uncertainties
- Least Squares fitting is equivalent to homoskedastic Normal likelihood fitting
- Model validation methods
- Making choropleth maps in R
- K-means clustering
- R Shiny (more examples here)
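The module on least squares fitting being equivalent to homoskedastic Normal likelihood fitting rests on a short argument, sketched here assuming independent observations $y_i$ ($i = 1, \dots, n$), a model prediction $f(x_i;\theta)$ with parameters $\theta$, and a single common standard deviation $\sigma$ (the homoskedastic assumption):

$$
\mathcal{L}(\theta) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{\left(y_i - f(x_i;\theta)\right)^2}{2\sigma^2}\right]
$$

$$
-\ln\mathcal{L}(\theta) = n\ln\!\left(\sigma\sqrt{2\pi}\right) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - f(x_i;\theta)\right)^2
$$

The first term does not depend on $\theta$, and $1/(2\sigma^2)$ is a positive constant, so the $\theta$ that minimizes $-\ln\mathcal{L}$ is exactly the $\theta$ that minimizes the sum of squared residuals.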

**Course expectations:** There will be regular homework projects assigned throughout the course, worth 50% of the final grade. Students are strongly encouraged to work together in groups to discuss issues related to the homework and resolve problems. However, plagiarism of code will not be tolerated. There may also be unannounced in-class pop quizzes during the semester; if these occur, they will be counted among the homework grades.

The culmination of the course will be a group term project (two to three students per group), also worth 50% of the final grade. Students will write up the results of their project in a format suitable for publication, using the format required by a journal they have identified as appropriate for the topic. A cover letter written to the editor of the journal is also required. **However, submission for publication is not required, though it is encouraged if the analysis is novel.** Students are responsible for locating and obtaining sources of data, and for developing an appropriate statistical model for the project, so this is something they should begin thinking about very early in the course.

**This course has no associated textbook. Instead, the course content consists of the modules that appear on this website. A textbook that students may find useful is Statistical Data Analysis, by G. Cowan.**

Students are expected to bring their laptops to class. Before the course begins, students are expected to have downloaded and installed R on their laptops from http://www.r-project.org/ (R is free, open-source software).

Final project write-ups will be due **Friday, April 13th**. During the week of April 16th, project groups will meet with Dr. Towers to discuss their final project write-ups and their upcoming presentations. Each project group will give a 20-minute in-class presentation; presentations will take place on **Monday, April 23rd, 2018, and Wednesday, April 25th, 2018**.

By Friday, April 27th, all group members must send Prof. Towers a confidential email detailing their own contribution to the group project and the contributions of the other group members.