EDUC 847

Welcome to EDUC 847

The Plan

The main idea is to have some time learning about different aspects of regression analysis, and to actually focus on doing the analyses.

Topics Date
Intro to R and Plotting 9 January 2024
No Class - MLK 15 January 24
Bivariate Linear Regression 22 January 24
Assumption Checking 29 January 24
Data transformations 5 Februry 24
Multiple Regression 12 Februry 24
Model Selection and Evaluation 26 Februry 24
Written Assignment #1 Reviews 26 Februry 24
Logistic Regression 4 March 24
TBD - Choose your own adventure 11 March 24
No Class - Final Projects Due 18 March 24

What might you learn?

There are lots of things that I hope you would learn: to use R tools to look at regression in a variety of forms, to read code, to write code, to find that you are not afraid (and actually like) to analyze data with R. By the end of all three sessions here are things I think you should be able to do after participating in each of the workshops:

Week One

Week 1 Slides

  • Import Data into R
  • Clean data and prepare data for Regression
  • Organize data for Regression
  • Vizualize data

Week Two

Week 2 Slides

  • Bivariate Regression
  • Residuals
  • Ordinary Least Squares
  • R2
  • Interpreting Regression Output.

Week Three

Week 3 Slides

  • Model Assumption Checking
    • Normality of residuals, Linearity, homogeneity of variance, no bad outliers
  • Hypothesis Testing for the Linear Model
  • Hypothesis Testing for the coefficients
  • Effect Sizes for the coefficients

Week Four

Week 4 Slides

  • transform data when it is not normally distributed
    • Log-Transform
    • Z-scores
  • visualize missingness in our dataset
  • use multiple imputation with chained equations.

Assignment One Assignment One

Week Five

Week 5 Slides

  • run a multiple regression
  • interpret a multiple regression
  • consider categorical variables
  • dummy code for categorical variables
  • visualize regression coefficients
  • handle interaction terms

Week Six

Week 6 Slides

  • create a logistic regression model
  • interpret logistic regression coefficients
  • use a confusion matrix to evaluate the predictive power of the model

Week Seven

Week 7 Slides

  • create a logistic regression model using machine learning
  • create a train test split.
  • use a confusion matrix to evaluate the predictive power of the model

Week Eight

Week 8 Slides

  • calculate Bayes factors
  • use Bayesian regression to compare models
  • interpret Bayes factors
  • use Bayes factors to determine what variable to include (and those to drop)

What is beyond the scope?

While I would love to say you’d be able to master doing regression in R in the short span of one quarter, you will not. I’ve spent a decade learning it and am proficient, but am constantly learning.

Similarly, R is a tool. You are the expert in your research, R can help you to uncover elements of your data. I don’t know how to analyze your data - learning R will empower you to analyze your data.

Why R?

I began learning R because I wanted to have something that worked across different platforms. But then I realized that the Open Source movement was something that was particularly interesting to me. But those aren’t the reason. I have come to see R as crucial for reproducibility. If I write a set of code to analyze a dataset, I can re-run this code, and get the same results. I can share the code with my publications, so someone else can check my work. This is critical to doing science. R allows me to do this in a way that Excel doesn’t.

Certainly other languages (Python, Julia, etc…) will also allow for this, but why I have stuck with R is that there is a rich community that are supportive of learning and growing.

Why Tidyverse?

Open source things tend to be a bit tribal - within R one debate is Base R vs. Tidyverse. This is a debate about how to teach R, and I have decided to use the Tidyverse. It might not ever matter to you, but I want you to know that I have thought about this, and my answer is to use Tidyverse for the following reasons:

  • Readability - Code should be readable by humans
  • Coherence with my practice - I generally try to use the Tidyverse approach in my work, so I feel like I can best help by using these tools
  • Coherence with themselves - These tools are organized in a way that helps them to be internally coherent
  • Support - There are loads of books and websites that are published open-source, and so you can typically find the help you need.

Resources for Learning R

Learning R is not easy, as I have learned things, I have tried to maintain a list of the different resources that I have drawn on.

The Basics

  1. Install R and RStudio
  2. Introduction to R (using Base R) - R Programming for Data Science
  3. My all time favorite book for learning statistics and R is Learning Statistics with R
  4. Danielle Navarro also has a course Robust Tools that gives a sense about the breadth of the tools that R provides.
  5. These two guides, The R Guide and R for Beginners

A bit more advanced

  1. Hadley Wickham’s book, R For Data Science
  2. Project Oriented Workflows
  3. WTF R Jenny Bryan is great at explaining things about R and so frequently, I find myself looking at her books.
  4. Cheat Sheets I can never remember all the commands.
    1. Data Transformation
    2. ggplot2
    3. Regular Expressions
  5. Happy Git With R Eventually you’ll need to use Git, and this is essential reading for making that transition.
  6. RMarkdown Markdown allows you to use reproducible code and integrate R into reporting and writing. This is how I do all my work.
  7. DataCarpentry runs workshops about data analysis and often are oriented to different research specialties.
    1. Social Science Data Carpentry has a social science oriented curriculum that looks great.
  8. The Effect A really great book that approaches data analysis from a causal inference perspective with code in R, Python, and Stata.
  9. Statistics and Data Visualization Using R: The Art and Practice of Data Analysis. A book with data sets and R scripts embedded.

Datasets