8.31: Introduction to Multiple Linear Regression

Learning objectives

  • distinguish between the simple linear regression model and the multiple linear regression (MLR) model

  • interpret coefficient estimates for continuous and categorical variables in MLR

  • interpret interaction terms in MLR

  • fit MLR models in R

Classwise videos

These videos cover the concepts. We’ll look at how to do these things in R during class on Thursday. If you have questions as you watch the videos, feel free to send me an email or Slack message! I will address common questions at the beginning of class.

Click this invite link to join the Classwise course: Classwise join link

If you have problems with the invite link, try going directly to classwise.org and clicking “Login” and then “School SSO.” You should then be able to use your email to access the videos directly on this page or on the Classwise site. You do not need to create a new account.

After joining the course, you should be able to view the videos directly from the course website.

Textbook

ISLR sections 3.1-3.3

Survey

Click here to complete the survey

Class code

In RStudio, run the following in the console:

download.file("https://raw.githubusercontent.com/anlane611/702-classcode/main/introtoMLRcode.qmd", destfile="introtomlrcode.qmd")

Application exercise

Palmer penguins data

Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

library(tidymodels)
library(tidyverse)
library(palmerpenguins)

glimpse(penguins)

  1. Use the glimpse() output to determine the following:

    • sample size and number of variables

    • which variables are categorical and which are numeric?

    • is there any missing data?

    • are the categorical variables stored appropriately?

  2. Select an appropriate outcome and primary (continuous) predictor of interest.

    • generate a scatter plot for the two variables (remember to label your axes!)

    • color the scatter plot by sex. Does an interaction term seem appropriate?

    • color the scatter plot by species. Does an interaction term seem appropriate?

  3. Fit a model regressing the outcome variable you selected onto the primary predictor of interest, sex, and species.

    • Write interpretations for the coefficient estimates, p-values, and confidence intervals

    • Is the species variable statistically significant? Conduct the appropriate test.

  4. Add an interaction term that seems appropriate based on the EDA from #2. Interpret the p-value and coefficient estimate for the interaction term in the context of the dataset.
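
One way the steps above could be sketched in R is shown below. This is a starting point under stated assumptions, not a model solution: the outcome (body_mass_g), the primary predictor (flipper_length_mm), and the choice of a flipper length by species interaction are illustrative picks that the exercise does not prescribe, and the object names (penguins_cc, reduced_fit, full_fit, int_fit) are made up for this example. It uses lm() with tidy() from broom, which tidymodels loads.

```r
library(tidyverse)
library(tidymodels)
library(palmerpenguins)

# Count missing values in each column (part of step 1)
penguins %>% summarise(across(everything(), ~ sum(is.na(.x))))

# Assumed choices for illustration only:
#   outcome = body_mass_g, primary predictor = flipper_length_mm
# Keep rows that are complete on every variable used below, so that
# all models are fit to the same observations.
penguins_cc <- penguins %>%
  drop_na(body_mass_g, flipper_length_mm, sex, species)

# Step 2: scatter plot colored by species, with one fitted line per group.
# Clearly non-parallel lines suggest an interaction may be worth considering.
ggplot(penguins_cc,
       aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", color = "Species")

# Step 3: main-effects model with estimates, p-values, and 95% CIs
full_fit <- lm(body_mass_g ~ flipper_length_mm + sex + species,
               data = penguins_cc)
tidy(full_fit, conf.int = TRUE)

# species has three levels, so it enters the model as two indicator terms.
# A nested F-test compares the models with and without species as a whole.
reduced_fit <- lm(body_mass_g ~ flipper_length_mm + sex, data = penguins_cc)
anova(reduced_fit, full_fit)

# Step 4: add an interaction (here flipper length by species, as an example)
int_fit <- lm(body_mass_g ~ flipper_length_mm * species + sex,
              data = penguins_cc)
tidy(int_fit, conf.int = TRUE)
```

To color the plot by sex instead, replace color = species with color = sex and update the legend label. The two models passed to anova() must be fit to the same rows, which is why the missing values are dropped once up front rather than left to lm() to handle separately for each model.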