10.3: Introduction to GLMs
Learning objectives
describe why linear regression won’t work for a binary variable
describe the purpose of a link function
compute an odds ratio from a 2x2 table
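To make the first objective concrete, here is a minimal sketch in Python (the course itself uses R). It fits an ordinary least-squares line to a 0/1 outcome and then looks at the fitted values. The `x` and `y` values are made-up toy data, and `ols_fit` is an illustrative helper, not a library function. Fitted values from a straight line are not constrained to [0, 1], so the model can "predict" impossible probabilities even inside the range of the observed data.

```python
# Hypothetical toy data: a predictor x and a binary outcome y (0 = no, 1 = yes).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]

# A probability must lie in [0, 1]; the straight line ignores that constraint.
for xi, fi in zip(x, fitted):
    flag = "  <-- outside [0, 1]" if fi < 0 or fi > 1 else ""
    print(f"x = {xi:2d}  fitted = {fi:6.3f}{flag}")
```

With these toy values the fitted line is about -0.27 + 0.14x, which gives roughly -0.13 at x = 1 and 1.13 at x = 10.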
Classwise videos
If you have questions as you watch the videos, feel free to send me an email or Slack message! I will address common questions at the beginning of class.
These videos cover a few concepts that will lay the groundwork for logistic regression. We are not fitting the model yet; we will do that in the next set of videos.
The first video describes why we need generalized linear models (and specifically logistic regression) and some basic GLM terminology.
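A small sketch of the key terminology, assuming the standard logistic-regression setup: the outcome is Bernoulli with success probability p = P(Y = 1), and the logit link g(p) = log(p / (1 - p)) maps p from (0, 1) onto the whole real line, where a linear predictor can take any value. The inverse link maps a linear predictor back to a valid probability. The function names below are illustrative.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Logit link (log odds): maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse link: maps a linear predictor eta back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))

for p in [0.01, 0.25, 0.5, 0.75, 0.99]:
    print(f"p = {p:4.2f}  odds = {odds(p):7.3f}  log odds = {logit(p):7.3f}")

# Any real-valued linear predictor, however large or small, maps to a valid probability.
for eta in [-10, -1, 0, 1, 10]:
    print(f"eta = {eta:3d}  p = {inv_logit(eta):.5f}")
```

Note that p = 0.5 corresponds to odds of 1 and log odds of 0, and that probabilities equally far from 0.5 give log odds of equal size and opposite sign.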
The second video introduces odds ratios, which we will need to understand to interpret logistic regression coefficients.
Textbook
ISLR 4.2, 4.3 intro and 4.3.1
Application exercise
Comprehension questions
In logistic regression, which distribution is assumed for the outcome Y?
What is a link function?
Why do we use link functions?
Why do we use log odds instead of directly modeling \(Y\) or \(p\)?
How do we define odds?
Practice problem (you don’t need to use R)
A study is conducted to assess the effectiveness of a new drug for treating back pain. Each of 735 participants first rates their baseline pain level. Then 366 are assigned to the experimental drug and everyone else is assigned to the placebo group (participants do not know which group they are in). After 2 weeks, participants rate their back pain again. Each response is compared with the participant's baseline rating, and participants are classified as either "decreased pain" or "increase/no change." In total, 482 participants report decreased pain, 289 of whom are in the experimental drug group.
Draw the 2x2 table for this problem
Calculate the probability of decreased pain in the experimental drug group and the placebo group
Calculate the odds ratio of decreased pain for the experimental group compared to the placebo group
Write a sentence to interpret the odds ratio in the context of the problem
If we were to conduct a hypothesis test to assess statistical significance of the odds ratio, what should the null value be?
Write your own example relevant to your domain of interest.
Answer
| | Reduced pain | Increase/no change |
|---|---|---|
| Drug | 289 | 77 |
| Placebo | 193 | 176 |
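The cell counts follow from the totals in the problem statement by subtraction. A small Python sketch of that bookkeeping (the variable names are illustrative):

```python
# Totals given in the problem statement.
n_total = 735            # all participants
n_drug = 366             # assigned to the experimental drug
n_decreased = 482        # reported decreased pain
n_drug_decreased = 289   # decreased pain AND in the drug group

# The remaining cells are found by subtraction.
n_placebo = n_total - n_drug                           # 369
n_drug_no_change = n_drug - n_drug_decreased           # 77
n_placebo_decreased = n_decreased - n_drug_decreased   # 193
n_placebo_no_change = n_placebo - n_placebo_decreased  # 176

table = {
    "Drug": (n_drug_decreased, n_drug_no_change),
    "Placebo": (n_placebo_decreased, n_placebo_no_change),
}
print(f"{'':8} {'Reduced':>8} {'Inc/no change':>14}")
for group, (reduced, no_change) in table.items():
    print(f"{group:8} {reduced:>8} {no_change:>14}")

# Sanity check: the four cells add back up to the total sample size.
assert sum(sum(row) for row in table.values()) == n_total
```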
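P(Red|Drug) = 289/366 = 0.79
P(Red|Placebo) = 193/369 = 0.52
Odds(Red|Drug) = P/(1 − P) = 289/77 = 3.75
Odds(Red|Placebo) = P/(1 − P) = 193/176 = 1.10
OR = (289/77)/(193/176) = (289 × 176)/(77 × 193) = 3.42
(Rounding the intermediate values first, e.g. 3.76/1.08, gives 3.48 instead, so keep full precision until the last step.)
“The odds of reduced pain in the experimental drug group are 3.42 times the odds of reduced pain in the placebo group.”
Null value for a hypothesis test of the odds ratio: OR = 1 (equal odds of reduced pain in both groups).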
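A short Python sketch that reproduces the calculation from the cell counts. Carrying full precision gives the odds ratio of about 3.42 used in the interpretation; rounding the probabilities to two decimals before forming the odds pulls the result up to about 3.47, and rounding the odds as well gives 3.48. The labels a, b, c, d for the cells are just a convenient convention.

```python
# Cell counts from the 2x2 table (rows: group; columns: reduced pain, increase/no change).
a, b = 289, 77    # Drug:    reduced, increase/no change
c, d = 193, 176   # Placebo: reduced, increase/no change

p_drug = a / (a + b)      # P(reduced | drug)
p_placebo = c / (c + d)   # P(reduced | placebo)

odds_drug = p_drug / (1 - p_drug)            # equals a / b
odds_placebo = p_placebo / (1 - p_placebo)   # equals c / d
odds_ratio = odds_drug / odds_placebo        # equals (a * d) / (b * c)

print(f"P(reduced | drug)    = {p_drug:.4f}")
print(f"P(reduced | placebo) = {p_placebo:.4f}")
print(f"Odds(drug) = {odds_drug:.4f}, Odds(placebo) = {odds_placebo:.4f}")
print(f"Odds ratio = {odds_ratio:.4f}")

# Rounding the probabilities to two decimals before forming the odds
# changes the answer noticeably.
p_drug_r, p_placebo_r = round(p_drug, 2), round(p_placebo, 2)
or_rounded = (p_drug_r / (1 - p_drug_r)) / (p_placebo_r / (1 - p_placebo_r))
print(f"Odds ratio from rounded probabilities = {or_rounded:.4f}")
```

The cross-product form (a × d)/(b × c) is a convenient shortcut: it gives the same odds ratio without computing the probabilities at all.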