10.3: Introduction to GLMs

Learning objectives

  • describe why linear regression won’t work for a binary outcome variable

  • describe the purpose of a link function

  • compute an odds ratio from a 2x2 table
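To make the first two objectives concrete, here is a minimal sketch. It is written in Python purely for illustration (the course itself uses R), and the intercept and slope are invented numbers, not estimates from any data. A straight line in x can produce values outside [0, 1], but passing that same linear predictor through the inverse of the logit (log-odds) link, which is the link used in logistic regression, always returns a valid probability.

```python
import math


def logit(p: float) -> float:
    """Logit link: log odds, mapping a probability in (0, 1) onto the whole real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse of the logit: maps any real number back into (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients, invented purely for illustration.
intercept, slope = 0.2, 0.3

for x in (-3, 0, 3):
    eta = intercept + slope * x  # linear predictor: unbounded
    p = inv_logit(eta)           # always a valid probability
    print(f"x = {x:>2}: linear predictor = {eta:5.2f}, inverse logit = {p:.3f}")
```

At x = -3 and x = 3 the straight line gives -0.70 and 1.10, neither of which is a valid probability, while the inverse logit keeps both inside (0, 1).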

Classwise videos

If you have questions as you watch the videos, feel free to send me an email or Slack message! I will address common questions at the beginning of class.

These videos cover a few concepts that will lay the groundwork for logistic regression. We are not fitting the model yet; we will do that in the next set of videos.

The first video describes why we need generalized linear models (and specifically logistic regression) and some basic GLM terminology.

Bernoulli distribution video

The second video introduces odds ratios, which we will need to understand to interpret logistic regression coefficients.

Textbook

ISLR 4.2, 4.3 intro and 4.3.1

Application exercise

Groups for this week

Comprehension questions

  • In logistic regression, which distribution is assumed for the outcome \(Y\)?

  • What is a link function?

  • Why do we use link functions?

  • Why do we use log odds instead of directly modeling \(Y\) or \(p\)?

  • How do we define odds?

Practice problem (you don’t need to use R)

A study is conducted to assess the effectiveness of a new drug to treat back pain. A total of 735 participants first rate their baseline pain level. Of these, 366 are then assigned to the experimental drug and everyone else is assigned to the placebo group (participants do not know which group they are in). After 2 weeks, participants rate their level of back pain again. Responses are compared to initial ratings, and participants are grouped into either "decreased pain" or "increase/no change." In total, 482 participants experienced decreased pain, of whom 289 were in the experimental drug group.

  • Draw the 2x2 table for this problem

  • Calculate the probability of decreased pain in the experimental drug group and the placebo group

  • Calculate the odds ratio of decreased pain for the experimental group compared to the placebo group

  • Write a sentence to interpret the odds ratio in the context of the problem

  • If we were to conduct a hypothesis test to assess statistical significance of the odds ratio, what should the null value be?

Write your own example relevant to your domain of interest.

Answer

              Reduced pain   Increased/no change   Total
Drug                   289                    77     366
Placebo                193                   176     369
Total                  482                   253     735
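Only four counts are stated in the problem; every other cell follows by subtraction from those totals. A minimal Python sketch of that bookkeeping, using nothing but the numbers from the problem statement (the variable names are illustrative):

```python
# Counts stated in the practice problem
total = 735           # all participants
n_drug = 366          # assigned to the experimental drug
n_decreased = 482     # decreased pain, both groups combined
drug_decreased = 289  # decreased pain within the drug group

# The remaining cells follow by subtraction from the margins
n_placebo = total - n_drug                          # 369
drug_no_change = n_drug - drug_decreased            # 77
placebo_decreased = n_decreased - drug_decreased    # 193
placebo_no_change = n_placebo - placebo_decreased   # 176

# Rows: group -> (reduced pain, increased/no change)
table = {
    "Drug": (drug_decreased, drug_no_change),
    "Placebo": (placebo_decreased, placebo_no_change),
}

for group, (reduced, no_change) in table.items():
    print(f"{group:<8} {reduced:>6} {no_change:>6} {reduced + no_change:>6}")
```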

P(Red|Drug) = 289/366 = 0.79

P(Red|Placebo) = 193/369 = 0.52

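Odds(Red|Drug) = P/(1 − P) = 289/77 ≈ 3.75

Odds(Red|Placebo) = P/(1 − P) = 193/176 ≈ 1.10

OR = (289/77)/(193/176) = (289 × 176)/(77 × 193) ≈ 3.42

Keep unrounded values until the last step; rounding the probabilities first (for example 0.79/0.21 and 0.52/0.48) gives a slightly different odds ratio.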

“The odds of reduced pain are 3.42 times higher in the experimental drug group than the placebo group.”
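The same calculations can be checked with a short Python sketch that works directly from the raw counts in the 2x2 table. The helper names are illustrative, not from any library. It also computes the odds ratio after rounding the probabilities to two decimals, to show how much that shortcut shifts the result.

```python
# Cells of the 2x2 table as (reduced pain, increased/no change) counts
drug = (289, 77)
placebo = (193, 176)


def prob_reduced(cell):
    """P(reduced pain) within one group: reduced / group total."""
    reduced, no_change = cell
    return reduced / (reduced + no_change)


def odds_reduced(cell):
    """Odds of reduced pain, p / (1 - p), which simplifies to reduced / no_change."""
    reduced, no_change = cell
    return reduced / no_change


p_drug, p_placebo = prob_reduced(drug), prob_reduced(placebo)
odds_ratio = odds_reduced(drug) / odds_reduced(placebo)

# Same value via the cross-product form (a * d) / (b * c)
cross_product = (drug[0] * placebo[1]) / (drug[1] * placebo[0])

# Rounding the probabilities to two decimals before forming the odds drifts the answer
p_d2, p_p2 = round(p_drug, 2), round(p_placebo, 2)
rounded_first = (p_d2 / (1 - p_d2)) / (p_p2 / (1 - p_p2))

print(f"P(reduced | drug)    = {p_drug:.2f}")
print(f"P(reduced | placebo) = {p_placebo:.2f}")
print(f"odds (drug)          = {odds_reduced(drug):.2f}")
print(f"odds (placebo)       = {odds_reduced(placebo):.2f}")
print(f"odds ratio (exact)   = {odds_ratio:.2f}")
print(f"odds ratio (rounded probabilities first) = {rounded_first:.2f}")
```

With these counts the exact odds ratio prints as 3.42, matching the interpretation sentence, while the rounded-first version prints 3.47.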