10.5: Logistic Regression estimation and interpretation

Learning objectives

  • Describe the estimation procedure for a logistic regression model

  • Interpret coefficients, confidence intervals, and p-values in logistic regression

  • Generate predictions from a logistic regression model

Classwise videos

If you have questions as you watch the videos, feel free to send me an email or a Slack message! I will address common questions at the beginning of class.

Supplemental videos:

Maximum likelihood vs least squares in linear regression

Logistic regression: the basics

Logistic regression: likelihood and deviance

Textbook

ISLR 4.3.2-4.3.4

Application exercise

Groups for this week

This exercise uses a dataset that contains information on patients and whether or not they were diagnosed with heart disease.

Source and data dictionary

Kaggle link (may have more info on variables)

Read in the data:

heart <- read.csv("https://raw.githubusercontent.com/anlane611/datasets/main/heart.csv")
  1. Explore the data. Use the data dictionary to determine which variables are numeric and which are categorical. Create factor variables for the categorical variables.
  2. Provide summary statistics (N and %) for the outcome variable.
  3. Calculate summary statistics for those with and without heart disease for the following variables: age, sex, chest pain type, and cholesterol. Which summary statistics are appropriate for each variable?
  4. Fit a logistic regression model regressing heart disease status on the predictors listed in #3.
    • Which variables are significantly associated with heart disease status?
    • What are the reference levels for the categorical variables?
    • Write interpretations of the coefficient estimates in terms of both log odds and odds/odds ratios.
    • Obtain confidence intervals for the coefficient estimates on the log odds and the odds scale. Write interpretations.
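A minimal sketch of steps 1 to 3, assuming the column names that appear in the model formula below (`target`, `sex`, `cp`, `age`, `chol`). To keep it self-contained it runs on a small invented data frame, `toy`, rather than on heart.csv. It reports mean (SD) for numeric variables and column percentages for categorical ones, which is one common choice rather than the only acceptable one. With the real data, replace `toy` with `heart`.

```r
# Toy data with the same column names as heart.csv; all values are invented
toy <- data.frame(
  age    = c(63, 37, 41, 56, 57, 57, 56, 44),
  sex    = c(1, 1, 0, 1, 0, 1, 0, 1),
  cp     = c(3, 2, 1, 1, 0, 0, 1, 1),
  chol   = c(233, 250, 204, 236, 354, 192, 294, 263),
  target = c(1, 1, 1, 0, 0, 1, 0, 1)
)

# Step 1: store the categorical variables as factors
toy$sex    <- factor(toy$sex)
toy$cp     <- factor(toy$cp)
toy$target <- factor(toy$target)

# Step 2: N and % for the outcome
outcome_n   <- table(toy$target)
outcome_pct <- round(100 * prop.table(outcome_n), 1)
cbind(N = outcome_n, pct = outcome_pct)

# Step 3, numeric variables: mean and SD within each outcome group
aggregate(cbind(age, chol) ~ target, data = toy,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Step 3, categorical variables: N and column % within each outcome group
sex_tab <- table(toy$sex, toy$target)
sex_tab
round(100 * prop.table(sex_tab, margin = 2), 1)
```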

Interpretation example

logistic_mod <- glm(factor(target)~age+factor(sex)+factor(cp)+chol,
                    data=heart,
                    family="binomial")
summary(logistic_mod)

Call:
glm(formula = factor(target) ~ age + factor(sex) + factor(cp) + 
    chol, family = "binomial", data = heart)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3725  -0.7048   0.2500   0.7283   2.3014  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   4.996949   1.272158   3.928 8.57e-05 ***
age          -0.063873   0.017857  -3.577 0.000348 ***
factor(sex)1 -1.893397   0.362270  -5.226 1.73e-07 ***
factor(cp)1   2.539393   0.448455   5.663 1.49e-08 ***
factor(cp)2   2.369649   0.356264   6.651 2.90e-11 ***
factor(cp)3   2.239028   0.536138   4.176 2.96e-05 ***
chol         -0.004843   0.002860  -1.693 0.090421 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 417.64  on 302  degrees of freedom
Residual deviance: 290.08  on 296  degrees of freedom
AIC: 304.08

Number of Fisher Scoring iterations: 5
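`glm()` estimates the coefficients by maximum likelihood; the "Fisher Scoring iterations" line reports how many iterations its optimizer needed. As a sketch of that idea, the code below simulates a binary outcome (the data, seed, and true coefficients are invented, not taken from the heart data), writes the Bernoulli log-likelihood by hand, maximizes it with `optim()`, and compares the result with `glm()`. For 0/1 outcomes the residual deviance equals -2 times the maximized log-likelihood, which is also why the AIC above is 290.08 + 2 × 7 = 304.08 (seven estimated coefficients).

```r
set.seed(611)  # arbitrary seed for the simulated example
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))  # simulated, not heart data
X <- cbind(1, x)                                         # design matrix with intercept

# Negative Bernoulli log-likelihood: -sum[y * eta - log(1 + exp(eta))]
neg_loglik <- function(beta) {
  eta <- as.vector(X %*% beta)  # linear predictor (log odds)
  -sum(y * eta - log1p(exp(eta)))
}

# Its gradient: -X'(y - p), where p is the fitted probability
neg_grad <- function(beta) {
  eta <- as.vector(X %*% beta)
  -as.vector(crossprod(X, y - plogis(eta)))
}

# Maximize the likelihood numerically (minimize its negative)
fit_optim <- optim(c(0, 0), neg_loglik, neg_grad, method = "BFGS",
                   control = list(reltol = 1e-12))

# glm() maximizes the same likelihood using Fisher scoring
fit_glm <- glm(y ~ x, family = "binomial")

rbind(optim = fit_optim$par, glm = coef(fit_glm))

# Residual deviance = -2 * maximized log-likelihood for 0/1 outcomes
c(minus2_loglik = 2 * fit_optim$value, deviance = deviance(fit_glm))
```

The two sets of estimates should agree to several decimal places; `glm()` reaches the same maximum with a specialized, faster algorithm.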
exp(coef(logistic_mod))
 (Intercept)          age factor(sex)1  factor(cp)1  factor(cp)2  factor(cp)3 
 147.9610497    0.9381237    0.1505595   12.6719713   10.6936382    9.3842018 
        chol 
   0.9951684 
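The fitted model produces predictions on the log-odds scale; exponentiating gives odds, and the inverse logit gives a probability. A sketch using the estimates printed above for a hypothetical patient (a 55-year-old male with chest pain type 0 and cholesterol 240; these values are made up for illustration):

```r
# Coefficient estimates copied from summary(logistic_mod) above
b <- c(intercept = 4.996949, age = -0.063873, sex1 = -1.893397,
       cp1 = 2.539393, cp2 = 2.369649, cp3 = 2.239028, chol = -0.004843)

# Hypothetical patient: age 55, male (sex = 1), chest pain type 0, cholesterol 240.
# cp = 0 is the reference level, so no cp term enters the linear predictor.
log_odds <- b[["intercept"]] + b[["age"]] * 55 + b[["sex1"]] * 1 + b[["chol"]] * 240

odds <- exp(log_odds)     # odds of heart disease
prob <- plogis(log_odds)  # inverse logit: exp(eta) / (1 + exp(eta))

round(c(log_odds = log_odds, odds = odds, prob = prob), 3)

# With the fitted model itself, the equivalent call would be:
# predict(logistic_mod,
#         newdata = data.frame(age = 55, sex = 1, cp = 0, chol = 240),
#         type = "response")
```

For this hypothetical patient the estimated probability of heart disease is roughly 0.17.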
confint(logistic_mod)
Waiting for profiling to be done...
                   2.5 %        97.5 %
(Intercept)   2.56601978  7.5719389233
age          -0.09988456 -0.0296383710
factor(sex)1 -2.63098019 -1.2062295548
factor(cp)1   1.69939388  3.4701380328
factor(cp)2   1.69363365  3.0951948249
factor(cp)3   1.22495261  3.3471173833
chol         -0.01052805  0.0007980208
exp(confint(logistic_mod))
Waiting for profiling to be done...
                   2.5 %       97.5 %
(Intercept)  13.01392295 1942.9037732
age           0.90494188    0.9707965
factor(sex)1  0.07200785    0.2993237
factor(cp)1   5.47063055   32.1411787
factor(cp)2   5.43920900   22.0915421
factor(cp)3   3.40400477   28.4206895
chol          0.98952718    1.0007983
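The `confint()` intervals above are profile-likelihood intervals (hence the "Waiting for profiling to be done..." message), and the odds-ratio intervals are simply the exponentiated log-odds endpoints. A sketch comparing them with the simpler Wald interval, estimate ± 1.96 × SE, for the sex coefficient, using the estimate and standard error from the summary output:

```r
# Estimate and standard error for factor(sex)1, copied from summary(logistic_mod)
est <- -1.893397
se  <- 0.362270
z   <- qnorm(0.975)  # about 1.96

wald_logodds <- est + c(-1, 1) * z * se  # Wald 95% CI, log-odds scale
wald_or      <- exp(wald_logodds)        # same interval on the odds-ratio scale

round(wald_logodds, 3)
round(wald_or, 3)

# On the fitted model, confint.default(logistic_mod) returns Wald intervals,
# while confint(logistic_mod), used above, returns profile-likelihood intervals.
```

The Wald interval here, about (-2.60, -1.18) on the log-odds scale, is close to, but not identical to, the profile interval (-2.63, -1.21) shown above.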

Log odds scale:

With each additional year of age, the log odds of heart disease decrease by 0.06, holding sex, chest pain type, and cholesterol constant.

Holding age, chest pain type, and cholesterol constant, the log odds of heart disease for males are 1.89 lower than the log odds of heart disease for females.

We are 95% confident that the true difference in log odds of heart disease between males and females (males minus females) is between -2.6 and -1.2.
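To see why each coefficient is a difference in log odds "all else held constant", the sketch below writes the fitted linear predictor for chest pain type 0 (the reference level) using the estimates above, then compares hypothetical patients who differ in one predictor only. The function name `log_odds` is just for illustration.

```r
# Estimates copied from summary(logistic_mod); cp terms omitted because cp = 0 here
b <- c(intercept = 4.996949, age = -0.063873, sex1 = -1.893397, chol = -0.004843)

# Fitted log odds of heart disease for a patient with chest pain type 0
log_odds <- function(age, sex, chol) {
  b[["intercept"]] + b[["age"]] * age + b[["sex1"]] * sex + b[["chol"]] * chol
}

# One extra year of age, everything else fixed: the difference is the age coefficient,
# whatever the starting age, sex, or cholesterol
log_odds(51, sex = 0, chol = 200) - log_odds(50, sex = 0, chol = 200)
log_odds(71, sex = 1, chol = 300) - log_odds(70, sex = 1, chol = 300)

# Male vs female at the same age and cholesterol: the difference is the sex coefficient
log_odds(60, sex = 1, chol = 240) - log_odds(60, sex = 0, chol = 240)
```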

Odds/odds ratio scale:

With each additional year of age, the odds of heart disease are multiplied by 0.94 (equivalently, the odds decrease by about 6%), all else held constant.

The odds of heart disease for males are 0.15 times the odds of heart disease for females (equivalently, the odds are about 85% lower), all else held constant.

We are 95% confident that the true odds ratio of heart disease comparing males to females is between 0.07 and 0.30.
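On the odds scale the coefficients act multiplicatively: exp(β) is the factor by which the odds change, and 100 × (exp(β) - 1) is the corresponding percent change. A sketch using the age and sex estimates from the output above (`pct_change` is a hypothetical helper name):

```r
# Log-odds estimates copied from summary(logistic_mod) above
beta_age <- -0.063873
beta_sex <- -1.893397

or_age <- exp(beta_age)  # multiplicative change in odds per extra year of age
or_sex <- exp(beta_sex)  # odds ratio, males vs females

# Percent change in odds implied by an odds ratio
pct_change <- function(or) 100 * (or - 1)

round(c(or_age = or_age, or_sex = or_sex), 3)
round(c(age = pct_change(or_age), sex = pct_change(or_sex)), 1)

# A 10-year age difference multiplies the odds by exp(10 * beta) = OR^10, not 10 * OR
round(exp(10 * beta_age), 3)
```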