11.2: The Cox Proportional Hazards Model

Learning objectives

  • Define the Hazard function

  • Describe the proportional hazards assumption

  • Fit a Cox proportional hazards model in R

Classwise videos

If you have questions as you watch the videos, feel free to send me an email or slack message! I will address common questions at the beginning of class.

Textbook

ISLR 11.5

Application exercise

For this exercise, we will use the breast cancer data used in the first survival analysis lecture.

Code book:

Variable Description
pid patient identifier
age age, years
meno menopausal status (0=premenopausal, 1=postmenopausal)
size tumor size, mm
grade tumor grade
nodes number of positive lymph nodes
pgr progesterone receptors (fmol/l)
er estrogen receptors (fmol/l)
hormon hormonal therapy (0=no, 1=yes)
rfstime recurrence-free survival time
status 0=alive w/o recurrence, 1=recurrence or death

First, get a glimpse of the data and calculate some summary statistics. For example, what percentage of observations are censored?

Use the code below to generate Kaplan-Meier curves and perform the log-rank test.

library(survival)
breastcancer <- read.csv("https://raw.githubusercontent.com/anlane611/datasets/main/breastcancer.csv", row.names = 1)
attach(breastcancer)
#Kaplan-Meier curve for all patients
surv.fit <- survfit(Surv(rfstime,status)~1)
plot(surv.fit, xlab="Days", ylab="Estimated probability of survival")

#Kaplan-Meier curve by treatment group
surv.fit.trt <- survfit(Surv(rfstime, status) ~ factor(hormon))
plot(surv.fit.trt, xlab="Days", ylab="Estimated probability of survival",
     col=c(2,4))
legend("topright", c("Trt", "No Trt"), col=c(4,2), lty=1)

#Log-rank test
survdiff(Surv(rfstime, status) ~ factor(hormon))
Call:
survdiff(formula = Surv(rfstime, status) ~ factor(hormon))

                   N Observed Expected (O-E)^2/E (O-E)^2/V
factor(hormon)=0 440      205      180      3.37      8.56
factor(hormon)=1 246       94      119      5.12      8.56

 Chisq= 8.6  on 1 degrees of freedom, p= 0.003 

Now, let’s fit a Cox proportional hazards model. Let’s start by just using the main predictor of interest, hormone therapy treatment. We can use the coxph function in the survival package to fit the model.

fit.cox <- coxph(Surv(rfstime, status) ~ factor(hormon))
summary(fit.cox)
Call:
coxph(formula = Surv(rfstime, status) ~ factor(hormon))

  n= 686, number of events= 299 

                   coef exp(coef) se(coef)      z Pr(>|z|)   
factor(hormon)1 -0.3640    0.6949   0.1250 -2.911   0.0036 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                exp(coef) exp(-coef) lower .95 upper .95
factor(hormon)1    0.6949      1.439    0.5438    0.8879

Concordance= 0.543  (se = 0.014 )
Likelihood ratio test= 8.82  on 1 df,   p=0.003
Wald test            = 8.47  on 1 df,   p=0.004
Score (logrank) test = 8.57  on 1 df,   p=0.003

Examine the summary output for the model with only hormonal treatment group as a predictor.

  • What are the meanings and values of n and number of events?

  • How does the p-value of the hormonal treatment variable compare to the p-value obtained from the log-rank test?

Note that the exponentiated value of the coefficient estimate is already provided in the summary output. We can interpret exp(coef)=0.7 by saying the risk of death/recurrence for the treatment group is 30% lower than the risk of death/recurrence for the non-experimental treatment group.

Calculate the mean survival time for each treatment group without considering censoring. What is the difference between the groups and how does this compare to the results obtained with the model?

Recall that the Cox model relies on the key assumption of proportional hazards: the ratio of hazards is constant over time. Let’s check this assumption for the model. We can do this using the cox.zph function in the survival package. This generates a p-value for each predictor to test the proportional hazards assumption. A significant p-value here indicates that the assumption may be violated.

The GLOBAL line performs a test for the full model. Here, since we only have one predictor, the global test is the same as the predictor test.

test.ph <- cox.zph(fit.cox)
test.ph
               chisq df    p
factor(hormon) 0.227  1 0.63
GLOBAL         0.227  1 0.63

We can also look at a plot of the Schoenfeld Residuals, which are a residual measure unique to survival models that are commonly used to test the proportional hazards assumption. A straight line indicates proportional hazards. Here, we see a fairly straight line with some veering off at later time points.

plot(test.ph)

Now, add a few predictors to the model that you think are most relevant to accurately assess the relationship between treatment and survival time.

  • Fit the model and interpret the coefficient estimates. Which predictor(s) are statistically significant? Controlling for the other predictors, is there still a significant relationship between hormonal treatment group and survival time?

  • Use the cox.zph function to assess the proportional hazards assumption.

Additional practice

Labs in ISLR section 11.8