10.24: Ordinal regression

Learning objectives

Identify when to use an ordinal regression model
Interpret an ordinal regression model
Generate an ordinal regression model in R

Classwise videos

If you have questions as you watch the videos, feel free to send me an email or slack message! I will address common questions at the beginning of class.

Dr. Väisänan does a great job of explaining ordinal regression, though the code is in Stata. The lab exercise in class will introduce the R code.

Textbook

ISLR does not cover ordinal regression.

Application exercise

Groups for this week

Data

A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school.

Source: https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/

Use the code below to access the data:

library(foreign) #package to access data type
library(MASS) #package for model
dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")

Data dictionary:
- apply: response to question “how likely are you to apply to graduate school?” with 3 options: very likely, somewhat likely, unlikely
- pared: parental education status (1=at least one parent has a graduate degree, 0 otherwise)
- public: undergraduate institution type (1=publis, 0=private)
- gpa: current undergraduate GPA (4.0 scale)
Take a glimpse of the data. What is the outcome?
Calculate some summary statistics for the data. How many students are in each category of apply? What is the average GPA for each category of apply?
- Note: For a table showing the number of each parental education status and institution type per level of apply, you can either use table or count. We have seen the count function before, and table works the same way (it’s just not a tidyverse function). Specify the two variables within the function, where the first specified variable will show up in the row of the table and the second variable will show up in the columns (e.g., table(dat$pared, dat$apply)

Ordinal regression model (AKA proportional odds model or cumulative logit model)

We can fit the model using the polr function in the MASS package. Note that the syntax is the same as what we’ve seen before, but we don’t need to specify family like we do with glm, because the polr function is specialized for the ordinal model. The Hess=TRUE option just stores the necessary information in the ord_mod object for us to access the summary information; we will forego the technical details here.

ord_mod <- polr(apply ~ pared + public + gpa, data=dat, Hess=TRUE)
summary(ord_mod)

Call:
polr(formula = apply ~ pared + public + gpa, data = dat, Hess = TRUE)

Coefficients:
          Value Std. Error t value
pared   1.04769     0.2658  3.9418
public -0.05879     0.2979 -0.1974
gpa     0.61594     0.2606  2.3632

Intercepts:
                            Value   Std. Error t value
unlikely|somewhat likely     2.2039  0.7795     2.8272
somewhat likely|very likely  4.2994  0.8043     5.3453

Residual Deviance: 717.0249 
AIC: 727.0249

Recall from the lecture videos that with the ordinal model, we have the same predictor coefficients for each level of the outcome, but different intercepts. This is why we see two components to the summary output: the Coefficients table and the Intercepts table.

Notice that the output does not provide p-values by default. We can manually calculate them ourselves and then create a table to show the estimates, standard errors, t values, and p values:

pvals <- pnorm(-abs(summary(ord_mod)$coef[,"t value"]))*2
ctable <- cbind(summary(ord_mod)$coef,pvals)

ctable

                                  Value Std. Error    t value        pvals
pared                        1.04769010  0.2657894  3.9418050 8.087072e-05
public                      -0.05878572  0.2978614 -0.1973593 8.435464e-01
gpa                          0.61594057  0.2606340  2.3632399 1.811594e-02
unlikely|somewhat likely     2.20391473  0.7795455  2.8271792 4.696004e-03
somewhat likely|very likely  4.29936315  0.8043267  5.3452947 9.027008e-08

Like other models we have seen, the intercepts are not typically the focus of the interpretations. Let’s focus on interpreting the coefficient estimates. First, let’s exponentiate the estimates and confidence intervals to interpret on the odds ratio scale.

exp_coefs <- exp(cbind(OR=coef(ord_mod),confint(ord_mod)))

Waiting for profiling to be done...

We can interpret the coefficient estimate of pared as follows:

For students whose parents did attend college, the odds of being more likely (i.e., very or somewhat likely versus unlikely) to apply to graduate school is 2.85 times that of students whose parents did not go to college, holding constant all other variables.

Write the interpretations for the other two predictor variables

Go Further: What if you wanted to interpret the gpa variable per 0.1 increase instead of per 1.0 unit increase?

Assessment

We can access predicted probabilities of being in each category:

head(ord_mod$fitted.values)

   unlikely somewhat likely very likely
1 0.5488310       0.3593310  0.09183798
2 0.3055632       0.4759496  0.21848725
3 0.2293835       0.4781951  0.29242138
4 0.6161224       0.3126888  0.07118879
5 0.6560149       0.2833901  0.06059505
6 0.6609240       0.2797117  0.05936430

head(predict(ord_mod))

[1] unlikely        somewhat likely somewhat likely unlikely       
[5] unlikely        unlikely       
Levels: unlikely somewhat likely very likely

Generate the confusion matrix to assess the accuracy of the predictions.

Finally, we should assess the proportional odds assumption. To do this, we can compare the predicted probabilities using the multinomial model, which is a more precise model, to the predicted probabilities with the ordinal model. Note that this is a subjective process because it is difficult to know how different the predicted probabilities should be to conclude that the assumption is violated. It is also useful to think through whether or not it is reasonable to assume that the odds of being in one category vs another would increase proportionally.

To generate predictions, let’s create a new data frame with different combinations of the predictor values. It is typically easiest to use different combinations of the categorical variables and hold the continuous predictors constant.

new.data <- data.frame(pared=c(0,1,0,1),
                       gpa=mean(dat$gpa),
                       public=c(0,1,1,0))

The new data contains a constant value of GPA and each unique combination of the values of parental education and institution type

new.data

  pared      gpa public
1     0 2.998925      0
2     1 2.998925      1
3     0 2.998925      1
4     1 2.998925      0

Now we can compare predicted probabilities from the ordinal model and the multinomial model

library(nnet) #package for multinomial model
mult_mod <- multinom(apply~pared+gpa+public,
                     data=dat)

# weights:  15 (8 variable)
initial  value 439.444915 
iter  10 value 357.012275
final  value 356.996982 
converged

predict(ord_mod, new.data, type="probs")

   unlikely somewhat likely very likely
1 0.5882547       0.3324677  0.07927756
2 0.3470234       0.4650134  0.18796324
3 0.6024157       0.3224929  0.07509137
4 0.3338251       0.4690740  0.19710087

predict(mult_mod, new.data, type="probs")

   unlikely somewhat likely very likely
1 0.5840849       0.3426999   0.0732152
2 0.3690459       0.3689135   0.2620406
3 0.6387143       0.2465215   0.1147642
4 0.3316785       0.5040241   0.1642974

Notice that the predicted probabilities for the unlikely category are very similar for both models. We do see some differences for the other two categories, but overall, I would conclude that we do not have strong evidence that the proportional odds assumption is violated.

You may also want to generate the confusion matrix for the multinomial model and see how it compares to the confusion matrix you generated for the ordinal model.