Cox Regression
Cox regression (or proportional hazards regression) is a method for investigating the effect of several variables upon the time a specified event takes to happen. In the context of an outcome such as death, this event could occur in five days (mortality in the high-risk group) or perhaps three years (in the low-risk group). It is named after the statistician who developed it, Sir David Cox.
Introduction
Cox regression uses the proportional hazards model to express the effect of the predictors on survival time. What sets it apart is its ability to incorporate time-dependent covariates: variables whose values can vary with time. An example could be the amount of physical exercise: maybe a subject starts to exercise more at a certain point during the study, which could impact the time-to-event (survival).
The Cox model is expressed by the hazard function, which describes the risk at a time t, conditional on survival until time t, as a function of a set of covariates:
h(t | X) = h0(t) × exp(b1X1 + b2X2 + ... + bpXp).
where:
- t is the survival time.
- h(t) is the hazard function determined by a set of p covariates (X1, X2, ..., Xp).
- The coefficients (b1, ..., bp) measure the effect (i.e., the change in hazard) of the covariates.
- The term h0 is called the baseline hazard, and is the value of the hazard if all the Xi are equal to zero (the quantity exp(0) equals 1).
Advantages
-
Cox Regression accounts for censoring in the sense that it's able to handle the scenarios in which the outcome of interest has not yet occurred for several observations in the sample.
-
Naturally, it can handle more than one predictor variable.
-
It allows to study how the strength and nature of relationships between mortality and predictors changes with age.
Disadvantages
-
The main limitation of Cox regression is its assumption of proportional hazards, which may not be appropriate for all datasets and must be empirically tested.
-
It does not naturally admit random effects or a natural way to model interactions between covariates.
-
It is not as interpretable as other models, since its output is a hazard ratio and not a clear-cut probability.
Application Scenarios
Cox regression can be widely used in clinical trials or community trials to compare survival curves of two or more groups (like drug vs. placebo or different diet groups) and to explore the effects of predictor variables. It is commonly used in medical research for investigating the association between the survival time of patients and one or more predictor variables.
Python Implementation
Python's lifelines
module has a class called CoxPHFitter
to fit the Cox regression model. Here is a sample usage:
from lifelines import CoxPHFitter
# Instantiate the class to create a cph object
cph = CoxPHFitter()
# Fit the data to train the model
cph.fit(data, 'Survival_in_days', event_col='Death')
# Have a look at the significance of the features
cph.print_summary()
References: