2. Learning Objectives
• By the end of this session, students will be able to:
– Identify applications with time-to-event outcomes
– Construct a life table using the actuarial approach
– Construct a life table using the Kaplan–Meier approach
– Perform and interpret the log-rank test
– Assess the Cox proportional hazards assumption
– Compute and interpret a hazard ratio
10 March 2023 2
3. Introduction
• Health science studies often focus on describing disease-free survival, patient survival,
or time to an event of interest in a specific group of subjects. For example,
– Disease-free and overall survival of breast cancer patients after surgery
– Disease-free survival of osteosarcoma patients after receiving radiotherapy
– 2- and 5-year disease-free survival probabilities of ovarian cancer patients after
receiving treatment
• The time interval between a starting point and a subsequent event is known as the survival
time. Survival analysis is a collection of statistical methods for analyzing survival data.
4. Example
Summary statistics
– Median survival time (using Kaplan-Meier)
Bivariate association
– Log-rank test
– Simple Cox proportional hazards regression
Multivariable analysis
– Multiple Cox proportional hazards regression
Examples of time-to-event outcomes
– Time to death
– Time to disease progression
– Time to cardiovascular event
– Time to pregnancy
5. Event and Censoring
• Event: We may not observe the event of interest for every subject within the defined follow-up period
– Thus, data are typically subject to censoring, for example when a study ends before the event occurs
• Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the
study, or if the study ends before they die or experience the outcome of interest.
Figure 1. Illustration of censored observations (× = event, + = censored observation).
6. Censoring cont’d
• Left censoring: The patient is known to have experienced the event before the
start of the observation period, so the actual time-to-event is shorter than the
interval between the origin and start of observation, but it is unknown by how
much.
• Right censoring: Observation of the patient is terminated before the event
occurs, so the actual time-to-event, if it were to occur, is longer than the
observation time, but it is unknown by how much.
7. Event and Censoring cont’d
• Non-informative censoring: occurs when participants drop out of the study for
reasons unrelated to the study outcome, so that censoring is independent of the risk of the event.
• Informative censoring: occurs when participants are lost to follow-up for
reasons related to the study. For example, in a study comparing disease-free survival after
two treatments for cancer, the control arm may be ineffective, leading to more
recurrences and patients becoming too sick to attend follow-up.
– Patients on the intervention arm may be completely cured by an effective treatment and
may no longer feel the need to attend follow-up. If these participants are routinely censored, the
true treatment effect will not be detected and the results of the study will be biased.
8. Rate
• The rate is defined as the number of events per unit of person-time of observation.
$$\text{Incidence rate} = \frac{\text{Number of new cases of disease}}{\text{Total person-time at risk}}$$
$$\text{Mortality rate} = \frac{\text{Number of deaths}}{\text{Total person-time at risk}}$$
9. Cont’d
In total, these 14 subjects contribute 3 + 4 + 1 + 5 + (6 × 10) = 73 person-years. Incidence rate = 3/73 events per
person-year = 4.1 per 100 person-years.
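The rate arithmetic above can be reproduced with a small Python sketch. The helper name `incidence_rate` and its `per` argument are illustrative assumptions; the person-year contributions are the ones quoted on the slide.

```python
def incidence_rate(events, person_time, per=100):
    """Events per `per` units of person-time (e.g. per 100 person-years)."""
    if person_time <= 0:
        raise ValueError("person-time must be positive")
    return events / person_time * per


# Person-years contributed, following the slide's arithmetic:
# 3 + 4 + 1 + 5 years, plus six subjects followed for 10 years each.
person_years = sum([3, 4, 1, 5]) + 6 * 10
rate = incidence_rate(events=3, person_time=person_years)
print(f"{person_years} person-years, {rate:.1f} per 100 person-years")
# prints: 73 person-years, 4.1 per 100 person-years
```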
10. Cont’d
❖ The following must be defined:
– Study patients must be free of the event/disease of interest at baseline
enrollment.
– The starting date for each patient, clearly defined, e.g., date of diagnosis, date of
receiving treatment, date of operation.
– Patient's status: whether the event of interest occurred (e.g., death, recurrence,
infection, remission, recovery) or the patient was censored (lost to follow-up,
withdrew, or the study terminated).
– The end date: the date of death (or other event) for patients who experience the event; otherwise the date at the end of the study (or the date of last contact for those lost to follow-up).
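A minimal Python sketch of turning these definitions into the two analysis variables, follow-up time and status. The function `follow_up`, the 365.25-day year, and the example dates are hypothetical choices for illustration, not part of the original material.

```python
from datetime import date


def follow_up(start, end, event):
    """Return (follow-up time in years, status) for one patient.

    start: baseline date (e.g. date of diagnosis, treatment or operation)
    end:   date of the event, or date of censoring (end of study or last contact)
    event: True if the event of interest occurred on `end`
    """
    if end < start:
        raise ValueError("end date precedes start date")
    years = (end - start).days / 365.25   # assumed convention: 365.25 days per year
    status = 1 if event else 0            # 1 = event, 0 = censored
    return years, status


# Hypothetical patients (dates invented for illustration)
print(follow_up(date(2010, 1, 1), date(2014, 1, 1), event=True))    # (4.0, 1)
print(follow_up(date(2012, 6, 1), date(2020, 12, 31), event=False))
```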
11. Survivor Function
• The survivor function, or survival curve, S(t), is the probability of surviving to time t or beyond:
$$S(t) = P(T \ge t) = 1 - F(t)$$
where T is the survival time and F(t) = P(T < t) is its cumulative distribution function.
• Calculation of S(t) for 10 lung cancer patients
12. Example
• In survival analysis, we use information on event status and follow-up time to estimate a survival
function. For instance, the survival curve from a 20-year prospective study of patient survival following a
myocardial infarction might look like the figure below.
– The horizontal axis represents time in years, and the vertical axis shows the probability of surviving. At time 0, the
survival probability is 1 (all participants are alive).
– At 2 years, the probability of survival is approximately 0.83.
– At 10 years, the probability of survival is approximately 0.55.
– The median survival is approximately 11 years.
– A survival curve close to 1 suggests very good survival, whereas a survival curve that drops sharply toward 0 suggests
poor survival.
13. Estimating the survivor function, S(t)
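• There are two main methods to estimate S(t):
1. the life table method, and
2. the Kaplan-Meier method.
• An estimate of S(t) could be obtained by simply calculating the proportion of
individuals still alive at selected values of t, such as completed years. This simple
estimate is valid only when there is no censoring, because censored individuals cannot be
classified as alive or dead at later times.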
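For fully observed data, this naive estimate is just a proportion. The sketch below assumes hypothetical survival times with no censoring and uses S(t) = P(T ≥ t) as defined earlier; `empirical_survival` is an illustrative name.

```python
def empirical_survival(times, t):
    """Proportion of subjects with survival time >= t, an estimate of P(T >= t).

    Valid only when every survival time is fully observed (no censoring).
    """
    return sum(1 for x in times if x >= t) / len(times)


# Hypothetical, fully observed survival times in years
times = [0.5, 1.2, 2.0, 2.7, 3.1, 4.8, 5.5, 6.0, 7.3, 9.9]
for year in range(6):
    print(year, empirical_survival(times, year))
```

Once some subjects are censored, this proportion is no longer well defined, which is what the life table and Kaplan-Meier methods address.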
14. 1. Life table method for estimating S(t)
• Also known as the ‘actuarial method’.
• The approach is to divide the period of observation into a series of time
intervals and estimate the conditional (interval-specific) survival proportion
for each interval.
• The cumulative survivor function, S(t), at the end of a specified interval is
then given by the product of the interval-specific survival proportions for
all intervals up to and including the specified interval.
15. Example
• A cohort study of time to death involves 20 participants aged 65 years and older;
they are enrolled over a 5-year period and are followed for up to 24 years until they
die, the study ends, or they drop out of the study (lost to follow-up). In the study,
there are 6 deaths and 3 participants with complete follow-up (i.e., 24 years). The
remaining 11 have fewer than 24 years of follow-up because they enrolled late or were
lost to follow-up.
17. Cont’d
• To construct a life table, we first organize the follow-up times into equally spaced
intervals. In the table above we have a maximum follow-up of 24 years, and we
consider 5-year intervals (0-4, 5-9, 10-14, 15-19 and 20-24 years). We count the number
of participants who are alive at the beginning of each interval, the number who die, and
the number who are censored in each interval.
18. Cont’d
• In the absence of censoring, the interval-specific survival proportion is p = (L − d)/L,
where d is the number of events (deaths) observed during the interval and L is the
number of patients alive at the start of the interval.
• In the presence of censoring, it is assumed that censoring occurs uniformly throughout
the interval, so that each individual with a censored survival time is at risk for, on
average, half of the interval. This assumption is known as the actuarial assumption.
• The effective number of patients at risk during the interval is then L′ = L − w/2, where L
is the number of patients alive at the start of the interval and w is the number of
censorings during the interval. The interval-specific survival proportion becomes
p = (L′ − d)/L′.
19. Example cont’d
| Interval (years) | At risk at start | Average no. at risk during interval | Deaths during interval | Lost to follow-up | Proportion dying during interval | Proportion surviving during interval | Cumulative survival probability |
|---|---|---|---|---|---|---|---|
| 0–4 | 20 | 20 − (1/2) = 19.5 | 2 | 1 | 2/19.5 = 0.103 | 1 − 0.103 = 0.897 | 1 × 0.897 = 0.897 |
| 5–9 | 17 | 17 − (2/2) = 16 | 1 | 2 | 1/16 = 0.063 | 1 − 0.063 = 0.937 | 0.897 × 0.937 = 0.840 |
| 10–14 | 14 | 14 − (4/2) = 12 | 1 | 4 | 1/12 = 0.083 | 1 − 0.083 = 0.917 | 0.840 × 0.917 = 0.770 |
| 15–19 | 9 | 9 − (3/2) = 7.5 | 1 | 3 | 1/7.5 = 0.133 | 1 − 0.133 = 0.867 | 0.770 × 0.867 = 0.668 |
| 20–24 | 5 | ? | 1 | 4 | ? | ? | ? |
In the actuarial method, the follow-up time is divided into equally spaced intervals to
construct the life table.
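The actuarial calculation can be sketched in Python. It takes the deaths and losses to follow-up per interval from the table above and applies the actuarial assumption (each censored subject counts as half a person at risk); `actuarial_life_table` is a hypothetical helper. Because it does not round at each step, its cumulative survival can differ from the hand-calculated column in the third decimal place (e.g., 0.841 rather than 0.840 for 5-9 years), and it also fills in the 20-24 row left blank in the table.

```python
def actuarial_life_table(at_risk_start, deaths, censored):
    """Actuarial (life-table) estimate of S(t) at the end of each interval.

    at_risk_start: number alive at the start of the first interval
    deaths[k], censored[k]: deaths and losses to follow-up in interval k
    Returns one row (L, L', d, w, q, p, S) per interval.
    """
    rows = []
    n = at_risk_start
    surv = 1.0
    for d, w in zip(deaths, censored):
        effective = n - w / 2      # L' = L - w/2 (actuarial assumption)
        q = d / effective          # proportion dying during the interval
        p = 1 - q                  # proportion surviving the interval
        surv *= p                  # cumulative survival at end of interval
        rows.append((n, effective, d, w, q, p, surv))
        n -= d + w                 # alive at the start of the next interval
    return rows


labels = ["0-4", "5-9", "10-14", "15-19", "20-24"]
table = actuarial_life_table(20, deaths=[2, 1, 1, 1, 1], censored=[1, 2, 4, 3, 4])
for label, (n, eff, d, w, q, p, s) in zip(labels, table):
    print(f"{label:>6} {n:3d} {eff:5.1f} {d} {w} {q:.3f} {p:.3f} {s:.3f}")
```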
20. 2. Kaplan-Meier Estimator (Product Limit) approach
• The Kaplan-Meier estimator is a nonparametric estimator of the survivor function S(t).
• It was developed by Edward Kaplan and Paul Meier and published in 1958.
• An issue with the life table approach shown above is that the survival probabilities can change
depending on how the intervals are organized.
• The Kaplan-Meier approach rests on the assumption that censoring is independent of the likelihood
of developing the event of interest and that survival probabilities are comparable in participants
who are recruited early and later into the study.
• The main difference between the two approaches is the time intervals: the actuarial life table approach
uses equally spaced intervals, while the Kaplan-Meier approach uses the observed event times
and censoring times.
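In symbols, the product-limit estimate multiplies the conditional survival proportions at each distinct event time up to t (standard notation, introduced here for clarity):

$$\hat{S}(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right)$$

where t_(1) < t_(2) < ... are the distinct event times, d_j is the number of events at t_(j), and n_j is the number of subjects at risk just before t_(j); subjects censored at t_(j) are still counted in n_j.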
21. Life table using the Kaplan-Meier Approach
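| Time of event (years) | Alive at start (n) | Deaths (d) | Lost to follow-up | Probability of dying (d/n) | Probability of surviving (1 − d/n) | Survival probability at end of time point, S(t) |
|---|---|---|---|---|---|---|
| 0 | 20 | 0 | | | | 1 |
| 1 | 20 | 1 | | 1/20 = 0.050 | 1 − 0.05 = 0.950 | 0.950 |
| 2 | 19 | | 1 | 0/19 = 0.000 | 1 − 0 = 1 | 0.950 × 1 = 0.950 |
| 3 | 18 | 1 | | 1/18 = 0.056 | 1 − 0.056 = 0.944 | 0.950 × 0.944 = 0.897 |
| 5 | 17 | 1 | | 1/17 = 0.059 | 1 − 0.059 = 0.941 | 0.897 × 0.941 = 0.844 |
| 6 | 16 | | 1 | 0 | 1 | 0.844 |
| 9 | 15 | | 1 | 0 | 1 | 0.844 |
| 10 | 14 | | 1 | 0 | 1 | 0.844 |
| 11 | 13 | | 1 | 0 | 1 | 0.844 |
| 12 | 12 | | 1 | 0 | 1 | 0.844 |
| 13 | 11 | | 1 | 0 | 1 | 0.844 |
| 14 | 10 | 1 | | 1/10 = 0.10 | 1 − 0.1 = 0.90 | 0.844 × 0.90 = 0.760 |
| 17 | 9 | 1 | 1 | 1/9 = 0.111 | 1 − 0.111 = 0.889 | 0.760 × 0.889 = 0.676 |
| 18 | 7 | | 1 | 0 | 1 | 0.676 |
| 19 | 6 | | 1 | 0 | 1 | 0.676 |
| 21 | 5 | | 1 | 0 | 1 | 0.676 |
| 23 | 4 | 1 | | 1/4 = 0.25 | 1 − 0.25 = 0.75 | 0.676 × 0.75 = 0.507 |
| 24 | 3 | | 3 | | | 0.507 |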
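A minimal Python sketch of the product-limit calculation, applied to the 20 follow-up times in the Kaplan-Meier table above (1 = death, 0 = lost to follow-up, matching the chd coding convention). `kaplan_meier` is a hypothetical helper, not a library function; its S(t) column agrees with the table to three decimals.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survivor function.

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if censored
    Returns one row (time, n_at_risk, deaths, censored, S) per distinct time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        # Subjects censored at time t still count as at risk at t.
        surv *= 1 - deaths / n_at_risk
        rows.append((t, n_at_risk, deaths, censored, surv))
        n_at_risk -= deaths + censored
    return rows


# Follow-up times (years) and event indicators from the Kaplan-Meier table above
times = [1, 2, 3, 5, 6, 9, 10, 11, 12, 13, 14, 17, 17, 18, 19, 21, 23, 24, 24, 24]
events = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

result = kaplan_meier(times, events)
for t, n, d, c, s in result:
    print(f"{t:>3} {n:>3} {d} {c} {s:.3f}")
```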
22. Median survival time
• Median survival time is defined as the time at which the probability of survival falls to
50%, i.e., the smallest time t with S(t) ≤ 0.5.
• In the survival curve shown below, the symbols represent each event time, either a
death or a censored time.
• From the survival curve, we can also estimate the probability that a participant survives
past 10 years by locating 10 years on the X axis and reading up and over to the Y axis.
The proportion of participants surviving past 10 years is 84%.
• The median survival is estimated by locating 0.5 on the Y axis and reading over and
down to the X axis. Here the curve drops to 0.507 at 23 years, so the median survival is
approximately 23 years; strictly, the estimated survival never falls to 0.5 in these data,
and software may report the median as not reached.
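The two read-offs described above (S at 10 years and the median) can be sketched as functions of the Kaplan-Meier step function. The sketch assumes the common convention that the median is the smallest time with S(t) ≤ 0.5; the survival values are the Kaplan-Meier estimates from the table above, rounded to four decimals. `survival_at` and `median_survival` are illustrative names.

```python
import bisect


def survival_at(event_times, surv, t):
    """Value of the Kaplan-Meier step function at time t (1.0 before the first event)."""
    k = bisect.bisect_right(event_times, t)
    return 1.0 if k == 0 else surv[k - 1]


def median_survival(event_times, surv):
    """Smallest event time with S(t) <= 0.5, or None if the curve never gets there."""
    for t, s in zip(event_times, surv):
        if s <= 0.5:
            return t
    return None


# Event times and Kaplan-Meier estimates from the table above (4 decimals)
event_times = [1, 3, 5, 14, 17, 23]
surv = [0.95, 0.8972, 0.8444, 0.76, 0.6756, 0.5067]

print(survival_at(event_times, surv, 10))   # 0.8444
print(median_survival(event_times, surv))   # None: S(t) never reaches 0.5
```

With these data the median function returns None because S(t) stops at 0.507; the 23-year value is a graphical approximation.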
23. Kaplan Meier survival curve
Figure. Kaplan-Meier Survival Curve using the above data.
24. Log rank test
• Suppose that each patient received one of two treatment regimens.
• The question: is there a difference in the survival curves between patients who received
treatment 1 and those who received treatment 2?
• The log-rank test is a nonparametric test for comparing the survival curves of two or more
groups (e.g., treatments).
• Hypotheses
H0: the survival curves of the two treatments are the same
H1: the survival curves of the two treatments are different
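A from-scratch Python sketch of the two-group log-rank test, showing what the statistic compares: at each distinct event time, the observed events in one group versus the number expected if both groups shared the same survival curve. The function name and toy data are hypothetical, and the p-value uses the chi-square distribution with 1 degree of freedom via the identity P(chi-square > x) = erfc(sqrt(x/2)).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events_*: 1 = event, 0 = censored.
    Returns (observed events in A, expected events in A, chi-square, p-value).
    """
    data = [(t, e, "A") for t, e in zip(times_a, events_a)]
    data += [(t, e, "B") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in data if s >= t)                 # at risk, both groups
        n_a = sum(1 for s, _, g in data if s >= t and g == "A")  # at risk, group A
        d = sum(1 for s, e, _ in data if s == t and e)           # events at t
        d_a = sum(1 for s, e, g in data if s == t and e and g == "A")
        observed += d_a
        expected += d * n_a / n          # expected events in A if curves are equal
        if n > 1:                        # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return observed, expected, chi2, p


# Hypothetical toy data: every subject in A fails before every subject in B
observed, expected, chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"O = {observed:.0f}, E = {expected:.2f}, chi-square = {chi2:.3f}, p = {p:.4f}")
```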
25. Log rank test
Figure panels: (left) both groups have identical distribution curves; (right) both groups have different distribution curves.
27. Types of analysis in survival analysis
1. Nonparametric survival analysis: Kaplan-Meier, log-rank test
2. Semi-parametric survival analysis: Cox proportional hazards regression
3. Parametric survival analysis: Weibull, gamma, lognormal, etc.
28. Cox regression
• Kaplan-Meier can be used to compare survival in different subgroups, and the log-rank test for
significance testing. Neither can be used to explore the simultaneous effects of several variables on survival.
• When there are several explanatory variables (especially when some of them are
continuous), a regression method such as Cox regression is preferred.
• Cox regression is a model for time-to-event data.
• Its measure of association is the hazard ratio (HR).
• The model assumes that the hazard ratio is constant over time.
• This property is called the proportional hazards assumption.
29. Hazard Function
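• The term ‘hazard rate’ is the generic term used to describe the ‘event
rate’.
• The hazard function is the instantaneous event rate at time t, conditional
on survival up to time t.
• The hazard function is given by
$$h(t) = \frac{f(t)}{S(t)}$$
where f(t) is the probability density function of the survival time T.
• Formally,
$$h(t) = \lim_{\delta t \to 0} \frac{P(t \le T < t + \delta t \mid T \ge t)}{\delta t}$$
so h(t)δt is approximately the probability that an individual dies (fails) in the short
interval [t, t + δt), given that s/he has survived up to time t. The hazard itself is a rate,
not a probability, and can exceed 1.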
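To illustrate h(t) = f(t)/S(t), the sketch below computes the hazard of a Weibull distribution (one of the parametric models listed earlier), assuming the parameterization S(t) = exp(-(t/scale)^shape). With shape = 1 (the exponential distribution) the hazard is constant; with shape = 2 it increases with time and exceeds 1 at t = 20, which shows that a hazard is a rate rather than a probability. `weibull_hazard` is an illustrative name.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard computed as h(t) = f(t) / S(t), with S(t) = exp(-(t/scale)**shape)."""
    surv = math.exp(-((t / scale) ** shape))                        # S(t)
    dens = (shape / scale) * (t / scale) ** (shape - 1) * surv     # f(t)
    return dens / surv


# shape = 1 is the exponential distribution: constant hazard 1/scale
print([round(weibull_hazard(t, 1.0, 5.0), 3) for t in (1, 5, 10)])      # [0.2, 0.2, 0.2]
# shape = 2: the hazard increases with time and can exceed 1
print([round(weibull_hazard(t, 2.0, 5.0), 3) for t in (1, 5, 10, 20)])  # [0.08, 0.4, 0.8, 1.6]
```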
30. Cox regression
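$$h_i(t) = h_0(t) \times \exp(\beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi})$$
Where
– $h_i(t)$ is the hazard for the ith case at time t
– $h_0(t)$ is the baseline hazard (the hazard when all covariates equal 0)
– p is the number of covariates
– $\beta_1, \ldots, \beta_p$ are the regression coefficients
– $X_1, X_2, \ldots, X_p$ are the covariates (predictors) included in the model, and $X_{1i}, \ldots, X_{pi}$ are their values for the ith case
– There is no separate intercept $\beta_0$: it would be absorbed into the baseline hazard $h_0(t)$.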
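Because the baseline hazard h0(t) appears in both hazards, it cancels in their ratio, which is why the Cox model yields a hazard ratio that does not depend on t. The sketch below computes that ratio for two covariate patterns; `hazard_ratio` is an illustrative name and the coefficient values are hypothetical, not estimates from any data in this session.

```python
import math


def hazard_ratio(beta, x_a, x_b):
    """Hazard ratio of subject a versus subject b under a Cox model.

    h_a(t) / h_b(t) = exp(sum_j beta_j * (x_aj - x_bj)); h0(t) cancels,
    so the ratio is the same at every time t (proportional hazards).
    """
    return math.exp(sum(b * (xa - xb) for b, xa, xb in zip(beta, x_a, x_b)))


# Hypothetical coefficients for X1 (binary) and X2 (continuous)
beta = [0.7, -0.02]
print(hazard_ratio(beta, [1, 60], [0, 60]))  # exp(0.7): X1 = 1 vs 0, X2 held fixed
print(hazard_ratio(beta, [0, 70], [0, 60]))  # exp(-0.2): 10-unit increase in X2
```

In particular, exp(β_j) is the hazard ratio for a one-unit increase in X_j with the other covariates held fixed.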
32. Variables in the chd data set
• Number of study participants: 337
• Number of variables: 12
| Variable | Variable label |
|---|---|
| id | Study participant identification number |
| chd | Failure indicator: 1 = event, 0 = censored |
| time | Time in study (years) |
| energy | Indicator for high energy |
| Occupation | Occupation of patients |
| Agecat | Age category (1: < 65; 2: ≥ 65) |
| height | Height (cm) |
| weight | Weight (kg) |
| doe | Date of entry |
| dox | Date of exit |
| dob | Date of birth |
| age | Age of patients |
38. Log-rank test: SPSS output
There is a statistically significant
difference in survival time between
the two groups.
39. Simple Cox Regression
• The Cox regression procedure is useful for modeling the time to a specified event,
based upon the values of given covariates.
• One or more covariates are used to model the hazard of the event (status).
• The central statistical output is the hazard ratio.
• The data contain both censored and uncensored cases. Cox regression is similar to logistic
regression, but it models the relationship between survival time and the covariates, allowing for censoring.
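Software such as SPSS estimates B by maximizing the Cox partial likelihood. The sketch below is a minimal from-scratch illustration for a single covariate, using Newton-Raphson and Breslow's approximation for tied event times (an assumption; packages differ in their default tie handling). `cox_simple` and the toy data are hypothetical. For this toy data set the maximum can be found by hand: exp(B) = (1 + √17)/2 ≈ 2.56.

```python
import math


def cox_simple(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(b * x) by maximizing the Cox partial likelihood.

    Newton-Raphson on the log partial likelihood for one covariate; tied event
    times are handled with Breslow's approximation. Returns (b, se_b, exp_b).
    """
    b = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0   # first derivative of the log partial likelihood
        info = 0.0    # minus the second derivative (observed information)
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            # Risk set: subjects still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(b * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            mean = s1 / s0
            score += x_i - mean          # observed minus risk-set weighted mean
            info += s2 / s0 - mean ** 2  # risk-set weighted variance
        step = score / info
        b += step
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(info)
    return b, se, math.exp(b)


# Hypothetical toy data: four subjects, all with events, binary covariate x
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
x = [1, 0, 1, 0]
b, se, hr = cox_simple(times, events, x)
print(f"B = {b:.4f}, SE = {se:.4f}, Exp(B) = {hr:.4f}")
```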
40. Cox regression cont’d
– Status variable: the dependent variable in Cox regression; it must be binary (event vs. censored).
– Time variable: measures the time to the event defined by the status variable, or to censoring
(continuous or discrete).
– Covariates: independent/predictor variables. They can be categorical or continuous,
and time-fixed or time-dependent.
– Interaction terms between covariates can also be included.
– Categorical covariates: once they are declared as categorical, SPSS converts them into a set of
dummy variables, omitting one (reference) category.
41. Simple Cox Regression: SPSS Outputs
• Exp(B) is the hazard ratio: the multiplicative change in the hazard for a one-unit
increase in the predictor.
• For binary covariates, the hazard ratio is the estimated ratio of the hazard rate in
one group to the hazard rate in the other group.
• The hazard for patients aged < 65 is 4.065 times that of patients aged 65 and
above (HR = 4.065).
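SPSS reports Exp(B) together with a 95% confidence interval. A sketch of the usual Wald interval, exp(B ± 1.96 × SE), is below; `hazard_ratio_ci` is an illustrative name. B = ln(4.065) reproduces the hazard ratio quoted above, but the SE of 0.40 is a made-up value because the standard error is not shown in this text.

```python
import math


def hazard_ratio_ci(b, se, z=1.96):
    """Exp(B) and its Wald confidence interval exp(B - z*SE) to exp(B + z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B chosen so that Exp(B) = 4.065; SE = 0.40 is hypothetical
hr, lo, hi = hazard_ratio_ci(math.log(4.065), 0.40)
print(f"HR = {hr:.3f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The interval is symmetric on the log scale, not around the HR itself; an interval that excludes 1 corresponds to a significant effect at the 5% level.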
42. Multivariable Cox regression model
• Allows us to study the simultaneous effects of several variables/covariates on the
outcome. For example, to:
– Identify predictors of the outcome
– In clinical trials: adjust treatment differences for baseline imbalances in
demographic and clinical characteristics of the patients
– Adjust the analysis for possible confounding in observational studies
43. Proportional hazards assumption (PHA)
Graphical method (log-log method)
– It is subjective.
– The log cumulative baseline hazards for the strata are plotted against time
– The resulting curves should be parallel if the proportional hazards assumption
holds
The global goodness-of-fit test
– Proposed by Schoenfeld for testing the PH assumption.
– It is an objective test and is usually recommended.
44. Proportional hazards assumption (PHA)
• The HR is constant over time
• Several methods can be used:
– log(−log(survival)) vs. log(time) graph
– Time-dependent Cox model: include an interaction term with time
– Schoenfeld residuals analysis (not available as a built-in test in SPSS)
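Under proportional hazards, S2(t) = S1(t)^HR, so log(−log S2(t)) = log HR + log(−log S1(t)): the log-log curves are separated by a constant vertical gap of log HR. The sketch below checks this on hypothetical exponential survival curves whose hazards differ by a factor of 2; with Kaplan-Meier estimates in practice the gap is noisy and parallelism is judged approximately. Survival values must lie strictly between 0 and 1 for the transform to be defined; `loglog` is an illustrative name.

```python
import math


def loglog(surv):
    """log(-log S(t)): the log cumulative hazard, defined for 0 < S(t) < 1."""
    return [math.log(-math.log(s)) for s in surv]


# Hypothetical survival curves for two groups at the same times, taken from
# exponential models whose hazards are 0.1 and 0.2 per year (HR = 2).
times = [1, 2, 5, 10]
s_group1 = [math.exp(-0.1 * t) for t in times]
s_group2 = [math.exp(-0.2 * t) for t in times]

gaps = [g2 - g1 for g1, g2 in zip(loglog(s_group1), loglog(s_group2))]
print([round(g, 4) for g in gaps])   # [0.6931, 0.6931, 0.6931, 0.6931], i.e. log(2)
```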
47. Summary
• After all variables have been checked for the PH assumption, fit the multivariable Cox
regression model.
• First, use a p-value cut-off of 0.2 or 0.25 from the simple Cox regressions to select variables for
the multivariable Cox regression.
• Then use a stepwise (backward or forward) variable selection method.
• Select the best model using AIC, BIC or the likelihood ratio test.
• Finally, check the overall model adequacy using Cox-Snell residuals.
• Interpret the results.
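The model-comparison step can be sketched as follows. AIC = −2 log L + 2k, where k is the number of estimated coefficients (the Cox model has no intercept), and a lower AIC is preferred. The likelihood ratio statistic for nested models is 2(log L_full − log L_reduced), referred to a chi-square distribution with degrees of freedom equal to the number of added coefficients (here 1). The function names and log partial likelihood values are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k, with k estimated coefficients."""
    return -2 * loglik + 2 * k


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models that differ by one coefficient."""
    stat = 2 * (loglik_full - loglik_reduced)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


# Hypothetical log partial likelihoods: reduced model (2 covariates) vs full (3)
ll_reduced, ll_full = -620.4, -617.1
print("AIC reduced:", aic(ll_reduced, 2), "AIC full:", aic(ll_full, 3))
stat, p = lr_test_1df(ll_reduced, ll_full)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```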