Survival analysis — Glossary Aria Research

Extended definition

Survival analysis is the family of statistical methods for modeling the time until an event occurs (death in clinical trials, disease recurrence, mechanical failure, customer churn, credit default). Its defining feature is the explicit treatment of censored data: observations for which the event had not occurred by the end of the study, or for which the subject left the study before the event could be observed. The canonical estimator of the survival function $S(t) = P(T > t)$ is the Kaplan-Meier (product-limit) estimator:

$$\hat{S}(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)$$

where the $t_i$ are the distinct event times, $d_i$ is the number of events at $t_i$, and $n_i$ is the number of subjects at risk immediately before $t_i$. Kaplan and Meier (1958, JASA) formalized this non-parametric estimator. For regression (covariate effects), the Cox proportional hazards model (1972, JRSS B) is dominant. It assumes that the hazard ratio between any two subjects is constant over time, and it leaves the form of the baseline hazard unspecified. The exponentiated coefficient $e^{\beta}$ is the hazard ratio for a one-unit increase in the covariate, holding the other covariates fixed. It plays a role analogous to the odds ratio in logistic regression, but on the scale of the instantaneous event rate. Extensions include parametric models (Weibull, log-logistic), frailty models (random effects), and competing-risks methods (Fine-Gray).
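The product-limit formula translates almost line for line into code. Below is a minimal pure-Python sketch. It assumes right-censored data coded as one follow-up time and one event indicator per subject (1 = event, 0 = censored). The function name `kaplan_meier` is hypothetical. Subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate, returned as [(t_i, S_hat(t_i))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right-censored
    """
    n_at_risk = len(times)                                        # n_i before the first time
    deaths = Counter(t for t, e in zip(times, events) if e)       # d_i per time
    leaving = Counter(times)                                      # events + censorings per time
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / n_at_risk                           # factor (1 - d_i / n_i)
            curve.append((t, surv))
        n_at_risk -= leaving[t]                                   # drop everyone who left at t
    return curve
```

Consider the hypothetical input `times=[2, 3, 3, 5, 6, 8, 9, 12]` with `events=[1, 1, 0, 1, 0, 1, 0, 1]`. The estimate steps down to 0.875, 0.75, 0.6, 0.4 and 0.0 at times 2, 3, 5, 8 and 12. The subject censored at time 3 still counts in the risk set at that time.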

When it applies

Survival analysis applies to any study whose primary outcome is the time to an event. It is standard in oncology clinical trials (overall survival, progression-free survival), epidemiology (age-adjusted incidence), reliability engineering, and churn analysis in business. It applies whenever there is censoring, because treating “event has not yet occurred” as “event will not occur” produces systematic bias. It is used to compare survival curves with the log-rank (Mantel-Cox) or Gehan-Wilcoxon test, and to adjust for confounders with the Cox model. CONSORT requires clinical trial reports to give the follow-up time and the number at risk alongside Kaplan-Meier curves.
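The two-group log-rank (Mantel-Cox) test compares the observed number of events in one group with the number expected under equal hazards, at each distinct event time. Below is a sketch of a hypothetical pure-Python `logrank_test`. It assumes group labels coded 0/1 and uses the hypergeometric variance with the usual correction for tied events. The p-value comes from the chi-square distribution with 1 degree of freedom. It does not implement the Gehan-Wilcoxon weighting.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample log-rank (Mantel-Cox) test; returns (chi2, p_value) with 1 df."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, _, g in data if ti >= t]  # still under observation at t
        n, n1 = len(at_risk), sum(at_risk)             # total and group-1 at risk
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance, corrected for tied events
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / variance
    # chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A library implementation would also offer weighted variants and more than two groups. This version is meant only to make the observed-minus-expected bookkeeping visible.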

When it does not apply

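It does not apply when the outcome is binary with no temporal component; use logistic regression instead. The standard Cox model does not apply when the proportional-hazards assumption is violated. Alternatives within survival analysis include stratification, time-varying covariates, and parametric models. It is unreliable with small $n$ and few events: hazard-ratio confidence intervals are wide and unstable, and the log-rank test is underpowered. It does not apply directly to composite outcomes that lack clear hierarchical criteria across components (death vs. recurrence vs. hospitalization). Under competing risks, a subject can fail from one of several mutually exclusive causes. There, a standard Cox or Kaplan-Meier analysis that treats competing events as ordinary censoring can give misleading results; for example, $1 - \hat{S}(t)$ then overestimates the cumulative incidence of the event of interest. Fine-Gray (subdistribution hazard) models or cause-specific hazard models are appropriate.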
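The bias from censoring competing events shows up on a toy example. The sketch compares two quantities. The first is the Aalen-Johansen estimate of the cumulative incidence function, the quantity that Fine-Gray regression models. The second is the naive $1 - \hat{S}(t)$ that treats competing events as censored. It assumes causes coded as 0 = censored and 1, 2, … = cause of failure. Both function names are hypothetical.

```python
def cumulative_incidence(times, causes, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` at the last observed time."""
    n = len(times)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    for t in sorted(set(times)):
        at_t = [c for ti, c in zip(times, causes) if ti == t]
        d_any = sum(1 for c in at_t if c != 0)     # failures from any cause
        d_k = sum(1 for c in at_t if c == cause)   # failures from the cause of interest
        cif += surv * d_k / n
        surv *= 1.0 - d_any / n
        n -= len(at_t)
    return cif


def naive_one_minus_km(times, causes, cause=1):
    """1 - Kaplan-Meier with competing events treated as censored (biased upward)."""
    n = len(times)
    surv = 1.0
    for t in sorted(set(times)):
        at_t = [c for ti, c in zip(times, causes) if ti == t]
        d_k = sum(1 for c in at_t if c == cause)
        surv *= 1.0 - d_k / n
        n -= len(at_t)
    return 1.0 - surv
```

Take four hypothetical subjects with no censoring and two failures from each cause (`times=[1, 2, 3, 4]`, `causes=[2, 1, 2, 1]`). The cumulative incidence of cause 1 comes out at 0.5, the observed proportion. The naive estimate reaches 1.0.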

Applications by field

— Oncology: standard for overall survival (OS) and progression-free survival (PFS) endpoints, reported according to CONSORT.
— Cardiology: time-to-event endpoints in secondary-prevention trials; meta-analyses of hazard ratios.
— Reliability: Weibull models for component time-to-failure; right censoring is endemic.
— Business churn analysis: time to subscription cancellation, with behavioral covariates in a Cox model.
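For the reliability use, the sketch below writes out the Weibull survival and hazard functions and the right-censored log-likelihood. It assumes the shape/scale parameterization $S(t) = \exp(-(t/\lambda)^k)$, which is one common convention among several. The shape $k$ makes the hazard fall ($k < 1$, early failures), stay constant ($k = 1$, the exponential model) or rise ($k > 1$, wear-out). Failed units contribute the density; units still running at the end of the test contribute only the survival probability. The function names are hypothetical, and fitting would mean maximizing `weibull_loglik` over `shape` and `scale` with a numerical optimizer.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_loglik(times, events, shape, scale):
    """Right-censored log-likelihood.

    Failed units contribute log f(t) = log h(t) + log S(t);
    censored units (still running) contribute only log S(t).
    """
    ll = 0.0
    for t, failed in zip(times, events):
        ll += -((t / scale) ** shape)  # log S(t)
        if failed:
            ll += math.log(weibull_hazard(t, shape, scale))
    return ll
```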

Common pitfalls

The first pitfall is ignoring censoring and treating “not yet an event” as “no event”, a systematic bias that underestimates the true event rate. The second is confusing the hazard ratio with the relative risk. The HR compares instantaneous event rates, whereas the RR compares cumulative event probabilities up to a given time. For rare outcomes and short follow-up they approximately coincide; in general they differ. The third is failing to check the proportional-hazards assumption; Schoenfeld residuals and log(−log) survival plots are the standard diagnostics. The fourth is interpreting Kaplan-Meier curves without reporting the number at risk at regular time points. The tails of the curve rest on few subjects, so they are statistically unstable and visually misleading. The fifth is confusing median survival with mean survival time. The median is the smallest time $t$ at which $\hat{S}(t) \leq 0.5$. The mean (the area under the survival curve) requires extrapolation beyond follow-up when the curve does not reach zero, and demands care.
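The second pitfall can be checked numerically. The worked illustration below assumes constant hazards (an exponential model) in both arms, so the hazard ratio is exactly 2 at every time. The cumulative risk by time $t$ is then $1 - e^{-ht}$. The function names and hazard values are hypothetical.

```python
import math


def risk_by(t, hazard):
    """P(T <= t) when the hazard is constant (exponential model)."""
    return 1.0 - math.exp(-hazard * t)


def relative_risk(t, hazard_treated, hazard_control):
    """Cumulative risk ratio at time t; the hazard ratio is hazard_treated / hazard_control."""
    return risk_by(t, hazard_treated) / risk_by(t, hazard_control)


# Same hazard ratio (2.0) in both scenarios.
rare = relative_risk(t=1.0, hazard_treated=0.02, hazard_control=0.01)   # rare outcome, short follow-up
common = relative_risk(t=5.0, hazard_treated=1.0, hazard_control=0.5)   # common outcome, long follow-up
```

`rare` is about 1.99, close to the hazard ratio, while `common` is about 1.08. Once most subjects in both arms have had the event, the cumulative risks converge even though the instantaneous rates still differ by a factor of two.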