Propensity score matching — Glossary Aria Research

Extended definition

Propensity Score Matching (PSM) is a family of causal inference methods for observational studies that match treated individuals to controls on the propensity score, the estimated probability of receiving treatment given observed covariates:

$$e(X) = P(T=1 \mid X)$$

where $T$ is the binary treatment indicator and $X$ the covariate vector. Rosenbaum and Rubin (1983, Biometrika) formalized the fundamental theoretical result: if treatment assignment is strongly ignorable given $X$ (no unmeasured confounding, and every unit has a nonzero probability of receiving either treatment), then it is also ignorable given $e(X)$ alone. Conditioning on the propensity score is therefore sufficient to remove confounding bias from observed covariates, achieving balance between groups without matching on each covariate individually. The score is typically estimated by logistic regression with $T$ as the outcome. Matching can use several algorithms: 1:1 nearest neighbor, 1:k matching, caliper matching, optimal matching, and full matching. Stuart (2010, Statistical Science) reviews and compares these methods. A common balance criterion is an absolute standardized mean difference below 0.1 on each covariate between matched groups. Sensitivity analysis (Rosenbaum bounds) evaluates how robust the conclusions are to an unobserved confounder of hypothesized magnitude.
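As a minimal sketch of the two steps just described (score estimation, then matching), the following pure-Python example fits a logistic regression by gradient ascent and performs greedy 1:1 nearest-neighbor matching with a caliper. The function names `fit_propensity` and `match_nearest`, the simulated data, the learning rate, and the caliper of 0.05 on the raw score scale are all illustrative assumptions, not part of any cited method; in practice a statistics library would usually fit the model, and calipers are often defined on the logit of the score instead.

```python
import math
import random


def fit_propensity(X, t, lr=0.5, epochs=500):
    """Fit logistic regression P(T=1 | X) by batch gradient ascent.

    X: list of covariate rows; t: list of 0/1 treatment indicators.
    Returns a function mapping a covariate row to its estimated score.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept

    def score(row):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
        z = max(-30.0, min(30.0, z))  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, ti in zip(X, t):
            err = ti - score(row)  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        for j in range(p + 1):
            w[j] += lr * grad[j] / n
    return score


def match_nearest(scores, t, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Treated units are processed in index order, so the result depends
    on that order. Returns (treated_index, control_index) pairs whose
    scores differ by at most `caliper`; unmatched treated units are dropped.
    """
    controls = {i for i, ti in enumerate(t) if ti == 0}
    pairs = []
    for i in (i for i, ti in enumerate(t) if ti == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# Simulated data (assumption): one covariate that raises treatment odds.
random.seed(0)
X, t = [], []
for _ in range(400):
    x = random.gauss(0, 1)
    p_treat = 1 / (1 + math.exp(-(0.8 * x - 0.5)))
    X.append([x])
    t.append(1 if random.random() < p_treat else 0)

e = fit_propensity(X, t)
scores = [e(row) for row in X]
pairs = match_nearest(scores, t)
print(f"{len(pairs)} matched pairs out of {sum(t)} treated units")
```

Greedy matching is the simplest of the algorithms listed above; optimal and full matching minimize a global distance instead and generally need a dedicated solver.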

When it applies

PSM applies in observational studies where randomization is infeasible or unethical: drug effects in real-world practice (outside trials), social program effects on self-selected participants, environmental exposure effects, and surgical procedure effects in eligible patients. It applies when a rich set of measured covariates plausibly captures the selection mechanism: age, sex, comorbidities, disease severity, socioeconomic factors. It applies in pharmacoepidemiological studies with large databases (claims, EHR), where clinical trials would be costly. It also applies in economics (government program evaluation), political science (institutional effects), and education (program effects on volunteer students).

When it does not apply

PSM is not a substitute for randomized trials when those are feasible and ethical: it corrects confounding from observed covariates but not from unobserved confounders. It does not apply when the groups lack overlap: if the treated and control propensity-score distributions do not share a common region, matching forces invalid extrapolation. It does not apply when treatment causes the covariates used in the score: conditioning on post-treatment covariates introduces bias. It does not replace mechanism analysis: PSM estimates an average effect under ignorability and does not explain how the effect occurs. In small samples, PSM tends to be less statistically efficient than simpler alternatives such as regression adjustment, in part because unmatched units are discarded; its benefits appear at moderate to large $n$.
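The overlap condition can be checked directly on the estimated scores. The sketch below uses a simple min-max rule for the common support region, which is one convention among several; the function names and the score values are hypothetical and chosen only to illustrate the check.

```python
def common_support(scores, t):
    """Return the [low, high] score range covered by both groups."""
    treated = [s for s, ti in zip(scores, t) if ti == 1]
    control = [s for s, ti in zip(scores, t) if ti == 0]
    low = max(min(treated), min(control))
    high = min(max(treated), max(control))
    return low, high


def restrict_to_overlap(scores, t):
    """Indices of units whose score lies inside the common support."""
    low, high = common_support(scores, t)
    return [i for i, s in enumerate(scores) if low <= s <= high]


# Hypothetical scores: two controls score below every treated unit,
# and three treated units score above every control.
scores = [0.10, 0.25, 0.40, 0.55, 0.30, 0.60, 0.85, 0.95]
t      = [0,    0,    0,    0,    1,    1,    1,    1]

low, high = common_support(scores, t)
kept = restrict_to_overlap(scores, t)
print(low, high)  # 0.3 0.55
print(kept)       # [2, 3, 4]
```

Dropping units outside the common region changes the population to which the estimated effect refers, so the restricted sample should be described alongside the result.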

Applications by field

- Pharmacoepidemiology: drug comparison in real-world practice using claims data; the FDA increasingly accepts PSM-based evidence in regulatory decisions.
- Health economics: effect of health insurance and public health programs.
- Political science: effect of democratic institutions and conflict interventions.
- Education: evaluation of voluntary programs where randomization is not feasible.

Common pitfalls

The first pitfall is treating PSM as magic that eliminates confounding: PSM corrects only confounding from observed covariates; unmeasured confounders remain a threat, and Rosenbaum sensitivity analysis is only a partial defense. The second is failing to check post-matching balance: successful matching yields an absolute standardized mean difference below 0.1 on each covariate; without this check, balance may be an illusion. The third is ignoring lack of overlap: very sick patients on treatment may have no control counterpart, and vice versa; restricting the analysis to the overlap region is standard practice. The fourth is reusing the same score for unrelated subsequent analyses: a propensity model is built for a specific treatment comparison and covariate set, chosen with the studied outcome in mind. The fifth is confusing PSM with classical regression with covariate adjustment: the two are distinct and complementary. PSM makes covariate balance easy to inspect, while regression can be more statistically efficient in some scenarios.
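The balance check named in the second pitfall can be sketched as follows. This assumes one common definition of the standardized mean difference, the difference in means divided by the square root of the average of the two group variances; other conventions use the pre-matching standard deviation in the denominator. The function name `smd` and the age values are hypothetical.

```python
import math
import statistics


def smd(treated, control):
    """Standardized mean difference with a pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical ages: all controls versus the matched subset.
age_treated = [62, 65, 70, 58, 66, 71]
age_control_all = [45, 50, 55, 60, 64, 68, 52, 48]
age_control_matched = [61, 66, 69, 59, 65, 71]

before = smd(age_treated, age_control_all)
after = smd(age_treated, age_control_matched)
balanced = abs(after) < 0.1  # threshold from the text above

print(f"SMD before matching: {before:.2f}")
print(f"SMD after matching:  {after:.2f}, balanced: {balanced}")
```

The same function would be applied to every covariate in the propensity model, since imbalance on a single covariate is enough to leave residual confounding.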