DATA & STATISTICS

Effect size

Quantitative measure of the magnitude of an observed effect or difference, independent of sample size. Includes the d (Cohen), r (correlation), and odds ratio families. A reporting component required by modern standards (DORA, ASA, APA, AMA).

Extended definition

Effect size is any quantitative measure of the magnitude of a phenomenon (a difference between means, a strength of association, a proportion of variance explained) that is, ideally, independent of sample size. Contrast this with the $p$-value, which confounds magnitude and sample size: a large sample with a trivial effect can yield a tiny $p$, while a small sample with a real effect can yield a large $p$. The most common family in biomedical and behavioral research is Cohen’s $d$ for the difference between two means:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s}$$

where $s$ is the pooled standard deviation of the two groups. Cohen (1988) proposed magnitude conventions of small ($d \approx 0.2$), medium ($d \approx 0.5$), and large ($d \approx 0.8$), which are valid for the behavioral sciences but not universal. Other families include $r$ (Pearson or partial correlation), $\eta^2$ and $\omega^2$ (proportion of variance explained in ANOVA), the odds ratio and relative risk (epidemiology), and the standardized $\beta$ in regression. Modern reporting standards (DORA, ASA, APA, AMA, CONSORT) require an effect size alongside the $p$-value.
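As a minimal sketch of the formula above, the following Python computes Cohen's $d$ for two independent groups. It assumes the usual pooled standard deviation, $s = \sqrt{((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2) / (n_1 + n_2 - 2)}$, which the entry names but does not spell out. The function name `cohens_d` and the sample scores are illustrative, not taken from any study.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD.

    Assumption: pooling weights each sample variance by its degrees
    of freedom (n - 1), the common textbook definition.
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance returns the sample variance (n - 1 denominator).
    v1, v2 = variance(group1), variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")
    return (mean(group1) - mean(group2)) / pooled_sd


# Made-up illustrative scores, not real data.
treated = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [4.0, 5.0, 6.0, 7.0, 8.0]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.3f}")
```

The sign of the result follows the order of the arguments, so it is worth stating which group is subtracted from which when reporting $d$.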

When it applies

Effect size is required whenever a quantitative result is reported in a modern academic manuscript. It is essential in meta-analysis (combining studies requires a comparable magnitude metric), in a priori power calculations (sample planning requires an estimate of expected effect), and in clinical interpretation (a statistically significant effect may be clinically irrelevant). It is also a critical tool in communication to non-technical audiences — a measure in standardized units is more interpretable than a raw coefficient without context.
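The role of effect size in a priori power calculations can be sketched as follows. This is an approximation under stated assumptions, not a substitute for dedicated power software: it uses the normal-approximation formula $n \approx 2\,((z_{1-\alpha/2} + z_{\text{power}}) / d)^2$ per group for a two-sided comparison of two independent means with equal group sizes. The function name `n_per_group` and the default values of `alpha` and `power` are illustrative choices.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a standardized effect d.

    Assumptions: two-sided test, two independent groups of equal size,
    normal approximation (exact t-based methods give slightly larger n).
    """
    if d == 0:
        raise ValueError("an effect of zero cannot be detected at any n")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for target power
    return math.ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


# Smaller expected effects demand much larger samples.
for expected_d in (0.2, 0.5, 0.8):
    print(expected_d, n_per_group(expected_d))
```

Because the required sample grows with $1/d^2$, halving the expected effect roughly quadruples the sample needed, which is why the expected effect size has to be estimated before data collection.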

When it does not apply

Effect size should not be reported as the only measure when the context also calls for a confidence interval and a $p$-value: the three complement one another rather than replace one another. Magnitude conventions (small/medium/large) are not universal: what is “large” in social psychology may be “small” in clinical epidemiology, and what is trivial in economics may be substantive in ecology. For variables with naturally interpretable units (mortality, monetary cost, days), the original metric may be more informative than a standardized effect size. In purely exploratory or descriptive designs, without a formal hypothesis, effect size loses part of its interpretive meaning.

Applications by field

Health and biomedical sciences: clinical trials reporting NNT (number needed to treat), odds ratios, and absolute/relative risk reduction.
Psychology and behavioral sciences: natural territory of Cohen’s $d$ and $r$; standard in APA reporting.
Education: pedagogical interventions measured in standard-deviation gains in learning (Hattie and the like).
Meta-analyses across any field: combining studies requires transformation to a common effect metric.
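The clinical measures named for health and biomedical sciences can all be derived from a single 2×2 table of events by arm. The sketch below uses the standard definitions; the function name `two_by_two_effects`, the dictionary keys, and the counts are hypothetical, and the code assumes no empty cells (a zero count would need a continuity correction, which is not shown).

```python
import math


def two_by_two_effects(events_treated, n_treated, events_control, n_control):
    """Common effect measures for a binary outcome in a two-arm trial.

    Assumption: every cell of the 2x2 table is non-zero.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    odds_treated = events_treated / (n_treated - events_treated)
    odds_control = events_control / (n_control - events_control)
    arr = risk_control - risk_treated  # absolute risk reduction
    return {
        "relative_risk": risk_treated / risk_control,
        "odds_ratio": odds_treated / odds_control,
        "absolute_risk_reduction": arr,
        "relative_risk_reduction": arr / risk_control,
        # NNT is the reciprocal of ARR; a negative value would indicate harm.
        "nnt": math.inf if arr == 0 else 1 / arr,
    }


# Made-up counts: 10/100 events under treatment, 20/100 under control.
effects = two_by_two_effects(10, 100, 20, 100)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, the relative risk reduction is 50% while the absolute risk reduction is 10 percentage points, which illustrates why relative and absolute measures are usually reported together.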

Common pitfalls

The first pitfall is trusting universal magnitude conventions: $d = 0.5$ can be “medium” in the behavioral sciences and “huge” in a mortality reduction trial. The second is reporting only the $p$-value without an effect size, an obsolete editorial practice forbidden by modern standards. The third is confusing statistical significance with practical relevance: $d = 0.03$ with $n = 100{,}000$ yields $p < 0.001$ but is clinically trivial. The fourth is computing an effect size from a converted $p$-value (without access to raw data) without reporting that the transformation is an approximation. The fifth is treating effect sizes as additive: a simple meta-analysis that averages $d$ across studies ignores between-study heterogeneity and can produce misleading conclusions.
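The third pitfall can be checked numerically. The sketch below assumes a two-sided test on two independent groups and a large-sample normal approximation, $z = d\sqrt{n_1 n_2 / (n_1 + n_2)}$. It also assumes the total of 100,000 is split equally into two groups of 50,000, which the text does not specify. The function name `two_sided_p_from_d` is illustrative.

```python
import math
from statistics import NormalDist


def two_sided_p_from_d(d, n1, n2):
    """Approximate two-sided p-value implied by a standardized difference d.

    Assumption: large-sample normal approximation to the two-sample test.
    """
    z = d * math.sqrt(n1 * n2 / (n1 + n2))
    return 2 * NormalDist().cdf(-abs(z))


# Trivial effect, huge sample (assumed 50,000 per group).
p_trivial = two_sided_p_from_d(0.03, 50_000, 50_000)
# Moderate effect, tiny sample (10 per group).
p_moderate = two_sided_p_from_d(0.5, 10, 10)

print(f"d = 0.03, n = 100,000: p = {p_trivial:.2g}")
print(f"d = 0.50, n = 20:      p = {p_moderate:.2g}")
```

Under these assumptions the trivial effect is highly significant while the moderate one is not, so the $p$-value alone says little about magnitude.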
