Effect size — Glossary Aria Research

Extended definition

Effect size is any quantitative measure of the magnitude of a phenomenon (a difference between means, the strength of an association, a proportion of variance explained) that is, ideally, independent of sample size. Contrast this with the $p$-value, which confounds magnitude and sample size: a large sample with a trivial effect can yield a tiny $p$, while a small sample with a real effect can yield a large $p$. The most widely used family in biomedical and behavioral research is the standardized mean difference, typified by Cohen’s $d$ for the difference between two means:

$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s} $$

where $s$ is the pooled standard deviation of the two groups. Cohen (1988) proposed magnitude conventions of small ($d \approx 0.2$), medium ($d \approx 0.5$), and large ($d \approx 0.8$); he offered them as rough guides for the behavioral sciences, and they are not universal. Other families include $r$ (Pearson or partial correlation), $\eta^2$ and $\omega^2$ (proportion of variance explained in ANOVA), the odds ratio and relative risk (epidemiology), and standardized $\beta$ coefficients in regression. Current reporting guidance (for example APA style, the CONSORT statement, the AMA Manual of Style, and the ASA statement on $p$-values) calls for an effect size to be reported alongside the $p$-value.
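As a minimal sketch of the formula for $d$, the following Python computes Cohen’s $d$ for two independent groups. It assumes the usual pooled standard deviation, $s = \sqrt{((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2)/(n_1 + n_2 - 2)}$, and the function name `cohens_d` and the sample data are illustrative only.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    v1, v2 = variance(group1), variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd


# Hypothetical measurements, for illustration only.
treatment = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(treatment, control))
```

The sign of the result depends on the order of the arguments, so it is worth stating which group is subtracted from which when reporting $d$.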

When it applies

Effect size should be reported whenever a quantitative result appears in an academic manuscript. It is essential in meta-analysis (combining studies requires a comparable magnitude metric), in a priori power calculations (sample planning requires an estimate of the expected effect), and in clinical interpretation (a statistically significant effect may be clinically irrelevant). It is also a useful tool for communicating with non-technical audiences: a measure in standardized units is often easier to interpret than a raw coefficient without context.
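To illustrate how an expected effect size feeds into sample planning, here is a rough sketch of an a priori calculation for comparing two means. It assumes a two-sided test, equal group sizes, and the large-sample normal approximation $n \approx 2(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$ per group; the function name `n_per_group` and the defaults ($\alpha = 0.05$, power $= 0.80$) are choices made for this example, not values taken from the text above.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation; an exact t-based calculation
    typically gives a slightly larger number.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Cohen's conventional benchmarks, used here only as example inputs.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Because $n$ scales with $1/d^2$, halving the expected effect roughly quadruples the required sample, which is why the effect size estimate used in planning deserves careful justification.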

When it does not apply

Effect size should not stand as the only reported measure: a confidence interval and, where a hypothesis is tested, a $p$-value complement it rather than replace it. Magnitude conventions (small/medium/large) are not universal: what is “large” in social psychology may be “small” in clinical epidemiology, and what is trivial in economics may be substantive in ecology. For variables with naturally interpretable units (mortality, monetary cost, days), the original metric may be more informative than a standardized effect size. In purely exploratory or descriptive designs, without a formal hypothesis, effect size loses part of its interpretive meaning.

Applications by field

- Health and biomedical sciences: clinical trials with NNT (number needed to treat), odds ratios, absolute/relative risk reduction.
- Psychology and behavioral sciences: natural territory of Cohen’s $d$ and $r$; standard APA reporting.
- Education: pedagogical interventions measured in standard-deviation gains in learning (Hattie and similar syntheses).
- Meta-analyses across any field: combining studies requires transformation to a common effect metric.
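The clinical measures named in the health sciences entry can all be derived from the same two-by-two table. The sketch below uses the standard definitions (absolute risk reduction as control risk minus treated risk, NNT as its reciprocal); the function name `risk_measures` and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Common binary-outcome effect sizes from event counts in two arms."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    arr = risk_c - risk_t                      # absolute risk reduction
    rr = risk_t / risk_c                       # relative risk
    odds_ratio = (risk_t / (1 - risk_t)) / (risk_c / (1 - risk_c))
    return {
        "arr": arr,
        "rr": rr,
        "rrr": 1 - rr,                         # relative risk reduction
        "odds_ratio": odds_ratio,
        # NNT is only defined here when treatment lowers risk.
        "nnt": 1 / arr if arr > 0 else None,
    }


# Hypothetical trial: 10/100 events with treatment, 20/100 with control.
for name, value in risk_measures(10, 100, 20, 100).items():
    print(name, value)
```

By convention NNT is rounded up to the next whole patient when reported. The sketch does not guard against arms with zero events or with risk equal to 1, which would need separate handling.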

Common pitfalls

The first pitfall is trusting magnitude conventions as if they were universal: $d = 0.5$ can be “medium” in the behavioral sciences and “huge” in a mortality-reduction trial. The second is reporting only a $p$-value without an effect size, a practice that current reporting standards discourage. The third is confusing statistical significance with practical relevance: $d = 0.03$ with $n = 100{,}000$ yields $p < 0.001$ but is clinically trivial. The fourth is computing an effect size by converting a reported $p$-value (without access to the raw data) and not stating that the result is an approximation. The fifth is pooling effect sizes naively: a simple unweighted mean of $d$ across studies ignores differences in study precision and between-study heterogeneity, and can produce misleading conclusions.
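The arithmetic behind the third pitfall can be checked with a short sketch. It assumes that $n = 100{,}000$ is the total sample, split equally into two groups, and it uses a large-sample $z$ approximation in which the test statistic is $d\sqrt{n_{\text{group}}/2}$; the function name `two_sided_p` is illustrative.

```python
import math
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d.

    Assumes two equal-sized independent groups and a large-sample z test.
    """
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same trivial effect, very different sample sizes.
print(two_sided_p(0.03, 50_000))  # total n = 100,000
print(two_sided_p(0.03, 50))      # total n = 100
```

The effect size is identical in both calls; only the sample size changes, which is what moves the $p$-value from far below 0.001 to well above any conventional threshold.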