Reproducibility and replicability — Glossary Aria Research

Extended definition

Reproducibility and replicability are related but distinct concepts that are frequently confused. Reproducibility (in some fields called computational reproducibility) is the ability to obtain the same results from the same data and the same code: an independent researcher with access to the original materials should arrive at the reported numbers. Replicability is the ability to obtain consistent results in an independent study, with new data collected under similar procedures: the hypothesis holds when tested on a new sample. Goodman, Fanelli, and Ioannidis (2016, Science Translational Medicine) proposed a closely related taxonomy (methods, results, and inferential reproducibility), and the National Academies of Sciences, Engineering, and Medicine (2019, US) adopted the reproducibility/replicability distinction as standard terminology. The “replication crisis” was documented seminally by the Open Science Collaboration (2015, Science), in which only 36% of the replications of 100 psychology studies from leading journals produced statistically significant results. It motivated the expansion of Open Science practices: preregistration, open data, open code, and registered reports.
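The definition of computational reproducibility can be made concrete with a small check. The sketch below is illustrative only: `fingerprint`, `analysis`, and `reproduces` are hypothetical helper names, the four-row dataset is made up, and the "analysis" is a stand-in (a difference in group means) for whatever code a real study would publish. It verifies two things: that the data are byte-for-byte the ones the authors used, and that rerunning the code yields the reported number within a small numerical tolerance.

```python
import hashlib
import math
import statistics


def fingerprint(rows):
    """SHA-256 over a canonical text rendering of the rows (order-sensitive)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def analysis(rows):
    """Stand-in for the study's analysis code: difference in group means."""
    treated = [y for group, y in rows if group == "treated"]
    control = [y for group, y in rows if group == "control"]
    return statistics.mean(treated) - statistics.mean(control)


def reproduces(rows, reported_value, reported_hash, tol=1e-9):
    """True if the data match the published hash and the rerun matches the report."""
    same_data = fingerprint(rows) == reported_hash
    same_result = math.isclose(analysis(rows), reported_value, abs_tol=tol)
    return same_data and same_result


# Hypothetical "published" materials: data, its hash, and the reported estimate.
data = [("treated", 5.1), ("treated", 4.9), ("control", 4.0), ("control", 4.2)]
reported_hash = fingerprint(data)
reported_value = 0.9

print(reproduces(data, reported_value, reported_hash))

# A silently altered dataset fails the check even before the numbers are compared.
altered = data[:-1] + [("control", 4.3)]
print(reproduces(altered, reported_value, reported_hash))
```

Comparing with a tolerance rather than exact equality is a deliberate choice: floating-point results can differ in the last digits across platforms without the reproduction having failed. A check like this says nothing about replicability, which requires new data.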

When it applies

Both concepts apply to any empirical research project, with different weight depending on the field. Computational reproducibility is a minimum requirement in quantitative research: the data and code should enable another researcher to arrive at the same results. Leading journals now require data and code availability (Nature, Science, PNAS, and eLife have explicit policies; PLOS adopted its requirement earlier than most). Replicability matters when planning confirmatory studies, where preregistration, statistical analysis plans (SAPs), and adequate statistical power are the main instruments. In applied ML, computational reproducibility also includes controlling random seeds, pinning library versions, and recording the hardware when it is relevant.
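For the applied ML case, the three ingredients named above (seed, library versions, hardware) can be captured in a few lines. This is a minimal sketch using only the Python standard library; `environment_record` and `run_experiment` are hypothetical names, the seed value is arbitrary, and the package list (`numpy`, `torch`) is only an example of what a project might want to record. Real ML frameworks typically have their own seeding and determinism settings in addition to what is shown here.

```python
import json
import platform
import random
import sys
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


def run_experiment(seed, n=5):
    """Toy 'experiment': draws that depend only on the seed."""
    rng = random.Random(seed)  # local generator, no hidden global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


SEED = 20240101  # arbitrary illustrative value
first = run_experiment(SEED)
second = run_experiment(SEED)

# Store this record next to the results so a rerun can be compared against it.
record = {"seed": SEED, "environment": environment_record(["numpy", "torch"])}
print(json.dumps(record, indent=2))
print(first == second)
```

Using a local `random.Random(seed)` instead of the module-level generator keeps the experiment independent of any other code that happens to draw random numbers in the same process.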

When it does not apply

Direct replicability does not apply to purely qualitative research in the strict sense: ethnographic data collection in a unique context is not replicable by construction, and transferability is the analogous concept. In studies of rare historical, natural, or clinical events, literal replication is impossible. Public computational reproducibility is not achievable for research using confidential data that cannot be shared openly (identifiable clinical data, industrial data under NDA). The alternatives are synthetic data, restricted-access repositories with mediated access, or a methodological description detailed enough for another researcher to repeat the analysis on their own data.

Applications by field

- Health and biomedical sciences: a preclinical reproducibility crisis has been documented (Begley & Ellis, 2012); reporting guidelines such as ARRIVE (animal research) and CONSORT (randomized trials) address the quality of reporting.
- Psychology: Open Science Collaboration (2015) is the seminal reference; Many Labs projects and editorial reforms in top-quartile (Q1) journals followed.
- Computer science and ML: the annual ML Reproducibility Challenge; policies at NeurIPS, ICML, and ACL ask authors to share code.
- Empirical social sciences: the AEA Data Editor; preregistration in experimental economics has been growing since 2018.

Common pitfalls

The first pitfall is using “reproducibility” and “replicability” as synonyms. In modern technical discourse the distinction matters: a study can be reproducible (the data and code allow the computations to be redone) but non-replicable (the result does not hold in a new sample). The second is confusing reproducibility with transparency. Making data available does not guarantee reproduction: incomplete code, undocumented dependencies, or specific hardware can block it. The third is treating a replication failure as proof of fraud. Failures can reflect legitimate variability across populations or contexts, or simply regression to the mean of effects that were inflated by unintentional p-hacking. The fourth is trusting the original study’s “p < 0.05” as a guarantee: meta-science suggests that an effect estimated in a single published study tends to overestimate the true magnitude because of structural biases (publication bias, the garden of forking paths). The fifth is failing to distinguish direct replication (same methodology) from conceptual replication (same construct, different method). Both are relevant, but they communicate different things.
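The fourth pitfall can be illustrated with a small simulation. The sketch below is a toy model, not an estimate of any real literature: the true effect (0.2 standard deviations), the sample size (20 per group), and the number of simulated studies are all assumed values, `simulate_study` is a hypothetical helper, and significance is judged with a normal approximation (critical value 1.96), which is slightly liberal at this sample size. Only studies that reach significance in the predicted direction are treated as "published".

```python
import math
import random
import statistics


def simulate_study(rng, true_effect, n):
    """One two-group study: returns (estimated effect, test statistic)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / n + statistics.variance(control) / n)
    return diff, diff / se


rng = random.Random(0)  # fixed seed so the simulation itself is reproducible
TRUE_EFFECT = 0.2       # assumed true effect, in standard-deviation units
N_PER_GROUP = 20        # assumed (small) sample size
CRITICAL = 1.96         # normal approximation to a two-sided 5% test

results = [simulate_study(rng, TRUE_EFFECT, N_PER_GROUP) for _ in range(5000)]
all_estimates = [diff for diff, _ in results]
published = [diff for diff, stat in results if stat > CRITICAL]

mean_all = statistics.mean(all_estimates)
mean_published = statistics.mean(published)

print(f"true effect:               {TRUE_EFFECT:.2f}")
print(f"mean of all estimates:     {mean_all:.2f}")
print(f"mean of 'published' ones:  {mean_published:.2f}")
print(f"share reaching p < 0.05:   {len(published) / len(results):.1%}")
```

Averaged over all simulated studies, the estimate sits close to the true effect. The filter is what produces the inflation: with low power, a study can only cross the significance threshold when sampling noise has pushed its estimate well above the truth. A faithful replication of such a study will, on average, find a smaller effect without anyone having done anything wrong.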