Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses
Plain English Summary
Here's a problem most people don't realize exists: when a scientific study finds "no significant result," that does NOT actually mean they proved nothing is happening. It just means the evidence was murky. Lakens offers a fix called equivalence testing (specifically the TOST procedure), which lets researchers formally demonstrate that an effect is so tiny it's practically zero. You pick the smallest effect you'd care about, then statistically show your result falls inside that "who cares" zone. This is a big deal for parapsychology, where failed replications of psychic experiments often get shrugged off as "inconclusive" rather than "genuine evidence against psi." Lakens provides formulas, spreadsheets, and an R software package so anyone can do this. One fair critique: the recommended way to set your "smallest interesting effect" can be a bit circular, since it depends on how many participants you can recruit rather than what your theory actually predicts.
Research Notes
Critical methodological resource for psi research: enables formal testing of 'evidence for no psi' versus 'inconclusive results.' Essential for evaluating replication failures (e.g., Bem FTF replications, Ganzfeld failures) and for designing informative null-result studies. The SESOI recommendation (setting bounds based on maximum feasible sample size) is pragmatic but circular; better for theory development would be prespecifying bounds based on theoretical predictions.
Scientists should be able to provide support for the absence of a meaningful effect, but nonsignificant p-values cannot establish this. This tutorial introduces the two one-sided tests (TOST) procedure for equivalence testing: researchers specify upper and lower equivalence bounds based on the smallest effect size of interest (SESOI), then test whether observed effects fall within this range. Formulas and worked examples are provided for independent/dependent t-tests, correlations, and meta-analyses. An accompanying spreadsheet and R package (TOSTER) enable psychologists to perform equivalence tests and power analyses. Adopting equivalence testing prevents misinterpreting nonsignificant results as evidence for the null, enables replication studies to test for absence of meaningful effects, and encourages researchers to specify which effect sizes they find theoretically worthwhile.
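The TOST logic described above can be sketched in a few lines. This is a minimal illustration for an independent-samples t test with raw-score equivalence bounds, assuming equal variances; it is not the TOSTER package itself, whose defaults and options differ (e.g., Welch's correction, Cohen's d bounds).

```python
# Minimal sketch of the two one-sided tests (TOST) procedure for an
# independent-samples t test. Assumptions: raw-score bounds, equal variances.
import numpy as np
from scipy import stats

def tost_ind(x, y, low, high):
    """Return the TOST p-value: is the mean difference inside (low, high)?"""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # Pooled standard deviation and standard error of the difference
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
    se = sp * np.sqrt(1 / nx + 1 / ny)
    df = nx + ny - 2
    # Two one-sided tests against the lower and upper equivalence bounds
    t_low = (diff - low) / se      # H0: true difference <= low
    t_high = (diff - high) / se    # H0: true difference >= high
    p_low = 1 - stats.t.cdf(t_low, df)
    p_high = stats.t.cdf(t_high, df)
    # Equivalence is claimed only if BOTH one-sided tests are significant,
    # so the overall p-value is the larger of the two
    return max(p_low, p_high)
```

If the returned p-value is below alpha, the observed effect is statistically smaller than the SESOI in both directions, so one can reject effects as large as or larger than the bounds.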
Links
Related Papers
Cites
Companion
- Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi โ Wagenmakers, Eric-Jan (2011)
- An Agenda for Purely Confirmatory Research โ Wagenmakers, Eric-Jan (2012)
- Power failure: why small sample size undermines the reliability of neuroscience โ Button, Katherine S (2013)
- A Practical Solution to the Pervasive Problems of p Values โ Wagenmakers, Eric-Jan (2007)
- Small Telescopes: Detectability and the Evaluation of Replication Results โ Simonsohn, Uri (2015)
Also by these authors
More in Methodology
Paranormal belief, conspiracy endorsement, and positive wellbeing: a network analysis
Planning Falsifiable Confirmatory Research
Addressing Researcher Fraud: Retrospective, Real-Time, and Preventive Strategies โ Including Legal Points and Data Management That Prevents Fraud
Quantum Aspects of the Brain-Mind Relationship: A Hypothesis with Supporting Evidence
Paranormal beliefs and cognitive function: A systematic review and assessment of study quality across four decades of research
Cite this paper
Lakens, Daniël (2017). Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science. https://doi.org/10.1177/1948550617697177
@article{lakens_2017_equivalence,
title = {Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses},
author = {Lakens, Daniël},
year = {2017},
journal = {Social Psychological and Personality Science},
doi = {10.1177/1948550617697177},
}