Replication Unreliability in Psychology: Elusive Phenomena or "Elusive" Statistical Power?
Plain English Summary
When psychic experiments fail to replicate, is the phenomenon fake -- or is the study just too small to catch it? This paper argues it's often the latter. By crunching numbers from past meta-analyses (studies that pool many experiments together), it shows that most research on controversial topics -- including subliminal priming, unconscious thinking, and ESP -- simply doesn't test enough people to reliably detect the tiny effects involved. The mismatch is staggering for ESP: you'd need about 3,450 participants per study, yet the average study uses just 128. That's like trying to hear a whisper from across a football stadium. Cleverly, the paper lumps psychic research in with mainstream psychology topics that have the exact same power problem, making the point that underpowered studies plague all of psychology, not just the paranormal fringe. It also recommends better statistical tools like Bayesian analysis and confidence intervals.
Research Notes
Provides the statistical power argument frequently cited by psi proponents: failed replications may reflect inadequate sample sizes rather than absent phenomena. Strategically places psi alongside mainstream controversial phenomena (subliminal priming, unconscious thought) to normalize the power-deficit problem. Central to Controversy 10 (meta-debate).
Analyzes whether the unreliability of replication in controversial psychological phenomena stems from insufficient statistical power rather than from the non-existence of the effects. Retrospective power analysis of meta-analyses covering four phenomena – subliminal semantic priming (d = 0.47–0.80), the incubation effect (d = 0.29), unconscious thought theory (d = 0.22), and non-local perception across three protocols (d = 0.011–0.16) – reveals that, except for semantic priming on categorization (power = 0.96), the typical study has power far below 0.90. Forced-choice ESP studies (d = 0.011) would require N = 3,450 participants for adequate power, yet the average N is 128. Recommends alternatives to NHST, including confidence intervals, equivalence testing, and Bayesian approaches.
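The power-deficit argument can be illustrated with a simple normal-approximation power calculation (a sketch using a one-sample z-test approximation, not the paper's own meta-analytic method; it will not reproduce the paper's exact N = 3,450 figure, which comes from the paper's own computation). The effect sizes plugged in below are those quoted in the summary above:

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def required_n(d, alpha=0.05, power=0.90):
    """Approximate N for a two-sided one-sample z-test to detect
    standardized effect size d with the given power (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / d) ** 2)

def achieved_power(d, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with n observations,
    when the true standardized effect size is d."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)

# Unconscious thought theory, d = 0.22 (from the summary above):
print(required_n(0.22))          # ~218 participants needed for 0.90 power
print(achieved_power(0.22, 50))  # a 50-person study reaches power of only ~0.34
```

The hypothetical 50-person study shows the core mismatch: with a small true effect, a sample size that feels respectable still leaves roughly a two-in-three chance of a "failed" replication even when the phenomenon is real. Required N scales as 1/d², which is why tiny effects like d = 0.011 demand enormous samples.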
Links
Related Papers
Cites
Companion
- A Practical Solution to the Pervasive Problems of p Values – Wagenmakers, Eric-Jan (2007)
- Commentary: Reproducibility in Psychological Science: When Do Psychological Phenomena Exist? – Heino, Matti T. J. (2017)
- Bayesian and Classical Hypothesis Testing: Practical Differences for a Controversial Area of Research – Kennedy, J. E. (2014)
- Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling – John, Leslie K. (2012)
Same Research Program
Also by these authors
Meta-Analysis of Free-Response Studies 2009-2018: Assessing the Noise-Reduction Model Ten Years On
On the Correspondence Between Dream Content and Target Material Under Laboratory Conditions: A Meta-Analysis of Dream-ESP Studies, 1966-2016
EEG Correlates of Social Interaction at Distance
More in Methodology
Paranormal belief, conspiracy endorsement, and positive wellbeing: a network analysis
Planning Falsifiable Confirmatory Research
Addressing Researcher Fraud: Retrospective, Real-Time, and Preventive Strategies β Including Legal Points and Data Management That Prevents Fraud
Quantum Aspects of the Brain-Mind Relationship: A Hypothesis with Supporting Evidence
Paranormal beliefs and cognitive function: A systematic review and assessment of study quality across four decades of research
Cite this paper
Tressoldi, Patrizio E. (2012). Replication Unreliability in Psychology: Elusive Phenomena or "Elusive" Statistical Power? Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2012.00218
@article{tressoldi_2012_replication,
title = {Replication Unreliability in Psychology: Elusive Phenomena or "Elusive" Statistical Power?},
author = {Tressoldi, Patrizio E},
year = {2012},
journal = {Frontiers in Psychology},
doi = {10.3389/fpsyg.2012.00218},
}