Replication Unreliability in Psychology: Elusive Phenomena or "Elusive" Statistical Power?

πŸ“„ Original study
Tressoldi, Patrizio E β€’ 2012 Modern Era β€’ methodology

πŸ“Œ Appears in:

Plain English Summary

When psychic experiments fail to replicate, is the phenomenon fake -- or is the study just too small to catch it? This paper argues it's often the latter. By crunching numbers from past meta-analyses (studies that pool many experiments together), it shows that most research on controversial topics -- including subliminal priming, unconscious thinking, and ESP -- simply doesn't test enough people to reliably detect the tiny effects involved. The mismatch is staggering for ESP: you'd need about 3,450 participants per study, yet the average study uses just 128. That's like trying to hear a whisper from across a football stadium. Cleverly, the paper lumps psychic research in with mainstream psychology topics that have the exact same power problem, making the point that underpowered studies plague all of psychology, not just the paranormal fringe. It also recommends better statistical tools like Bayesian analysis and confidence intervals.

Research Notes

Provides the statistical power argument frequently cited by psi proponents: failed replications may reflect inadequate sample sizes rather than absent phenomena. Strategically places psi alongside mainstream controversial phenomena (subliminal priming, unconscious thought) to normalize the power-deficit problem. Central to Controversy 10 (meta-debate).

Analyzes whether the unreliability of replication in controversial psychological phenomena stems from insufficient statistical power rather than from the non-existence of the effects. Retrospective power analysis of meta-analyses covering four phenomena β€” subliminal semantic priming (d = 0.47–0.80), the incubation effect (d = 0.29), unconscious thought theory (d = 0.22), and non-local perception across three protocols (d = 0.011–0.16) β€” reveals that, except for semantic priming on categorization (power = 0.96), the typical study has power far below the 0.90 benchmark. Forced-choice ESP studies (d = 0.011) would require N = 3,450 participants for adequate power, yet the average study uses N = 128. Recommends alternatives to NHST, including confidence intervals, equivalence testing, and Bayesian approaches.
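The relationship between effect size and required sample size underlying the paper's argument can be sketched with a standard normal-approximation power calculation. This is a generic illustration of the principle, not the paper's own computation (the paper's figures come from its retrospective meta-analytic method, and the ESP figure in particular depends on the protocol's trial structure):

```python
from math import ceil
from statistics import NormalDist

def required_n_per_group(d, alpha=0.05, power=0.90):
    """Approximate per-group N for a two-sample comparison on Cohen's d,
    using the normal approximation with a two-tailed test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Small effects demand much larger samples:
print(required_n_per_group(0.47))  # semantic priming lower bound β†’ 96
print(required_n_per_group(0.22))  # unconscious thought theory β†’ 435
```

The quadratic dependence on 1/d is the crux: halving the effect size roughly quadruples the required N, which is why studies sized for medium effects are hopelessly underpowered for the tiny effects at issue here.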

πŸ“‹ Cite this paper
APA
Tressoldi, P. E. (2012). Replication unreliability in psychology: Elusive phenomena or "elusive" statistical power? Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2012.00218
BibTeX
@article{tressoldi_2012_replication,
  title = {Replication Unreliability in Psychology: Elusive Phenomena or "Elusive" Statistical Power?},
  author = {Tressoldi, Patrizio E},
  year = {2012},
  journal = {Frontiers in Psychology},
  doi = {10.3389/fpsyg.2012.00218},
}