Small Telescopes: Detectability and the Evaluation of Replication Results
Plain English Summary
When a replication study fails to confirm an original finding, does that mean the original was wrong -- or was the replication just too small to see the effect? Uri Simonsohn offers a clever framework called "small telescopes." Instead of simply asking whether the replication got a significant result, you ask whether its effect is significantly smaller than what the original study was barely powered to detect (called d33%, the effect size that would give the original only a 33% chance of success). A handy rule of thumb emerges: replications generally need about 2.5 times the original sample size for a fair shot at settling the question. This is directly relevant to debates over replicating controversial findings like precognition -- some "failed" replications may simply have been too underpowered to be informative either way.
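The d33% benchmark drops out of a standard power calculation. Below is a minimal sketch in Python; the two-cell design, the per-cell sample size of 20, and the two-sided t-test are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: solve for d33%, the effect size at which a hypothetical
# original study (two cells, n = 20 per cell, two-sided alpha = .05)
# would have had only 33% power. The design and numbers are illustrative.
from statsmodels.stats.power import TTestIndPower

n_original = 20  # hypothetical per-cell sample size of the original

d33 = TTestIndPower().solve_power(
    effect_size=None,        # the unknown we are solving for
    nobs1=n_original,
    alpha=0.05,
    power=0.33,
    ratio=1.0,               # equal cell sizes
    alternative='two-sided',
)
print(f"d33% for n = {n_original} per cell: {d33:.2f}")  # ~ 0.50
```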
Research Notes
Key methodological framework for interpreting the psi replication debate. Directly applicable to evaluating whether failed replications of Bem's precognition experiments were genuinely informative or merely underpowered. Complements equivalence testing (Lakens 2017) and p-curve methods.
Introduces the "small telescopes" approach for evaluating replication results: instead of asking whether a replication effect is significantly different from zero or from the original estimate, the method tests whether it is significantly smaller than d33%, the effect size that would give the original study only 33% power. Applied to three replication disputes, the approach shows that underpowered replications (e.g., Gámez et al., run at 44% power) can be too noisy to be informative, while large, well-powered replications can reject the adequacy of the original design even when the replication itself finds a significant effect. A simple rule emerges: a replication needs approximately 2.5x the original sample size for 80% power to reject d33%.
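To make the test concrete, here is a hedged sketch continuing the hypothetical numbers above: an original with n = 20 per cell (d33% ~ 0.50) and an invented replication estimate. It uses the conventional large-sample standard error of Cohen's d and a one-sided z-test as one reasonable way to operationalize "significantly smaller than d33%"; the paper's own computations are design-specific and may differ in detail.

```python
# Hedged sketch of the small-telescopes test: is the replication estimate
# significantly *smaller* than d33%? All numbers here are hypothetical.
import math
from scipy.stats import norm

def se_cohens_d(d, n1, n2):
    # Conventional large-sample approximation to the SE of Cohen's d.
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

d33 = 0.50                # from the power sketch above (n = 20 per cell)
d_rep, n_rep = 0.10, 50   # invented replication estimate and per-cell n

z = (d_rep - d33) / se_cohens_d(d_rep, n_rep, n_rep)
p = norm.cdf(z)           # one-sided: H1 is "effect smaller than d33%"
print(f"z = {z:.2f}, one-sided p = {p:.3f}")  # z = -2.00, p = 0.023

# Rule-of-thumb check: ~2.5x the original per-cell n should give roughly
# 80% power to reject d33% when the true effect is zero.
n_check = int(2.5 * 20)
se0 = se_cohens_d(0.0, n_check, n_check)
power = norm.cdf(d33 / se0 - norm.ppf(0.95))
print(f"Power to reject d33% at n = {n_check} per cell: {power:.2f}")  # 0.80
```

With 2.5x the original per-cell sample size, the power to reject d33% under a true effect of zero comes out at about 0.80, matching the paper's rule of thumb.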
Links
Related Papers
- Why Most Published Research Findings Are False – Ioannidis, John P. A. (2005)
- Power failure: why small sample size undermines the reliability of neuroscience – Button, Katherine S. (2013)
- Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses – Lakens, Daniël (2017)
- Too Good to Be True: Publication Bias in Two Prominent Studies from Experimental Psychology – Francis, Gregory (2012)
- The "File Drawer Problem" and Tolerance for Null Results – Rosenthal, Robert (1979)
- Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology – Meehl, Paul E. (1978)
- Registered Reports: A Method to Increase the Credibility of Published Results – Nosek, Brian A. (2014)
More in Methodology
- Paranormal belief, conspiracy endorsement, and positive wellbeing: a network analysis
- Planning Falsifiable Confirmatory Research
- Addressing Researcher Fraud: Retrospective, Real-Time, and Preventive Strategies – Including Legal Points and Data Management That Prevents Fraud
- Quantum Aspects of the Brain-Mind Relationship: A Hypothesis with Supporting Evidence
- Paranormal beliefs and cognitive function: A systematic review and assessment of study quality across four decades of research
Cite this paper
Simonsohn, Uri (2015). Small Telescopes: Detectability and the Evaluation of Replication Results. Psychological Science. https://doi.org/10.1177/0956797614567341
@article{simonsohn_2015_small_telescopes,
title = {Small Telescopes: Detectability and the Evaluation of Replication Results},
author = {Simonsohn, Uri},
year = {2015},
journal = {Psychological Science},
doi = {10.1177/0956797614567341},
}