Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better

Geoff Cumming

doi:10.1111/j.1745-6924.2008.00079.x

Perspect Psychol Sci. 2008 Jul;3(4):286-300. doi: 10.1111/j.1745-6924.2008.00079.x.

School of Psychological Science, La Trobe University, Melbourne, Victoria, Australia. G.Cumming@latrobe.edu.au

PMID: 26158948

Abstract

Replication is fundamental to science, so statistical analysis should give information about replication. Because p values dominate statistical analysis in psychology, it is important to ask what p says about replication. The answer to this question is "Surprisingly little." In one simulation of 25 repetitions of a typical experiment, p varied from < .001 to .76, thus illustrating that p is a very unreliable measure. This article shows that, if an initial experiment results in two-tailed p = .05, there is an 80% chance that the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p < .00008, and fully a 10% chance that p > .44. Remarkably, this interval, termed a p interval, is this wide no matter how large the sample size. p is so unreliable and gives such dramatically vague information that it is a poor basis for inference. Confidence intervals, however, give much better information about replication. Researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
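The p interval quoted for an initial two-tailed p = .05 can be reproduced with a short calculation. What follows is a minimal sketch, not necessarily the article's own derivation. It assumes a z test with known SD, an exact replication of the initial experiment, and a replication z statistic that is normally distributed around the initial z with SD √2. Each study's z has SD 1 about the same true value, so their difference has variance 2. The helper name `p_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def p_interval(p_initial_two_tailed, coverage=0.80):
    """Hypothetical helper: interval for the one-tailed p of a replication.

    Assumes a z test with known SD, an exact replication of the initial
    experiment, and that the replication z is distributed as
    Normal(z_initial, sqrt(2)). Returns (p_low, p_high) such that the
    central `coverage` of replication p values falls between them.
    """
    # z of the initial experiment implied by its two-tailed p
    z_initial = STANDARD_NORMAL.inv_cdf(1 - p_initial_two_tailed / 2)

    # Each study's z has SD 1 about the same true value, so the replication
    # z, taken relative to the observed initial z, has SD sqrt(2).
    z_replication = NormalDist(mu=z_initial, sigma=sqrt(2))

    tail = (1 - coverage) / 2  # probability left in each tail (10% for 80%)
    z_low = z_replication.inv_cdf(tail)
    z_high = z_replication.inv_cdf(1 - tail)

    # One-tailed p falls as z rises, so the larger z gives the smaller p.
    p_low = 1 - STANDARD_NORMAL.cdf(z_high)
    p_high = 1 - STANDARD_NORMAL.cdf(z_low)
    return p_low, p_high


if __name__ == "__main__":
    low, high = p_interval(0.05)
    print(f"80% p interval: ({low:.5f}, {high:.2f})")
```

Under these assumptions the script prints `(0.00008, 0.44)`, which matches the interval stated above. The sample size appears nowhere in the calculation. That is consistent with the claim that the p interval is this wide however large the sample.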