Brief Report
At what sample size do correlations stabilize?

https://doi.org/10.1016/j.jrp.2013.05.009

Highlights

  • Sample correlations converge to true value ρ, but are inaccurate in small samples.

  • From which sample size onward do correlations show only minor fluctuations around ρ?

  • Monte-Carlo simulations were used to determine the “point of stability” (POS).

  • Necessary sample size depends on effect size, tolerable fluctuations, and confidence.

  • In typical scenarios n should approach 250 for stable estimates.

Abstract

Sample correlations converge to the population value with increasing sample size, but the estimates are often inaccurate in small samples. In this report we use Monte-Carlo simulations to determine the critical sample size beyond which the magnitude of a correlation can be expected to be stable. The necessary sample size to achieve stable estimates for correlations depends on the effect size, the width of the corridor of stability (i.e., a corridor around the true value within which deviations are tolerated), and the requested confidence that the trajectory does not leave this corridor again. Results indicate that in typical scenarios the sample size should approach 250 for stable estimates.

Introduction

Most research in psychology seems to be concerned with the endeavor to determine the sign of an effect with some confidence, using the null hypothesis significance testing (NHST) procedure. Several authors, however, have argued that any field of science should move from binary decisions derived from the NHST procedure towards giving a more precise point estimate of the magnitude of an effect (Edwards and Berry, 2010, Kelley and Maxwell, 2003). Consider, for example, a correlation of r = .40 in a sample of 25 participants. This correlation is significantly different from zero (p = .047). Hence, it might be concluded with some confidence that there is “something >0” in the population, and the study would be counted as a success from the NHST perspective. However, plausible values of the true correlation ρ, as expressed by a 90% confidence interval, range from .07 to .65. The estimate is quite unsatisfactory from an accuracy point of view – in any scenario beyond the NHST ritual it will make a huge difference whether the true correlation in the population is .07, which would be regarded as a very small effect in most research contexts, or .65, which would be a very large effect in many contexts. Moreover, precise point estimates are relevant for a priori sample size calculations. Given the huge uncertainty in the true magnitude of the effect, it is hard to determine the necessary sample size to replicate the effect (e.g., for an intended power of 80% and ρ = .07: n = 1599; ρ = .40: n = 46; ρ = .65: n = 16).
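The confidence-interval and sample-size figures quoted above can be reproduced approximately with the Fisher z-transformation. The sketch below is illustrative, not the authors' code: the helper names are our own, and the normal approximation can differ by a participant or two from exact power calculations.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, conf=0.90):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                       # transform r into z space
    se = 1.0 / math.sqrt(n - 3)             # standard error of z
    zcrit = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)

def n_for_power(rho, power=0.80, alpha=0.05):
    """Normal-approximation sample size to detect rho (two-tailed test)."""
    za = NormalDist().inv_cdf(1 - alpha / 2)
    zb = NormalDist().inv_cdf(power)
    return math.ceil(((za + zb) / math.atanh(rho)) ** 2 + 3)

lo, hi = fisher_ci(0.40, 25)   # roughly (.07, .65), as in the text
n_small = n_for_power(0.07)    # on the order of 1600 participants
n_large = n_for_power(0.65)    # under 20 participants
```

Running the two helpers for ρ = .07, .40, and .65 shows how drastically the required replication sample size depends on where in the interval the true effect lies.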

In this contribution we deal with a related question of practical importance in personality research: At which sample size does a correlation stabilize? Many researchers will have observed that the magnitude of a correlation is quite unstable in small samples, as the following empirical example demonstrates. Multiple questionnaire scales were administered in an open online study (Schönbrodt & Gerstenberg, 2012; Study 3). The thick black line in Fig. 1 shows the evolution of the correlation between two scales, “hope of power” and “fear of losing control”, when the correlation is recalculated after each new participant. The correlation evolved from r = .69 (n = 20, p < .001) to r = .26 (n = 274, p < .001). From a visual inspection, the trajectory did not stabilize until a sample size of around 150. The data have not been rearranged; this is simply the order in which participants entered the study. Some other correlations in this data set evolved from significantly negative to non-significant, others changed from one significant direction to the significant opposite, and some were stable right from the beginning with only minor fluctuations around the final estimate. But how can we know when a correlation estimate is sufficiently stable?
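An evolving-correlation trajectory of the kind shown in Fig. 1 can be mimicked by recomputing r after each simulated participant. This is a pure-Python sketch: the true correlation and sample sizes merely echo the example and do not reproduce the actual study data.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
rho, n_max = 0.26, 274           # illustrative values echoing the example
xs, ys, trajectory = [], [], []
for n in range(1, n_max + 1):
    x = random.gauss(0, 1)
    # y is constructed so that corr(x, y) = rho in the population
    y = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    xs.append(x)
    ys.append(y)
    if n >= 20:                  # start tracking at n = 20, as in Fig. 1
        trajectory.append(pearson(xs, ys))
```

Plotting `trajectory` against n shows the same qualitative pattern as the empirical curve: wild swings early on, followed by gradual stabilization around the population value.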

Section snippets

Definition and operationalization of stability

Suppose a true correlation of ρ = .40. When the estimate in the sample is .41 or .38, most researchers would agree that this is a rather trivial deviation from the true value. A stronger deviation like .26 or .57 could be deemed more problematic, depending on the research setting. And even stronger deviations like .10 or .65 (which would still be within the 95% CI at n = 25) would probably be judged unacceptable from a substantive point of view.

When talking about stability, minor fluctuations

Method and results

To compute a distribution of POS values, Monte-Carlo simulations were run in the R environment for statistical computing (R Development Core Team, 2012). The complete source code for the computations can be downloaded from the online Supplementary material. The following steps were performed for the simulation:

  • Simulate a bivariate Gaussian distribution with 1,000,000 cases and a specified correlation ρ (the “population”).

  • Draw B = 100,000 bootstrap samples
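A stripped-down version of these steps can be sketched as follows. This is not the authors' R code: it uses far fewer replications than B = 100,000, draws directly from the bivariate normal instead of bootstrapping a finite population, updates the correlation from running sums for speed, and the function names and the 80% quantile are our own choices.

```python
import math
import random

def corr_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson r from running sums (O(1) update per new case)."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

def point_of_stability(rho=0.4, w=0.1, n_min=20, n_max=300,
                       n_sim=200, quantile=0.80, seed=1):
    """POS: one past the last n (>= n_min) at which the evolving
    correlation falls outside the corridor [rho - w, rho + w]."""
    random.seed(seed)
    pos_values = []
    for _ in range(n_sim):
        sx = sy = sxx = syy = sxy = 0.0
        last_break = n_min - 1             # never left corridor -> POS = n_min
        for n in range(1, n_max + 1):
            x = random.gauss(0, 1)
            y = rho * x + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
            sx += x
            sy += y
            sxx += x * x
            syy += y * y
            sxy += x * y
            if n >= n_min and abs(corr_from_sums(n, sx, sy, sxx, syy, sxy) - rho) > w:
                last_break = n             # trajectory is outside the corridor at n
        pos_values.append(last_break + 1)  # n_max + 1 means "censored at n_max"
    pos_values.sort()
    return pos_values[min(int(quantile * n_sim), n_sim - 1)]
```

Widening the corridor (larger w) or lowering the requested confidence quantile reduces the returned POS, which is the qualitative pattern the paper's tables report.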

Discussion

It has been argued that, for a cumulative growth of knowledge, accurate estimates of the magnitude of an effect would be more fruitful than simple binary decisions derived from NHST. Previous approaches concerned with the accuracy of estimates focused on confidence intervals around the point estimates. By defining the aspired level of accuracy, one can compute the necessary sample size (Algina and Olejnik, 2003, Maxwell et al., 2008).
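In the accuracy-in-parameter-estimation spirit described above, one can invert the Fisher z confidence interval to get the n needed for a desired interval half-width in the z metric. This one-liner is a simplification of our own for illustration; Algina and Olejnik give exact tables, and the interval width in the r metric additionally depends on ρ.

```python
import math
from statistics import NormalDist

def n_for_ci_halfwidth(halfwidth_z, conf=0.95):
    """Smallest n such that the Fisher-z CI half-width
    zcrit / sqrt(n - 3) does not exceed halfwidth_z."""
    zcrit = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil((zcrit / halfwidth_z) ** 2 + 3)
```

For example, a 95% interval with half-width 0.1 in the z metric requires n = 388, while half-width 0.2 requires only n = 100, illustrating the steep cost of precision.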

The current report extends this literature by applying a

Acknowledgments

Ideas for this simulation study partly originated at a summer school on robust statistics in 2011 (supported by EAPP and ISSID) and during discussions with Jens Asendorpf.

References (13)

  • J. Algina et al. (2003). Sample size tables for correlation analysis with applications in partial correlation and multiple regression analysis. Multivariate Behavioral Research.
  • J. Cohen (1988). Statistical power analysis for the behavioral sciences.
  • J. Cohen (1992). A power primer. Psychological Bulletin.
  • J.R. Edwards et al. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods.
  • J.E. Hunter et al. (2004). Methods of meta-analysis: Correcting error and bias in research findings.
  • F.D. Schönbrodt et al. (2012). An IRT analysis of motive questionnaires: The Unified Motive Scales. Journal of Research in Personality.
