Brief Report

At what sample size do correlations stabilize?
Introduction
Most research in psychology seems concerned with determining the sign of an effect with some confidence, using the null hypothesis significance testing (NHST) procedure. Several authors, however, have argued that any field of science should move from binary decisions derived from the NHST procedure towards more precise point estimates of the magnitude of an effect (Edwards and Berry, 2010, Kelley and Maxwell, 2003). Consider, for example, a correlation of r = .40 in a sample of 25 participants. This correlation is significantly different from zero (p = .047). Hence, it might be concluded with some confidence that there is “something >0” in the population, and the study would be counted as a success from the NHST perspective. However, plausible values of the true correlation ρ, as expressed by a 90% confidence interval, range from .07 to .65. The estimate is quite unsatisfactory from an accuracy point of view – in any scenario beyond the NHST ritual it makes a huge difference whether the true correlation in the population is .07, which would be regarded as a very small effect in most research contexts, or .65, which would be a very large effect in many contexts. Moreover, precise point estimates are relevant for a priori sample size calculations: given the huge uncertainty in the true magnitude of the effect, it is hard to determine the sample size necessary to replicate the effect (e.g., for an intended power of 80% and ρ = .07: n = 1599; ρ = .40: n = 46; ρ = .65: n = 16).
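The interval and sample sizes above follow from the standard Fisher z-approximation; a minimal sketch (the function names are ours, the critical z values are hardcoded, and the power approximation may differ from exact calculations by a case or two):

```python
import math

def fisher_ci(r, n, z_crit=1.645):
    """Confidence interval for a correlation via the Fisher z-transform.

    z_crit = 1.645 gives a 90% CI; use 1.960 for a 95% CI.
    """
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def n_for_power(rho, z_alpha=1.960, z_beta=0.8416):
    """Approximate n to detect rho != 0 (two-sided alpha = .05, power = .80)."""
    return math.ceil(((z_alpha + z_beta) / math.atanh(rho)) ** 2 + 3)

lo, hi = fisher_ci(0.40, 25)   # approximately (.07, .65), as in the text
```

Plugging ρ = .07, .40, and .65 into `n_for_power` reproduces the reported required sample sizes up to rounding.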
In this contribution we address a related question of practical importance in personality research: at which sample size does a correlation stabilize? Many researchers have observed that the magnitude of a correlation is rather unstable in small samples, as the following empirical example demonstrates. Multiple questionnaire scales were administered in an open online study (Schönbrodt & Gerstenberg, 2012; Study 3). The thick black line in Fig. 1 shows the evolution of the correlation between two scales, “hope of power” and “fear of losing control”, as the correlation is recalculated after each new participant. The correlation evolved from r = .69 (n = 20, p < .001) to r = .26 (n = 274, p < .001). By visual inspection, the trajectory did not stabilize until a sample size of around 150. The data have not been rearranged – this is simply the order in which participants entered the study. Some other correlations in this data set evolved from significantly negative to non-significant, others reversed from significance in one direction to significance in the opposite direction, and some correlations were stable from the beginning, with only a few fluctuations around the final estimate. But how do we know when a correlation estimate is sufficiently stable?
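Such a trajectory is simply the correlation computed on the first n cases for increasing n. A sketch with simulated data (not the original study data; the final correlation of ρ = .26 and n = 274 are taken from the example above):

```python
import numpy as np

def cumulative_correlations(x, y, n_min=20):
    """Pearson correlation of the first n cases, for n = n_min .. len(x)."""
    return [float(np.corrcoef(x[:n], y[:n])[0, 1])
            for n in range(n_min, len(x) + 1)]

# Illustrative data, not the original study data: 274 cases with rho = .26
rng = np.random.default_rng(1)
cov = [[1.0, 0.26], [0.26, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=274).T
trajectory = cumulative_correlations(x, y)   # one r per added participant
```

Plotting `trajectory` against n reproduces the kind of wandering line shown in Fig. 1.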
Definition and operationalization of stability
Suppose a true correlation of ρ = .40. If the estimate in the sample is .41 or .38, most researchers would agree that this is a trivial deviation from the true value. A stronger deviation, such as .26 or .57, could be deemed more problematic, depending on the research setting. And even stronger deviations, such as .10 or .65 (which would still fall within the 95% CI at n = 25), would probably be judged unacceptable from a substantive point of view.
When talking about stability, minor fluctuations
Method and results
To compute a distribution of POS values, Monte Carlo simulations were run in the R environment for statistical computing (R Development Core Team, 2012). The complete source code for the computations can be downloaded from the online Supplementary material. The following steps were performed for the simulation:
- Simulate a bivariate Gaussian distribution with 1,000,000 cases and a specified correlation ρ (the “population”).
- Draw B = 100,000 bootstrap samples
Discussion
It has been argued that, for a cumulative growth of knowledge, accurate estimates of the magnitude of an effect are more fruitful than simple binary decisions derived from NHST. Previous approaches concerned with the accuracy of estimates focused on confidence intervals around the point estimates: by defining the desired level of accuracy, one can compute the necessary sample size (Algina and Olejnik, 2003, Maxwell et al., 2008).
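Such an accuracy-based calculation can be sketched via the Fisher z-approximation (an approximation of our own, not the exact procedure of Algina and Olejnik): choose a target total CI width w around the expected r and solve the CI-width formula for n.

```python
import math

def n_for_ci_width(r, w, z_crit=1.960):
    """Approximate n so that the CI for r has total width of about w.

    Works on the Fisher z scale, where the CI width is 2 * z_crit / sqrt(n - 3);
    the target width w around r is first mapped onto that scale.
    """
    w_z = math.atanh(r + w / 2) - math.atanh(r - w / 2)   # target width in z units
    return math.ceil((2 * z_crit / w_z) ** 2 + 3)

n = n_for_ci_width(0.40, 0.20)   # n for a 95% CI spanning roughly .30 to .50
```

As expected, relaxing the accuracy requirement (a wider w) lowers the required n, which is the trade-off these approaches make explicit.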
The current report extends this literature by applying a
Acknowledgments
Ideas for this simulation study partly originated at a summer school on robust statistics in 2011 (supported by EAPP and ISSID) and during discussions with Jens Asendorpf.
References

- Algina, J., & Olejnik, S. (2003). Sample size tables for correlation analysis with applications in partial correlation and multiple regression analysis. Multivariate Behavioral Research.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences.
- Cohen, J. (1992). A power primer. Psychological Bulletin.
- Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods.
- Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings.
- Schönbrodt, F. D., & Gerstenberg, F. X. K. (2012). An IRT analysis of motive questionnaires: The Unified Motive Scales. Journal of Research in Personality.