Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly
Introduction
There is a long history of manual morphometry investigating the structural changes that occur with age. This research is based on manually tracing brain regions of interest on magnetic resonance imaging (MRI) data. While this approach has led to tremendous insight into age-related changes in the brain, it also relies heavily on multiple intensively trained human raters. Standardizing all aspects of these procedures across labs is often difficult, which may give rise to undesired variation in results (Raz and Rodrigue, 2006). Over the last decade, however, surface-based morphometry tools have made it possible to quantify gray matter in the human brain non-invasively and in a more automated fashion. Software packages such as FreeSurfer or Caret provide measurements of cortical and subcortical gray matter features based on MRI data. Because the algorithms implemented in these packages require minimal user intervention, it has become feasible to investigate large samples and to apply this approach to a wide variety of research questions. Structural brain measurements (a) change systematically in development and aging (Fjell et al., 2013b, Hogstrom et al., 2013, Salat et al., 2004, Sowell et al., 2003), (b) reflect neuroplasticity in the context of experience and practice (Engvig et al., 2010), (c) can be important predictors of neurodegenerative disease (Dickerson et al., 2013, Gaser et al., 2013), and (d) have a genetic component (Joshi et al., 2011).
In aging research, subcortical volumes (Jäncke et al., 2014) are analyzed alongside a variety of cortical measures, such as cortical thickness, surface area, and volume (Fjell et al., 2014, Hogstrom et al., 2013, Mills and Tamnes, 2014). Analyzing several parameters is warranted because they provide independent and complementary information about brain anatomy (Meyer et al., 2013, Winkler et al., 2010). However, age-related cortical thinning may render the reconstruction of cortical surfaces less reliable and thereby reduce statistical power when studying older samples (e.g., when detecting within-person change in longitudinal studies of aging). Pertinent studies (e.g., Han et al., 2006, Jovicich et al., 2006, Jovicich et al., 2013, Morey et al., 2010, Schnack et al., 2010, Wonderlick et al., 2009) cannot adequately address this issue, as most of them investigated test-retest reliability in young samples and relied on rather small samples (often around 5 to 20 subjects). As demonstrated by Shoukri et al. (2004) and recently emphasized by Buchanan et al. (2014), a large sample is necessary to estimate reliability with an acceptably narrow confidence interval. Further limitations of previous studies include the restriction to subcortical or regional cortical analyses, the restriction to a single measure (e.g., cortical thickness), and the reporting of reliability metrics that are difficult to interpret (for example, the absolute difference in cortical thickness, where the same absolute difference can correspond to different percent changes in two regions of different thickness).
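The point that reliability estimates from small samples are imprecise can be illustrated with the standard F-based confidence interval for a one-way random-effects ICC. The sketch below is a hypothetical illustration, not part of the study's analysis: it assumes two sessions per subject (k = 2), a made-up observed ICC of 0.8, and α = 0.05, and it uses the one-way ICC form only for simplicity. It requires SciPy (`scipy.stats.f`).

```python
from scipy.stats import f


def icc1_confidence_interval(icc: float, n: int, k: int = 2, alpha: float = 0.05):
    """F-based confidence interval for a one-way random-effects ICC(1,1).

    icc   -- observed ICC (a hypothetical value supplied by the caller)
    n     -- number of subjects
    k     -- number of sessions per subject (2 for a test-retest design)
    alpha -- two-sided significance level
    """
    # Recover the ANOVA F ratio (MS_between / MS_within) implied by the ICC.
    f_obs = (1 + (k - 1) * icc) / (1 - icc)
    df_between, df_within = n - 1, n * (k - 1)
    # Lower and upper F bounds, then map them back onto the ICC scale.
    f_low = f_obs / f.ppf(1 - alpha / 2, df_between, df_within)
    f_up = f_obs * f.ppf(1 - alpha / 2, df_within, df_between)
    lower = (f_low - 1) / (f_low + k - 1)
    upper = (f_up - 1) / (f_up + k - 1)
    return lower, upper


# Same hypothetical observed ICC, different sample sizes: the interval narrows as n grows.
for n in (10, 20, 189):
    lo, hi = icc1_confidence_interval(0.8, n)
    print(f"n={n:4d}  ICC=0.80  95% CI=[{lo:.2f}, {hi:.2f}]  width={hi - lo:.2f}")
```

With roughly 5 to 20 subjects, the interval around the same point estimate is several times wider than with a sample of the size used here.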
In addition to reliability, the field of human neuroimaging has recently increased its focus on statistical power (Button et al., 2013, Suckling et al., 2014). A major concern with neuroimaging studies is that relatively small samples are insufficient to detect small differences between groups or conditions (Yarkoni et al., 2010). Performing an a priori power analysis to determine the appropriate sample size during the planning phase of a study is therefore recommended. In neuroimaging, however, the calculation of statistical power is complicated by the fact that power is not uniform across the cortex, so different brain regions require different sample sizes (Pardoe et al., 2013). Although a well-described statistical framework for power analysis exists, only a minority of neuroimaging studies perform power analyses beforehand (Button et al., 2013). Recent work provided sample size calculations for cortical thickness studies (Pardoe et al., 2013). Building on this work, we extend the framework to additional anatomical measures (cortical surface area, cortical volume, and subcortical volume), other types of power analysis (post-hoc and sensitivity), and an additional statistical test (the paired-sample t-test).
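The three types of power analysis mentioned above (a priori, post-hoc, and sensitivity) can be sketched for an independent-samples and a paired comparison using the usual normal approximation to the t-test. This is a minimal illustration, not the authors' tool: effect sizes are Cohen's d (for the paired test, d_z, the mean within-subject difference divided by the SD of those differences), tests are two-sided, and the function names are hypothetical. The normal approximation slightly underestimates the required sample size compared with an exact t-based calculation. Only the Python standard library is used.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def _z_sum(alpha: float, power: float) -> float:
    """z_{1-alpha/2} + z_{power} for a two-sided test."""
    return _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)


def n_two_sample(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """A priori: subjects per group for an independent-samples comparison
    (e.g., patients vs. healthy controls)."""
    return ceil(2 * (_z_sum(alpha, power) / d) ** 2)


def n_paired(d_z: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """A priori: subjects for a paired (repeated-measures) comparison.
    d_z = mean of within-subject differences / SD of those differences."""
    return ceil((_z_sum(alpha, power) / d_z) ** 2)


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Post-hoc: approximate power for a given effect size and group size
    (ignores the negligible rejection probability in the opposite tail)."""
    return _Z.cdf(d * sqrt(n_per_group / 2) - _Z.inv_cdf(1 - alpha / 2))


def min_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Sensitivity: smallest effect size detectable with the given group size."""
    return _z_sum(alpha, power) * sqrt(2 / n_per_group)
```

For a medium effect (d = 0.5), α = 0.05, and 80% power, `n_two_sample(0.5)` gives 63 subjects per group, one fewer than the exact t-based value of 64, and `n_paired(0.5)` gives 32. Because power is not uniform across the cortex, such a calculation would be repeated per vertex or region with region-specific effect sizes.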
The primary objective of the present study is therefore to investigate test-retest reliability and statistical power of cortical and subcortical measures derived with FreeSurfer. Specifically, we aim to:
1. conduct our calculations using data from a large sample (N = 189);
2. use a standardized and easy-to-interpret metric of reliability, namely, the intraclass correlation coefficient (ICC);
3. assess several cortical measures (cortical thickness, surface area, volume) as well as subcortical volumes;
4. provide vertex-wise information in order to assess regional variability in reliability and statistical power;
5. investigate the effect of surface-based smoothing kernel size on reliability and statistical power;
6. publicly provide surface data of reliability and statistical power so that others can look up reliability and statistical power in the brain regions they are most interested in;
7. publicly provide a tool that allows others to perform various types of power analyses (e.g., to determine the sample size required to detect an effect in cortical or subcortical measures before conducting an experiment, or to determine the sensitivity of a previously published study).
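The ICC named in aim 2 exists in several forms, and this section does not state which one the study used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) from a subjects × sessions array via two-way ANOVA mean squares. It assumes NumPy, and the data are synthetic, not values from the LHAB sample.

```python
import numpy as np


def icc_2_1(y: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC(2,1).

    y -- array of shape (n_subjects, k_sessions), e.g. the thickness at one
         vertex at test and retest for every subject.
    """
    n, k = y.shape
    grand_mean = y.mean()
    subj_means = y.mean(axis=1)
    sess_means = y.mean(axis=0)

    # Two-way ANOVA sums of squares (one observation per subject-session cell).
    ss_subjects = k * np.sum((subj_means - grand_mean) ** 2)
    ss_sessions = n * np.sum((sess_means - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Systematic session offsets (ms_sessions) lower absolute agreement.
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n
    )


# Synthetic test-retest data: 189 subjects, two sessions.
rng = np.random.default_rng(0)
true_thickness = rng.normal(2.5, 0.15, size=189)
y = np.column_stack([true_thickness + rng.normal(0, 0.05, size=189) for _ in range(2)])
# The simulated population ICC is 0.15**2 / (0.15**2 + 0.05**2) = 0.9.
print(f"ICC(2,1) = {icc_2_1(y):.3f}")
```

Because the ICC is a ratio of variance components, it is unitless and comparable across regions and measures, unlike an absolute test-retest difference in millimetres.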
Note that the results presented here apply to the specific scanner type and acquisition sequence used in this study; results may differ for data acquired with other scanners or acquisition schemes. Importantly, the power analysis tool we provide also allows researchers to base their power analysis on data previously acquired on their local scanner with their specific sequence, so that they can tailor the power analysis to their own setting.
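One way to base a power analysis on locally acquired data can be sketched as follows: the SD of test-retest differences measured on a local scanner serves as the noise estimate for a paired design, and a hypothesized mean change is converted into d_z. This is an assumption-laden illustration, not the authors' tool. The function name and all numbers are hypothetical, and treating measurement error as the only source of variance in change will likely underestimate the required sample size in longitudinal studies where true change varies across people. A noise estimate from only a handful of subjects is itself imprecise.

```python
from math import ceil
from statistics import NormalDist, stdev


def n_paired_from_local_data(session1, session2, expected_change,
                             alpha=0.05, power=0.8):
    """Subjects needed to detect `expected_change` (same units as the data)
    in a paired design, using local test-retest differences as the noise estimate.
    """
    diffs = [b - a for a, b in zip(session1, session2)]
    sd_diff = stdev(diffs)  # within-subject measurement noise on this scanner/sequence
    d_z = abs(expected_change) / sd_diff
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_sum / d_z) ** 2)


# Made-up test-retest cortical thickness values (mm) for one region, 8 subjects.
scan1 = [2.51, 2.43, 2.62, 2.38, 2.55, 2.47, 2.60, 2.41]
scan2 = [2.49, 2.46, 2.60, 2.41, 2.52, 2.48, 2.63, 2.39]
print(n_paired_from_local_data(scan1, scan2, expected_change=-0.02))
```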
Section snippets
Research participants
Data from 189 right-handed older adults (99 female; age: M = 70.4, SD = 5.0, min = 64, max = 87) were taken from the first wave of the LHAB (Longitudinal Healthy Aging Brain) database, which is currently being built at the International Normal Aging and Plasticity Imaging Center (University of Zurich, Switzerland) (Zöllig et al., 2011). Participants were cognitively healthy, had no history of neurological or psychiatric disorder, and did not suffer from migraine, diabetes, or tinnitus. Their Mini
Results
The results are presented in the order of the list of aims shown in the introduction.
Discussion
In this study, we investigated the reliability and statistical power of cortical and subcortical brain measures computed with FreeSurfer. Furthermore, we publicly provide a tool that enables researchers to perform power analyses. With this tool, scientists can, for instance, calculate the sample size necessary to detect a difference in cortical thickness between a disease group and a healthy control group, or calculate the required sample size to detect brain changes in a repeated measures
Acknowledgments
The current analysis incorporates data from the Longitudinal Healthy Aging Brain (LHAB) database project, which is carried out as one of the core projects at the International Normal Aging and Plasticity Imaging Center/INAPIC and the University Research Priority Program “Dynamics of Healthy Aging” of the University of Zurich. The following members of the core INAPIC team were involved in the design, set-up, maintenance and support of the LHAB database: Anne Eschen, Lutz Jäncke, Mike Martin,
References (68)
- Spatiotemporal linear mixed effects modeling for the mass-univariate analysis of longitudinal neuroimage data. NeuroImage (2013).
- Test-retest reliability of structural brain networks from diffusion MRI. NeuroImage (2014).
- A comparison of voxel and surface based cortical thickness estimation methods. NeuroImage (2011).
- Heritability of head motion during resting state functional MRI in 462 healthy twins. NeuroImage (2014).
- Cortical surface-based analysis. I. Segmentation and surface reconstruction. NeuroImage (1999).
- Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage (2010).
- Detection of cortical thickness correlates of cognitive performance: reliability across MRI scan sessions, scanners, and field strengths. NeuroImage (2008).
- Increased cortical surface area of the left planum temporale in musicians facilitates the categorization of phonetic and temporal speech sounds. Cortex (2013).
- Effects of memory training on cortical thickness in the elderly. NeuroImage (2010).
- Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. NeuroImage (1999).
- Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron.
- Critical ages in the life course of the adult brain: nonlinear subcortical aging. Neurobiol. Aging.
- Mini-mental state. A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res.
- Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer. NeuroImage.
- A comparison between voxel-based cortical thickness and voxel-based morphometry in normal aging. NeuroImage.
- Reliability in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. NeuroImage.
- MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths. NeuroImage.
- Differential aging of the brain: patterns, cognitive correlates and modifiers. Neurosci. Biobehav. Rev.
- Highly accurate inverse consistent registration: a robust approach. NeuroImage.
- Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage.
- Head motion during MRI acquisition reduces gray matter volume and thickness estimates. NeuroImage.
- Cortical volume, surface area, and thickness in schizophrenia and bipolar disorder. Biol. Psychiatry.
- Age-associated alterations in cortical gray and white matter signal intensity and gray to white matter contrast. NeuroImage.
- The influence of head motion on intrinsic functional connectivity MRI. NeuroImage.
- Cortical thickness or grey matter volume? The importance of selecting the phenotype for imaging genetics studies. NeuroImage.
- Reliability of MRI-derived cortical and subcortical morphometric measures: effects of pulse sequence, voxel geometry, and parallel imaging. NeuroImage.
- Cognitive neuroscience 2.0: building a cumulative science of human brain function. Trends Cogn. Sci.
- Real-time optical motion correction for diffusion tensor imaging. Magn. Reson. Med.
- Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci.
- Validation of FreeSurfer-estimated brain cortical thickness: comparison with histologic measurements. Neuroinformatics.
- A power primer. Psychol. Bull.
- The new statistics: why and how. Psychol. Sci.
- Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: a linear approach. J. Cogn. Neurosci.
- Biomarker-based prediction of progression in MCI: comparison of AD signature and hippocampal volume with spinal fluid amyloid-β and tau. Front. Aging Neurosci.