Background & Summary

Functional connectomics is a rapidly expanding area of human brain mapping1–4. Focused on the study of functional interactions among the nodes of brain networks, functional connectomics is emerging as a mainstream tool for delineating variation in brain architecture among both individuals and populations5–8. Findings that established network features and well-known patterns of task-evoked brain activity are recapitulated in the spontaneous brain activity captured by resting-state fMRI (rfMRI)3–6,9–12 have been critical to the widespread acceptance of functional connectomics.

A growing literature has highlighted the possibility that functional network properties may explain individual differences in behavior and cognition4,7,8; the potential utility of this approach is supported by studies suggesting that commonly used rfMRI measures are reliable13. Unfortunately, the field lacks a data platform with which researchers can rigorously explore the reliability of the many indices that continue to emerge. Such a platform is crucial for the refinement and evaluation of novel methods, as well as of those that have gained widespread usage without sufficient consideration of reliability. Equally important, quantifying the reliability and reproducibility of the myriad connectomics-based measures can inform expectations regarding the potential of such approaches for biomarker identification13–16.

To address these challenges, the Consortium for Reliability and Reproducibility (CoRR) has aggregated previously collected test-retest imaging datasets from more than 36 laboratories around the world and shared them via the 1000 Functional Connectomes Project (FCP)5,17 and its International Neuroimaging Data-sharing Initiative (INDI)18. Although primarily focused on rfMRI, this initiative has also worked to promote the sharing of diffusion imaging data. It is our hope that, among its many possible uses, the CoRR repository will facilitate: (1) the establishment of test-retest reliability and reproducibility for commonly used MR-based connectome metrics; (2) the determination of the range of variation in the reliability and reproducibility of these metrics across imaging sites and retest study designs; and (3) the creation of a standard/benchmark test-retest dataset for the evaluation of novel metrics.

Here, we provide an overview of all the datasets currently aggregated by CoRR, and describe the standardized metadata and technical validation associated with these datasets, thereby facilitating immediate access to these data by the wider scientific community. Additional datasets, and richer descriptions of some of the studies producing these datasets, will be published separately (for example, A high resolution 7-Tesla rfMRI test-retest dataset with cognitive and physiological measures19). A list of all papers describing these individual studies will be maintained and periodically updated at the CoRR website (http://fcon_1000.projects.nitrc.org/indi/CoRR/html/data_citation.html).

Methods

Experimental design

At the time of submission, CoRR had received 40 distinct test-retest datasets that were independently collected by 36 imaging groups at 18 institutions. All CoRR contributions were based on studies approved by a local ethics committee, and each contributor's respective ethics committee approved the submission of de-identified data. Prior to contribution, data were fully de-identified by removing all 18 protected health information identifiers specified by HIPAA (the Health Insurance Portability and Accountability Act), as well as face information from structural images. All distributed data were visually inspected before release. While all samples include at least one baseline scan and one retest scan, the specific designs and target populations vary across samples, given the aggregation strategy used to build the resource. Because many of the individual (uniformly collected) datasets have sample sizes large enough to permit stable test-retest estimates, this variability across datasets provides an opportunity to generalize reliability estimates across scanning platforms, acquisition approaches, and target populations. The range of designs included is captured by the following classifications:

  • Within-Session Repeat

    ◦ Scan repeated on the same day

    ◦ Behavioral condition may or may not vary across scans, depending on the sample

  • Between-Session Repeat

    ◦ Scan repeated one or more days later

    ◦ In most cases, less than one week later

  • Between-Session Repeat (Serial)

    ◦ Scan repeated for three or more sessions within a short time-frame believed to be developmentally stable

  • Between-Session Repeat (Longitudinal Developmental)

    ◦ Scan repeated at a distant time-point not believed to be developmentally equivalent. There is no exact definition of the minimum interval needed for detecting developmental effects across scans, though such designs typically span at least 3–6 months

  • Hybrid Design

    ◦ Scans repeated one or more times on the same day, as well as across one or more sessions

Table 1 presents an overview of the specific samples included in CoRR (Data Citations 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31). The vast majority included a single retest scan (48% within-session, 52% between-session). Three samples employed serial scanning designs, and one sample had a longitudinal developmental component. Most samples comprised presumed neurotypical adults; exceptions include the pediatric samples from the Institute of Psychology, Chinese Academy of Sciences (IPCAS 2/7), the University of Pittsburgh School of Medicine (UPSM), and New York University (NYU), and the lifespan sample from the Nathan Kline Institute (NKI 1).

Table 1 CoRR sites and experimental design.

Data Records

Data privacy

Prior to contribution, each investigator confirmed that the data in their contribution were collected with the approval of their local ethics committee or institutional review board, and that sharing via CoRR was in accord with that body's policies. In accord with prior FCP/INDI policies, face information was removed from anatomical images (FullAnonymize.sh V1.0b; http://www.nitrc.org/frs/shownotes.php?release_id=1902) and Neuroimaging Informatics Technology Initiative (NIfTI) headers were replaced prior to open sharing, to minimize the risk of re-identification.

Distribution for use

CoRR datasets can be accessed through either the COllaborative Informatics and Neuroimaging Suite (COINS) Data Exchange (http://coins.mrn.org/dx)20 or the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; http://fcon_1000.projects.nitrc.org/indi/CoRR/html/index.html). CoRR datasets at the NITRC site are stored in .tar files sorted by site, each containing the necessary imaging data and phenotypic information. The COINS Data Exchange offers an enhanced graphical query tool, which enables users to target and download files matching specific search criteria. For either sharing venue, a user login must be established before downloading files. Several samples were not included in the data analysis because they were still in the data contribution/upload, preparation, or correction stage at the time of analysis: Intrinsic Brain Activity Test-Retest Dataset (IBATRT), Dartmouth College (DC 1), IPCAS 4, Hangzhou Normal University (HNU 2), Fudan University (FU 1), FU 2, Chengdu Huaxi Hospital (CHH 1), Max Planck Institute (MPG 1)19, Brain Genomics Superstruct Project (GSP), and New Jersey Institute of Technology (NJIT 1) (see the CoRR website for more details on these sites). Table 1 provides a static representation of the samples included in CoRR at the time of submission.
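For users who script their workflows, the following minimal sketch unpacks and inventories a downloaded site archive. The archive name (NYU_1.tar) and its internal layout are illustrative assumptions, not a description of the released files; consult the notes at either sharing venue for the actual organization.

    # Sketch: unpack a downloaded CoRR site archive and list its contents.
    # The archive name and layout are assumptions made for illustration.
    import tarfile
    from pathlib import Path

    archive = Path("NYU_1.tar")      # hypothetical per-site tarball
    target = Path("corr_data")
    target.mkdir(exist_ok=True)

    with tarfile.open(archive) as tar:
        tar.extractall(path=target)

    # Inventory the imaging (NIfTI) and phenotypic (.csv) files.
    for f in sorted(target.rglob("*")):
        if f.suffix in {".nii", ".gz", ".csv"}:
            print(f.relative_to(target))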

Imaging data

Consistent with its popularity in the imaging community and its prior usage in FCP/INDI efforts, the NIfTI file format was selected for the storage of CoRR imaging datasets across modalities, including rfMRI, structural MRI (sMRI), and diffusion MRI (dMRI). Tables 2, 3, and 4 (available online only) describe the MRI sequences used for each of these modalities.

Table 2 Imaging parameters for sMRI scans in CoRR
Table 3 Imaging parameters for rfMRI scans in CoRR
Table 4 Imaging parameters for dMRI scans in CoRR

Phenotypic information

All phenotypic data are stored in comma-separated value (.csv) files. Basic information such as age and gender was collected for every site, permitting aggregation across sites on a minimal set of demographic variables. Table 5 (available online only) depicts the data legend provided to CoRR contributors; a brief sketch of loading these files follows Table 5.

Table 5 Phenotypic protocols in CoRR
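As one example of working with the phenotypic files, the sketch below computes a per-site demographic summary. The file name and column labels (SITE, SUBID, AGE_AT_SCAN_1, SEX) are assumptions made for illustration and should be checked against the data legend in Table 5.

    # Sketch: per-site demographic summary from a CoRR phenotypic .csv file.
    # The file name and column labels are assumed for illustration only.
    import pandas as pd

    pheno = pd.read_csv("corr_phenotypic.csv")   # hypothetical file name
    summary = pheno.groupby("SITE").agg(
        n_subjects=("SUBID", "nunique"),
        mean_age=("AGE_AT_SCAN_1", "mean"),
        pct_female=("SEX", lambda s: 100.0 * (s == "F").mean()),
    )
    print(summary.round(1))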

Technical Validation

Consistent with established FCP/INDI policy, all data contributed to CoRR were made available to users regardless of data quality. Justifications for this decision include the lack of consensus within the functional imaging community on criteria for quality assurance, and the utility of 'lower quality' datasets for facilitating the development of artifact correction techniques. For CoRR, datasets with significant artifacts related to factors such as motion are particularly valuable, as they enable determination of the impact of such real-world confounds on reliability and reproducibility21,22. However, the absence of quality-based screening in the data release does not imply that the inclusion of poor quality datasets in imaging analyses is routine practice at the contributing sites. Figure 1 provides a summary map of the anatomical coverage of the rfMRI scans included in the CoRR dataset.

Figure 1: Summary map of brain coverage for rfMRI scans in CoRR (N=5,093).

The color indicates the coverage ratio of rfMRI scans.

To facilitate quality assessment of the contributed samples and the selection of datasets for analyses by individual users23, we made use of the Preprocessed Connectomes Project quality assurance protocol (http://preprocessed-connectomes-project.github.io), which includes a broad range of quantitative metrics commonly used in the imaging literature for assessing data quality. These metrics are itemized below; a code sketch of several of them follows the list.

  • Spatial Metrics (sMRI, rfMRI)

    ◦ Signal-to-Noise Ratio (SNR)24. The mean of the gray matter values divided by the standard deviation of the air values.

    ◦ Foreground-to-Background Energy Ratio (FBER). The mean energy of image values within the head relative to that of the background air.

    ◦ Entropy Focus Criterion (EFC)25. Shannon's entropy is used to summarize the principal directions distribution.

    ◦ Smoothness of Voxels26. The full-width at half maximum (FWHM) of the spatial distribution of image intensity values.

    ◦ Ghost-to-Signal Ratio (GSR; rfMRI only)27. The mean signal in the 'ghost' image (artifactual signal displaced along the phase-encoding direction) relative to the mean signal within the brain.

    ◦ Artifact Detection (sMRI only)28. The proportion of voxels with intensity corrupted by artifacts, normalized by the number of voxels in the background.

    ◦ Contrast-to-Noise Ratio (CNR; sMRI only)24. The mean of the gray matter values minus the mean of the white matter values, divided by the standard deviation of the air values.

  • Temporal Metrics (rfMRI)

    ◦ Head Motion

      ▪ Mean framewise displacement (FD)29. A measure of subject head motion, comparing the position of the current volume with that of the previous volume. It is calculated by summing the absolute values of the displacement changes in the x, y, and z directions and of the rotational changes about those three axes; rotational changes are converted to distances across the surface of a sphere of radius 50 mm.

      ▪ Percentage of volumes with FD greater than 0.2 mm.

      ▪ Standardized DVARS. The spatial standard deviation of the temporal derivative of the data (D referring to the temporal derivative of the time series, VARS to the root-mean-square variance over voxels)29, normalized by the temporal standard deviation and temporal autocorrelation (http://blogs.warwick.ac.uk/nichols/entry/standardizing_dvars).

    ◦ General

      ▪ Outlier Detection. The mean fraction of outliers found in each volume, using the 3dToutcount command in the Analysis of Functional NeuroImages software package (AFNI: http://afni.nimh.nih.gov/afni).

      ▪ Median Distance Index. The mean distance (1 − Spearman's rho) between each time-point's volume and the median volume, using AFNI's 3dTqual command.

      ▪ Global Correlation (GCOR)30. The average of the entire brain correlation matrix, computed as the brain-wide average of the time series correlations over all possible pairs of voxels.

Imaging data preprocessing was carried out with the Configurable Pipeline for the Analysis of Connectomes (C-PAC: http://www.nitrc.org/projects/cpac). Results for the sMRI images (spatial metrics) are depicted in Supplementary Figure 1, and for the rfMRI scans in Supplementary Figure 2 (general spatial and temporal metrics) and Supplementary Figure 3 (head motion). For both sMRI and rfMRI, the battery of quality metrics revealed notable variation in image properties across sites. It is our hope that users will explore the impact of such variation in quality on the reliability of data derivatives, as well as its potential relationships with acquisition parameters. Recent work examining the impact of head motion on reliability suggests the merits of such lines of questioning. Specifically, Yan and colleagues found that motion itself has moderate test-retest reliability, and appears to contribute to reliability when low, though it compromises reliability when high31–33. Although a comprehensive examination of this issue is beyond the scope of the present work, we did verify that motion has moderate test-retest reliability in the CoRR datasets (see Figure 2, and the reliability sketch that follows it), as previously suggested. Interestingly, this relationship appeared to be driven by the lower-motion datasets (mean FD<0.2 mm). Future work will undoubtedly benefit from further exploration of this phenomenon and its impact on findings.

Figure 2: Test-retest plots of in-scanner head motion during rfMRI.

A total of 1,019 subjects with at least two rfMRI sessions were selected. The green line indicates the correlation between the two sessions within the lower-motion datasets (mean FD<0.2 mm); the blue line indicates the correlation for the higher-motion datasets (mean FD>0.2 mm).
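For readers who wish to reproduce this type of analysis, the following is a minimal sketch of an ANOVA-based intraclass correlation (ICC) applied to two-session summaries such as mean FD. The synthetic data are placeholders, and the ICC variant appropriate for a given design should be chosen deliberately.

    # Sketch: ICC(2,1) and ICC(3,1) for an (n_subjects, k_sessions) array.
    import numpy as np

    def icc(Y):
        n, k = Y.shape
        grand = Y.mean()
        row = Y.mean(axis=1)                       # subject means
        col = Y.mean(axis=0)                       # session means
        msr = k * ((row - grand) ** 2).sum() / (n - 1)
        msc = n * ((col - grand) ** 2).sum() / (k - 1)
        sse = ((Y - row[:, None] - col[None, :] + grand) ** 2).sum()
        mse = sse / ((n - 1) * (k - 1))            # residual mean square
        icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
        icc31 = (msr - mse) / (msr + (k - 1) * mse)
        return icc21, icc31

    # Placeholder data: 100 subjects, 2 sessions of mean FD values.
    rng = np.random.default_rng(0)
    true_fd = rng.gamma(2.0, 0.05, size=100)       # stable subject trait
    fd = true_fd[:, None] + rng.normal(0.0, 0.03, size=(100, 2))
    print("ICC(2,1)=%.2f  ICC(3,1)=%.2f" % icc(fd))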

Beyond the above quality control metrics, a minimal set of rfMRI derivatives was calculated for the datasets included in CoRR to further facilitate the comparison of images across sites (illustrative sketches of two of these derivatives follow the list):

  • Fractional Amplitude of Low Frequency Fluctuations (fALFF)34,35. The total power in the low frequency range (0.01–0.1 Hz) of an fMRI image, normalized by the total power across all frequencies measured in that same image.

  • Voxel-Mirrored Homotopic Connectivity (VMHC)36,37. The functional connectivity between each pair of geometrically symmetric, inter-hemispheric voxels.

  • Regional Homogeneity (ReHo)38–40. The synchrony between a voxel's time series and those of its nearest neighbors, quantified with Kendall's coefficient of concordance as a measure of local functional homogeneity.

  • Intrinsic Functional Connectivity (iFC) of the Posterior Cingulate Cortex (PCC)41. Using the mean time series from a spherical region of interest (diameter=8 mm) centered in the PCC (x=−8, y=−56, z=26)42, functional connectivity with the PCC is calculated for each voxel in the brain using Pearson's correlation (results are Fisher r-to-z transformed).
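As a concrete illustration of two of the derivatives above, the following sketch computes fALFF and seed-based iFC from a voxels-by-time array, following the textual definitions only; the masking, nuisance regression, filtering, and registration steps applied by C-PAC are deliberately omitted.

    # Sketches of fALFF and PCC-style seed iFC from their definitions above.
    import numpy as np

    def falff(ts, tr, band=(0.01, 0.1)):
        """fALFF for a (voxels, T) array: power in the low-frequency band
        divided by the total power across all measured frequencies."""
        freqs = np.fft.rfftfreq(ts.shape[1], d=tr)
        power = np.abs(np.fft.rfft(ts, axis=1)) ** 2
        low = (freqs >= band[0]) & (freqs <= band[1])
        return power[:, low].sum(axis=1) / power.sum(axis=1)

    def seed_ifc(ts, seed_ts):
        """Fisher r-to-z transformed Pearson correlation of each voxel's
        time series in ts (voxels, T) with a seed time series of length T
        (e.g., the mean of an 8 mm PCC sphere)."""
        z = lambda x: (x - x.mean(-1, keepdims=True)) / x.std(-1, keepdims=True)
        r = (z(ts) * z(seed_ts)).mean(axis=-1)
        return np.arctanh(np.clip(r, -0.9999, 0.9999))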

To enable rapid comparison of derivatives, we: (1) calculated the 50th, 75th, and 90th percentile scores for each participant, and then (2) calculated site means and standard deviations for each of these scores (see Table 6 (available online only); a sketch of this summary procedure appears below). We opted not to use increasingly popular standardization approaches (for example, mean regression, mean centering with or without variance normalization) in the calculation of derivative values, as the test-retest framework provides users a unique opportunity to consider the reliability of site-related differences. As can be seen in Supplementary Figure 4, for all of the derivatives, the mean value obtained for a site was highly reliable, as was the coefficient of variation. In the case of fALFF, site-specific differences can be related directly to the temporal sampling rate (that is, the TR; see Figure 3): datasets with lower TRs include a broader range of frequencies in the denominator, thereby reducing the resulting fALFF scores (differences in aliasing are likely to be present as well). This note of caution about fALFF raises the general issue that rfMRI estimates can be highly sensitive to acquisition parameters7,13. Specific factors contributing to differences in the other derivatives are less obvious (note that the correlation-based derivatives have some degree of standardization inherent to them). Interestingly, the coefficient of variation across participants also proved highly reliable for the various derivatives; while this may point to site-related differences in the ability to detect differences among participants, it may also reflect the specific populations recruited at each site (or the sample size). Overall, these site-related differences highlight the potential value of post-hoc statistical standardization approaches, which can also be used to handle unaccounted-for sources of variation within a site43.
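A minimal sketch of this summary procedure, with random placeholder data and hypothetical site labels, is:

    # Sketch: per-participant percentiles, aggregated by site (cf. Table 6).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    records = []
    for site in ["HNU_1", "NYU_1"]:              # hypothetical site labels
        for subject in range(5):
            voxel_values = rng.random(10000)     # placeholder derivative map
            p50, p75, p90 = np.percentile(voxel_values, [50, 75, 90])
            records.append({"site": site, "p50": p50, "p75": p75, "p90": p90})

    table = pd.DataFrame(records)
    print(table.groupby("site").agg(["mean", "std"]).round(3))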

Table 6 Descriptive statistics for common derivatives
Figure 3: Individual differences in fALFF and the temporal sampling rate (TR).

Median whole-brain fALFF values for each individual are plotted against the corresponding TR for each site. Colors indicate the different sites.

Finally, in Figure 4, we demonstrate the ability of the CoRR datasets to: (1) replicate prior work showing regional differences in inter-individual variation for the various derivatives, which occur at 'transition zones' or boundaries between functional areas (even after mean-centering and variance normalization), and (2) show that these patterns are highly reproducible across imaging sessions in the same sample. It is our hope that this demonstration will spark future work examining inter-individual variation in these boundaries and their functional relevance. The surface renderings and visualizations were carried out with the Connectome Computation System (CCS), documented at http://lfcd.psych.ac.cn/ccs.html, which will soon be released to the public via GitHub (https://github.com/zuoxinian/CCS).

Figure 4: Test-retest plots of individual variation-related functional boundaries.

Detection of functional boundaries was achieved by examining voxel-wise coefficients of variation (CV) for the fALFF, PCC-iFC, ReHo, and VMHC maps. For visualization, the coefficients of variation were rank-ordered, such that the relative degree of variation across participants at a given voxel, rather than its actual value, was plotted to better contrast brain regions. Ranked coefficients of variation (R-CV) efficiently identified the regions of greatest inter-individual variability, thus delineating putative functional boundaries.
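A minimal sketch of the R-CV computation described in this caption, applied to a subjects-by-voxels stack of derivative maps (the CCS surface rendering is not reproduced here):

    # Sketch: rank-ordered coefficient of variation (R-CV) across subjects.
    import numpy as np

    def rank_cv(maps):
        """maps: (n_subjects, n_voxels) array of one derivative (e.g., ReHo).
        Returns each voxel's across-subject CV converted to a [0, 1] rank."""
        cv = maps.std(axis=0) / np.abs(maps.mean(axis=0))
        ranks = cv.argsort().argsort()           # 0 = least variable voxel
        return ranks / (ranks.size - 1)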

To facilitate replication of our work, for each of Figures 1, 2, and 3 and Supplementary Figures 1–4, we include a variable in the COINS phenotypic data that indicates whether or not each dataset was included in the depicted analyses. We also included this information in the phenotypic files on NITRC.

Usage Notes

While formal test-retest reliability and reproducibility analyses are beyond the scope of the present data descriptor, we illustrate the broad range of potential questions that can be answered for rfMRI, dMRI, and sMRI using this resource. These include the impact of:

  • Acquisition parameters7,38,44

  • Image quality13

  • Head motion7,30,38,43,45

  • Image processing decisions13,30,38,43,46–48 (for example, nuisance signal regression for rfMRI, spatial normalization algorithms, computational space)

  • Standardization approaches43

  • Post-hoc analytic choices13,49,50

  • Age51–53

Of note, the vast majority of studies do not currently collect physiological data, and this is reflected in the CoRR initiative. That said, recent advances in model-free correction (for example, ICA-FIX54,55, CORSICA56, PESTICA57, PHYCAA58,59) can be of particular value in the absence of physiological recordings.

Additional questions may include:

  • How reliable are image quality metrics?

  • How do reliability and reproducibility impact prediction accuracy?

  • How do imaging modalities (for example, rfMRI, dMRI, sMRI) differ with respect to reproducibility and reliability? And within modality, are some derivatives more reliable than others?

  • Can reliability and reproducibility be used to optimize imaging analyses? How can such optimizations avoid being driven by artifacts such as motion?

  • How much information regarding inter-individual variation is shared and distinct among imaging metrics?

  • Which features best differentiate one individual from another?

One example of an analytic framework that can be used with the CoRR test-retest datasets is Non-Parametric Activation and Influence Reproducibility reSampling (NPAIRS)60. By combining prediction accuracy and reproducibility, this computational framework can be used to assess the relative merits of differing imaging modalities, image metrics, or processing pipelines, as well as the impact of artifacts61–63. A schematic sketch of this split-half logic follows.
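The sketch below illustrates that split-half logic under strong simplifying assumptions: a plain class-mean-difference discriminant stands in for the actual NPAIRS machinery, so this is a conceptual outline rather than the reference implementation.

    # Schematic split-half prediction/reproducibility loop (NPAIRS-like).
    import numpy as np

    def split_half(maps, labels, n_splits=100, seed=0):
        """maps: (n_subjects, n_voxels); labels: (n_subjects,) in {0, 1}.
        Assumes both classes are present in each random half."""
        rng = np.random.default_rng(seed)
        n = len(labels)
        acc, rep = [], []
        for _ in range(n_splits):
            perm = rng.permutation(n)
            a, b = perm[: n // 2], perm[n // 2:]
            # Discriminant map in each half: difference of class means.
            wa = maps[a][labels[a] == 1].mean(0) - maps[a][labels[a] == 0].mean(0)
            wb = maps[b][labels[b] == 1].mean(0) - maps[b][labels[b] == 0].mean(0)
            rep.append(np.corrcoef(wa, wb)[0, 1])      # map reproducibility
            # Nearest-class-mean prediction on the held-out half.
            proj = maps[b] @ wa
            mu1 = (maps[a][labels[a] == 1] @ wa).mean()
            mu0 = (maps[a][labels[a] == 0] @ wa).mean()
            pred = (np.abs(proj - mu1) < np.abs(proj - mu0)).astype(int)
            acc.append((pred == labels[b]).mean())     # prediction accuracy
        return float(np.mean(acc)), float(np.mean(rep))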

Open access connectivity analysis packages that may be useful include the following (list adapted from http://RFMRI.org):

  • Brain Connectivity Toolbox (BCT; MATLAB)64

  • BrainNet Viewer (BNV; MATLAB)65

  • Configurable Pipeline for the Analysis of Connectomes (C-PAC; PYTHON)66

  • CONN: functional connectivity toolbox (CONN; MATLAB)67

  • Connectome Computation System (CCS; SHELL/MATLAB)13,38,39

  • Dynamic Causal Modelling (DCM; MATLAB) as part of Statistical Parametric Mapping (SPM)68,69

  • Data Processing Assistant for Resting-State FMRI (DPARSF; MATLAB)70

  • Functional and Tractographic Connectivity Analysis Toolbox (FATCAT; C) as part of AFNI71,72

  • Seed-based Functional Connectivity (FSFC; SHELL) as part of FreeSurfer73

  • Graph Theory Toolkit for Network Analysis (GRETNA; MATLAB)74

  • Group ICA of FMRI Toolbox (GIFT; MATLAB)75

  • Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC; C) as part of FMRIB Software Library (FSL)76,77

  • Neuroimaging Analysis Kit (NIAK; MATLAB/OCTAVE)78

  • Ranking and averaging independent component analysis by reproducibility (RAICAR; MATLAB)79,80

  • Resting-State fMRI Data Analysis Toolkit (REST; MATLAB)81

Additional information

Tables 2,3,4,5,6 are only available in the online version of this paper.

How to cite this article: Zuo, X.-N. et al. An open science resource for establishing reliability and reproducibility in functional connectomics. Sci. Data 1:140049 doi: 10.1038/sdata.2014.49 (2014).