Introduction

In clinical research, the term ‘biomarker’ or ‘biological marker’ refers to a broad category of medical signs, or objective indications of medical state, that can be measured accurately and reproducibly and may influence and predict the incidence and outcome of disease.1 Increasingly, clinical neuroscience has shifted from identifying neural correlates of psychiatric conditions to using metrics derived from brain imaging to predict diagnostic category, disease progression, or response to intervention. Over the past several years, researchers have begun to integrate sophisticated machine learning approaches into studies of brain structure and function, and as a result several candidate ‘brain-based biomarkers’ associated with specific disorders have been proposed. The purpose of this review is to summarize recent progress in identifying brain-based biomarkers for autism spectrum disorder (ASD) and attention-deficit/hyperactivity disorder (ADHD), and to highlight roadblocks that must be overcome before further progress can be made. Of note, the term ‘biomarker’ has sometimes been applied loosely to studies that do not strictly meet its definition. To survey all potentially relevant contributions to the literature, we review studies that have used supervised learning approaches to classify individuals into clinical categories based on neuroimaging data. As highlighted in other recent reviews of brain-based biomarkers in translational neuroimaging,2, 3 significant challenges lie ahead, including the development of classifiers that generalize across studies, sites and heterogeneous clinical populations.

ASD and ADHD are common disorders affecting youth that frequently co-occur.4 Given their high prevalence (ASD affects an estimated 1.5% of children,5 and ADHD affects an estimated 11% of children and adolescents aged 4–17[ref. 6]) and the cost and consequences of delayed treatment,7 it is imperative to diagnose children and predict their outcomes with a high degree of accuracy as early as is feasible.8 Both ASD and ADHD are currently diagnosed on the basis of parent interview and clinical observation,9, 10, 11 and diagnostic practice can differ considerably, and sometimes troublingly, between sites. In ASD, for example, significant site differences have been reported in best-estimate clinical diagnoses: even across sites with well-documented fidelity in the use of standardized diagnostic instruments, clinical distinctions were not reliable.12 The identification of objective, reliable biomarkers is thus a critical yet elusive goal for neuroimaging researchers investigating these neurodevelopmental disorders.

What should a biomarker predict?

For a measure to qualify as a biomarker, it must be shown to be present before the onset of symptoms and to be specific to the disorder.13 We proceed with the caveat that very few of the neuroimaging studies reviewed here meet this strict definition. Few have used features measured before symptom onset to predict whether a child will later develop ASD or ADHD (but see ref. 14), and most published classification studies distinguished patients only from controls rather than among different patient groups. Further, in all classification studies to date, group labels (for example, patient vs control) were determined by clinical diagnosis and are binary designations that may not capture the heterogeneity and dimensionality inherent to complex neurodevelopmental disorders. Although diagnosis following the Diagnostic and Statistical Manual of Mental Disorders (DSM-5)11 has been the norm in child psychiatry, there has been a recent shift towards considering dimensions of behavior, consistent with the Research Domain Criteria (RDoC) approach.15 Although there has been some pushback from the clinical community regarding the utility of adopting an RDoC framework,16 there is increasing recognition that dimensions of behavior can cut across traditional diagnostic categories. A few existing studies have focused on symptom dimensions, using approaches such as support vector regression (see ‘Overview of classifiers in neuroimaging’) that allow behaviors to be examined in a continuous rather than categorical manner. Here, we review commonly used classifiers in neuroimaging, summarize current findings relevant to the development of brain-based biomarkers for ASD and ADHD, and discuss open challenges.

Overview of classifiers in neuroimaging

Neuroimaging research in clinical populations has recently begun testing whether features derived from neuroimaging data sets can classify participants into a clinical disorder group and a typical control group. Tools that perform this task are referred to as classifiers. Classifiers belong to the broad field of machine learning, in which computers use algorithms to learn which patterns account for the differences between two or more groups. Learning typically involves establishing a rule whereby a new observation can be assigned to one of the existing groups. Such rules are built from features of the data, and by analyzing which features drive the classifier's decisions, one can infer the patterns of difference between groups. Pattern classification analysis of neuroimaging data is therefore important for testing the diagnostic utility of neuroimaging-based markers of psychiatric disorders.

In classifying clinical populations, the classifier searches for differences in neural patterns between a healthy control group and a patient group. These patterns may involve the presence or absence of activation in a specific brain area, connectivity within a specific brain network, volumetric differences across brain areas, or other metrics derived from neuroimaging data. In other words, classifiers use individual features (such as brain activity, brain morphology or white matter orientation) to predict the group membership of a given participant. By examining which features are most important to the classification algorithm, researchers can identify the specific patterns that most strongly distinguish clinical populations from controls. A limitation of this approach, however, is that classifiers rely on multivariate combinations of features, which can make it difficult to interpret the contributions of specific anatomical regions. When carefully constructed, biomarkers have the potential to provide insight into the neurological mechanisms of a disorder and to suggest targets for future treatment.
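As a concrete illustration of examining which features are most important, the sketch below fits a logistic regression (one of the classifiers described in the next paragraph) by maximum likelihood, using plain gradient ascent, to simulated data in which only feature 0 differs between groups, and then ranks the features by the absolute size of their weights. The data, learning rate and iteration count are hypothetical choices for illustration only. Weight magnitudes are comparable as importance scores only when features share a common scale, and with many correlated imaging features they can mislead, which is part of why interpretation is difficult.

```python
import math
import random


def sigmoid(z):
    """Logistic (S-shaped) function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Fit logistic-regression weights and an intercept by batch gradient ascent
    on the mean log-likelihood (i.e., maximum likelihood)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = yi - p  # derivative of the log-likelihood w.r.t. the linear score
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b += lr * grad_b / n
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def simulate(n_per_group=50, seed=0):
    """Hypothetical data: feature 0 is shifted upward in the clinical group (label 1);
    features 1 and 2 are pure noise with the same distribution in both groups."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


X, y = simulate()
w, b = fit_logistic(X, y)
ranking = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
print("weights:", [round(v, 2) for v in w])
print("features ranked by |weight|:", ranking)
```

Here the informative feature should come first in the ranking; the two noise features receive small weights that reflect sampling noise rather than any group difference.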

The most commonly used classifiers in neuroimaging include support vector machine (SVM), logistic regression, decision tree and random forest (RF). SVM is a supervised classification method: a model is trained on a subset of the data and can then be applied to new data. SVM locates the hyperplane (the generalization of a plane to a high-dimensional space) that separates the data into two groups (for example, patients versus controls) with the largest possible margin, based on features of the data (Figure 1). In the case of neuroimaging data, each point in feature space corresponds to an individual subject. Individuals thus become points in a high-dimensional space, and a hyperplane can be used to discriminate between groups.17 SVMs have been applied successfully in many brain state and disease state classification problems using functional magnetic resonance imaging (fMRI).18, 19 Logistic regression uses maximum likelihood to fit a logistic function (an S-shaped curve) that models the probability that an observation belongs to a particular category.20 The decision tree method21 is a supervised learning strategy that builds a classifier from a set of training samples, each described by a list of features (or attributes) and a class label, by recursively splitting the samples on feature values. RF22 is a popular ensemble learning method that draws many bootstrap samples from the data (random sampling with replacement) and grows a decision tree on each. When each tree is grown, only a random subset of the input features is considered at each split. The trees together make up the ‘forest’, and the class that receives the majority of the trees’ votes is the prediction. Although other types of classifiers can in principle be used in neuroimaging research, those reviewed here have been the most widely used to date.
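The random forest procedure can be sketched in a few dozen lines. This is a deliberately simplified version, not a production implementation: each ‘tree’ is a decision stump (a single split chosen by Gini impurity), grown on a bootstrap sample using a random subset of the features, and the forest predicts by majority vote. Because a stump has only one split, sampling features per tree and per split coincide here; real random forests grow much deeper trees and resample features at every split. The simulated data, the number of trees and the feature-subset size are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Depth-1 decision tree: the threshold split, among the candidate features, with the
    lowest weighted Gini impurity. Returns (feature, threshold, label_if_<=, label_if_>)."""
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return feature_ids[0], float("inf"), majority, majority
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Bagged stumps: each tree sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = [lo if row[j] <= t else hi for j, t, lo, hi in forest]
    return Counter(votes).most_common(1)[0][0]


def simulate(n_per_group=40, seed=0):
    """Hypothetical data: features 0 and 1 differ between groups; 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(4.0 * label, 1.0),
                      rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


X_train, y_train = simulate(seed=0)
X_test, y_test = simulate(seed=1)
forest = fit_forest(X_train, y_train)
acc = sum(predict(forest, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {acc:.2f}")
```

Trees that happen to draw only noise features cast essentially arbitrary votes, but they are outvoted by the trees that see an informative feature. This is the intuition behind combining many randomized trees.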

Figure 1

Illustration of support vector machine (SVM). SVM is one machine learning approach that is often used in classification studies. Consider a population of subjects (x = autism spectrum disorder, o = typically developing) described by two voxel values (v1 and v2, for example). Evaluating one voxel at a time would not differentiate the two groups, because the groups overlap substantially on each dimension (as shown by the dashed red and blue lines). A univariate analysis of either v1 alone or v2 alone would therefore fail to detect group differences in such a scenario. However, if v1 and v2 are considered together, a line (a hyperplane in higher dimensions) separating the two groups can be constructed, thereby identifying a neighborhood where the two groups differ in spatial patterns of the anatomical or functional measures of interest.
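The scenario in Figure 1 can be reproduced with simulated data. In the hypothetical sketch below, v1 is spread widely in both groups and v2 tracks v1 with a small group-dependent offset, so neither voxel separates the groups on its own, while a line in the (v1, v2) plane does. The linear SVM is trained with a Pegasos-style stochastic subgradient method, which is only one of several ways to fit an SVM and is assumed here for compactness; the data and hyperparameters are illustrative.

```python
import random


def simulate(n_per_group=50, seed=0):
    """Hypothetical two-voxel data in the spirit of Figure 1: v1 is spread widely in
    both groups, and v2 tracks v1 with a small group-dependent offset."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (+1, -1):  # +1 = 'x' (ASD), -1 = 'o' (typically developing)
        for _ in range(n_per_group):
            v1 = rng.uniform(-5.0, 5.0)
            v2 = v1 + label * 1.0 + rng.uniform(-0.3, 0.3)
            X.append((v1, v2))
            y.append(label)
    return X, y


def best_threshold_accuracy(values, labels):
    """Best training accuracy of a univariate rule 'predict +1 if value >= t'
    (or its mirror image), searched over all observed thresholds t."""
    best = 0.0
    for t in values:
        acc = sum((1 if v >= t else -1) == yi for v, yi in zip(values, labels)) / len(labels)
        best = max(best, acc, 1.0 - acc)  # 1 - acc is the accuracy of the mirrored rule
    return best


def train_linear_svm(X, y, lam=0.01, n_epochs=200, seed=0):
    """Linear SVM trained by Pegasos-style stochastic subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss. A constant 1 is appended to each input so the
    bias is learned (and, as a simplification, regularized too)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(n_epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss is active: step toward the violating point
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w, x):
    """Side of the learned hyperplane on which x falls (+1 or -1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1


X, y = simulate()
v1_acc = best_threshold_accuracy([x[0] for x in X], y)
v2_acc = best_threshold_accuracy([x[1] for x in X], y)
w = train_linear_svm(X, y)
svm_acc = sum(svm_predict(w, x) == yi for x, yi in zip(X, y)) / len(y)
print(f"v1 alone: {v1_acc:.2f}, v2 alone: {v2_acc:.2f}, SVM on (v1, v2): {svm_acc:.2f}")
```

The best single-voxel thresholds stay close to chance, whereas the SVM recovers a boundary close to the line v2 = v1, which separates the groups almost perfectly.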

Many metrics can be used to measure classifier performance, but the most commonly used is accuracy: the percentage of participants correctly classified as belonging to the clinical or the control group. In addition to accuracy, sensitivity and specificity are commonly reported. Sensitivity is the proportion of clinical cases that are correctly identified; in a binary clinical-versus-control classifier, it is the accuracy within the clinical group. Specificity is the proportion of controls that are correctly identified; for binary classifiers, it is the accuracy within the control group.
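These three measures follow directly from the counts of correct and incorrect decisions in each group. A minimal sketch, using hypothetical labels (1 = clinical, 0 = control):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary clinical-vs-control classifier.
    `positive` is the label of the clinical group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # patients called patients
    fn = sum(t == positive and p != positive for t, p in pairs)  # patients called controls
    tn = sum(t != positive and p != positive for t, p in pairs)  # controls called controls
    fp = sum(t != positive and p == positive for t, p in pairs)  # controls called patients
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # accuracy within the clinical group
        "specificity": tn / (tn + fp),  # accuracy within the control group
    }


# Hypothetical example: 10 patients (label 1) and 10 controls (label 0).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] * 1 + [0] * 7 + [1] * 3
print(binary_metrics(y_true, y_pred))
# → {'accuracy': 0.8, 'sensitivity': 0.9, 'specificity': 0.7}
```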

Although machine learning algorithms have opened a new avenue for neuroimaging research by enabling classification of disease states, methodological rigor is needed to ensure that findings are interpreted and applied accurately. The goal of machine learning methods is to fit an algorithm to a set of training data so that the algorithm can subsequently be applied to new data. Fitting an algorithm to training data can be relatively simple, whereas creating a classifier that generalizes beyond the training data set is quite challenging. A commonly seen problem is a classifier that shows high accuracy on the training data set but drops to chance-level accuracy when presented with new data. This problem, referred to as overfitting, often indicates that the algorithm has been fit to random noise in the data rather than to features that truly distinguish the groups.23 Under-fitting, on the other hand, occurs when the classifier performs poorly on both the training data and the testing data, indicating a poor overall fit.
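Overfitting can be demonstrated with data that contain no signal at all. In the hypothetical sketch below, features and labels are drawn independently, and a 1-nearest-neighbor classifier (chosen here only because it memorizes its training set; it is not one of the classifiers reviewed) reaches perfect training accuracy while performing near chance (about 50%) on new data.

```python
import random


def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbor rule: copy the label of the closest training example."""
    i_best = min(range(len(train_X)),
                 key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[i_best]


def accuracy(train_X, train_y, X, y):
    """Proportion of (X, y) classified correctly by the 1-NN rule built on the training set."""
    preds = [nearest_neighbor_predict(train_X, train_y, x) for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def random_data(n, d, rng):
    """Features and labels drawn independently: there is no real signal to learn."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.randrange(2) for _ in range(n)]
    return X, y


rng = random.Random(0)
train_X, train_y = random_data(100, 20, rng)  # 100 hypothetical subjects, 20 noise features
test_X, test_y = random_data(200, 20, rng)
train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is 1.0 because every training point is its own nearest neighbor; only the held-out data reveal that nothing generalizable was learned.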

To ensure that a classifier can perform accurately on new data, it is important to reserve data specifically for testing. For example, a classifier could be trained on half of the data and then tested on the other half. However, it is also important to maximize the information available for building the classifier, and reserving too much data for testing can be detrimental. Cross-validation, the standard approach to estimating the predictive power of a model, addresses this dilemma. In cross-validation, the available data are split into a training set, used to train the model, and a testing set, unseen by the model during training and used to compute a prediction error.24 In one form of cross-validation, known as the ‘leave-one-out’ method, one example is left out of the data set, the classifier is trained on the remaining examples, and it is then tested on the held-out example. This is repeated for each example in the set, and accuracy is computed from the performance across all held-out examples. As this procedure can be computationally expensive with large data sets, ‘k-fold cross-validation’ can be used instead: the data are divided into k parts (for example, k = 5 or 10), each of which is held out once for testing while the classifier is trained on the remaining k − 1 parts.20
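k-fold and leave-one-out cross-validation can be sketched as follows. The nearest-centroid classifier and the simulated data are placeholders chosen for brevity, not methods from the studies reviewed; any classifier could be substituted.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of (nearly) equal size.
    Setting k = n gives leave-one-out cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """A deliberately simple classifier: the mean feature vector of each class."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))


def cross_validated_accuracy(X, y, k, seed=0):
    """Each fold is held out once; the model is refit on the other k - 1 folds only."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical data: two groups of 30 whose two features differ by 3 SD.
rng = random.Random(1)
X = [[rng.gauss(3.0 * g, 1.0), rng.gauss(3.0 * g, 1.0)] for g in (0, 1) for _ in range(30)]
y = [g for g in (0, 1) for _ in range(30)]
print("5-fold CV accuracy:", cross_validated_accuracy(X, y, k=5))
print("leave-one-out accuracy:", cross_validated_accuracy(X, y, k=len(X)))
```

Everything learned from data, including any feature selection or scaling, would need to happen inside the loop on the training folds only; otherwise information from the held-out fold leaks into training and inflates the estimated accuracy.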

In addition, a desirable quality of a machine learning algorithm is high stability. A classifier is considered more stable if modifying the training data does not substantially change the resulting classifier. For example, if removing a single example from the training set causes large changes in the resulting classifier, the classifier is relatively unstable. Stability therefore needs to be assessed in classification studies, as unstable predictions may lead to poor reproducibility of findings.
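One way to quantify stability, assumed here for illustration, is to retrain a classifier on many bootstrap resamples of the training data and measure how often its predictions on a fixed set of points agree; this is a resampling variant of the remove-one-example perturbation described above, not a standard the reviewed studies necessarily used. In the hypothetical sketch below, a nearest-centroid classifier, which averages over many examples, is expected to be more stable on overlapping groups than a 1-nearest-neighbor classifier, whose decisions hinge on individual training points.

```python
import itertools
import random


def simulate(n_per_group, shift, rng):
    """Hypothetical two-feature data; group 1 is shifted by `shift` SD on both features."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            X.append([rng.gauss(shift * label, 1.0), rng.gauss(shift * label, 1.0)])
            y.append(label)
    return X, y


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_centroid(X, y):
    """Nearest-centroid classifier; returns a predict(x) function."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(cents, key=lambda c: sqdist(cents[c], x))


def fit_1nn(X, y):
    """1-nearest-neighbor classifier; returns a predict(x) function."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: sqdist(d[0], x))[1]


def bootstrap_agreement(fit, X, y, X_eval, n_boot=20, seed=0):
    """Mean pairwise agreement of predictions on X_eval between classifiers trained on
    different bootstrap resamples of (X, y). 1.0 means the predictions never change."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        predict = fit([X[i] for i in idx], [y[i] for i in idx])
        runs.append([predict(x) for x in X_eval])
    pairs = list(itertools.combinations(runs, 2))
    return sum(sum(a == b for a, b in zip(r1, r2)) / len(X_eval) for r1, r2 in pairs) / len(pairs)


rng = random.Random(1)
X, y = simulate(50, 1.0, rng)       # overlapping groups: a noisy problem
X_eval, _ = simulate(50, 1.0, rng)  # fixed points on which predictions are compared
print("nearest centroid:", round(bootstrap_agreement(fit_centroid, X, y, X_eval), 3))
print("1-nearest neighbor:", round(bootstrap_agreement(fit_1nn, X, y, X_eval), 3))
```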

Brain-based biomarkers of ASD

ASD is now thought to affect 1 in 68 children,25 making early diagnosis an urgent public health concern. The gold standard for clinical assessment includes administration of the Autism Diagnostic Observation Schedule 2 (ADOS-2)26 and the Autism Diagnostic Interview-Revised (ADI-R),10 which assess behavior through a semi-structured, play-based observation and a parent interview, respectively. No objective biological markers currently exist for diagnosing ASD. Although biomarkers can in principle also be derived from task-based fMRI features, these are less likely to be applied in clinical settings, where task difficulty may preclude some children from participating. We therefore limit the current discussion to progress in the development of structural and intrinsic functional connectivity MRI-based biomarkers (Table 1). Although structural MRI and resting-state fMRI data are easier to collect than task fMRI data, one major limitation of the existing literature is that nearly all studies focus on individuals with ASD without intellectual disability (ID), because participants must stay still for extended periods and comply with verbal instructions in the scanner environment. Most studies consequently recruit only high-functioning individuals with ASD.

Table 1 Neuroimaging-based classification studies of ASD

Structural MRI

In one of the first detailed analyses of potential brain biomarkers in adults with ASD, Ecker et al.36 examined five morphological parameters, comprising volumetric (cortical thickness and surface area) and geometric (average convexity or concavity, mean curvature and metric distortion of the cortex) features derived from structural MRI data. Using SVM and a multiparameter classification approach, they found that combining all five parameters yielded up to 85% accuracy, 90% sensitivity and 80% specificity in discriminating individuals with ASD from neurotypical (NT) controls. The authors further showed that the classifier built to discriminate ASD from NT individuals had some clinical specificity, in that it categorized individuals with ADHD as non-autistic with moderate success. These results represent an important first step towards identifying structural biomarkers in ASD but are of limited clinical utility because only adults (aged 20–68) were examined. As ASD is a disorder with early onset and a variable developmental trajectory, studies of younger individuals are critical for disentangling disorder-specific neural signatures.

In a study of children and adolescents with autism, gray matter volume was used as a structural feature for classification. Using SVM searchlight classification, this study found that gray matter within default mode network regions could discriminate between clinical and control groups with high accuracy (posterior cingulate cortex and medial temporal lobes: 92%; medial prefrontal cortex: 88%).28 The earlier study of adults with ASD also reported morphometric abnormalities in the posterior cingulate that contributed to classification, suggesting that this cortical midline region may be a locus of dysfunction across the lifespan.

A more recent study of children used a distinctive approach, multi-kernel SVM, combining regional (cortical thickness, gray matter volume) and interregional (morphological change patterns between pairs of regions of interest) features to achieve 96% classification accuracy.31 In this study, unlike the studies in older individuals, the neuroanatomical features contributing most to classification were subcortical structures, including the putamen and accumbens.

Although the studies by Uddin et al.28 and Wee et al.31 converge in finding that white matter volume was a poor feature for classifying ASD, others have had considerably greater success using features derived from white matter. Lange et al.37 reported high sensitivity (94%), specificity (90%) and accuracy (92%) using diffusion tensor imaging (DTI)-derived metrics in a discovery cohort and a small replication sample. This region-of-interest-based study specifically examined white matter microstructure in the superior temporal gyrus and temporal stem.

A recent study of 590 participants aged 6–35 years from the Autism Brain Imaging Data Exchange (ABIDE38) data set, however, was more pessimistic. Using linear and non-linear discriminant analysis to perform multivariate classification based on anatomical measures (gray matter volume, cortical thickness and cortical surface area), the authors achieved only 56% and 60% accuracy based on subcortical volumes and cortical thickness measures, respectively. The authors interpret these low decoding accuracies as indicating that anatomical differences offer very limited diagnostic value in ASD.34

Another study, also using the ABIDE data set, examined morphometric features from structural MRI scans of 361 individuals with ASD and 373 controls. Using an RF classifier, the authors showed that only modest classification accuracy could be achieved using brain structural properties alone, but that subgrouping individuals by verbal IQ, autism severity and age significantly improved classification accuracy.35 This suggests that to achieve the highest classification accuracies in ASD, structural features may need to be combined with behavioral indices.

Resting-state functional MRI

Resting-state fMRI (rs-fMRI) has been increasingly used over the past decade to study the development of functional brain circuits and to better understand the large-scale organization of the typically and atypically developing brain.39 Resting-state fMRI entails collecting functional imaging data while participants lie in the MRI scanner, typically fixating on a cross-hair or with their eyes closed, and refrain from engaging in any specific cognitive task.40 Advantages of rs-fMRI in pediatric and clinical populations include that functional brain organization can be examined independently of task performance, that imaging data can be acquired from otherwise difficult-to-scan populations,41 and that a full data set can be collected in as little as 5 min.42

In one of the first studies to use rs-fMRI to discriminate autism from typical development, Anderson and colleagues used pairwise functional connectivity measures between regions of interest (ROIs) across the entire brain and obtained a classification accuracy of 79% in participants aged 8–42. For individuals under the age of 20, the classifier performed at 89% accuracy. The most informative connections for classification involved the default mode network, anterior insula, fusiform gyrus and superior parietal lobule.27 This work suggests that classifiers may be more accurate and more informative when applied to subsamples with more restricted age ranges.
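Pairwise functional connectivity features of this kind are often derived by correlating the time series of each pair of ROIs and keeping the unique (upper-triangle) entries of the resulting matrix. The sketch below assumes Pearson correlation with no further preprocessing; the cited studies may have used different estimators, and a Fisher z-transform is often applied before classification. It also shows how quickly the feature count grows: a 220-ROI atlas, as in one of the ABIDE studies discussed in this section, yields 24,090 features, far more than the number of participants in any data set reviewed here.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long time series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def connectivity_features(roi_timeseries):
    """Upper triangle of the ROI-by-ROI correlation matrix, flattened into one
    feature vector: one entry per unique pair of regions (i < j)."""
    r = len(roi_timeseries)
    return [pearson(roi_timeseries[i], roi_timeseries[j])
            for i in range(r) for j in range(i + 1, r)]


def n_pairwise_features(n_rois):
    """Number of unique region pairs, n * (n - 1) / 2."""
    return n_rois * (n_rois - 1) // 2


# Toy example with three hypothetical ROI time series of five volumes each.
rois = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]
print(connectivity_features(rois))  # pairs (0,1), (0,2), (1,2) → [1.0, -1.0, -1.0]
print(n_pairwise_features(220))     # → 24090
```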

Another study examining a younger cohort (aged 7–12) with a logistic regression classifier also found that the salience network, including the anterior insular cortices, contained information that could discriminate children with ASD from typically developing (TD) children with 78% accuracy.30 This study used independent component analysis (ICA) maps as features for classification, and the classifier built by the authors classified data collected at another institution with 80% accuracy. This type of cross-site validation is essential for clinical utility but has proven difficult. Nielsen et al.29 used pairwise connectivity data from 964 subjects collected at 16 different sites and obtained 60% classification accuracy. As in the earlier study by Anderson et al., connections involving the default mode network (DMN), parahippocampal and fusiform gyri, and insula contained the most information for accurate classification. This study demonstrates the challenges of building classifiers that perform accurately across multi-site data sets. Another recent study using data from ABIDE found that the RF approach produced a classification accuracy of 91% based on a functional connectivity matrix of 220 ROIs across the brain, with informative features located in somatosensory, default mode network, visual and subcortical regions.33 The most recent study using resting-state fMRI data from the ABIDE database computed whole-brain connectomes (functional connectivity matrices between brain regions of interest) from several atlases and achieved a classification accuracy of 67% with support vector classification approaches.43 Connections within the default mode network and parieto-insular connections contributed most to prediction of diagnostic category.
Finally, a recent study of a large Japanese cohort achieved 85% accuracy and found that discriminating features included functional connectivity between regions of the cingulo-opercular network.44 This rs-fMRI classification study used a novel combination of two machine learning algorithms, L1-regularized sparse canonical correlation analysis (L1-SCCA) and sparse logistic regression, and generalized reasonably well across samples, achieving 75% accuracy in an independent validation cohort.45 Taken together, the emerging picture from rs-fMRI studies is that discriminating patterns of connectivity in ASD may reside in DMN and salience/cingulo-opercular network regions.
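Sparse methods are attractive for connectivity data because they set the weights of most features to exactly zero, leaving a small, more interpretable set of connections. The sketch below shows this with a generic L1-penalized logistic regression fit by proximal gradient descent (ISTA). It is not the L1-SCCA plus sparse logistic regression pipeline of the cited study, whose sparse logistic regression may use a different formulation; the data and the penalty strength `lam` are hypothetical.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward 0, and set it to exactly 0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.5, n_iter=300):
    """L1-penalized logistic regression by proximal gradient descent (ISTA), minimizing
    mean negative log-likelihood + lam * sum_j |w_j|. The intercept is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, xi)))))
            grad_b += p - yi
            for j in range(d):
                grad_w[j] += (p - yi) * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the smooth loss, then soft-thresholding for the L1 penalty.
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical data: 10 connectivity-like features; only the first two carry group signal.
rng = random.Random(0)
X, y = [], []
for label in (0, 1):
    for _ in range(50):
        X.append([rng.gauss(1.5 * label, 1.0) if j < 2 else rng.gauss(0.0, 1.0)
                  for j in range(10)])
        y.append(label)
w, b = fit_l1_logistic(X, y)
print("weights:", [round(v, 2) for v in w])
print("selected features:", [j for j, v in enumerate(w) if v != 0.0])
```

Raising `lam` removes more features; in practice its value would itself be chosen by cross-validation.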

Multimodal MRI

Because the ASD biomarker literature is sparse, there is very little information on the use of multimodal MRI to classify ASD. However, combining structural and resting-state MRI features to discriminate between ASD and TD groups appears promising. A study investigating a wide range of classifiers reported that a random tree classifier using combined cortical thickness and functional connectivity measures improved classification and prediction accuracy compared with classification using either imaging feature alone.32 This study was comprehensive in its inclusion of classifiers and features, and the findings suggest that an integrative model could be fruitful. However, a more systematic approach to evaluating classification algorithms is necessary for further progress.
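The way the cited study combined modalities is not described here, but one common, generic approach is early (feature-level) fusion: standardize each modality's features so that differences in units (millimeters of cortical thickness versus correlation coefficients) do not dominate, then concatenate them into one vector per participant. A minimal sketch with hypothetical values:

```python
import statistics


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and SD 1 across subjects;
    constant columns are set to 0."""
    stats = [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def fuse_modalities(*modalities):
    """Early (feature-level) fusion: standardize each modality separately, then
    concatenate each subject's feature vectors across modalities."""
    standardized = [zscore_columns(m) for m in modalities]
    return [[v for part in parts for v in part] for parts in zip(*standardized)]


# Hypothetical data for three subjects: two cortical-thickness features (mm)
# and one functional-connectivity feature (correlation).
thickness = [[2.0, 3.0], [3.0, 5.0], [4.0, 7.0]]
connectivity = [[0.1], [0.2], [0.3]]
for row in fuse_modalities(thickness, connectivity):
    print([round(v, 3) for v in row])
```

In a cross-validated analysis, the means and standard deviations would be computed on the training folds only and then applied to the held-out data.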

Brain-based biomarkers of ADHD

Compared with the ASD biomarker literature, the ADHD biomarker literature contains a greater number of studies (Table 2). This is in part due to the aggregation of a large (N=973), multi-site, publicly available data set called ADHD-200 (http://fcon_1000.projects.nitrc.org/indi/adhd200/),46 which was used in a coordinated global machine learning competition. The ADHD-200 Global Competition provided a platform for developing diagnostic classification tools for ADHD using structural and functional MRI data. Of the 21 competitors, the best classifier in terms of specificity (the ability to accurately classify TD individuals) was that of Eloyan et al.,47 who reported 61% accuracy, 94% specificity and 21% sensitivity in predicting diagnosis as TD, ADHD-Inattentive type (ADHD-I) or ADHD-Combined type (ADHD-C). Surprisingly, another study48 that used only phenotypic measures, such as site of data collection, sex, age, handedness and IQ, achieved higher classification accuracy (62.52%) than any imaging-based classifier, whereas a more recent study showed that combining phenotypic and functional imaging data achieved better accuracy (65%) than phenotypic data alone (59.6%).49 Since this competition, researchers have continued to test novel classifiers to improve upon these initial results. We highlight the most successful classifiers below, together with the features that led to successful prediction of ADHD diagnosis, focusing on binary (TD vs ADHD) classification.

Table 2 Neuroimaging-based classification studies of ADHD

Structural MRI

Structural MRI shows promise as a source of biomarkers for ADHD. Several research groups have achieved impressive classification accuracies using predictors such as gray matter volume and surface area. One of the best-performing classifiers for ADHD used white matter features alone to achieve 93% accuracy, with a sensitivity of 100% and specificity of 85%.56 The researchers reported reduced white matter in the central pons, which was predictive of ADHD diagnosis. Similarly, Peng et al.55 reported high classification accuracy (90.18%) using an extreme learning machine (ELM) algorithm with multiple cortical features; discriminative brain regions included inferior frontal, temporal, occipital and insular cortices. Another study that used whole-brain gray matter volume to classify boys with ADHD and TD boys reported 79.3% accuracy, with 75.9% sensitivity and 82.8% specificity.54 As in the study by Peng et al.,55 Lim and colleagues found that the ventrolateral frontal cortex and insula were discriminative. This group also reported that limbic regions such as the hippocampus, amygdala, hypothalamus and ventral striatum were predictive of ADHD status.54

The structural studies mentioned above were single-site studies, which may have contributed to their high performance. Of note, one multi-site study using a variety of structural measures obtained high accuracies for binary classification (TD vs ADHD-I: 85.29%; TD vs ADHD-C: 79.40%), showing that good performance with structural measures is achievable even in multi-site data.58

Resting-state functional MRI

Using one of the largest data sets to date in ADHD classification (N=1177), one study57 achieved 90% accuracy with measures of whole-brain functional connectivity and an artificial neural network algorithm. Of note, these researchers assessed the accuracy of binary classification for TD vs ADHD-I and TD vs ADHD-C separately, which may have contributed to their success. Interestingly, the researchers regressed out the effects of age, IQ, handedness, sex and site from each feature set before classification, suggesting that imaging features predicted diagnostic status independently of these phenotypic characteristics. They found that connectivity between the orbitofrontal cortex (OFC) and other cortical regions, as well as cortico-cerebellar functional connectivity, was most discriminative. Similarly, using local and long-distance measures of functional connectivity, Cheng et al.60 found that frontal and cerebellar regions were most discriminative in classifying ADHD and TD children. Zhu et al.61 used regional homogeneity, a measure of local connectivity, to discriminate between ADHD and controls, and found that the most discriminative brain regions included the prefrontal cortex (PFC), anterior cingulate cortex (ACC) and cerebellum. Overall, functional connectivity of frontal and cerebellar brain regions appears to be a good candidate feature for discriminating individuals with ADHD from controls in future work.
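Regressing covariates out of imaging features typically means fitting each feature on the covariates by ordinary least squares and keeping the residuals. The sketch below assumes exactly that, with an intercept and with site represented by a 0/1 dummy variable (more than two sites would need one dummy per site beyond the first); the cited study's procedure may differ, and all values are hypothetical.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def regress_out(feature, covariates):
    """Residuals of one feature (one value per subject) after an ordinary least-squares
    fit on the covariates (one list of covariate values per subject) plus an intercept."""
    Z = [[1.0] + list(c) for c in covariates]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * f for z, f in zip(Z, feature)) for i in range(p)]
    beta = solve(ZtZ, Zty)  # normal equations: (Z'Z) beta = Z'y
    return [f - sum(bj * zj for bj, zj in zip(beta, z)) for f, z in zip(feature, Z)]


# Hypothetical example: 40 subjects with age, sex (0/1) and site (two sites, 0/1).
rng = random.Random(0)
covs = [[rng.uniform(7, 21), rng.randrange(2), rng.randrange(2)] for _ in range(40)]
feature = [0.05 * a + 0.3 * s - 0.2 * site + rng.gauss(0, 0.1) for a, s, site in covs]
residual = regress_out(feature, covs)
print("sum of residuals:", round(sum(residual), 10))
print("residual-age cross-product:", round(sum(r * c[0] for r, c in zip(residual, covs)), 10))
```

Because the residuals are orthogonal to every covariate, a classifier trained on them cannot exploit any linear relationship with those covariates. In a cross-validated pipeline, the coefficients would usually be estimated on the training folds only and then applied to the test data.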

Multimodal MRI

Only a few studies have combined structural and rs-fMRI data to distinguish individuals with ADHD from TD individuals. In those that have, classifier performance has tended to be poorer overall than in studies using structural or functional data alone.50, 51, 52, 53, 59 One exception to this pattern was a study by Qureshi et al.,59 which used structural and functional data as predictors to achieve an impressively high multi-class classification accuracy (TD, ADHD-I, ADHD-C; one vs all: 76.19%). Their success may have been due to rigorous feature selection, as well as to testing more than one classifier, which showed that the ELM algorithm outperformed the more traditional SVM.

Comorbidity of ASD and ADHD and cross-diagnostic classification

One particularly problematic issue in developing specific biomarkers is comorbidity across disorders. Rates of comorbid ADHD symptoms in children with ASD range from 37% to 85%,62 and ADHD is the second most common comorbidity in ASD.63 Conversely, rates of comorbid ASD in children with ADHD are lower, at about 22%.64, 65 It will therefore be necessary to parse heterogeneity within these disorders before attempting to identify biomarkers.

Importantly, the study by Lim et al.54 is the only study to date to test an ADHD-specific classifier by discriminating adolescents with ADHD from those with ASD. When classifying these two disorders, they reported even higher accuracy than when discriminating ADHD from TD (85.2% vs 79.3%). To our knowledge, this study is also the first to attempt to discriminate among adolescents with ADHD, adolescents with ASD and healthy controls, achieving a balanced accuracy of 68.2%. One other study also considered both ASD and ADHD, but tested each disorder against healthy controls separately, missing an opportunity to employ cross-diagnostic classification.49 A recent study applied machine learning to scores derived from the Social Responsiveness Scale to differentiate between ASD and ADHD,66 but brain-based features were not evaluated.
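Balanced accuracy is commonly defined as the mean of per-class recall, so that each class counts equally regardless of its size; for three classes, chance level is therefore about 33%. We assume, but cannot confirm from the text, that the 68.2% figure follows this definition. The hypothetical example below shows how plain accuracy can look high when a classifier simply favors the largest group:

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: the proportion of each true class that is predicted
    correctly, averaged over classes so that each class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced three-class example: 50 TD, 10 ADHD, 10 ASD.
y_true = ["TD"] * 50 + ["ADHD"] * 10 + ["ASD"] * 10
y_pred = (["TD"] * 50                    # every TD participant classified correctly
          + ["ADHD"] * 5 + ["TD"] * 5    # half of the ADHD group correct
          + ["ASD"] * 2 + ["TD"] * 8)    # 2 of 10 ASD correct
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy: {plain:.3f}")                                 # → 0.814
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")  # → 0.567
```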

Limitations of current approaches

Machine-assisted classification of neuroimaging data has opened a new direction in ASD and ADHD research with important implications for diagnosis and treatment. First, the identification of reliable biomarkers can help provide mechanistic explanations of etiology and behavioral symptomatology. Second, in the long run, such markers could support behavior-based diagnosis. This is especially important for complex or borderline cases, where misdiagnosis is not rare. Third, biomarker screening could be used in infants and young children to assess their risk of developing a disorder, allowing children at high risk to be identified before they show symptoms.67 These children could then receive early, targeted treatment and intervention that could improve outcomes. Infant sibling designs aimed at studying high-risk children are among the most promising avenues for future research. For example, a recent prospective neuroimaging study found that, in high-risk infants diagnosed with ASD at 24 months, hyperexpansion of the cortical surface between 6 and 12 months of age preceded the brain volume overgrowth observed between 12 and 24 months.14 This type of longitudinal work will be necessary to develop true biomarkers for any neurodevelopmental disorder.

Despite the promise of neuroimaging-based markers, inconsistencies in the current classification literature suggest that more empirical work is needed. Several factors, including participant age, type of classifier and sample size, contribute to these inconsistencies. Regarding sample size, for example, it is difficult to compare a study that tested 1000 participants and achieved a classification accuracy of 60% with one that tested 40 participants and achieved a classification accuracy of 80%. Furthermore, because most studies used different classification algorithms and different neuroimaging features, it is not currently possible to compare their results directly. A recent review highlights these issues, pointing out that almost all studies reporting high classification accuracies had sample sizes smaller than 100.3
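One reason such comparisons are difficult is that an accuracy estimated on few participants is much less precise. Treating each test participant as an independent trial, a simple normal (Wald) approximation to the binomial gives the 95% intervals below for the two hypothetical studies in the text. This is a rough sketch: cross-validated predictions are not fully independent, and for small samples a Wilson interval or a permutation test would be more appropriate.

```python
import math


def accuracy_ci(acc, n, z=1.96):
    """Approximate 95% confidence interval for an accuracy measured on n independent
    test cases, using the normal (Wald) approximation to the binomial distribution."""
    se = math.sqrt(acc * (1.0 - acc) / n)
    return max(0.0, acc - z * se), min(1.0, acc + z * se)


for acc, n in [(0.60, 1000), (0.80, 40)]:
    lo, hi = accuracy_ci(acc, n)
    print(f"accuracy {acc:.0%} with n = {n}: 95% CI {lo:.3f} to {hi:.3f}")
# → accuracy 60% with n = 1000: 95% CI 0.570 to 0.630
# → accuracy 80% with n = 40: 95% CI 0.676 to 0.924
```

The interval for the 40-participant study is roughly four times wider than that for the 1000-participant study, so its 80% estimate is compatible with accuracies from about 68% to 92%.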

It is important to note that biomarkers must meet several criteria before being used clinically. First, a biomarker should be present before symptoms begin in order to have predictive value. All studies reviewed here were conducted in school-age children and older individuals. Sampling younger children is necessary to examine the predictive power of early-identified brain features. Furthermore, it is quite possible that features identified in the existing literature reflect either causal or compensatory differences in brain function and structure, and that these features are not the same as those that are most predictive at early ages. Studies should ideally screen young children prospectively and track the emergence of symptoms longitudinally. It has already been shown in studies of high-risk infants that developmental trajectories of white matter are most predictive of later ASD diagnosis.68 This suggests that critical information necessary for accurate classification may be overlooked in cross-sectional investigations.

Second, a biomarker must be defined independently of diagnostic symptoms; otherwise, the classifier is validated using the same features that created it. In other words, the relationship between the biomarker and neuropathology must be clear.69 So far, neuroimaging studies have found an array of potential markers, and there is not yet convergence on proposed mechanisms. Third, the biomarker must be specific to the disorder rather than a hallmark of general pathology. For instance, abnormal DMN functional connectivity has been associated with both ASD and schizophrenia.70, 71 There are also overlapping biomarkers in children with autism and their unaffected siblings.72, 73 To achieve specificity, the classifier must be tested on large samples that include a variety of disorders. Neuroimaging databases, such as ABIDE and the UCLA Multimodal Connectivity Database, can help provide the large samples on which to test classifiers. A classification model should be precisely defined and validated across research sites and populations. Further, a biomarker should have high diagnostic performance in classification, as measured by sensitivity and specificity.69 One recent study in ASD has demonstrated that classification via behavioral measures (in this case, scores on the Social Responsiveness Scale) outperformed classification based on analysis of rs-fMRI data.74 However, brain-based biomarkers are at a clear disadvantage compared with behavioral measures, as behavioral measures were designed based on diagnostic criteria. Once diagnoses are more biologically grounded rather than relying solely on observation and interview, the hope is that earlier and more objective diagnoses can be achieved.
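One common way to validate a classifier across research sites is leave-one-site-out cross-validation: each site is held out entirely as a test set while the model is trained on all remaining sites. The sketch below generates only the train/test splits; the site labels are hypothetical, and the classifier itself is left out. In scikit-learn, `LeaveOneGroupOut` provides the same splitting when site labels are passed as groups.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one fold per site."""
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    for held_out in sorted(by_site):
        test = by_site[held_out]
        # Every participant from the held-out site is excluded from training.
        train = [i for i, s in enumerate(sites) if s != held_out]
        yield held_out, train, test


# Hypothetical acquisition-site labels for eight participants.
sites = ["A", "A", "B", "B", "B", "C", "C", "A"]
for site, train, test in leave_one_site_out(sites):
    print(site, "train:", train, "test:", test)
```

Any feature selection or scaling should be fitted on the training sites only within each fold; otherwise information from the held-out site leaks into the model and the cross-site estimate becomes optimistic.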

Other limitations of current approaches include the practice of recruiting equal numbers of cases and controls. In almost all of the studies reviewed here, equal numbers of individuals with ASD or ADHD and controls were included. However, if one wants to apply a classifier to the general population, where the prevalence of the disorder ranges from 1.5 to 11%, this can lead to ascertainment bias. For example, with a near-ideal classifier of 95% sensitivity and specificity applied to 1000 individuals at an ASD prevalence of 1.5%, we would correctly identify about 14 of the 15 children with ASD (true positives), but we would incorrectly identify 49 of the remaining 985 as having ASD (false positives), assuming no other disorders are present. In this example, only 14/63 (22%) of positive identifications would be true cases of ASD.
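The arithmetic in this example is an application of Bayes' theorem: the positive predictive value (PPV) is the fraction of positive calls that are true cases. The sketch below reproduces the expected counts for the ASD example and, for comparison, applies the same hypothetical classifier at the 11% prevalence quoted for ADHD. The function name is our own and illustrative only.

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected true positives, false positives and PPV when screening n people."""
    cases = n * prevalence
    non_cases = n - cases
    true_pos = sensitivity * cases
    false_pos = (1 - specificity) * non_cases
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


# ASD example from the text (1.5% prevalence), then the 11% ADHD figure.
for prevalence in (0.015, 0.11):
    tp, fp, ppv = screening_outcomes(1000, prevalence, 0.95, 0.95)
    print(f"prevalence={prevalence:.1%}: TP={tp:.2f}, FP={fp:.2f}, PPV={ppv:.1%}")
```

The expected counts (14.25 true and 49.25 false positives) round to the 14 and 49 children given in the text, for a PPV of about 22%. Even at 11% prevalence, the PPV only rises to about 70%, so roughly 3 in 10 positive calls would still be false.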

Biomarkers will not replace clinical assessments, which characterize the extent of specific deficits, but could potentially change treatment goals and methods.67 There are financial barriers to this development, however. MRI is an expensive tool that is unlikely to be used regularly outside of dense urban areas or major hospitals. In ‘Future directions’, we highlight alternative neuroimaging approaches that may prove more feasible in the clinical setting.

In the future, a biomarker could potentially be necessary to receive treatment or insurance coverage, but this would create problems for the subset of patients who have symptoms but no identifiable biomarker. Finally, families should be given accurate information on what the presence of a biomarker means for their child. It is important not to develop a deterministic view, as it is likely that a biomarker only signifies an increased risk of developing the disorder. This will influence the choices that parents and physicians make regarding treatment.

Future directions: increasing sample heterogeneity

All of the reviewed studies recruited individuals without ID who could successfully complete MRI scans. This biases the studies toward high-functioning individuals, and it is not clear how well the results generalize to the broader population of children with ASD and ADHD. Even among high-functioning individuals, MRI success rates can vary.41 There are a few approaches that in principle could be applied to children across a wider range of intellectual function. Scanning under sedation or during natural sleep75 can facilitate the collection of neuroimaging data from otherwise inaccessible children. In addition, neuroimaging approaches such as electroencephalography, magnetoencephalography or functional near-infrared spectroscopy (fNIRS), which place less stringent demands on participants to remain motionless, are potentially useful tools with which to derive brain features for classification. Although these methods have more limited spatial resolution than MRI, making it difficult to assess the contributions of specific brain regions, they could in principle be used more widely in future biomarker development.

At present, the field has not yet reached the point where we can use brain-based biomarkers to diagnose individuals with a specific disorder. However, the studies reviewed here are instrumental in identifying key dysfunctional brain regions and circuits, moving us closer to understanding the biological basis of these prevalent neurodevelopmental disorders.