Elsevier

NeuroImage

Volume 41, Issue 2, June 2008, Pages 286-301
Robust group analysis using outlier inference

https://doi.org/10.1016/j.neuroimage.2008.02.042

Abstract

Neuroimaging group studies are typically performed under the assumption that the subjects are randomly drawn from a population whose effect sizes are Gaussian distributed. However, in practice, group studies can include “outlier” subjects whose effect sizes are completely at odds with the general population for reasons that are not of experimental interest. If ignored, these outliers can dramatically affect the inference results. To solve this problem, we propose a group inference approach which includes inference of outliers using a robust general linear model (GLM) approach. This approach models the errors as being a mixture of two Gaussian distributions, one for the normal population and one for the outliers. Crucially, the robust GLM is part of a traditional hierarchical group model which uses GLMs at each level of the hierarchy. This combines the benefits of outlier inference with the benefits of using variance information from lower levels in the hierarchy. A Bayesian inference framework is used to infer on the robust GLM, while using the lower level variance information. The performance of the method is demonstrated on simulated and fMRI data and is compared with iteratively reweighted least squares and permutation testing.

Introduction

Neuroimaging studies using fMRI and EEG/MEG are often carried out on groups of subjects with the intention of inferring on the general population from which the group of subjects has been randomly sampled. Such data are typically analysed using a mixed effects analysis in which the general population is assumed to have a distribution of effect sizes that is Gaussian with unknown mean and variance (Woolrich et al., 2004). However, in practice group studies can include “outlier” subjects whose effect sizes are completely at odds with the general population for reasons that are not of experimental interest. This abnormal behaviour could arise from a number of sources. For example, it could be due to excessive subject motion, misunderstanding by the subject of the experiment instructions, or poor electrode contact in EEG. If ignored, these outliers can dramatically affect the inference results. For example, the estimate of the population variance can be inflated by outlier subjects, and the population mean can be underestimated or overestimated.

One option is to visually inspect both the data and the results of a group analysis to deduce outliers. These outlier subjects can then be removed and the group study re-analysed without them. However, this is not an ideal approach as it can be difficult to rigorously inspect the data, it introduces subjective judgement, and it makes hard decisions about whether or not to remove an entire subject's data. While very useful exploratory approaches have been proposed that aid in this process (Kherif et al., 2003, Luo and Nichols, 2003, Seghier et al., 2007), human intervention is often still required. In contrast, the approach proposed in this article is automatic, and soft-assigns outlier behaviour in a spatially localised and probabilistic manner.

There has previously been work carried out on robust parametric general linear models (GLMs) when analysing fMRI time series from a single fMRI experiment (Diedrichsen and Shadmehr, 2005, Penny et al., 2007). Also, Wager et al. (2005) introduced the use of robust GLMs in the context of neuroimaging group analysis by using a number of different traditional robust estimation techniques such as Bisquare and Huber weighting. Essentially, these model outlier behaviour by assuming non-Gaussian errors. However, they do not place these robust group GLMs within the full context of a hierarchical group study. In particular, it has been demonstrated that group-level GLMs require first-level summary statistics consisting of not just the first-level effect sizes but also the variances of those effect sizes. When both the first-level effect sizes and their variances are used at the group level, more sensitive inference can be achieved by virtue of variance weighting, and more accurate group random effect variances can be inferred by avoiding implied negative variances (Beckmann et al., 2003, Woolrich et al., 2004).
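To make the idea of robust weighting concrete, the following is a minimal sketch of iteratively reweighted least squares with Tukey bisquare weights, applied to the simplest group model (a single group mean). It is a generic textbook form and is not taken from Wager et al. (2005). The function name `bisquare_group_mean`, the tuning constant 4.685 and the median-based scale estimate are illustrative assumptions, and the sketch deliberately ignores first-level variances.

```python
import numpy as np

def bisquare_group_mean(effects, c=4.685, n_iter=50, tol=1e-8):
    """Robust group mean via IRLS with Tukey bisquare weights (illustrative sketch)."""
    y = np.asarray(effects, dtype=float)
    mu = np.median(y)  # robust starting value
    for _ in range(n_iter):
        resid = y - mu
        # Robust scale from the median absolute residual (assumed choice).
        scale = max(np.median(np.abs(resid)) / 0.6745, 1e-12)
        u = resid / (c * scale)
        # Bisquare weights: smooth down-weighting, exactly zero beyond the cutoff.
        w = np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)
        mu_new = np.sum(w * y) / np.sum(w)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, w

# Hypothetical subject effect sizes; the last subject is an obvious outlier.
effects = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.05, 0.95, 1.15, 0.85, 1.0, 0.9, 8.0]
robust_mean, weights = bisquare_group_mean(effects)
```

In this sketch the outlying subject receives a weight of zero, so the estimate stays close to the mean of the remaining subjects, whereas the ordinary mean is pulled towards the outlier.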

An alternative to developing robust parametric approaches is the use of non-parametric permutation tests. This is a powerful way in which we can make our statistics robust to deviations from the assumption of a parametric Gaussian population distribution. These have previously been developed for analysing group fMRI studies, in particular in the context of taking into account the lower level variance information (Meriaux et al., 2006, Roche et al., 2007). To achieve their robustness, such permutation test approaches make weaker assumptions about the population distribution but this can come at the cost of sensitivity.
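As an illustration of the permutation idea, here is a minimal sketch of an exact one-sided sign-flipping test for a group mean. It assumes only that, under the null hypothesis, the subject effect sizes are symmetrically distributed about zero. It does not use lower level variance information, so it is simpler than the approaches cited above; the function name `sign_flip_pvalue` and the example data are hypothetical.

```python
import itertools
import numpy as np

def sign_flip_pvalue(effects):
    """Exact one-sided sign-flip test of H0: group mean is zero (illustrative sketch)."""
    y = np.asarray(effects, dtype=float)
    observed = y.mean()
    n_as_extreme = 0
    n_total = 0
    # Enumerate all 2^N sign assignments; only feasible for small groups.
    for signs in itertools.product((1.0, -1.0), repeat=y.size):
        n_total += 1
        if np.mean(np.array(signs) * y) >= observed - 1e-12:
            n_as_extreme += 1
    return n_as_extreme / n_total

# Hypothetical effect sizes for eight subjects.
p_value = sign_flip_pvalue([0.5, 1.2, 0.8, 1.1, 0.9, 0.7, 1.3, 1.0])
```

Because the null distribution is built from the data themselves, no Gaussian assumption is needed, but the smallest attainable p-value is limited by the number of sign assignments (here 1/256).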

It is the aim of this work to implement a robust group-level GLM which combines the benefits of outlier inference with the benefits of using variance information from lower levels in a hierarchical group study. This will increase the robustness of parametric approaches while maintaining the benefits of increased sensitivity afforded by making strong distributional assumptions. This uses the same model as Penny et al. (2007), which was introduced for inferring on robust GLMs of fMRI time series data. This models the errors as a mixture of two Gaussian distributions, one for the normal population and one for the outliers. Penny et al. (2007) demonstrated how such a model can be more sensitive to underlying signal than the Bisquare estimation procedure (Wager et al., 2005). However, their model was implemented for inferring at the first level on fMRI time series. Here we show how we can infer using this outlier model at the group level within the context of a hierarchical group study. This prohibits the use of the variational Bayes approach used by Penny et al. (2007). Instead, inference is carried out within a Bayesian framework using an expectation–maximisation scheme.
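The two-Gaussian error model can be sketched in a heavily simplified setting. The code below fits a single group mean by maximum likelihood using an EM-style scheme with conditional maximisation steps. It is not the inference scheme of this article: it ignores the lower level variances and places no priors on the parameters. The function name `mog_group_mean`, the initial values and the constraint that the outlier variance is at least the non-outlier variance are all assumptions made for illustration.

```python
import numpy as np

def log_normal(y, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def mog_group_mean(effects, n_iter=200, tol=1e-10):
    """Group mean under a two-Gaussian error mixture sharing one mean (sketch)."""
    y = np.asarray(effects, dtype=float)
    mu = np.median(y)
    var_in = max((np.median(np.abs(y - mu)) / 0.6745) ** 2, 1e-8)
    var_out = 25.0 * var_in   # assumed start: outliers are much more spread out
    pi_out = 0.1              # assumed starting outlier proportion
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is an outlier.
        log_in = np.log1p(-pi_out) + log_normal(y, mu, var_in)
        log_out = np.log(pi_out) + log_normal(y, mu, var_out)
        r = np.exp(log_out - np.logaddexp(log_in, log_out))
        # Conditional M-steps: precision-weighted mean, then variances and proportion.
        w = (1.0 - r) / var_in + r / var_out
        mu_new = np.sum(w * y) / np.sum(w)
        sq = (y - mu_new) ** 2
        var_in = max(np.sum((1.0 - r) * sq) / max(np.sum(1.0 - r), 1e-12), 1e-8)
        var_out = max(np.sum(r * sq) / max(np.sum(r), 1e-12), var_in)
        pi_out = float(np.clip(np.mean(r), 1e-6, 1.0 - 1e-6))
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, r, var_in, var_out, pi_out

# Hypothetical subject effect sizes; the last subject is an obvious outlier.
effects = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.05, 0.95, 1.15, 0.85, 1.0, 0.9, 8.0]
mu, outlier_prob, var_in, var_out, pi_out = mog_group_mean(effects)
```

The vector `outlier_prob` is a soft assignment: each subject gets a probability of belonging to the outlier component rather than a hard keep-or-remove decision, and subjects with a high outlier probability contribute little to the estimated mean.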

Section snippets

Two-level GLM

Consider an experiment where there are $N_K$ subjects and that for each subject, $k$, the preprocessed fMRI data is a $T \times 1$ vector $Y_k$, the $T \times P_K$ design matrix is $X_k$, and $\beta_k$ is a $P_K \times 1$ vector of parameter estimates ($k = 1, \ldots, N_K$). The preprocessed fMRI data, $Y_k$, is assumed to have been prewhitened (Bullmore et al., 1996, Woolrich et al., 2001). An individual GLM relates first-level parameters to the $N_K$ individual datasets:

$$Y_k = X_k \beta_k + \epsilon_k,$$

where $\epsilon_k \sim N(0, \sigma_k^2 I_T)$. Note that $I_T$ indicates a $T \times T$ identity matrix. In this
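For orientation, the first-level model above can be fitted for one subject by ordinary least squares, which yields the two summary statistics discussed in the Introduction: the effect size estimates and their variances. The following is a minimal sketch under the stated assumption of prewhitened data with independent Gaussian errors; the function name `fit_first_level` and the toy design matrix are hypothetical and not part of the article.

```python
import numpy as np

def fit_first_level(Y, X):
    """OLS fit of Y = X beta + error for one subject (illustrative sketch)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, P = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    # Unbiased estimate of the error variance sigma_k^2.
    sigma2 = resid @ resid / (T - P)
    # Covariance of the parameter estimates: sigma^2 (X'X)^{-1}.
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta, sigma2, cov_beta

# Toy example: intercept plus one regressor, T = 6 time points.
X = np.column_stack([np.ones(6), np.arange(6.0)])
Y = X @ np.array([2.0, 0.5]) + np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.0])
beta_hat, sigma2_hat, cov_beta_hat = fit_first_level(Y, X)
```

The diagonal of `cov_beta_hat` holds the first-level variances of the effect sizes, which are the quantities a group-level model can use for variance weighting.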

Inference

There are no solutions in the frequentist literature to these models when the variance components are unknown. Furthermore, inference is highly sensitive to any assumptions made, due to the low number of observations typically available at the subject level in fMRI. Hence, as in Woolrich et al. (2004), we infer on the hierarchical models in a Bayesian framework.

Results

Fig. 1 illustrates how the proposed approach works by showing the results of inferring on a group mean on a single voxel of example data. The inferred non-outlier and the outlier Gaussian distributions are shown separately. However, the outlier Gaussian distribution is only really visible when we zoom in on the tail of the mixture model as shown in the inset of Fig. 1.

In this particular case, we can see that subject 14 is inferred as a strong outlier with probability ≈ 1. In contrast, subject

Simulated data

We will now use simulated data to investigate the performance of the robust GLM approach we are proposing in this article. We will refer to this proposed approach as the mixture of Gaussians (MOG). For comparison, we will consider three other approaches. Firstly, we will use ordinary least squares (OLS), which uses the traditional assumption that the errors are modelled as a single Gaussian. Secondly, we will use the robust regression Bisquare method (also known as the Tukey Bisquare or

Discussion

We have presented a method for performing group inference on neuroimaging data that includes inference of outliers. This approach models the errors as being a mixture of two Gaussian distributions, one for the normal population and one for the outliers. Crucially this forms part of a traditional hierarchical group model which uses GLMs at each level of the hierarchy. This combines the benefits of outlier inference with the benefits of using variance information from lower levels in the

Acknowledgments

Funding for Mark Woolrich is from the UK EPSRC. Thanks to Tim Behrens, Stephen Smith and Tom Nichols for their thoughts, and Tim Behrens for his dataset.
