NeuroImage

Volume 39, Issue 3, 1 February 2008, Pages 1104-1120

Multiple sparse priors for the M/EEG inverse problem

https://doi.org/10.1016/j.neuroimage.2007.09.048

Abstract

This paper describes an application of hierarchical or empirical Bayes to the distributed source reconstruction problem in electro- and magnetoencephalography (EEG and MEG). The key contribution is the automatic selection of multiple cortical sources with compact spatial support that are specified in terms of empirical priors. This obviates the need to use priors with a specific form (e.g., smoothness or minimum norm) or with spatial structure (e.g., priors based on depth constraints or functional magnetic resonance imaging results). Furthermore, the inversion scheme allows for a sparse solution for distributed sources, of the sort enforced by equivalent current dipole (ECD) models. This means the approach automatically selects either a sparse or a distributed model, depending on the data. The scheme is compared with conventional applications of Bayesian solutions to quantify the improvement in performance.

Introduction

Bayesian approaches to the inverse problem in EEG represent an exciting development of recent years (see Baillet and Garnero, 1997, Russell et al., 1998, Sato et al., 2004, Jun et al., 2006, Nagarajan et al., 2006, Daunizeau et al., 2007, Nummenmaa et al., 2007 for some important developments). A special instance of Bayesian analysis rests on empirical Bayes, in which spatial priors are estimated from the data. Parametric empirical Bayesian (PEB) models are simple hierarchical linear models under parametric assumptions (i.e., additive Gaussian random effects at each level). Their hierarchical form enables one level to constrain the parameters of the level below and therefore to act as empirical priors (Efron and Morris, 1973, Kass and Steffey, 1989). In the context of the EEG inverse problem, the parameters correspond to unknown source activity and the priors represent spatially varying constraints on the values the parameters can take. PEB models furnish priors on the parameters through hyperparameters encoding the covariance components of random effects at each level. However, these models can also be extended hierarchically by inducing hyperpriors on the hyperparameters themselves (see Trujillo-Barreto et al., 2004, Sato et al., 2004, Daunizeau and Friston, 2007). This is the hierarchical extension considered in Sato et al. (2004) and evaluated using sampling techniques in Nummenmaa et al. (2007). Under these models, it is possible to estimate the inverse variance (i.e., precision) of each prior, even when the number of hyperparameters exceeds the number of observations. Sato et al. used this to estimate an empirical prior precision for a large number of sources on the cortical mesh, using standard variational techniques to evaluate the conditional density of the parameters and precision hyperparameters. In this context, non-informative gamma hyperpriors on the precision of random effects are also known as automatic relevance determination or ARD priors (Neal, 1998, Tipping, 2001). This approach gives better results, in terms of location and resolution, than standard minimum norm estimators.

The approach taken here uses covariance as opposed to precision hyperparameters (see also Wipf et al., 2006). This has two advantages: first the fixed-form variational scheme used for estimation reduces to a very simple and efficient classical covariance component estimation based on ReML (Patterson and Thompson, 1971, Harville, 1977, Friston et al., 2007). This means one can consider a large range of models with additive covariance components in source space (e.g., different source configurations) using exactly the same variational scheme (i.e., there is no need to derive special update rules for different components). Second, one avoids the improper densities associated with non-informative ARD priors based on the gamma density (see Gelman, 2006).
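
To make the mechanics concrete, the following is a minimal sketch of covariance component estimation by Fisher scoring on log-scale hyperparameters, in the spirit of the ReML scheme described above. The function name, the absence of hyperpriors and the step-size clipping are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np

def reml(S, Q, n, n_iter=32):
    """Minimal sketch: fit the channel-space sample covariance S (estimated
    from n time samples) with an additive mixture
        Sigma(lam) = sum_i exp(lam[i]) * Q[i]
    by Fisher scoring on the log-scale hyperparameters lam.
    Hyperpriors and convergence checks are omitted for brevity."""
    k = len(Q)
    lam = np.zeros(k)
    for _ in range(n_iter):
        Sigma = sum(np.exp(l) * Qi for l, Qi in zip(lam, Q))
        iS = np.linalg.inv(Sigma)
        dSigma = [np.exp(l) * Qi for l, Qi in zip(lam, Q)]         # dSigma/dlam_i
        P = iS - iS @ S @ iS
        g = np.array([-0.5 * n * np.trace(P @ dS) for dS in dSigma])    # gradient
        H = np.array([[0.5 * n * np.trace(iS @ dSigma[i] @ iS @ dSigma[j])
                       for j in range(k)] for i in range(k)])      # Fisher information
        step = np.linalg.solve(H + 1e-8 * np.eye(k), g)
        lam = lam + np.clip(step, -4.0, 4.0)                       # damped ascent step
    return lam
```

In the present setting, the list Q would typically hold a sensor noise component (e.g., an identity matrix in channel space) together with candidate source components projected into channel space through the lead field.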

Previously, we have described the use of parametric empirical Bayes (PEB) to invert electromagnetic models and localize distributed sources in EEG and MEG (Phillips et al., 2002a, Phillips et al., 2002b, Phillips et al., 2005). Empirical Bayes provides a principled way of quantifying the relative importance of spatial priors that replaces heuristics like L-curve analysis. Furthermore, PEB can accommodate multiple priors and provides more accurate and efficient source reconstruction than its predecessors (Phillips et al., 2002a, Phillips et al., 2002b). Subsequently, we explored the use of PEB to identify the most likely combination of priors using model selection, where each model comprises a different set of priors (Mattout et al., 2006). This was based on the fact that the restricted maximum likelihood (ReML) objective function used in the optimization of the model parameters is the log-likelihood, ln p(y|λ, m), of the covariance hyperparameters λ, for a model m and data y; a model is defined by its covariance components associated with activity over sources. We have since applied the ensuing inversion schemes to evoked and induced responses in both EEG and MEG (see Friston et al., 2006).

Finally, we showed that adding the entropy of the conditional density on the hyperparameters to the ReML objective function provides a free energy bound on the log-evidence or marginal likelihood, ln p(y|m), of the model itself (Friston et al., 2007). Although this result is well known in the machine learning community, it is particularly important here because it means one can use ReML within an evidence (i.e., free energy) maximization framework to optimize the parameters and hyperparameters of electromagnetic forward models. The key advantage of ReML is that optimization can proceed using the sample covariance of the data in measurement or channel space, which does not increase in size with the number of sources. The result is an efficient optimization that uses classical methods designed originally to estimate Gaussian covariance components (Patterson and Thompson, 1971). The ensuing approach is related formally to Gaussian process modeling (Ripley, 1994, Rasmussen, 1996, Kim and Ghahramani, 2006), where empirical Gaussian process priors are furnished by a hierarchical (PEB; Kass and Steffey, 1989) model.
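
For reference, the bound in question has the generic variational form below, written with a Gaussian approximation to the conditional density of the hyperparameters; this is the standard identity, stated schematically rather than as a derivation specific to the present scheme.

```latex
% Free energy bound on the log-evidence, with q(\lambda) = N(\mu_\lambda, \Sigma_\lambda)
F \;=\; \big\langle \ln p(y,\lambda \mid m) \big\rangle_{q(\lambda)}
  \;+\; \underbrace{\tfrac{1}{2}\ln\left|2\pi e\,\Sigma_\lambda\right|}_{\text{entropy of } q}
  \;\le\; \ln p(y \mid m)
```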

The fact that ReML can be used to optimize a bound on the marginal likelihood or evidence means that it can be used for model selection, specifically to select or compare models with different Gaussian process priors. Furthermore, under simple hyperpriors, ReML selects the best model automatically. This is because the hyperpriors force the conditional variance of the hyperparameters to zero when their conditional mean is zero. This means the free energy is the same as one would obtain with formal model comparison. In short, ReML can be used to estimate the hyperparameters controlling mixtures of covariance components, in both measurement and source space, that generate the data. If there are redundant components, ReML will automatically switch them off, or suppress them, to provide a forward model with the greatest evidence or marginal likelihood. This is an example of automatic relevance determination (ARD). ARD refers to a general phenomenon, in hierarchical Bayesian models, whereby maximizing the evidence (often through EM-like algorithms) prunes away unnecessary model components (see Neal, 1996, 1998).
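
As an illustration of this pruning behaviour, the toy example below (using the reml sketch given earlier) includes one source component that actually generates the data and one that does not; the scale of the redundant component is driven towards zero. All sizes, names and the random lead field are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chan, n_src, n_t = 32, 64, 200
L = rng.standard_normal((n_chan, n_src))                      # toy lead field

Q1 = np.zeros((n_src, n_src)); Q1[:8, :8] = np.eye(8)         # patch that is active
Q2 = np.zeros((n_src, n_src)); Q2[40:48, 40:48] = np.eye(8)   # patch that is silent

J = np.zeros((n_src, n_t)); J[:8, :] = rng.standard_normal((8, n_t))
Y = L @ J + 0.1 * rng.standard_normal((n_chan, n_t))          # simulated sensor data
S = Y @ Y.T / n_t                                             # channel-space sample covariance

# candidate components: sensor noise plus the two source priors projected
# through the lead field
Q = [np.eye(n_chan), L @ Q1 @ L.T, L @ Q2 @ L.T]
lam = reml(S, Q, n_t)
print(np.exp(lam))   # the scale of the redundant (third) component shrinks towards zero
```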

Recently, Wipf et al. (2006) provided an extremely useful formulation of empirical Bayesian approaches to the electromagnetic inverse problem and showed how existing schemes “can be related via the notion of automatic relevance determination (Neal, 1996) and evidence maximization (MacKay, 1992)”. The approach adopted here conforms exactly to the principles articulated in Wipf et al. and reiterates the generality of free energy or evidence maximization. Wipf et al. (2006) also consider particular maximization schemes, based on standard variational updates, under inverse gamma hyperpriors. We use an ReML scheme, which is much simpler and uses log-normal hyperpriors. This allows us to use the Laplace approximation to the curvatures of the log-evidence during optimization (Friston et al., 2007).

In summary, this paper takes the application of ReML to the EEG inverse problem to its natural conclusion: instead of using a small number of carefully specified prior covariance components (e.g., Laplace, minimum norm, depth constraints etc.), we use a large number of putative sources with compact (but not necessarily continuous) support on the cortical surface. The inversion scheme automatically selects which priors are needed, furnishing sparse or distributed solutions, depending on the data. This provides a graceful balance between the two extremes offered by sparse ECD models and the distributed source priors implicit in weighted minimum norm solutions (see also Daunizeau and Friston, 2007). Critically, the inversion scheme is fast, principled and uses a linear model, even when sparse ECD-like solutions are selected.
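
To indicate what a large set of putative sources with compact support might look like computationally, the sketch below builds one rank-one covariance component per seed vertex by diffusing an indicator over the cortical mesh. The truncated exponential kernel, the seeding and all names are assumptions standing in for the particular patch construction used in the paper.

```python
import numpy as np
from scipy.sparse import identity

def patch_components(A, seeds, scale=0.6, order=8):
    """Sketch: one source-space covariance component per seed vertex, with
    compact support on the cortical mesh described by the (sparse, 0/1)
    vertex adjacency matrix A.  Each component is the outer product of a
    column of a truncated graph kernel exp(scale * A); truncating the
    series at `order` limits support to an `order`-hop neighbourhood."""
    n = A.shape[0]
    G = identity(n, format='csr')
    term = identity(n, format='csr')
    for k in range(1, order + 1):
        term = (term @ A) * (scale / k)       # next term of the exponential series
        G = G + term
    components = []
    for s in seeds:
        q = np.asarray(G[:, [s]].todense()).ravel()
        q /= (np.linalg.norm(q) + 1e-12)      # unit-norm spatial profile
        components.append(np.outer(q, q))     # compact rank-one patch prior
    return components
```

With a few hundred seeds spread over the mesh, each component corresponds to one candidate patch, and the inversion scheme decides which of them, if any, contribute to the data.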

This paper comprises three sections. In the first, we present the theory and operational details of the inversion scheme. We then compare its performance to existing applications using distributed constraints and simulated EEG data. In the final section, we illustrate its application to a real data set that is available at http://www.fil.ucl.ac.uk/spm.

Theory

This section describes the model and inversion scheme. In brief, we use ReML to estimate covariance hyperparameters at both the sensor and source levels. Once these hyperparameters have been optimized, the posterior mean and covariance of the parameters (source activity) are given by simple functions of the data and hyperparameters. Here, ReML can be regarded as operating in an evidence optimization framework, which leads to ARD phenomena and the elimination of redundant sources.
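
The "simple functions" referred to here are ordinary Gaussian conditioning for a linear model; the sketch below spells this out, with illustrative names and with the noise and prior covariances Ce and Cj assumed to have been assembled from the optimized hyperparameters.

```python
import numpy as np

def conditional_sources(Y, L, Ce, Cj):
    """Sketch of the conditional source estimate once the hyperparameters
    are fixed: Gaussian conditioning for Y = L @ J + noise, with prior
    J ~ N(0, Cj) and sensor noise covariance Ce."""
    Sigma = Ce + L @ Cj @ L.T               # implied channel-space covariance
    M = Cj @ L.T @ np.linalg.inv(Sigma)     # posterior (MAP) projector
    J_hat = M @ Y                           # conditional mean of source activity
    C_post = Cj - M @ L @ Cj                # conditional covariance
    return J_hat, C_post
```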

Simulations

In this section, we use simulations to evaluate the performance of various models. We will look at model evidence, variance explained and pragmatic measures of spatial and temporal accuracy. We describe how the data were simulated and the models considered and then report comparative analyses. First, we consider the generative models used to simulate data. These models comprise the conventional forward model encoded in the lead-field matrix and the priors on the sources.
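
For orientation, a generative model of this kind can be sketched as follows: sources are drawn under a chosen prior covariance, projected through the lead field and mixed with white sensor noise at a nominal SNR. The SNR convention and all names below are assumptions for illustration, not the exact simulation settings used in the paper.

```python
import numpy as np

def simulate_meeg(L, Qj, n_time, snr_db=10.0, rng=None):
    """Sketch of a generative model for simulation: sources J ~ N(0, Qj)
    over time, projected through the lead field L, plus white sensor
    noise scaled to a nominal SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    n_src = L.shape[1]
    R = np.linalg.cholesky(Qj + 1e-10 * np.eye(n_src))    # jitter keeps Qj factorizable
    J = R @ rng.standard_normal((n_src, n_time))          # sources with covariance ~Qj
    signal = L @ J
    noise_sd = np.sqrt(signal.var() / 10 ** (snr_db / 10.0))
    Y = signal + noise_sd * rng.standard_normal(signal.shape)
    return Y, J
```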

Analyses of real data

In this section, we use real data to provide some provisional validation of the MSP model through model comparison. We start by optimizing the number of MSPs in the context of real data and then compare the three models from the previous section, using an optimal MSP model. We conclude with anecdotal evaluations, in relation to ECD and fMRI analyses of the same experimental effects, which attempt to establish some construct and face validity.

Conclusion

This paper has described a new application of hierarchical or empirical Bayes to the distributed source reconstruction problem in EEG and MEG. The key contribution is the automatic selection of multiple cortical sources with compact spatial support that are specified in terms of empirical priors. This obviates the need to use priors with a specific form (e.g., smoothness or minimum norm) or with spatial structure (e.g., priors based on depth constraints or functional magnetic resonance imaging results).

Software note

The inversion scheme and models considered in this paper are implemented in the SPM academic software, which is available freely from http://www.fil.ion.ucl.ac.uk/spm. The MSP and other models are an integral part of the source reconstruction stream, which allows one to create conditional contrasts and their energy for any number of trials or trial types. SPM displays the results using the same format as Fig. 2, Fig. 12 and Fig. 13.

Acknowledgments

The Wellcome Trust, the Medical Research Council and the British Council funded this work. Jérémie Mattout is funded by the Fondation pour la Recherche Médicale (FRM). Christophe Phillips is funded by the Fonds de la Recherche Scientifique (FNRS).

References (44)

  • C. Phillips et al., An empirical Bayesian solution to the source reconstruction problem in EEG, NeuroImage (2005)
  • M.A. Sato et al., Hierarchical Bayesian estimation for MEG inverse problem, NeuroImage (2004)
  • N. Trujillo-Barreto et al., Bayesian model averaging, NeuroImage (2004)
  • S. Baillet et al., A Bayesian approach to introducing anatomo-functional priors in the EEG/MEG inverse problem, IEEE Trans. Biomed. Eng. (1997)
  • Y. Cointepas et al.
  • J. Daunizeau, K.J. Friston, A mesostate-space model for EEG and MEG, NeuroImage (2007, Epub ahead of print)
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc., Ser. B (1977)
  • B. Efron et al., Stein's estimation rule and its competitors—an empirical Bayes approach, J. Am. Stat. Assoc. (1973)
  • K.J. Friston et al., Bayesian estimation of evoked and induced responses, Hum. Brain Mapp. (2006)
  • K.J. Friston et al., Bayesian decoding of brain images, NeuroImage (2007)
  • M. Fuchs et al., Linear and nonlinear current density reconstructions, J. Clin. Neurophysiol. (1999)
  • A. Gelman, Prior distributions for variance parameters in hierarchical models, Bayesian Anal. (2006)