Multi-voxel pattern analysis (MVPA; or “pattern decoding”) approaches have rapidly gained popularity among the functional magnetic resonance imaging (fMRI) community (Haynes & Rees, 2006; Norman, Polyn, Detre, & Haxby, 2006; Tong & Pratte, 2012). While the general linear model (GLM) remains a cornerstone of fMRI analysis, it is now appreciated that information can also exist in a region’s distributed patterns of activity (Haxby et al., 2001).

Multi-voxel pattern analysis has been used to probe perceptual and cognitive phenomena with a high degree of specificity, including analysis at the level of object type (Eger, Ashburner, Haynes, Dolan, & Rees, 2008), orientation (Kamitani & Tong, 2005), and episodic memory (Chadwick, Hassabis, Weiskopf, & Maguire, 2010), among others (Tong & Pratte, 2012). When decoding multi-voxel information, investigators frequently also evaluate univariate responses that might be associated with the decoded conditions. More than just motivating the use of multivariate techniques, these univariate comparisons often influence the inferences that are made about the cognitive or perceptual underpinnings of the recorded blood oxygenation level dependent (BOLD) activity.

Although multi-voxel patterns and mean activation levels are often compared, there have been relatively few discussions of what it means for information to be contained in one or the other, or of the implications of the alternative approaches taken to probe them. The aim of this article is to highlight and synthesize some of the current issues surrounding these questions. I discuss how findings of multivariate and univariate distinctions are interpreted and then examine methodological approaches taken by investigators to assess them.

A question of spatial frequency

An overall univariate response to all conditions (condition differences are discussed shortly) is typically interpreted as indicating that a region is engaged in processing that is relevant to the conditions of interest. After finding such a response increase, multivariate patterns are sometimes examined, to probe more nuanced information. Visual motion provides a good example of this, where “brain structures responsible for encoding visual motion (Zeki et al., 1991; Tootell et al., 1995) have been identified using univariate approaches, while more specific information such as the direction of perceived motion has been successfully decoded using linear pattern classification (Kamitani & Tong, 2006)” (Kragel, Carter, & Huettel, 2012, p. 3). Univariate responses can also explicitly guide the selection of voxels used for subsequent decoding (feature selection), by selecting those voxels that respond to a task (over the full stimulus set) or to an independent localizer (ensuring independence; Kriegeskorte, Simmons, Bellgowan, & Baker, 2009). While a region’s above-baseline activation can indicate relevant neural processing, its absence does not preclude the presence of multivariate information. For instance, Harrison and Tong (2009) decoded the orientation of gratings held in visual working memory in visual areas when their mean activity levels were at baseline.

A differing mean response to one condition versus another reflects that the conditions can be distinguished at a lower spatial frequency than a multi-voxel pattern. The issue of spatial frequency has long engaged researchers using fMRI who have investigated information at levels that include brain networks (Fox et al., 2005), brain regions, and recently, cortical columns (Kamitani & Tong, 2005; although see Freeman, Brouwer, Heeger, & Merriam, 2011). The particular location of a cognitive or perceptual phenomenon on this spatial frequency continuum is a question that has influenced a number of recent investigations (e.g., Kohler et al., 2013; Misaki, Luh, & Bandettini, 2013). The question of whether information is contained in a region’s overall activation level or multi-voxel patterns also fits within this framework, since a homogeneous mean-response across all voxels is a lower spatial frequency than multi-voxel patterns in the same voxels (although the implications of the distinction have not been discussed as extensively as the voxel versus columnar distinction: Freeman et al., 2011; Kamitani & Tong, 2005; Kohler et al., 2013; Misaki et al., 2013).

Why ask about univariate influences?

Why do investigators of multi-voxel patterns care about whether information is present in mean activation? The reasons and evaluation approaches taken vary across investigations.

A natural extension of the (across-condition) univariate activation discussed above is the idea that a mean activation difference may reflect greater engagement of a specific cognitive process or resource used to complete the task (e.g., calculation; Clithero, Carter, & Huettel, 2009), so that a greater overall response to one condition is thought to reflect greater use of that resource. Relatedly, a subset of classes may have greater overall activation due to another process. For example, LaRocque et al. (2013) controlled for univariate amplitude, on the basis of prior findings that subsequently remembered stimuli may be accompanied by greater univariate activation (see also Smith, Kosillo, & Williams, 2011; Tong, Harrison, Dewey, & Kamitani, 2012).

Another interpretation posits that greater activation may indicate increased physiological arousal or attentional demand for one condition over another, which might then drive multi-voxel separation, potentially confounding the hypothesized basis for separation. Under this framing, the mean response plays a role similar to that of reaction times or behavioral ratings, which are often compared between conditions in order to rule out attentional differences.

A broader interpretation distinguishes overall activation levels as being linked to “involvement of the region in a specific mental function” and multi-voxel patterns as revealing neuronal “representational content” (Mur, Bandettini, & Kriegeskorte, 2009, p. 1). This distinction can act as a useful conceptual framework (for example, when working with a stimulus space that has more than one dimension; Davis & Poldrack, 2013), although there are several reasons to avoid rigidly following a “multi-voxel pattern = representation / univariate activation = process” rule. Specifically, recent MVPA investigations have shown that multi-voxel patterns can contain information about cognitive processes such as cognitive control (Esterman, Chiu, Tamber-Rosenau, & Yantis, 2009; Tamber-Rosenau, Esterman, Chiu, & Yantis, 2011) without differences in mean activation. Equally, an increase in overall response might not exclusively indicate a domain-general “process” but could also reflect some specialization (or “modularity”), such as a greater response to presentations of hammers than to those of chairs in a voxel population that encodes tools. Alternatively, a voxel population may show greater preference for one feature. For example, a recent MVPA study decoded both color and motion signals in regions of the dorsal attention network (Liu, Hospadaruk, Zhu, & Gardner, 2011). Despite the existence of multi-voxel information for both features, the regions showed greater overall responses to motion and little overall response to color (interpreted as a general preference for motion; Liu et al., 2011). Another concern is that the process/representation distinction may be nonfalsifiable, with no opportunity to empirically test between them (see Davis & Poldrack, 2013, for a recent discussion). This concern emphasizes the importance of distinguishing a “cognitive framing” from an established conclusion about a neural system.

Without being able to distinguish between the many alternatives discussed above, caution is warranted in interpreting such univariate differences. One possible source of evidence for one explanation, versus another, may lie in relating activity pattern information to task performance and individual differences. Finding a relationship between classification and behavioral performance (e.g., working memory performance across stimuli) could link decoded information to a hypothesized cognitive function. Similarly, a relationship with individual differences (measured through an established assessment) can help link decoding to a cognitive function. Without such an external metric, it can be difficult to distinguish among alternative reasons for a univariate response difference.

Independently of any theoretical implications, the particular form that condition information takes can itself be of interest. Understanding whether conditions are encoded within a multivariate or univariate format can help with comparing results to prior findings (from both MVPA and GLM studies), testing predictions from computational models, identifying the best metrics for linking to individual differences, and guiding subsequent analyses. To elaborate on this last point, differences in univariate activation may motivate a subsequent functional connectivity analysis to identify networks with synchronized fluctuations in BOLD responses (Biswal, Yetkin, Haughton, & Hyde, 1995). On the other hand, conditions with distinguishable multi-voxel patterns may be better analyzed with a connectivity approach that draws on fluctuating multi-voxel information (e.g., informational connectivity; Coutanche & Thompson-Schill, 2013).

Prior findings might also generate specific predictions for the relative roles for mean activation and multi-voxel patterns. For example, higher neuronal firing rates have been reported in the hippocampal CA3 region with aging (e.g., Wilson, Ikonen, Gallagher, Eichenbaum, & Tanila, 2005; Yassa et al., 2011), which could influence the relative contributions of mean activation and multi-voxel patterns in younger and older adults.

How are univariate influences assessed?

The analytical approach taken to evaluate a univariate influence is more than a procedural choice; it directly impacts the conclusions that can be drawn. The position of an analysis procedure within a study’s analysis stream influences its explanatory role. Analyses that change the data, such as subtracting the (across-voxel) mean response at each time point, are typically used to address whether multi-voxel patterns are sufficient for separating conditions. In contrast, analyses conducted in parallel with MVPA (such as an ANOVA) speak to whether multi-voxel patterns are necessary for separation. Studies differ in whether they apply one (e.g., Chadwick et al., 2010), both (e.g., Esterman et al., 2009; Greenberg, Esterman, Wilson, Serences, & Yantis, 2010), or neither (e.g., Man, Kaplan, Damasio, & Meyer, 2012) strategy.

Are multivariate patterns sufficient for separating conditions?

To specifically assess whether a region’s multi-voxel patterns can discriminate conditions, some investigators eliminate the mean BOLD response from each pattern, so that subsequent successful decoding is attributable to the pattern itself. The most straightforward method is to mean-center the voxel population’s overall response at every time point prior to classification, giving every time point of every condition a mean response of zero (a process implicitly built into a correlation-based classifier; see Footnote 1). The resulting mean-removed pattern has relative voxel amplitudes preserved (e.g., voxel A responds more than voxel B), without overall amplitude differences. If the conditions can be classified successfully without the mean, several conclusions are typically drawn: (1) Information is present within the pattern of voxel activity values (i.e., multivariate patterns are sufficient for decoding), and (2) successful decoding does not require overall amplitude differences (see Footnote 2).
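As a minimal sketch of the mean removal described above, assume that patterns are stored as a NumPy array with one row per time point and one column per voxel. The array layout, the toy values, and the function name `remove_pattern_mean` are illustrative assumptions rather than part of any particular toolbox. The last lines illustrate why a correlation-based classifier removes the mean implicitly: a Pearson correlation between two patterns is unchanged when a constant is subtracted from either one.

```python
import numpy as np

def remove_pattern_mean(patterns):
    """Subtract the across-voxel mean from each time point (row).

    patterns: array of shape (n_timepoints, n_voxels).
    Returns an array of the same shape in which every row has a mean of
    zero, while relative voxel amplitudes within a row are preserved.
    """
    patterns = np.asarray(patterns, dtype=float)
    return patterns - patterns.mean(axis=1, keepdims=True)

# Hypothetical toy data: two time points, three voxels.
patterns = np.array([[1.0, 2.0, 4.0],
                     [13.0, 11.0, 12.0]])
centered = remove_pattern_mean(patterns)

# Every time point now has an across-voxel mean of zero.
row_means = centered.mean(axis=1)

# Pearson correlation ignores the overall level of each pattern, so it is
# identical before and after mean removal.
r_raw = np.corrcoef(patterns[0], patterns[1])[0, 1]
r_centered = np.corrcoef(centered[0], centered[1])[0, 1]
```

Classifiers that are not correlation-based (for example, those relying on Euclidean distance) do not have this invariance, which is why the explicit subtraction matters for them.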

It is important to note that removing the mean at each time point assumes that the analyzed voxels share an overall mean, rather than being composed of several subsets of voxels, each with its own different mean. This assumption is more likely to hold when the voxel population is small (such as in a searchlight analysis) or is selected on the basis of showing homogeneous responses (such as through an independent localizer). When the basis for selection is more arbitrary, however, care is needed to ensure that removing the region’s mean does not unintentionally create a classifiable “pattern” from adjacent voxel subpopulations that each had its own mean response, leaving behind a (classifiable) mean difference after the subtraction. Although this would still be a multivariate distinction with information about conditions, it is not the kind of multi-voxel pattern typically considered by investigators (see Footnote 3). There is no substitute for viewing the data, and visualizing data after a mean-subtraction can be useful in helping to evaluate whether an unintentional “pattern” has been created.
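The concern about voxel subpopulations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the region, its two equally sized subsets, the noise level, and the size of the response are all hypothetical. One condition raises every voxel in the first subset by the same amount, which is a univariate effect within that subset. After the region-wide mean is removed, the two subsets are left with means of opposite sign in that condition, which a classifier could exploit as a “pattern.”

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_vox = 40, 20
subset1 = np.arange(0, 10)    # hypothetical subpopulation 1
subset2 = np.arange(10, 20)   # hypothetical subpopulation 2

# Condition A raises every voxel in subset 1 by the same amount;
# condition B leaves both subsets at baseline. All voxels carry noise.
cond_a = rng.normal(0.0, 0.5, size=(n_trials, n_vox))
cond_a[:, subset1] += 1.0
cond_b = rng.normal(0.0, 0.5, size=(n_trials, n_vox))

def remove_pattern_mean(patterns):
    """Subtract the region-wide (across-voxel) mean from each trial."""
    return patterns - patterns.mean(axis=1, keepdims=True)

a_centered = remove_pattern_mean(cond_a)
b_centered = remove_pattern_mean(cond_b)

# The region-wide mean is zero for every trial after the subtraction,
# but the subset means still differ between the two conditions.
diff_subset1 = a_centered[:, subset1].mean() - b_centered[:, subset1].mean()
diff_subset2 = a_centered[:, subset2].mean() - b_centered[:, subset2].mean()
print(f"subset 1 difference: {diff_subset1:+.2f}")
print(f"subset 2 difference: {diff_subset2:+.2f}")
```

In expectation the two differences are about +0.5 and -0.5 here, so plotting the mean-removed data by voxel would reveal a step between the subsets rather than a fine-grained pattern.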

Are multi-voxel patterns necessary for separating conditions?

Are conditions separable by their overall response? The most common way to answer this within MVPA investigations has been through a GLM-based statistical test, such as a t-test or ANOVA (e.g., Chadwick et al., 2010; Peelen & Caramazza, 2012). An alternative method is to submit the time points’ across-voxel mean through the same classification framework as that used to analyze the multi-voxel patterns (e.g., Coutanche, Thompson-Schill, & Schultz, 2011; Kohler et al., 2013; Tong & Pratte, 2012).

These two approaches, GLM tests and classifying means, differ in their strengths, weaknesses, and potential subsequent analyses. One key difference between the methods is their evaluation criteria. In a classification approach, the mean responses are evaluated in the same framework as the multivariate information, in which a trained classifier model generates a class prediction for each time point of an independent test set. A confusion matrix can be produced for both multi-voxel patterns and mean activation, to compare the types of classification errors made from each source of information. In contrast, an ANOVA and t-test remove information about individual trials to produce an F or t statistic, where the same test-statistic value can come from a variety of patterns of classification errors. Furthermore, while an ANOVA models the complete set of time points, a classification trains on a subset of data (e.g., training on 75 % in fourfold cross-validation). Classifying the mean therefore draws on the same amount of data as the multivariate analysis, unlike a GLM test. GLM and classification tests differ further in how they are affected by properties of the data, including variance and deviations from the normal distribution. When these significant differences are considered, it is clear that classifying a region’s mean is more analogous to the test used for multivariate information (see Footnote 4). Since the two evaluation methods have similar levels of sensitivity and selectivity (e.g., AUC = 0.66 vs. 0.63 and 0.88 vs. 0.82 for two simulated signal-to-noise ratios; see the Appendix for simulation details), classifying the mean is an appealing and useful option for MVPA investigations, with many advantages over a GLM test.
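To make the classification approach concrete, the sketch below submits the across-voxel mean and the mean-removed patterns to the same fourfold cross-validated classifier and returns a confusion matrix for each. It is an illustration under several assumptions, not the procedure of any cited study: the data are simulated, the classifier is a simple nearest-centroid rule, and the helper name `nearest_centroid_cv` and the interleaved fold assignment are hypothetical (with real fMRI data, folds would normally follow scanner runs).

```python
import numpy as np

def nearest_centroid_cv(features, labels, n_folds=4):
    """Cross-validated nearest-centroid classification.

    features: (n_timepoints, n_features); labels: integer class per time point.
    Returns (accuracy, confusion), where confusion[i, j] counts test time
    points of true class i that were predicted as class j.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    folds = np.arange(len(labels)) % n_folds   # simple interleaved folds
    confusion = np.zeros((len(classes), len(classes)), dtype=int)
    for fold in range(n_folds):
        train, test = folds != fold, folds == fold
        # Class centroids are estimated from the training folds only (75 %).
        centroids = np.stack([features[train & (labels == c)].mean(axis=0)
                              for c in classes])
        # Euclidean distance from each test time point to each centroid.
        dists = np.linalg.norm(features[test][:, None, :] - centroids[None], axis=2)
        predicted = dists.argmin(axis=1)
        true_idx = np.searchsorted(classes, labels[test])
        np.add.at(confusion, (true_idx, predicted), 1)
    return np.trace(confusion) / confusion.sum(), confusion

# Simulated data (hypothetical): class 1 carries a balanced multi-voxel
# pattern (half the voxels up, half down) plus a small uniform mean shift.
rng = np.random.default_rng(1)
n_per_class, n_vox = 40, 30
labels = np.repeat([0, 1], n_per_class)
data = rng.normal(0.0, 1.0, size=(2 * n_per_class, n_vox))
data[labels == 1] += np.tile([0.5, -0.5], n_vox // 2) + 0.3

mean_only = data.mean(axis=1, keepdims=True)   # one feature per time point
pattern_only = data - mean_only                # mean-removed patterns

acc_mean, conf_mean = nearest_centroid_cv(mean_only, labels)
acc_pattern, conf_pattern = nearest_centroid_cv(pattern_only, labels)
print("mean only:   ", acc_mean, conf_mean.tolist())
print("pattern only:", acc_pattern, conf_pattern.tolist())
```

Because both sources pass through the same function, they are trained on the same proportion of the data and are scored by the same criterion, and their confusion matrices can be compared cell by cell.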

Mean differences

Once a classification of the mean has been conducted, further analyses can elucidate how multi-voxel and (if present) univariate information is similar or distinct. MVPA studies do not typically ask whether information in one source adds additional explanatory power to the other (i.e., whether the information completely or only partially overlaps), perhaps because the common GLM test is not amenable to this question. Analyzing the mean through a classification framework can allow this, however. One approach is to compare the confusion matrices produced from each classification (i.e., their respective patterns of classification errors) to contrast the basis for separation using patterns and means. Alternatively (or additionally), we can examine the benefit of using both sources together, as compared with each source alone. A direct contrast of mean-only and pattern-only accuracies cannot speak to whether their information is redundant or nonredundant (for example, both mean and pattern may classify at 70 %, but is this the same 70 %?; see Footnote 5), but we can examine whether decoding performance increases from adding the mean back onto the pattern and the symmetrical action of adding the pattern back onto this mean (where finding improvements in both symmetrical additions can help minimize any concern that might affect one).
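One way to sketch the redundancy question is to keep the per-time-point predictions from each source, so that accuracy with both sources can be compared against each source alone, and so that we can ask how many time points the two single sources get right in common. The simulation, the nearest-centroid rule, and the helper name `cv_predictions` are assumptions made for illustration. Here “both” is taken to mean the original data (pattern plus mean), which is only one possible way of combining the sources.

```python
import numpy as np

def cv_predictions(features, labels, n_folds=4):
    """Cross-validated nearest-centroid predictions, one per time point."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    folds = np.arange(len(labels)) % n_folds   # simple interleaved folds
    predicted = np.empty_like(labels)
    for fold in range(n_folds):
        train, test = folds != fold, folds == fold
        centroids = np.stack([features[train & (labels == c)].mean(axis=0)
                              for c in classes])
        dists = np.linalg.norm(features[test][:, None, :] - centroids[None], axis=2)
        predicted[test] = classes[dists.argmin(axis=1)]
    return predicted

# Simulated data (hypothetical): a balanced pattern plus a uniform shift.
rng = np.random.default_rng(1)
n_per_class, n_vox = 40, 30
labels = np.repeat([0, 1], n_per_class)
data = rng.normal(0.0, 1.0, size=(2 * n_per_class, n_vox))
data[labels == 1] += np.tile([0.5, -0.5], n_vox // 2) + 0.3

mean_only = data.mean(axis=1, keepdims=True)
pattern_only = data - mean_only
both = pattern_only + mean_only    # the mean added back onto the pattern

sources = [("mean", mean_only), ("pattern", pattern_only), ("both", both)]
correct = {name: cv_predictions(feats, labels) == labels for name, feats in sources}
acc = {name: hits.mean() for name, hits in correct.items()}

# The two symmetrical additions: what does each source gain from the other?
gain_over_pattern = acc["both"] - acc["pattern"]
gain_over_mean = acc["both"] - acc["mean"]

# Proportion of time points that both single sources classify correctly.
shared_correct = np.mean(correct["mean"] & correct["pattern"])
print(acc, gain_over_pattern, gain_over_mean, shared_correct)
```

If `shared_correct` is well below either single-source accuracy, the two sources are succeeding on partly different time points, which is the situation in which combining them would be expected to help.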

One important consideration for any observation of superior multivariate performance is that a classifier might be sensitive to a very slight mean-response difference that occurs systematically across the voxels (see Kragel et al., 2012, for a simulation showing increased sensitivity). While this would represent information in the BOLD activity, its form is closer to a univariate difference than to a typically considered multi-voxel pattern, in which some voxels have greater activity and others have less activity (Norman et al., 2006). If we wish to understand whether a multivariate advantage comes from a multi-voxel pattern or from a very small increase across voxels, one strategy is to visualize the average pattern for each condition and compare their relative amplitudes (i.e., does one condition’s “pattern” have systematically higher voxel values than the other?). A paired t-test can compare the voxel values for the conditions to help identify a systematic difference between patterns. Alternatively, some recent MVPA studies, cognizant of this possibility, offer several alternative strategies. Kohler and colleagues (2013) examined the time series of mean activation levels and multivariate decoding accuracy at the TR-level. If a particular multivariate classifier is drawing on increased sensitivity to a systematic univariate difference, the authors reasoned that we would not expect a longer duration of univariate activation compared to multivariate decoding. Instead, some regions showed exactly this, suggesting that decoding was not simply due to increased sensitivity. Using a different approach that can also bear on this question, Brants, Baeck, Wagemans, and Op de Beeck (2011) examined the spatial frequency of information by systematically varying the degree of spatial smoothing. In the current debate about whether MVPA can access columnar information, investigators have applied increasing spatial smoothing to interfere with high spatial frequency information, to observe whether decoding performance is impaired (which would suggest that a classifier relies on high spatial frequencies; Kohler et al., 2013; Misaki et al., 2013; Op de Beeck, 2010; although see also Kamitani & Sawahata, 2010). Brants and colleagues have examined several types of patterns, including a consistent slight response difference across many voxels (which they termed a “one-scale” pattern). They presented evidence that this kind of “pattern” would benefit more from smoothing than some alternative pattern structures. This approach therefore gives an opportunity to compare the pattern structures that underlie different stimulus distinctions (e.g., basic categories vs. subordinate distinctions; Brants et al., 2011). A recent investigation found individual differences in the spatial scale of information (Misaki et al., 2013), highlighting the importance of examining this at the individual level.
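The paired comparison of voxel values mentioned above can be sketched as follows. The two simulated cases, the noise levels, and the function name `paired_t_across_voxels` are hypothetical; the function returns only the t statistic and its degrees of freedom, and a p value could be obtained from the t distribution (for example, with `scipy.stats.ttest_rel`, which is not used here so that the sketch depends on NumPy alone).

```python
import numpy as np

def paired_t_across_voxels(pattern_a, pattern_b):
    """Paired t statistic over voxels for two condition-average patterns.

    Each input holds one value per voxel (the average pattern for one
    condition). Returns (t, degrees_of_freedom).
    """
    diff = np.asarray(pattern_a, dtype=float) - np.asarray(pattern_b, dtype=float)
    n = diff.size
    t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return t, n - 1

rng = np.random.default_rng(2)
n_vox = 50
base = rng.normal(0.0, 1.0, n_vox)      # hypothetical average pattern, condition B
noise = rng.normal(0.0, 0.05, n_vox)    # small measurement noise

# Case 1: condition A is a slight, uniform increase across all voxels.
uniform_a = base + 0.1 + noise
# Case 2: condition A differs by a balanced pattern (half up, half down).
balanced_a = base + np.tile([0.3, -0.3], n_vox // 2) + noise

t_uniform, df = paired_t_across_voxels(uniform_a, base)
t_balanced, _ = paired_t_across_voxels(balanced_a, base)
print(f"uniform shift:    t({df}) = {t_uniform:.2f}")
print(f"balanced pattern: t({df}) = {t_balanced:.2f}")
```

A large t in the first case flags a systematic difference in voxel values between the conditions, whereas the balanced pattern yields a t near zero even though the two conditions are clearly distinguishable from their patterns.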

Summary

Many of the questions and analyses discussed in the text are organized in Fig. 1 for convenience.

Fig. 1

A flowchart of possible analysis questions. The questions in the figure are not exhaustive but outline some avenues of investigation that can be taken with pattern-analyzed data. The asterisks indicate that a very slight response difference across all voxels can also drive classification performance (see the text for more details and a discussion of assessing this). MVP, multi-voxel pattern.

The many possible formats of information within a region’s fMRI activity can prompt a range of interesting questions. In this article, I have discussed some of the issues, questions, and methodological approaches to examining the role of mean activation in MVPA investigations. The many alternative explanations for an underlying mean-activation difference between conditions warrant caution. An important direction for the field will be to consider empirical methods for distinguishing among alternatives. When an understanding of the relative contributions of multi-voxel and univariate sources is desired, a classification framework can be applied to mean responses, to give a number of benefits and possible subsequent analyses. With the continuing popularity and success of MVPA, it will be increasingly important to gain a better understanding of how multi-voxel and mean activation information relate to one another.