Significance Statement
Inverted encoding models (IEMs) are a powerful tool for reconstructing population-level stimulus representations from aggregate measurements of neural activity (e.g., fMRI or EEG). In a recent report, Liu et al. (2018) tested whether IEMs can provide information about the underlying tuning of single units. Here, we argue that using stimulus reconstructions to infer properties of single neurons, such as neural tuning bandwidth, is an ill-posed problem with no unambiguous solution. Instead of interpreting results from these methods as evidence about single-unit tuning, we emphasize the utility of these methods for assaying population-level stimulus representations. These can be compared across task conditions to better constrain theories of large-scale neural information processing across experimental manipulations, such as changing sensory input or attention.
Neuroscience methods span an enormous range of scales. In some experiments, we record subthreshold membrane potentials in individual neurons, while in others we measure the aggregate responses of thousands of neurons at the millimeter scale. A central goal in neuroscience is to bridge insights across all scales to understand the core computations underlying cognition (Churchland and Sejnowski, 1988). However, inferential problems arise when moving across scales: single-unit response properties cannot be inferred from fMRI activation in single voxels, subthreshold membrane potential cannot be inferred from extracellular spike rate, and the state of single ion channels cannot be inferred from intracellular recordings. These are all examples of an inverse problem, in which an observation at a larger scale is consistent with an enormous number of possible underlying states at a smaller scale.
Recent analytical advances have circumvented challenges inherent in inverse problems by instead transforming aggregate signals from their native “measurement space” (e.g., activation pattern across fMRI voxels) into a model-based “information space” (e.g., activity level of modeled information channels). To make this inference possible, aggregate neural signals (fMRI voxel activation or EEG electrode activity) are modeled as a combination of feature-selective information channels, each with a defined sensitivity profile consistent with the single-unit literature (e.g., experimenter-defined tuning to a particular orientation; Fig. 1A; Brouwer and Heeger, 2009, 2011). When an aggregate neural signal is described with such an encoding model, it is possible to invert the model to infer the activity of each channel given a new pattern of neural activity [hence, these methods are often called inverted encoding models (IEMs); Sprague et al., 2015]. Importantly, rather than attempting to solve the inverse problem (how do single units respond?), this method makes simplifying assumptions that enable the transformation of one population-level measurement (aggregate neural signals in voxel or electrode space) into another (stimulus representations in “channel space”). These reconstructed “channel response functions” enable visualization, quantification, and comparison of population-level stimulus representations across manipulations of task conditions (Brouwer and Heeger, 2011, 2013; Scolari et al., 2012; Garcia et al., 2013; Sprague and Serences, 2013; Foster et al., 2017).
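In its simplest linear form, this logic can be written compactly (our notation, after Brouwer and Heeger, 2009; the specific basis functions and estimators vary across studies). Let $B$ be the $v \times n$ matrix of responses of $v$ voxels on $n$ trials, $C$ the $k \times n$ matrix of predicted responses of $k$ modeled channels on those trials, and $W$ the $v \times k$ matrix of channel-to-voxel weights. The encoding model, its least-squares fit on training data, and its inversion on test data are then:

$$B = WC, \qquad \hat{W} = B_{\mathrm{train}}\,C_{\mathrm{train}}^{\top}\left(C_{\mathrm{train}}\,C_{\mathrm{train}}^{\top}\right)^{-1}, \qquad \hat{C}_{\mathrm{test}} = \left(\hat{W}^{\top}\hat{W}\right)^{-1}\hat{W}^{\top}B_{\mathrm{test}}.$$

The recovered $\hat{C}_{\mathrm{test}}$ is the channel response function: a population-level representation expressed in the modeled feature space, not an estimate of any single neuron's tuning curve.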
Recently, Liu et al. (2018) examined whether an IEM applied to fMRI data can be used to unambiguously infer the underlying response properties of single units. To this end, they manipulated the contrast of oriented gratings, because contrast affects the amplitude of single-unit orientation tuning functions but not their tuning width (Sclar and Freeman, 1982). The authors reasoned that if the width of single-unit tuning functions does not change with stimulus contrast, and if population-level feature reconstructions derived from aggregate neural signals can be used to make meaningful inferences about single-unit tuning, then manipulating contrast should not change the width of population-level channel response functions.
To test this prediction, the authors used an IEM to reconstruct representations of grating orientation at two different contrast levels. They modeled voxel responses as a weighted sum of neural channels tuned to different orientations, based on known visual response properties (Fig. 1A). After extracting activation patterns from visual cortex, the authors split the data from each contrast condition into a training set, used to estimate how each modeled neural channel contributes to each voxel (Fig. 1B), and a testing set, used in conjunction with the best-fit model from the training set to compute channel response functions (Fig. 1C).
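For readers who prefer code, the two-stage procedure can be sketched as follows. This is a minimal NumPy illustration of the general IEM recipe, assuming a raised-cosine channel basis; the exact basis functions, channel count, and preprocessing used by Liu et al. (2018) may differ.

```python
import numpy as np

def make_basis(n_channels=9, power=8, n_ori=180):
    """Raised-cosine channel tuning curves over orientation (circular
    over 180 deg). One common choice of basis; parameter values are
    illustrative, not necessarily those of Liu et al. (2018)."""
    theta = np.arange(n_ori)                               # orientation axis (deg)
    centers = np.linspace(0, 180, n_channels, endpoint=False)
    delta = np.deg2rad(2.0 * (theta[None, :] - centers[:, None]))
    return (0.5 + 0.5 * np.cos(delta)) ** power            # (k channels, 180)

def design_matrix(stim_oris_deg, basis):
    """Predicted channel responses for each trial's presented
    orientation (assumes integer orientations in degrees)."""
    return basis[:, np.asarray(stim_oris_deg) % 180]       # (k channels, n trials)

def train_iem(B_train, C_train):
    """Fig. 1B: estimate channel-to-voxel weights W (v x k) by least
    squares, modeling B_train (v x n) = W @ C_train (k x n)."""
    W_T, *_ = np.linalg.lstsq(C_train.T, B_train.T, rcond=None)
    return W_T.T

def invert_iem(B_test, W):
    """Fig. 1C: invert the fitted model to reconstruct channel response
    functions (k x n) from held-out voxel patterns B_test (v x n)."""
    C_hat, *_ = np.linalg.lstsq(W, B_test, rcond=None)
    return C_hat
```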
The authors found that reconstructed channel response functions in visual cortex were “broader” for low-contrast gratings than for high-contrast gratings (Liu et al., 2018, their Figs. 2–4), which they suggest could be interpreted as evidence that single-unit orientation tuning width depends on stimulus contrast. However, because this observation conflicts with demonstrations from single-unit physiology that orientation tuning is contrast-invariant, Liu et al. (2018) sought to resolve this discrepancy using simulations.
The authors simulated cortical fMRI data under different conditions to assess how changes in single-unit responses might be reflected in reconstructed channel response functions. Each simulated voxel’s response was modeled as a noisy weighted sum of orientation-tuned neurons, each with a different orientation preference (Liu et al., 2018, their Fig. 3). Across runs of their simulations, the authors manipulated simulated response properties, such as the orientation tuning width of the constituent model neurons and the signal-to-noise ratio (SNR) of the voxel response. They found that by decreasing the response amplitude of each simulated neuron (thus decreasing SNR) without changing tuning width, they could almost exactly reproduce the broadening of the channel response function observed when stimulus contrast was decreased (Liu et al., 2018, their Fig. 4). Interestingly, they also found that changes in modeled neural tuning width could alter the width of channel response functions. However, because such broadening is consistent with either a change in SNR or a change in neural tuning width, the authors conclude that it remains impossible to conclusively infer how changes in channel response functions relate to changes in neural tuning. Because it is plausible that low-contrast stimuli evoke weak, noisy responses relative to high-contrast stimuli, the authors argue that lowered SNR is a more parsimonious explanation for their observed data than inferring that single-unit tuning properties change with contrast, which would overturn well-characterized results from the animal physiology literature. Accordingly, the authors concluded that “changes in channel response functions do not necessarily reflect changes in underlying neural selectivity” (Liu et al., 2018, p 404).
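A toy version of this simulation logic is sketched below: each voxel pools many orientation-tuned neurons whose tuning width is held fixed, and only the response amplitude (and hence SNR) is varied. All parameter values and the random pooling scheme are placeholders chosen for illustration, not the values used by Liu et al. (2018).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_voxels(stim_oris_deg, n_voxels=50, n_neurons=180,
                    kappa=4.0, amplitude=1.0, noise_sd=0.5):
    """Each voxel is a noisy weighted sum of orientation-tuned neurons.
    kappa sets neural tuning width (larger = narrower); amplitude scales
    the signal relative to fixed additive noise, i.e., SNR. Returns a
    (n_voxels x n_trials) matrix, matching the IEM sketch above."""
    prefs = np.linspace(0, 180, n_neurons, endpoint=False)
    delta = np.deg2rad(2.0 * (np.asarray(stim_oris_deg)[:, None] - prefs))
    rates = amplitude * np.exp(kappa * (np.cos(delta) - 1.0))   # (n trials, neurons)
    pool = rng.random((n_neurons, n_voxels))                    # fixed random pooling
    signal = (rates @ pool).T                                   # (voxels, trials)
    return signal + rng.normal(0.0, noise_sd, signal.shape)

# "High contrast": strong signal. "Low contrast": identical tuning width
# (kappa unchanged), weaker signal, therefore lower SNR.
oris = rng.integers(0, 180, size=200)
B_high = simulate_voxels(oris, amplitude=1.0)
B_low  = simulate_voxels(oris, amplitude=0.3)
# Feeding B_high and B_low through train_iem/invert_iem (with
# design_matrix(oris, make_basis())) can broaden the low-SNR channel
# response function even though neural tuning width never changed.
```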
This report makes an important contribution in its dissection of how model-based analysis methods can be sensitive to features of the data that might vary across conditions (e.g., SNR), and clearly demonstrates that changes in population-level channel response functions cannot and should not be used to infer changes in unit-level neural tuning properties. However, we would like to emphasize that this is not the intended purpose of the IEM approach, which is designed to assess population-level stimulus representations. Any inferences made about single-unit tuning from channel response functions are plagued by the same pitfalls encountered when attempting reverse inference about single-unit neural signals from aggregate measurements.
These issues are not unique to the IEM technique. For example, they also complicate the interpretation of results from popular voxel receptive field (vRF) techniques. In these experiments, stimuli traverse the entire visual display while experimenters measure fMRI responses; the experimenters then fit an RF model that best describes how each voxel responds given the visual stimulus (Dumoulin and Wandell, 2008; Wandell and Winawer, 2015). Recent studies have demonstrated that changing task demands (e.g., the locus of spatial attention) can change the shape and preferred position of vRFs (Sprague and Serences, 2013; Klein et al., 2014; Kay et al., 2015; Sheremata and Silver, 2015; Vo et al., 2017). While it is tempting to infer that single-neuron RFs change accordingly, it could instead be the case that each neuron maintains a stable RF but that different neurons are subject to different amounts of response gain, altering the voxel-level spatial sensitivity profile measured with these techniques. Moreover, because aggregate measurements like fMRI pool over neurons of different types (excitatory vs inhibitory), selectivity widths (narrow vs broad), and cortical layers (e.g., layer IV vs layer II/III), the ability to make inferences about single-unit encoding properties is further limited.
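To make the analogy concrete, a bare-bones version of vRF fitting looks like the sketch below. It is deliberately simplified (isotropic Gaussian RF, coarse grid search, no hemodynamic convolution) and is not the full method of Dumoulin and Wandell (2008); all function and parameter names are our own.

```python
import numpy as np
from itertools import product

def prf_prediction(stim_masks, x0, y0, sigma, xg, yg):
    """Predicted voxel time course: overlap of each binary stimulus
    aperture (t x H x W) with an isotropic 2D Gaussian RF centered at
    (x0, y0) with width sigma, evaluated on meshgrids xg, yg."""
    rf = np.exp(-((xg - x0) ** 2 + (yg - y0) ** 2) / (2.0 * sigma ** 2))
    return stim_masks.reshape(len(stim_masks), -1) @ rf.ravel()

def fit_prf(voxel_ts, stim_masks, xg, yg, xs, ys, sigmas):
    """Grid search for the (x0, y0, sigma) whose predicted time course
    best correlates with the measured voxel time course."""
    best, best_r = None, -np.inf
    for x0, y0, s in product(xs, ys, sigmas):
        pred = prf_prediction(stim_masks, x0, y0, s, xg, yg)
        r = np.corrcoef(pred, voxel_ts)[0, 1]
        if r > best_r:
            best, best_r = (x0, y0, s), r
    return best, best_r
```

The same caveat applies here: a shift in the best-fitting (x0, y0, sigma) across attention conditions describes the voxel-level sensitivity profile and is consistent with many different underlying single-neuron scenarios.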
The report by Liu et al. (2018) also highlights the importance of considering how an encoding model is estimated when comparing channel response functions across conditions. In their work, Liu et al. (2018) estimated separate encoding models for each contrast condition (Fig. 1B). But because SNR likely differed between conditions, the observed differences between reconstructions may result from differences in the training sets (i.e., different model fits), from differences in the testing sets (i.e., different reconstructed activation patterns), or from a combination of the two. More generally, this training scheme can pose a problem for researchers who wish to minimize the effect of known SNR differences between conditions in order to study some other variable (e.g., the effect of attention). With separate models, it is not possible to unambiguously attribute changes in reconstructed channel response functions to changes in the quality of the model fit or to changes in the quality of the representation supported by the population activity pattern, both of which can differ between conditions. This problem is roughly akin to reporting a change in a ratio, which can result from a change in the numerator, the denominator, or both. One way that others have mitigated this issue is by estimating an encoding model (Fig. 1B) using an unbiased training set (equal numbers of trials from each relevant condition) or a neutral one (an entirely separate task used solely for model estimation). They then apply that single “fixed” encoding model to test data from multiple stimulus conditions to reconstruct stimulus representations from each condition. This implementation avoids the problem of comparing channel outputs from different IEMs, so that the only difference between conditions is the data used for stimulus reconstruction (Fig. 1C), as sketched in the code below. We note that even with such a procedure the central result of Liu et al. (2018) could remain true: reconstructions under a fixed encoding model could still broaden with lower contrast. But, as discussed above, this would reflect a change in the quality of the population-level representation rather than provide unambiguous evidence for a change in the underlying tuning of individual units. When interpreting results from IEM analyses, it is always critical to consider how the model was estimated.
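In code, the fixed-model scheme amounts to fitting one set of weights on condition-balanced training data and inverting that same model for held-out data from every condition. The snippet reuses train_iem and invert_iem from the earlier sketch; the variable names (B_high, C_high, etc.) and the half-split are illustrative assumptions, not the procedure of any particular study.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_half(n_trials, rng):
    """Random half-split of trial indices into train and test sets."""
    idx = rng.permutation(n_trials)
    return idx[: n_trials // 2], idx[n_trials // 2 :]

# B_high/B_low: voxel patterns (v x n); C_high/C_low: channel
# predictions (k x n), e.g., from design_matrix() above.
tr_h, te_h = split_half(B_high.shape[1], rng)
tr_l, te_l = split_half(B_low.shape[1], rng)

# One "fixed" model, estimated from equal numbers of trials per condition
n_bal = min(len(tr_h), len(tr_l))
W = train_iem(np.hstack([B_high[:, tr_h[:n_bal]], B_low[:, tr_l[:n_bal]]]),
              np.hstack([C_high[:, tr_h[:n_bal]], C_low[:, tr_l[:n_bal]]]))

# The identical model is inverted for held-out data from each condition,
# so any difference between reconstructions reflects the test data alone
chan_high = invert_iem(B_high[:, te_h], W)
chan_low  = invert_iem(B_low[:, te_l], W)
```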
It would be a mistake to conclude from Liu et al. (2018) that the IEM technique is not useful in the context of its intended purpose: to assay properties of large-scale, population-level neural representations. The quality of these large-scale representations surely depends on myriad factors occurring at the single-unit level. It remains a fascinating question to evaluate how single measurement units, at either the neural or voxel level, change their response properties across visual and task manipulations, but the goal of the IEM approach is to assay the net effect of all these modulations on the superordinate population-level representation. Moreover, few behaviors are guided by single neurons in isolation, and so assaying the joint activity of many neurons, and the resulting population-level representations, is necessary to gain insight into the neural underpinnings of cognition (Jazayeri and Movshon, 2006; Ma et al., 2006; Graf et al., 2011). Indeed, IEMs have been used to assay the time course of covert attention (Foster et al., 2017), understand the consequences of attentional manipulations within working memory (Sprague et al., 2016; Rahmati et al., 2018), evaluate how allocation of attention impacts the representation of irrelevant visual stimuli across the visual field (Sprague and Serences, 2013; Vo et al., 2017; Sprague et al., 2018), and probe the influence of top-down expectations on sensory stimulus representations (Myers et al., 2015; Kok et al., 2017).
We do not believe aggregate neural signals will ever be useful for unambiguously inferring single-unit response properties, including feature tuning. However, we see a bright future for collaborative efforts across labs studying similar questions in different model systems, such as human and macaque. When experiments are well-matched between species, both aggregate measurements in humans and single-unit responses in model systems can be used to inform our understanding of neural coding across different cognitive states. In bridging different levels of analysis, Liu et al. (2018) add to the growing literature using data-driven simulations to better understand the relationship between tuning properties and population-level feature representations (Sprague and Serences, 2013; Kay et al., 2015; Vo et al., 2017). Most importantly, their report underscores the importance of avoiding inferences about signal properties, such as single-unit neural feature tuning, that are fundamentally inaccessible via fMRI or EEG, even when using state-of-the-art acquisition and analysis techniques. We hope that future studies take these issues into account when interpreting findings from model-based analyses applied to aggregate measurement tools like fMRI and EEG. Finally, we remain optimistic that the IEM technique, when applied carefully and interpreted appropriately, will continue to reveal how experimental manipulations impact population-level representations of information.
Acknowledgments
We thank Clayton Curtis, Edward Ester, and John Serences for comments on early drafts of this manuscript and for useful discussions.
Footnotes
The authors declare no competing financial interests.
This work was supported by National Eye Institute (NEI) Grant F32-EY028438 (to T.C.S.), a National Science Foundation Graduate Research Fellowship (to V.A.V.), NEI Grant R01-EY016407 (to M.R.), and National Institute of Mental Health Grant 2R01-MH087214-06A1 (to K.C.S.A., J.J.F., and D.W.S.).
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.