Abstract
The mammalian visual system consists of several anatomically distinct areas, layers, and cell types. To understand the role of these subpopulations in visual information processing, we analyzed neural signals recorded from excitatory neurons in various anatomical and functional structures. For each of 186 mice, one of six genetically tagged cell types and one of six visual areas were targeted while the mouse passively viewed various visual stimuli. We trained linear classifiers to decode one of six visual stimulus categories with distinct spatiotemporal structures from the population neural activity. We found that neurons in both the primary visual cortex and secondary visual areas show varying degrees of stimulus-specific decodability, and that neurons in superficial layers tend to be more informative about the stimulus categories. Additional decoding analyses of directional motion were consistent with these findings. We observed synergy in the population code of direction in several visual areas, suggesting area-specific organization of information representation across neurons. These differences in decoding capacity shed light on the specialized organization of neural information processing across anatomically distinct subpopulations and further establish the mouse as a model for understanding visual perception.
Significance Statement
This analysis is one of the first of the Allen Brain Observatory’s visual cortex dataset. The mouse has recently emerged as a powerful alternative to primates and carnivores as a model for studying visual perception. Mice offer the benefit of large-scale, high-throughput experiments and sophisticated genetic tools for investigating highly specific components of visual perception. Preliminary work identifying the functional organization of mouse extrastriate areas has focused on single neurons and lacks analysis at the population level. Our population decoding analysis contributes novel evidence about the role of many distinct areas and layers of the mouse visual cortex in visual information processing, further establishing the mouse as a viable model for future visual system research.
Introduction
Although the mouse has long been neglected as a model for studying neural visual information processing, it has recently emerged as a powerful alternative to primate and carnivore models. Mice offer the benefit of large-scale, high-throughput experiments and sophisticated genetic tools for investigating highly specific components of visual perception (Arenkiel and Ehlers, 2009). However, the use of mice for studying visual perception is currently limited by insufficient knowledge of the functional organization of the mouse visual cortex. Thus, we aim to characterize the population neural code associated with the cortical organization of visual information processing.
Visual information is thought to be processed in a series of computations as it travels from the retina to the lateral geniculate nucleus and then through a series of visual cortices (Nassi and Callaway, 2009). The early visual system processes complex visual stimuli through the simultaneous encoding of different stimulus attributes, such as direction, orientation, and spatial and temporal frequency, by individual neurons, while higher-order visual cortices process nonlinear features (Orban, 2008). If we can build a simple population decoder to read out the information made accessible by the neural population (Fig. 1), we can gain insight into which of these features are encoded in specific populations of neurons (Graf et al., 2011).
The global topographic organization of the mouse visual cortex has been well characterized. Recent studies have retinotopically identified at least 10 visual areas with organized and complete representations of the entire visual field (Wang and Burkhalter, 2007; Marshel et al., 2011; Garrett et al., 2014). However, the neural population code, how information is collectively represented in the neural activity, has remained elusive. While progress has been made in identifying differences between the spatiotemporal information encoded by neurons in different visual areas, prior work has focused on single neurons and lacks analysis at the population level (Andermann et al., 2011; Marshel et al., 2011; Juavinett and Callaway, 2015). By decoding neural responses in large neural populations of 186 mice spanning six visual areas, we aim to better understand population coding in the mouse visual cortex.
Given neural responses from populations of just over one hundred visual cortical neurons, linear classifiers achieve high accuracy in two decoding tasks: one with six stimulus classes with complex spatiotemporal features and one with eight drifting grating directions. We found differential decoding accuracy between the primary (VISp), lateral (VISl), anterolateral (VISal), anteromedial (VISam), posteromedial (VISpm), and rostrolateral (VISrl) visual areas, which implies differential information representation in these visual areas. We also found differences between populations from different cortical depths, with superficial-layer populations containing more information than those from deeper layers. Moreover, we found evidence that directional tuning in individual neurons does not necessarily predict the population decoding accuracy, suggesting a distributed representation of information. These results reveal novel evidence about the cortical organization of visual information processing.
Materials and Methods
Dataset
We analyzed data from the Allen Brain Observatory, downloaded on July 3, 2017, using AllenSDK version 0.13.2. A full description of the Allen Brain Observatory’s data collection methodology is available in its Visual Coding Overview and Visual Stimuli technical whitepapers (Allen Institute for Brain Science, 2017). In brief, the Allen Brain Observatory recorded in vivo two-photon calcium imaging data at 30 Hz over a 400-µm field of view at a resolution of 512 × 512 pixels. We used data from 186 of the 216 mice imaged by the Allen Brain Observatory.
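For readers who wish to retrieve the same data, the sketch below shows one way to query the Allen Brain Observatory through the AllenSDK; the targeted structure and Cre line in the query are illustrative, and get_dff_traces() is shown only to illustrate the access pattern (our own ΔF∕F computation is described under Pre-processing).

```python
# Sketch of data access through the AllenSDK (we used version 0.13.2).
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

boc = BrainObservatoryCache(manifest_file='boc/manifest.json')

# An experiment container groups the three imaging sessions (A, B, C/C2)
# recorded from one field of view in one mouse; the filters are examples.
containers = boc.get_experiment_containers(targeted_structures=['VISp'],
                                           cre_lines=['Cux2-CreERT2'])

# Fetch all sessions of the first container and pull fluorescence traces.
experiments = boc.get_ophys_experiments(
    experiment_container_ids=[containers[0]['id']])
for exp in experiments:
    data_set = boc.get_ophys_experiment_data(exp['id'])
    timestamps, dff = data_set.get_dff_traces()  # traces: neurons x frames
```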
Recent studies have identified aberrant cortical activity in GCaMP6-expressing transgenic mouse lines, particularly in Emx1-IRES-Cre, a line included in the Allen Brain Observatory dataset (Steinmetz et al., 2017). By screening somatosensory cortex epifluorescence movies before imaging and analyzing visual cortex two-photon calcium recordings after imaging, the Allen Brain Observatory detected aberrant activity resembling epileptiform interictal events in 10 Emx1-IRES-Cre mice and seven Rbp4-Cre_KL100 mice. Data recorded from these 17 mice were excluded from our analysis. In addition, data from 12 mice were discarded because fewer than 10 common neurons were recorded across the three visual stimulus sessions. Lastly, data from one additional mouse were discarded due to a large number of missing values, resulting in a total of 186 mice with viable data. The sizes (Tables 1–4) and Cre lines (Tables 5, 6) of the populations varied among the targeted visual areas and depths.
A set of synthetic and natural stimuli, comprising (1) drifting gratings, (2) static gratings, (3) locally sparse noise, (4) natural images, (5) natural movies, and (6) spontaneous activity (mean luminance gray), was displayed on an ASUS PA248Q LCD monitor at a resolution of 1920 × 1200 pixels (de Vries et al., 2018). Spherical warping was applied to all stimuli to account for the close viewing angle. The monitor was positioned 15 cm from the right eye of awake head-fixed mice, spanning 120° × 95° of visual space without accounting for the spherical warping. The stimuli were distributed into three sessions, A, B, and C (or C2), which were presented over 3 d. The natural movie and spontaneous stimuli were presented in all sessions. Drifting gratings were presented in session A, static gratings and natural images in session B, and locally sparse noise in session C/C2. Session types C and C2 both contained the four-degree locally sparse noise stimulus (16 × 28 array of 4.65° patches). Session C2 also contained the eight-degree locally sparse noise stimulus (8 × 14 array of 9.3° patches), which was discarded from analysis because it was shown to only a subset of mice.
The static and drifting gratings stimuli were presented at a variety of orientations, spatial frequencies, and temporal frequencies. The static gratings stimulus comprised gratings presented at six orientations (separated by 30°), five spatial frequencies (0.02, 0.04, 0.08, 0.16, or 0.32 cycles/degree), and four phases (0, 0.25, 0.5, or 0.75). Each static grating condition was presented 50 times in a random order, with a duration of 0.25 s per condition. The drifting gratings stimulus comprised 40 grating conditions, each a combination of one of eight directions (separated by 45°) and one of five temporal frequencies (1, 2, 4, 8, or 15 Hz) at a spatial frequency of 0.04 cycles/degree. Each drifting grating condition was presented 15 times in a random order, with a duration of 2 s per condition followed by 1 s of mean luminance gray.
Pre-processing
The neural signal was quantified as the fluorescence fluctuation ΔF∕F, calculated for each frame $t$ as $\Delta F/F(t) = (F(t) - F_0)/F_0$, where the baseline $F_0$ is the mean fluorescence of the preceding 1 s. For each of the 186 neural populations, 3 h of ΔF∕F traces were separated into stimulus epochs.
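A minimal sketch of this calculation, assuming a raw fluorescence trace sampled at the 30-Hz imaging rate (the function name and trailing-window implementation are ours):

```python
import numpy as np

def dff_trace(f, fps=30):
    """Compute dF/F per frame with baseline F0 equal to the mean
    fluorescence of the preceding 1 s; f is a 1-D raw fluorescence
    trace for a single neuron."""
    win = fps  # 1 s of frames
    dff = np.full(len(f), np.nan)
    for t in range(win, len(f)):
        f0 = f[t - win:t].mean()
        dff[t] = (f[t] - f0) / f0
    return dff
```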
To form samples for the stimulus classification, each epoch was divided into 10-s intervals, of which the final interval was discarded if it was <10 s. Neural populations used in the stimulus classification were composed of neurons common across the three imaging sessions A, B, and C (or C2) for each mouse (Tables 1, 2). For each 10-s interval, the mean fluorescence fluctuation per neuron was calculated and labeled with the corresponding stimulus class.
To form samples for the direction classification, the drifting gratings epoch was divided into 3-s intervals, of which the third second (during which a blank sweep of mean luminance gray was presented) was discarded. Neural populations used in the direction classification were composed of all neurons imaged during session A and thus were larger than the populations used in the stimulus classification (Tables 3, 4). For each 2-s interval, the mean fluorescence fluctuation per neuron was calculated and labeled with the corresponding grating direction.
In both the stimulus and the direction decoding, the mean ΔF∕F values for each neuron were z-scored and combined to form neural feature vectors in $\mathbb{R}^n$ for classification, where $n$ is the number of neurons in the population.
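The feature construction can be sketched as follows, assuming a ΔF∕F matrix (neurons × frames) and a list of interval boundaries with their labels (variable names are ours; the actual epoch bookkeeping follows the stimulus tables):

```python
import numpy as np
from scipy.stats import zscore

def build_features(dff, intervals, labels):
    """Average dF/F per neuron over each interval, then z-score each
    neuron across samples to form the feature matrix."""
    # One row per interval, one column per neuron.
    X = np.stack([dff[:, start:stop].mean(axis=1)
                  for start, stop in intervals])
    X = zscore(X, axis=0)  # z-score each neuron across samples
    return X, np.asarray(labels)
```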
Neural decoding
We used linear classifiers to decode the stimulus classes based on the neural feature vectors. The classifiers were implemented in the Python programming language using the scikit-learn machine learning library version 0.19.0 (Pedregosa et al., 2011). Linear support vector machine (SVM) and multinomial logistic regression (MLR) classifiers were trained and tested with a nested cross-validation scheme: the data were first split into training and test sets to form 5-fold cross-validated predictions.
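A minimal sketch of the two decoders in scikit-learn follows, assuming the feature matrix X and labels y from the pre-processing step; the grid of regularization constants is illustrative, and for simplicity the sketch re-optimizes the constant within each outer fold rather than fixing it from the first training set as described below:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Inner cross-validation selects the regularization constant;
# the outer 5-fold cross-validation estimates decoding accuracy.
grid = {'C': np.logspace(-3, 3, 7)}
svm = GridSearchCV(LinearSVC(), grid, cv=5)
mlr = GridSearchCV(LogisticRegression(multi_class='multinomial',
                                      solver='lbfgs', max_iter=1000),
                   grid, cv=5)

svm_acc = cross_val_score(svm, X, y, cv=5).mean()
mlr_acc = cross_val_score(mlr, X, y, cv=5).mean()
```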
In Figures 2–7, we show only SVM classification results for simplicity. However, all analyses were performed with both SVM and MLR classification, which yielded similar results (Fig. 8).
Because of the different durations of stimulus presentations, the stimulus classes had unbalanced numbers of samples. To build balanced training sets, we subsampled (without replacement) an equal number of responses from each class. The size of these subsamples was equal to 80% of the smallest class (spontaneous activity; 20 min out of a total of 177 or 156 min of recording used in samples, depending on whether the mouse was shown session C or C2). The test sets consisted of the remaining samples and were kept unbalanced.
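One way to implement this balancing (the helper below is ours):

```python
import numpy as np

def balanced_train_split(y, frac=0.8, rng=np.random):
    """Subsample, without replacement, an equal number of samples per
    class for training (80% of the smallest class); the remaining
    samples form the unbalanced test set."""
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = int(frac * counts.min())
    train = np.concatenate([
        rng.choice(np.where(y == c)[0], n_per_class, replace=False)
        for c in classes])
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test
```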
The direction classes used in the direction decoding were evenly distributed throughout the stimulus presentation. The direction samples were randomly split into training (80%) and test (20%) sets for all classifications. The training set was assumed to be balanced due to the even distribution of classes throughout data collection.
Both classifiers were regularized using an additive ℓ2 penalty of the form $\lambda \lVert w \rVert_2^2$, where $w$ denotes the classifier weights. The regularization constant $\lambda$ was optimized through a nested cross-validation within the first training set, where the $\lambda$ that yielded the highest accuracy was chosen.
Subsampled population
To investigate the scaling of decoding performance as a function of population size, we made random subsamples (without replacement) of different sizes up to the number of neurons available for each mouse. We repeated the procedure 10 times to form 10 resampled subpopulations and report accuracy values averaged over the 10 resampled datasets. The statistics of population sizes by group or decoding task can be found in Tables 1–4.
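The subsampling procedure can be sketched as below, where fit_and_score stands for any routine (such as the cross-validated decoders above) that returns an accuracy for a given samples × neurons matrix:

```python
import numpy as np

def subsample_accuracy(X, y, sizes, fit_and_score, n_repeats=10,
                       rng=np.random):
    """Decode from random neuron subsets of each size and average
    accuracy over the repeated resamples."""
    n_neurons = X.shape[1]
    curves = {}
    for size in sizes:
        accs = [fit_and_score(X[:, rng.choice(n_neurons, size,
                                              replace=False)], y)
                for _ in range(n_repeats)]
        curves[size] = np.mean(accs)
    return curves
```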
To investigate the information carried by the joint population activity, we trained “correlation-blind” decoders with the same procedure but on a shuffled dataset in which the joint activity across neurons was made approximately independent. To generate the shuffled data, we randomly permuted, for each neuron, the trials corresponding to the same target.
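A sketch of the shuffling step (function name ours): for each neuron independently, trials sharing a label are permuted, preserving single-neuron tuning while destroying noise correlations.

```python
import numpy as np

def shuffle_within_class(X, y, rng=np.random):
    """Permute, for each neuron independently, the trials that share
    the same target label (approximately removes noise correlations)."""
    Xs = X.copy()
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        for j in range(X.shape[1]):  # each neuron separately
            Xs[idx, j] = X[rng.permutation(idx), j]
    return Xs
```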
Accuracy curve fitting
To extrapolate the accuracy as a function of population size $n$, we used a generalized logistic function of the form
$$\mathrm{acc}(n) = c + \frac{b - c}{1 + e^{-an}}, \qquad (1)$$
with three parameters $\{a, b, c\}$ under the constraints $a \geq 0$, $c \geq 0$, and $b \in [0,1]$. Note that the $c$ parameter allows a minimum accuracy expected from chance-level performance at small population sizes. We fit the curve to the average accuracies obtained by subsampling using nonlinear least squares (van der Walt et al., 2011).
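The fit can be reproduced along the following lines with SciPy, taking the accuracy-versus-size curve from the subsampling sketch above; the parameterization of the logistic mirrors Equation 1 as reconstructed here, and the initial guesses are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def gen_logistic(n, a, b, c):
    """Three-parameter logistic with rate a >= 0, asymptote b in [0, 1],
    and chance-level floor c >= 0 (Eq. 1)."""
    return c + (b - c) / (1 + np.exp(-a * n))

sizes = np.array(sorted(curves))                # from the subsampling step
accs = np.array([curves[s] for s in sizes])
params, _ = curve_fit(gen_logistic, sizes, accs,
                      p0=[0.05, 0.9, 1.0 / 6],  # illustrative initial guess
                      bounds=([0, 0, 0], [np.inf, 1, 1]))
acc_128 = gen_logistic(128, *params)            # extrapolate to 128 neurons
```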
Statistical tests
To compare accuracy between cortical areas and imaging depths, we performed Tukey’s test at a 0.05 significance level (Tukey, 1949). Tukey’s test compares the mean accuracies of every pair with adjustment for multiple comparisons. Ten imaging depths (175, 265, 275, 300, 325, 335, 350, 365, 375, and 435 µm) were sorted into four groups: 175, 265–300, 325–350, and 365–435 µm. We compared the six visual cortical areas (VISp, VISpm, VISl, VISal, VISam, and VISrl), four imaging depth groups, and six stimulus classes.
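These pairwise comparisons can be reproduced with the Tukey honest significant difference implementation in statsmodels; the variable names below (accuracies, area_labels) are placeholders for the per-population decoding accuracies and their group labels:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Compare the mean accuracy of every pair of visual areas with
# adjustment for multiple comparisons at the 0.05 level.
result = pairwise_tukeyhsd(np.asarray(accuracies),
                           np.asarray(area_labels), alpha=0.05)
print(result.summary())
```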
Orientation and direction selectivity
The neural activity recorded during the session A drifting gratings stimulus was used to calculate the orientation selectivity index (OSI) and direction selectivity index (DSI) for each neuron. We obtained OSI and DSI using the AllenSDK Drifting Gratings module:
$$\mathrm{OSI} = \frac{R_{\mathrm{pref}} - R_{\mathrm{orth}}}{R_{\mathrm{pref}} + R_{\mathrm{orth}}}, \qquad (2)$$
$$\mathrm{DSI} = \frac{R_{\mathrm{pref}} - R_{\mathrm{null}}}{R_{\mathrm{pref}} + R_{\mathrm{null}}}, \qquad (3)$$
where $R_{\mathrm{pref}}$ is the mean response to the preferred orientation at the preferred temporal frequency, $R_{\mathrm{orth}}$ is the mean response to the orthogonal directions, and $R_{\mathrm{null}}$ is the mean response to the opposite direction (Allen Institute for Brain Science, 2017; de Vries et al., 2018). The response was defined as the mean ΔF∕F during the grating presentation. Each condition was presented 15 times, and responses to all presentations were averaged together. The preferred direction and temporal frequency condition was defined as the grating condition that evoked the largest mean response.
Since ΔF∕F can be negative, OSI and DSI values can exceed 1 or even be negative. Following the Allen Institute guidelines, we excluded values below 0 (663 OSI values and 648 DSI values out of 26,186 cells) or above 2 (1871 OSI values and 1561 DSI values). The full computation methodology for these indices can be found in the Allen Brain Observatory’s Visual Stimuli technical whitepaper (Allen Institute for Brain Science, 2017). To compare across visual areas, the OSI and DSI of all neurons in each area were averaged together (Fig. 4D,E). To compare across depths, the OSI and DSI of all neurons at each depth were averaged together (Fig. 7D,E).
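Although we used the AllenSDK module, the computation in Equations 2, 3 amounts to the following sketch, given a neuron’s tuning curve of mean ΔF∕F responses to the eight directions at its preferred temporal frequency:

```python
import numpy as np

def osi_dsi(tuning):
    """OSI and DSI from an 8-element tuning curve (directions 0-315 deg
    in 45-deg steps), following Eqs. 2, 3."""
    pref = int(np.argmax(tuning))
    r_pref = tuning[pref]
    r_null = tuning[(pref + 4) % 8]  # opposite direction
    r_orth = 0.5 * (tuning[(pref + 2) % 8] + tuning[(pref + 6) % 8])
    osi = (r_pref - r_orth) / (r_pref + r_orth)
    dsi = (r_pref - r_null) / (r_pref + r_null)
    return osi, dsi
```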
Code accessibility
The code described in the paper is freely available online at https://github.com/catniplab/aboDecoding.
Results
Spatiotemporal structure of stimuli is differentially encoded among visual areas
To investigate differences in information processing between six mouse visual areas, statistical classifiers were fit to discriminate visual categories based on the population activity within each area. Neural activity was monitored through a fluorescent calcium sensor (GCaMP6f) selectively expressed in transgenic mice (de Vries et al., 2018). Recorded calcium signals were processed and discretized in time to yield feature vectors corresponding to the neural activity of the population (see Materials and Methods). Mice were shown six types of stimuli which differed in their spatiotemporal structures, ranging from simple spatial structures (such as orientation gratings and sparse pixels) to complex natural scenes (Fig. 1A,B). The stimuli included static images as well as movies with complex long-range correlations. A faithful recovery of these visual categories from neural activity reflects the potential information the neural population encodes about the stimuli.
Since the population size was variable across experiments, we compared the rate at which classification accuracy improves as a function of population size (Fig. 2A). Classification accuracies for small randomly subsampled populations were near chance level and gradually increased with population size for all sessions analyzed (Fig. 2A, black dots). We fit a three-parameter sigmoid function to extrapolate up to 128 neurons for each session (Fig. 2A; see Materials and Methods). The averages within each of the six visual areas show similar increasing trends, with accuracy approaching 90% for a population size of 128. Five areas (VISal, VISam, VISl, VISp, VISpm) significantly outperformed VISrl (Fig. 2B,C). We used a one-sided t test with the null hypothesis that each secondary area’s decoding performance is no greater than that of the primary visual cortex. For both the stimulus category decoding and the direction decoding, we failed to reject the null hypothesis at the 0.05 significance level.
We examined the accuracy of decoding specific stimulus categories to further investigate encoding differences across visual areas. On average, the natural movie and spontaneous categories were more difficult to decode (Fig. 3B,C). Although similar in overall decoding accuracy, the five high-performing visual areas (VISal, VISam, VISl, VISp, VISpm) show different patterns in per-category accuracy (Fig. 3). We used a one-sided t test (p-values adjusted for multiple tests) to compare the decoding accuracy of the natural movie stimulus with that of all other stimulus categories within each visual area. The natural movie category is significantly harder (p < 0.001) to decode than other stimuli in populations from the anatomically adjacent VISp, VISl, and VISal (Fig. 3A).
Area-dependent decoding of drifting grating direction
Local visual orientation information is prevalently encoded in the primary visual cortex (Hubel and Wiesel, 1959; Priebe, 2016). Layer 2/3 neurons in the mouse visual cortex are also sensitive to orientation gratings and their directional motion (Marshel et al., 2011). However, the mouse primary visual cortex also seems to serve higher-order visual functions (Gavornik and Bear, 2014). We therefore investigated whether the areas’ ability to decode vastly different stimulus categories is related to their capacity to represent orientation and direction. Using the average neural activity in 2-s windows corresponding to the duration of drifting grating presentation, we trained linear classifiers to decode the direction of drifting gratings.
Except for a few VISrl populations, direction decoding accuracy was again an increasing function of population size (Fig. 4A). VISrl showed the worst decoding performance at the 128-neuron level, VISam and VISpm showed intermediate performance, and VISp, VISl, and VISal showed comparable population-level encoding (Fig. 4B,C).
Surprisingly, the population decoding accuracy showed discrepancies from what would be expected from individual neurons’ directional tuning sensitivity. Higher orientation and direction selectivity indices (Fig. 4D,E) indicate stronger representation of these basic visual features; single-neuron selectivity was highest in VISl, followed by VISrl. However, decoding from the joint activity showed VISl on par with VISp and VISal, while the VISrl population was much less informative. This suggests that excitatory neurons in VISp and VISal are more synergistic (a tendency for the population to contain more information than the individual neurons; Brenner et al., 2000; Latham and Nirenberg, 2005) and that there is relatively more redundancy in the VISl population.
This synergistic population code is corroborated by the generally inferior performance of the correlation-blind decoder, which was trained on trial-shuffled neural data and hence deprived of noise correlations. In Figure 5, all areas except VISrl show a significant drop in performance, indicating that the joint activity of the population carries extra information.
Superficial layers are more informative about the spatiotemporal signatures of visual stimuli
In rats, neurons in the superficial layers of V1 are known to have better orientation selectivity and less spontaneous activity (Girman et al., 1999), suggesting a laminar organization of visual information processing. To investigate whether similar laminar differences exist in mice, we analyzed the decoding accuracy of stimulus classes as a function of recording depth (Fig. 6). There were six different Cre lines with specific targets (for the full distribution, see Table 6). Since there was little difference across Cre lines, we present the results grouped by depth.
The 325- to 350-µm depth group (dominated by the Nr5a1 Cre line; Table 6) consistently showed the worst decoding performance across both the stimulus category and direction decoding tasks (Fig. 7). Meanwhile, the most superficial group (imaging depth of 175 µm, corresponding to either the Cux2 or Emx1 Cre lines, putative layer 2/3) significantly outperformed the deeper populations (Fig. 6), with high decoding performance across all stimulus categories (Fig. 9). However, this superficial layer did not show distinctly superior direction decoding (Fig. 7B). This suggests that the spatiotemporal structure of each visual category, beyond overall orientation information, is better represented in the superficial layers. Although the signal-to-noise ratio may worsen as imaging depth increases, neither decoding scheme shows a monotonic degradation of performance as a function of depth (Figs. 6, 7).
The OSI and DSI showed opposite trends (Fig. 7D,E). Deeper layers had relatively larger OSI but smaller DSI, suggesting that the temporal component of the drifting gratings may be better represented in the superficial layers. Despite its larger DSI, the 325- to 350-µm group performed worse than the 365- to 435-µm group, again an unexpected observation that is likely due to the spatial organization of the code.
Discussion
The focus of this study was investigating how stimulus classes and drifting grating directions can be inferred from neural population responses in mouse visual areas. In primates, it has been well established that visual processing occurs through a hierarchical structure, in which the primary visual cortex provides input to secondary visual areas (Maunsell and Newsome, 1987; Felleman and Van Essen, 1991; Orban, 2008). The rat visual cortex has also been characterized as having a hierarchical organization (Coogan and Burkhalter, 1993). Results from this analysis corroborate recent studies which have suggested that this simple hierarchy may also be present in the mouse visual cortex (Wang and Burkhalter, 2007; Berezovskii et al., 2011). In both decoding tasks, the overall decoding performance of populations from secondary visual areas was equal to or worse than the primary visual cortex (VISp), suggesting that secondary areas do not encode any more information than is encoded by the primary visual cortex. This is supported by findings that the mouse primary visual cortex has a more diverse set of stimulus preferences than secondary areas VISal and VISpm (Andermann et al., 2011).
Differences in stimulus-specific decoding performance between populations from different visual areas suggest areal differences in visual information representation. On average, the spontaneous stimulus and the natural movie stimulus are significantly harder to decode than other stimuli, but this trend is not seen in all areas (Fig. 3). Anatomically adjacent visual areas display similarities in their stimulus-specific decoding performance. The adjacent anteromedial (VISam) and posteromedial (VISpm) areas showed no difference in performance for specific stimuli. In contrast, in populations from the adjacent primary (VISp), anterolateral (VISal), and lateral (VISl) visual areas, it was significantly harder to decode the natural movie stimulus than other stimuli. These anatomic trends in stimulus-specific decoding may be attributed to specialized input pathways from the primary visual cortex (Marshel et al., 2011).
The existence of these information processing streams is further supported by the similar direction decoding performance of anatomically adjacent areas. The same groups emerge in the direction decoding as in the stimulus-specific analysis. The adjacent primary (VISp), anterolateral (VISal), and lateral (VISl) visual areas performed similarly, as did the adjacent anteromedial (VISam) and posteromedial (VISpm) areas. The poor performance of the latter group of visual areas (VISam and VISpm), as well as of the rostrolateral (VISrl) visual area, suggests a lack of direction-sensitive information encoding in the population. We speculate that the relatively poor performance of VISrl compared to VISam in the population decoding lies in the distribution of well-tuned neurons; VISam had a lower single-neuron DSI on average but a more heterogeneous distribution.
Marshel et al. (2011) presented drifting grating stimuli (using the same set of directions but different sets of temporal and spatial frequencies than the Allen Brain Observatory) to 28 mice and found, based on the mean DSI of each area and the proportion of neurons with a DSI >0.5, that layer 2/3 (130–180 µm below the dura surface) populations in the anterolateral (VISal), rostrolateral (VISrl), and anteromedial (VISam) visual areas were significantly more direction selective than the primary visual cortex (VISp). The results of our population direction decoding analysis (Figs. 4, 5) of 186 mice are inconsistent with the single-neuron findings of Marshel et al. (2011; note that there were differences in the methods for estimating DSI; see Materials and Methods). The direction decoding accuracy of VISam and VISrl populations is significantly lower than that of VISp, suggesting that these populations are less direction selective than those in VISp. Trial-shuffled decoding analysis (Fig. 5) showed that synergistic spatial correlations within trials could contribute to such discrepancies (Brenner et al., 2000; Averbeck et al., 2006). Furthermore, the similar decoding accuracy of VISal and VISp populations suggests that VISal is not significantly different from VISp in its direction selectivity.
Across all visual areas, individual neurons encode enough attributes of a stimulus in their responses that the majority of small populations exceeded chance-level accuracy in the stimulus decoding (chance equal to 16.67%) as well as in the direction decoding (chance equal to 12.5%). However, in the direction decoding, individual neurons from VISrl populations and those from the 325- to 350-µm depth group performed at chance level, suggesting a lower proportion of direction-sensitive encoding in these neurons relative to other areas and depths. Neurons in VISam have previously been characterized as extremely robust and selective (Marshel et al., 2011). However, our direction decoding analysis shows that decoding accuracy for small VISam populations of one to four neurons remains at or close to chance level, suggesting that these neurons are not especially selective. Even with larger VISam populations, direction decoding accuracy remained low relative to other areas.
Despite some discrepancies with recent characterizations of mouse visual areas, this study provides novel evidence of the functional and anatomic organization of the mouse visual cortex. The results corroborate broad trends in visual information processing, supporting the existence of information processing streams and a hierarchical organization in the mouse visual cortex.
Acknowledgments
Acknowledgements: We thank Michael Buice and L. Craig Evinger for thoughtful discussions. We would also like to thank the Allen Institute for Brain Science for supporting open data and open science.
Footnotes
The authors declare no competing financial interests.
K.E. was partially supported by the Simons Foundation and I.S. was supported by the Institute for Advanced Computational Science. Additionally, this work was partly supported by National Science Foundation grant IIS-1734910.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.