Measuring Stimulus-Evoked Neurophysiological Differentiation in Distinct Populations of Neurons in Mouse Visual Cortex

Abstract Despite significant progress in understanding neural coding, it remains unclear how the coordinated activity of large populations of neurons relates to what an observer actually perceives. Since neurophysiological differences must underlie differences among percepts, differentiation analysis—quantifying distinct patterns of neurophysiological activity—has been proposed as an “inside-out” approach that addresses this question. This methodology contrasts with “outside-in” approaches such as feature tuning and decoding analyses, which are defined in terms of extrinsic experimental variables. Here, we used two-photon calcium imaging in mice of both sexes to systematically survey stimulus-evoked neurophysiological differentiation (ND) in excitatory neuronal populations in layers (L)2/3, L4, and L5 across five visual cortical areas (primary, lateromedial, anterolateral, posteromedial, and anteromedial) in response to naturalistic and phase-scrambled movie stimuli. We find that unscrambled stimuli evoke greater ND than scrambled stimuli specifically in L2/3 of the anterolateral and anteromedial areas, and that this effect is modulated by arousal state and locomotion. By contrast, decoding performance was far above chance and did not vary substantially across areas and layers. Differentiation also differed within the unscrambled stimulus set, suggesting that differentiation analysis may be used to probe the ethological relevance of individual stimuli.


Introduction
The visual system acts on incoming stimuli to extract meaningful features and guide behavior, a process that transforms physical input into conscious percepts. Since the early experiments of Hubel and Wiesel (1959), neuroscience has yielded considerable insight into the visual system by analyzing neural response properties to uncover which features cells are tuned to and how their activity relates to behavior. Modern decoding approaches have revealed stimulus information in population responses (Quiroga and Panzeri, 2009). However, that a population of neurons represents stimulus information does not imply that this information is used to generate conscious percepts (Brette, 2019). Consequently, despite the success of these "outside-in" methods (Buzsáki, 2019) in understanding neural coding, it remains unclear how the coordinated activity of large neuronal populations relates to what the observer actually sees.
Is there an objective approach that can shed light on this question? Differentiation analysis-measuring the extent to which a population of neurons expresses a rich and varied repertoire of states-has been proposed as one such approach (Boly et al., 2015;Mensen et al., 2017Mensen et al., , 2018. Differentiation analysis exemplifies "inside-out" methodology in that the spatiotemporal diversity of neural activity (neurophysiological differentiation; ND) is quantified without reference to the stimulus or other experimental variables imposed a priori by the investigator, in contrast to feature tuning or decoding analyses.
A visual stimulus can be considered meaningful to the observer if it evokes rich and varied perceptual experiences (phenomenological differentiation). For example, an engaging movie is meaningful in this sense, as it evokes many distinct percepts with high-level structure; conversely, flickering "TV noise" essentially evokes a single percept with no high-level structure to a human observer, although, at the level of pixels, any two frames of noise are likely to be more different from each other than a pair of frames from a movie (stimulus differentiation; SD). Since conscious percepts are determined by brain states, physical differences must underlie phenomenological differentiation. Thus, one can expect measures of ND to correlate with subjective perception of the "richness" or "meaningfulness" of stimuli to the extent that such measures capture the relevant physical, i.e., neuronal, differences. This has indeed been shown in human studies using fMRI and EEG (Boly et al., 2015;Mensen et al., 2017Mensen et al., , 2018. Moreover, integrated information theory (IIT) posits a fundamental relationship between ND and subjective experience itself (Tononi, 2004;Oizumi et al., 2014;Marshall et al., 2016;Tononi et al., 2016), and several studies have shown that loss of ND is implicated in loss of consciousness (Casali et al., 2013;Barttfeld et al., 2015;Hudetz et al., 2015;Wenzel et al., 2019).
Although studies in human subjects suggest that ND can provide a readout of stimulus-evoked phenomenological differentiation (Boly et al., 2015;Mensen et al., 2017Mensen et al., , 2018, the low spatial resolution of fMRI and EEG has precluded identifying the cell populations that underlie this correspondence. A longstanding fundamental question is which neuronal populations contribute directly to generating conscious percepts Tononi et al., 2016;Mashour et al., 2020). Differentiation analysis may shed light on this question, but to do so, it must be applied to signals from specific populations of neurons.
To address this gap, we used in vivo two-photon calcium imaging in mice to measure ND evoked by naturalistic and phase-scrambled movie stimuli in excitatory cell populations in cortical layers (L)2/3, L4, and L5, across five visual cortical areas: primary (V1), lateromedial (LM), anterolateral (AL), posteromedial (PM), and anteromedial (AM). We hypothesized that unscrambled naturalistic stimuli, which presumably elicit meaningful visual percepts, would evoke greater ND than their meaningless phase-scrambled counterparts.
We find that unscrambled stimuli evoke greater ND than scrambled stimuli specifically in L2/3 of areas AL and AM and not in the other neuronal populations. We contrast this layer-specific and area-specific finding with a decoding analysis that shows that information about the stimulus category, whether meaningful or meaningless, is present in most populations. This highlights a key difference: ND is more plausibly correlated with stimulus meaningfulness than the information measured by decoding, since the latter may not be functionally relevant (Brette, 2019). Furthermore, we find differences in evoked ND among the unscrambled stimuli that suggest that differentiation analysis can probe meaningfulness of individual stimuli.

Materials and Methods
Our experimental design is summarized in Figure 1. We collected calcium imaging data from L2/3, L4, and L5 in each of five visual areas (V1, LM, AL, PM, and AM) across Figure 1. Experimental design. A, Data were acquired using a standardized two-photon calcium imaging pipeline based on that described in de  and Groblewski et al. (2020;Materials and Methods). Briefly, a custom headframe was implanted; ISI was performed to delineate retinotopically mapped visual areas; the mouse was habituated to the passive viewing paradigm over the course of approximately two weeks; and two-photon calcium imaging was performed in the left visual cortex while animals viewed stimuli presented to the contralateral eye in several experimental sessions. B, Example of an ISI map. C, Schematic of the two-photon imaging rig (reproduced with permission from Fig. 1D in de Vries et al. 2020). During the imaging sessions, head-fixed mice were free to run on a rotating disk. Locomotion velocity was recorded and pupil diameter was extracted from video of the 45 experimental sessions (nine mice, three per layer; L2/ 3, two males; L4, three males; L5, one male; 15 sessions per transgenic line; 9 sessions per area; 5 6 1 sessions per mouse; number of cells shown in Extended Data Fig.  1-2). Areas V1, LM, AL, PM, and AM respectively correspond to areas VISp, VISl, VISal, VISpm, and VISam in the Mouse Brain Common Coordinate Framework v3 (Wang et al., 2020). The two-photon calcium imaging pipeline is described in detail in de Vries et al. (2020) and Groblewski et al. (2020). All animal procedures were performed in accordance with the Allen Institute animal care committee's regulations.
Our sample size was selected on the basis of a pilot study using existing, publicly available calcium imaging data from the Allen Institute Brain Observatory (de Vries et al., 2020). We measured spectral differentiation of responses to a movie stimulus (clips from the film Touch of Evil) versus artificial stimuli (drifting gratings and locally sparse noise) across eight experimental sessions, from which we estimated that we required at least three sessions per layer/area pair to have statistical power of at least 0.8.

Transgenic mice
We maintained all mice on reverse 12/12 h dark/light cycle following surgery and throughout the duration of the experiment and performed all experiments during the dark cycle. We used the transgenic mouse line Ai93, in which GCaMP6f expression is dependent on the activity of both Cre recombinase and the tetracycline controlled transactivator protein (tTA; Madisen et al., 2010). Triple transgenic mice (Ai93, tTA, Cre) were generated by first crossing Ai93 mice with Camk2a-tTA mice, which preferentially express tTA in forebrain excitatory neurons.

Surgery
Transgenic mice expressing GCaMP6f were weaned and genotyped at ;P21, and surgery was performed between P37 and P63. The craniotomy was centered at X = -2.8 mm and Y = 1.3 mm with respect to lambda (centered over the left mouse visual cortex). A circular piece of skull 5 mm in diameter was removed, and a durotomy was performed. A coverslip stack (two 5-mm and one 7-mm glass coverslips adhered together) was cemented in place with Vetbond. Metabond cement was applied around the cranial window inside the well to secure the glass window.

Intrinsic imaging
To define area boundaries and target in vivo two-photon calcium imaging experiments to consistent retinotopic locations, retinotopic maps for each animal were created using intrinsic signal imaging (ISI) while mice were lightly anesthetized with 1-1.4% isoflurane. This procedure and data processing pipeline are described in detail in de .

Habituation
Following successful ISI mapping, mice spent two weeks being habituated to head fixation and visual stimulation. During the second week, mice were head-fixed and presented with visual stimuli, starting with 10 min and progressing to 50 min of visual stimuli by the end of the week. During this week they were exposed to the "mouse montage 2" stimulus (see below, Stimuli).

Imaging
Calcium imaging was performed using a two-photonimaging instrument (Nikon A1R MP1). Laser excitation was provided by a Ti:Sapphire laser (Chameleon Vision -Coherent) at 910 nm. Mice were head-fixed on top of a rotating disk and free to run at will. The screen center was positioned 118.6 mm lateral, 86.2 mm anterior, and 31.6 mm dorsal to the right eye. The distance between the screen and the eye was 15 cm. Movies were recorded at 30 Hz using resonant scanners over a 400-mm field of view.

Locomotion
Locomotion velocity was recorded from the running wheel and preprocessed as follows. First, artifacts were removed using custom code that iteratively identified large positive or negative peaks (indicative of artifactual discontinuities in the signal) in several passes of scipy. signal.find_peaks (specific parameters were manually chosen for each session). Remaining artifacts were continued animal's right eye. D, Example frame from a two-photon movie. Imaging data were processed as described in de Vries et al. (2020) to obtain DF/F 0 traces. E, Schematic of the five visual areas targeted in this study. F, Ten randomized blocks of 12 30-s movie stimuli were presented; 4 s of mean-luminance gray was presented between stimuli. The first 60-s period was mean-luminance gray (spontaneous activity); the second 60-s period was a high-contrast sparse noise stimulus (not analyzed in this work). G, Still frames from the eight naturalistic (left) and four artificial (right) movie stimuli (see Materials and Methods, Stimuli). Two of the naturalistic stimuli, "mouse montage 1" and "mousecam," were phase-scrambled to destroy high-level image features while closely matching loworder statistics (see Materials and Methods, Phase scrambling; Extended Data Fig. 1-1). H, Representative calcium imaging and behavioral data. A heatmap of DF/F 0 values is shown for 228 neurons simultaneously imaged in L2/3 of AL during presentation of four stimuli, with locomotion velocity and normalized pupil diameter plotted below. Numbers of cells recorded from each layer and area are listed in Extended Data Figure 1-2. Calcium indicator kinetics did not differ across cell populations (Extended Data Fig. 1-3). then manually removed by inspecting the resulting timeseries and visually identifying clear discontinuities. The removed samples were filled using linear interpolation (pandas.Series.interpolate).
For the effect size analysis, the fraction of time spent running was calculated by binarizing the preprocessed velocity timeseries at a threshold of 2.5 cm/s.

Pupillometry
Pupil diameter was extracted from video of the mouse's ipsilateral eye (relative to the stimulus presentation monitor) using the AllenSDK (https://github.com/AllenInstitute/ AllenSDK) as described in de Vries et al. (2020).
Briefly, for each frame of the video an ellipse was fitted to the region corresponding to the pupil as follows: a seed point within the pupil was identified via convolution with a black square; 18 rays were drawn starting at this seed point, spaced 20°apart; the candidate boundary point between the pupil and iris along that ray was identified by a change in pixel intensity above a session-specific threshold; a RANSAC algorithm was used to fit the an ellipse to the candidate boundary points using linear regression with a conic section constraint; and fitted parameters of the regression were converted to ellipse parameters (coordinates of the center, lengths of the semi-major and semi-minor axes, and angle of rotation with respect to the x-axis). Pupil diameter was taken to be twice the semimajor axis of the fitted ellipse.
The resulting timeseries contained some artifacts, which we removed by the same combination of automated and manual methods used for the locomotion timeseries (see above, Locomotion). Each pupil diameter timeseries was then normalized by dividing by the maximum diameter that occurred within the 10 blocks of stimulus presentations during that session.

Event detection
Discrete calcium events were detected from the DF/F 0 traces using the L 0 -penalized method of Jewell and Witten (2018) and Jewell et al. (2020). This procedure, which replaces the continuous relative changes in fluorescence with discrete, real valued events, is described in detail in de Vries et al. (2020); code is available at https:// github.com/AllenInstitute/visual_coding_2p_analysis/blob/ master/visual_coding_2p_analysis/l0_analysis.py.

Stimuli
We created twelve 30-s greyscale naturalistic and artificial movie stimuli.
The eight naturalistic stimuli (Extended Data Fig. 1-1, top) consisted of three montages of six 5-s clips, spliced together with jump cuts, and four continuous stimuli. The "mouse montage 1" stimulus contained clips of conspecifics, a snake, movement at ground level through the underbrush of a wooded environment, and a cat approaching the camera. The "mouse montage 2" stimulus contained different footage of movement through the wooded environment; different footage of a cat approaching the camera; conspecifics in a home cage filmed from within the cage; crickets in a home cage filmed from within the cage; footage of the interior of the home cage with environmental enrichment (a shelter, running wheel, and nesting material); and a snake filmed at close range orienting toward the camera. The "human montage" contained clips of a man talking animatedly to an off-screen interviewer; a café table where food is being served; automobile traffic on a road viewed from above; a woman in the foreground taking a photograph of a city skyline; footage of a road filmed from the passenger seat of a vehicle; and a close shot of a bowl of fruit being tossed. The four continuous stimuli were: footage of a snake at close range orienting toward the camera; crickets in a home cage filmed from within the cage; a man writing at a table; movement through a wooded environment at ground level; and conspecifics in a home cage. No two stimuli contained identical clips.
The four artificial stimuli (Extended Data Fig. 1-1, bottom) consisted of two phase-scrambled versions of the "mouse montage 1" stimulus, a phase-scrambled version of the "mousecam" stimulus (see below, Phase scrambling), and a high-pass-filtered 1/f noise stimulus.
Stimuli were presented in a randomized block design with 10 repetitions, with 4 s of static mean-luminance gray presented between stimuli (Fig. 1F). 60 s of meanluminance gray (to record spontaneous activity) and a 60-s high-contrast sparse noise stimulus were also presented in the beginning of each session (not analyzed in this work).

Phase scrambling
Two methods of phase scrambling were used: temporal and spatial, described in detail below. Briefly, for the temporal scrambling we independently randomized the phase of each pixel's intensity timeseries in contiguous, nonoverlapping windows of 1 s. For the spatial scrambling, we randomized the phase of the spatial dimensions of the three-dimensional spectrum of each window. The "mouse montage 1" stimulus was phase-scrambled using both procedures to obtain the "mouse montage 1, temporal phase scramble" and "mouse montage 1, spatial phase scramble" stimuli. The "mousecam" stimulus was scrambled using the spatial procedure to obtain the "mousecam, spatial phase scramble" stimulus.

Temporal phase scramble
First, the stimuli were windowed into contiguous, nonoverlapping 1-s segments (30 frames each). For each 1-s window, we applied the following procedure: We estimated the one-dimensional spectrum of each pixel's intensity timeseries with the discrete Fourier transform (DFT) using the NumPy function numpy.fft.fft. The phase and magnitude of each spectrum were computed with numpy.angle and numpy.abs, respectively. For each pixel, we generated a 14-element random vector drawn uniformly from the interval [0, 2p ]. A randomized phase was then obtained for that pixel by concatenating the first element of the original phase, the random vector, the 15th element of the original phase, and the negative reversed random vector. This yielded a 30-element phase vector with the required conjugate symmetry of the spectrum of a 1-s real-valued signal sampled at 30 frames per second. The randomized phase was then combined with the spectral magnitude and transformed back into the time domain with the inverse DFT using numpy.fft.ifft, yielding a temporally phase-scrambled version of that pixel's intensity timeseries. Each pixel's timeseries was independently phasescrambled in this fashion.
This resulted in 30 independently phase-scrambled 1-s windows. These windows were then concatenated to obtain the full 30-s temporally phase-scrambled stimulus.

Spatial phase scramble
First, the stimuli were windowed into contiguous, nonoverlapping 1-s segments (30 frames each). For each window, we applied the following procedure. The threedimensional Fourier spectrum (frame, width, and height) was estimated with the DFT using numpy.fft.fftn. The phase and magnitude of the spectrum were computed with numpy.angle and numpy.abs, respectively. To randomize the phase in the spatial dimensions, we generated a random signal in the time domain with the same dimensions as a stimulus frame (192 pixels wide by 120 pixels high) and computed its phase in the frequency domain as described above. This two-dimensional random spatial phase was added to the spatial dimensions of the three-dimensional stimulus phase. After being randomized in this way, the stimulus phase was recombined with the spectral magnitude and transformed back into a time-domain signal with the inverse DFT using numpy.fft.ifftn. The 30 resulting phase-scrambled 1-s windows were then concatenated to obtain the full 30-s spatially phase-scrambled stimulus.

Effect of phase scrambling
The greyscale movie stimuli were represented in the stimulus presentation software as arrays of unsigned eight-bit integers. The limitations of this representation resulted in phase-scrambled stimuli with power spectra that were close but not identical to the power spectrum of their unscrambled counterparts.
Specifically, although the phase scrambling procedures described above leave the power spectrum unchanged, they do not necessarily preserve the range of the resulting real-valued signal. In our case, applying these procedures to our stimuli resulted in phase-scrambled stimuli in which the pixel intensities occasionally lay outside the range [0, 255]. Thus, to represent the phase-scrambled stimuli with eight-bit integers, we truncated the result so that negative intensities were set to 0 and intensities .255 were set to 255. This operation does affect the power spectra, and as a result the spectra of the unscrambled and scrambled stimuli are closely matched but not equal.

Differentiation analysis Spectral differentiation
Our analysis of the responses to the stimuli follows the techniques developed in previous work in humans (Boly et al., 2015;Mensen et al., 2017Mensen et al., , 2018. The spectral differentiation measure of ND used by Mensen et al. (2018) was designed for analysis of timeseries responses to continuous movie stimuli, and was found to be positively correlated with subjective reports of stimulus "meaningfulness." We employed this measure with our calcium imaging data on single-trial responses: (1) the DF/F 0 trace of each cell during stimulus presentation was divided into 1-s windows; (2) the power spectrum of each window was estimated using a Fourier transform; (3) the "neurophysiological state" during each 1-s window was defined as a vector in the high-dimensional space of cells and frequencies (i.e., the concatenation of the power spectra in that window for each cell); (4) the ND in response to a given stimulus was calculated as the median of the pairwise Euclidean distances between every state that occurred during the stimulus presentation. A schematic illustration is shown in Figure 2, and an illustration of how the measure behaves for different types of signals is shown in Extended Data Figure 2-1.
The relationship between action potentials and the resulting calcium imaging signal is complex (Chen et al., 2013;Deneux et al., 2016;Pachitariu et al., 2018;Ledochowitsch et al., 2019;Wei et al., 2020;Huang et al., 2021;Siegle et al., 2021a). The DF/F 0 signal approximately represents a convolution of the underlying spike train with the calcium-dependent fluorescence response kernel, which depends nonlinearly on the spike rate. A consequence of the nonlinearity is that calcium imaging is much more sensitive to burst-like activity than to isolated spikes. Since this convolution affects the spectral properties of the signal, some discussion of its impact on the spectral differentiation measure is warranted. The energy of the GCaMP6f response is concentrated in low frequencies, so for our purposes, the effect of the convolution is that differences among the spectral states are amplified at lower frequencies and attenuated at higher frequencies. Thus, when applied to the DF/F 0 signal, the measure will be less sensitive to short-timescale differences between activity patterns than it would if it were applied directly to the ground-truth spike train. This is not necessarily a disadvantage, as our aim in using a spectral measure in the first place was to achieve temporal smoothing to detect differences in temporal structure on the scale of the state window size.
We normalized spectral differentiation values by the square root of the number of cells in the recorded population, reasoning as follows. Consider a hypothetical population of cells that each exhibit the same temporal pattern of activity. The spectral differentiation of such a population will be proportional to the square root of its size because the Euclidean distance is used to compare neurophysiological states. If we have two such populations differing only in the number of cells, their activity should be considered equally differentiated for our purposes, since their temporal patterns are identical; any differences in spectral differentiation would be due to the (arbitrary) number of cells captured in the imaging session. Thus, we divided by the square root of the population size to remove this dependency.
To investigate the properties of the signal that drive differences in spectral differentiation, we applied the measure to discrete L 0 calcium events detected from the DF/F 0 traces (see above, Event detection) and obtained similar results as in the main analysis (Extended Data Fig.  3-8). This indicates that the observed differences in spectral differentiation are driven by differences in the largetimescale patterns of responses rather than small-timescale spectral differences within the windows, consistent with the sparsity of calcium responses in this dataset. We also measured ND of DF/F 0 traces with transients removed. Transients were defined as the 200 ms (six imaging samples) following a L 0 calcium event. This analysis yielded similar results as well (Extended Data Fig. 3-9), indicating that ND differences are not driven solely by initial transients in the calcium response.
For the analysis of SD (Fig. 8), stimuli were first blurred with a circular Gaussian filter whose half width at half maximum was set to the median radius of a L2/3 V1 receptive field (RF) as measured by de Vries et al. (2020; 8.92°) to account for the coarseness of mouse vision. SD was then calculated by treating each pixel of the stimulus as a "cell" and applying the spectral differentiation measure to the traces of pixel intensities over time.

Multivariate differentiation
We also measured ND using a multivariate approach that considers spatiotemporal differences in activity patterns. For each experimental session, we selected DF/F 0 traces recorded during presentations of unscrambled stimuli and their scrambled counterparts and concatenated them to obtain an m Â n matrix of responses, where m is the number of two-photon imaging samples and n is the number of traces. We used a nonlinear dimensionality reduction procedure, Uniform Manifold Approximation and Projection for Dimension Reduction (Python package umap-learn; McInnes et al., 2020), to reduce this matrix to m Â 8 with parameters UMAP(n_components = 8, metric="euclidean", n_neighbors = 50, min_dist = 0.5). Each row of the resulting matrix was an 8-dimensional vector that represented the state of the cell population during the corresponding two-photon sample. We then grouped the rows of the resulting matrix by stimulus presentation. Each row vector can be thought of as a point in R 8 , so that each trial was associated with a cloud of points corresponding to the population states that the stimulus evoked during that presentation.
The intuition motivating this approach is that we can operationalize the notion of ND by measuring the dispersion of this point cloud. The more distant two points are, the more different are the corresponding responses of the cell population; thus, if a stimulus evokes many different population states, the point cloud will be more spread out in response space. Therefore, we measured ND evoked during each stimulus presentation by finding the centroid of the associated point cloud and taking the mean Euclidean distance of each point to the centroid.
In the multivariate differentiation analysis of calcium events, population response vectors were obtained by summing event magnitudes within 1-s bins to match the definition of the neurophysiological state of the population used in the spectral differentiation analyses. Because the event data were sparse and because including many duplicate instances of the zero vector will reduce the sensitivity of the multivariate differentiation measure to differences among bins in which the population was active, bins with no events were discarded before the dimensionality reduction step.

Statistical analyses
All analyses were performed with custom Python and R code, using numpy , scipy , pandas (Reback, 2020), scikit-learn Figure 2. Spectral differentiation analysis. ND was computed as follows. A, For each cell, the DF/F 0 trace during stimulus presentation was divided into 1-s windows. B, The power spectrum of each window was estimated. C, The "neurophysiological state" during each 1-s window was defined as a vector in the high-dimensional space of cells and frequencies (i.e., the concatenation of the power spectra in that window for each cell). D, The ND of the response to a given stimulus was calculated as the median of the pairwise Euclidean distances between every state that occurred during the stimulus presentation. An illustration of how the measure behaves is shown in Extended Data Figure 2-1. (Pedregosa et al., 2011), matplotlib (Hunter, 2007), seaborn (Waskom, 2020), lme4 (Bates et al., 2015), multcomp (Hothorn et al., 2008), and emmeans (Lenth, 2020). See Table 2 for distributions of data, types of statistical test used, and confidence intervals.

Linear mixed effects (LME) models
For analysis of ND across all experimental sessions (Figs. 3, 5; Extended Data Figs. 3-1, 3-8, 3-9, 5-1), we employed LME models using the lmer function from the lme4 package in R with REML = FALSE (Bates et al., 2015). The distributions of ND values for both spectral and multivariate differentiation measures were well-approximated by log-normal distributions, so we applied a logarithmic transformation to ND values before statistical modeling.
To test layer specificity, we then fit a reduced model with the interaction removed ["differentiation ; 1 1 layer 1 stimulus_category 1 (1 | session)"] and used a likelihood ratio test to compare the two models.
Next, we fit an LME model with cortical area, stimulus category, and their interaction as fixed effects, with experimental session as a random effect [lme4 formula: "differentiation ; 1 1 area * stimulus_category 1 (1 | session)"]. To test area specificity, we fit a reduced model with the interaction removed ["differentiation ; 1 1 area 1 stimulus_category 1 (1 | session)"] and used a likelihood ratio test to compare the two models. Figure 3. ND elicited by unscrambled versus scrambled stimuli is higher in L2/3 of areas AL and AM. The difference in ND of responses to unscrambled versus scrambled stimuli is plotted for each session by layer (A), area (B), and layer-area pair (C). Each point represents the difference between the mean ND of responses to the two unscrambled and the three scrambled stimuli during a single experimental session. Similar results were found contrasting naturalistic versus artificial stimuli across the entire stimulus set (Extended Data Fig. 3-1). To demonstrate the robustness of this effect, we conducted several further analyses. Sensitivity analyses showed similar findings for various choices of analysis parameters (Extended Data Figs. 3-2, 3-3, 3-4) and when pupil diameter and locomotion were included as covariates in the LME models (Extended Data Figs. 3-5, 3-6, 3-7). We found similar results when we performed the same analysis on discrete calcium events detected from the DF/F 0 traces with an L 0 -regularized algorithm (see Materials and Methods, Event detection), indicating that the effect is driven by differences in the large-timescale patterns of responses rather than small-timescale spectral differences within windows (Extended Data Fig. 3-8). Finally, we also found similar results when we removed event-triggered transients from the DF/F 0 traces, indicating that the effect is not driven solely by initial transients in the calcium response (Extended Data Fig. 3-9). A, B, Asterisks indicate significant post hoc one-sided z-tests in the layer (A) and area (B) interaction LME models (*p , 0.05; ***p , 0.001). Boxes indicate quartiles; whiskers indicate the minimum and maximum of data lying within 1.5 times the interquartile range of the 25% or 75% quartiles; diamonds indicate observations outside this range. C, Mean values are indicated by bars.
To test for differences in ND among the unscrambled continuous stimuli ["snake (predator)," "crickets (prey)," "man writing," "mousecam," and "conspecifics"; Fig. 7], we fit an LME model with stimulus as a fixed effect and experimental session as a random effect [lme4 formula: "differentiation ; 1 1 stimulus 1 (1 | session)"] and used a likelihood ratio test to compare this to a reduced model without the stimulus term.
We visualized these results by plotting the difference in mean ND for each experimental session; however, no averaging was performed in the statistical analyses.

Post hoc tests
We performed post hoc one-sided z-tests to reject the null hypothesis that mean ND for scrambled ! mean ND for unscrambled in favor of the alternative hypothesis that mean ND for scrambled , mean ND for unscrambled using the glht function from the multcomp package in R on each LME model with contrasts between stimulus categories (unscrambled or scrambled) within each layer and area, respectively. P values were adjusted for multiple comparisons using the single-step method in multcomp (Hothorn et al., 2008).
Post hoc two-sided z-tests for pairwise differences among the unscrambled continuous stimuli were performed with the emmeans function from the emmeans package in R ["emmeans(model, pairwise ; stimulus)", with p values adjusted for multiple comparisons using Tukey's method; Lenth, 2020].
For all post hoc tests, simultaneous 95% confidence intervals (CIs) were obtained using the confint methods of the respective model objects. Effect sizes are reported as the Cohen's d value for each pairwise comparison. Cohen's d was calculated with the pooled SD: ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi n 1 À 1 ð Þs 2 1 1 ðn 2 À 1Þs 2 2 n 1 1 n 2 À 2 s :

Permutation tests
Permutation tests were performed for each experimental session to test whether spectral differentiation evoked by unscrambled stimuli was greater than that evoked by scrambled stimuli (Table 1). We obtained a null distribution by randomly permuting the trial labels (unscrambled or scrambled) 20,000 times and computing the difference in mean spectral differentiation on unscrambled and scrambled trials for each permutation. P values were computed as the fraction of permutations for which the permuted difference was greater than the observed difference, and significance is reported at the level of a = 0.05.

Mediation analyses
Mediation analyses were conducted using the mediation package in R (Tingley et al., 2019).
We analyzed whether the mean event magnitude during a trial mediated the effect of stimulus category on differentiation values by fitting LME models for the mediator and outcome, including arousal variables as covariates, and using the mediate function [mediator model: "mean_magnitude ; 1 1 stimulus_type 1 pupil_diameter 1 locomotion 1 (1 | session)"; outcome model: "differentiation ; 1 1 mean_magnitude 1 stimulus_type 1 pupil_diameter 1 locomotion 1 (1 | session)"; treatment: "stimulus_category"; mediator: "mean_magnitude"]. This analysis assesses the contribution of the treatment variables on the outcome variable via each of two causal paths: (1) stimulus category and arousal level affect the mean event magnitude, which then in turn affects the measured ND (mediated); and (2) stimulus category and arousal directly affect the measured ND (direct).
For the analysis shown in Extended Data Figure 7-1, we fit LME models for each arousal variable (locomotion and pupil diameter) as a mediator, in each case including the other arousal variable as a covariate [e.g., "locomotion ; 1 1 stimulus 1 pupil_diameter 1 (1 | session)"], and an outcome model ["differentiation ; 1 1 stimulus 1 locomotion 1 pupil_diameter 1 (1 | session)"]. Mediation for a particular pair was evaluated using the mediate function with the "treat.value" and "control.value" arguments. For each stimulus pair identified as eliciting significantly different ND in our post hoc LME analyses, and for each arousal variable, this analysis assesses the contribution of the effect of stimulus on ND via each of two causal paths: (1) the stimulus affects arousal as measured by pupil diameter or locomotion, which then in turn affects ND (mediated); and (2) stimulus directly affects ND, independent of arousal level (direct).

Decoding analyses
For each experimental session, we decoded stimulus category (unscrambled or scrambled) using linear discriminant analysis with the Python package scikitlearn (Pedregosa et al., 2011). First, the responses to each category were concatenated to form an s Â (n · t) matrix, where s is the number of stimulus presentation trials, n is the number of cells recorded, and t is the number of two-photon imaging samples in a single trial. To obtain a tractable number of features for linear discriminant analysis, we used PCA to reduce the dimensionality of the matrix such that the number of components c was sufficient to retain 99% of the variance along the rows, yielding an s Â c matrix [sklearn.decomposition. PCA(n_components = 0.99)]. This was then used to Entries contain the fraction of sessions in which the mean ND of responses to unscrambled stimuli was significantly greater than responses to their scrambled counterparts at a threshold of a = 0.05. For each session, a null distribution was obtained by randomly permuting trial labels (unscrambled or scrambled) 20,000 times and computing the difference in mean ND on unscrambled and scrambled trials for each permutation. P values were computed as the fraction of permutations for which the permuted difference was greater than the observed difference.
train a shrinkage-regularized LDA classifier with fivefold cross-validation [sklearn.discriminant_analysis. LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')]. We report the mean balanced accuracy score (sklearn.metrics.balanced_accuracy_ score) on the heldout test data across cross-validation folds. Chance performance is 0.5. For Extended Data Figure 6-1, we used the same procedure as described above, but the classifier was trained to decode stimulus identity rather than category; chance performance is 1/12. For Extended Data Figure 7-2, we used the same procedure but trained the classifier using only responses to the five continuous naturalistic stimuli, and classifier performance was evaluated for each stimulus separately with the F1 score.

Code accessibility
The analysis code used in this work is freely available online at https://github.com/wmayner/openscope-differentiation. The code is also available as Extended Data 1.

Results
Using in vivo two-photon calcium imaging ( Fig. 1A-D), we recorded from the left visual cortex of awake mice while they passively viewed stimuli presented to the contralateral eye. We used the transgenic mouse lines Cux2, Rorb, and Rbp4, in which GCaMP6f is expressed in excitatory neurons predominantly in L2/3, L4, and L5, respectively (three mice each; Cux2, two males; Rorb, three males; Rbp4, one male; see Materials and Methods, Transgenic mice). Visual cortical areas were delineated via ISI (Fig. 1B). Data were collected from L2/3, L4, and L5 in each of five areas (V1, LM, AL, PM, and AM; Fig. 1E) across 45 experimental sessions (15 sessions per transgenic line; 9 sessions per area; 5 6 1 sessions per mouse; number of cells shown in Extended Data Fig. 1-2). Mice were head-fixed and free to move on a rotating disk while pupil diameter and running velocity were recorded. During each 70-min session, 12 30-s movie stimuli were presented in a randomized block design with 10 repetitions, with 4 s of mean-luminance gray shown between stimulus presentations (Fig. 1F,G; Extended Data Fig. 1-1). Stimuli were presented in greyscale but were not otherwise modified (in particular, it should be noted that spatial frequencies beyond the mouse acuity limit will appear blurred to the mice). Representative DF/F 0 traces and behavioral data are shown in Figure 1H. One imaging session in L5 of AL was excluded from our analyses because of technical problems with the two-photon recording.
To measure ND, we employed a method from Mensen et al. (2018) for analyzing a set of timeseries recorded during the presentation of a continuous stimulus (Fig. 2). Briefly, the power spectrum of each cell's DF/F 0 trace was estimated in 1-s windows. The cells' power spectra during simultaneous windows were concatenated to form a vector representing the neurophysiological state of the population during that window. We calculated ND for each trial as the median Euclidean distance between the 30 population states elicited over the course of the 30 s stimulus. We computed distances in the frequency domain rather than the time domain to focus on differences in overall population state rather than differences in precise timing of DF/F 0 transients. To account for variability in the size of the imaged populations we divided ND values by the square root of the number of cells (see Materials and Methods, Spectral differentiation). Spectral differentiation is zero when the set of DF/F 0 traces is perfectly periodic with a period of 1 s (the window size), and it is high when many traces exhibit temporally varied patterns across the 30 s (Extended Data Fig. 2-1). The measure scales with the magnitude of the signal and thus has no well-defined maximum.
To compare the differentiation of responses to naturalistic and artificial stimuli, we generated Fourier phase-scrambled versions of two of our movie stimuli. Phase-scrambling destroys the naturalistic structure of the stimulus while closely matching the power spectrum (the spectrum was not conserved exactly because of numerical representational limitations of the stimulus format; see Materials and Methods, Phase scrambling). Note that operations that leave the power spectrum of a signal unchanged will not affect its spectral differentiation.
For the "mouse montage 1" stimulus (a montage of six 5-s naturalistic movie clips), we performed the phasescrambling in two ways: (1) along the temporal dimension, on each pixel independently; and (2) along the two spatial dimensions, on all pixels. For the "mousecam" stimulus (a continuous 30-s clip of movement at ground level through the underbrush of a forest) we performed only the spatial phase-scrambling. This yielded two unscrambled stimuli and three scrambled stimuli (Extended Data Fig. 1-1). The full set of twelve stimuli was designed to span different levels of putative ethological relevance; here, we focus on the comparison of the unscrambled stimuli to their scrambled versions because low-order stimulus statistics are controlled and thus the contrast can be more easily interpreted.

Unscrambled stimuli elicit more differentiated responses compared with scrambled stimuli
We hypothesized that the unscrambled stimuli would elicit higher ND than their phase-scrambled counterparts. We tested this by fitting LME models with experimental session as a random effect (see Materials and Methods, LME models); mean differences in ND of responses to unscrambled versus scrambled stimuli are shown in Figure  3. We obtained similar results contrasting naturalistic versus artificial stimuli across the entire stimulus set (Extended Data Fig. 3-1). ND values were approximately log-normally distributed, so we applied a logarithmic transform to ND in all statistical analyses (see Materials and Methods, Statistical analyses).
Increased differentiation for unscrambled stimuli is specific to areas AL and AM The increased ND in response to unscrambled stimuli was area-specific (Fig. 3B). We fitted an LME model with stimulus category, area, and their interaction as fixed effects and found a significant interaction (likelihood ratio test, x 2 (4) = 15.203, p = 0.00430). Post hoc tests showed that the unscrambled versus scrambled difference was specific to AL and AM [one-sided z-test; V1, z = 0.704, It is conceivable that these results are artifacts of our implementation of the spectral differentiation measure. To check the robustness of our findings, we performed a sensitivity analysis in which we systematically varied (1) the distance metric used to assess differences between population states; (2) the window length that defines the state of the neural population; (3) the frequency bin spacing in the spectra; and (4) the window function and amount of overlap used in the spectral estimation step (Extended Data Figs. 3-2, 3-3, 3-4). The results were qualitatively the same for nearly all combinations of these parameters we tested.
Since arousal state modulates neuronal activity in visual cortex (Niell and Stryker, 2010;Polack et al., 2013;Reimer et al., 2014;McGinley et al., 2015b;Vinck et al., 2015;Dadarlat and Stryker, 2017;Salkoff et al., 2020), the increase in firing rates seen during periods of high arousal raises the possibility that the differences in ND we observed could be due to changes in arousal alone rather than stimulus category. To rule this out, we repeated the sensitivity analysis of our main results while including locomotion and pupil diameter as covariates in the LME models. Consistent with the simpler models, for nearly all parameter combinations, L2/3 of AL and AM emerged as the cell populations in which ND is greater for unscrambled versus scrambled stimuli (3)(4)(5)(6)(3)(4)(5)(6)(7), indicating that the measured arousal variables are insufficient to fully explain the differences in ND we observed. We also analyzed whether the mean magnitude of calcium events (a proxy for firing rate) mediated the effect of stimulus category and found evidence for both mediated and direct effects (mediated effect: 0.1351, 95% CI [0.0784, 0.20] i , p , 2e-16; direct effect: 0.0951, 95% CI [0.0400, 0.15] j , p = 0.002; proportion of total effect mediated: 0.5898, 95% CI [0.3963, 0.80] k , p , 2e-16). That is, unscrambled stimuli led to increased ND relative to scrambled stimuli both directly, independent of mean event magnitude, and indirectly, via increases in mean event magnitude that in turn increased ND. Thus, while a portion of the effect of stimulus category was mediated by changes in population firing rate, this mediated effect is likewise insufficient to fully explain our results.

Permutation tests for individual experimental sessions
The above analysis shows that the mean ND elicited by unscrambled stimuli is greater than that elicited by their phase-scrambled counterparts, and that this effect is driven by L2/3 cells in areas AL and AM. We also analyzed ND at the level of individual sessions with nonparametric permutation tests. For each session, we obtained a null distribution by randomly permuting the trial labels (unscrambled or scrambled) 20,000 times and computing the difference in mean ND on unscrambled versus scrambled trials for each permutation. P values were computed as the fraction of permutations for which the permuted difference was greater than the observed difference.
The results of the individual session analyses were consistent with the LME analyses (Table 1). In all sessions recorded from L2/3 of AL and AM, responses to unscrambled stimuli were significantly more differentiated than to scrambled stimuli (p , 0.05).

Arousal is correlated with effect size
Locomotion and pupil diameter can be considered behavioral indications of engagement with the environment (Bennett et al., 2013;Ganea et al., 2020;Jacobs et al., 2020). We found that in L2/3 of AL and AM, effect sizes were positively correlated with locomotion activity (Pearson's r = 0.896; two-sided t test; t (4) = 4.030, p = 0.0157, 95% CI [0.308, 1.00] l ; Fig. 4, top left) and pupil diameter (r = 0.716; t (4) = 2.054, p = 0.109, 95% CI [-0.227, 1.00] m ; Fig. 4, top right), suggesting that the difference in ND is more clear when the animal is engaged. However, we note that the relatively restricted range of observed mean locomotion fraction and pupil diameter values in the sessions of interest limits the generalizability of these conclusions.

Multivariate analysis also shows increased differentiation for unscrambled stimuli
Spectral differentiation is a univariate measure in the sense that the coordinates of the population state vectors are orthogonal, so that each squared difference term in the Euclidean distance reflects differences only within a given cell's responses across time. To ensure that our results were not due to this method of measuring ND, we also employed a multivariate approach that considers spatiotemporal differences in activity patterns across the cell population. For each session, the dimensionality of the population response vectors was reduced to 8 using UMAP (McInnes et al., 2020). In the resulting 8-dimensional space, ND was measured as the mean Euclidean distance to the centroid of the set of responses corresponding to that stimulus (see Materials and Methods, Multivariate differentiation).

Decoding analysis does not reveal layer or area specificity
We next asked whether the layer and area specificity of our ND results would be reflected in our ability to decode the stimulus category (unscrambled or scrambled) from population responses. We performed fivefold cross-validated linear discriminant analysis to decode stimulus category for each session and scored the classifier using balanced accuracy (see Materials and Methods, Decoding analyses). Decoding performance was high for most areas and layers (Fig. 6), in contrast to the unscrambled-scrambled difference in ND. Performance was also high across layers and areas when we decoded stimulus identity, rather than category, using responses to all 12 stimuli (Extended Data Fig. 6-1).

Differences in ND among individual stimuli
We also investigated whether ND differed among stimuli within the same category. This analysis was restricted to the set of unscrambled stimuli without jump cuts, i.e., the five naturalistic continuous 30-s clips, to avoid potential confounds in comparing stimuli with and without abrupt transitions between different scenes. Here, we used data from all layers and areas, since although responses from L2/3 of AL and AM drove the unscrambled/scrambled differences, within-category differences might not be restricted to that subset. We fitted an LME model with stimulus as a fixed effect and found it was significant (likelihood ratio test, x 2 (4) = 32.115, p = 1.812e-6). Post hoc pairwise two-sided z-tests (adjusted for multiple comparisons), shown in Figure 7, revealed that the predator stimulus (a snake) evoked significantly higher differentiation than clips of conspecifics (z = 3.229, p = 0.0110, Cohen's d = 0.156, 95% CI [0.015, 0.180] dd ), prey (crickets; z = 3.928, p = 8.149e-4, Cohen's d = 0.181, 95% CI [0.036, 0.201] ee ), and a man writing (z = 5.249, p = 1.522e-6, Cohen's d = 0.232, 95% CI [0.076, 0.241] ff ). The "mousecam" clip of movement through a wooded environment also evoked significantly higher differentiation than the clip of a man writing (z = 3.396, p = 0.00615, Cohen's d = 0.154, 95% CI [0.020, 0.185] gg ). Mediation analysis showed a mixture of direct and arousal-mediated effects, indicating that changes in arousal cannot fully account for these differences (Extended Data Fig. 7-1). Here, we present the main effect of stimulus; for an exploration of interactions with layer and area, and a comparison to decoding, see Extended Data Figure 7-2.

SD does not explain ND
It is possible that ND does not reflect functionally relevant visual processing but is instead merely inherited from the differentiation of the stimulus itself. To rule out this possibility, we computed the SD by treating each pixel of the stimulus as a "cell" and applying the spectral differentiation measure to the traces of pixel intensities over time after blurring the stimulus to account for the coarseness of mouse vision (see Materials and Methods, Spectral differentiation). Within L2/3 of AL and AM, the mean ND elicited by each stimulus was positively correlated with SD (Pearson's r = 0.746, one-sided t test; t (10) = 3.542, p = 0.00267, 95% CI [0.393, 1.00] hh ; Fig. 8). However, the noise stimulus is a highly influential observation (Cook's D = 2.318, an order of Figure 5. Multivariate differentiation analysis. The mean difference in the mean centroid distance of responses to unscrambled versus scrambled stimuli is plotted for each session by layer (A), area (B), and layer-area pair (C). ND elicited by unscrambled versus scrambled stimuli is higher in L2/3 and areas AL and AM, consistent with the spectral differentiation analysis. We found similar results when we analyzed discrete L 0 calcium events detected from the DF/F 0 traces (see Materials and Methods, Event detection; Extended Data Fig. 5-1). A, B, Asterisks indicate significant post hoc one-sided z-tests in the layer (A) and area (B) interaction LME models (**p , 0.01; ***p , 0.001). Boxes indicate quartiles; whiskers indicate the minimum and maximum of data lying within 1.5 times the interquartile range of the 25% or 75% quartiles; diamonds indicate observations outside this range. C, Mean values are indicated by bars. magnitude larger than the next most influential observation). If we exclude this stimulus, we find a weaker correlation (r = 0.258; one-sided t test; t (9) = 0.801, p = 0.222, 95% CI [-0.307, 1.00] ii ). Furthermore, there was no evidence of a relationship with ND when considering only the scrambled stimuli and their unscrambled counterparts (r = -0.378; twosided t test; t (3) = -0.708, p = 0.530, 95% CI [-0.945, 0.756] jj ). Thus, we conclude that ND is not inherited from SD. We also did not find a relationship with stimulus luminance, contrast, or spectral energy (Extended Data Fig. 8-1).

Discussion
Our results show that excitatory L2/3 neurons in visual areas AL and AM have more differentiated responses to stimuli with naturalistic structure than to phase-scrambled stimuli with closely matched low-order statistics, indicating that these populations are uniquely sensitive to highlevel natural features in this stimulus set. We found this difference at the level of single experimental sessions, and it was robust to complementary methods of measuring ND. Effect sizes were larger with increasing pupil diameter and locomotion, suggesting sensitivity to the animal's arousal level. Decoding analysis showed a marked lack of area and layer specificity: stimulus category could be accurately decoded from the activity of most cell populations we surveyed. In addition to the differences between unscrambled and scrambled stimuli, we found differences in ND among unscrambled stimuli.
Finally, we argued that ND is not merely inherited from the differentiation of the stimulus.
The precise functional specialization of visual areas in the mouse remains unclear (Glickfeld and Olsen, 2017). Recent large-scale anatomical (Harris et al., 2019) and functional (Siegle et al., 2021b) studies have uncovered a "shallow hierarchy" in which V1 lies at the base, followed by LM, RL, AL, and PM, with AM at the top. In this light, our finding that ND in L2/3 of AL and AM is sensitive to high-level naturalistic structure could be interpreted as a reflection of hierarchical processing, which may be constructing a richer dynamical repertoire for perception of naturalistic stimuli at higher hierarchical levels. Interestingly, we did not find this effect in PM, despite its intermediate position between AL and AM in the hierarchy, suggesting that such hypothetical processing toward richer repertoires is not fully determined by the one-dimensional hierarchy, but may involve specific pathways through subsets of visual areas. These observations indicate that differentiation analysis may help refine our understanding of functional specialization of brain areas and uncover differences between them that can be used to direct further investigations.
A recent study found that feedback projections from higher visual areas to L2/3 excitatory neurons in V1 create a second RF surrounding the feedforward RF and that these RFs are mutually antagonistic, pointing to a role for these neurons in predictive processing (Keller et al., 2020). If this pattern is present at higher levels of the visual hierarchy, then the layer specificity we find could be explained by a scenario in which feedback to AL and AM from areas higher in the putative dorsal stream (Marshel et al., 2011;Wang et al., 2012) are integrated with feedforward inputs in L2/3 to compute prediction errors about high-level visual features. In this scenario, the naturalistic stimuli, which contain high-level features that are presumably less predictable, would elicit more prediction errors and thus more differentiated activity.
Stimulus-evoked activity in cortex is modulated by arousal level and behavioral state (McGinley et al., 2015b;Salkoff et al., 2020). Locomotion is associated with heightened arousal, increased membrane depolarization, firing rates, and signal-to-noise ratio, and enhanced stimulus encoding (Niell and Stryker, 2010;Bennett et al., 2013;Polack et al., 2013;Vinck et al., 2015;Dadarlat and Stryker, 2017). Pupil diameter can serve as an index of arousal (McGinley et al., 2015a,b;Larsen and Waters, 2018). Larger pupil size is associated with increases in the gain, amplitude, signal-to-noise ratio, and reliability of responses in V1 (Reimer et al., 2014). Thus, our finding that increased pupil diameter and locomotion are associated with larger effect sizes could be explained by an increase in response gain or amplitude in V1 that is inherited by downstream AL and AM: since the ND in these areas is selective for naturalistic structure, increased bottom-up drive could accentuate unscrambled-scrambled differences in ND.
Alternatively, response gain or amplitude in higher visual areas could be modulated directly by subcortical arousal systems. The noradrenergic and cholinergic Figure 6. Stimulus category (unscrambled or scrambled) can be accurately decoded from most layers and areas. Each point represents the mean fivefold cross-validated balanced accuracy score of linear discriminant analysis performed on a single session (see Materials and Methods, Decoding analyses). Chance performance is 0.5. We found similar results when decoding stimulus identity across all 12 stimuli (Extended Data Fig. 6-1).
systems are likely candidates, although it is not clear why noradrenergic modulation would cause an effect specific to L2/3; as for cholinergic modulation, Pafundo et al. (2016) showed that V1 and LM are differentially modulated by basal forebrain stimulation such that the response gain and reliability of excitatory L2/3 neurons was enhanced in V1 but not in LM, despite an even distribution of basal forebrain axons across all layers in both areas. However, neuromodulatory regulation of activity in other visual areas, in particular AL and AM, has not yet been characterized in great detail. Another possibility is a topdown effect, where increases in arousal and locomotion Figure 7. Pairwise differences in ND among unscrambled, continuous stimuli. Post hoc pairwise comparisons using data from all neuronal populations are plotted against their p values (adjusted for multiple comparisons). Boxes show mean ND for each stimulus. ND of the "snake (predator)" stimulus is significantly greater than that of "crickets" and "man writing" at a threshold of a = 0.01, and greater than "conspecifics" at a = 0.05. ND of the "mousecam" stimulus is greater than that of "man writing" at a = 0.01. Mediation analysis showed a mixture of direct and arousal-mediated effects, indicating that changes in arousal cannot fully account for these differences (Extended Data Fig. 7-1). Pairwise differences in ND and decoding performance stratified by layer and area are shown in Extended Data Figure 7-2. Figure 8. SD does not explain ND. Mean ND elicited by each stimulus in L2/3 of AL and AM, plotted against SD. SD was computed by treating each pixel of the movie as a "cell" and applying the spectral differentiation measure to traces of pixel intensities over time after blurring the movie with a Gaussian filter to account for the coarseness of mouse vision. Across all stimuli, mean ND is positively correlated with SD (Pearson's r = 0.746; one-sided t test; t (10) = 3.542, p = 0.00267, 95% CI [0.393, 1.00] hh ). However, here the noise stimulus is a highly influential observation (Cook's D = 2.318, an order of magnitude larger than the next most influential observation). With the noise stimulus excluded, the correlation is weaker (r = 0.258; one-sided t test; t (9) = 0.801, p = 0.222, 95% CI [-0.307, 1.00] ii ). Moreover, there was no evidence of a relationship with ND when considering only the scrambled stimuli and their unscrambled counterparts (r = -0.378; two-sided t test; t (3) = -0.708, p = 0.523, 95% CI [-0.945, 0.756] jj ). ND was also not explained by variation in stimulus luminance, contrast, or spectral energy (Extended Data Fig. 8-1). reflect increased attentional engagement that favors processing of high-level stimulus features, selectively increasing ND for the unscrambled stimuli. In the passive viewing paradigm employed here, in which the animal is not motivated to attend to the stimuli, the top-down modulation of sensory processing may vary considerably across the experimental session as arousal and attention fluctuate.
Although differentiation analysis revealed area-specific and layer-specific differences in responses to unscrambled and phase-scrambled stimuli, our ability to decode stimulus category from neural responses was remarkably similar across areas and layers. These findings are consistent with a growing literature that reveals a dissociation between encoding and function (Erlich et al., 2015;Katz et al., 2016;Tsunada et al., 2016;Liu and Pack, 2017;Jin and Glickfeld, 2020;Zatka-Haas et al., 2021). The contrast between our ND and decoding results highlights an important distinction: decoding reveals information content, but this information is necessarily measured from the extrinsic perspective (Tononi, 2004;Oizumi et al., 2014;Tononi et al., 2016;Buzsáki, 2019). The presence of information about a stimulus in a neural circuit does not imply that the information is functionally relevant (Brette, 2019). As an extreme example, stimulus category would presumably be perfectly decodable from photons impinging on the retina, but this would reveal nothing of interest about perception. By contrast, ND is an intrinsic measure defined without reference to a stimulus (Boly et al., 2015;Mensen et al., 2017Mensen et al., , 2018. In the brain, a complex evolved system in which activity is energetically costly, ND may be a signature of functionally relevant dynamics. The dissociation we find between ND and decoding indicates that differentiation analysis can point to populations of interest that are not revealed by detecting stimulus information. Finally, we also found that the predator stimulus and the "mousecam" stimulus elicited significantly higher ND than other unscrambled continuous stimuli. The predator stimulus finding is intriguing because that stimulus has lower luminance, contrast, and spectral energy than the clip of conspecifics in a home cage (Extended Data Fig.  8-1); given the importance of detecting natural predators, the high ND evoked by this stimulus may reflect its salience to the visual system, driven by high-level features such as the presence of the predator rather than loworder stimulus statistics. This also demonstrates that differentiation analysis can probe differences in visual responses at the level of individual stimuli.
It is important to note the limitations of these data. First, calcium imaging provides an imperfect proxy for neuronal activity. The fluorescence signal from calcium indicators is more sensitive to bursts of spikes than sparse, low-frequency spiking (Chen et al., 2013;Pachitariu et al., 2018;Ledochowitsch et al., 2019;Wei et al., 2020;Huang et al., 2021;Siegle et al., 2021a). Such sparse activity may contribute to ND but would not be present in this dataset. However, given the typically sparse spiking activity of L2/3 excitatory neurons compared with deeper layers (Barth and Poulet, 2012), it is possible that this limitation only obscures even stronger L2/3 specificity. Second, for this exploratory study we used a range of naturalistic stimuli and a limited number of phase-scrambled control stimuli to include diverse high-level features. Future studies could test our findings using a larger set of artificial stimuli controlling for other low-level characteristics, e.g., optical flow, in addition to the power spectrum. Third, the restricted range of the average arousal measures we observed in our experiments in L2/3 of AL and AM limits the generalizability of the association we observed between effect size and arousal state. Fourth, while we observed medium to very large effect sizes within individual experimental sessions in L2/3 of AL and AM (Cohen's d = 0.57-1.25), the overall effect was relatively subtle (Cohen's d = 0.34) because of variability in ND values across sessions. There was also considerable variability in arousal state and locomotor activity across trials. To the extent that these factors modulate effect size, future work might uncover larger effects by employing an active paradigm where the animal is motivated to attend to the stimuli.
In summary, we measured stimulus-evoked differentiation of neural activity with cellular resolution and found increased ND in response to unscrambled versus scrambled stimuli. This effect was specific to L2/3 excitatory cells in AL and AM and was enhanced at higher arousal levels. To our knowledge, this study is the first to systematically measure stimulus-evoked differentiation with cellular resolution across multiple cortical areas and layers. These results advance our understanding of the functional differences among visual areas, and future work should integrate our findings into the emerging picture of a shallow hierarchy in the mouse visual system, for example by investigating potential differences in neuromodulation among areas or the contrast between AL/AM and PM. Differentiation analysis is motivated by IIT, and provides an intrinsic, "inside-out" analytical approach that complements extrinsic, "outside-in" measures such as decoding performance, which in this dataset did not distinguish specific cell populations. This method can be used to compare individual stimuli and may provide a readout of the degree to which a given stimulus induces a rich and varied perceptual experience. Future studies should investigate stimulus-evoked differentiation with cellular resolution in humans and nonhuman primates, where subjective reports are available, and thereby determine the contributions of distinct cell populations to ND while correlating ND with phenomenology.