## Visual Abstract

## Abstract

Although sensory cortex is thought to be important for the perception of complex objects, its specific role in representing complex stimuli remains unknown. Complex objects are rich in information along multiple stimulus dimensions. The position of cortex in the sensory hierarchy suggests that cortical neurons may integrate across these dimensions to form a more gestalt representation of auditory objects. Yet, studies of cortical neurons typically explore single or few dimensions due to the difficulty of determining optimal stimuli in a high dimensional stimulus space. Evolutionary algorithms (EAs) provide a potentially powerful approach for exploring multidimensional stimulus spaces based on real-time spike feedback, but two important issues arise in their application. First, it is unclear whether it is necessary to characterize cortical responses to multidimensional stimuli or whether it suffices to characterize cortical responses to a single dimension at a time. Second, quantitative methods for analyzing complex multidimensional data from an EA are lacking. Here, we apply a statistical method for nonlinear regression, the generalized additive model (GAM), to address these issues. The GAM quantitatively describes the dependence between neural response and all stimulus dimensions. We find that auditory cortical neurons in mice are sensitive to interactions across dimensions. These interactions are diverse across the population, indicating significant integration across stimulus dimensions in auditory cortex. This result strongly motivates using multidimensional stimuli in auditory cortex. Together, the EA and the GAM provide a novel quantitative paradigm for investigating neural coding of complex multidimensional stimuli in auditory and other sensory cortices.

## Significance Statement

The auditory cortex is thought to be integral for the perception of complex sounds, which are characterized by multiple stimulus dimensions, such as center frequency, intensity, and bandwidth. Traditional studies of cortical neurons only consider one or few dimensions of sound at a time, but it is possible that cortical neurons integrate across these dimensions when processing sounds. Here, we apply an evolutionary algorithm and a generalized additive model to quantitatively explore cortical response to 5-dimensional auditory stimuli. Our results demonstrate that cortical neurons are significantly driven by interactions across stimulus dimensions in ways that are not captured by low-dimensional characterizations and motivate the use of multidimensional stimuli in the study of sensory cortices.

## Introduction

How does a sensory system recognize complex objects? This fundamental question in neuroscience remains poorly understood. Complex objects typically carry information about many stimulus dimensions. For example, an auditory object could be characterized by frequency, bandwidth, amplitude modulation, intensity, location, and other parameters. Given the importance of sensory cortex in the perception of objects and its place in the sensory hierarchy, it is likely to play an important role in integrating different stimulus dimensions represented separately in the periphery. For such neurons, the traditional approach of characterizing the neuron’s response to each dimension separately (e.g., a one-dimensional tuning curve) may be incomplete. Indeed, previous studies analyzing the separability of the spectrotemporal receptive field (STRF) of neurons have shown that cortical STRFs may not be separable (Depireux et al., 2001; Sen et al., 2001; Linden et al., 2003) and that cortical neurons can display nonlinear tone-tone interactions (Shamma et al., 1993; Nelken et al., 1994; Suga, 1994; Calford and Semple, 1995; Brosch and Schreiner, 1997; Sutter et al., 1999; Kadia and Wang, 2003). Thus, the use of multidimensional stimuli to probe cortex is likely to reveal important aspects of cortical processing and integration missed by traditional methods.

Evolutionary algorithms (EAs) have tremendous potential for investigating the neural coding of multidimensional stimuli in a wide variety of systems in neuroscience (Bleeck et al., 2003; Yamane et al., 2008; Carlson et al., 2011; Hung et al., 2012; DiMattina and Zhang, 2013; Chambers et al., 2014). EAs have been applied to investigate the coding of sounds in inferior colliculus and cochlear nucleus (Bleeck et al., 2003), 2-dimensional shapes in visual area V4 (Carlson et al., 2011), and 3-dimensional shapes in inferotemporal cortex (Yamane et al., 2008). Recently, an EA was developed for finding highly effective stimuli for maximizing the firing rate of auditory cortical neurons in a 5-dimensional stimulus space (Chambers et al., 2014). The EA probed cortical neurons with 5-dimensional stimuli and gradually modified these stimuli “online” based on the neural firing rate, to find stimuli that were most effective in driving the neuron. The parameter space in this situation is vast and impossible to explore using traditional methods. The EA was successfully able to identify highly effective stimuli for cortical neurons in five dimensions over several “generations” of stimuli.

Given the integrative role of auditory cortex, we hypothesized that cortical neurons show interactions across stimulus dimensions. Two important challenges arise in testing this hypothesis. First, traditional 1-dimensional tuning curves are insufficient, as interactions between dimensions must also be considered. Second, the use of multiple linear regression methods is not appropriate, given the nonlinear dependence of the cortical response on stimulus dimensions (e.g., nonlinear level tuning). Thus, an appropriate method should blend the ability to model interactions possessed by traditional linear models with nonlinear tuning curves. Here, we apply the generalized additive model (GAM) to address these issues. The GAM is a powerful method for performing nonlinear regression in statistics (Hastie and Tibshirani, 1991; Hastie et al., 2011). It can be thought of as an extension of the linear model (LM) and the generalized linear model (GLM) to allow a more flexible nonlinear dependence on stimulus dimensions. While the LM and the GLM have been widely applied to characterize sensory neurons (Klein et al., 2000; Theunissen et al., 2000; Depireux et al., 2001; Theunissen et al., 2001; Calabrese et al., 2011), the GAM remains underutilized in neuroscience. The objective of this study is to demonstrate the capacity of the GAM to uncover interactions between stimulus dimensions that cannot be revealed by more traditional 1-dimensional tuning curves, and to quantitatively characterize such interactions.

## Materials and Methods

### Experimental data

The quantitative methods in this paper were applied to experimental data collected by Chambers et al. (2014). An EA was used for online stimulus optimization based on single-unit spike feedback in the primary auditory cortex (A1) of awake, passively listening mice. Multichannel silicon probes (Neuronexus) were surgically implanted into A1 (located with functional mapping to reveal the characteristic tonotopic gradient) of male BCA/CaJ mice 8-10 weeks of age. At least 48 hours after implantation, the mice underwent recording sessions, during which a series of acoustic stimuli were presented by the EA and the responses from a well-isolated single unit from one of the 16 channels were recorded and analyzed online in order to drive the algorithm. The EA explored an acoustic space consisting of the following: center frequency (CF; 4-64 kHz in 0.1 octave increments), intensity (I; 10-60 dB in 10 dB increments), spectral bandwidth (BW; pure tone, 1.25 octave band in 0.25 octave increments), sinusoidal amplitude modulation frequency (AM; unmodulated [0 Hz], 70 Hz in 10 Hz increments), and speaker location (L; all permutations of left, right, top, and center). Acoustic stimuli were presented for 400 ms with 600 ms between each, and the firing rate was calculated during the entire stimulus window (0-400 ms). Each run of the EA search procedure began with 50 stimuli selected at random from the pool of 177,120 potential stimuli. Each stimulus was presented twice during the session. At the end of each generation of 50 stimuli, the stimuli were rank-ordered with respect to firing rate responses. The top 10 were used as “breeders” for the next generation such that their “offspring” were created by randomly shifting at least one acoustic dimension to a neighboring value. The most effective stimulus from the first generation was termed the “yardstick” and was repeated in all subsequent generations to estimate the effect of adaptation across generations. After the first generation, each generation consisted of 39 breeder-based, 1 yardstick, and 10 randomly selected stimuli from the stimulus pool to avoid focusing on local maxima. The maximum response magnitude was the maximum value of the firing rate over 6 generations of the EA for each neuron. For some single units, two runs of the EA were performed in order to compare the convergence from independent starting points. Results by Chambers et al. (2014) successfully showed the ability of EAs to converge on stimuli that maximized the firing rate. The methods in this paper further analyze the data gathered by the EA by quantifying the relationship between the stimulus dimensions and the neural response. Further experimental details can be found in Chambers et al. (2014).

### GAM

The GAM can be thought of as an extension of the LM and the GLM that allows a more flexible nonlinear dependence on stimulus dimensions. Here, the GAM is used to express average neural firing rate *r* as a sum of nonlinear functions. In this study, 15 possible nonlinear function terms are considered: 5 terms corresponding to individual stimulus dimensions (CF, I, BW, AM, and L) and their 10 possible combination pairs, defined as “interaction terms.” This may be represented as follows:
where the predicted neural response *r*, assumed to have a normal distribution, is related to predictor variables *x _{i} *and their unique pairs. Here,

*x*represents 1 of the 5 stimulus dimensions, and

_{i}*f*is a smooth function, with each of the 15

_{i}*f*terms individually determined.

### Model selection

A GAM was developed for each of *n* = 50 neurons using an iterative algorithm. GAM model training was performed on all stimuli presented during the EA. First, we performed an exhaustive search of all 32 possible models consisting only of individual stimulus dimensions. Because each of these models had a distinct number of free parameters, we used the Akaike Information Criterion (AIC) to determine the best model (Wood, 2006). In model selection, AIC considers the tradeoff between goodness of fit and complexity, with a lower AIC value indicating a better model. The lowest AIC value was used to determine the best of 32 GAMs containing 1-5 “main” dimensions. We then expanded this initial GAM by testing GAMs that contained these main dimensions and all of their possible interaction terms (up to 1024 possible models). From this family of possible models for each neuron, we selected the final overall best GAM as the one with the lowest AIC value. The median adjusted *r*^{2} value of the 50 final GAMs was 0.50, with values ranging from 0.11 to 0.89.

### Significant dimensions and interactions

For each neuron’s final GAM, the significance of the contribution of each term was analyzed. Significant terms were defined as those with *p* < 0.05. The number of significant interaction terms and the total number of significant terms were quantified for each neuron. The significant terms were summarized using a matrix where entries on the diagonal indicate the significance of main dimensions and entries on the upper triangle indicate the interaction terms. To quantify the contribution of interaction terms to the overall accuracy of a neuron’s GAM, the interaction terms were subtracted from the model, and the adjusted *r*^{2} value was recalculated.

### Visualization

A neuron’s entire “response space” was visualized by either a 1-dimensional “response curve” or a 2-dimensional “response surface.” One-dimensional response curves were generated by holding all other dimensions at their median values. Confidence intervals on 1-dimensional plots were defined as ±2× the SE. To visualize interactions, we plotted a neuron’s 2-dimensional response surfaces, fixing all other dimensions at their median values. The response surface effectively expresses the firing rate as a function of two main dimensions and one interaction term as follows:

GAM fitting and visual analysis were performed using the *mgcv* package in R (Wood, 2006). This software constrained the response spaces to be smooth using penalized regression splines and generalized cross-validation.

### Quantifying adaptation

Because the EA progressively converges on a single optimum stimulus, it is possible that the accuracy of the GAMs may be impacted by adaptation. In evaluating this possibility, we quantified adaptation as the percent difference in neural firing rate between the first and final presentations of the yardstick stimulus. Adaptation to the yardstick stimulus was a 46.4 ± 36.4% decrease in firing rate.

## Results

### Presence of significant interaction terms in A1

Figure 1 shows an example comparison between two neurons and their dependency on CF and I. Although both neurons are significantly driven by CF and I as individual dimensions, only the second (Fig. 1*D-F*) is impacted by a significant interaction term between these two. The first neuron appears to follow a tuning curve with respect to frequency and an inverse relationship with intensity (Fig. 1*B*), and these 1-dimensional trends are reflected in the 2-dimensional response surface (Fig. 1*C*). However, with the inclusion of an interaction term, the response surface of the second neuron does not have such obvious relationships to its individual dimensions (Fig. 1*E*,*F*).

Figure 2 further explores the impact of an interaction term on a neuron’s response space. We separated the firing rate of a single neuron whose GAM has an interaction term between CF and I into individual dimension terms and an interaction term, which add together to create the overall modeled response. Inclusion of an interaction term appears to capture complexities of the neuron’s response that are missed when only individual dimensions are considered. To quantify this, adjusted *r*^{2} values are calculated for each GAM and then recalculated after dropping all interaction terms from this GAM. Such a removal of interaction terms causes a decrease in adjusted *r*^{2} from 0.50 ± 0.19 to 0.29 ± 0.16 with a median decrease in accuracy of 37.4% (data not shown). As shown in Figure 3, A1 neurons have the potential to exhibit dependence on many interaction terms, which highlights a level of complexity of neuronal response that cannot be captured when only considering individual dimensions.

### Population responses

Figure 4 summarizes the dependence of A1 responses across the population. Of the 5 individual dimension terms considered, CF was the most common, significantly modulating the firing rate of 100% of sample neurons. This was followed by I (76.0%), BW (68.0% each), AM (62.0%), and L (34.0%) (Fig. 4*A*). GAMs produced for A1 neurons had 2.4 ± 1.9 interaction terms and 5.8 ± 2.6 total significant terms (Fig. 4*B*). Of the 50 neurons analyzed, only 7 had no significant interaction terms.

## Discussion

EAs have long been recognized as a potentially powerful tool and applied successfully to characterize sensory processing of multidimensional stimuli in a handful of studies (Bleeck et al., 2003; Yamane et al., 2008; Carlson et al., 2011; Hung et al., 2012; DiMattina and Zhang, 2013; Chambers et al., 2014). However, some significant barriers have prevented the widespread use of EAs. First, it is not entirely clear whether it is strictly necessary to probe the full multidimensional space of stimuli (e.g., using an EA) or whether adequate information can be obtained by exploring each dimension individually. Second, quantitative methods for analyzing the multidimensional data sets resulting from the EA are lacking. In this study, we applied the GAM to address these issues.

Previous quantitative methods for characterizing cortical responses have included the LM and GLM. A specific instance of the LM, the STRF, has been applied extensively to investigate cortical responses (Klein et al., 2000; Theunissen et al., 2000, 2001; Depireux et al., 2001; Calabrese et al., 2011), and the GLM has extended this approach (Calabrese et al., 2011). Other studies have indicated that cortical STRFs may not be separable (Depireux et al., 2001; Sen et al., 2001; Linden et al., 2003) and that cortical neurons can display nonlinear tone-tone interactions (Shamma et al., 1993; Nelken et al., 1994; Suga, 1994; Calford and Semple, 1995; Brosch and Schreiner, 1997; Sutter et al., 1999; Kadia and Wang, 2003), suggestive of interactions across multiple stimulus dimensions. Although these studies have revealed aspects of cortical processing, they have been limited in two important ways. First, these studies did not quantify interactions across more than 2 stimulus dimensions explicitly. Second, the LM and GLM are limited in their ability to capture strong and diverse patterns of nonlinearities evident in cortical neurons (Theunissen et al., 2000; Bar-Yosef et al., 2002; Machens et al., 2004; Sadagopan and Wang, 2009). These limitations can be surpassed, in principle, by using information theoretic techniques for characterizing cortical neurons, such as maximally informative dimensions (Sharpee et al., 2004; Atencio et al., 2008, 2009,2012). However, this approach has the disadvantage that it is highly data intensive and limited by the “curse of dimensionality” (i.e., the difficulty of searching for optimal stimuli in a high dimensional stimulus space) and is therefore difficult to apply for multidimensional stimuli. Thus, there continues to be a need in the field for quantitative methods that are able to characterize the neural coding of multidimensional stimuli, including nonlinearities.

The GAM extends the LM and GLM by allowing the neural response (in our case, the average firing rate) of the neuron to be expressed as sum of nonlinear functions (Hastie and Tibshirani, 1991; Hastie et al., 2011). These nonlinear functions are constrained to be smooth but can otherwise be highly flexible in form. The importance of this feature of the GAM can be appreciated by considering the dependence of the response on the stimulus dimensions. For example, the dependence of the firing rate on frequency is highly nonlinear and can take a on a wide diversity of shapes in cortex. The dependence of firing rate on intensity can be quasilinear up to a certain level in some cases but is often sigmoidal or nonmonotonic over the relevant range of intensities. The GAM allows a flexible way to model such relationships. Moreover, the GAM also allows modeling interactions between stimulus dimensions, by including interaction terms (see Materials and Methods).

The GAM results confirmed familiar aspects of cortical responses but also revealed novel aspects. For example, we found that all neurons in our dataset were significantly modulated by frequency, consistent with the fact that frequency is a fundamental dimension for cortical responses (Merzenich et al., 1975; Guo et al., 2012). We also found that, while many cortical neurons were sensitive to sound intensity, a proportion of cortical neurons in our data were not significantly modulated by intensity, consistent with results showing the emergence of neurons robust to intensity variations at the cortical level (Billimoria et al., 2008; Sadagopan and Wang, 2008; Chambers et al., 2014). In addition to these 2 fundamental dimensions, cortical neurons also showed sensitivity to BW, AM, and L, with decreasing proportions of cortical neurons sensitive to each.

Most importantly, the GAM revealed that cortical neurons can be sensitive to interactions across stimulus dimensions. On average, cortical neurons were sensitive to ∼3 interactions across different stimulus dimensions in a 5-dimensional stimulus space. These results indicate that low dimensional characterizations of cortical neurons, which explore 1 or 2 dimensions, while fixing others to arbitrary values, are likely to miss important aspects of cortical response. This strongly motivates the use of EA in characterizing responses in sensory cortex using multidimensional stimuli.

A potential challenge facing EAs is adaptation in firing rate. Thus, adaptation may also have influenced the quality of the GAM. However, we found no significant correlation between the amount of adaptation and the performance (adjusted *r*^{2}) of the GAM (*p* = 0.152, correlation coefficient = 0.2059). One potential explanation is that, with the exception of a single “yardstick” stimulus, stimuli in the EA were rarely repeated exactly. Previous studies have shown that adaptation in auditory cortex can be highly stimulus specific (Nelken, 2014). This may have mitigated the effects of adaptation on the EA and GAM.

Overall, our results suggest that primary auditory cortex integrates information across multiple stimulus dimensions both at the single neuron level, through multiple interactions within single neurons, and at the population level, through a diverse range of interactions across different neurons. Object formation likely requires several hierarchical steps to accomplish. Although this process likely is not completed within A1, the interactions we observed may be a key computation toward object formation. Future experimental and theoretical studies investigating the synaptic and network mechanisms underlying interactions, and the impact of interactions on single neuron and population coding should clarify whether and how such interactions contribute to the cortical substrate for complex object recognition.

### Future Directions

Several additional future directions merit further exploration. First, the GAM analysis performed here was run on the collected dataset using the EA after the experiments were complete. In the future, it would be interesting to apply the GAM online as the data are collected. Second, the GAM and EA were applied to a specific family of 5-dimensional stimuli. In principle, this approach could be applied to any family of sounds that can be parameterized systematically. For example, a similar paradigm could be used with parametrically specified multitone combinations or ripple stimuli. However, it remains unclear how high the dimensionality of the stimulus space can be to still remain tractable for exploration with the EA, within typical time limits for data collection. Third, the GAM as applied here takes into account encoding by the firing rate only ignoring temporal structure in the responses. The vast majority of literature on GAMs is for static inputs, with a few extensions for dynamic inputs. In the future, it would be interesting to extend the GAM to account for temporal structure in neural responses.

## Footnotes

The authors declare no competing financial interests.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.