Encoding of Spatial Attention by Primate Prefrontal Cortex Neuronal Ensembles

Abstract
Single neurons in the primate lateral prefrontal cortex (LPFC) encode information about the allocation of visual attention and the features of visual stimuli. However, how this compares to the performance of neuronal ensembles at encoding the same information is poorly understood. Here, we recorded the responses of neuronal ensembles in the LPFC of two macaque monkeys while they performed a task that required attending to one of two moving random dot patterns positioned in different hemifields and ignoring the other pattern. We found single units selective for the location of the attended stimulus as well as for its motion direction. To determine the coding of both variables in the population of recorded units, we used a linear classifier and progressively built neuronal ensembles by iteratively adding units according to their individual performance (best single units), or by iteratively adding units based on their contribution to the ensemble performance (best ensemble). For both methods, ensembles of relatively small sizes (n < 60) yielded substantially higher decoding performance relative to individual single units. However, the decoder reached similar performance using fewer neurons with the best ensemble building method compared with the best single units method. Our results indicate that neuronal ensembles within the LPFC encode more information about the attended spatial and nonspatial features of visual stimuli than individual neurons. They further suggest that efficient coding of attention can be achieved by relatively small neuronal ensembles characterized by a certain relationship between signal and noise correlation structures.


Introduction
Electrophysiological studies in nonhuman primates have shown that neural activity in the lateral prefrontal cortex (LPFC) is strongly modulated by selective visual attention: the responses of single neurons to attended targets are enhanced relative to responses to unattended distractors (Rainer et al., 1998; Buschman and Miller, 2007; Bichot et al., 2015). For the case of spatial attention, these phenomena translate into the encoding of attended relative to unattended locations by individual single neurons (Lennert and Martinez-Trujillo, 2011). However, several studies have suggested that information coding by neuronal populations cannot be determined simply from measurements in individual neurons. The amount of information encoded by a population of neurons also depends on interactions between simultaneously active neurons that generate correlated firing (Maynard et al., 1999; Averbeck et al., 2006; Seidemann et al., 2009). This is important to consider in neurophysiological studies, since behavior is likely generated by the activity of neuronal ensembles composed of many individual neurons sharing connectivity patterns rather than by the activity of single neurons acting in isolation.
Technological advances, such as the emergence of tetrodes and multielectrode recording arrays, allow progress beyond the single-neuron approach in electrophysiological studies toward investigating the properties of many neurons recorded simultaneously (neuronal ensembles; Yuste, 2015). Many studies in macaque monkeys using multielectrode array recordings have examined the encoding of signals in cortical areas related to motor processing to further the development of neural prosthetics (Wessberg et al., 2000; Musallam et al., 2004). However, relatively few studies have examined executive control signals related to attention in high-order associative areas using the same techniques (Kiani et al., 2015; Tremblay et al., 2015; Astrand et al., 2016). For the LPFC, it has been recently demonstrated that ensembles of neurons encode information about the allocation of spatial attention, and that the encoded information is influenced by correlated firing between units (i.e., noise correlations; Tremblay et al., 2015). One issue that remains less investigated is whether LPFC neuronal ensembles can encode other task-relevant (nonspatial) features of visual stimuli. Furthermore, it remains unclear how the coding of spatial locations and other features changes as a function of the size and composition of neuronal ensembles.
To address these issues, we simultaneously recorded the activity of neurons in the LPFCs of two rhesus macaque monkeys while they allocated attention to one of two moving random dot patterns (RDPs) positioned in opposite hemifields. The monkeys had to deploy spatial attention but then respond to a change in another feature of the stimulus: its motion direction. We found that individual units were tuned for both the attended location and the motion direction of the stimulus. However, there was little overlap between the coding of spatial attention and motion direction in the populations of recorded units. Using a linear classifier, we demonstrate that the performance of neuronal ensembles at decoding the attended location or the motion direction of the stimulus is higher than the performance of the best single units. Finally, we compared the performance of ensembles built using different methods and found that ensembles of relatively small sizes (<60 units), which do not necessarily include the best-performing individual units, maximize decoding accuracy.

Animals
Two adult male monkeys (Macaca mulatta; R, 9.7 kg; S, 10.2 kg) participated in the experiments. All animal procedures were performed in accordance with the McGill University animal care committee regulations. During the training and testing periods, the animals received fluids as reward for correctly performing the task. We also gave the animals fresh fruits and vegetables as supplements when finishing a session. Body weight, water intake, and mental and physical well-being were monitored daily. None of the animals were killed for the purpose of this study.

Visual stimuli
The stimuli were back-projected on a screen using a video projector (NEC WT610, 1024 × 768-pixel resolution, 75 Hz) and custom-made software running on an Apple G4 Power PC. The animals viewed the screen at a distance of 57 cm (i.e., 1 cm on the screen corresponded to 1° of visual angle). The stimuli were random dot patterns (RDPs) generated by plotting colored dots (white, 76.39 cd/m²; gray, 10.83 cd/m²; pink, 22.68 cd/m²; green, 11.26 cd/m²; blue, 10.96 cd/m²; red, 8.92 cd/m²; turquoise, 44.14 cd/m²) on a dark gray background (0.74 cd/m²) with a density of 3 dots/deg² within a circular stationary virtual aperture. All dots within one RDP moved coherently at a speed of 15°/s and were replotted at the opposite side when they crossed the border of the aperture. The radius of the aperture was 4°, and it was centered 8° from the fixation spot.
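The relation between screen size and visual angle at the 57-cm viewing distance can be checked with a short calculation. The following is a minimal sketch, not part of the original stimulus software; the function names are hypothetical and the stimulus is assumed to be centered on the line of sight.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_cm that is
    centered on the line of sight and viewed from distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_cm_for_angle(angle_deg: float, distance_cm: float) -> float:
    """Inverse mapping: on-screen size (cm) that subtends angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# 1 cm at 57 cm subtends very nearly 1 degree of visual angle.
one_cm_angle = visual_angle_deg(1.0, 57.0)

# On-screen diameter of an aperture with a 4-degree radius (8-degree diameter).
aperture_diameter_cm = size_cm_for_angle(8.0, 57.0)
```

At this distance the small-angle approximation holds well, which is why 1 cm can be treated as roughly 1° for the stimulus sizes used here.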

Task
The animals initiated a trial by keeping gaze within a 2°-radius window (4° in monkey S) centered on a small fixation spot (0.24 deg²). Gaze position was monitored using an infrared video-based eye tracker (EyeLink 1000, SR Research). After a 353-ms fixation period, two moving RDPs appeared, one located to the left and the other to the right of the fixation spot. The patterns were composed of white dots on a dark background that moved either up (0°) or down (180°) relative to the vertical. At a variable interval (294, 471, or 647 ms) after the RDPs' onset, both patterns changed to different colors (e.g., the left one to green and the right one to red). The task for the animals was to identify one of the two RDPs as the target based on its color and covertly attend to it while ignoring the other (the distractor). After 706 ms, the color was removed and the RDPs returned to white. The animals had to maintain attention on the target, wait 753-1600 ms for a brief motion direction change in the target stimulus (118-ms duration, 32° clockwise from the current direction), and release the button within 100-650 ms. In 50% of the trials, the distractor changed motion direction before the target. In those trials, the monkey had to keep holding the button until the target changed. Which of the two colors indicated a target was based on an ordinal color-rank rule the monkey had learned over the training sessions (turquoise > red > blue > green > pink > gray; Lennert and Martinez-Trujillo, 2011, 2013). Each correctly performed trial was rewarded with a drop of juice. A sequence of correct trials yielded a slight increase in reward size. Trials in which the monkey responded to the distractor change (false alarms), did not respond to the target change within the reaction time window (misses), or broke fixation before the end of a trial (fixation breaks) were terminated without reward. The different trial types were presented in random sequence.
Only correctly performed trials were included in the analysis unless otherwise indicated.

Surgical procedures
The surgical operations were conducted under general anesthesia using isoflurane administered through endotracheal intubation. The animals were implanted with titanium head posts used to restrain head motion during training and recording sessions. We chronically implanted a 10 × 10 multielectrode array (96 channels, Blackrock Microsystems) in each monkey's left LPFC. The array was positioned on the cortical surface anterior to the knee of the arcuate sulcus and caudal to the posterior end of the principal sulcus, known as area 8A in the macaque monkey (Petrides, 2005).

Electrophysiological recordings
We recorded from all 96 channels from the left LPFC of both animals. Data were recorded using a Cerebus Neuronal Signal Processor (Blackrock Microsystems) via a Cereport adapter. After 1× amplification in the head stage (ICS-96), the neuronal signal was bandpass filtered (0.3 Hz to 7.5 kHz) and digitized (16 bit) at a sample rate of 30 kHz. For each channel, spike waveforms were detected by manually thresholding (~4 times the root mean square of the noise amplitude) the digitally high-pass filtered (250 Hz, 4 pole) raw voltage trace. The extracted spikes and associated waveforms were sorted offline using both manual and semiautomatic techniques with OfflineSorter (Plexon) and Matlab (MathWorks).

Data analysis
Analysis of spike data (firing rates) and statistical tests were performed using Matlab. Unless indicated otherwise, our analyses were computed for a 500-ms window during a postcue/sustained attention period (from 150 to 650 ms after color offset). For further details on the results of the statistical analyses, refer to Table 1.
From the recordings, we extracted a total of 1081 units [556 in monkey S over 5 sessions (156, 103, 107, 102, 88) and 525 in monkey R over 4 sessions (151, 107, 127, 140)] with a firing rate higher than 0.1 Hz. We used a Wilcoxon rank-sum test to determine whether units had significantly different (p < 0.05) firing rates in the postcue period or in a 500-ms window during the color cue presentation compared with baseline, which was a 700-ms window centered around stimulus onset. This resulted in 391 task-related units in monkey S (70%) and 462 in monkey R (88%).

Neuronal selectivity
We determined neuronal selectivity by performing a two-way ANOVA using the factors target location and motion direction. For units with a significant main effect, we determined which of the stimulus parameters yielded the highest mean firing rate (e.g., a unit that showed a main effect of location was considered ipsi-selective when its mean firing rate was higher for targets presented ipsilaterally to the recording site than for contralaterally presented targets).

Spike density functions
The activity of single selective units and the selective populations was plotted as trial-averaged spike density functions, generated by convolving the spike train with a Gaussian kernel (width 25 ms) and normalizing by the maximum firing rate in each unit's preferred stimulus condition. We determined the latency and magnitude of the difference in response to preferred and nonpreferred features by comparing the mean activity of the selective units in 20-ms bins using a paired t test (p < 0.05). Once five consecutive bins were significant, we took the first of those as the latency of the selectivity. We used the mean t-value across all 20-ms bins to quantify the magnitude of the difference.
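The latency criterion (the first of five consecutive significant 20-ms bins) can be sketched as follows. This is an illustrative helper, not the authors' Matlab code; the function name, the use of the bin's left edge as the latency, and the `t0_ms` offset are assumptions, and the per-bin p-values are taken as already computed.

```python
from typing import Optional, Sequence


def selectivity_latency_ms(
    p_values: Sequence[float],
    bin_ms: float = 20.0,
    alpha: float = 0.05,
    n_consecutive: int = 5,
    t0_ms: float = 0.0,
) -> Optional[float]:
    """Return the start time (ms) of the first bin in the first run of
    n_consecutive bins with p < alpha, or None if no such run exists.

    p_values[i] is the paired t test p-value for the bin starting at
    t0_ms + i * bin_ms (assumed convention: latency = left bin edge).
    """
    run = 0
    for i, p in enumerate(p_values):
        run = run + 1 if p < alpha else 0
        if run == n_consecutive:
            first_bin = i - n_consecutive + 1
            return t0_ms + first_bin * bin_ms
    return None


# An isolated significant bin (index 1) is ignored; the run starts at index 3.
example_p = [0.5, 0.01, 0.2, 0.01, 0.01, 0.01, 0.01, 0.01, 0.3]
example_latency = selectivity_latency_ms(example_p)
```

Requiring a run of consecutive bins guards against a single spuriously significant bin being taken as the onset of selectivity.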

Significant proportions of selective units
We investigated whether the proportions of selective units found in each animal were significantly different from chance by determining the proportions in a randomized population. For this, we randomly permuted the entire trial order and recomputed the ANOVA. We repeated this permutation and recomputation 1000 times, compared the mean proportions of the shuffled populations to the actual proportions in the data using a χ² test (p < 0.05), and calculated 95% confidence intervals using the Wilson score interval.
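For reference, the Wilson score interval for a proportion can be computed as in the sketch below. The function name is hypothetical, and the example count (166 of 462 units, roughly 36%) is illustrative only; it is not a count reported in the text.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.959964 corresponds to a two-sided 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Illustrative example: about 36% of 462 units (hypothetical count of 166).
low, high = wilson_interval(166, 462)
half_width = (high - low) / 2.0
```

For a proportion near 36% in a population of this size, the half-width comes out at a little over 4 percentage points, the same order as the intervals reported with the proportions of selective units.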

Anatomic clustering
We determined whether location- and direction-selective units were significantly organized or clustered in space across the array using Moran's I, a metric of spatial autocorrelation (Moran, 1950). Moran's I ranges from -1 to 1, with negative values indicating that similar feature values are spatially dispersed and positive values indicating that similar values are spatially clustered. We compared our values to chance, obtained by shuffling each electrode's feature label 1000 times and then taking the 95th percentile range of values.
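A minimal sketch of Moran's I on an electrode grid, together with a label-shuffling null distribution, is given below. The binary neighborhood weighting (electrodes within `max_dist` grid units count as neighbors), the numeric coding of labels, and all function names are assumptions for illustration; the text does not specify the weighting scheme that was used.

```python
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # (row, column) of an electrode on the array


def morans_i(values: Dict[Pos, float], max_dist: float = 1.0) -> float:
    """Moran's I with binary weights: w_ij = 1 if electrodes i and j are
    at most max_dist apart (i != j), else 0."""
    positions = list(values)
    n = len(positions)
    mean = sum(values.values()) / n
    dev = {p: values[p] - mean for p in positions}
    denom = sum(d * d for d in dev.values())
    num, w_sum = 0.0, 0.0
    for a in positions:
        for b in positions:
            if a == b:
                continue
            dist = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
            if dist <= max_dist:
                num += dev[a] * dev[b]
                w_sum += 1.0
    if denom == 0.0 or w_sum == 0.0:
        return float("nan")
    return (n / w_sum) * (num / denom)


def shuffled_null(values: Dict[Pos, float], n_shuffles: int = 1000,
                  max_dist: float = 1.0, seed: int = 0) -> List[float]:
    """Chance distribution of Moran's I from shuffling labels across electrodes."""
    rng = random.Random(seed)
    positions = list(values)
    labels = [values[p] for p in positions]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null.append(morans_i(dict(zip(positions, labels)), max_dist))
    return null


# Toy 4 x 4 grids: a checkerboard (dispersed) and two halves (clustered).
checkerboard = {(r, c): float((r + c) % 2) for r in range(4) for c in range(4)}
halves = {(r, c): float(c < 2) for r in range(4) for c in range(4)}
```

With these weights the checkerboard pattern yields a strongly negative value and the split-halves pattern a positive one, matching the interpretation of the sign given above.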

Decoding stimulus features
We used an L2-regularized linear support vector machine (SVM; liblinear v2.1; Fan et al., 2008) to decode the stimulus features (specifically, target location and motion direction) from the task-related units during the postcue epoch when both RDPs had the same color and direction. The regularization parameter was the optimal penalty parameter C (refer to Eq. 1 in Fan et al., 2008), identified by conducting a grid search. To assess the accuracy of the decoder, we used a cross-validation technique: the decoder was trained on 90% of the trials for a given neural ensemble and then tested on the remaining 10% (10-fold cross-validation). The SVM was iteratively trained and tested on different subsamples of the trials until each trial was in the test set at least once and used for training nine times. Furthermore, we balanced the number of trials between the conditions by identifying the condition with the fewest trials and randomly subsampling each of the other conditions to that number of trials. We repeated the subsampling procedure 10 times. We used two different procedures to determine decoding performance for neuronal ensembles of various sizes. First, we sorted the units from highest to lowest by their individual performance in decoding attended location and motion direction, respectively. To build an ensemble of size n + 1, we added the next-best-performing unit to the ensemble of size n; i.e., the highest-performing unit was considered an ensemble of size 1, the ensemble of size 2 was built by adding the second-best-performing unit, and so forth to build increasingly larger ensembles. We refer to the resulting ensemble from this method as the best single-unit ensemble (BSU). We also used a second procedure, best ensemble (BE), in which we again started with the best single unit but then paired it with all remaining units to find the pair that maximized the decoder's performance.
Then we used this pair and combined it iteratively with each of the remaining units to find the best trio that maximized the decoder's performance. The procedure was repeated for the best quartet and so on. Note that the BE method optimizes ensemble performance and allows for the possibility that the best single units do not necessarily make up the best ensembles. Because we recorded a different number of units each day, we performed this analysis only for the minimum ensemble size across all recording sessions for each animal (94 for monkey R and 61 for monkey S). We used unpaired t tests (p < 0.05) to compare the decoder's performances for the different ensemble types between monkeys. To assess the contribution of correlations in the ensemble, we shuffled trial order within the same condition to destroy shared trial-by-trial variability and recomputed the SVM classification analysis on those shuffled ensembles. This procedure was repeated 100 times. We used paired t tests (p < 0.05) to assess differences in decoding accuracy with and without correlations. Furthermore, we used exact tests to compare the obtained decoding performance at each ensemble size to chance performance, which we obtained by randomizing the entire trial order and rebuilding the ensembles for each surrogate data set. We repeated this shuffling procedure 10 times. Last, we compared the maximum decoding performance and the ensemble size at which it was reached with Wilcoxon signed-rank tests (p < 0.05). Superscript letters listed with p-values correspond to the statistical tests shown in Table 1.
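The within-condition shuffle used to remove correlated trial-by-trial variability can be sketched as below. This is an illustration under stated assumptions rather than the original analysis code: responses are assumed to be stored as one row per trial and one column per unit, and the function name is hypothetical. Each unit's responses are permuted independently among trials of the same condition, so every unit keeps its own condition-specific response distribution while the simultaneity across units is broken.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def shuffle_within_condition(
    responses: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Return a copy of responses (trials x units) in which, for each unit
    independently, values are permuted among trials sharing a label."""
    rng = random.Random(seed)
    n_units = len(responses[0])
    trials_by_condition = defaultdict(list)
    for trial, label in enumerate(labels):
        trials_by_condition[label].append(trial)

    shuffled = [list(row) for row in responses]
    for trial_idx in trials_by_condition.values():
        for unit in range(n_units):
            permuted = trial_idx[:]
            rng.shuffle(permuted)  # a different permutation for every unit
            for dst, src in zip(trial_idx, permuted):
                shuffled[dst][unit] = responses[src][unit]
    return shuffled


# Toy data: 6 trials x 2 units, two conditions ("L" and "R").
toy_responses = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
toy_labels = ["L", "L", "L", "R", "R", "R"]
toy_shuffled = shuffle_within_condition(toy_responses, toy_labels, seed=1)
```

Comparing decoding accuracy on the original and shuffled data then isolates the contribution of the correlation structure, since single-unit tuning is identical in both.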

Decoding behavior
We predicted target location and motion direction from the activity of the BE yielding the maximal decoder performance under various behavioral conditions. Specifically, we compared decoding accuracy for all trials with that obtained when (1) considering correct trials only, (2) considering error trials only, and (3) narrowing the included error trials down to false positives (i.e., responses to the distractor) only. Because there were only a small number of error and false-positive trials, we were unable to balance the number of trials between the different outcome conditions. Thus, we restricted statistical analyses to exact tests (p < 0.05) comparing the decoding performance with chance performance, obtained by shuffling trial labels 100 times and recomputing the decoding accuracy.
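One common way to compare an observed decoding accuracy with a label-shuffled chance distribution is an empirical one-sided p-value, sketched below. This is an assumption about the form of the test, not a description of the authors' exact procedure; the function name and the add-one correction are illustrative choices.

```python
from typing import Sequence


def permutation_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided empirical p-value: the fraction of shuffled accuracies at
    least as large as the observed accuracy, with an add-one correction so
    that the p-value is never exactly zero."""
    if not null_values:
        raise ValueError("null_values must not be empty")
    n_at_least = sum(1 for v in null_values if v >= observed)
    return (n_at_least + 1) / (len(null_values) + 1)
```

With 100 shuffles, the smallest p-value this estimator can return is 1/101, so the number of shuffles bounds the resolution of the test.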

Behavioral performance
We trained two adult monkeys (Macaca mulatta), R and S, to maintain their gaze on a central fixation point while covertly attending to one of two white, peripherally presented, moving RDPs presented on a dark background. In a given trial, the RDPs appeared simultaneously and moved in the same direction, changing colors after a variable time. Based on a color rule (Lennert and Martinez-Trujillo, 2011, 2013), the monkeys had to identify and attend to the target RDP while ignoring the other RDP (distractor). Briefly, we taught the animals an arbitrarily arranged ordinal hierarchy of six isoluminant colors. In each trial, the higher-ranking colored RDP was the target. The animals were rewarded for releasing a button after correctly detecting a brief motion direction change in the target, ignoring distractor changes (Fig. 1A and Methods).
Both monkeys learned the task and performed above chance level (50%) in all experimental sessions (Fig. 1B, left panel; 87.6% correct trials in monkey R and 64.97% correct trials in monkey S, respectively). Most error trials of both monkeys were responses to the distractor (false alarms) rather than failures to respond (misses; Fig. 1B, middle panel). The latter indicates that the animals indeed attended to the target and ignored the distractor but also that the task was challenging for the animals. Monkey R had a better performance and faster reaction times than monkey S (Fig. 1B, right panel, 327 and 436 ms, respectively). Unless stated otherwise, we considered only correct trials for our analyses of neuronal responses.

Neuronal selectivity
While the animals performed the task, we recorded the responses of a total of 556 units (single units and multiunits) in monkey S (5 sessions) and 525 in monkey R (4 sessions) using 96-channel microelectrode (Utah) arrays chronically implanted in the left area 8A, located on the cortical surface between the posterior end of the principal sulcus and the knee of the arcuate sulcus (Petrides, 2005). Our arrays were located slightly dorsal to the principal sulcus (Fig. 2A). Of those recorded, 853 units (79%) showed significantly different firing rates during the color cue presentation and/or the postcue epoch compared with a window of 700 ms centered at stimulus onset (Wilcoxon rank-sum test; p < 0.05ᵃ). The variables of interest in the following analyses were the allocation of spatial attention and the direction of the stimuli; therefore, we concentrated on the postcue or attentional period in which the stimuli on the screen did not change color or direction.
To examine the tuning of single units, we used a two-way ANOVA with target location and motion direction as factors (p < 0.05ᵇ). Fig. 2B shows an example unit (recorded from monkey R) that responded more strongly when the target location was contralateral to the recording site (left panel), or when the RDPs' motion direction was down (right panel). Similarly, Fig. 2C shows an example unit (recorded from monkey S) responding more strongly on trials when the target stimulus was presented ipsilaterally (left panel), or when the RDPs' motion direction was up (right panel). These units were recorded from different electrodes in the array (see schematics in Fig. 2A), and they encode the target location (red squares) and motion direction (green squares), respectively.
We next examined the proportions of selectivity for the attended location (location selective) and the stimulus' motion direction (direction selective) in the entire population (Fig. 3A). In monkey R (left panel), we identified 62% of the units (285 of 462) to be selective for at least one of the two variables, attended location and motion direction. Of those, 59% (36% ± 4.38% of the total population) were location selective, 28% (17% ± 3.45%) were direction selective, and 13% (8% ± 2.52%) showed selectivity for both attended location and direction. To determine whether these proportions were different from those expected by chance, we compared them to those obtained using a randomization procedure (chance estimate). For the randomization procedure, we used the same trials and units as in the original data but shuffled the trial labels. In monkey R, the proportion of location-selective cells predicted by chance was 4.94%, which was significantly smaller than that found in the real population (χ² test, p = 8.29 × 10⁻¹⁰ ᶜ). Similarly, the proportion of direction-selective units was significantly smaller in the randomized population than in the real data (5.04%, χ² test, p = 0.002ᵈ), as was the proportion of units encoding both variables (0.23%, χ² test, p = 0.004ᵉ).
In monkey S (Fig. 3A, right panel), 49% of the units (191 of 391) were classified as selective for the attended location or direction. The majority of cells, 77% (38% ± 4.8% of the total population), were direction selective, 17% (8% ± 2.75%) were location selective, and 6% (3% ± 1.71%) were both location and direction selective. The proportion of direction-selective units was significantly higher than expected by chance (5.21%, χ² test, p = 4.13 × 10⁻¹⁰ ᵍ), whereas the proportion of location-selective units was not significantly different from that found in a randomized population (9.39%, χ² test, p = 0.6972ᶠ). The proportion of units selective for both features was also not different from chance (0.21%, χ² test, p = 0.1016ʰ). To assess whether the (nonsignificant) proportion of units that did have location selectivity showed a true effect, we compared the size of the isolated effect in the original data with that in the shuffled data by computing an index of sensitivity (d′). The magnitude of the effect was similar in both groups of units (Wilcoxon rank-sum test, p = 0.553). Thus, it is unclear whether the effect isolated in the original data was a true effect of attention that is present in a small number of units or reflected noise in our data. With the current analysis, we cannot fully reject the latter scenario.
Figure 1. A, The monkeys initiated a trial by fixating on the central point while pressing a button. After this initial fixation period, two white moving RDPs appeared peripherally of the fixation point and changed to two different colors after a random interval. The animals had to identify the higher-ranking color (the target) and allocate their attention to it before the color cue was extinguished and the RDPs returned to white. The monkeys had to maintain central fixation and covert attention until there was a brief motion direction change in the relevant stimulus. In 50% of the trials, the distractor changed before the target; in those cases, the monkeys had to keep pressing the button, as only a release after the target change was rewarded with juice. B, Percentage of hits, errors, and mean response time for monkey R (black bars) and monkey S (white bars), averaged across all color combinations. Error bars denote standard deviation across sessions.
The distributions of selectivities were different in the two animals. Whereas animal R had a larger proportion of units selective for the attended location than units selective for motion direction, animal S showed the inverse pattern. One possible explanation for this result is that the animals had different training histories: because selectivity in the LPFC for different features may be affected by experience, exposure to different tasks may have shaped neuronal selectivities differently in each animal. To investigate this issue, we plotted the training history of the animals (Fig. 3B). Monkey R was extensively trained and participated in other experiments using a color-rank order task similar to the one shown in Fig. 1. In this task, although motion direction is important to detect the response cue, the spatial location of the target is of primary importance, i.e., the animals had to decide whether the right or left RDP was the target. On the other hand, monkey S had first been trained in the same task, but before undergoing testing in the current experiments, it was extensively trained in various match-to-sample tasks that required matching the direction of two moving RDPs or Gabor patches. In those tasks, location was an irrelevant variable for determining what the target was: only motion direction was important (Mendoza-Halliday et al., 2014). Using an ANOVA with the factors target location and motion direction on the averaged activity recorded on each electrode, we tracked the distribution of selective channels over the recording sessions in each animal (Fig. 3C). Our goal was to examine whether the proportions of selective cells were relatively stable over time. Interestingly, from one session to the next, only a few channels had significant tuning in common, but overall the distributions of selectivities were very similar (see color bars).
This suggests that the selectivity was stable over recording time and the difference between animals was not due to an outlier session. Thus, it is possible that the differences in neuronal selectivities are due to differences in training history between animals. However, another equally possible explanation is that the area we recorded from differed slightly between the two animals and that the proportion of units may change depending on the relative location of the arrays. To test this hypothesis, we assessed whether there was significant spatial clustering and whether it differed between animals. We mapped the preferred location or motion direction of an electrode onto its cortical position (the left panels in Fig. 3D,E show representative example sessions for each monkey). To examine whether neurons with similar preferences were anatomically clustered (Fig. 3D,E, right panels), we used Moran's I, a metric of spatial autocorrelation, and compared it to the 95th percentile range of chance values obtained by shuffling the electrodes' preference labels 1000 times. Although some neurons with similar preferences were isolated from nearby electrodes, in general the analysis revealed no significant clustering of neurons selective for attended location or stimulus motion direction in the areas covered by the arrays in the two monkeys. The lack of spatial clustering as well as the similar position (i.e., position of the arrays relative to the sulci) led us to favor the training history hypothesis to explain the differences in neuronal selectivity between animals.
To examine the population activity profiles, we pooled the responses of units selective for the attended location and motion direction. Because units could be selective for one location (i.e., ipsilateral or contralateral) or motion direction (i.e., up or down), we pooled units after aligning their responses to their preferred direction or location. To gauge the latency and magnitude of the difference in response to preferred and nonpreferred stimuli, we performed a paired t test on the responses of the selective units using time bins of 20 ms. The latency was determined as the first of five consecutive significant bins (p < 0.05) and the magnitude as the mean t-value across all bins. The difference between responses to the preferred (red) and nonpreferred (blue) target location is more pronounced in monkey R than in monkey S (t = 12.36 ± 5.09ⁱ and 4.15 ± 1.61ⁱ, respectively; Fig. 4A,B, left panels). On the other hand, the difference in the responses to the preferred (orange) and nonpreferred (green) direction seems to be less distinct between the two animals (t = 7.14 ± 2.98ʲ and 11.41 ± 3.57ʲ for monkey R and S, respectively; Fig. 4A,B, right panels). One detail in this figure is that in the two animals, the discrimination between motion directions (time when the responses to the preferred and nonpreferred directions diverge; Fig. 4A,B, right panels, black arrow; 320 and 260 ms after stimulus onset for monkey R and S, respectively) appears to start earlier than the discrimination between the attended and unattended locations (time when responses to attended and unattended locations start diverging; Fig. 4A,B, left panels, black arrow; 228 and 48 ms after color cue onset in monkey R and S, respectively). This is likely because information about the stimulus direction was available to the animal earlier than information about the target location.
In other words, the animals probably identified the motion direction earlier and then directed their attention to the target and ignored the distractor.
These results indicate that average population responses are modulated by attended location and the stimulus motion direction in both animals. However, the degree to which the populations do so is different between the animals, particularly for the case of spatial attention. This is also concordant with the higher behavioral performance in monkey R relative to monkey S.

Decoding attended location and motion direction from neuronal ensembles
We used a binary linear classifier, a support vector machine (SVM; Cortes and Vapnik, 1995), to decode spatial attention and the stimuli's motion direction independently from ensembles of simultaneously recorded task-related units in each session. We used the SVM as a proxy to assess the ability of a downstream entity (single neuron or neuronal ensemble) to read out the information from the recorded LPFC neuronal ensemble. Decoding accuracy was assessed using a cross-validation procedure in which 90% of the data were used for training and the remaining 10% for testing (10-fold cross-validation; see Methods).
(Figure legend, continued) ... represent 95% confidence intervals. Shading indicates proportion found in data with shuffled trial labels. Asterisks mark significant differences in proportions compared to chance proportions (***, p < 0.001; **, p < 0.05, χ² test). The majority of selective cells found in monkey R were location selective, and the majority of selective cells found in monkey S were direction selective. B, Timeline of the monkeys' training. At the time of recording the task presented in this paper (ColorScale Task 2), monkey R had received exclusive training on a spatial attention task involving a color scale (Lennert and Martinez-Trujillo, 2011). Despite a 2.5-yr pause between the two tasks, monkey R performed the task very well. After its initial training, monkey S was extensively trained on delayed-match-to-sample tasks involving motion directions (for example, Mendoza-Halliday et al., 2014). Monkey S had a >4-yr break from a color scale task, during which it became an expert for motion direction tasks. C, We tracked the proportion of selective electrodes/channels per recording session in each animal to see whether the distributions were approximately stable over time. To test the spatial clustering hypothesis, each electrode's categorical selectivity of one example session was mapped into the array for monkey R (D) and monkey S (E). Left panels: colors indicate whether units on an electrode were selective. White channels had no activity; black channels indicate unwired electrodes. Right panels: magnitude of spatial clustering of preferred stimuli in monkey R (D) and S (E). Black line depicts Moran's I (metric of spatial autocorrelation) calculated over increasing spatial scales. Gray shaded area represents chance values.
The amount of information encoded by a neuronal ensemble has been shown to vary with the number of units in the ensemble (Tremblay et al., 2015). To investigate this issue, we used two different procedures of progressively building (adding units to) neuronal ensembles and obtained a decoder performance value for each ensemble size and composition. First, we decoded from each single unit independently and sorted the units based on their performance from highest to lowest. Then, we built neuronal ensembles (e.g., n = 2, 3, 4, ..., 94) by iteratively adding the next-best-performing unit (Fig. 5A, left, best single unit ensemble or BSU method). Second, we used a variation of this method: in each iteration, instead of adding the next-best-performing single unit to the ensemble, we added the unit that maximized the decoder's performance when added to the ensemble. In this procedure, the existing ensemble of size n is paired with each one of the remaining units that have not been added, and a value of decoding accuracy for all n + 1 ensembles is obtained. The n + 1 ensemble that yielded the highest decoding accuracy was chosen. Then we kept this best-performing ensemble of n + 1 units and repeated the procedure (Fig. 5A, right). Note that this procedure does not search the entire space of possible n-size ensembles, which was computationally unreachable in reasonable time. We refer to this building method as the best ensemble (BE; see also Methods).
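The two building procedures can be sketched as greedy loops over a scoring function. The sketch below is illustrative only: `score` stands for any routine that returns cross-validated decoding accuracy for a list of units, and the toy score, unit names, and redundancy groups are hypothetical and chosen so that the two methods diverge.

```python
def build_bsu(units, score):
    """BSU: rank units by their solo score, then add them in that fixed order."""
    order = sorted(units, key=lambda u: score([u]), reverse=True)
    return [(order[:n], score(order[:n])) for n in range(1, len(order) + 1)]

def build_be(units, score):
    """BE: start from the best single unit, then repeatedly add whichever
    remaining unit gives the highest score for the enlarged ensemble."""
    ensemble = [max(units, key=lambda u: score([u]))]
    remaining = [u for u in units if u != ensemble[0]]
    curve = [(list(ensemble), score(ensemble))]
    while remaining:
        best = max(remaining, key=lambda u: score(ensemble + [u]))
        ensemble.append(best)
        remaining.remove(best)
        curve.append((list(ensemble), score(ensemble)))
    return curve

# Hypothetical toy score: units in the same group carry redundant information,
# so only the strongest member of each group adds to the accuracy.
SIGNAL = {"a": 0.20, "b": 0.18, "c": 0.10, "d": 0.05}
GROUP = {"a": 1, "b": 1, "c": 2, "d": 3}

def toy_score(ensemble):
    best_per_group = {}
    for u in ensemble:
        g = GROUP[u]
        best_per_group[g] = max(best_per_group.get(g, 0.0), SIGNAL[u])
    return 0.5 + sum(best_per_group.values())

bsu_curve = build_bsu(list(SIGNAL), toy_score)
be_curve = build_be(list(SIGNAL), toy_score)
bsu_order = bsu_curve[-1][0]  # units in the order BSU added them
be_order = be_curve[-1][0]    # units in the order BE added them
```

In this toy case BSU adds the redundant unit `b` second, whereas BE skips it in favor of the weaker but complementary unit `c`. For N units the BE loop evaluates on the order of N² ensembles instead of all possible subsets, which is why it is a greedy approximation and not an exhaustive search.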
The difference between these two procedures is that in the BSU procedure, performance of individual units dictates which unit is added to the ensemble. If the main factor that determines performance is the coding properties of individual units, this procedure should yield the best decoding performance with fewer units. On the other hand, in the BE procedure, the contribution of a unit to information coding by the entire ensemble determines its ensemble membership. In other words, the BE procedure takes into account not only the performance of the added unit considered in isolation but also how the added unit interacts with the rest of the ensemble. If the tuning properties of individual units, and not the interactions, solely determine the ensemble performance, these two procedures should lead to identical ensembles of n units in each iteration (for all ensemble sizes). Fig. 5B,C shows the results of two example sessions when decoding attended location in monkey R and motion direction in monkey S, respectively. The green lines indicate the decoding performance using the BE building method as a function of ensemble size, and the magenta lines indicate it for the BSU building method. The black and gray lines denote chance levels, obtained by shuffling the trial labels, for the BE and BSU ensembles, respectively. The circular markers show the individual units' decoding performances in the order in which they were added to the ensembles. Colored markers indicate whether a unit was selective for either a target (attended) location (blue) or motion direction (yellow). For the BSU building method, we see, as anticipated, a steady decline in individual units' decoding performances (red lines), whereas for the BE method, sometimes low-performing/untuned units were added before high-performing/tuned ones (green lines).
Importantly, both ensemble methods consistently yield higher decoding accuracy than the best single-unit decoding accuracy (first unit on the x-axis or ensemble with n = 1). Note that the lines converge at the maximum ensemble size: for this n, the ensembles are the same and hence the decoding accuracy should be similar.
Next, we confirmed these results at the population level (Fig. 6). Because the number of units we recorded varied between sessions, we considered the minimum ensemble size for our analyses (n = 94 in monkey R and n = 61 in monkey S), truncated the data accordingly, and plotted the average across sessions. We compared the decoder performance of the different ensemble types (magenta for BSU and green for BE) with the average performances based on surrogate data in which the trial labels were randomized 10 times and new ensembles for each shuffle were built (gray for shuffled BSU and black for shuffled BE) and with ensembles for which the trial labels had been permuted within condition 100 times, i.e., removing the simultaneity of recordings and effectively removing noise correlations (dashed lines). The lines inside the table on top of the plots indicate ensemble sizes that were significantly different from each other for the different comparisons (BE vs. shuffled, BSU vs. shuffled, BE vs. decorrelated BE, and BSU vs. decorrelated BSU).
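The two surrogate data sets described above differ in what they destroy. The sketch below illustrates one common way to build them, assuming a trials × units response matrix; shuffling each unit's trials independently within a condition is our reading of "permuted within condition", and all names and the toy data are hypothetical.

```python
import random

def shuffle_labels(labels, rng):
    """Chance-level surrogate: randomly reassign condition labels to trials."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

def decorrelate_within_condition(X, labels, rng):
    """Permute trial order independently for each unit, within each condition.

    Each unit keeps its per-condition response distribution (signal), but the
    responses in one row no longer stem from the same trial, which removes
    trial-by-trial covariation between units (noise correlations).
    """
    out = [list(row) for row in X]
    for cond in set(labels):
        rows = [i for i, lab in enumerate(labels) if lab == cond]
        for u in range(len(X[0])):
            values = [X[i][u] for i in rows]
            rng.shuffle(values)
            for i, v in zip(rows, values):
                out[i][u] = v
    return out

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def noise_correlation(X, labels, cond, unit_a=0, unit_b=1):
    """Correlation between two units across trials of a single condition."""
    rows = [i for i, lab in enumerate(labels) if lab == cond]
    return pearson([X[i][unit_a] for i in rows], [X[i][unit_b] for i in rows])

# Toy data (assumption): two units that share a common trial-by-trial fluctuation.
rng = random.Random(0)
labels = [t % 2 for t in range(400)]
X = []
for lab in labels:
    shared = rng.gauss(0, 1)
    X.append([5 * lab + shared + rng.gauss(0, 0.3),
              5 * lab + shared + rng.gauss(0, 0.3)])

X_decorr = decorrelate_within_condition(X, labels, rng)
r_before = noise_correlation(X, labels, cond=1)
r_after = noise_correlation(X_decorr, labels, cond=1)
```

Label shuffling breaks the link between responses and conditions and therefore estimates chance performance, whereas within-condition shuffling leaves each unit's tuning intact and only removes the shared variability.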
Figure 5. Ensemble building procedure. A, We ranked individual units based on their information content, as assessed by SVM, and then, starting with the most informative unit, either iteratively added the next best unit to the ensemble (BSU procedure) or looped through the remaining units to identify which pair yielded the highest performance, then looped through the remaining units to identify the best trio, etc. (BE procedure). B, Example session from monkey R when decoding target location. The decoding accuracy in percentage is shown as a function of ensemble size for both building procedures (green, BE; magenta, BSU). Decoding accuracy expected by chance is shown in gray for BSU and in black for BE. Circular markers indicate the individual units' decoding accuracy and the order in which they get added to the ensemble. Colored markers mark selectivity for the decoded feature. The red line connects the markers that make up the BSU ensemble, and the green line connects those that make up the BE ensemble. C, Example session from monkey S when decoding motion direction.
When we decoded target location, the performance of all BSU sizes for monkey R (Fig. 6A) was significantly higher than chance performance (exact test, p < 0.01 k). Similarly, the performance of 93 of 94 BE ensemble sizes was also higher than chance (exact test, p < 0.01 l). Removing noise correlations from the ensembles reduced decoding accuracy for smaller ensembles (n < 40) using the BE method (paired t test, 0.0091 < p < 0.0489 m) and had no effect on BSU ensembles (p > 0.2074 n). For monkey S (Fig. 6B), we found the same trend across all BE ensemble sizes, although the exact tests failed to reach statistical significance (p > 0.08 p). The same was true for the BSU ensembles (p > 0.2400 o).
Removing noise correlations had similar effects as in monkey R: it decreased decoding performance in the majority of BE ensembles (57/61, paired t test, 1.59 × 10⁻⁴ < p < 0.0495 q) and only in few BSU ensembles (2/61, 0.0373 < p < 0.0499 r). One reason we did not see a statistically significant difference between the data-based ensembles and those built on random data may be that the ensembles had different compositions that might have affected the comparison negatively. Nonetheless, our data indicate that even in monkey S, there is a systematic trend that relatively small and heterogeneous neuronal ensembles can encode stimulus information, even when the proportion of single units tuned for that feature in the ensemble is low. This may be due to the linear classifier weighting the contribution of different neurons to information coding unequally, as well as to the ability of the classifier to use information about the correlation structure of the neuronal ensemble. Note that the curve corresponding to the BE method in monkey S (green) reaches a maximum and then decreases as more neurons are added. This is likely because of the finite number of trials in our sample and the relatively low selectivity of neurons for the attended location. A small number of trials yields a too-high feature-to-instance ratio (i.e., neuron-to-trial). This leads to overfitting and consequently a decay in mean decoding accuracy (Trunk, 1979; Guyon and Elisseeff, 2003; Kanitscheider et al., 2015). In fact, the average decoding performance for the maximum possible BE ensemble size per recording session is not significantly different from chance (Fig. 6B, right ordinate, exact test, p = 0.8000 s).
Figure 6. Decoding from neuronal ensembles using SVM. We decoded target location (A, B) and motion direction (C, D) during the postcue epoch from monkey R and S, respectively. The SVM's performance (left ordinate) is shown as a function of ensemble size (abscissa). We truncated the plots to show only the performance for the minimum number of units across sessions. Green lines indicate decoding from BE ensembles, and magenta indicates BSU ensembles. Average decoding performance from ensembles built out of shuffled data is shown in black (BE) and gray (BSU). Dashed lines represent decoding from BE and BSU ensembles when noise correlations had been removed by shuffling trials within the same condition. Shading over the lines indicates ±SEM for each ensemble size. The lines in the table on top indicate which ensemble sizes were significantly different from each other (p < 0.05) for the indicated comparisons. Circular markers indicate the individual units' decoding performance once they got added to the ensembles. The right ordinate indicates decoding performance for the maximum ensemble sizes averaged across sessions. We compared median SVM performance of the ensembles that had produced the highest decoding accuracy for each stimulus class independently in monkey R (E) and monkey S (F). Error bars represent standard deviations across recording sessions.
When examining decoding of motion direction, almost all BSU ensemble sizes for monkey R were significantly above chance (Fig. 6C, 89/94, exact test, p < 0.01 t), and all ensemble sizes in monkey S were (Fig. 6D, exact test, p < 0.01 u). The BE also yielded significantly above-chance performances across 91 of 94 ensemble sizes in monkey R and across 60 of 61 ensemble sizes in monkey S (exact tests, p < 0.01 v and p < 0.01 w, respectively). Removing noise correlations again had a decreasing effect on most of the BE ensembles in both animals (paired t test, 1.43 × 10⁻⁴ < p < 0.0469 x and 6.19 × 10⁻⁵ < p < 0.0458 y for monkeys R and S, respectively) and yielded mostly no change in the BSU ensembles.
When decoding attended location using the BE method, we found higher decoding accuracy across all ensemble sizes in monkey R than in monkey S (unpaired t test, 5.69 × 10⁻⁷ ≤ p ≤ 2.99 × 10⁻⁴ z). In contrast, when decoding motion direction using the BE, performance was better in monkey S relative to monkey R (unpaired t test, 2.74 × 10⁻⁶ ≤ p ≤ 0.0080 bb). The same was true when examining the BSU ensembles (unpaired t test, 1.05 × 10⁻⁵ ≤ p ≤ 4.02 × 10⁻⁴ aa for location decoding and 1.46 × 10⁻⁵ ≤ p ≤ 0.0191 cc for direction decoding). These results follow the same trend as the differences in proportions of selective cells between animals (Fig. 3A). Notably, we were able to decode more information from either ensemble type than from the best single units (dots in all panels). This corroborates that ensembles encode substantially more information than the best single units and therefore than any measurement derived from statistics based on single-unit performance (e.g., average, median, or maximum performance across single units).
To more closely examine the differences in decoding performance between the BSU and BE ensemble-building methods, we assessed the maximum performances linked to each ensemble-building method for both features (see Tables 2 and 3). It is important to note that in this analysis, the maximum performance is not necessarily equivalent to the performance of the full ensemble (see right ordinate axis in Fig. 6A-D). Here, we defined maximum decoding performance as the average maximum decoding accuracy across recording sessions in the plotted ensemble sizes. The full ensemble does not necessarily yield the highest decoding performance, because our training/testing set has a finite number of trials. As mentioned earlier, training on finite data can lead to overfitting and suboptimal decoding accuracy (Trunk, 1979; Guyon and Elisseeff, 2003; Kanitscheider et al., 2015). The maximum performance can therefore be considered a lower-bound estimate of the information encoded by the neuronal ensemble we recorded from.
When decoding target location in monkey R (Fig. 6E), the best BE returned slightly higher median decoding accuracy than the best BSU (89.19% ± 3.07 and 88.00% ± 5.26, respectively), yet this difference was not statistically significant (Wilcoxon signed-rank, p = 0.1250 dd). Similarly, when decoding motion direction, the best BE yielded a higher median accuracy (72.62% ± 3.42) than the best BSU (70.10% ± 4.57), but this difference again did not reach statistical significance (Wilcoxon signed-rank, p = 0.1250 ff).
Table legend. Detailed list of what the maximum decoding accuracy was and at which ensemble size it was achieved, measured separately for the BE and BSU methods. Because the estimates are noisy and the decoding performance saturates, the ensemble sizes at which 90% of the maximum performance was achieved are also listed. The data are divided up into the individual sessions recorded from each animal. Decoding performances between ensemble types as well as ensemble sizes were compared across the nine sessions using Wilcoxon signed-rank tests (p < 0.05).
When decoding target location in monkey S (Fig. 6F), the best BE generated higher median decoding accuracy than the best BSU (60.93% ± 2.05 and 57.32% ± 2.26, respectively). This difference did not reach statistical significance (Wilcoxon signed-rank, p = 0.0625 ee). The same was true for the decoding of motion direction (median best BE = 89.38% ± 1.66, median best BSU = 87.10% ± 2.21; Wilcoxon signed-rank, p = 0.0625 gg). In summary, when we analyzed each animal separately, we found a small trend for the median decoding accuracy to be higher when using the BE ensemble-building method, but it did not reach statistical significance. However, when pooling data across animals (n = 9, Tables 2 and 3, compare second and fourth columns), we found that the BE method yielded significantly higher maximum performance than the BSU method for both attended location and motion direction (Wilcoxon signed-rank test, p = 0.0039 hh for target location and p = 0.0039 ii for motion direction).

Effect of ensemble size on decoding performance
One interesting observation in Fig. 6A-D is that as ensembles increase in size, there is a fast increase in performance for small sizes that seems to reach an asymptote for ensembles of ~20 units or fewer. This suggests that maximum classification performance can be achieved with ensembles that are substantially smaller than the largest possible ensemble size. It also suggests that the information gained by adding more neurons to an ensemble can be negligible. To closely examine this issue, we computed the ensemble sizes at which the maximum performance was achieved, as well as the ensemble size at which 90% of that maximum performance was achieved for both the BE and the BSU methods (Tables 2 and 3).
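The two summary indices can be read off an accuracy-versus-size curve as sketched below. This is one plausible definition, not necessarily the convention used for Tables 2 and 3: here the 90% index is the first ensemble size whose accuracy reaches 90% of the peak, so by construction it can never exceed the size at the peak. The function names and the accuracy values are hypothetical.

```python
def size_at_max(curve):
    """Smallest ensemble size (1-based) at which the accuracy curve peaks."""
    return curve.index(max(curve)) + 1

def size_at_fraction_of_max(curve, fraction=0.9):
    """Smallest ensemble size whose accuracy reaches fraction * max(curve)."""
    threshold = fraction * max(curve)
    for n, acc in enumerate(curve, start=1):
        if acc >= threshold:
            return n

# Hypothetical accuracy (%) for ensembles of 1 to 8 units: fast rise, then saturation.
accuracy_by_size = [60.0, 72.0, 80.0, 84.0, 86.0, 87.0, 86.5, 86.0]
n_max = size_at_max(accuracy_by_size)
n_90 = size_at_fraction_of_max(accuracy_by_size)
```

Because the curve saturates, the peak position depends on small fluctuations between neighboring sizes, whereas the 90% index is set by the steep part of the curve and is therefore less sensitive to noise.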
In some ensembles, the number of neurons needed to achieve maximum performance is smaller than the number of neurons needed to achieve 90% of that performance (Table 2, monkey Sd5). This is because the best decoding performance was achieved by a single neuron, and adding more neurons lowered the performance. These ensembles were more the exception than the rule. They usually have lower performance than the rest of the ensembles, which can be explained by the progressive addition of noisy neurons that are poorly tuned and detrimental to the correlation structure.
We pooled the indices across monkeys and across direction and location ensembles (Tables 2 and 3) and compared ensemble sizes for maximum performance and 90% of maximum performance. The latter was done to have a second estimate that does not depend on a single measurement that could be due to a peak in performance at a given ensemble size. The BE method reached maximal decoding accuracy at smaller ensemble sizes than the BSU method (Wilcoxon signed-rank, p = 0.0198 jj). The BE method also yielded 90% of maximum decoding accuracy at smaller ensemble sizes compared to the BSU method (Wilcoxon signed-rank, p = 0.0293 kk). Overall, our results indicate that BE ensembles reach their best decoding performances at smaller ensemble sizes. This is also in line with our previously mentioned result (Fig. 6B), which indicated that significant differences between BE and the shuffled counterparts were found at relatively small ensemble sizes.

Decoding accuracy and behavior
One important question is whether the information encoded by neuronal ensembles in the LPFC contributes to task performance. If it does, one would anticipate that fluctuations in ensemble performance correlate with fluctuations in the animals' performance. We decoded the attended location using the neuronal activity from the BE ensembles that had generated the maximal decoding performance for different trial outcomes: all trials, hit trials only, relevant error trials [misses (no response) and false positives (response to change in distractor stimulus)], and false positives only. In monkey R, the average decoding performance across all trials was 83.11% (SD 4.79; Fig. 7A, black bar), which was significantly higher than chance decoding (exact test, p < 0.01 ll) but slightly lower than the performance using hit trials only (87.40% ± 4.27, white bar; exact test to compare to chance, p < 0.01 nn), whereas the decoder performed at chance level when using error trials (50.72% ± 7.80, exact test, p = 0.4300 pp, dark gray bar) or only false-positive trials (47.09% ± 10.24, exact test, p = 0.4600 rr, light gray bar).
When we decoded target location from the data obtained from monkey S, we actually did not reach higher-
We repeated this analysis for the feature motion direction. In monkey R, the decoder's performance was above chance when considering all trial outcomes (Fig. 7C, black bar) and correct trials only (white bar; 66.68% ± 3.71 and 69.01% ± 4.15, respectively; exact test, p < 0.01 tt,vv). When considering all error trials (dark gray bar), the performance dropped to chance level (53.38% ± 4.17; exact test, p = 0.5500 xx). Similarly, when decoding from false-positive trials, performance was at chance level (52.01% ± 13.23; exact test, p = 0.3850 zz).
In monkey S, the decoding accuracy remained high regardless of behavioral performance (Fig. 7D). When decoding from all trials, the mean performance was 86.03% ± 2.16 (black bar). When using hit trials only (white bar), there was a slight increase to 86.98% ± 1.79, and when using error trials (dark gray bar), it dropped to 82.52% ± 3.61, which is very similar to the decoding performance when considering false positives only (83.08% ± 4.82, light gray bar). In all cases, the decoder performed above chance level (exact test, p < 0.01 uu,ww,yy,aaa).
These results indicate that in both animals there is a similar relationship between the behavior of the monkey and the encoding efficiency of LPFC neuronal ensembles, with the effect being more pronounced for the allocation of spatial attention.

Discussion
Our data demonstrate that individual neurons and neuronal ensembles in the LPFC encode information regarding the attended spatial location and the motion direction of a stimulus. Using linear classifiers, we demonstrate that neuronal ensembles in the LPFC encode more information about these two variables than individual single units. The size and composition of the neuronal ensembles influence the amount of encoded information. Finally, our data show that the performance of neuronal ensembles at encoding information on a single-trial basis is correlated with the animals' behavioral performance.

Encoding of spatial attention and stimulus features in LPFC
Single neurons in the LPFC encode the location of visuospatial attention (Lennert and Martinez-Trujillo, 2011, 2013). However, how this translates into the ability of neuronal ensembles to encode the spatial location as well as nonspatial features of visual stimuli during an attentional task has not yet been extensively investigated. In our study, rhesus monkeys directed their attention to one of two moving RDPs based on a color cue and then maintained attention on the target location (spatial attention) until they detected a motion direction change in the target, ignoring any changes in the distractor.
Figure 7. Relationship between decoding accuracy and monkeys' behavior. We used the BE ensemble with the highest decoding accuracy to decode target location from the neuronal activity during different trial outcomes (black, averaged across all trials; white, correct trials only; dark gray, error trials; light gray, false positives only) independently for monkey R (A) and monkey S (B). Error bars represent standard deviations across recording sessions. Asterisks mark significant differences in mean accuracy compared to chance decoding accuracy (***, p < 0.001; *, p < 0.05; exact test). We repeated this analysis for motion direction in monkey R (C) and monkey S (D).
Our data show that the encoding of attended location and stimulus motion direction (two task-relevant variables) is stronger in neuronal ensembles than in single units. This may appear to be a trivial result, since neuronal ensembles can average noise and perform better than their individual components. However, this is not necessarily true if single units have perfect classification performance (e.g., 100%). Additionally, the firing of units in the ensembles is not uncorrelated; in fact, it has been shown that in visual cortices, the correlation structure of an ensemble can impair performance and that attention reduces this effect by decorrelating neuronal firing (Cohen and Maunsell, 2009; Mitchell et al., 2009). Interestingly, in monkey S, for the encoding of the attended location, when removing correlations, the ensemble performance drops below the performance of the best single unit for some ensemble sizes (dashed lines drop below the first circular marker on the y axis). This agrees with a recent report of noise correlations improving the coding of working memory in the lateral prefrontal cortex (Leavitt et al., 2017). This suggests that the coding properties of LPFC neuronal ensembles and the amount of encoded information cannot simply be estimated from the information obtained from measurements of single-unit activity alone (e.g., averaging the performance of single units or choosing the maximum performance across units).
Attention is the enhanced processing of behaviorally relevant information at the expense of distractors (Treue and Trujillo, 1999; Martinez-Trujillo and Treue, 2004; Patzwahl and Treue, 2009). Spatial attention involves the allocation of attention to a relevant visual location in our environment (Posner, 1980). The effects of visual attention on neuronal responses appear to get stronger as one moves up the processing hierarchy (Treue, 2001). By the time visual signals reach the LPFC, attentional filtering is strong and arises early after a cue onset (Everling et al., 2002; Buschman and Miller, 2007; Lennert and Martinez-Trujillo, 2011; Squire et al., 2013). To guide the allocation of attention, information about visual stimuli and their behavioral relevance must be integrated somewhere in the brain. We hypothesize that area 8A of the LPFC is a likely candidate based on its anatomic and functional properties, e.g., bilateral representation of the visual field (Lennert and Martinez-Trujillo, 2011, 2013; Tremblay et al., 2015; Bullock et al., 2017), selectivity for stimulus features (Hussar and Pasternak, 2009; Mendoza-Halliday et al., 2014), and connectivity to other prefrontal areas and sensory cortices (Petrides, 2005). In our study, we are able to corroborate previous reports of an involvement of area 8A in spatial attention (Reser et al., 2013; Tremblay et al., 2015), while also demonstrating that other task-relevant parameters such as the direction of motion of a stimulus are encoded as well. Indeed, a recent report has provided evidence of a contribution of area 8A to the coding of attended nonspatial features (Bichot et al., 2015). Our study does not specifically address coding of nonspatial attention by neurons in area 8A, but it does show that task-relevant stimulus features are encoded, which is likely necessary for coding of signals related to feature-based attention (such as to motion direction).
Specifically, we show that single units encode both the attended location and motion direction of stimuli. Furthermore, the populations of neurons encoding these two variables seem at least partially segregated within the LPFC, with a small proportion of units representing both types of information.

Single-unit selectivity in LPFC
There is ample evidence that a large proportion of prefrontal neurons show task-related activity, ranging from ~40% (Fusi et al., 2016) to almost 100% (Duncan, 2001). Specifically, many prefrontal neurons encode task-relevant parameters (Rao et al., 1997; Donahue and Lee, 2015), such as information about the stimulus properties, memory components, or reward size. These neurons may not show classic sensory neural tuning, but could selectively respond to the most relevant information for the current task (Yantis, 2008; Fusi et al., 2016).
In our study, a large proportion of neurons preferentially responded when the attended stimulus was presented in the ipsilateral or the contralateral hemifield relative to the recording site. We also found many neurons to respond more strongly to one motion direction than the opposite motion direction. In general, our results agree with previous findings of large proportions of prefrontal neurons exhibiting task-related activity and many single units having preferences for pertinent motion parameters (Hussar and Pasternak, 2009;Mendoza-Halliday et al., 2014). Although we did not explore the full range of tuning, the selectivity of our single LPFC neurons does not seem to substantially differ from the one found in visual neurons in area MT for similar parameters (e.g., motion direction).
Previous studies have reported that single units in the LPFC show mixed selectivity (Rigotti et al., 2013; Fusi et al., 2016). This is thought to increase the computational power of neuronal ensembles (Miller, 2013) by increasing the dimensionality of the neural representations. We did not find a substantial number of units selective for both attended location and the stimuli's motion direction, the relevant parameters in our task. One possible reason for this discrepancy is that we did not employ a large enough number of parameters, so our task may not have been complex enough in that respect. Furthermore, our animals were extensively trained in the task, which may have biased the selectivity of neurons toward the relevant task variables. In favor of this hypothesis, we found that the differences in selectivity between the recorded populations in the two animals correspond to their training history. The latter also suggests that although LPFC neuronal representations seem to be flexible, prolonged exposure to certain tasks and stimuli may leave a permanent "blueprint" in those selectivities. This may be the basis for specialization after prolonged training. However, this issue needs further investigation.
One question arising from these results is whether and how the proportion of selective neurons in LPFC area 8A could change with training, as well as how long is needed for this to happen. Unfortunately, microelectrode array recordings are not identical from session to session; therefore we could not track most neurons over different sessions. However, we did not find the proportion of selectivities to change dramatically over recording sessions. One possibility is that the electrodes penetrate the surface orthogonally, and thus parallel to cortical columns; so for a single electrode, even if the depth position and therefore the recorded neurons vary from day to day, the tuning of the neurons likely remains the same.
In summary, LPFC neuronal ensembles encode attended spatial locations as well as nonspatial visual features (motion direction) with significantly higher accuracy than individual neurons. This result suggests that the LPFC contains activity maps of these two variables that can be read out by downstream or upstream areas; it also highlights that the information contained in such maps cannot be inferred from the selectivity of individual units alone. Overall, our results emphasize the importance of simultaneous measurements of neural activity in behaving animals.