Cortical Pyramidal and Parvalbumin Cells Exhibit Distinct Spatiotemporal Extracellular Electric Potentials

Abstract Brain circuits are composed of diverse cell types with distinct morphologies, connections, and distributions of ion channels. Modeling suggests that the spatial distribution of the extracellular voltage during a spike depends on cellular morphology, connectivity, and identity. However, experimental evidence from the intact brain is lacking. Here, we combined high-density recordings from hippocampal region CA1 and neocortex of freely moving mice with optogenetic tagging of parvalbumin-immunoreactive (PV) cells. We used ground truth tagging of the recorded pyramidal cells (PYR) and PV cells to construct binary classification models. Features derived from single-channel waveforms or from spike timing alone allowed near-perfect classification of PYR and PV cells. To determine whether there is unique information in the spatial distribution of the extracellular potentials, we removed all single-channel waveform information from the multichannel waveforms using an event-based delta-transformation. We found that spatiotemporal features derived from the transformed waveforms yield accurate classification. The extracellular analog of the spatial distribution of the initial depolarization phase provided the highest contribution to the spatially based prediction. Compared with PV cell spikes, PYR spikes exhibited higher spatial synchrony at the beginning of the extracellular spike and lower synchrony at the trough. The successful classification of PYR and PV cells based on purely spatial features provides direct experimental evidence that spikes of distinct cell types are associated with distinct spatial distributions of extracellular potentials.


Introduction
Brain circuits are composed of different cell types with distinct roles in neuronal network dynamics (Tremblay et al., 2016;Tasic et al., 2018). Since the days of Brodmann (1909) and Ramón y Cajal (1909), it has been recognized that neurons in different brain regions, nuclei, and layers may have different morphologies (McKay and Turner, 2005;Ascoli et al., 2007;Clascá et al., 2012;Jones, 2012;Fujishima et al., 2018). Within a brain region, neurons that vary in the type of output (e.g., excitatory, inhibitory) and postsynaptic targets (e.g., somatic, dendritic, or axonal) exhibit distinct morphology Klausberger and Somogyi, 2008;Kepecs and Fishell, 2014). Histologic and in vitro studies showed that the size, form, and orientation of the soma, the dendritic tree, and the axonal arbor all vary between cells that express different genes and neurochemical markers (Monyer and Markram, 2004;Zeisel et al., 2015;Zeng and Sanes, 2017).
In behaving animals, extracellular recording techniques allow simultaneous recording of electrical potentials generated by multiple neurons and sampling every neuron at several spatial locations (Buzsáki, 2004;Shobe et al., 2015;Jun et al., 2017;Hong and Lieber, 2019;Steinmetz et al., 2021). Multisite recordings with well defined electrode geometry open the door to blind cell type classification based on electrophysiological properties. The relation between morphology (structure) and neuronal cell type (function) is well established in vitro and using postmortem immunohistology (McCormick et al., 1985;Freund and Buzsáki, 1996;Somogyi and Klausberger, 2005). Extensive modeling work has been dedicated to understand the relationship between the spatial distribution of extracellular electrical potentials resulting from spikes and neuronal morphology (Rall, 1962;Gold et al., 2009;Einevoll et al., 2013). However, the relationship between the spatial distribution of extracellular electrical potentials and neuronal cell types in the intact brain remains unexplored.
To determine whether spikes of different cell types give rise to distinct distributions of extracellular potentials, we focus here on pyramidal cells (PYR) and parvalbumin-immunoreactive (PV) cells in neocortex and hippocampal region CA1. PYR have pyramid-shaped somata and vertically oriented, apical and basal (polar) dendritic trees (Spruston, 2008). In contrast, PV (mainly basket) cells have less polarized dendritic trees and axonal arbors that extend horizontally (Maccaferri et al., 2000;Pawelzik et al., 2002;Ganter et al., 2004). Furthermore, PYR and PV cells exhibit distinct spatial profiles of ion channels. While similar somatodendritic gradients are observed for Na 1 channels, K 1 channels exhibit a steeper decreasing gradient along dendrites farther from the soma in PYR, compared with PV cells (Magee and Johnston, 1995;Johnston et al., 2000;Hu et al., 2010). To go beyond descriptive structure-function relations, we hypothesized that PYR and PV cells could be classified based solely on spatial information acquired from freely moving mice using high-density probes. Previously, classification of PYR and interneurons in neocortex and hippocampus was based on waveform features (Henze et al., 2002;Barthó et al., 2004;Cardin et al., 2009;Stark et al., 2013;Mendoza et al., 2016;Yu et al., 2019), firing patterns (Taira and Georgopoulos, 1993;Kobayashi et al., 2019;Troullinou et al., 2020), or combinations thereof (Csicsvari et al., 1998;Frank et al., 2001;Viskontas et al., 2007). However, cell type classification based on spatial features per se was never attempted.
Here, we used connectivity-based and optical tagging to establish a dataset of labeled PYR and PV cells from neocortex and CA1 of freely moving mice. To tune the classification procedure and determine baseline performance, we first created classification models that used features based on single-channel waveforms or on spike timing. Using a chunking-based data augmentation method, the models achieve near-perfect performance. Next, we devised an event-based d -transformation method to conserve only purely spatial information and derived spatial features from the multichannel recordings. Models trained on spatial features derived from the transformed waveforms yield accurate classification. The findings suggest that differences between PYR and PV cell neuronal morphology, connectivity, and ion channel distributions are reflected in the extracellular potentials in a consistent manner.

Experimental design
The dataset used in this study has been previously analyzed (Stark et al., 2013).

Recordings and photostimulation
After allowing the animals to recover for at least 48 h, recordings were initiated. Recordings were conducted in the home cage during spontaneous behavior. Mice were tethered by one ultralight cable for multichannel neuronal recordings and a second cable for multichannel optical stimulation. Recordings were conducted as the probe was moved gradually from the neocortex to the CA1 pyramidal cell layer. At each location in the brain, neuronal activity was inspected for spontaneous spiking activity, and, if encountered, a baseline period of at least 15 min was recorded followed by photostimulation (peak driving current, 60 mA; mean 6 SD peak light power, 35 6 7 mW; 50-70 ms pulses). Signals were generated by custom code written in MATLAB (MathWorks), converted by a digital signal processor (model RX5 and/or RX6,TDT) to voltage signals, and fed into a linear 16-channel current source. After each session, the probe was either left in place or moved (35-70 mm steps), and the brain was allowed to settle overnight.
For connectivity-based tagging, we tagged units that participated as a reference in a cross-correlation histogram (CCH) that exhibited a significant (p , 0.001, Bonferronicorrected Poisson test) peak in the monosynaptic time range (0-5 ms) as excitatory cells (424 of 980 units). Units that exhibited a significant trough in the monosynaptic time range were tagged as inhibitory (21 of 980 units). For optogenetic-based tagging, units that exhibited a significant (p , 0.001, Poisson test) increase in spiking rate during 50-70 ms DC pulses given on the recording shank were tagged as optically activated cells; 98 of 980 (10%) units conformed to the criterion. Next, we labeled units based on the three tags. Units tagged exclusively as excitatory were labeled as PYR cells (420 units), whereas units that were optically activated or inhibitory (but not excitatory) were labeled as PV cells (102 units). The remaining units (458 of 980, 47%) were not labeled and were discarded from the dataset. Ten of the labeled units (PYR 9 units; PV, 1 unit) were recorded using seven instead of eight channels and were therefore discarded as well, yielding a final dataset that included 512 tagged units (PYR cells,411 units;PV cells,101 units;Fig. 1). Of the 101 units referred to as "PV cells," 93 were optically activated (92%), 13 were both optically activated and inhibitory, and 8 units were only inhibitory. Spike width, firing rate, and bursting behavior were similar for the inhibitory and the optically tagged PV cells. Thus, the 8 inhibitorytagged PV-like cells were grouped with the 93 optically activated cells, and the entire group was denoted as PV cells. A majority of the units (449 of 512 units) were recorded from CA1. The median [interquartile range (IQR)] number of spikes per unit was 8368 929] for PYR,and 66,850 [13,802] for PV cells.

Feature extraction
The shape and timing of the recorded spikes were used to extract a total of 34 features. We derived features of the following three modalities: waveform-based features, derived from a single channel (n = 8); spike-timing features, ignoring the spike waveform (n = 8); and spatial features, derived from the multichannel waveforms (n = 18). All features were based exclusively on spontaneous events that occurred in the lack of any light stimuli. For every spike, the waveform was extracted for 32 samples (1.6 ms) on every channel of the recording shank. The limited duration of the spikes places an upper bound on the classification performance of waveformbased and spatial models. For every channel separately, the waveform was averaged over spikes, and the mean waveform was calculated and upsampled by eightfold using Fourier interpolation to increase the temporal resolution. Since the average and the Fourier transform are linear operators, the order of the two steps does not affect the outcome.

Single-channel feature extraction
For the waveform-based features, the channel with the largest trough-to-peak (TTP) magnitude was denoted as the "main" channel. The waveform in the main channel was scaled by dividing all values by the minimal value (i.e., at the trough). Scaling was performed to remove information about the sampling process (e.g., electrode impedance and neuron-electrode distance). When the absolute value at the trough was smaller than at the peak, waveforms were inverted (multiplied by À1). The outcome is a 256-element vector limited to the À1 to 1 range, with at least one value at À1. To provide a rich description of the waveform, a total of n = 8 waveform-based features were extracted from the main channel (Fig. 2, Table 1): four from the waveform itself, one from the first temporal derivative, and three from the second temporal derivative. For every feature, we compared the distribution of values between all available PYR cells (n = 411) and PV cells (n = 101) and calculated effect sizes. The specific measure of effect size used was the nonparametric estimator for common-language effect size (A w ; Ruscio, 2008) which exhibits a smaller bias compared with alternatives (Li, 2016). A w estimates the probability that a random sample from one distribution is larger than a random sample from a second distribution. Disregarding the direction of the effect, A w is thus limited to the 0.5-1 range, taking a value of 0.5 when the two distributions are fully intermixed, and 1 when the two distributions do not overlap at all. All eight features (100%) exhibited a consistent difference (0.64 A w 0.98; p , 0.05, U test; Table 1). Thus, all waveform-based features are potentially useful for classification.
The addition of a redundant feature would contribute no additional information to the classification process. To estimate relations between features, we first computed rank (Spearman's) correlation coefficients (CCs) between every possible pair of waveform-based features (Extended Data Fig. 2-1A). To go beyond monotonic relations, we estimated mutual information (MI) between distributions of pairs of features (Timme and Lapish, 2018). If a feature had ,10 unique values, the feature was considered naturally discrete; only the spatial dispersion (SPD)-Count feature (see Spatial feature extraction subsection below) was naturally discrete. The use of 10 bins limits the maximal information to log 2 10 = 3.3 bits. Deviation from chance level was determined using a permutation test, creating a null distribution based on 5000 iterations of shuffled pairs and evaluating the tail of the null distribution above the observed MI. We found that some features were only weakly correlated with others (e.g., the Break-measure, median [IQR] absolute CC was 0.19 [0.076-0.24]), suggesting that independent information could be gleaned by using the feature. Alternatively, a weakly correlated feature may be dominated by noise. However, the possibility is unlikely since the Break-measure differed for the PYR and PV groups (A w = 0.64; p = 7.85 Â 10 À6 , U test; Table 1). Other features were more strongly correlated with the host of other features (e.g., full-width at halfmaximum (FWHM), 0.81 [0.45-0.87]; Smile-cry, 0.76 [0.33-0.91]). Since a small number of samples limits the power of standard statistical tests (e.g., the Mann-Whitney U test), when comparing CCs between two groups within a modality we applied a permutation test. We compared a statistic, defined as the difference between the Figure 1. PYR and PV interneurons are tagged in freely moving mice. A, Optical tagging of PV cells. a, Every PV::ChR2 mouse was chronically implanted with a four-fiber/four-shank/32-channel optoelectronic array in the neocortex (nCX). Optical stimuli were applied, in separate sessions, in the nCX and in hippocampal region CA1. Peristimulus time histogram of the PV cell (bottom), triggered by the onset of 50 ms light pulses applied on the optical fiber attached to the recording shank (n = 20; 33 mW). The unit is tagged as PV because of a robust firing rate increase during light (gray) compared with no-light periods. ***p , 0.001, Poisson test. b, Wide-band (1-5000 Hz) recordings from four same-shank channels in CA1. Bottom, Spike trains of a PYR (purple) and the PV cell (green). B, Connectivity-based tagging. Top, Mean (6SD) spike waveforms. For every unit, the channel that exhibits the highest trough-to-peak magnitude is denoted the main channel (boxed). Middle, Auto-correlation histograms (ACHs), showing burst spiking activity of the PYR (purple). Bottom, Cross-correlation histogram (CCH; black) between the spikes of the PYR and the optically tagged PV cell. Gray, Monosynaptic window. The CCH is consistent with monosynaptic excitation of the PV cell by the reference unit, tagging the reference unit as excitatory (PYR). ***p , 0.001, Poisson test.   Fig. 1B). a, Trough-to-peak (TTP) duration feature, defined as the time between the maximal negativity and the ensuing maximal positivity. b, Cumulative distribution function (CDF) of the TTP duration feature for the entire population (PYR, n = 411; PV cells, n = 101; no chunking). Here and in all subsequent CDFs, horizontal lines represent 50%, vertical dashed lines indicate medians, and the *** symbol corresponds to p , 0.001 (U test). The filled circles represent values corresponding to the examples given in a. The difference between PYR and PV cells indicates longer TTP durations for PYR compared with PV cells. B, Waveform-based features allow near-perfect classification. Cross-validated random forest models were trained using the waveform-based features (n = 50 partitions). The chunking method yields improved classification compared with no chunking. The ROC AUCs without chunking (blue) and with 50 spike chunks (orange) are higher than chance level. ***p , 0.001, Wilcoxon test compared with chance level, 0.5. Inset, Confusion matrices (no chunking) based on different decision thresholds (top, 0.1; bottom, 0.9) show variability in prediction, exemplifying the shortcomings of threshold-dependent metrics. C, Features extracted from spike timing. High-frequency features derived from single-sided short-term (0-50 ms) ACHs. a, The medians of the two groups of CCs, to the 95th percentile of a null distribution created by shuffling the CCs between the groups, and calculating the statistic 1000 times. We did not observe stronger absolute CCs within families: the absolute intrafamily correlation was 0.37 [0.14-0

Spike-timing features
For the features based on spike timing, the autocorrelation histogram (ACH) was calculated for every unit over a range of 61000 ms using a bin size of 0.5 ms. The ACH depends only on the timing of the spikes and is agnostic to the spike waveforms. The ACH was upsampled eightfold using polyphase filtering to increase the temporal resolution. The ACH is an even function, but edge effects continued Uniform-distance feature is defined as the average absolute difference between the single-sided ACH and a straight line (the example units are the same as in Fig. 1B). b, Cumulative distribution of the Uniform-distance feature for the entire population (no chunking). The larger Uniform-distance values for PYR indicate larger deviations from linear recovery for PYR compared with PV cells. D, Classification based on spike-timing features is not consistently improved by chunking. AUCs were derived from ROC curves based on n = 50 cross-validated random forest models. ROC curves for the test data without chunking (blue) and with 1600 spike chunks (orange) with performance above chance level. All conventions are the same as in B. See also Extended Data Figures 2-1, 2-2. necessarily cause asymmetry in any practical implementation. To obtain an ACH (0-1000 ms) free from edge effects, the value in every positive time bin was averaged with its negative homolog. A total of n = 8 spike-timing features was derived (Fig. 2, Table 2). Three of the features were high-frequency features, derived from the short-term ACH (up to 50 ms). Two were low-frequency features, derived from the long-term ACH (50-1000 ms). The last three were wide-band features: two were derived from the complete ACH (0-1000 ms), and one was derived from the entire recording. For every feature, we compared the values derived for the PYR and PV cells. All eight features (100%) exhibited a consistent difference between PYR and PV cells (p , 0.05, U test; Table 2). Thus, spiketiming features may be useful for classification.
To quantify interrelations, we computed CCs and MI between spike-timing features extracted from the complete spike trains (Extended Data Fig. 2 The fact that almost all variance of the MI values is explained by pairwise correlations suggests that the interrelations between the pairs of features are largely monotonic. The correlations between high-frequency features, namely Uniform-distance, Kullback-Leibler distance (D KL )-Short, and Rise-time suggest that the features provide redundant information. Thus, a small number of spike-timing features may suffice for classification.

Spatial feature extraction
For extracting purely spatial features, an event-based d -transformation was first applied to the mean upsampled waveform of every channel to remove all waveform-based information (Fig. 3A), as follows. (1) First, positive spikes were inverted as done for the waveform-based process. (2) Next, three events were defined. One event was the time of maximal negativity (NEG). For the additional two events, the median over all the channels was calculated. The second event was the first median crossing (FMC), the first time point before the maximal negativity of the channel in which the global median was crossed. The third event was the second median crossing (SMC), the first time point after the maximal negativity of the channel in which the global median was crossed. Every event was detected on every channel, yielding a total of 24 points. (3) Third, the waveform was replaced by a d -like function that took the value of the maximal negativity of the channel at the singular event time point and zeros everywhere else. The d -like functions were scaled by the absolute value of the global  . Performance was quantified by the threshold-independent AUC. The classification yields an AUC of exactly 0.5, corresponding to purely random prediction. D, A time-based feature, FMC-Time-lag-SD, derived from the differences between the times of the FMC event in different channels. The feature quantifies the temporal dispersion of the event, without considering the actual positions of the recording electrodes. a, FMC-Time-lag-SD is defined as the SD of the time differences between the FMC event of the main channel (vertical dotted lines) and the other channels. In gray are ignored channels, for which the magnitude of the TTP was below a predetermined threshold. b, Cumulative distribution of the FMC-Time-lag-SD feature for the entire population (411 PYR, 98 PV cells, no chunking). The smaller FMC-Time-lag-SD values of the PYR indicate higher spatiotemporal synchrony for PYR compared with PV cells. All conventions for the CDFs here and in subsequent panels are the same as in Figure 2A. E, A graph-based feature, FMC-Average-weight, derived from the differences between the FMC event time in different channels and the electrode locations. a, FMC-Average-weight is defined as the average edge weight in the event graph. The event graph is a directed graph with vertices representing the electrodes, and edges representing the transmission speed based on the timing of the events and the location of the electrodes. Only channels that passed the threshold for the magnitude of the TTP were considered. b, Cumulative distribution of the FMC-Average-weight feature (no chunking). The larger values for PYR indicate minimum over all channels, effectively removing all waveform-based information from every single channel. However, waveform-based information may still be available when using multiple events together (e.g., FMC, NEG, and SMC).
(4) To remove all waveform-based information, the d functions were shifted together to centralize (shift to the 129th sample) the time of the event on the main channel. The process transforms the waveform in the main channel to be nearly identical for all units (Fig. 3B). Residual variability in the time of the trough may remain if the channel with maximal magnitude of the TTP and the channel with the maximal negativity are not the same. Overall, n = 18 features were extracted from the transformed waveforms (Table 3) associated with the three events, quantifying the following three dimensions: time based, graph based, and value based. (1) In the timebased dimension, only the timing of the events was considered (e.g., the SD of the FMC; Fig. 3D, FMC-Time-lag-SD). Channels for which the magnitude of the TTP (before d -transformation) did not pass an arbitrary threshold of 25% of the maximal magnitude of the TTP over all channels were ignored (Fig. 3D, gray). A median [IQR] of 4 [3-5] PYR channels and 3 [2-5] PV cell channels were removed.
(2) The graph-based dimension included both event timing and the physical locations of the recording electrodes on the probe. An event graph was generated based on a specific event, with a node for each channel (e.g., Fig. 3E, FMC-Average-weight). Only channels for which the magnitude of the TTP passed the 25% threshold were considered. Directed edges connected every two nonoverlapping events, with a weight representing "transmission speed": the Euclidean distance between the electrodes, divided by the time difference between the events. (3) The valuebased dimension (SPD) ignored timing information and considered the scaled maximal negativity values on every channel, based on the global maximal negativity (e.g., Fig.  3F, SPD-Count). We found that 10 of 18 (56%) of the spatial features exhibited differences between PYR and PV cells (p , 0.05, U test). Although some features do not show consistent differences between the two cell types, classification may benefit from the features because of, for instance, distinct second-order statistics.
To estimate feature redundancy because of high correlations, we computed the CCs between the spatial features extracted from the transformed mean waveforms (Extended Data Fig. 3-1A). The rank correlation matrix of the spatial features showed absolute correlations (median [IQR]: 0.2 [0.1-0.33]), that were weaker than for the waveform-based features (p = 3.5 Â 10 À4 , U test) and for the spike-timing features (p = 1.1 Â 10 À7 ). Eighty percent of the spatial feature pairs exhibited absolute correlations higher than zero (122 of Fig. 3-1B). Since the correlation and the MI analysis agreed, the relatively weak correlations between spatial features may result from large amounts of noise in every feature. Alternatively, the features may provide independent information, useful for classification.

Classification procedure
The classification model was chosen to be random forests (Breiman, 1996(Breiman, , 2001 because of the relative simplicity. Furthermore, several methods are available for understanding the determinants of a specific random forest model prediction (Archer and Kimes, 2008). To achieve good estimation of model performance, a nested cross-validation procedure was applied (Varma and Simon, 2006;Krstajic et al., 2014). For every modality (waveform, spike timing, and spatial), the training procedure was repeated n = 50 times. In every iteration, data were first partitioned in an approximate 80:20 ratio into training and test sets in a stratified fashion. Thus, the training set always included 328 PYR and 80 PV cells, and the test set included 83 PYR and 21 PV cells; only the identity of the units changed between iterations. To handle the imbalance between the number of PYR and PV cells in the dataset, the model weights that control the effect of every class on the impurity score used for training the random forest model were adjusted. Specifically, instead of assigning equal weights, class weights were set to be inversely proportional to the number of samples in every class using the following: total number of training set samples/(number of classes Â number of class samples in the training set). Second, a fivefold grid search was conducted on the training set to find the best hyperparameters for the model, optimizing the receiver operating characteristic (ROC) of the area under the curve (AUC). The tested hyperparameters were the number of estimators, the depth of each estimator, the minimal number of samples required to split a node, and the minimal number of samples required to be at a leaf node. Other hyperparameters received default values based on the implementation of the scikit-learn library in Python (Pedregosa et al., 2011). Third, using the optimized hyperparameters, the model was trained on the entire training set. Finally, model performance was evaluated using the test set.

Performance and explainability
Model performance was assessed using a metric that is robust to unbalanced data. Two types of metrics exist: threshold dependent and threshold independent. Threshold-dependent metrics consider only the binary decision: in our case, PYR or PV. Threshold-independent metrics consider the raw prediction, a value between continued higher transmission rates for PYR compared with PV cells. F, A value-based feature, SPD-Count, derived from SPD of the maximal negativity on every channel. a, SPD-Count is defined as the number of channels that reached at least 50% of the maximal negativity of the main channel. b, Cumulative distribution of the SPD-Count feature (no chunking). No consistent difference between the PYR and PV cells is observed, suggesting similar spatial distributions of the scaled maximal negativity (p . 0.05, U test). See also Extended Data Figure 3-1. 0 and 1, and not the decision itself. Threshold-dependent metrics require choosing a threshold, and thus the outcome may vary according to the chosen decision threshold. An arbitrarily chosen threshold does not necessarily reflect performance, especially when considering unbalanced datasets (Sheng and Ling, 2006). The choice of a decision threshold is not trivial, and is the subject of active research (Esposito et al., 2021). For these reasons, we used an established threshold-independent metric, the AUC (Fawcett, 2006). The AUC reflects the relation between true positives and false positives for all possible thresholds and is hence threshold independent as well as suitable for unbalanced datasets.
The theoretical chance level for the AUC metric is 0.5. To determine the empirical chance level, the performance of models trained on data with shuffled training-set labels was assessed. For every modality (waveform, spike timing, and spatial), the training procedure was conducted with the labels shuffled only for the training set. The AUC exhibited the expected chance-level behavior, yielding the following median [IQR]  To assess the contribution of every feature to the final classification prediction, we used Shapley additive explanations (SHAPs) values (Lundberg et al., 2020). The SHAP method uses a game-theoretic approach to calculate an importance score for every feature in each sample, considering all interdependencies with other features. Since the predictions of random forest models range from 0 to 1, the contribution of every feature can take values from À1to 1. For each model, we calculated and averaged the absolute contribution of every feature based on the test set, taking the median over the 50 partitions to avoid idiosyncrasies of a single arbitrary partition. Since the absolute value is taken, SHAP values are necessarily non-negative, creating a skewed distribution that does not have an expected value of zero. To calculate significance, we used a permutation test. The SHAP values for each feature were compared with the SHAP values obtained for the models trained with shuffled trainingset labels. Models trained with shuffled labels represent chance-level classifiers, for which the importance of every feature can be considered as the chance-level baseline. To create a null distribution, we partitioned the dataset into a train and test sets 1000 times. For every train-test partition, we shuffled the labels, trained the models as described previously, and calculated the SHAP values. Then, to calculate a p-value for every feature, we compared the original SHAP value to the null distribution.

Chunking method
Models trained on larger nonredundant datasets typically exhibit improved performance. Therefore, data augmentation approaches to synthetically increase the size of the dataset are often applied (Moreno- Barea et al., 2018). Augmentation may be implemented by adding noise or transforming the data (rotation and reflection in image classification; Mikołajczyk and Grochowski, 2018). Here, to augment the size of a given dataset, features were extracted from "chunks" that included subsets of spikes, instead of using all available spikes together. Thus, we increased variability using the natural richness of the data that is otherwise flattened to a single mean waveform. The chunking process increases the number of samples in the dataset, with the possible cost of increased noise. A total of n = 7 different chunk sizes was used, with 25, 50, 100, 200, 400, 800, or 1600 spikes per chunk. For a given unit with N spikes and a chunk size C, the spikes were randomly split into bN/Cc chunks so that every chunk consisted of between C and 2C -1 distinct spikes. Spikes were randomly assigned to chunks. In a given split, all chunks consisted of the same number of spikes up to a difference of a single spike. Every chunk received the label of the source unit. Because of the different numbers of spikes recorded for PYR and PV cells (5,651,196 PYR spikes and 11,612,978 PV cell spikes), the balance between PYR and PV samples changed compared with the original dataset (411 PYR and 101 PV cells). For instance, using a chunk size of 25 spikes, the total number of PYR samples was 225,850 while the number of PV samples was 464,473.
To determine features for every chunk separately, the waveforms were averaged over all available samechannel spikes within a chunk, from which waveform and spatial features were derived. To derive chunk-specific spike-timing features, the ACH was accumulated by summing over all single-spike ACHs. For every spike, the single-spike ACH was based on all spikes that occurred in the À1000 to 1000 ms time window around the reference spike. To be applicable to the chunking case, the singlespike firing rate was defined as the mean of the inverse interspike intervals before and after the spike.
To provide information about the distribution of the values over chunks, several statistics were extracted from the individual values of every chunk. Extracting statistics based on all the chunks of a unit allows considering intraunit variability as a feature. For every feature, the mean, SD, and the 25%, 50%, and 75% quantiles were extracted (referred to as "chunk statistics"). Note that extracting the mean out of all the individual feature values of the chunks is not the same as not using chunking. Without chunking, first all the waveforms are averaged, transformed, and then features are extracted. In contrast, when using the mean over chunks, averaging happens at the end of the process (averaging the waveforms within a chunk still happens at the beginning). If all steps are linear, the two processes yield the same results. However, since feature extraction is not a linear operator, the mean statistic may contain unique information. The specific statistics were chosen to capture the first and second moments of the across-chunk distribution and for their simplicity. The chunk statistics can be expanded further to capitalize on different properties of the distribution over different chunks. The five new chunk statistics increased the total number of features used by the model sixfold. Notably, the chunk statistic features are the same for all the chunks of a given unit.
When training models using chunks, the data were partitioned based on units. In the training set, every chunk was considered independently (not as part of the unit). In addition, instead of performing a grid search for every chunk size, the hyperparameters for all chunk sizes were chosen by a no-chunking equivalent. Chunk statistics were extracted for the no-chunking dataset as well, so the number of features was equal to the number of features in the chunking method. The models applying chunking used the hyperparameters found using the no-chunking equivalent grid search, based on the same partition of the data to training and test sets. For testing and evaluating, every chunk received an independent prediction. Then, predictions were pooled over all same-unit chunks by casting a majority vote, yielding a final chunk-based prediction for the unit.
When calculating the SHAP values for the chunkingbased models, we randomly chose 1000 samples out of the test set. The procedure for computing SHAP values for chunking-based data were otherwise the same as for the no-chunking data. The absolute SHAP value of every feature was summed together with the values of the chunk statistics extracted from the same feature, yielding a single importance value for every original feature.

Generalization analysis
Units in the tagged dataset were recorded from CA1 (449 of 512 units, 88%) and from neocortex. The availability of tagged data from two brain regions allows testing and quantifying interregion generalization. Generalization was determined by the performance of models trained using recordings from one region on test data from the same region ("training region"), and from the other region ("non-trained-on region"). To directly quantify generalization, we partitioned the full dataset into the following three sets for each training region: (1) a training set, containing ;80% of the training region units (CA1: 301 PYR, 58 PV; neocortex: 27 PYR, 23 PV); (2) a test set, containing the remaining 20% of the training region units (CA1: 76 PYR, 14 PV; neocortex: 7 PYR, 6 PV); and (3) a second test set, containing all units of the non-trained-on region (neocortex: 34 PYR, 29 PV; CA1: 377 PYR, 72 PV). Waveform, spike timing, and spatial models were trained on the reduced CA1 dataset with 50, 1600, and 25 spike chunks, respectively (found to yield the best performance for each modality on the combined dataset). Chunked data were used for the grid search: the training set was further partitioned into a "development set" containing 80% of the units and an "evaluation set" containing 20% of the units. If the development set contained .5000 chunks, the grid search was conducted on a random subset of 5000 chunks, minimizing run time while allowing an efficient search. The difference in the number of units between the CA1 and neocortical training sets leads to an inherent difference in absolute performance between the two training region conditions. However, generalization can be readily compared between the two conditions based on the performance of the test set of the non-trained-on region relative to the performance of the test set from the training region.

Statistical analyses
A threshold of a = 0.05 was used for all statistical tests. An exception was the threshold used for tagging the units, namely for determining whether a unit exhibits light activation, and whether two units exhibit monosynaptic connectivity (a = 0.001). All descriptive statistics (n, median, IQR) can be found in the results, figures, tables, and legends. Differences between the medians of two unpaired groups were evaluated using a Mann-Whitney U test (two tailed unless otherwise specified). Differences between the median of a single group and a number, or between the medians of two paired groups, were evaluated using Wilcoxon's signed-rank test (one tailed unless otherwise specified). Comparisons of more than two groups were conducted using a Kruskal-Wallis one-way nonparametric ANOVA, and corrected for multiple comparisons using Tukey's procedure. Rank (Spearman's) correlation coefficients were tested using a permutation test. All statistical tests were conducted using either SciPy library (Virtanen et al., 2020) or custom code implemented in Python and MATLAB.

Data availability
The code used for feature extraction, model training, and visualization is freely available on GitHub (https:// github.com/EranStarkLab/SpatiotemporalSpiking).

PYR and PV interneurons are tagged in freely moving mice
Differentiating between PYR and PV cells based on electrical properties requires a ground truth-labeled dataset. We recorded and tagged extracellular spiking data from freely moving PV::ChR2 mice (n = 7) using chronically implanted four-shank, 32-channel optoelectronic arrays (Fig.  1Aa). Every shank was equipped with a diode-coupled fiber, enabling independent illumination of small local groups of neurons while concurrently recording the extracellular activity (Fig. 1Ab). We used 50-70 ms light pulses for optical tagging. A unit was tagged as PV if the stimulus-locked firing rate increase was consistently above baseline (p , 0.001, Poisson test; Fig. 1Ac). Using the optical tagging procedure, a total of 27 units from the neocortex and 71 from CA1 were tagged as PV cells.
For every pair of simultaneously recorded units, we calculated the spike-to-spike CCH and tested for peaks in the monosynaptic time range (0-5 ms; p , 0.001, Bonferronicorrected Poisson test). Units that participated as a reference in a CCH that exhibited a significant peak were tagged as excitatory (Fig. 1B). Using the monosynaptic CCH analysis, 424 units were tagged as excitatory and 21 as inhibitory; 13 of 21 units were both inhibitory and optically activated. Together with the optically tagged PV cells and after removing invalid samples (Materials and Methods), the dataset consists of 512 units, of which 411 units (80.3%) are PYR (Fig. 1C).

Waveform-based and spike-timing features allow near-perfect classification of PYR and PV cells
Waveform-based spike properties differ between PYR and PV cells and are widely used for cell type classification (Barthó et al., 2004;Cardin et al., 2009;Stark et al., 2013). However, many classifiers use waveform-based features in conjunction with features based on spike timing, and previously used spike-based classifiers have not been cross-validated. For every unit, we calculated n = 8 features based on the waveform of the main channel (e.g., TTP duration; Fig. 2A While spike-timing information has been used for cell type classification before, most implementations also considered waveform-based features (Csicsvari et al., 1998;Frank et al., 2001;Viskontas et al., 2007). To directly test whether spike timing alone can yield accurate classification, we derived n = 8 spike-timing features from the spike trains of every unit (e.g., Uniform-distance; Fig.  2C, Table 2). We conducted the training and evaluation process for the spike-timing modality as for the waveformbased classification. Without chunking, the AUC was 0.975 [0.957-0.986] (p = 3.8 Â 10 À10 , Wilcoxon test). The performance of the spike-timing models did not exhibit consistent improvement on chunking (p . 0.05 for all chunk sizes, Wilcoxon test). Nevertheless, the highest improvement in the AUC was achieved using 1600 spike chunks, increasing performance by 0.28% [À0.39 to 0.74%] to yield an AUC of 0.977 [0.965-0.987] (p = 0.07, Wilcoxon test; Fig. 2D). SHAP analysis using models trained with 1600 spike chunks (Table 2) attributed the highest importance to the D KL -Long feature (0.19 [0.17-0.2]), followed by the Uniform-distance feature (0.14 [0.12-0.16] (p , 0.002 for both, permutation test). The results suggest that both high-frequency and lowfrequency features contribute to differentiation between PYR and PV cells, allowing near-perfect performance.
Transforming multichannel spike waveforms to eventbased d-like functions removes all waveform-based information and allows extracting purely spatial features Having established a cross-validated pipeline for cell type classification from spike data, we turned to focus on spatial features. To limit the information to spatiotemporal features per se, we first removed all single-channel waveform information from the waveforms recorded over the eight channels using an event-based d -transformation (Materials and Methods; Fig. 3A). The procedure was applied to the following three distinct events: the FMC, the NEG, and the SMC. The transformation yielded nearly identical main channel waveforms for all units (Fig. 3B). To determine whether the d -transformation indeed removed all single-channel waveform information, we used the transformed spikes as input for waveform-based feature extraction, followed by model training and testing. The classification models yielded chance-level results. Specifically, the AUC was 0.5 [0.5-0.5] (p = 0.99, Wilcoxon test; Fig.  3C). When using a naive decision threshold of 0.5, the same class was predicted for every sample (Fig. 3C, inset). Thus, the d -transformed waveforms are devoid of waveform-based information.
From the transformed waveforms recorded on the eight channels, n = 18 spatial features were derived for every unit ( Table 3). The features were partitioned into the following three families: time based (Fig. 3D), graph based (Fig. 3E), and value based ( Fig. 3F; Materials and Methods). Ten of eighteen (56%) of the spatial features exhibited differences between the PYR and PV populations (p , 0.05, U test). To estimate feature redundancy, we computed rank CCs and MI between every pair of spatial features. The matrix of CCs between spatial features (Extended Data Fig. 3-1) showed absolute correlations (median [IQR]: 0.2 [0.1-0.33]) that were smaller than for the waveform-based features (0.43 [0.2-0.77]; p = 3.5 Â 10 À4 , U test) and for the spike-timing features (0.52 [0.38-0.7]; p = 1.1 Â 10 À7 ). Furthermore, MI values between spatial feature distributions were smaller than MI between waveform-based features (p , 7.1 Â 10 À9 , U test) and smaller than MI between spike-timing features (p , 4.3 Â 10 À9 ). Moreover, absolute intermodality CCs [0.13 (0.07-0.20)] were smaller than absolute intramodality CCs (0.244 [0.13-0.48]; p = 3.1 Â 10 À19 , U test). The weak correlations between spatial and waveform-based features, and between spatial and spiketiming features, suggest that a combination of features from different modalities may be beneficial for classification. Finally, the differences between the PYR and PV groups for most spatial features, together with the relatively weak mutual information between pairs of spatial features, suggest that the various spatial features may contain nonoverlapping information useful for classification.

The variance of spatial features over channels and across chunks is different for PYR and for PV cells
A direct comparison of the spatiotemporal dispersion of the event times between PYR and PV cells revealed eventdependent synchronization differences for both cell types (p , 4.5 Â 10 À20 , Kruskal-Wallis test; Fig. 4A). For PYR, an increase in spatiotemporal synchronization from FMC to NEG was observed (exhibited by a decrease in the SD; FMC: 15.9 ms [8.8-26.3 ms]; NEG: 13.5 ms [7.7-23.5 ms]; p , 0.02, Kruskal-Wallis test, corrected for multiple comparisons). An increase from FMC to NEG was also seen for PV cells 3 ms]; NEG: 9.4 ms [5.4-13.3 ms]; p , 1 Â 10 À19 ). For both cell types, the increase in synchronization was followed by a decrease from NEG to SMC (SMC: PYR, 26 ms [18.8-34.3 ms], p , 2.2 Â 10 À16 ; PV, 24.1 ms [15.6-38.9 ms], p , 3.6 Â 10 À11 ). Thus, for both PYR and PV cells, spatiotemporal synchronization changes during the course of an action potential.
Next, we assessed whether PYR and PV cells exhibit differences in spatiotemporal synchronization during specific events. Higher spatiotemporal synchronization was observed for PYR spikes compared with PV cells during FMC, expressed by lower SD (A w = 0.72; p = 2.3 Â 10 À12 , U test; Table 3). Thus, the FMC event occurred on multiple channels nearly at the same time for PYR spikes, whereas for PV cell spikes the FMC was more dispersed in time. Synchronization flipped during the NEG event, with higher synchronization for the PV cell spikes (Aw = 0.63; p = 3.44 Â 10 À5 , U test). We did not observe consistent differences during the SMC event (Aw = 0.50; p = 0.47, U test). Similar changes between events were observed for the Time-lag-SS and the Shortest-path features (Extended Data Fig. 4-1). The synchronization differences between the spikes of PYR and PV cells may reflect the distinct morphologic and functional properties of the different cell types.
Intraunit variability, the variability across the chunks of the same unit, may degrade classification performance. Alternatively, intraunit variability may differ between classes and possibly benefit classification. Of the statistics extracted from the chunks, the SD is a second moment statistic, and may hold unique information compared with the other chunk statistics used. Specifically, we examined the intraunit SD values for all spatial features calculated based on 25 spike chunks (the smallest chunk size used). To allow comparing SDs of multiple features, features were scaled based on all units before calculating the SD for every unit. Most features (13 of 18) showed consistent differences of the SD between the PYR and PV cells groups (0.56 A w 0.90; p , 0.05, U test; Table 4). All features that did not differ consistently between the two cell types were of the graph-based family (FMC-Average-weight, SMC-Average-weight, SMC-Longest-path, FMC-Shortestpath, and SMC-Shortest-path; 0.50 A w 0.54; p . 0.05, U test; Fig. 4B, gray lines). All the features that consistently differed between the two cell types exhibited larger SD values for PV cells, compared with PYR (Fig. 4B, black lines). The median SDs for all features were lower for PYR (0.34 [0.16-0.47]) compared with PV cells (0.53 [0.3-0.6]; p = 1.9 Â 10 À4 , Wilcoxon test). The higher intraunit variability for PV cells indicates a common phenomenon of the spatial features that is identified specifically by chunking.

Features based exclusively on spatial information allow accurate classification of PYR and PV cells
To determine whether the differences between the spatial distribution of the extracellular signals from PYR and PV cells are only correlative or indicative, we conducted a training and evaluation process. The process was carried in the same manner as for the waveformbased and spike-timing features. Using the mean waveforms, the median [IQR] AUC was 0.83 [0.8-0.85] (p = 3.8 Â 10 À10 , Wilcoxon test; Fig. 5). The performance of the spatial models improved when chunking was applied: chunking consistently increased the performance for all tested chunk sizes (25-1600 spikes: p , 1.45 Â 10 À9 , Wilcoxon test; Fig. 5A). Upon chunking to 1600 spike chunks, the AUC increased by 9.6% compared with the no-chunking AUC. The AUC increased monotonically for progressively smaller chunk sizes, achieving a value of 0.963 [0.949-0.975] for 25 spike chunks (16.3% [13.9-19.6%] increase; Fig. 5B). We did not test smaller chunk sizes. Therefore, results reported from this point onward are based on the best model, using 25 spike chunks. While the performance of the spatial models was lower than the performance of waveform-based or spike-timing models (p , 6.6 Â 10 À6 for both, two-tailed Wilcoxon test), the results indicate that PYR and PV cells can be accurately differentiated based on purely spatial features. On their own, spatial properties provide a completely new approach to cell type classification.
To assess the contribution of every spatial feature and family of features, we analyzed SHAP values. We found that feature importance was not uniform (Fig.  5C). Specifically, the highest importance was attributed to features derived from the FMC events ( ]; p = 2.6 Â 10 À8 for both, Kruskal-Wallis test corrected for multiple comparisons). The importance of the time-based features was the largest (time based compared with graph based, p = 2.6 Â 10 À8 ). The distribution of the six most important features is shown in Extended Data Figure 5-1. The feature importance analysis suggests that features agnostic to the physical distance between channels and features that do consider spatial locations make nonoverlapping contributions to classification. Moreover, the usage of multiple features derived from the same event, the FMC, is beneficial.
Spatial models generalize poorer than waveform models but better than spike-timing models In the tagged dataset, units were recorded from both CA1 and neocortex, allowing the testing of interregion generalization. To quantify similarities between regions, we determined the performance of models trained using data from a single region on one test set from the training region, and on another test set from the non-trained-on region. Training on either CA1 data or neocortical data, all models performed above chance level when tested on both the CA1 test set and the neocortical test set (p , 7.5 Â 10 À10 for all comparisons, Wilcoxon test; Fig. 6A, Extended Data Fig. 6-1) Fig. 6A). Likewise, performance was lower for the spiketiming models, compared with the spatial models (p , 0.005 for both comparisons; Fig. 6A). Thus, while all modalities generalize from CA1 to neocortex and from neocortex to CA1, waveform-based models allow the best performance whereas spike-timing models perform the worst.
Above-chance performance on the test set of the nontrained-on region does not guarantee perfect generalization. To quantify generalization, we defined a "generalization error" as the relative decrease when comparing performance on the test set of the training region and performance on the test set of the non-trained-on region. The generalization error was consistently above zero for spatial models trained on data from either region, and for CA1-trained spike-timing models (p , 0.05 for all three comparisons, Wilcoxon test; Fig. 6B). For both training sets, waveform-based models showed lower errors than spatial models (p , 0.05 for both comparisons, Kruskal-Wallis test corrected for multiple comparisons; Fig. 6C). In addition, for the CA1-trained models, the generalization error of the spatial models was lower than that of the spike-timing models (p , 6.7 Â 10 À6 ). Thus, spatial models generalize better than spike-timing models, but worse than waveform-based models, in particular when trained on CA1 data.
Finally, to determine which spatial features are most important for classification in every region, we computed SHAP values for spatial models trained on data from a single region. Despite some differences in specific values, the six features that made the largest contributions were the same for models trained on CA1 data (Extended Data Fig. 6-2A) and for models trained on neocortical data (Fig. 5C,. Hence, the determinants for the predictions of the spatial models are similar in neocortical and CA1 data.

Discussion
Using optically tagged high-density recordings from hippocampal region CA1 and neocortex of freely moving mice, we found that spiking of PYR and PV cells was associated with different spatiotemporal distributions of extracellular voltage. Compared with PV cell spikes, PYR spikes exhibited higher spatial synchrony at the beginning of the spike and lower synchrony at the trough. Together, the spatial features derived from the extracellular voltage distributions allowed accurate classification of PYR and PV cells.

Differences in the spatial distribution of extracellular voltages during spikes
Although the contribution of spatial information to classification tasks has been explored before (Buccino et al., 2018;Jia et al., 2019), previous work did not separate spatial information from other waveform-based properties. Using an event-based d -transformation, we derived spatial features devoid of single-channel waveform information.
The accurate classification based on spatial features is tantamount to spatiotemporal differences in the extracellular voltage distribution and is consistent with morphologic differences in the dendrosomatic and axonal organization of PYR and PV cells. In CA1, PV basket cell axons form a diverse horizontal network while most dendrites extend vertically (Freund and Buzsáki, 1996;Klausberger et al., 2003). The dendritic trees of PYR cells also extend vertically, but are more polarized (Bannister and Larkman, 1995;Spruston, 2008). The extracellular expression of intracellular signals has been studied theoretically (Rall, 1962) and modeled for reconstructed morphologies (Holt and Koch, 1999;Gold et al., 2006;Schomburg et al., 2012;Lindén et al., 2014;Bestel et al., 2021). Even with limited spatial sampling, the present results provide direct experimental evidence for a unique mapping between cell type and the spatial distribution of extracellular potentials. The results should be construed as lower bounds, since higher-density or three-dimensional sampling may allow further improvement. Among spatial features, the dispersion of the FMC event between recording sites made the highest contribution to the prediction. The FMC is a putative extracellular analog of the initial depolarization phase at the recording site. Thus, the lower interelectrode variance of FMC among PYR cells compared with PV cells indicates higher spatial synchrony at the beginning of the spike. This observation is consistent with known morphologic and electrotonic differences between the proximal dendrites Figure 5. Features based exclusively on spatial information allow accurate classification of PYR and PV cells. A, Classification based on spatial features is boosted by chunking. AUCs were derived from ROC curves based on n = 50 cross-validated random forest models. The AUC increases monotonically when chunk size is reduced. Every boxplot shows the median (IQR), whiskers extend for 1.5 times the IQR in every direction, a plus indicates an outlier, and notches represent 95% confidence intervals based on bootstrapping with 1000 repetitions. The best performance (highest AUC) and largest improvement compared with no-chunking (1) is observed for 25 spike chunks. ***p , 0.001, Wilcoxon test. B, Spatial features allow accurate classification. ROC curves for the test data without chunking (blue) and with 25 spike chunks (orange). The AUCs are higher than chance level. All conventions are the same as in Figure 2B. C, Feature importance analysis for spatial models with 25 spike chunks. SHAP values were used to assess the individual contribution of each feature to the prediction. The dotted lines represent chance-level importance values, based on models trained with shuffled PYR-PV labels. The features derived from the FMC event are associated with the highest SHAP values, indicating that synchrony at the initial depolarization phase makes the highest contribution to classification outcome. **p , 0.01, ***p , 0.001, one-tailed permutation test. See also Extended Data Figure 5-1. of PYR and PV cells. Since the probes were always inserted perpendicularly to CA1 stratum pyramidale, the vertical dendritic tree of both PYR and PV cells was parallel to the extracellular electrode arrays. In CA1, PYR have thick apical dendrites extending for up to 250 mm (diameter, 1.6-2.5 mm), whereas PV cell dendrites in stratum pyramidale are thinner (diameter, 1.3-1.7 mm; Gulyás et al., 1999;Andersen et al., 2007). The higher FMC synchrony of PYR spikes is consistent with lower axial resistance of the thicker proximal PYR dendrites, yielding synchronized somatic and dendritic potentials.
The second event that showed differences in temporal dispersion was the NEG, which corresponds to a point between the peak of the derivative and the peak of membrane potential during the spike Pettersen and Einevoll, 2008). During NEG, spatial synchrony reverses, being higher for PV cells compared with PYR cells. The narrower waveforms of PV cells are associated with a higher concentration of voltage-gated K 1 (Kv) channels, compared with PYR cells (Bean, 2007). In neuronal models with passive dendrites and when membrane time constants are relatively slow, membrane resistance has little effect on spike shape (Pettersen and Einevoll, 2008). However, since active conductance affects spike shape (Martina and Jonas, 1997), the differences in synchrony during the NEG event may result from distinct spatial gradients of Kv channels in CA1 PYR and PV basket cells. In PYR, there is a rapid decrease in the density of Kv channels on dendrites farther from the soma (Johnston et al., 2000), whereas the decrease for basket cells is more moderate (Hu et al., 2010). Kv channels open at depolarization, initiating repolarization and late hyperpolarization, and, in turn, govern the frequency and width of the spikes (Pongs, 1999). The more uniform spatial density of Kv channels in PV cells compared with PYR cells may synchronize the extracellular signals generated by the different cellular compartments.
The spatial features exhibit lower intraunit variability of PYR cells, compared with PV cells. The difference between the PYR and PV variability contributed to classification. The excitatory input of PYR in CA1 originates mainly from upstream regions (e.g., CA3 and entorhinal cortex; Andersen et al., 2007), whereas CA1 PV cells are mainly innervated by local PYR cells (Freund and Buzsáki, 1996). Thus, a possible source for the difference in intraunit variability is the distinct sources of excitation of PYR and PV cells.

Differences in spike waveform and spike timing
Compared with basket cells, neocortical and CA1 PYR have wider spikes (Simons, 1978;Connors et al., 1982;Kawaguchi and Hama, 1988;Contreras, 2004), lower firing rates (Kawaguchi et al., 1987), and an increased burst propensity (Kandel and Spencer, 1961;Ranck, 1973;Harris et al., 2001). Indeed, we found that waveform width (e.g., TTP-duration), burstiness (e.g., Uniformdistance), and firing rate all differ between PYR and PV Figure 6. Spatial models generalize poorer than waveform models, but better than spike-timing models. A, Cross-validated random forest models (n = 50) were trained for every modality on the CA1 (left, red) or neocortex (nCX; right, gray) data, and tested separately on different data from CA1 and from nCX. Conventions for boxplots here and in B are the same as in Figure 5A. All models exhibit above-chance performance. ***p , 0.001, Wilcoxon test. The performance of models on the non-trained-on region is highest for waveform-based models and lowest for spike-timing models. **p , 0.01; ***p , 0.001, Kruskal-Wallis test corrected for multiple comparisons. B, The decrease in performance on generalization. Generalization error is defined here as the difference between the AUC on the test set of the training region and the AUC on the test set of the non-trained-on region, divided by AUC on the test set of the training region. Spatial models trained on either region and spike-timing models trained on CA1 data, show generalization errors larger than zero. n.s.p . 0.05; **p , 0.01; ***p , 0.001, Wilcoxon test. The dashed horizontal line represents the zero-mark (i.e., same performance on CA1 and nCX). When trained on CA1 data (red), spatial models generalize poorer than waveform models but better than spike-timing models. n.s.p . 0.05; *p , 0.05; ***p , 0.001, Kruskal-Wallis test corrected for multiple comparisons. See also Extended Data Figures 6-1, 6-2. cells. Furthermore, both TTP-duration in the waveform models, and the Uniform-distance in the spike-timing models, contributed to the prediction. The importance of waveform width properties is in line with studies that used width-related features to differentiate between PYR and interneurons (Frank et al., 2001;Cardin et al., 2009;Stark et al., 2013). Similarly, the importance of burstiness and firing rate is consistent with prior work (Connors and Gutnick, 1990;Taira and Georgopoulos, 1993). The long-term ACH, which was not used before for classification, held informative value, consistent with distinct low-frequency rhythmic activity of PYR and PV cells (e.g., theta; Csicsvari et al., 1999;Buzsáki, 2002;Czurkó et al., 2011).

Chunking
When discussing the extracellular waveform of a neuron, many studies refer to the mean waveform (Trainito et al., 2019;Sun et al., 2021). To improve the performance of the classification models, we exploited the variability of spike waveforms and timing recorded from a single cell using chunking, increasing the number of samples at the possible cost of increased noise. The chunking method as implemented here was agnostic to two pieces of information. First, spikes were randomly assigned to chunks, ignoring possible time-related changes that may be constructive. Second, the relation of chunks to the same unit was only partially considered, and additional "chunk statistics" may be extracted. The statistics extracted from the distribution of feature values over chunks provide limited consideration of the other chunks. Consequently, the present implementation does not allow classification of all the chunks of a specific unit as a whole. Hence, our results form a lower bound for the improvement to be gained from chunking. More complex models may capitalize on time-dependent differences and dependencies between the samples. Modifying chunk size inherently results in a trade-off between the number of samples and the noise. Spatially and waveform-based models benefited the most from smaller chunks, while spike-timing models benefited from larger chunk sizes. The higher sensitivity of spike-timing models to noise in small chunks is consistent with the discrete nature of the spike trains, because ACHs are sparse when the number of spikes is small.

Interregion differences
Waveform models yielded near-perfect classification of data from the non-trained-on region for both training regions, in line with similar waveform widths of PYR and PV cells in neocortex and CA1 (McCormick et al., 1985;Kawaguchi and Hama, 1988). Yet, the generalization of waveform models was not perfect and is not expected to be universal: in the primate, pyramidal tract PYR cells exhibit narrow spikes (Vigneswaran et al., 2011;Lemon et al., 2021). CA1-trained spike-timing and spatial models exhibited decreased performance when tested on neocortical data compared with CA1 data, with spatial models generalizing better than spike-timing models. The poor generalization observed for the spike-timing models is consistent with the fact that neocortical PYR cells are less likely to exhibit bursts, compared with CA1 PYR cells (McCormick et al., 1985;Lacaille et al., 1987). The intermediate generalization of the spatial models may correspond to interregion differences in cellular morphology, ion channel distributions, or other cellular network properties. The synaptic and intracellular events that occur just before and during the spike may affect the spatial distribution of the signal (Zador et al., 1995;Hagen et al., 2016Hagen et al., , 2017. Hence, even morphologically identical cells with the exact same compartmental distribution of ion channels are expected to show different spatial distributions of extracellular potentials when embedded in distinct networks.

Limitations and applications
There are a few notable limitations to this work. First, cell type classification based on spatial features requires sampling of the extracellular space over multiple points. Here, we used a fixed electrode configuration with 20 mm vertical spacing, and application to data recorded using other configurations may require modifications. Second, expanding the duration of the sampled spikes beyond 1.6 ms (32 samples at 20 kHz) may increase the performance of spatially based models. Third, while several models yielded near-perfect performance, focusing strictly on mice does not warrant generalization to homological brain regions in other animals.
Our results suggest several possible applications. First, the concept of chunking combined with a majority vote can be used in real time, allowing classification outcomes to be updated online. Using the chunking and voting approach, there is no need to rerun the entire model whenever a new piece of data is collected. Instead, every time a predetermined number of spikes is gathered, another vote can be added to the prediction. Second, classification based strictly on spike timing can be used when waveform information is unavailable or for realtime applications.

Future directions
To identify the cellular network origin of the spatiotemporal differences in synchrony between PYR and PV cells, targeted experiments may be conducted. We hypothesize that the higher intraunit variability observed for PV cells compared with PYR cells may reflect distinct connectivity patterns. The excitatory input of PYR cells in CA1 originates mainly from upstream regions (e.g., CA3 and entorhinal cortex; Andersen et al., 2007), whereas CA1 PV cells are more likely to be innervated by local PYR cells (Freund and Buzsáki, 1996). The hypothesis may be tested using somatic opsins (Shemesh et al., 2017;Chen et al., 2018;Forli et al., 2021). The spatiotemporal distribution of extracellular potentials during spikes generated via somatic activation can be compared with natural spiking, generated by integrating EPSPs impinging mainly on the dendrites. More consistent spatiotemporal synchrony achieved for optically induced spikes will provide direct evidence that input variability may lead to more variability in the spatiotemporal synchrony.
In the future, other spatial features may be used to increase classification performance. We showed that the cross-validated classification of PYR and PV cells is already near perfect when based on waveforms alone, but other cell types may not be accurately distinguished using features derived from a single channel. For instance, compared with PV cells, somatostatin interneurons have lower firing rates and broader spikes (Ma et al., 2010;Royer et al., 2012;Veit et al., 2017). Distinguishing between multiple cell types using extracellular data may benefit from using spatial information.