Research Article | New Research, Sensory and Motor Systems

Hearing Scenes: A Neuromagnetic Signature of Auditory Source and Reverberant Space Separation

Santani Teng, Verena R. Sommer, Dimitrios Pantazis and Aude Oliva
eNeuro 13 February 2017, 4 (1) ENEURO.0007-17.2017; DOI: https://doi.org/10.1523/ENEURO.0007-17.2017
Santani Teng
1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139

Verena R. Sommer
1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
2 Amsterdam Brain and Cognition Centre, University of Amsterdam, 1018 WS Amsterdam, The Netherlands

Dimitrios Pantazis
3 McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139

Aude Oliva
1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139

Figures & Data

Figures

Figure 1.

Stimulus conditions, MEG classification scheme, and single-sound decoding time course. A, Stimulus design. Three brief sounds were convolved with three different RIRs to produce nine sound sources spatialized in reverberant environments. B, MEG pattern vectors were used to train an SVM classifier to discriminate every pair of stimulus conditions (three sound sources in three different space sizes each). Decoding accuracies across every pair of conditions were arranged in 9 × 9 decoding matrices, one per time point t. C, Averaging across all condition pairs (shaded matrix partition) for each time point t resulted in a single-sound decoding time course. Lines below the time course indicate significant time points (N = 14, cluster-definition threshold, p < 0.05, 1000 permutations). Decoding peaked at 156 ms; error bars represent 95% CI.
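
The pairwise scheme in panels B and C reduces to a small loop in code. Below is a minimal, self-contained sketch using scikit-learn; the synthetic `meg` arrays, trial counts, and 5-fold cross-validation are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_cond, n_trials, n_sensors, n_time = 9, 20, 306, 50
# Synthetic stand-in for per-condition MEG data: trials x sensors x timepoints.
meg = [rng.standard_normal((n_trials, n_sensors, n_time)) + 0.1 * c
       for c in range(n_cond)]

# One 9 x 9 decoding matrix per time point (panel B).
acc = np.full((n_cond, n_cond, n_time), np.nan)
for i, j in combinations(range(n_cond), 2):
    X_all = np.concatenate([meg[i], meg[j]])
    y = np.r_[np.zeros(n_trials), np.ones(n_trials)]
    for t in range(n_time):
        score = cross_val_score(SVC(kernel='linear'), X_all[:, :, t], y, cv=5)
        acc[i, j, t] = acc[j, i, t] = score.mean()

# Averaging over all condition pairs yields the decoding time course (panel C).
timecourse = np.nanmean(acc, axis=(0, 1))
```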

Figure 2.

Separable space and source identity decoding. A, Individual conditions were pooled across source identity (left, top) or space size (left, bottom) in separate analyses. Classification analysis was then performed on the orthogonal stimulus dimension to establish the time course with which the brain discriminated between space (red) and source identity (blue). Sound-source classification peaked at 130 ms, while space classification peaked at 386 ms. Significance indicators and latency error bars as in Figure 1. B, Space was classified across sound sources and vice versa. Left panel, Cross-classification example in which a classifier was trained to discriminate between spaces on sound sources 1 and 2, then tested on space discrimination on source 3. Right panel, Sound-source cross-classification example in which a classifier was trained to discriminate between sound sources on space sizes 1 and 2, then tested on sound-source discrimination on space 3. C, Results from all nine such pairwise train-test combinations were averaged to produce a classification time course in which the train and test conditions contained different experimental factors. Sound-source cross-classification peaked at 132 ms, while space cross-classification peaked at 385 ms. Significance bars below time courses and latency error bars as in Figure 1.
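
A hedged sketch of the cross-classification logic in panel B: train a space classifier on two sound sources, then test it on the held-out third. The dictionary layout, trial counts, and injected space signal are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_trials, n_sensors = 20, 306
# Synthetic stand-in: data[(source, space)] -> trials x sensors patterns at one
# time point; a small space signal is injected so the example decodes above chance.
data = {(so, sp): rng.standard_normal((n_trials, n_sensors)) + 0.2 * sp
        for so in range(3) for sp in range(3)}

def cross_classify_space(train_sources, test_source, sp_a, sp_b):
    """Train sp_a-vs-sp_b on some sources; test on the held-out source."""
    X_tr = np.concatenate([data[(so, sp)] for so in train_sources for sp in (sp_a, sp_b)])
    y_tr = np.concatenate([np.full(n_trials, sp) for so in train_sources for sp in (sp_a, sp_b)])
    clf = SVC(kernel='linear').fit(X_tr, y_tr)
    X_te = np.concatenate([data[(test_source, sp_a)], data[(test_source, sp_b)]])
    y_te = np.r_[np.full(n_trials, sp_a), np.full(n_trials, sp_b)]
    return clf.score(X_te, y_te)

# Example: train small-vs-large on sources 0 and 1, test on source 2; averaging
# all such train-test combinations gives the cross-decoding time course (panel C).
acc = cross_classify_space(train_sources=(0, 1), test_source=2, sp_a=0, sp_b=2)
```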

Figure 3.

Sensorwise decoding of source identity and space size. MEG decoding time courses were computed separately for 102 sensor locations, yielding decoding sensor maps. A, Sensor map of sound-source decoding at the peak of the effect (130 ms). B, Sensor map of space size decoding at the peak of the effect (386 ms). Significant decoding is indicated with a black circle over the sensor position (p < 0.01; corrected for false discovery rate (FDR) across sensors and time).
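
One way to realize sensorwise decoding, sketched under assumed data shapes (the Neuromag-style layout of 102 locations with 3 channels each is an assumption): decode from each location's small channel set independently, producing an accuracy map over sensors and time.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_loc, n_per_loc, n_time = 40, 102, 3, 10
# Synthetic stand-in: trials x (102 locations x 3 channels) x timepoints.
X = rng.standard_normal((n_trials, n_loc * n_per_loc, n_time))
y = np.arange(n_trials) % 2
X[y == 1] += 0.2

# Decode from each location's channel triplet separately at each time point.
sensor_map = np.zeros((n_loc, n_time))
for loc in range(n_loc):
    chans = slice(loc * n_per_loc, (loc + 1) * n_per_loc)
    for t in range(n_time):
        scores = cross_val_score(SVC(kernel='linear'), X[:, chans, t], y, cv=5)
        sensor_map[loc, t] = scores.mean()
```

Per-sensor p values could then be corrected across sensors and time with a standard FDR routine, e.g. statsmodels' multipletests(pvals, method='fdr_bh').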

Figure 4.

    Temporal generalization matrix of auditory source and space decoding time courses. Left column shows the generalized decoding profiles of space (A) and source (B) decoding. Right column shows the statistically significant results (t test against 50%, p < 0.05, FDR corrected).
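
Temporal generalization extends time-resolved decoding by testing each classifier at every other time point; sustained representations show up as above-chance accuracy off the diagonal. A minimal split-half sketch with synthetic data (the effect onset and data shapes are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_trials, n_sensors, n_time = 40, 306, 30
y = np.arange(n_trials) % 2                    # balanced binary labels
X = rng.standard_normal((n_trials, n_sensors, n_time))
X[y == 1, :, 10:] += 0.5                       # assumed effect emerging at timepoint 10

# Train at t_train on one half of the trials, test at every t_test on the other half.
half = n_trials // 2
gen = np.zeros((n_time, n_time))
for t_train in range(n_time):
    clf = SVC(kernel='linear').fit(X[:half, :, t_train], y[:half])
    for t_test in range(n_time):
        gen[t_train, t_test] = clf.score(X[half:, :, t_test], y[half:])
```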

Figure 5.

Behavior correlates with MEG decoding data. Assessment of linear relationships between response times and MEG peak decoding latencies (A), as well as between behavioral and decoding accuracies (B). Bootstrapping the participant sample (N = 14, p < 0.05) 10,000 times revealed significant correlations between RT and latency (r = 0.66, p = 0.0060) and between behavioral and decoding accuracy (r = 0.59, p < 0.0001). Individual condition pairs are denoted by source (So; red) or space (Sp; blue) labels, with numerals indicating which conditions were compared. For space conditions: 1, small; 2, medium; 3, large. For source conditions: 1, hand pat; 2, pole tap; 3, ball bounce.
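
The bootstrap procedure can be sketched as follows; the per-participant matrices and effect sizes are synthetic stand-ins, and the percentile CI and sign-based p value are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_subj, n_pairs, n_boot = 14, 15, 10_000
# Hypothetical per-participant data: reaction times and MEG peak decoding
# latencies for each condition pair (synthetic values for illustration).
rt = rng.uniform(400, 900, (n_subj, n_pairs))
lat = 0.3 * rt + rng.normal(0, 40, (n_subj, n_pairs))

# Percentile bootstrap: resample participants with replacement, average within
# each resample, and correlate the condition-pair means.
boot_r = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n_subj, n_subj)
    boot_r[b] = spearmanr(rt[idx].mean(axis=0), lat[idx].mean(axis=0))[0]
ci = np.percentile(boot_r, [2.5, 97.5])                        # 95% CI
p_two_tailed = 2 * min((boot_r <= 0).mean(), (boot_r >= 0).mean())
```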

Figure 6.

Stimulus dissimilarity analysis based on cochleogram data. A, Cochleograms were generated for each stimulus, discretized into 200 5-ms bins and 64 frequency subbands. Each cochleogram thus comprised 200 pattern vectors of size 64 × 1. For each pair of stimuli, pattern vectors across frequency subbands were correlated at corresponding time points, and the correlations were subtracted from 1. B, Overall cochleogram-based dissimilarity. The final dissimilarity value at time t is an average of all pairwise correlations at that time point. Peak overall cochleogram dissimilarity occurred at 500 ms; peak MEG dissimilarity (decoding accuracy) is shown for comparison. C, Pooled cochleogram-based dissimilarity across space size and source identity. Pairwise correlations were performed and averaged analogously to the pooled decoding analysis. MEG pooled decoding peaks for source identity and space size are shown for reference; corresponding stimulus dissimilarity peaks were significantly offset (p < 0.05 for both source identity and space).
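
A minimal sketch of the cochleogram dissimilarity computation in panel A; the random cochleograms are stand-ins for the real stimulus data.

```python
import numpy as np

rng = np.random.default_rng(5)
n_stim, n_bins, n_bands = 9, 200, 64
# Synthetic stand-ins for the cochleograms: 200 5-ms bins x 64 subbands each.
coch = rng.random((n_stim, n_bins, n_bands))

# For each stimulus pair and time bin, correlate the two 64-element frequency
# patterns and subtract from 1; average over all pairs (panel B).
dissim = np.zeros(n_bins)
pairs = [(i, j) for i in range(n_stim) for j in range(i + 1, n_stim)]
for i, j in pairs:
    for t in range(n_bins):
        r = np.corrcoef(coch[i, t], coch[j, t])[0, 1]
        dissim[t] += (1 - r) / len(pairs)
```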

Figure 7.

Comparison of MEG neural representations to a categorical versus an ordinal scene size model. Representational dissimilarity matrices (RDMs) of a categorical and an ordinal model (A) were correlated with the MEG data from 138–801 ms (the temporal window of significant space size decoding) to assess the nature of MEG scene size representations. B, Results indicate that the MEG representations correlate significantly more strongly with the ordinal than with the categorical scene size model. Spearman correlation coefficients ρ were averaged across time points in the temporal window. Error bars represent ±SEM.
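
Under the assumption that both models are expressed as RDMs over all nine conditions (3 sources × 3 space sizes), the comparison reduces to correlating each model's lower triangle with the MEG RDM at every time point. A sketch with synthetic MEG RDMs standing in for the data:

```python
import numpy as np
from itertools import product
from scipy.stats import spearmanr

# Nine conditions = 3 sound sources x 3 space sizes (0 = small, 1 = medium, 2 = large).
conds = list(product(range(3), range(3)))              # (source, space) pairs
n = len(conds)
categorical = np.zeros((n, n))
ordinal = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        categorical[i, j] = float(conds[i][1] != conds[j][1])  # same vs. different size
        ordinal[i, j] = abs(conds[i][1] - conds[j][1])         # graded size difference

tril = np.tril_indices(n, k=-1)
rng = np.random.default_rng(6)
meg_rdms = rng.random((100, n, n))                     # synthetic MEG RDM per time point

rho_cat = np.array([spearmanr(m[tril], categorical[tril])[0] for m in meg_rdms])
rho_ord = np.array([spearmanr(m[tril], ordinal[tril])[0] for m in meg_rdms])
# Average each rho over the significant window and compare, e.g., with a paired t test.
```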

Figure 8.

Space and sound source decoding with repetition-window stimuli. A, Representative waveforms of single and repeated stimuli. Repeated stimuli were produced by concatenation of anechoic stimuli, followed by RIR convolution and linear amplitude ramping. B, Source (blue) and space (red) decoding. Sound-source classification peaked at 167 (96–312) ms, while space classification peaked at 237 (71–790) ms. Color-coded lines below the time courses indicate significant time points and latency error bars indicate bootstrapped confidence intervals, both as in experiment 1. Gray vertical lines indicate stimulus onset and approximate offset.

Tables

Table 1.

    Summary of key statistical tests

Line | Data structure | Type of test | 95% confidence intervals
a | None assumed: classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distribution of significant decoding onset | Onset CI: 12–64 ms
b | None assumed: classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distribution of significant decoding peak | Peak CI: 119–240 ms
c | None assumed: classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distribution of significant sound-source decoding onset | Onset CI: 37–60 ms
d | None assumed: classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distribution of significant sound-source decoding peak | Peak CI: 116–140 ms
e | None assumed: classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distribution of significant space decoding onset | Onset CI: 71–150 ms
f | None assumed: classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distribution of significant space decoding peak | Peak CI: 246–395 ms
g | None assumed: onsets of source and space decoding | Compare bootstrapped empirical distribution of space decoding onset with mean source decoding onset | Space onset CI: 71–150 ms
h | None assumed: peaks of source and space decoding | Compare bootstrapped empirical distribution of space decoding peak with mean source decoding peak | Space peak CI: 246–395 ms
i | None assumed: cross-classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distributions of significant sound-source cross-decoding onset and peak | Onset CI: 40–63 ms; Peak CI: 111–139 ms
j | None assumed: cross-classification accuracy over time | Bootstrap N = 14 participants 1000 times to obtain empirical distributions of significant space cross-decoding onset and peak | Onset CI: 125–356 ms; Peak CI: 251–513 ms
k | None assumed: MEG-behavior correlations | Bootstrap N = 14 pool, 10,000 iterations of Spearman correlation between behavioral reaction time and MEG peak latency | CI: 0.227–0.895
l | None assumed: MEG-behavior correlations | Bootstrap N = 14 pool, 10,000 iterations of Spearman correlation between behavioral accuracy and MEG peak accuracy | CI: 0.325–0.795
m | None assumed: empirical distribution of source decoding peak | Compare bootstrapped empirical distribution of source decoding peak with source dissimilarity peak | Peak CI: 116–140 ms
n | None assumed: empirical distribution of space decoding peak | Compare bootstrapped empirical distribution of space decoding peak with mean space dissimilarity peak | Peak CI: 246–395 ms
o | Normal distribution: MEG-model correlations over time points | Paired t test between mean correlations | Mean difference CI: 0.0470–0.0507
p | None assumed: classification accuracy over time | Bootstrap N = 16 participants 1000 times to obtain empirical distribution of significant source decoding peak | Source peak CI: 96–312 ms
q | None assumed: classification accuracy over time | Bootstrap N = 16 participants 1000 times to obtain empirical distribution of significant space decoding peak | Space peak CI: 71–790 ms
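
The bootstrap rows above (a–f, i, j, p, q) share one recipe: resample participants with replacement, recompute the group-average decoding time course, and record the onset or peak latency each time; percentiles of the resulting distribution give the CIs. A sketch of the peak-latency case with synthetic time courses (shapes and effect are assumptions; onsets would substitute the first significant time point for the argmax):

```python
import numpy as np

rng = np.random.default_rng(7)
n_subj, n_time, n_boot = 14, 100, 1000
# Synthetic per-participant decoding time courses (accuracy over time), with an
# assumed effect peaking near timepoint 45.
acc = 0.5 + rng.normal(0, 0.02, (n_subj, n_time))
acc[:, 30:60] += 0.1 * np.hanning(30)

# Resample participants with replacement, recompute the group mean, and record
# the peak latency; percentiles of that distribution are the empirical 95% CI.
peaks = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n_subj, n_subj)
    peaks[b] = np.argmax(acc[idx].mean(axis=0))
ci = np.percentile(peaks, [2.5, 97.5])
```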

Keywords

  • audition
  • Auditory Scene Analysis
  • magnetoencephalography
  • multivariate pattern analysis
  • reverberation
