Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing
Introduction
Because auditory signals unfold over time at multiple scales, decoding input sounds and linking them to meaningful objects or concepts requires integrating sensory information over time. In speech perception, this temporal integration must occur on at least two (and arguably more) distinct timescales, corresponding to syllabic-level (~200 ms, or ~5 Hz) and phonemic-level (~25 ms, or ~40 Hz) information. Several models have suggested that this type of multi-time-resolution analysis and integration could be performed in auditory cortex by neuronal oscillations corresponding to these two temporal windows of integration (~5 Hz, theta; ~40 Hz, gamma), which parse the sound input at these separate timescales (Ghitza, 2011, Poeppel, 2003). In particular, it has been hypothesized that the phase of the slow oscillation (nested with gamma) locks to the syllabic rhythm to optimally decode and integrate syllabic and phonemic speech features (Giraud and Poeppel, 2012).
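The nesting of gamma within the theta cycle described above is often quantified as phase–amplitude coupling. The sketch below computes a normalized mean-vector-length index on synthetic signals; it illustrates the concept only and is not an analysis reported in this study. The band edges (4–8 Hz theta, 30–50 Hz gamma), the FFT "brick-wall" filters, and all function names are assumptions chosen for clarity; real MEG data would call for proper filter design and surrogate statistics.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Zero-phase 'brick-wall' band-pass: keep only FFT bins with lo <= f <= hi."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def theta_gamma_coupling(x, fs, theta=(4.0, 8.0), gamma=(30.0, 50.0)):
    """Normalized mean vector length: 0 means gamma amplitude ignores theta phase."""
    theta_phase = np.angle(analytic_signal(bandpass_fft(x, fs, *theta)))
    gamma_amp = np.abs(analytic_signal(bandpass_fft(x, fs, *gamma)))
    return np.abs(np.mean(gamma_amp * np.exp(1j * theta_phase))) / np.mean(gamma_amp)


fs = 1000.0                       # sampling rate (Hz)
t = np.arange(int(10 * fs)) / fs  # 10 s of a synthetic "cortical" signal
theta_wave = np.cos(2 * np.pi * 5 * t)
# Gamma whose amplitude waxes and wanes with the theta cycle (nested) ...
nested = theta_wave + (1 + 0.8 * theta_wave) * np.cos(2 * np.pi * 40 * t)
# ... versus gamma of constant amplitude (not nested).
independent = theta_wave + np.cos(2 * np.pi * 40 * t)

print(round(theta_gamma_coupling(nested, fs), 2))       # 0.4
print(round(theta_gamma_coupling(independent, fs), 2))  # 0.0
```

The index is normalized by the mean gamma amplitude so that overall gamma power does not inflate it; only the dependence of amplitude on theta phase does.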
In this magnetoencephalography (MEG) study, we focus on the role of the longer temporal window, which most readily corresponds to delta–theta oscillations, to gain a better mechanistic understanding of how neuronal activity in this band might underpin auditory perception and speech comprehension. Much recent research has focused on slow neural oscillations and their relationship to auditory stimuli (Cogan and Poeppel, 2011, Ding and Simon, 2009, Howard and Poeppel, 2010, Howard and Poeppel, 2012, Luo and Poeppel, 2007, Luo and Poeppel, 2012, Peelle et al., 2013). In addition, the relevance of low-modulation-frequency oscillations to multi-sensory perception has been demonstrated, for example in naturalistic scenes and in the well-studied cocktail party scenario (Kerlin et al., 2010, Luo et al., 2010, Zion Golumbic et al., 2013). There is an emerging consensus that the phase of slow oscillations precisely tracks the stimulus acoustics. However, whether this stimulus–response alignment over time is necessary for speech comprehension remains debated (Howard and Poeppel, 2010; versus Luo and Poeppel, 2007, Peelle et al., 2013). One hypothesis is that cortical delta–theta oscillations track the critical band envelopes of the stimulus, a feature that carries crucial cues to segmental and syllabic information (Rosen, 1992).¹ Despite the body of research showing that this oscillation tracks the envelope, it remains unclear which aspects of the stimulus drive this response. One plausible hypothesis derived from the Giraud and Poeppel (2012) model is that syllable onsets produce temporal fluctuations that entrain slow neural oscillations at the syllabic rate. Here, we test this hypothesis by selectively filtering out these fluctuations and analyzing the effect on oscillatory entrainment.
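Envelope tracking is commonly quantified as the consistency of the phase lag between the stimulus envelope and the neural response in a low-frequency band. The following minimal sketch, on synthetic data, extracts a Hilbert envelope from one hypothetical critical-band channel and computes a phase-locking value. The 1–8 Hz delta–theta band, the 100 ms lag, the noise level, and the helper names are illustrative assumptions, not the measure or parameters used in this study.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Zero-phase 'brick-wall' band-pass: keep only FFT bins with lo <= f <= hi."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def phase_locking_value(stim_envelope, response, fs, band=(1.0, 8.0)):
    """Consistency (0..1) of the stimulus-response phase lag within `band`."""
    phase_stim = np.angle(analytic_signal(bandpass_fft(stim_envelope, fs, *band)))
    phase_resp = np.angle(analytic_signal(bandpass_fft(response, fs, *band)))
    return np.abs(np.mean(np.exp(1j * (phase_stim - phase_resp))))


fs = 1000.0
t = np.arange(int(20 * fs)) / fs
rng = np.random.default_rng(0)

# Hypothetical critical-band channel: a 200 Hz carrier with a 4 Hz "syllabic" envelope.
slow_envelope = 1 + 0.9 * np.cos(2 * np.pi * 4 * t)
channel = slow_envelope * np.cos(2 * np.pi * 200 * t)
envelope = np.abs(analytic_signal(channel))  # Hilbert envelope recovers slow_envelope

# A response that follows the envelope with a 100 ms lag plus noise, and one that does not.
tracking = np.cos(2 * np.pi * 4 * (t - 0.1)) + rng.standard_normal(t.size)
not_tracking = rng.standard_normal(t.size)

print(phase_locking_value(envelope, tracking, fs))      # close to 1
print(phase_locking_value(envelope, not_tracking, fs))  # close to 0
```

A constant lag does not lower the value; only a lag that varies over time does, which is why the measure captures "tracking" rather than simultaneity.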
The principal goal of this study is therefore to clarify the mechanisms of slow-oscillation envelope tracking and, in particular, to identify the temporal features of the stimulus that drive this neuronal activity.
It has recently been demonstrated that theta-band envelope tracking of speech is enhanced by stimulus intelligibility (Peelle and Davis, 2012, Peelle et al., 2013), whereas earlier work showed similar neural phase-locking for sentences played backwards (unintelligible) and forwards (Howard and Poeppel, 2010). Whether the linguistic content of the stimuli induces a top-down ‘amplification’ of the oscillation-based envelope-tracking mechanism therefore remains debated. A secondary goal of this study is thus to investigate how envelope tracking relates to intelligibility and to understand its putative function in the broader context of speech perception.
This neurophysiological experiment builds on a recent behavioral study that manipulated the temporal acoustic features of speech to delineate the role of low-frequency (syllabic) cues in speech intelligibility (Ghitza, 2012). In that study, artificially removing exactly those temporal fluctuations in the critical band envelopes that relate to the syllabic rate (2–9 Hz) significantly reduced the intelligibility of the degraded speech. However, when brief noise bursts were added to the degraded stimulus precisely where the ‘acoustic landmarks’² of the original would have been, the error rate dropped by about 50%. The proposed interpretation of this psychophysical effect is that removing these cues disrupts the ability of cortical delta–theta oscillations to track the stimulus envelope. Thus, while removing slow fluctuations from the stimulus reduced intelligibility, artificially reinstating temporal cues as transient edges at landmark positions enhanced it.
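The core manipulation, deleting the 2–9 Hz fluctuations of a critical-band envelope, can be illustrated on a single synthetic envelope. This sketch uses an FFT band-stop filter and reports the fraction of fluctuation power inside 2–9 Hz before and after filtering. It is a simplified stand-in, not the signal processing of Ghitza (2012): a full implementation would apply it to every channel of a filterbank and re-impose the filtered envelopes on their carriers. The envelope shape and function names are hypothetical.

```python
import numpy as np


def syllabic_band_fraction(envelope, fs, lo=2.0, hi=9.0):
    """Fraction of the envelope's fluctuation (non-DC) power lying in lo..hi Hz."""
    power = np.abs(np.fft.rfft(envelope - np.mean(envelope))) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    in_band = (freqs >= lo) & (freqs <= hi)
    return power[in_band].sum() / power.sum()


def remove_syllabic_fluctuations(envelope, fs, lo=2.0, hi=9.0):
    """Zero-phase FFT band-stop: delete envelope fluctuations between lo and hi Hz."""
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    spectrum[(freqs >= lo) & (freqs <= hi)] = 0.0
    # An envelope cannot be negative, so clip any undershoot introduced by filtering.
    return np.clip(np.fft.irfft(spectrum, n=len(envelope)), 0.0, None)


fs = 1000.0
t = np.arange(int(2 * fs)) / fs
# Hypothetical critical-band envelope: a 4 Hz "syllabic" component plus a faster 20 Hz ripple.
envelope = 1 + 0.6 * np.cos(2 * np.pi * 4 * t) + 0.2 * np.cos(2 * np.pi * 20 * t)
degraded = remove_syllabic_fluctuations(envelope, fs)

print(round(syllabic_band_fraction(envelope, fs), 3))  # 0.9
print(round(syllabic_band_fraction(degraded, fs), 3))  # 0.0
```

The faster ripple survives the filter, so the degraded envelope still fluctuates; what it loses is specifically the syllabic-rate structure.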
We hypothesize that temporal cues reflecting the syllabic rate are at the origin of the envelope-tracking phenomenon, which in turn is a crucial condition for continuous speech to be intelligible. Specifically, we propose that acoustic landmarks entrain intrinsic cortical oscillations, permitting the extraction of temporal primitives and, subsequently, of finer-grained speech features in a decoding stage. The quasi-periodic occurrence of these landmarks generates the envelope-tracking behavior, which could serve to parse the stimulus into syllable-sized representations.
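The proposed parsing role of landmarks can be made concrete with a deliberately crude detector that marks each upward threshold crossing of the envelope as an "edge" and reads off the segments between successive edges. The threshold, the raised-cosine "syllables", and the onset times below are hypothetical, and this detector is not the landmark definition used by Ghitza (2012); it only illustrates how quasi-periodic edges yield syllable-sized chunks at a delta–theta rate.

```python
import numpy as np


def rising_edge_landmarks(envelope, fs, threshold):
    """Times (s) at which the envelope crosses `threshold` from below (crude onset edges)."""
    above = envelope >= threshold
    crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1
    return crossings / fs


def parse_into_chunks(landmarks):
    """Durations (s) of the segments delimited by successive landmarks."""
    return np.diff(landmarks)


fs = 1000.0
t = np.arange(int(1.2 * fs)) / fs
envelope = np.zeros_like(t)
syllable_len = 0.12  # hypothetical duration of each syllable-like burst (s)
for onset in (0.10, 0.35, 0.55, 0.90):  # hypothetical quasi-periodic onsets
    idx = (t >= onset) & (t < onset + syllable_len)
    envelope[idx] = 0.5 * (1 - np.cos(2 * np.pi * (t[idx] - onset) / syllable_len))

landmarks = rising_edge_landmarks(envelope, fs, threshold=0.3)
chunks = parse_into_chunks(landmarks)
print(np.round(chunks, 3))          # syllable-sized segments between edges
print(round(1 / chunks.mean(), 2))  # implied parsing rate in Hz
```

Because every burst has the same shape, each edge sits at the same offset from its onset, so the chunk durations reproduce the spacing of the onsets themselves.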
Participants
Sixteen right-handed participants (9 females; mean age 23 years, range 18–31) took part in the experiment after providing informed consent, and received compensation for their participation. Handedness was determined using the Edinburgh Handedness Inventory (Oldfield, 1971). All participants self-reported normal hearing and no neurological deficits. One participant was excluded because he did not enter his behavioral ratings as instructed. Another was excluded due to excessive noise in the
Intelligibility and sharpness
Intelligibility ratings (Fig. 3) closely mirror Ghitza's (2012) psychophysical findings. We tested differences between conditions in a one-way repeated measures ANOVA and found a main effect of condition (F = 11.6, p < .0001). Using a post-hoc Tukey–Kramer multiple comparisons test, we determined that both the Noθ and Chθ conditions were significantly less intelligible than the Control condition (Noθ, p < .0001; Chθ, p < .0001). Furthermore, summing these two inputs into the Noθ + Chθ condition resulted
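For readers unfamiliar with the test, the F statistic of a one-way repeated-measures ANOVA partitions variance into condition, subject, and residual (condition × subject) terms. The sketch below computes it for a small, entirely hypothetical matrix of ratings; it omits sphericity corrections and the Tukey–Kramer post-hoc comparisons, and it does not reproduce the data reported above. The function name is an assumption for illustration.

```python
import numpy as np


def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA on a (subjects x conditions) array.

    Returns (F, df_conditions, df_error).
    """
    scores = np.asarray(scores, dtype=float)
    n_subj, n_cond = scores.shape
    grand = scores.mean()
    ss_cond = n_subj * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_subj = n_cond * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    # Residual = condition x subject interaction, after removing subject differences.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = n_cond - 1
    df_error = (n_cond - 1) * (n_subj - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


ratings = np.array([  # hypothetical ratings: rows = listeners, columns = conditions
    [5, 2, 4],
    [6, 3, 4],
    [7, 3, 6],
    [6, 4, 6],
])
f, df1, df2 = rm_anova_oneway(ratings)
print(f"F({df1}, {df2}) = {f:.1f}")  # F(2, 6) = 28.0
```

Removing the subject term from the error is what distinguishes the repeated-measures design from a between-subjects ANOVA. A p-value follows from the upper tail of the F distribution with (df1, df2) degrees of freedom, for example via `scipy.stats.f.sf(f, df1, df2)`.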
Discussion
This study demonstrates a clear relationship between envelope tracking in auditory cortex and the intelligibility of a speech signal. While this relationship has been suggested previously (Luo and Poeppel, 2007, Peelle et al., 2013), the present stimulus manipulation allows us to shed light on its nature. Specifically, we suggest that reliable envelope tracking requires a sufficient degree of temporal envelope fluctuation at the cochlear output. These fluctuations are driven by
Conclusion
Our data paint an interesting picture of the role of neural envelope tracking in perceptual analysis of auditory signals, and ultimately in speech comprehension. Our interpretation of the data speaks first and foremost to the mechanism by which envelope-tracking activity is generated in auditory cortices. Namely, sharp fluctuations in critical band envelopes, driven by acoustic landmarks (e.g., edges), entrain the slow oscillations of auditory cortex, forcing the oscillation to track stimulus
Acknowledgments
This work is supported by NIH R01 DC05660 to D.P. We thank Jeff Walker for his expert technical support, and Benjamin Morillon for help with the analysis. Oded Ghitza is funded by a research grant from the United States Air Force Office of Scientific Research.
Conflict of Interest
The authors have no conflicts of interest.
References (45)
- Cortical oscillations and sensory predictions. Trends Cogn. Sci. (2012).
- A temporal sampling framework for developmental dyslexia. Trends Cogn. Sci. (2011).
- Reduced phase locking to slow amplitude modulation in adults with dyslexia: an MEG study. NeuroImage (2012).
- Top-down knowledge supports the retrieval of lexical information from degraded speech. Brain Res. (2007).
- The neuromagnetic response to spoken sentences: co-modulation of theta band amplitude and phase. NeuroImage (2012).
- The spectrotemporal filter mechanism of auditory selective attention. Neuron (2013).
- Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron (2007).
- The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia (1971).
- The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. (2003).
- Auditory M100 component 1: relationship to Heschl's gyri. Brain Res. Cogn. Brain Res. (1994).