Elsevier

NeuroImage

Volume 85, Part 2, 15 January 2014, Pages 761-768

Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing

https://doi.org/10.1016/j.neuroimage.2013.06.035

Highlights

  • We presented speech with modified temporal fluctuations (2–9 Hz) and recorded MEG.

  • Sharp fluctuations (edges) enabled stimulus-tracking using oscillations.

  • The effect was found only at the stimulus syllabic rate (2–4 Hz).

  • Stimulus-tracking by theta oscillations underpins intelligibility.

Abstract

A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta–theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drives oscillatory activity to track and entrain to the stimulus at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.

Introduction

Because auditory signals unfold over time, at multiple scales, the process of decoding input sounds to link them to meaningful objects or concepts requires integrating sensory information over time. In speech perception, this temporal integration must occur in at least two (and arguably more) distinct timescales which relate to syllabic-level (~ 200 ms or ~ 5 Hz) and phonemic-level (~ 25 ms or ~ 40 Hz) information. Several models have suggested that this type of multi-time resolution analysis and integration could be performed in auditory cortex using neuronal oscillations – corresponding to these two temporal windows of integration (~ 5 Hz, theta; ~ 40 Hz, gamma) – to parse the sound input at these separate timescales (Ghitza, 2011, Poeppel, 2003). It is hypothesized, in particular, that the phase of the slow oscillation (nested with gamma) locks to the syllabic rhythm to optimally decode and integrate syllabic and phonemic speech features (Giraud and Poeppel, 2012).

In this magnetoencephalography (MEG) study, we focus on the role of the longer temporal window, most readily corresponding to delta–theta oscillations, to gain a better mechanistic understanding of how neuronal activity in this band might underpin auditory perception and speech comprehension. Recently, much research has focused on slow neural oscillations and their relationship to auditory stimuli (Cogan and Poeppel, 2011, Ding and Simon, 2009, Howard and Poeppel, 2010, Howard and Poeppel, 2012, Luo and Poeppel, 2007, Luo and Poeppel, 2012, Peelle et al., 2013). In addition, the relevance of low–modulation frequency oscillations to multi-sensory perception has been demonstrated, for example in naturalistic scenes or the well-studied cocktail party scenario (Kerlin et al., 2010, Luo et al., 2010, Zion Golumbic et al., 2013). There is an emerging consensus that the phase of slow oscillations precisely tracks the stimulus acoustics. However, whether this stimulus–response alignment across time is necessary for speech comprehension remains debated (Howard and Poeppel, 2010; versus Luo and Poeppel, 2007, Peelle et al., 2013). One hypothesis is that cortical delta–theta oscillations track the critical band envelopes of the stimulus, a feature which carries crucial cues regarding segmental and syllabic information (Rosen, 1992). Despite the body of research showing that these oscillations track the envelope, it remains unclear which aspects of the stimulus drive this response. One plausible hypothesis, generated from the Giraud and Poeppel (2012) model, is that the onsets of syllables produce temporal fluctuations, which entrain slow neural oscillations at the syllabic rate. Here, we test this hypothesis by filtering these fluctuations in very particular ways and analyzing the effect on oscillatory entrainment. As such, the principal goal of this study is to understand more clearly the mechanisms of slow oscillation envelope tracking and, in particular, to uncover aspects in the temporal domain of the stimulus that drive this neuronal activity.

It has recently been demonstrated that theta envelope tracking of speech is enhanced by stimulus intelligibility (Peelle and Davis, 2012, Peelle et al., 2013), while earlier work showed similar neural phase-locking for sentences played backwards (no intelligibility) and forwards (Howard and Poeppel, 2010). Thus the question of whether the linguistic content of the stimuli induces a top-down ‘amplification’ of the oscillation-based envelope-tracking mechanism is debated. As a result, a secondary goal of this study is to investigate how envelope tracking relates to intelligibility and to understand its putative function in the broader context of speech perception.

This neurophysiological experiment builds on a recent behavioral study that manipulated the temporal acoustic features of speech to delineate the role of low frequency (syllabic) cues in speech intelligibility (Ghitza, 2012). Artificially removing exactly those temporal fluctuations in the critical band envelopes that relate to the syllabic rate (2–9 Hz) significantly reduces the intelligibility of the degraded speech. However, when brief noise bursts are added to the degraded stimulus precisely where the ‘acoustic landmarks’ of the original would have been, the error rate drops by about 50%. The interpretation proposed to explain this psychophysical effect is that removing these cues disrupts the ability of cortical delta–theta oscillations to track the stimulus envelope. While removing slow fluctuations from the stimulus reduced intelligibility, reinstating temporal cues artificially by using transient edges at landmark positions enhanced intelligibility.
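To make the envelope manipulation described above concrete, the following is a minimal sketch of removing 2–9 Hz modulations from a single band envelope. It is not the procedure used by Ghitza (2012) or in this study: the critical-band filterbank is omitted, the test signal is a synthetic amplitude-modulated tone rather than speech, the band is removed by zeroing FFT bins, and all function names (`analytic_envelope`, `remove_modulation_band`, `band_power`) are hypothetical. The noise bursts at landmark positions are not modeled.

```python
import numpy as np

def analytic_envelope(x):
    """Envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def remove_modulation_band(envelope, fs, lo_hz=2.0, hi_hz=9.0):
    """Zero the envelope's modulation spectrum between lo_hz and hi_hz (assumed method)."""
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    spectrum[(freqs >= lo_hz) & (freqs <= hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(envelope))

def band_power(signal, fs, lo_hz, hi_hz):
    """Summed spectral power of `signal` between lo_hz and hi_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return power[(freqs >= lo_hz) & (freqs <= hi_hz)].sum()

# Synthetic stand-in for one band of speech: a 1 kHz carrier
# amplitude-modulated at 4 Hz (a syllable-like rate). Not real stimulus data.
fs = 8000
t = np.arange(0, 2.0, 1.0 / fs)
carrier = np.sin(2 * np.pi * 1000 * t)
band_signal = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * carrier

env = analytic_envelope(band_signal)          # original envelope, fluctuating at 4 Hz
env_flat = remove_modulation_band(env, fs)    # envelope with 2-9 Hz fluctuations removed
degraded = env_flat * carrier                 # flattened envelope re-imposed on the carrier
```

In this toy case the only modulation present lies inside the removed band, so `env_flat` is essentially constant. With real speech, slower and faster envelope fluctuations outside 2–9 Hz would remain.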

We hypothesize that temporal cues that reflect the syllabic rate are at the origin of the envelope-tracking phenomenon, which in turn constitutes a crucial condition for continuous speech to be intelligible. Specifically, we propose that acoustic landmarks entrain intrinsic cortical oscillations to permit the extraction of temporal primitives and subsequently finer grained speech features in a decoding stage. This quasi-periodicity generates the envelope tracking behavior, which could have the capacity to parse the stimulus into syllable-size representations.

Participants

Sixteen right-handed participants (9 females; mean age 23 years, range 18–31) took part in the experiment after providing informed consent and received compensation for their participation. Handedness was determined using the Edinburgh Handedness Inventory (Oldfield, 1971). All participants reported normal hearing and no neurological deficits. One participant was removed because he did not input his behavioral ratings as instructed. Another was removed due to too much noise in the

Intelligibility and sharpness

Intelligibility ratings (Fig. 3) closely mirror Ghitza's (2012) psychophysical findings. We tested differences between conditions in a one-way repeated measures ANOVA and found a main effect of condition (F = 11.6, p < .0001). Using a post-hoc Tukey–Kramer multiple comparisons test, we determined that both the Noθ and Chθ conditions were significantly less intelligible than the Control condition (Noθ, p < .0001; Chθ, p < .0001). Furthermore, summing these two inputs into the Noθ + Chθ condition resulted
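As an illustration of the one-way repeated measures ANOVA named above, here is a minimal sketch of how the F statistic is computed from a subjects-by-conditions table. The ratings are made up for illustration and are not the study's data, the function name `rm_anova_oneway` is hypothetical, and the p-value and the Tukey–Kramer post-hoc comparisons are not covered.

```python
import numpy as np

def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores: array of shape (n_subjects, n_conditions).
    Returns (F, df_condition, df_error).
    """
    scores = np.asarray(scores, dtype=float)
    n_subj, n_cond = scores.shape
    grand_mean = scores.mean()

    # Variability between condition means (the effect of interest).
    ss_cond = n_subj * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
    # Variability between subject means (removed from the error term).
    ss_subj = n_cond * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = n_cond - 1
    df_error = (n_subj - 1) * (n_cond - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error

# Hypothetical ratings: 4 subjects (rows) x 3 conditions (columns).
ratings = np.array([
    [5, 3, 4],
    [6, 3, 5],
    [7, 5, 6],
    [6, 5, 5],
])

f_stat, df_cond, df_error = rm_anova_oneway(ratings)
print(f"F({df_cond}, {df_error}) = {f_stat:.2f}")  # F(2, 6) = 18.00
```

Subtracting the between-subject sum of squares from the error term is what distinguishes this from a between-groups ANOVA: each participant serves as their own control across conditions.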

Discussion

This study demonstrates a clear relationship between envelope tracking in the auditory cortex and intelligibility of a speech signal. While this relationship has been suggested previously (Luo and Poeppel, 2007, Peelle et al., 2013), this particular method enables us to shed light on the nature of this relationship. Specifically, we suggest that reliable envelope tracking requires a sufficient degree of temporal envelope fluctuations at the cochlear output. These fluctuations are driven by

Conclusion

Our data paint an interesting picture of the role of neural envelope tracking in perceptual analysis of auditory signals, and ultimately in speech comprehension. Our interpretation of the data speaks first and foremost to the mechanism by which envelope-tracking activity is generated in auditory cortices. Namely, sharp fluctuations in critical band envelopes, driven by acoustic landmarks (e.g., edges), entrain the slow oscillations of auditory cortex, forcing the oscillation to track stimulus

Acknowledgments

This work is supported by NIH R01 DC05660 to D.P. We thank Jeff Walker for his expert technical support, and Benjamin Morillon for help with the analysis. Oded Ghitza is funded by a research grant from the United States Air Force Office of Scientific Research.

Conflict of Interest

The authors have no conflicts of interest.

References (45)

  • C.E. Schroeder et al. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci.

  • C.E. Schroeder et al. (2010). Dynamics of active sensing and perceptual selection. Curr. Opin. Neurobiol.

  • J.M. Thomson et al. (2008). Rhythmic processing in children with developmental dyslexia: auditory and motor rhythms link to reading and spelling. J. Physiol. Paris

  • J.M. Thomson et al. (2009). The ERP signature of sound rise time changes. Brain Res.

  • D.A. Abrams et al. (2009). Abnormal cortical processing of the syllable rate of speech in poor readers. J. Neurosci.

  • Y. Adachi et al. (2001). Reduction of nonperiodic environmental magnetic noise in MEG measurement by continuously adjusted least square method. IEEE Trans. Appl. Supercond.

  • E. Ahissar et al. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. U. S. A.

  • J. Besle et al. (2011). Tuning of the human neocortex to the temporal dynamics of attended events. J. Neurosci.

  • G.B. Cogan et al. (2011). A mutual information analysis of neural coding of speech by low-frequency MEG phase information. J. Neurophysiol.

  • N. Ding et al. (2009). Neural representations of complex temporal modulations in the human auditory cortex. J. Neurophysiol.

  • R. Drullman et al. (1994). Effect of reducing slow temporal modulations on speech reception. J. Acoust. Soc. Am.

  • B.J. Farley et al. (2013). Spatiotemporal coordination of slow-wave ongoing activity across auditory cortical areas. J. Neurosci.