NeuroImage

Volume 124, Part A, 1 January 2016, Pages 16-23

EEG oscillations entrain their phase to high-level features of speech sound

https://doi.org/10.1016/j.neuroimage.2015.08.054

Highlights

  • Speech/noise stimuli were presented without systematic changes in spectral content.

  • EEG oscillations entrained their phase to those stimuli.

  • Phase but not degree of entrainment differed from response to original speech.

  • Time-reversal of the speech/noise stimuli did not abolish the effect.

  • Shows that entrainment to speech entails an acoustic high-level component.

Abstract

Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation—and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed “high-level” speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech.

Introduction

The auditory environment is essentially rhythmic (e.g., music, speech, animal calls), and relevant information (e.g., phonemes, sounds) alternates with irrelevant input (such as the silence in between) in a regular fashion. Based on these environmental rhythms, the brain might have developed a clever tool for efficient stimulus processing (Calderone et al., 2014, Schroeder and Lakatos, 2009): neural oscillations could align their high-excitability (i.e., amplifying) phase with regularly occurring important events, whereas their low-excitability (i.e., suppressive) phase could coincide with irrelevant events.

This phenomenon has been called phase entrainment and has been shown to improve speech intelligibility (Ahissar et al., 2001, Kerlin et al., 2010, Luo and Poeppel, 2007). However, the stimuli presented in most experiments contain pronounced fluctuations in sound amplitude and may simply evoke a passive "amplitude following" of brain oscillations (i.e., auditory steady-state responses, ASSR; Galambos et al., 1981). In other words, past reports of phase entrainment to speech might reflect an adjustment to fluctuations in low-level features and/or to co-varying high-level features of speech sound. Critically, in the former case, phase entrainment would merely reflect the periodicity of the auditory stimulation and could not be seen as an active "tool" for efficient stimulus processing (VanRullen et al., 2014). On the other hand, were one able to observe phase adjustment to (hypothetical) speech-like stimuli that retain a regular speech structure but do not evoke ASSR at a purely sensory level of auditory processing (such as the cochlea), this would provide important evidence for the proposed active mechanism of stimulus processing (Giraud and Poeppel, 2012, Schroeder et al., 2010). Recently, we reported the construction of such stimuli (Zoefel and VanRullen, 2015): speech/noise snippets with conserved patterns of high-level features, but without concomitant changes in sound amplitude or spectral content. We showed that auditory behavioral performance entrains to those stimuli, as detection of a tone pip was modulated by the phase of the preserved high-level rhythm. However, it remained to be tested whether this behavioral modulation also entails neural phase entrainment.
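To make this logic concrete, the following minimal Python sketch shows how one could test whether a candidate stimulus is free of systematic amplitude fluctuations, by checking the flatness of its Hilbert envelope. The function name and the 30 Hz smoothing cutoff are illustrative assumptions, not part of the authors' procedure:

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def envelope_flatness(waveform, fs, lp_cutoff=30.0):
        # Instantaneous amplitude (broadband Hilbert envelope)
        env = np.abs(hilbert(waveform))
        # Smooth the envelope; the 30 Hz cutoff is an assumed choice
        b, a = butter(4, lp_cutoff / (fs / 2.0))
        env = filtfilt(b, a, env)
        # Coefficient of variation: close to 0 for a stimulus without
        # systematic amplitude fluctuations (i.e., one that should not
        # evoke ASSR at a purely sensory level)
        return np.std(env) / np.mean(env)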

In addition, we focus on a highly relevant question recently raised by Peelle and Davis (2012), based on the previously reported correlation between phase entrainment and intelligibility (Ahissar et al., 2001, Kerlin et al., 2010, Luo and Poeppel, 2007): Does speech intelligibility enhance phase entrainment, or does phase entrainment enhance speech intelligibility? If the latter is true, they argue, phase entrainment should occur based on acoustic (e.g., voice gender, identity) and not linguistic (e.g., semantic) information. So far, this question remains unresolved: although behavioral phase entrainment does depend on linguistic cues (the phase adjustment observed for our speech/noise mixture stimuli did not occur for time-reversed stimuli; Zoefel and VanRullen, 2015), this need not be the case for the potentially underlying neural phase entrainment. We therefore compared entrainment of EEG oscillations to original (unprocessed) speech snippets with entrainment to our constructed speech/noise mixture stimuli, as well as to time-reversed speech/noise snippets (Fig. 1).
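As a purely illustrative sketch of the three stimulus conditions compared here (file names are placeholders, not the actual stimulus files), time-reversal can be expressed as a single array operation, which preserves the long-term acoustics while disrupting linguistic content:

    from scipy.io import wavfile

    # Placeholder file names for the two stimulus types
    fs, original = wavfile.read("speech.wav")           # unprocessed speech
    fs, constructed = wavfile.read("speech_noise.wav")  # speech/noise mixture
    # Reversal keeps amplitude and spectral statistics intact
    # but disrupts semantic processing
    constructed_reversed = constructed[::-1]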

Section snippets

Participants

Twelve participants volunteered after giving written informed consent (7 female; mean age: 27.6 years). All participants reported normal hearing and received compensation for their time. The experimental protocol was approved by the relevant ethical committee at Centre National de la Recherche Scientifique (CNRS).

Experimental stimuli

A detailed description of stimulus construction was given by Zoefel and VanRullen (2015). In short, phase-specific auditory noise was added to original snippets such that sound …
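The following simplified sketch conveys only the constant-amplitude idea behind this construction; the actual procedure in Zoefel and VanRullen (2015) also equalized spectral content, and all names and scalings below are assumptions:

    import numpy as np
    from scipy.signal import hilbert

    def flat_amplitude_mixture(speech, noise):
        # Normalized speech envelope in [0, 1]
        env = np.abs(hilbert(speech))
        env = env / env.max()
        # Scale the noise by the complementary envelope so that the
        # summed envelope is approximately constant: amplitude cues
        # vanish, while speech features survive wherever the speech
        # envelope is high (spectral equalization is omitted here)
        noise = noise / np.abs(hilbert(noise)).max()
        return speech + (1.0 - env) * noise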

Results

We presented 12 subjects with speech/noise stimuli without systematic fluctuations in low-level features (here defined as sound amplitude and spectral content; see Zoefel and VanRullen (2015) for a detailed discussion of this definition), but with intact high-level features of speech sound, fluctuating at ~2–8 Hz (“constructed condition”). Additionally, those speech/noise snippets were presented in reverse (“constructed reversed condition”), thus potentially disentangling high-level features …
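One standard way to quantify the two aspects of entrainment contrasted in this study, the degree of phase-locking and the actual EEG-to-sound phase difference, is the mean resultant vector of the phase differences in the 2–8 Hz band. The sketch below is an assumed implementation for illustration, not necessarily the analysis pipeline used in the paper:

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def entrainment(eeg, sound_env, fs, band=(2.0, 8.0)):
        # Band-pass both signals in the speech-rhythm range (2-8 Hz)
        nyq = fs / 2.0
        b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
        phase_eeg = np.angle(hilbert(filtfilt(b, a, eeg)))
        phase_snd = np.angle(hilbert(filtfilt(b, a, sound_env)))
        # Mean resultant vector of the phase differences:
        # its length gives the degree of entrainment (0..1),
        # its angle gives the EEG-to-sound phase difference
        r = np.mean(np.exp(1j * (phase_eeg - phase_snd)))
        return np.abs(r), np.angle(r)

Comparing conditions then amounts to comparing the vector lengths (degree of entrainment) and angles (phase difference) across subjects.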

Discussion

Phase entrainment of neural oscillations as a potential tool for efficient stimulus processing has been described repeatedly (Calderone et al., 2014, Lakatos et al., 2005, Lakatos et al., 2013, Schroeder et al., 2010, Schroeder and Lakatos, 2009) and is paramount in current theories of speech comprehension (Doelling et al., 2014, Ghitza, 2011, Ghitza, 2012, Ghitza, 2013, Ghitza, 2014, Giraud and Poeppel, 2012, Zion Golumbic et al., 2013). However, the underlying mechanisms are far from clear …

Acknowledgements

The authors are grateful to Alain de Cheveigné and Daniel Pressnitzer for helpful comments and discussions. This study was supported by a Studienstiftung des deutschen Volkes (German National Academic Foundation) scholarship to BZ, and a EURYI Award as well as an ERC Consolidator grant P-CYCLES under grant agreement 614244 to RV.

References (64)

  • D.A. Abrams et al., Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech, J. Neurosci. (2008)
  • E. Ahissar et al., Speech comprehension is correlated with temporal response patterns recorded from auditory cortex, Proc. Natl. Acad. Sci. U. S. A. (2001)
  • A.M. Bastos et al., Simultaneous recordings from the primary visual cortex and lateral geniculate nucleus reveal rhythmic interactions and a cortical source for γ-band oscillations, J. Neurosci. (2014)
  • Y. Benjamini et al., Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B Methodol. (1995)
  • J. Besle et al., Tuning of the human neocortex to the temporal dynamics of attended events, J. Neurosci. (2011)
  • J.R. Binder et al., Human temporal lobe activation by speech and nonspeech sounds, Cereb. Cortex (2000)
  • D.H. Brainard, The psychophysics toolbox, Spat. Vis. (1997)
  • E.A. Buffalo et al., Laminar differences in gamma and alpha coherence in the ventral stream, Proc. Natl. Acad. Sci. U. S. A. (2011)
  • G. Buzsáki et al., The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes, Nat. Rev. Neurosci. (2012)
  • M.H. Davis et al., Hierarchical processing in spoken language comprehension, J. Neurosci. (2003)
  • I. DeWitt et al., Phoneme and word recognition in the auditory ventral stream, Proc. Natl. Acad. Sci. U. S. A. (2012)
  • N. Ding et al., Emergence of neural encoding of auditory objects while listening to competing speakers, Proc. Natl. Acad. Sci. U. S. A. (2012)
  • N. Ding et al., Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure, NeuroImage (2013)
  • N. Ding et al., Cortical entrainment to continuous speech: functional roles and interpretations, Front. Hum. Neurosci. (2014)
  • L. Fontolan et al., The contribution of frequency-specific activity to hierarchical information processing in the human auditory cortex, Nat. Commun. (2014)
  • R. Galambos et al., A 40-Hz auditory potential recorded from the human scalp, Proc. Natl. Acad. Sci. U. S. A. (1981)
  • O. Ghitza, Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm, Front. Psychol. (2011)
  • C.E. Schroeder et al., Low-frequency neuronal oscillations as instruments of sensory selection, Trends Neurosci. (2009)
  • C.E. Schroeder et al., Dynamics of active sensing and perceptual selection, Curr. Opin. Neurobiol. (2010)
  • S. Uppenkamp et al., Locating the initial stages of speech-sound processing in human temporal cortex, NeuroImage (2006)
  • R. VanRullen et al., Perceptual echoes at 10 Hz in the human brain, Curr. Biol. (2012)
  • E.M. Zion Golumbic et al., Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party,” Neuron (2013)