EEG oscillations entrain their phase to high-level features of speech sound
Introduction
The auditory environment is essentially rhythmic (e.g., music, speech, animal calls), and relevant information (e.g., phonemes, sounds) alternates with irrelevant input (such as the silence in-between) in a regular fashion. The brain might have developed a clever tool that exploits these environmental rhythms for efficient stimulus processing (Calderone et al., 2014, Schroeder and Lakatos, 2009): Neural oscillations could align their high-excitability (i.e., amplifying) phase with regularly occurring important events, whereas their low-excitability (i.e., suppressive) phase would coincide with irrelevant events.
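This proposed mechanism can be caricatured in a few lines of code. The sketch below is our own illustration, not a model from the study; all parameters (4 Hz frequency, gain depth, event times) are arbitrary choices. It treats the phase of an ongoing oscillation as a gain applied to incoming events: events landing on the high-excitability phase are amplified, those landing half a cycle later are suppressed.

```python
import numpy as np

# Toy illustration (not the authors' model): a 4 Hz oscillation whose
# phase gates the gain applied to incoming events.
fs = 1000                          # sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)        # 2 s of time
freq = 4.0                         # theta-range oscillation frequency
gain = 1.0 + 0.8 * np.cos(2 * np.pi * freq * t)  # high at phase 0, low at pi

# A rhythmic stream of "relevant" events aligned to the excitable phase,
# and the same stream shifted by half a cycle (anti-phase).
event_times_aligned = np.arange(0.0, 2.0, 1 / freq)   # land at phase 0
event_times_anti = event_times_aligned + 0.5 / freq   # land at phase pi

def mean_response(event_times):
    idx = (event_times * fs).astype(int)
    return gain[idx].mean()

resp_aligned = mean_response(event_times_aligned)
resp_anti = mean_response(event_times_anti)
print(resp_aligned, resp_anti)  # aligned events receive the larger gain
```

With the chosen modulation depth, aligned events receive a gain near 1.8 and anti-phase events a gain near 0.2, illustrating how the same input can be amplified or suppressed purely as a function of oscillatory phase.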
This phenomenon has been called phase entrainment and has been shown to improve speech intelligibility (Ahissar et al., 2001, Kerlin et al., 2010, Luo and Poeppel, 2007). However, the stimuli presented in most experiments contain pronounced fluctuations in (sound) amplitude and may simply evoke a passive "amplitude following" of brain oscillations (i.e., auditory steady-state responses, ASSR; Galambos et al., 1981). In other words, past reports of phase entrainment to speech might reflect an adjustment to fluctuations in low-level features and/or to co-varying high-level features of speech sound. Critically, in the former case, phase entrainment would only reflect the periodicity of the auditory stimulation and could not be seen as an active "tool" for efficient stimulus processing (VanRullen et al., 2014). On the other hand, were one able to observe phase adjustment to (hypothetical) speech-like stimuli that retain a regular speech structure but that do not evoke ASSR at a purely sensory level of auditory processing (such as the cochlea), this would provide important evidence for the proposed active mechanism of stimulus processing (Giraud and Poeppel, 2012, Schroeder et al., 2010). Recently, we reported the construction of such stimuli (Zoefel and VanRullen, 2015)—speech/noise snippets with conserved patterns of high-level features, but without concomitant changes in sound amplitude or spectral content. We showed that auditory behavioral performance entrains to those stimuli, as detection of a tone pip was modulated by the phase of the preserved high-level rhythm. However, it remained to be tested whether this behavioral modulation also entails neural phase entrainment.
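The defining property of the constructed stimuli (a preserved high-level rhythm without systematic amplitude fluctuations) can be checked by extracting the amplitude envelope, e.g., via the analytic signal. The sketch below is our own illustration with synthetic tones standing in for speech; the actual stimulus construction (adding phase-specific noise) is described in Zoefel and VanRullen (2015).

```python
import numpy as np

def hilbert_envelope(x):
    """Amplitude envelope from the analytic signal (numpy-only Hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.abs(np.fft.ifft(X * h))

fs = 8000
t = np.arange(0, 1, 1 / fs)

# Hypothetical "original speech" stand-in: a carrier with a strong 4 Hz
# amplitude rhythm, versus a "constructed" stimulus whose envelope is flat.
carrier = np.sin(2 * np.pi * 500 * t)
original = (1 + 0.9 * np.sin(2 * np.pi * 4 * t)) * carrier
constructed = carrier  # flat envelope stands in for the matched stimulus

def envelope_modulation(x):
    env = hilbert_envelope(x)
    return env.std() / env.mean()  # coefficient of variation of the envelope

print(envelope_modulation(original), envelope_modulation(constructed))
```

The original stand-in shows strong envelope modulation (coefficient of variation around 0.64 here), whereas the flat-envelope stimulus shows essentially none; an entrainment of neural phase to the latter cannot be explained by passive amplitude following.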
In addition, we focus on a highly relevant question recently raised by Peelle and Davis (2012), based on the previously reported correlation between phase entrainment and intelligibility (Ahissar et al., 2001, Kerlin et al., 2010, Luo and Poeppel, 2007): Does speech intelligibility enhance phase entrainment, or does phase entrainment enhance speech intelligibility? If the latter is true, they argue, phase entrainment should be driven by acoustic (e.g., voice gender, identity) rather than linguistic (e.g., semantic) information. So far, however, this question remains unresolved: Although behavioral phase entrainment does depend on linguistic cues (the observed phase adjustment for our speech/noise mixture stimuli did not occur for time-reversed stimuli; Zoefel and VanRullen, 2015), the same need not hold for the potentially underlying neural phase entrainment. Thus, we compared entrainment of EEG oscillations to original (unprocessed) speech snippets with entrainment to our constructed speech/noise mixture stimuli, and also to reversed speech/noise snippets (Fig. 1).
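Time reversal is a useful control precisely because it destroys linguistic content while leaving global low-level statistics intact: for a real signal, the magnitude spectrum is invariant under reversal. A minimal numpy demonstration (our illustration, with synthetic tones plus noise standing in for a speech snippet):

```python
import numpy as np

# Reversing a real signal in time preserves its magnitude spectrum
# (its long-term spectral content), while the temporal order of any
# higher-level structure is destroyed.
rng = np.random.default_rng(1)
fs = 16000
t = np.arange(0, 0.5, 1 / fs)

# Stand-in "snippet": two tones plus noise (no real speech needed here).
x = (np.sin(2 * np.pi * 300 * t)
     + 0.5 * np.sin(2 * np.pi * 1200 * t)
     + 0.1 * rng.normal(size=t.size))

x_rev = x[::-1]
spec = np.abs(np.fft.rfft(x))
spec_rev = np.abs(np.fft.rfft(x_rev))

print(np.allclose(spec, spec_rev))  # True: identical magnitude spectra
```

Any difference in entrainment between forward and reversed snippets therefore cannot be attributed to their long-term spectral content, which is identical by construction.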
Section snippets
Participants
Twelve participants volunteered after giving written informed consent (7 female; mean age: 27.6 years). All participants reported normal hearing and received compensation for their time. The experimental protocol was approved by the relevant ethical committee at Centre National de la Recherche Scientifique (CNRS).
Experimental stimuli
A detailed description of stimulus construction was given by Zoefel and VanRullen (2015). In short, phase-specific auditory noise was added to original snippets such that sound
Results
We presented 12 subjects with speech/noise stimuli without systematic fluctuations in low-level features (here defined as sound amplitude and spectral content; see Zoefel and VanRullen (2015) for a detailed discussion of this definition), but with intact high-level features of speech sound, fluctuating at ~2–8 Hz ("constructed condition"). Additionally, those speech/noise snippets were presented in reverse ("constructed reversed condition"), thus potentially disentangling high-level features
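Although the analysis details are truncated in this excerpt, a standard way to quantify neural phase entrainment in such a design is inter-trial phase coherence (ITC): if the EEG phase at a stimulus-related frequency is consistent across trials, the mean resultant length of the per-trial phases approaches 1, whereas random phases yield values near 0. The sketch below uses simulated trials and is illustrative only, not necessarily the exact analysis used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, fs, dur, f = 40, 250, 2.0, 4.0  # arbitrary simulation parameters
t = np.arange(0, dur, 1 / fs)

def itc_at(trials, f, t):
    # Per-trial phase at frequency f from the discrete Fourier coefficient,
    # then the mean resultant length across trials.
    coeffs = trials @ np.exp(-2j * np.pi * f * t)
    phases = np.angle(coeffs)
    return np.abs(np.mean(np.exp(1j * phases)))

# "Entrained" trials share a fixed 4 Hz phase; "scrambled" trials get a
# new random phase on every trial. Both carry the same amount of noise.
entrained = np.array([np.sin(2 * np.pi * f * t + 0.2)
                      + rng.normal(0, 1, t.size) for _ in range(n_trials)])
scrambled = np.array([np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
                      + rng.normal(0, 1, t.size) for _ in range(n_trials)])

print(itc_at(entrained, f, t), itc_at(scrambled, f, t))
```

In this simulation the phase-consistent trials yield ITC close to 1, while the phase-scrambled trials stay near the chance level of roughly 1/√n_trials.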
Discussion
Phase entrainment of neural oscillations as a potential tool for efficient stimulus processing has been described repeatedly (Calderone et al., 2014, Lakatos et al., 2005, Lakatos et al., 2013, Schroeder et al., 2010, Schroeder and Lakatos, 2009) and is paramount in current theories of speech comprehension (Doelling et al., 2014, Ghitza, 2011, Ghitza, 2012, Ghitza, 2013, Ghitza, 2014, Giraud and Poeppel, 2012, Zion Golumbic et al., 2013). However, the underlying mechanisms are far from clear (
Acknowledgements
The authors are grateful to Alain de Cheveigné and Daniel Pressnitzer for helpful comments and discussions. This study was supported by a Studienstiftung des deutschen Volkes (German National Academic Foundation) scholarship to BZ, and a EURYI Award as well as an ERC Consolidator grant P-CYCLES under grant agreement 614244 to RV.
References (64)
- Cortical oscillations and sensory predictions. Trends Cogn. Sci. (2012)
- Entrainment of neural oscillations as a modifiable substrate of attention. Trends Cogn. Sci. (2014)
- EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods (2004)
- Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage (2014)
- The spectrotemporal filter mechanism of auditory selective attention. Neuron (2013)
- Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron (2007)
- Neural response phase tracks how listeners learn new acoustic representations. Curr. Biol. (2013)
- Brain dynamics encode the spectrotemporal boundaries of auditory objects. Hear. Res. (2013)
- Dual mechanism of neuronal ensemble inhibition in primary auditory cortex. Neuron (2011)
- The analysis of speech in different temporal integration windows: cerebral lateralization as "asymmetric sampling in time." Speech Comm. (2003)
- Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci.
- Dynamics of active sensing and perceptual selection. Curr. Opin. Neurobiol.
- Locating the initial stages of speech-sound processing in human temporal cortex. NeuroImage
- Perceptual echoes at 10 Hz in the human brain. Curr. Biol.
- Mechanisms underlying selective neuronal tracking of attended speech at a "cocktail party." Neuron
- Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci.
- Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. U. S. A.
- Simultaneous recordings from the primary visual cortex and lateral geniculate nucleus reveal rhythmic interactions and a cortical source for γ-band oscillations. J. Neurosci.
- Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol.
- Tuning of the human neocortex to the temporal dynamics of attended events. J. Neurosci.
- Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex
- The psychophysics toolbox. Spat. Vis.
- Laminar differences in gamma and alpha coherence in the ventral stream. Proc. Natl. Acad. Sci. U. S. A.
- The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nat. Rev. Neurosci.
- Hierarchical processing in spoken language comprehension. J. Neurosci.
- Phoneme and word recognition in the auditory ventral stream. Proc. Natl. Acad. Sci. U. S. A.
- Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. NeuroImage
- Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. U. S. A.
- Cortical entrainment to continuous speech: functional roles and interpretations. Front. Hum. Neurosci.
- The contribution of frequency-specific activity to hierarchical information processing in the human auditory cortex. Nat. Commun.
- A 40-Hz auditory potential recorded from the human scalp. Proc. Natl. Acad. Sci. U. S. A.
- Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Front. Psychol.