EEG oscillations entrain their phase to high-level features of speech sound
Introduction
The auditory environment is essentially rhythmic (e.g., music, speech, animal calls), and relevant information (e.g., phonemes, sounds) alternates with irrelevant input (such as the silence in-between) in a regular fashion. The brain might have developed a clever tool that exploits these environmental rhythms for efficient stimulus processing (Calderone et al., 2014, Schroeder and Lakatos, 2009): Neural oscillations could align their high-excitability (i.e., amplifying) phase with regularly occurring important events, whereas their low-excitability (i.e., suppressive) phase would coincide with irrelevant events.
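This proposed mechanism can be caricatured in a few lines of code. The sketch below is our own illustration, not a model from the study; all parameters (4 Hz frequency, gain depth, event times) are arbitrary choices. It treats the phase of an ongoing oscillation as a gain applied to incoming events: events landing on the high-excitability phase are amplified, those landing half a cycle later are suppressed.

```python
import numpy as np

# Toy illustration (not the authors' model): a 4 Hz oscillation whose
# phase gates the gain applied to incoming events.
fs = 1000                          # sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)        # 2 s of time
freq = 4.0                         # theta-range oscillation frequency
gain = 1.0 + 0.8 * np.cos(2 * np.pi * freq * t)  # high at phase 0, low at pi

# A rhythmic stream of "relevant" events aligned to the excitable phase,
# and the same stream shifted by half a cycle (anti-phase).
event_times_aligned = np.arange(0.0, 2.0, 1 / freq)   # land at phase 0
event_times_anti = event_times_aligned + 0.5 / freq   # land at phase pi

def mean_response(event_times):
    idx = (event_times * fs).astype(int)
    return gain[idx].mean()

resp_aligned = mean_response(event_times_aligned)
resp_anti = mean_response(event_times_anti)
print(resp_aligned, resp_anti)  # aligned events receive the larger gain
```

With the chosen modulation depth, aligned events receive a gain near 1.8 and anti-phase events a gain near 0.2, illustrating how the same input can be amplified or suppressed purely as a function of oscillatory phase.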
This phenomenon has been called phase entrainment and has been shown to improve speech intelligibility (Ahissar et al., 2001, Kerlin et al., 2010, Luo and Poeppel, 2007). However, the stimuli presented in most experiments contain pronounced fluctuations in (sound) amplitude and may simply evoke a passive "amplitude following" of brain oscillations (i.e., auditory steady-state responses, ASSR; Galambos et al., 1981). In other words, past reports of phase entrainment to speech might reflect an adjustment to fluctuations in low-level features and/or to co-varying high-level features of speech sound. Critically, in the former case, phase entrainment would only reflect the periodicity of the auditory stimulation and could not be seen as an active "tool" for efficient stimulus processing (VanRullen et al., 2014). On the other hand, were one able to observe phase adjustment to (hypothetical) speech-like stimuli that retain a regular speech structure but that do not evoke ASSR at a purely sensory level of auditory processing (such as the cochlea), this would provide important evidence for the proposed active mechanism of stimulus processing (Giraud and Poeppel, 2012, Schroeder et al., 2010). Recently, we reported the construction of such stimuli (Zoefel and VanRullen, 2015)—speech/noise snippets with conserved patterns of high-level features, but without concomitant changes in sound amplitude or spectral content. We showed that auditory behavioral performance entrains to those stimuli, as detection of a tone pip was modulated by the phase of the preserved high-level rhythm. However, it remained to be tested whether this behavioral modulation also entails neural phase entrainment.
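The defining property of the constructed stimuli (a preserved high-level rhythm without systematic amplitude fluctuations) can be checked by extracting the amplitude envelope, e.g., via the analytic signal. The sketch below is our own illustration with synthetic tones standing in for speech; the actual stimulus construction (adding phase-specific noise) is described in Zoefel and VanRullen (2015).

```python
import numpy as np

def hilbert_envelope(x):
    """Amplitude envelope from the analytic signal (numpy-only Hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.abs(np.fft.ifft(X * h))

fs = 8000
t = np.arange(0, 1, 1 / fs)

# Hypothetical "original speech" stand-in: a carrier with a strong 4 Hz
# amplitude rhythm, versus a "constructed" stimulus whose envelope is flat.
carrier = np.sin(2 * np.pi * 500 * t)
original = (1 + 0.9 * np.sin(2 * np.pi * 4 * t)) * carrier
constructed = carrier  # flat envelope stands in for the matched stimulus

def envelope_modulation(x):
    env = hilbert_envelope(x)
    return env.std() / env.mean()  # coefficient of variation of the envelope

print(envelope_modulation(original), envelope_modulation(constructed))
```

The original stand-in shows strong envelope modulation (coefficient of variation around 0.64 here), whereas the flat-envelope stimulus shows essentially none; an entrainment of neural phase to the latter cannot be explained by passive amplitude following.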
In addition, we focus on a highly relevant question recently raised by Peelle and Davis (2012), based on the previously reported correlation between phase entrainment and intelligibility (Ahissar et al., 2001, Kerlin et al., 2010, Luo and Poeppel, 2007): Does speech intelligibility enhance phase entrainment, or does phase entrainment enhance speech intelligibility? If the latter is true, they argue, phase entrainment should be driven by acoustic (e.g., voice gender, identity) rather than linguistic (e.g., semantic) information. So far, however, this question remains unresolved: Although behavioral phase entrainment does depend on linguistic cues (the observed phase adjustment for our speech/noise mixture stimuli did not occur for time-reversed stimuli; Zoefel and VanRullen, 2015), the same need not hold for the potentially underlying neural phase entrainment. Thus, we compared entrainment of EEG oscillations to original (unprocessed) speech snippets with entrainment to our constructed speech/noise mixture stimuli, and also to reversed speech/noise snippets (Fig. 1).
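Time reversal is a useful control precisely because it destroys linguistic content while leaving global low-level statistics intact: for a real signal, the magnitude spectrum is invariant under reversal. A minimal numpy demonstration (our illustration, with synthetic tones plus noise standing in for a speech snippet):

```python
import numpy as np

# Reversing a real signal in time preserves its magnitude spectrum
# (its long-term spectral content), while the temporal order of any
# higher-level structure is destroyed.
rng = np.random.default_rng(1)
fs = 16000
t = np.arange(0, 0.5, 1 / fs)

# Stand-in "snippet": two tones plus noise (no real speech needed here).
x = (np.sin(2 * np.pi * 300 * t)
     + 0.5 * np.sin(2 * np.pi * 1200 * t)
     + 0.1 * rng.normal(size=t.size))

x_rev = x[::-1]
spec = np.abs(np.fft.rfft(x))
spec_rev = np.abs(np.fft.rfft(x_rev))

print(np.allclose(spec, spec_rev))  # True: identical magnitude spectra
```

Any difference in entrainment between forward and reversed snippets therefore cannot be attributed to their long-term spectral content, which is identical by construction.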
Section snippets
Participants
Twelve participants volunteered after giving written informed consent (7 female; mean age: 27.6 years). All participants reported normal hearing and received compensation for their time. The experimental protocol was approved by the relevant ethical committee at Centre National de la Recherche Scientifique (CNRS).
Experimental stimuli
A detailed description of stimulus construction was given by Zoefel and VanRullen (2015). In short, phase-specific auditory noise was added to original snippets such that sound
Results
We presented 12 subjects with speech/noise stimuli without systematic fluctuations in low-level features (here defined as sound amplitude and spectral content; see Zoefel and VanRullen (2015) for a detailed discussion of this definition), but with intact high-level features of speech sound, fluctuating at ~2–8 Hz ("constructed condition"). Additionally, those speech/noise snippets were presented in reverse ("constructed reversed condition"), thus potentially disentangling high-level features
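Although the analysis details are truncated in this excerpt, a standard way to quantify neural phase entrainment in such a design is inter-trial phase coherence (ITC): if the EEG phase at a stimulus-related frequency is consistent across trials, the mean resultant length of the per-trial phases approaches 1, whereas random phases yield values near 0. The sketch below uses simulated trials and is illustrative only, not necessarily the exact analysis used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, fs, dur, f = 40, 250, 2.0, 4.0  # arbitrary simulation parameters
t = np.arange(0, dur, 1 / fs)

def itc_at(trials, f, t):
    # Per-trial phase at frequency f from the discrete Fourier coefficient,
    # then the mean resultant length across trials.
    coeffs = trials @ np.exp(-2j * np.pi * f * t)
    phases = np.angle(coeffs)
    return np.abs(np.mean(np.exp(1j * phases)))

# "Entrained" trials share a fixed 4 Hz phase; "scrambled" trials get a
# new random phase on every trial. Both carry the same amount of noise.
entrained = np.array([np.sin(2 * np.pi * f * t + 0.2)
                      + rng.normal(0, 1, t.size) for _ in range(n_trials)])
scrambled = np.array([np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
                      + rng.normal(0, 1, t.size) for _ in range(n_trials)])

print(itc_at(entrained, f, t), itc_at(scrambled, f, t))
```

In this simulation the phase-consistent trials yield ITC close to 1, while the phase-scrambled trials stay near the chance level of roughly 1/√n_trials.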
Discussion
Phase entrainment of neural oscillations as a potential tool for efficient stimulus processing has been described repeatedly (Calderone et al., 2014, Lakatos et al., 2005, Lakatos et al., 2013, Schroeder et al., 2010, Schroeder and Lakatos, 2009) and is paramount in current theories of speech comprehension (Doelling et al., 2014, Ghitza, 2011, Ghitza, 2012, Ghitza, 2013, Ghitza, 2014, Giraud and Poeppel, 2012, Zion Golumbic et al., 2013). However, the underlying mechanisms are far from clear (
Acknowledgements
The authors are grateful to Alain de Cheveigné and Daniel Pressnitzer for helpful comments and discussions. This study was supported by a Studienstiftung des deutschen Volkes (German National Academic Foundation) scholarship to BZ, and a EURYI Award as well as an ERC Consolidator grant P-CYCLES under grant agreement 614244 to RV.
References (64)
- Cortical oscillations and sensory predictions. Trends Cogn. Sci. (2012)
- Entrainment of neural oscillations as a modifiable substrate of attention. Trends Cogn. Sci. (2014)
- EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods (2004)
- Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage (2014)
- The spectrotemporal filter mechanism of auditory selective attention. Neuron (2013)
- Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron (2007)
- Neural response phase tracks how listeners learn new acoustic representations. Curr. Biol. (2013)
- Brain dynamics encode the spectrotemporal boundaries of auditory objects. Hear. Res. (2013)
- Dual mechanism of neuronal ensemble inhibition in primary auditory cortex. Neuron (2011)
- The analysis of speech in different temporal integration windows: cerebral lateralization as "asymmetric sampling in time." Speech Comm. (2003)
- Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci.
- Dynamics of active sensing and perceptual selection. Curr. Opin. Neurobiol.
- Locating the initial stages of speech-sound processing in human temporal cortex. NeuroImage
- Perceptual echoes at 10 Hz in the human brain. Curr. Biol.
- Mechanisms underlying selective neuronal tracking of attended speech at a "cocktail party." Neuron
- Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci.
- Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. U. S. A.
- Simultaneous recordings from the primary visual cortex and lateral geniculate nucleus reveal rhythmic interactions and a cortical source for γ-band oscillations. J. Neurosci.
- Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol.
- Tuning of the human neocortex to the temporal dynamics of attended events. J. Neurosci.
- Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex
- The psychophysics toolbox. Spat. Vis.
- Laminar differences in gamma and alpha coherence in the ventral stream. Proc. Natl. Acad. Sci. U. S. A.
- The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nat. Rev. Neurosci.
- Hierarchical processing in spoken language comprehension. J. Neurosci.
- Phoneme and word recognition in the auditory ventral stream. Proc. Natl. Acad. Sci. U. S. A.
- Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. NeuroImage
- Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. U. S. A.
- Cortical entrainment to continuous speech: functional roles and interpretations. Front. Hum. Neurosci.
- The contribution of frequency-specific activity to hierarchical information processing in the human auditory cortex. Nat. Commun.
- A 40-Hz auditory potential recorded from the human scalp. Proc. Natl. Acad. Sci. U. S. A.
- Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Front. Psychol.