Noise-robust cortical tracking of attended speech in real-world acoustic scenes

Søren Asp Fuglsang; Torsten Dau; Jens Hjortkjær

doi:10.1016/j.neuroimage.2017.04.026

Neuroimage. 2017 Aug 1:156:435-444. doi: 10.1016/j.neuroimage.2017.04.026. Epub 2017 Apr 13.

Authors

Søren Asp Fuglsang¹, Torsten Dau², Jens Hjortkjær³

Affiliations

¹ Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark. Electronic address: soerenf@elektro.dtu.dk.
² Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark.
³ Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark; Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Kettegaard Allé 30, 2650 Hvidovre, Denmark. Electronic address: jhjort@elektro.dtu.dk.

PMID: 28412441
DOI: 10.1016/j.neuroimage.2017.04.026

Abstract

Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and varying numbers of interfering talkers, listeners selectively attended to the speech stream of a particular talker. Across the different listening environments, we found that the attended talker could be accurately decoded from single-trial EEG data irrespective of the different distortions in the acoustic input. For highly reverberant environments, speech envelopes reconstructed from neural responses to the distorted stimuli resembled the original clean signal more than the distorted input. With reverberant speech, we observed a late cortical response to the attended speech stream that encoded temporal modulations in the speech signal without its reverberant distortion. Single-trial attention decoding accuracies based on 40-50 s long blocks of data from 64 scalp electrodes were equally high (80-90% correct) in all considered listening environments and remained statistically significant with as few as 10 scalp electrodes and short (<30 s) unaveraged EEG segments. In contrast to the robust decoding of the attended talker, we found that decoding of the unattended talker deteriorated with the acoustic distortions. These results suggest that cortical activity tracks an attended speech signal in a way that is invariant to acoustic distortions encountered in real-life sound environments. Noise-robust attention decoding additionally suggests a potential utility of stimulus reconstruction techniques in attention-controlled brain-computer interfaces.
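
The stimulus reconstruction approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the regression method, sampling rate, or lag window, so everything below is an assumption: a ridge-regularized linear backward model (a common choice for this kind of decoding), a 64 Hz sampling rate, 16 EEG lags, and synthetic surrogate signals in place of real EEG and speech envelopes. All function and variable names are hypothetical. The decoder maps time-lagged EEG to an envelope estimate, and the talker whose envelope correlates more strongly with the reconstruction is labeled as attended.

```python
import numpy as np

FS = 64       # assumed sampling rate in Hz (not taken from the abstract)
N_LAGS = 16   # assumed number of EEG lags, i.e. 0 to ~234 ms at 64 Hz
RIDGE = 1.0   # assumed regularization strength

def lagged(eeg, n_lags):
    """Row t holds the EEG samples at t, t+1, ..., t+n_lags-1 for every channel."""
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * n_lags))
    for k in range(n_lags):
        X[: n_samples - k, k * n_channels:(k + 1) * n_channels] = eeg[k:]
    return X

def train_decoder(eeg, envelope, n_lags=N_LAGS, ridge=RIDGE):
    """Fit ridge-regression weights that map lagged EEG to the speech envelope."""
    X = lagged(eeg, n_lags)
    XtX = X.T @ X
    return np.linalg.solve(XtX + ridge * np.eye(XtX.shape[0]), X.T @ envelope)

def decode_attention(eeg, env_a, env_b, weights, n_lags=N_LAGS):
    """Reconstruct the envelope and pick the talker it correlates with most."""
    recon = lagged(eeg, n_lags) @ weights
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return ("A" if r_a > r_b else "B"), r_a, r_b

def surrogate_envelope(rng, n_samples):
    """Smoothed noise standing in for a speech envelope (synthetic, not real speech)."""
    env = np.convolve(rng.standard_normal(n_samples), np.ones(8) / 8, mode="same")
    return (env - env.mean()) / env.std()

def simulate_trial(rng, gains, n_samples, delay=8):
    """Toy EEG: delayed attended envelope, weaker unattended envelope, sensor noise."""
    env_att = surrogate_envelope(rng, n_samples)
    env_unatt = surrogate_envelope(rng, n_samples)
    shift = lambda x: np.concatenate([np.zeros(delay), x[:-delay]])
    mix = shift(env_att) + 0.3 * shift(env_unatt)
    eeg = np.outer(mix, gains) + rng.standard_normal((n_samples, gains.size))
    return eeg, env_att, env_unatt

rng = np.random.default_rng(0)
gains = rng.standard_normal(16)  # hypothetical per-channel sensitivities

# Train on one long synthetic block, then decode a separate 45 s block.
train_eeg, train_att, _ = simulate_trial(rng, gains, 120 * FS)
weights = train_decoder(train_eeg, train_att)

test_eeg, env_a, env_b = simulate_trial(rng, gains, 45 * FS)  # talker A is attended
label, r_att, r_unatt = decode_attention(test_eeg, env_a, env_b, weights)
print(label, round(r_att, 2), round(r_unatt, 2))
```

In this toy setup the decoder is trained and tested on separate synthetic blocks, which mirrors the usual practice of evaluating on held-out trials. Real data would additionally need preprocessing (filtering, envelope extraction) and cross-validated selection of the regularization parameter, neither of which is shown here.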

Keywords: Acoustic simulations; Auditory attention; Cortical entrainment; Decoding; Delta rhythms; EEG; Speech; Theta rhythms.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Acoustic Stimulation
Adult
Attention / physiology*
Auditory Cortex / physiology*
Electroencephalography
Female
Humans
Male
Noise
Speech Perception / physiology*
Young Adult