Research Article: New Research, Cognition and Behavior

Neural Speech Tracking during Selective Attention: A Spatially Realistic Audiovisual Study

Paz Har-shai Yahav, Eshed Rabinovitch, Adi Korisky, Renana Vaknin Harel, Martin Bleichner and Elana Zion Golumbic
eNeuro 2 June 2025, 12 (6) ENEURO.0132-24.2025; https://doi.org/10.1523/ENEURO.0132-24.2025
1 The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel (Paz Har-shai Yahav, Eshed Rabinovitch, Adi Korisky, Renana Vaknin Harel, Elana Zion Golumbic)
2 Department of Psychology, Carl von Ossietzky Universität Oldenburg, Germany (Martin Bleichner)

Figures

Figure 1.

Experimental setup. A, Two lectures were presented simultaneously: one lecture (target talker) was displayed on the screen with its audio emitted through the front loudspeaker, while the other lecture (non-target talker) was played audio-only through the left loudspeaker. Participants were instructed to focus their attention on the lecture presented on the screen. Critically, in the middle of the experiment the stimuli switched roles: the lecture that had been played from the side loudspeaker (non-target) in the first half was presented as a video in the second half and became the target talker, whereas the target talker from the first half was played from the side loudspeaker and became non-target. Participants answered comprehension questions about the target lecture after every three trials. B, Single-trial illustration. Target speech began at trial onset; non-target speech began 3 s after onset and included a 2 s volume ramp-up.

Figure 2.

Data-driven permutation tests for individual-level statistics. Three permutation tests were designed to assess the statistical significance of different results in individual participants. In all panels, the black rectangles show the original data organization on the left and the relabeling for the permutation test on the right. A, S-R permutation test. In each permutation, the pairing between acoustic envelopes (S) and neural responses (R) was shuffled across trials, such that the speech envelopes presented in one trial (both target and non-target speech) were paired with the neural response (R) from a different trial. This yields a null distribution of reconstruction accuracies that could be obtained by chance, to which the real data can be compared (right). B, Attention-agnostic permutation test. In each permutation, the target and non-target speech stimuli were randomly relabeled to create attention-agnostic regressors containing 50% target speech and 50% non-target speech. The reconstruction accuracy for each regressor was estimated, and the difference between them was used to create a null distribution to which the neural-bias index can be compared (right). C, Order-agnostic permutation test. In each permutation, trials were randomly relabeled and separated into two order-agnostic groups consisting of 50% trials from the first half of the experiment and 50% trials from the second half. The reconstruction accuracy for each group of trials was estimated, and the difference between them was used to create a null distribution to which the real data from each half of the experiment can be compared (right).
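For illustration, the logic of the S-R permutation test in panel A can be sketched in a few lines of Python. This is a minimal sketch on simulated data: the trial count, signal length, and permutation count are hypothetical placeholders, not the study's actual pipeline.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sr_permutation_test(envelopes, reconstructions, n_perm=1000, seed=0):
    """S-R permutation test: shuffle which stimulus envelope (S) is paired
    with which reconstructed envelope (R) across trials, building a null
    distribution of mean reconstruction accuracies obtainable by chance."""
    rng = np.random.default_rng(seed)
    n_trials = len(envelopes)
    # Real accuracy: mean correlation over correctly paired trials.
    real = np.mean([pearson_r(envelopes[i], reconstructions[i])
                    for i in range(n_trials)])
    null = np.empty(n_perm)
    for p in range(n_perm):
        perm = rng.permutation(n_trials)  # mispair S and R across trials
        null[p] = np.mean([pearson_r(envelopes[i], reconstructions[perm[i]])
                           for i in range(n_trials)])
    # One-tailed p-value: fraction of null accuracies >= the real accuracy.
    p_val = (np.sum(null >= real) + 1) / (n_perm + 1)
    return real, null, p_val
```

The participant-specific chance levels shown in Figures 6-8 correspond to a percentile of such a null distribution rather than a fixed threshold.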

Figure 3.

    Behavioral results. Averaged accuracy rates across trials and participants, for multiple-choice comprehension questions, separately for the first (green) and second (yellow) half of the experiment. Error bars denote SEM across participants.

Figure 4.

Neural bias: group-level results. A, TRF encoding models across all experimental trials, plotted from electrode "Fz," separately for target and non-target speech. Shaded highlights denote SEM across participants (top). Topographic distribution of the TRF main peaks, plotted separately for target and non-target speech (bottom). B, Topographic distribution of predictive power values (Pearson's r) of the encoding model, averaged across participants, separately for multivariate (top) and univariate (bottom) analysis. C, TRF encoding models for the first half (green) and second half (yellow) of the experiment, plotted from electrode "Fz," separately for target and non-target speech. Shaded highlights denote SEM across participants. D, Speech reconstruction accuracies for the first and second half of the experiment, for both target and non-target speech. Error bars denote SEM across participants. Extended Data Figure 4-1 supports Figure 4.
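The TRF encoding models in panels A and C are linear filters relating the speech envelope to the EEG at a range of time lags. As a rough illustration of how such a forward model can be estimated, here is a dependency-free ridge-regression sketch on simulated data; it is not the toolbox pipeline used in the study, and the sizes and regularization value are arbitrary.

```python
import numpy as np

def lagged_design(env, n_lags):
    """Time-lagged design matrix: column k holds the stimulus envelope
    delayed by k samples (zero-padded at the start)."""
    T = len(env)
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = env[:T - k]
    return X

def fit_trf(env, eeg_ch, n_lags=40, lam=1.0):
    """Ridge estimate of a forward TRF for one EEG channel,
    modeling eeg_ch ~ lagged_design(env) @ trf."""
    X = lagged_design(env, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg_ch)
```

With a known simulated filter (e.g., a single peak at one lag), the estimate recovers the peak's latency, which is the quantity summarized in the topographies of panel A.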

Figure 5.

Generalizability across talkers and time. Reconstruction accuracies for decoders trained on data from one half of the experiment (on either target or non-target speech) and tested on data from the other half, separately for same-role decoders (e.g., train on target, test on target) and same-talker-identity decoders (e.g., train on the male talker, test on the male talker).

Figure 6.

Speech reconstruction and neural bias in individual participants—full experiment. A, Left, Bar graphs depicting reconstruction accuracy in individual participants for target (black) and non-target (dark gray) speech. Horizontal light gray lines represent the p = 0.05 chance level, derived for each participant from the data-driven S-R permutation test. Asterisks indicate participants who also showed significant neural bias to target speech (see panel B). Right, Scatterplot showing reconstruction accuracies for target and non-target speech across all participants. The red line represents the linear regression fit between the two variables, which was significant (Pearson's r = 0.43, p = 0.038). B, Scatterplot showing the average reconstruction accuracy and neural-bias index across participants, which were not significantly correlated. Vertical dashed lines indicate the threshold for significant neural bias (z = 1.64, one-tailed; p < 0.05). C, Scatterplots showing accuracy on the behavioral task versus reconstruction accuracy of target speech (left), non-target speech (middle), and the neural-bias index (right), across all participants. No significant correlations were found. Extended Data Figure 6-1 supports Figure 6.

Figure 7.

Speech reconstruction and neural bias in individual participants—first half of experiment. A, Left, Bar graphs depicting reconstruction accuracy in individual participants for target (black) and non-target (dark gray) speech. Horizontal light gray lines represent the p = 0.05 chance level, derived for each participant from the data-driven S-R permutation test. Right, Scatterplot showing reconstruction accuracies for target and non-target speech across all participants. B, Scatterplot showing the average reconstruction accuracy and neural-bias index across participants, which were not significantly correlated. Vertical dashed lines indicate the threshold for significant neural bias (z = 1.64, one-tailed; p < 0.05). C, Scatterplots showing accuracy on the behavioral task versus reconstruction accuracy of target speech (left), non-target speech (middle), and the neural-bias index (right), across all participants. No significant correlations were found.

Figure 8.

Speech reconstruction and neural bias in individual participants—second half of experiment. A, Left, Bar graphs depicting reconstruction accuracy in individual participants for target (black) and non-target (dark gray) speech. Horizontal light gray lines represent the p = 0.05 chance level, derived for each participant from the data-driven S-R permutation test. Right, Scatterplot showing reconstruction accuracies for target and non-target speech across all participants. The red line represents the linear regression fit between the two variables, which was significant (Pearson's r = 0.4, p = 0.048). B, Scatterplot showing the average reconstruction accuracy and neural-bias index across participants, which were not significantly correlated. Vertical dashed lines indicate the threshold for significant neural bias (z = 1.64, one-tailed; p < 0.05). C, Scatterplots showing accuracy on the behavioral task versus reconstruction accuracy of target speech (left), non-target speech (middle), and the neural-bias index (right), across all participants. No significant correlations were found.

Figure 9.

First versus second half of experiment. Scatterplots showing the neural-bias index (left), target speech reconstruction accuracy (middle), and non-target speech reconstruction accuracy (right) across all participants, in the first versus second half of the experiment. No significant correlations were found for any of the measures. Participants for whom significant differences were found between the two halves (based on an order-agnostic permutation test) are marked in red.

Extended Data

Figure 4-1

Group-level spectral analysis. Comparison of the power spectrum of the EEG between the first and second half of the experiment. We calculated the spectral power density using a multitaper fast Fourier transform (FFT; as implemented in FieldTrip), separately for the data from each half of the experiment. The figure shows the power spectrum averaged over nine centroparietal electrodes (marked in the subplot on the top left). A clear peak is seen in the low alpha range (7-9 Hz) in both halves of the experiment, which was maximal at centroparietal electrodes (shown in the topographies). However, a paired t-test revealed no significant difference in alpha power between the first and second half of the experiment [t(22) = 1.25, p = 0.22], although such a difference might have been expected as an index of fatigue or reduced attention over time (e.g., Yu et al., 2021).
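The comparison above can be approximated without FieldTrip. The sketch below computes band power from a plain single-taper periodogram and a paired t statistic by hand; it is a simplified stand-in for the multitaper pipeline, and all signal parameters are made up.

```python
import numpy as np

def band_power(x, fs, fmin=7.0, fmax=9.0):
    """Mean power in [fmin, fmax] Hz from a simple periodogram.
    (The study used a multitaper FFT; a plain periodogram keeps
    this sketch dependency-free.)"""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(psd[band].mean())

def paired_t(a, b):
    """Paired t statistic for two matched samples
    (e.g., per-participant alpha power in half 1 vs. half 2)."""
    d = np.asarray(a) - np.asarray(b)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))
```

Applied per participant to the two halves, the resulting t statistic is the quantity reported as t(22) above (degrees of freedom = number of participants minus one).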

Figure 6-1

Comparison of decoder-testing approaches. Here we compare two approaches for testing the performance of decoders trained on EEG data to reconstruct the envelope of concurrently presented speech. Top: The approach used and reported in the current study, in which two stimulus-specific decoders were trained using a multivariate approach to reconstruct the envelopes of target and non-target speech presented concurrently. The scatterplot shows the reconstruction accuracies achieved by both decoders across all participants, when tested on left-out data of the same type (i.e., how well the target decoder can reconstruct left-out target speech, and how well the non-target decoder can reconstruct left-out non-target speech). The gray line reflects the diagonal, and the red line represents the linear regression fit between the two variables, which was statistically significant [data are the same as in Figure 6A]. Bottom: Re-analysis of the same data using the auditory attention-decoding (AAD) approach, in which a decoder is trained on only one stimulus (e.g., target speech) and is then tested on left-out data of the same stimulus (target) and of the other stimulus (non-target), and the two results are compared for classification purposes. The left panel shows a scatterplot of how well a decoder trained on target speech can reconstruct left-out target speech vs. how well it can reconstruct left-out non-target speech, across all participants. The right panel shows the same for a decoder trained on non-target speech. The gray line reflects the diagonal, and the red line represents the linear regression fit between the two variables (a dashed line indicates a marginally significant regression). In this analysis, almost all dots fall either below or above the diagonal, clearly showing better reconstruction performance when a decoder is tested on data of the same type that it was trained on.
This is in line with multiple studies that propose using this approach for practical applications, such as controlling a neuro-steered hearing device (Henshaw and Ferguson, 2013; O'Sullivan et al., 2015; Kidd, 2017; Geirnaert et al., 2022; Roebben et al., 2024). Given the qualitative sensory differences between target and non-target speech in the spatially realistic audiovisual setup used (e.g., spatial location, audio vs. audiovisual presentation, etc.), it is not very surprising that decoders trained on these stimuli would differ from each other. However, we posit that this approach is less appropriate in the current study, where the goal was not just to distinguish between the two stimuli, but to test whether target speech is represented more robustly in the neural data than non-target speech, a pattern that is considered a signature of 'selective attention', i.e., enhancement of target speech and/or suppression of non-target speech (Kerlin et al., 2010; Ding et al., 2012; Zion Golumbic et al., 2013b; O'Sullivan et al., 2015; Fiedler et al., 2019). For this purpose, we believe that it is more appropriate to optimize decoders for each stimulus separately (thus accounting for differences in their properties, e.g., differences in spatial location or speaker characteristics), and then assess how well each one performs in predicting the stimulus it was trained on (the model's goodness-of-fit/predictive power/accuracy). Using this approach, if both decoders perform very well, this indicates that both stimuli are represented with similar precision in the neural response. Conversely, finding that the decoder for one stimulus outperforms the other can be interpreted as superior or more detailed neural encoding of that stimulus relative to the other, effects that have been associated with better intelligibility and/or higher levels of attention to that stimulus (Best et al., 2008; Lin and Carlile, 2015; Getzmann et al., 2017; Teoh and Lalor, 2019; Uhrig et al., 2022; Orf et al., 2023).
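To make the contrast between the two testing schemes concrete, here is a minimal simulation of the AAD-style test from the bottom panels. This is a sketch with synthetic signals: the ridge decoder, channel count, and noise level are illustrative assumptions, not the study's decoders.

```python
import numpy as np

def train_decoder(eeg, env, lam=1e-2):
    """Ridge-regression decoder mapping multichannel EEG (time x channels)
    to a speech envelope (time,). Returns one weight per channel."""
    return np.linalg.solve(eeg.T @ eeg + lam * np.eye(eeg.shape[1]),
                           eeg.T @ env)

def aad_classify(decoder, eeg, env_a, env_b):
    """AAD-style test: reconstruct an envelope from held-out EEG and
    report which of two candidate envelopes it matches better."""
    rec = eeg @ decoder
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return ("a" if r_a > r_b else "b"), r_a, r_b
```

In the stimulus-specific scheme used in the study, one such decoder would instead be trained per stimulus, and each decoder's accuracy on its own stimulus would be compared, which is what the neural-bias index quantifies.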


Keywords

  • EEG
  • selective attention
  • spatial
  • speech processing
  • TRF


Copyright © 2026 by the Society for Neuroscience.
eNeuro eISSN: 2373-2822
