Elsevier

NeuroImage

Volume 128, March 2016, Pages 293-301

Linguistic category structure influences early auditory processing: Converging evidence from mismatch responses and cortical oscillations

https://doi.org/10.1016/j.neuroimage.2016.01.003

Highlights

  • We are interested in speech sound specific top–down influences on speech recognition.

  • We measured transient and oscillatory brain activity in a magnetoencephalogram experiment.

  • Less specific speech sounds caused reduced change detection responses.

  • Less specific speech sounds displayed decreases in pre-stimulus beta-oscillations.

  • Effects are interpreted as reduced top–down influences from less specific speech sounds.

Abstract

While previous research has established that language-specific knowledge influences early auditory processing, it remains controversial which aspects of speech sound representations determine early speech perception. Here, we propose that early processing primarily depends on information propagated top–down from abstractly represented speech sound categories. In particular, we assume that mid-vowels (as in ‘bet’) exert weaker top–down effects than high-vowels (as in ‘bit’) because their tongue height position is less specific (default) than that of either high- or low-vowels (as in ‘bat’). We tested this assumption in a magnetoencephalography (MEG) study in which we contrasted mid- and high-vowels, as well as low- and high-vowels, in a passive oddball paradigm. Overall, significant differences between deviants and standards indexed reliable mismatch negativity (MMN) responses between 200 and 300 ms post-stimulus onset. MMN amplitudes differed in the mid/high-vowel contrasts and were significantly reduced when a mid-vowel standard was followed by a high-vowel deviant, extending previous findings. Furthermore, mid-vowel standards showed reduced oscillatory power in the pre-stimulus beta-frequency band (18–26 Hz), compared to high-vowel standards. We take this as converging evidence that linguistic category structure exerts top–down influences on auditory processing. The findings are interpreted within the linguistic model of underspecification and the neuropsychological predictive coding framework.

Introduction

Neuroimaging methods have been increasingly used to probe the mechanisms that underlie speech sound processing. Recently, a number of studies have demonstrated that linguistic category structure has specific modulatory effects on early stages of auditory perception (Bien and Zwitserlood, 2013, Cornell et al., 2011, Cornell et al., 2013, Eulitz and Lahiri, 2004, Friedrich et al., 2008). Linguistic category structure allows speech sounds to be classified according to their acoustic and articulatory properties, often described in terms of a deviation from the neutral, resting position of the mouth. For example, high-vowels (e.g., [ɪ] as in ‘bit’) with a relatively high tongue position during production can be distinguished from low-vowels (e.g., [æ] as in ‘bat’) with a relatively low tongue position during production. Some theories assume that vowels that fall between high- and low-vowels (e.g., [ε] as in ‘bet’) are neither high nor low, and, being produced with a neutral tongue position, have no descriptive feature for tongue height (Lahiri and Reetz, 2002, Lahiri and Reetz, 2010, Scharinger and Idsardi, 2014). The production of mid-vowels in English does not necessarily lead to a larger spread of individual vowel tokens, but rather to greater overlap with neighboring vowel category tokens (Hillenbrand et al., 1995). These mid-vowels are assumed to be underspecified and may refer to a rather unspecific motor plan regarding their tongue height.

Recently, it has been proposed that less specific, underspecified vowels have less intrinsic “predictive” value compared to more specific, specified vowels (Eulitz and Lahiri, 2004, Scharinger et al., 2012a, Scharinger et al., 2012b). Scharinger et al. (2012b) demonstrated that the unspecific category structure of the American English vowel [ε] influenced processing, as indexed by the mismatch negativity, an automatic change and prediction error response of the brain (Näätänen and Alho, 1997, Schröger, 2005, Winkler, 2007). In a passive oddball design, the authors contrasted the high- and low-vowels [ɪ] and [æ] in standard position with the low- and high-vowels [æ] and [ɪ] in deviant position. This condition showed a relatively large acoustic distance of the first resonance frequencies (first formant, F1) between the vowels and was compared to a condition in which the acoustic F1 distance was relatively small, i.e., in which the standard was either specific ([æ]) or unspecific ([ε]), contrasting with the deviants [ε] and [æ]. The results showed similar symmetric mismatch responses in the large F1 distance condition, while the small F1 distance condition showed asymmetric MMN differences: the condition with unspecific [ε]-standards yielded significantly reduced MMN amplitudes compared to the condition with specific [æ]-standards. This result is consistent with other electrophysiological studies (Cornell et al., 2011, Cornell et al., 2013, Eulitz and Lahiri, 2004). Within the framework of predictive coding (Friston, 2005, Garrido et al., 2009), this pattern was interpreted as evidence for [ε] being inherently less predictive, such that the prediction error upon encountering the deviant [æ] was reduced.
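The logic of a passive oddball design and of the mismatch response as a deviant-minus-standard difference can be illustrated with a minimal Python sketch. It is not the analysis pipeline of the studies discussed above: the function names, the deviant probability, and the minimum number of standards between deviants are hypothetical choices made only for illustration, and the epochs are assumed to be a plain array of single-channel trials.

```python
import numpy as np


def make_oddball_sequence(n_trials, deviant_prob=0.125, min_standards=2, seed=0):
    """Return 'standard'/'deviant' labels for a passive oddball block.

    deviant_prob and min_standards are illustrative assumptions, not
    values taken from the study.
    """
    rng = np.random.default_rng(seed)
    labels = []
    since_deviant = 0  # standards presented since the last deviant
    for _ in range(n_trials):
        if since_deviant >= min_standards and rng.random() < deviant_prob:
            labels.append("deviant")
            since_deviant = 0
        else:
            labels.append("standard")
            since_deviant += 1
    return labels


def mismatch_wave(epochs, labels):
    """Deviant-minus-standard difference of the trial-averaged responses.

    epochs: array of shape (n_trials, n_times); labels: one label per trial.
    """
    labels = np.asarray(labels)
    standard_avg = epochs[labels == "standard"].mean(axis=0)
    deviant_avg = epochs[labels == "deviant"].mean(axis=0)
    return deviant_avg - standard_avg
```

In this sketch, a reduced mismatch response for one standard/deviant pairing would appear as a smaller-amplitude difference wave than for the reverse pairing.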

While the study by Scharinger et al. (2012b) suggests that linguistic category structure may indeed influence early auditory processing, generalization to further vowel contrasts was impossible (e.g., between [ε] and [ɪ]). Moreover, there was no measure with a closer relation to the assumed top–down propagation of category information (strong for [ɪ] and [æ], weak for [ε]). In this regard, recent research suggests that cortical oscillations index directional message passing between different levels of the cortical hierarchy (Arnal and Giraud, 2012, Arnal et al., 2011, Engel and Fries, 2010, Fontolan et al., 2014). In particular, cortical oscillations within the beta-band (15–30 Hz) are assumed to reflect endogenous top–down processes that are interpreted within the predictive coding framework (Wang, 2010). In this framework, beta-power scales with prediction strength propagated downward from representational units to lower processing levels. This mechanism should also operate on speech sound category representations, such that differences in linguistic structure lead to differences in cortical beta-power, which should arise prior to stimulus presentation in an MMN paradigm.
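As a rough illustration of what "pre-stimulus beta-power" refers to, the following Python sketch estimates per-trial power in a frequency band from the samples preceding stimulus onset. It uses a simple Hann-windowed FFT periodogram, which is an assumption made for brevity and not necessarily the time-frequency method used in the study. The function names are hypothetical, and the default band (15–30 Hz) follows the range cited above; the abstract reports the effect in a narrower 18–26 Hz band, which can be passed via `fmin` and `fmax`.

```python
import numpy as np


def band_power(signal, sfreq, fmin, fmax):
    """Mean periodogram power of a 1-D signal within [fmin, fmax] Hz."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean and taper with a Hann window to limit spectral leakage.
    windowed = (signal - signal.mean()) * np.hanning(signal.size)
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    in_band = (freqs >= fmin) & (freqs <= fmax)
    return spectrum[in_band].mean()


def prestimulus_beta(epochs, times, sfreq, fmin=15.0, fmax=30.0):
    """Per-trial band power in the window before stimulus onset (times < 0).

    epochs: array of shape (n_trials, n_times); times: seconds relative to onset.
    """
    pre = epochs[:, times < 0]
    return np.array([band_power(trial, sfreq, fmin, fmax) for trial in pre])
```

Comparing the output of `prestimulus_beta` between trials preceding different standards would be one simple way to express the prediction that beta-power differs with linguistic category structure.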

Thus, the current magnetoencephalography (MEG) study has two primary goals: (1) to examine cortical oscillations as a means to further elucidate the mechanisms by which linguistic category structure exerts influence on lower-level auditory processing, and (2) to extend the MMN findings from Scharinger et al. (2012b) to the contrast between the vowels [ε] and [ɪ]. We expect (1) beta-power to differ between [ε] and [ɪ] presented as standards, where predictions build up (Winkler et al., 1996a) and should most strongly be influenced by linguistic category structure, and (2) the MMN to be reduced or absent if deviant [ɪ] follows the standard [ε].

Section snippets

Participants

Thirteen students, all native speakers of American English, were recruited from the University of Maryland (9 females, 4 males, mean age 21 ± 1.3 years). They had no reported history of hearing or neurological problems and participated for class credit or monetary compensation ($10 per hour). All participants provided informed written consent and tested strongly right-handed (> 80%) on the Edinburgh Handedness Inventory (Oldfield, 1971). The study was approved by the Institutional Review Board of

MMNs

Grand averages of standard and deviant responses for all four conditions are illustrated in Fig. 3.

The LMM on the MMN response using the difference of the RMS method showed a main effect of distance (F(1,84) = 4.25, p < 0.05), with larger MMN responses in the large F1 distance condition than in the small F1 distance condition (z = 2.93, p < 0.01). Crucially, distance interacted with direction (F(1,84) = 4.40, p < 0.05). This interaction effect recapitulates the distinction between predictive and non-predictive

Discussion

The main result of our neuromagnetic study on the processing of vowels with differing linguistic structure is that the mid-vowel [ε] (as in ‘bet’) consistently resulted in neural patterns that were distinct from the more specific high-vowels [ɪ] and [æ]: oscillatory power in the beta-band was reduced even before the onset of [ε] in standard position, compared to [ɪ]. Moreover, MMN amplitudes were significantly reduced when standard [ε] preceded deviant [ɪ], compared to the reverse case, i.e.

Conclusions

In this neuromagnetic study on the processing of American English vowels, we found that less specific category structure (as exemplified by the mid-vowel [ε]) resulted in reduced MMN responses and reduced beta-power. Our results are compatible with the predictive coding framework (Friston, 2005) and the underspecification approach (Lahiri and Reetz, 2002, Lahiri and Reetz, 2010), while pure bottom–up sensory models would not be able to readily account for the observed patterns in our data.

Acknowledgments

We thank Ariane Rhone for helping prepare the stimuli and Max Ehrmann for laboratory assistance. The research for this study was funded by NIH grant 7R01DC005660-07 to W.J.I. and David Poeppel. During preparation of the manuscript, MS was supported by a personal grant from the German Science Foundation (DFG) on “Global and local aspects of temporal and lexical predictions for speech processing” (University of Leipzig).

References (76)

  • A.K. Engel et al.

    Beta-band oscillations—signalling the status quo?

    Curr. Opin. Neurobiol.

    (2010)
  • M.I. Garrido et al.

    The mismatch negativity: a review of underlying mechanisms

    Clin. Neurophysiol.

    (2009)
  • A. Lahiri et al.

    Distinctive features: phonological underspecification in representation and processing

    J. Phon.

    (2010)
  • A.G. Lewis et al.

    Fast oscillatory dynamics during language comprehension: unification versus maintenance and prediction?

    Brain Lang.

    (2015)
  • H. Luo et al.

    Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex

    Neuron

    (2007)
  • A. Mazaheri et al.

    Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities

    NeuroImage

    (2014)
  • R. Näätänen et al.

    The mismatch negativity (MMN) in basic research of central auditory processing: a review

    Clin. Neurophysiol.

    (2007)
  • R.C. Oldfield

    The assessment and analysis of handedness: the Edinburgh Inventory

    Neuropsychologia

    (1971)
  • M.S. Peltola et al.

    Native and foreign vowel discrimination as indexed by the mismatch negativity (MMN) response

    Neurosci. Lett.

    (2003)
  • J.S. Perkell

    Properties of the tongue help to define vowel categories: hypotheses based on physiologically-oriented modeling

    J. Phon.

    (1996)
  • F. Pulvermüller et al.

    Language outside the focus of attention: the mismatch negativity as a tool for studying higher cognitive processes

    Prog. Neurobiol.

    (2006)
  • F. Pulvermüller et al.

    Tracking speech comprehension in space and time

    NeuroImage

    (2006)
  • M. Scharinger

    Minimal representations of alternating vowels

    Lingua

    (2009)
  • M. Scharinger et al.

    Sparseness of vowel category structure: evidence from English dialect comparison

    Lingua

    (2014)
  • A. Strauß et al.

    Alpha and theta brain oscillations index dissociable processes in spoken word recognition

    NeuroImage

    (2014)
  • I. Winkler et al.

    Adaptive modeling of the unattended acoustic environment reflected in the mismatch negativity event-related potential

    Brain Res.

    (1996)
  • I. Winkler et al.

    Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations

    Cogn. Brain Res.

    (1999)
  • A.A. Alexandrov et al.

    Strength of word-specific neural memory traces assessed electrophysiologically

    PLoS One

    (2011)
  • L.H. Arnal et al.

    Transitions in neural oscillations reflect prediction errors generated in audiovisual speech

    Nat. Neurosci.

    (2011)
  • D. Bates et al.

    lme4: linear mixed-effects models using eigen and S4. R package version

    (2014)
  • O. Bertrand et al.

    Stimulus frequency dependence of the transient oscillatory auditory evoked response (40 Hz) studied by electric and magnetic recordings in humans

  • G.M. Bidelman

    Induced neural beta oscillations predict categorical speech perception abilities

    Brain Lang.

    (2014)
  • H. Bien et al.

    Processing nasals with and without consecutive context phonemes: evidence from explicit categorization and the N100

    Front. Psychol.

    (2013)
  • P. Boersma et al.

    PRAAT: Doing Phonetics by Computer (ver. 5.2.24)

    (2011)
  • F. Bretz et al.

    Multiple Comparisons Using R

    (2011)
  • T.J. Buschman et al.

    Top–down versus bottom–up control of attention in the prefrontal and posterior parietal cortices

    Science

    (2007)
  • G. Buzsáki

    Rhythms of the Brain

    (2006)
  • G. Buzsáki et al.

    Neuronal oscillations in cortical networks

    Science

    (2004)