Linguistic category structure influences early auditory processing: Converging evidence from mismatch responses and cortical oscillations
Introduction
Neuroimaging methods have been increasingly used to probe the mechanisms that underlie speech sound processing. Recently, a number of studies have demonstrated that linguistic category structure has specific modulatory effects on early stages of auditory perception (Bien and Zwitserlood, 2013, Cornell et al., 2011, Cornell et al., 2013, Eulitz and Lahiri, 2004, Friedrich et al., 2008). Linguistic category structure allows speech sounds to be classified according to their acoustic and articulatory properties, often described in terms of a deviation from the neutral, resting position of the mouth. For example, high-vowels (e.g., [ɪ] as in ‘bit’) with a relatively high tongue position during production can be distinguished from low-vowels (e.g., [æ] as in ‘bat’) with a relatively low tongue position during production. Some theories assume that vowels that fall between high- and low-vowels (e.g., [ε] as in ‘bet’) are neither high nor low, and, being produced with a neutral tongue position, have no descriptive feature for tongue height (Lahiri and Reetz, 2002, Lahiri and Reetz, 2010, Scharinger and Idsardi, 2014). The production of mid-vowels in English does not necessarily lead to a larger spread of individual vowel tokens, but rather to greater overlap with neighboring vowel category tokens (Hillenbrand et al., 1995). These vowels are assumed to be underspecified and may refer to a rather unspecific motor plan regarding their tongue height.
Recently, it has been proposed that less specific, underspecified vowels have less intrinsic “predictive” value compared to more specific, specified vowels (Eulitz and Lahiri, 2004, Scharinger et al., 2012a, Scharinger et al., 2012b). Scharinger et al. (2012b) demonstrated that the unspecific category structure of the American English vowel [ε] influenced processing, as indexed by the mismatch negativity (MMN), an automatic change-detection and prediction-error response of the brain (Näätänen and Alho, 1997, Schröger, 2005, Winkler, 2007). In a passive oddball design, the authors contrasted the high- and low-vowels [ɪ] and [æ] in standard position with the low- and high-vowels [æ] and [ɪ] in deviant position. This condition showed a relatively large acoustic distance of the first resonance frequencies (first formant, F1) between the vowels and was compared to a condition in which the acoustic F1 distance was relatively small, i.e., in which the standard was either specific ([æ]) or unspecific ([ε]), contrasting with the deviants [ε] and [æ]. The results showed similar, symmetric mismatch responses in the large F1 distance condition, while the small F1 distance condition showed asymmetric MMN differences: the condition with unspecific [ε]-standards yielded significantly reduced MMN amplitudes compared to the condition with specific [æ]-standards. This result is consistent with other electrophysiological studies (Cornell et al., 2011, Cornell et al., 2013, Eulitz and Lahiri, 2004). Within the framework of predictive coding (Friston, 2005, Garrido et al., 2009), this pattern was interpreted as evidence for [ε] being inherently less predictive, such that the prediction error upon encountering the deviant [æ] was reduced.
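To make the structure of a passive oddball block concrete, the following minimal Python sketch generates a stimulus sequence in which a frequent standard is occasionally replaced by a rare deviant. It is an illustration only: the function name, the deviant probability, the minimum number of standards between deviants, the number of trials, and the ASCII vowel labels are all assumptions and are not taken from Scharinger et al. (2012b).

```python
import random


def oddball_sequence(standard, deviant, n_trials,
                     deviant_prob=0.125, min_gap=2, seed=0):
    """Return a list of stimulus labels for one oddball block.

    All parameter values are hypothetical. A deviant can only occur once at
    least `min_gap` standards have been presented since the previous deviant,
    so the realized deviant rate is somewhat below `deviant_prob`.
    """
    rng = random.Random(seed)
    sequence = []
    since_deviant = 0  # standards presented since the last deviant
    for _ in range(n_trials):
        if since_deviant >= min_gap and rng.random() < deviant_prob:
            sequence.append(deviant)
            since_deviant = 0
        else:
            sequence.append(standard)
            since_deviant += 1
    return sequence


# The two directions of the small F1 distance contrast described above,
# using ASCII stand-ins for the vowel symbols ("EH" for the mid-vowel,
# "AE" for the low-vowel).
unspecific_standard_block = oddball_sequence("EH", "AE", n_trials=400)
specific_standard_block = oddball_sequence("AE", "EH", n_trials=400, seed=1)
```

Reversing the roles of standard and deviant across blocks, as in the last two lines, is what allows a directional asymmetry in the mismatch response to be assessed.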
While the study by Scharinger et al. (2012b) suggests that linguistic category structure may indeed influence early auditory processing, generalization to further vowel contrasts was impossible (e.g., between [ε] and [ɪ]). Moreover, there was no measure with a closer relation to the assumed top–down propagation of category information (strong for [ɪ] and [æ], weak for [ε]). In this regard, recent research suggests that cortical oscillations index directional message passing between different levels of the cortical hierarchy (Arnal and Giraud, 2012, Arnal et al., 2011, Engel and Fries, 2010, Fontolan et al., 2014). In particular, cortical oscillations within the beta-band (15–30 Hz) are assumed to reflect endogenous top–down processes that are interpreted within the predictive coding framework (Wang, 2010). In this framework, beta-power scales with prediction strength propagated downward from representational units to lower processing levels. This mechanism should also operate on speech sound category representations, such that differences in linguistic structure lead to differences in cortical beta-power, which should arise prior to stimulus presentation in an MMN paradigm.
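As a rough illustration of what "beta-power prior to stimulus presentation" means computationally, the sketch below estimates mean spectral power in the 15–30 Hz band of a short pre-stimulus window with a plain discrete Fourier transform. It is not the analysis pipeline of the present study: the sampling rate (500 Hz), the window length (200 ms), the simulated signals, and both function names are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def band_power(signal, sfreq, fmin, fmax):
    """Mean spectral power of `signal` for DFT bins between fmin and fmax (Hz)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * sfreq / n  # frequency of DFT bin k
        if fmin <= freq <= fmax:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centered))
            powers.append(abs(coeff) ** 2 / n ** 2)
    if not powers:
        raise ValueError("no DFT bins fall inside the requested band")
    return sum(powers) / len(powers)


def simulated_prestimulus(beta_amplitude, sfreq=500.0, duration=0.2):
    """Toy pre-stimulus signal: a 20 Hz (beta) plus a 10 Hz (alpha) sinusoid."""
    n = int(round(sfreq * duration))
    return [beta_amplitude * math.sin(2 * math.pi * 20 * t / sfreq)
            + 0.5 * math.sin(2 * math.pi * 10 * t / sfreq)
            for t in range(n)]


# Hypothetical case: stronger predictions before a specific standard are
# modeled as a larger 20 Hz amplitude than before an unspecific standard.
beta_specific = band_power(simulated_prestimulus(2.0), 500.0, 15.0, 30.0)
beta_unspecific = band_power(simulated_prestimulus(1.0), 500.0, 15.0, 30.0)
```

In this toy case the 10 Hz component lies outside the band and does not contribute, so the difference between the two values reflects only the simulated beta amplitude. Real MEG data would normally call for a tapered or wavelet-based time-frequency estimate rather than a raw DFT of a single window.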
Thus, the current magnetoencephalography (MEG) study has two primary goals: (1) to examine cortical oscillations as a means to further elucidate the mechanisms by which linguistic category structure exerts influence on lower-level auditory processing, and (2) to extend the MMN findings from Scharinger et al. (2012b) to the contrast between the vowels [ε] and [ɪ]. We expect (1) beta-power to differ between [ε] and [ɪ] presented as standards, where predictions build up (Winkler et al., 1996a) and should most strongly be influenced by linguistic category structure, and (2) the MMN to be reduced or absent if deviant [ɪ] follows the standard [ε].
Participants
Thirteen students, all native speakers of American English, were recruited from the University of Maryland (9 females, 4 males, mean age 21 ± 1.3 years). They had no reported history of hearing or neurological problems and participated for class credit or monetary compensation ($10 per hour). All participants provided informed written consent and tested strongly right-handed (> 80%) on the Edinburgh Handedness Inventory (Oldfield, 1971). The study was approved by the Institutional Review Board of
MMNs
Grand averages of standard and deviant responses for all four conditions are illustrated in Fig. 3.
The LMM on the MMN response using the difference of the RMS method showed a main effect of distance (F(1,84) = 4.25, p < 0.05), with larger MMN responses in the large F1 distance condition than in the small F1 distance condition (z = 2.93, p < 0.01). Crucially, distance interacted with direction (F(1,84) = 4.40, p < 0.05). This interaction effect recapitulates the distinction between predictive and non-predictive
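One plausible reading of a "difference of the RMS" measure can be sketched as follows: compute the root mean square (RMS) across sensors separately for the averaged standard and deviant responses, then average the deviant-minus-standard difference over a time window. The data layout (a list of channels, each a list of samples), the window indices, the toy values, and the function names are assumptions for illustration and are not taken from the study's analysis.

```python
import math


def rms_across_channels(evoked):
    """RMS over channels at each time point.

    `evoked` is a list of channels, each a list of samples of equal length.
    """
    n_channels = len(evoked)
    n_samples = len(evoked[0])
    return [math.sqrt(sum(ch[t] ** 2 for ch in evoked) / n_channels)
            for t in range(n_samples)]


def mmn_rms_difference(standard, deviant, start, stop):
    """Mean (deviant RMS - standard RMS) over samples start..stop-1."""
    rms_std = rms_across_channels(standard)
    rms_dev = rms_across_channels(deviant)
    diffs = [d - s for d, s in zip(rms_dev[start:stop], rms_std[start:stop])]
    return sum(diffs) / len(diffs)


# Toy example with two channels and four time points (arbitrary units).
standard = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
deviant = [[1.0, 3.0, 3.0, 1.0], [-1.0, -3.0, -3.0, -1.0]]
mmn = mmn_rms_difference(standard, deviant, start=1, stop=3)
```

Because the RMS discards the sign of the field at each sensor, a larger value here indicates a stronger deviant response regardless of field polarity. One such value per participant and condition could then enter a mixed model with distance and direction as factors.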
Discussion
The main result of our neuromagnetic study on the processing of vowels with differing linguistic structure is that the mid-vowel [ε] (as in ‘bet’) consistently resulted in neural patterns that were distinct from those of the more specific vowels [ɪ] and [æ]: oscillatory power in the beta-band was reduced even before the onset of [ε] in standard position, compared to [ɪ]. Moreover, MMN amplitudes were significantly reduced when standard [ε] preceded deviant [ɪ], compared to the reverse case, i.e.
Conclusions
In this neuromagnetic study on the processing of American English vowels, we found that less specific category structure (as exemplified by the mid-vowel [ε]) resulted in reduced MMN responses and reduced beta-power. Our results are compatible with the predictive coding framework (Friston, 2005) and the underspecification approach (Lahiri and Reetz, 2002, Lahiri and Reetz, 2010), whereas purely bottom–up sensory models cannot readily account for the observed patterns in our data.
Acknowledgments
We thank Ariane Rhone for helping prepare the stimuli and Max Ehrmann for laboratory assistance. The research for this study was funded by the NIH grant 7R01DC005660-07 to W.J.I. and David Poeppel. During preparation of the manuscript, MS was supported by a personal grant from the German Science Foundation (DFG) on “Global and local aspects of temporal and lexical predictions for speech processing” (University of Leipzig).
References (76)
- et al. Cortical oscillations and sensory predictions. Trends Cogn. Sci. (2012)
- et al. Do sensorimotor beta-oscillations maintain muscle synergy representations in primary motor cortex? Trends Neurosci. (2015)
- Repetition effects to sounds: evidence for predictive coding in the auditory system. Trends Cogn. Sci. (2006)
- et al. Early electrophysiological indicators for predictive processing in audition: a review. Int. J. Psychophysiol. (2012)
- et al. Auditory cortical tuning to statistical regularities in phonology. Clin. Neurophysiol. (2005)
- et al. “What you encode is not necessarily what you store”: evidence for sparse feature representations from mismatch negativity. Brain Res. (2011)
- et al. Denoising based on time-shift PCA. J. Neurosci. Methods (2007)
- et al. Sensor noise suppression. J. Neurosci. Methods (2008)
- et al. The neurotopography of vowels as mirrored by evoked magnetic field measurements. Brain Lang. (1996)
- et al. Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage (2014)
- Beta-band oscillations—signalling the status quo? Curr. Opin. Neurobiol.
- The mismatch negativity: a review of underlying mechanisms. Clin. Neurophysiol.
- Distinctive features: phonological underspecification in representation and processing. J. Phon.
- Fast oscillatory dynamics during language comprehension: unification versus maintenance and prediction? Brain Lang.
- Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron
- Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage
- The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin. Neurophysiol.
- The assessment and analysis of handedness: the Edinburgh Inventory. Neuropsychologia
- Native and foreign vowel discrimination as indexed by the mismatch negativity (MMN) response. Neurosci. Lett.
- Properties of the tongue help to define vowel categories: hypotheses based on physiologically-oriented modeling. J. Phon.
- Language outside the focus of attention: the mismatch negativity as a tool for studying higher cognitive processes. Prog. Neurobiol.
- Tracking speech comprehension in space and time. NeuroImage
- Minimal representations of alternating vowels. Lingua
- Sparseness of vowel category structure: evidence from English dialect comparison. Lingua
- Alpha and theta brain oscillations index dissociable processes in spoken word recognition. NeuroImage
- Adaptive modeling of the unattended acoustic environment reflected in the mismatch negativity event-related potential. Brain Res.
- Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations. Cogn. Brain Res.
- Strength of word-specific neural memory traces assessed electrophysiologically. PLoS One
- Transitions in neural oscillations reflect prediction errors generated in audiovisual speech. Nat. Neurosci.
- lme4: linear mixed-effects models using eigen and S4. R package version
- Stimulus frequency dependence of the transient oscillatory auditory evoked response (40 Hz) studied by electric and magnetic recordings in humans
- Induced neural beta oscillations predict categorical speech perception abilities. Brain Lang.
- Processing nasals with and without consecutive context phonemes: evidence from explicit categorization and the N100. Front. Psychol.
- PRAAT: Doing Phonetics by Computer (ver. 5.2.24)
- Multiple Comparisons Using R
- Top–down versus bottom–up control of attention in the prefrontal and posterior parietal cortices. Science
- Rhythms of the Brain
- Neuronal oscillations in cortical networks. Science