Locating the initial stages of speech–sound processing in human temporal cortex
Introduction
Processing in the ascending auditory pathway of the primate is largely sequential and hierarchical up to and including auditory cortex with its core, belt and parabelt regions (Rauschecker, 1998, Kaas and Hackett, 2000). Speech is recoded by the auditory nuclei of the brainstem and thalamus before it reaches auditory cortex (Irvine, 1992, Eggermont, 2001, Frisina, 2001), and both the anatomical and physiological literature suggest that the recoding involves processing of acoustic features relevant to speech. However, these nuclei seem to apply general processes to all sounds independent of their source, and there is no indication that speech-specific processing begins before auditory cortex.
Neuroimaging evidence indicates that, in humans, information is processed in a hierarchical manner in auditory cortex, and that this hierarchical organization extends into the multimodal regions beyond auditory cortex (Hall et al., 2001, Hall et al., 2003, Wessinger et al., 2001, Patterson et al., 2002, Scott and Johnsrude, 2003). It is clear that speech sounds have to pass through primary auditory cortex (PAC), which is active in the presence of speech sounds (e.g., Belin et al., 2002, Ahissar et al., 2001). But the corresponding area in nonhuman mammals (A1) is also active in the presence of speech sounds (Steinschneider et al., 2003, Versnel and Shamma, 1998), so this activity appears to represent general auditory processing of acoustic features rather than speech-specific processes. Regions well beyond PAC, in the superior temporal sulcus and middle temporal gyrus, are sensitive to the intelligibility of sentences, particularly in the left hemisphere (Davis and Johnsrude, 2003, Narain et al., 2003, Scott et al., 2000, Giraud et al., 2004). However, intervening between PAC and these middle temporal regions, there are at least three anatomically differentiable regions that appear to be predominantly auditory, and the connectivity of these regions suggests at least three stages of processing (the human homologues of belt and parabelt cortex, and the upper STS region TAa; Kaas and Hackett, 2000, Seltzer and Pandya, 1991). We hypothesize that the earliest stages to show some specialization for the processing of speech will be within these three regions.
A large number of imaging studies have investigated temporal-lobe involvement in speech-specific processing (e.g., Binder et al., 1997, Binder et al., 2000, Binder et al., 2004, Benson et al., 2001, Callan et al., 2004, Crinion et al., 2003, Davis and Johnsrude, 2003, Dehaene-Lambertz et al., 2005, Demonet et al., 1992, Gandour et al., 2003, Giraud and Price, 2001, Giraud et al., 2004, Hugdahl et al., 2003, Jancke et al., 2002, Joanisse and Gati, 2003, Liebenthal et al., 2003, Liebenthal et al., 2005, Mummery et al., 1999, Narain et al., 2003, Poeppel et al., 2004, Rimol et al., 2005, Schlosser et al., 1998, Scott et al., 2000, Specht and Reul, 2003, Thierry et al., 2003, Vouloumanos et al., 2001, Zatorre et al., 1992; see also Belin et al., 2000, Price et al., 2005). However, many of these studies have investigated speech at the level of the word, phrase or sentence, and such stimuli would probably have engaged lexical, semantic and syntactic processes in addition to speech-sound processing (e.g., Crinion et al., 2003, Davis and Johnsrude, 2003, Giraud et al., 2004, Narain et al., 2003, Scott et al., 2000, Schlosser et al., 1998, Zatorre et al., 1992). In many studies, the acoustic characteristics of the speech stimuli differed substantially from those of the nonspeech stimuli (e.g., Benson et al., 2001, Binder et al., 2000, Binder et al., 2004, Demonet et al., 1992, Giraud and Price, 2001, Vouloumanos et al., 2001). Many studies included a task that required or encouraged listeners to make an explicit linguistic judgement about the sounds they were hearing (Callan et al., 2004, Jancke et al., 2002, Liebenthal et al., 2003, Binder et al., 2004, Dehaene-Lambertz et al., 2005).
But none of these studies has compared the activity elicited by elementary speech sounds (such as vowels) with that elicited by acoustically matched nonspeech sounds, while listeners were engaged in a task that encouraged attention to the stimuli but neither required nor encouraged linguistic processing of the sounds.
In this paper, we identify the cortical regions where the processing of speech and nonspeech sounds begins to diverge by comparing a new class of synthetic vowels with a set of nonspeech controls that are closely matched in terms of the distribution of energy across frequency and over time. The synthetic vowels, with distinctive properties that identify them as linguistically relevant sounds produced by a human vocal tract, are immediately heard as speech sounds, while the nonspeech controls cannot be heard as speech even with deliberate effort. Furthermore, we deliberately chose a task that would not preferentially engage speech processing: listeners performed a simple intensity-monitoring task.
Stimulus development
In creating synthetic vowels, we adhered to the following constraints: (1) Their spectra exhibit three to four relatively narrow formants in the frequency region below 4000 Hz. (2) Stimuli are presented in sequences, where the individual elements within each sequence occur at a rate of about three per second. (3) The spectra of successive vowels within a sequence differ, but they all appear to come from the same vocal tract (e.g., they are different vowels spoken by the same person). All three
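The three constraints above can be illustrated with a minimal source–filter synthesis sketch: a glottal pulse train filtered through a cascade of second-order formant resonators, with the resulting vowels concatenated at roughly three per second. All parameter values below (sample rate, pitch, formant frequencies and bandwidths) are illustrative assumptions, not the values actually used in the study.

```python
import numpy as np

FS = 16_000  # sample rate in Hz (an assumption, not taken from the paper)

def resonator(x, freq, bw, fs=FS):
    """Apply one formant: a two-pole resonator at centre `freq` with bandwidth `bw`."""
    r = np.exp(-np.pi * bw / fs)           # pole radius set by the bandwidth
    theta = 2 * np.pi * freq / fs          # pole angle set by the centre frequency
    b0 = (1 - r) * np.sqrt(1 - 2 * r * np.cos(2 * theta) + r * r)  # rough gain normalisation
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = b0 * x[n]
        if n >= 1:
            y[n] += 2 * r * np.cos(theta) * y[n - 1]
        if n >= 2:
            y[n] -= r * r * y[n - 2]
    return y

def synth_vowel(formants, f0=120.0, dur=0.3, fs=FS):
    """Impulse train at the voice pitch, filtered through a cascade of formant resonators."""
    n = int(dur * fs)
    src = np.zeros(n)
    src[::int(fs / f0)] = 1.0              # glottal pulses at f0
    for f, bw in formants:                 # constraint (1): narrow formants below 4000 Hz
        src = resonator(src, f, bw, fs)
    return src / np.max(np.abs(src))       # peak-normalise

# Constraint (3): different vowels, same nominal vocal tract (same f0 and synthesis scheme).
# Formant values are textbook-style placeholders, not the study's stimuli.
vowels = [
    [(730, 90), (1090, 110), (2440, 170)],   # /a/-like
    [(270, 60), (2290, 100), (3010, 170)],   # /i/-like
    [(300, 60), (870, 100), (2240, 170)],    # /u/-like
]
gap = np.zeros(int(0.033 * FS))              # short silence between elements
# Constraint (2): ~0.333 s per element, i.e. about three elements per second.
sequence = np.concatenate([np.concatenate([synth_vowel(f), gap]) for f in vowels])
```

The cascade-resonator form is one standard way to impose a formant structure on a periodic source; the paper's own stimuli were generated differently, so this is only a sketch of the constraints, not a reconstruction of the method.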
Activation in response to sound
When each of the five sound conditions was contrasted with silence, similar patterns of activation were observed (Fig. 4). The activation was largely confined to the temporal lobes, bilaterally, and there were always two main foci of activation in auditory cortex: one towards the medial end of Heschl's gyrus (HG) and one towards the lateral end of HG. These two foci were highly consistent in their locations across conditions.
Main effect of speechlikeness
No significant regions of activation were observed when any of the three
Discussion
In this imaging study, volunteers heard natural vowels produced by a human speaker and four classes of synthetic sounds that were closely matched to each other in terms of the distribution of energy over frequency and time. The two conditions with speech-like carrier frequencies (formants) were perceived as vowels. The two conditions perceived as nonspeech were produced from these synthesized vowels by simple manipulations in the time and frequency domains. When the speech conditions were
Summary
This paper has identified regions of the brain that appear to be differentially involved in the processing of those stimulus characteristics that make sequences of vowels unique in the natural environment. With the aid of carefully controlled sets of stimuli, we have identified candidate areas that appear to be specifically involved in processing those properties of the acoustic signal that indicate whether or not the sound will be perceived as speech. The most active of these centers is in the
Acknowledgments
This work was supported by grants from the Medical Research Council (G9900369, G9901257, G0500221; authors RP and SU). The functional MRI was carried out at the Wolfson Brain Imaging Centre in Cambridge; we thank the radiographers for their assistance with data acquisition and Chris Wood for his help with data preprocessing.
References (89)
- Human temporal-lobe response to vocal sounds. Cogn. Brain Res. (2002)
- Parametrically dissociating speech and nonspeech perception in the brain using fMRI. Brain Lang. (2001)
- Spatial normalization of brain images with focal lesions using cost function masking. NeuroImage (2001)
- Phonetic perceptual identification by native- and second-language speakers differentially activates brain regions involved with acoustic phonetic processing and those involved with articulatory–auditory/orosensory internal models. NeuroImage (2004)
- Auditory grouping
- Neural correlates of switching from auditory to speech perception. NeuroImage (2005)
- Between sound and perception: reviewing the search for a neural code. Hear. Res. (2001)
- Subcortical neural coding mechanisms for auditory temporal processing. Hear. Res. (2001)
- Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage (2002)
- Sustained magnetic fields reveal separate sites for sound level and temporal regularity in human auditory cortex. NeuroImage (2002)
- The effects of attention on speech perception: an fMRI study. Brain Lang.
- Phonetic perception and the temporal cortex. NeuroImage
- Overlapping neural regions for processing rapid temporal cues in speech and nonspeech signals. NeuroImage
- Auditory processing in primate cerebral cortex. Curr. Opin. Neurobiol.
- Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage
- A high-output, high-quality sound system for use in auditory fMRI. NeuroImage
- The processing of temporal pitch and melody information in auditory cortex. Neuron
- Auditory lexical decision, categorical perception, and FM direction discrimination differentially engage left and right auditory cortex. Neuropsychologia
- Speech-specific auditory processing: where is it? Trends Cogn. Sci.
- Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage
- Cortical processing of complex sounds. Curr. Opin. Neurobiol.
- Cytochrome oxidase, acetylcholinesterase, and NADPH-diaphorase staining in human supratemporal and insular cortex: evidence for multiple auditory areas. NeuroImage
- Processing of sub-syllabic speech units in the posterior temporal lobe: an fMRI study. NeuroImage
- The neuroanatomical and functional organization of speech perception. Trends Neurosci.
- Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res.
- Functional segregation of the temporal lobes into highly differentiated subsystems for auditory perception: an auditory rapid event-related fMRI-task. NeuroImage
- Hemispheric dissociation in access to the human semantic system. Neuron
- Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia
- Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. U. S. A.
- Voice-selective areas in human auditory cortex. Nature
- Human brain language areas identified by functional magnetic resonance imaging. J. Neurosci.
- Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex
- Neural correlates of sensory and decision processes in auditory object identification. Nat. Neurosci.
- The problem of functional localization in the human brain. Nat. Rev. Neurosci.
- Functional connections between auditory cortex on Heschl's gyrus and on the lateral superior temporal gyrus in humans. J. Neurophysiol.
- Temporal lobe regions engaged during normal speech comprehension. Brain
- The Method of Paired Comparisons
- Hierarchical processing in spoken language comprehension. J. Neurosci.
- The anatomy of phonological and semantic processing in normal subjects. Brain
- Improved auditory cortex imaging using clustered volume acquisitions. Hum. Brain Mapp.
- Neural correlates of segmental and tonal information in speech perception. Hum. Brain Mapp.
- The constraints functional neuroimaging places on classical models of auditory word processing. J. Cogn. Neurosci.
- Contributions of sensory input, auditory search and verbal comprehension to cortical activity during speech processing. Cereb. Cortex
- Encoding of the temporal regularity of sound in the human brainstem. Nat. Neurosci.
1. The first two authors contributed equally to this work.