NeuroImage

Volume 39, Issue 3, 1 February 2008, Pages 1429-1443

Neural mechanisms underlying auditory feedback control of speech

https://doi.org/10.1016/j.neuroimage.2007.09.054

Abstract

The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation to the shift within approximately 136 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech.

Introduction

While many motor acts are aimed at achieving goals in three-dimensional space (e.g., reaching, grasping, throwing, walking, and handwriting), the primary goal of speech is an acoustic signal that transmits a linguistic message via the listener’s auditory system. For spatial tasks, visual feedback of task performance plays an important role in monitoring performance and improving skill level (Redding and Wallace, 2006, Huang and Shadmehr, 2007). Analogously, auditory information plays an important role in monitoring vocal output and achieving verbal fluency (Lane and Tranel, 1971, Cowie and Douglas-Cowie, 1983). Auditory feedback is crucial for online correction of speech production (Lane and Tranel, 1971, Xu et al., 2004, Purcell and Munhall, 2006b) and for the development and maintenance of stored motor plans (Cowie and Douglas-Cowie, 1983, Purcell and Munhall, 2006a, Villacorta et al., 2007).

The control of movement is often characterized as involving one or both of two broad classes of control: feedback control and feedforward control. Under feedback control, task performance is monitored during execution and deviations from the desired performance are corrected on the basis of incoming sensory information. Under feedforward control, movements are executed from previously learned commands, without reliance on incoming task-related sensory information. Speech production involves both feedforward and feedback control, and auditory feedback has been shown to influence both control processes (Houde and Jordan, 1998, Jones and Munhall, 2005, Bauer et al., 2006, Purcell and Munhall, 2006a).
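To make the distinction concrete, the combined command can be written as a feedforward term plus an error-driven feedback term. The notation below is illustrative only; it is not taken from the paper or from the DIVA model equations.

```latex
% Illustrative control law (assumed notation, not from the paper):
%   u(t)      combined articulator command
%   u_ff(t)   stored feedforward command for the current speech sound
%   G_fb      feedback gain (weight given to auditory feedback control)
%   e(t)      auditory error: intended acoustic target minus perceived feedback
u(t) = u_{\mathrm{ff}}(t) + G_{\mathrm{fb}}\, e(t), \qquad
e(t) = y_{\mathrm{target}}(t) - y_{\mathrm{feedback}}(t)
```

Pure feedforward control corresponds to G_fb = 0; the perturbation experiments discussed below probe the feedback term by making e(t) nonzero in a way the speaker cannot predict.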

Early evidence of the influence of auditory feedback on speech came from studies showing that speakers modify the intensity of their speech in noisy environments (Lombard, 1911). Artificial disruption of normal auditory feedback in the form of temporally delayed feedback induces disfluent speech (Yates, 1963, Stuart et al., 2002). Recent studies have used transient, unexpected auditory feedback perturbations to demonstrate auditory feedback control of speech. Despite being unable to anticipate the perturbation, speakers respond to pitch (Larson et al., 2000, Donath et al., 2002, Jones and Munhall, 2002, Natke et al., 2003, Xu et al., 2004) and formant shifts (Houde and Jordan, 2002, Purcell and Munhall, 2006b) by altering their vocal output in the direction opposite the shift. These compensatory responses act to steer vocal output closer to the intended auditory target.

The ease with which fluent speakers coordinate the rapid movements of multiple articulators, allowing production of as many as 4–7 syllables per second (Tsao and Weismer, 1997), suggests that speech is also guided by a feedforward controller (Neilson and Neilson, 1987). Our ability to speak effectively when noise completely masks auditory feedback (Lane and Tranel, 1971, Pittman and Wiley, 2001) and the preserved intelligibility of post-lingually deafened individuals (Cowie and Douglas-Cowie, 1983, Lane and Webster, 1991) are further evidence of feedforward control mechanisms. Evidence that stored feedforward motor commands are tuned over time by auditory feedback comes from studies of sensorimotor adaptation (Houde and Jordan, 2002, Jones and Munhall, 2002, Jones and Munhall, 2005, Purcell and Munhall, 2006a). Speakers presented with auditory feedback containing a persistent shift of the formant frequencies of their own speech (formants constitute important cues for speech perception) adapt to the perturbation by changing the formants of their speech in the direction opposite the shift. Following adaptation, utterances made immediately after removal or masking of the perturbation typically contain formants that differ from baseline in the direction opposite the induced perturbation (e.g., Purcell and Munhall, 2006a). These “overshoots” indicate a reorganization of the sensory–motor neural mappings that underlie feedforward control of speech (e.g., Purcell and Munhall, 2006a). The same studies also show that the feedforward speech controller continuously monitors auditory feedback and is modified when that feedback does not meet expectations.

The DIVA model of speech production (Guenther et al., 1998, Guenther et al., 2006) is a quantitatively defined neuroanatomical model that provides a parsimonious account of how auditory feedback is used both for feedback control and for tuning feedforward commands. According to the model, feedforward and feedback commands are combined in primary motor cortex to produce the overall muscle commands for the speech articulators. Both control processes are initiated by activating cells in a speech sound map (SSM) located in left ventral premotor areas, including Broca’s area in the opercular portion of the inferior frontal gyrus. Activation of these cells leads to the readout of excitatory feedforward commands through projections to primary motor cortex. Additional projections from the speech sound map to higher-order auditory cortical areas in the posterior superior temporal gyrus and planum temporale encode auditory targets for the syllable to be spoken. These targets are compared to the incoming auditory signal by auditory error cells, which respond when a mismatch is detected between the auditory target and the current auditory feedback; the SSM-to-auditory-error-cell projections are hypothesized to have a net inhibitory effect on auditory cortex. When a mismatch is detected, projections from the auditory error cells to motor cortex transform the auditory error into a corrective motor command. The model further proposes that these corrective motor commands are added to the feedforward command for the speech sound, so that future productions of the sound incorporate the correction. In other words, the feedforward control system is tuned by incorporating the commands issued by the auditory feedback control system on earlier attempts to produce the syllable.
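The loop described in this paragraph can be summarized in a few lines of code. The following Python sketch is a deliberately simplified, one-dimensional (F1-only) illustration of the idea, not the published DIVA implementation; the function name, gain values, and update rule are our own assumptions.

```python
import numpy as np

def simulate_trial(target_f1, feedforward_f1, shift_hz=0.0,
                   feedback_gain=0.5, learning_rate=0.1, n_steps=50):
    """One simplified trial: produce F1, hear (possibly shifted) feedback,
    correct online, then fold the correction into the feedforward command."""
    corrective = 0.0                      # online correction accumulated this trial
    produced = feedforward_f1             # start from the stored feedforward command
    trajectory = []
    for _ in range(n_steps):
        heard = produced + shift_hz       # auditory feedback, possibly perturbed
        error = target_f1 - heard         # auditory error cells: target vs. feedback
        corrective += feedback_gain * error / n_steps   # error -> corrective motor command
        produced = feedforward_f1 + corrective          # feedforward + feedback commands combine
        trajectory.append(produced)
    # corrections are incorporated into the feedforward command for later productions
    new_feedforward_f1 = feedforward_f1 + learning_rate * corrective
    return np.array(trajectory), new_feedforward_f1
```

With `shift_hz` nonzero, the produced F1 drifts opposite the shift within the trial (compensation), and repeatedly feeding `new_feedforward_f1` back in shifts subsequent baseline productions in the same direction (adaptation), loosely mirroring the two roles of auditory feedback described above.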

Because the DIVA model is both quantitatively and neuroanatomically defined, the activity of model components in computer simulations of perturbed and unperturbed speech can be directly compared to task-related blood oxygen level-dependent (BOLD) responses in speakers performing the same tasks. According to the model, unexpected auditory feedback should induce activation of auditory error cells in the posterior superior temporal gyrus and planum temporale (Guenther et al., 2006). This auditory error cell activation is then predicted to drive a compensatory motor response marked by increased activation of ventral motor, premotor, and superior cerebellar cortex.

The current study utilizes auditory perturbation of speech, in the form of unpredictable upward and downward shifts of the first formant frequency (F1), to identify the neural circuit underlying auditory feedback control of speech movements and to test DIVA model predictions regarding feedback control of speech. Functional magnetic resonance imaging (fMRI) was performed while subjects read aloud monosyllabic words projected orthographically onto a screen. A sparse sampling protocol permitted vocalization in the absence of scanner noise (Yang et al., 2000, Le et al., 2001, Engelien et al., 2002). An electrostatic microphone and headset provided subjects with auditory feedback of their vocalizations while in the scanner. On a subset of trials, an unpredictable real-time F1 shift was introduced into the subject’s auditory feedback. Standard voxel-based analysis of the neuroimaging data was supplemented with region of interest (ROI) analyses (Nieto-Castanon et al., 2003) to improve anatomical specificity and increase statistical power. Compensatory responses were also characterized behaviorally by comparing the formant frequency content of vocalizations made during perturbed and unperturbed feedback conditions. Structural equation modeling was used to assess changes in effective connectivity that accompanied increased use of auditory feedback control.
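To illustrate what an F1 perturbation involves, the sketch below shows a simplified, offline way to shift the first formant of a short voiced frame using LPC analysis and resynthesis (numpy, scipy, and librosa are assumed to be available). It is only a conceptual illustration: the study used a dedicated real-time system, and formant tracking, frame-wise overlap-add processing, and latency constraints are all omitted here.

```python
import numpy as np
import scipy.signal as sig
import librosa

def shift_f1(frame, sr, ratio=1.3, lpc_order=12):
    """Shift the lowest-frequency LPC pole pair (a crude stand-in for F1)
    of a short voiced frame by `ratio`, then resynthesize (offline sketch)."""
    a = librosa.lpc(frame.astype(np.float64), order=lpc_order)  # [1, a1, ..., ap]
    poles = np.roots(a)
    upper = np.where(np.imag(poles) > 0)[0]            # one index per conjugate pair
    f1_idx = upper[np.argmin(np.angle(poles[upper]))]  # lowest-frequency resonance ~ F1
    mag, ang = np.abs(poles[f1_idx]), np.angle(poles[f1_idx])
    # pole angle `ang` corresponds to ang * sr / (2*pi) Hz; scale it to move the formant
    shifted = mag * np.exp(1j * ang * ratio)
    conj_idx = np.argmin(np.abs(poles - np.conj(poles[f1_idx])))
    new_poles = poles.copy()
    new_poles[f1_idx], new_poles[conj_idx] = shifted, np.conj(shifted)
    a_new = np.real(np.poly(new_poles))                # rebuild the LPC polynomial
    residual = sig.lfilter(a, [1.0], frame)            # inverse filter: excitation signal
    return sig.lfilter([1.0], a_new, residual)         # re-filter with the shifted F1
```

In the actual experiment the shift was applied in near real time so that subjects heard the perturbed signal while speaking; the point of the sketch is simply that shifting F1 changes what the speaker hears without changing what the articulators did.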

Section snippets

Subjects

Eleven right-handed native speakers of American English (6 females and 5 males; 23–36 years of age, mean age = 28) with no history of neurological disorder participated in the study. All study procedures, including recruitment and acquisition of informed consent, were approved by the institutional review boards of Boston University and Massachusetts General Hospital. A scanner problem that resulted in the introduction of non-biological noise in acquired scans required the elimination of imaging

Acoustic responses

Subjects responded to unexpected F1-shifted auditory feedback by altering the F1 of their speech in the direction opposite the induced shift. Mean F1 traces from the shift-up and shift-down conditions expressed relative to their token- and subject-matched no shift responses are plotted in Fig. 2A. The compensation traces, averaged across subjects, demonstrate significant downward divergence in the shift up condition and upward divergence in the shift down condition compared to the no shift
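A minimal sketch of how such relative compensation traces might be computed from measured F1 tracks is given below; the array names, the use of a ratio measure, and the crude onset estimate are our own assumptions and do not reproduce the paper's analysis or statistical tests.

```python
import numpy as np

def compensation_traces(f1_shift, f1_noshift, frame_rate_hz=200.0):
    """Express shifted-feedback F1 tracks relative to matched no-shift tracks.

    f1_shift, f1_noshift : arrays of shape (n_trials, n_frames), F1 in Hz,
        where trial i of `f1_shift` is matched (same token, same subject)
        to trial i of `f1_noshift`.
    """
    rel = f1_shift / f1_noshift                         # 1.0 means no deviation
    mean_trace = rel.mean(axis=0)
    sem_trace = rel.std(axis=0, ddof=1) / np.sqrt(rel.shape[0])
    t_ms = np.arange(rel.shape[1]) / frame_rate_hz * 1000.0
    # crude onset proxy: first frame whose mean deviates from 1.0 by > 2 SEM
    dev = np.abs(mean_trace - 1.0) > 2.0 * sem_trace
    onset_ms = float(t_ms[np.argmax(dev)]) if dev.any() else None
    return mean_trace, sem_trace, t_ms, onset_ms
```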

Formant shift compensation

As illustrated in Fig. 2, subjects responded to unexpected F1 shifts by altering the F1 of their speech in the direction opposite the induced shift. Computer simulations of the DIVA model verified the model’s ability to account for these compensatory responses (Fig. 2B) following an increase in the relative contribution of auditory feedback to motor control. Adaptation to consistently applied upward or downward F1 shifts during production of /CεC/ utterances similar to those used in the current

Conclusions

Collectively, the behavioral and imaging results presented here advance our understanding of the role of sensory feedback in the online control of vocalization and of the network of brain regions that supports this control. Behavioral data demonstrated clear evidence of feedback-based correction of segmental vocal output. Imaging data indicated that, in the absence of feedback error, articulator control was left-lateralized in the frontal cortex. When auditory error was

Acknowledgments

This work was supported by grant R01 DC02852 from the National Institute on Deafness and Other Communication Disorders (F. Guenther, PI). Imaging was performed at the Athinoula A. Martinos Center for Biomedical Imaging, which is funded by grants from the National Center for Research Resources (P41RR14075) and the MIND Institute. The authors would like to thank Satrajit Ghosh, Alfonso Nieto-Castanon, Jason W. Bohland, Virgilio Villacorta, Oren Civier, and Joseph Perkell for their valuable

References (92)

  • J.A. Jones et al., Remapping auditory–motor representations in voice production, Curr. Biol. (2005)
  • K.V. Kriegstein et al., Distinct functional substrates along the right superior temporal sulcus for the processing of voices, NeuroImage (2004)
  • M. Neilson et al., Speech motor control and stuttering: a computational model of adaptive sensory–motor processing, Speech Commun. (1987)
  • K. Neumann et al., Cortical plasticity associated with stuttering therapy, J. Fluen. Disord. (2005)
  • A. Nieto-Castanon et al., Region of interest based analysis of functional imaging data, NeuroImage (2003)
  • J. Numminen et al., Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex, Neurosci. Lett. (1999)
  • J. Numminen et al., Subject's own speech reduces reactivity of the human auditory cortex, Neurosci. Lett. (1999)
  • E. Ozdemir et al., Shared and distinct neural correlates of singing and speaking, NeuroImage (2006)
  • R. Prabhakaran et al., An event-related fMRI investigation of phonological–lexical competition, Neuropsychologia (2006)
  • J.J. Sidtis et al., Mapping cerebral blood flow during speech production in hereditary ataxia, NeuroImage (2006)
  • P. Soros et al., Clustered functional MRI of overt speech production, NeuroImage (2006)
  • A.A. Stevens et al., Event-related fMRI of auditory and visual oddball tasks, Magn. Reson. Imaging (2000)
  • A. Toyomura et al., Neural correlates of auditory feedback control in human, Neuroscience (2007)
  • R.J. Wise et al., Brain regions involved in articulation, Lancet (1999)
  • D.M. Wolpert et al., Motor prediction, Curr. Biol. (2001)
  • J.J. Bauer et al., Vocal responses to unanticipated perturbations in voice loudness feedback: an automatic mechanism for stabilizing voice amplitude, J. Acoust. Soc. Am. (2006)
  • S.J. Blakemore et al., Central cancellation of self-produced tickle sensation, Nat. Neurosci. (1998)
  • S.J. Blakemore et al., Why can't you tickle yourself?, NeuroReport (2000)
  • S. Brown et al., Stuttered and fluent speech production: an ALE meta-analysis of functional neuroimaging studies, Hum. Brain Mapp. (2005)
  • T.A. Burnett et al., Early pitch-shift response is active in both steady and dynamic voice pitch control, J. Acoust. Soc. Am. (2002)
  • V.S. Caviness et al., MRI-based topographic parcellation of human neocortex: an anatomically specified method with estimate of reliability, J. Cogn. Neurosci. (1996)
  • I.K. Christoffels et al., Neural correlates of verbal feedback processing: an fMRI study employing overt speech, Hum. Brain Mapp. (2007)
  • C.G. Clopper et al., Acoustic characteristics of the vowel systems of six regional varieties of American English, J. Acoust. Soc. Am. (2005)
  • A. Collignon et al., Automated multi-modality image registration based on information theory
  • R.I. Cowie et al., Speech production in profound post-lingual deafness
  • G. Curio et al., Speaking modifies voice-evoked activity in the human auditory cortex, Hum. Brain Mapp. (2000)
  • L.F. De Nil et al., A positron emission tomography study of short- and long-term treatment effects on functional brain activation in adults who stutter, J. Fluen. Disord. (2003)
  • V. Della-Maggiore et al., Corticolimbic interactions associated with performance on a short-term memory task are modified by age, J. Neurosci. (2000)
  • J. Diedrichsen et al., Neural correlates of reach errors, J. Neurosci. (2005)
  • T.M. Donath et al., Effects of frequency-shifted auditory feedback on voice F0 contours in syllables, J. Acoust. Soc. Am. (2002)
  • N.F. Dronkers, A new brain region for coordinating speech articulation, Nature (1996)
  • J.R. Duffy, Motor Speech Disorders (1995)
  • S.J. Eliades et al., Dynamics of auditory–vocal interaction in monkey auditory cortex, Cereb. Cortex (2005)
  • A.C. Evans et al., 3D statistical neuroanatomical models from 305 MRI volumes, Proc. IEEE Nucl. Sci. Symp. Med. Imag. (1993)
  • S.H. Ferguson et al., Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners, J. Acoust. Soc. Am. (2002)
  • B. Fischl et al., Automatically parcellating the human cerebral cortex, Cereb. Cortex (2004)