Research paper
Hearing speech sounds: Top-down influences on the interface between audition and speech perception

https://doi.org/10.1016/j.heares.2007.01.014

Abstract

This paper focuses on the cognitive and neural mechanisms of speech perception: the rapid and highly automatic processes by which complex time-varying speech signals are perceived as sequences of meaningful linguistic units. We will review four processes that contribute to the perception of speech: perceptual grouping, lexical segmentation, perceptual learning and categorical perception, in each case presenting perceptual evidence for highly interactive processes in which top-down information flow drives and constrains interpretations of spoken input. The cognitive and neural underpinnings of these interactive processes appear to depend on two distinct representations of heard speech: an auditory, echoic representation of incoming speech, and a motoric/somatotopic representation of speech as it would be produced. We review the neuroanatomical system supporting these two key properties of speech perception and discuss how this system incorporates interactive processes and two parallel echoic and somato-motoric representations, drawing on evidence from functional neuroimaging studies in humans and from comparative anatomical studies. We propose that top-down interactive mechanisms within auditory networks play an important role in explaining the perception of spoken language.

Introduction

You receive an unexpected call on your mobile phone. Despite the background noise on the line, you immediately recognise your colleague’s voice and can hear that she is excited about something. Catching her breath, she tells you that your joint grant application has been approved for funding and that you should meet to celebrate. In the space of a few seconds, this phone conversation has communicated a vital piece of information, conveyed the emotional significance of this news and provided physical information about the talker. While such exciting news is almost certainly not a daily occurrence, the cognitive and neural mechanisms that are at the heart of this scenario are so ubiquitous as to go largely unnoticed in our day-to-day life. We invariably focus on the information being communicated rather than the means by which it is conveyed, even in difficult listening situations.

This paper will focus on the cognitive and neural mechanisms by which a complex time-varying acoustic signal is perceived as sequences of sounds that convey meaning, addressing precisely those stages of processing that occur so rapidly, automatically and effortlessly as to be beneath our notice. We suggest that a complete account of speech perception requires an understanding of both basic auditory and higher-level cognitive processes (see Plomp, 2001, for similar arguments). We will present evidence for an interactive processing system in which bottom-up and top-down processes combine to support speech perception. This interactive account provides mechanisms by which perceptual processing can rapidly change so as to optimally perceive and comprehend speech – including those important mobile-phone calls.
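To make the flavour of such an interactive combination concrete, the box below gives a minimal Bayesian sketch, in the spirit of the Bayesian models of perception cited in the references (Kersten et al., 2003), of how a top-down lexical prior might reweight ambiguous bottom-up acoustic evidence. The segment labels, probabilities and the "beef"/"peef" context are hypothetical and purely illustrative; this is not a description of any of the models discussed in the paper.

```python
# Illustrative sketch only (hypothetical numbers): combining top-down lexical
# knowledge with ambiguous bottom-up acoustic evidence via Bayes' rule.

def posterior(acoustic_likelihood, lexical_prior):
    """Multiply acoustic likelihoods by lexical priors and renormalise."""
    joint = {seg: acoustic_likelihood[seg] * lexical_prior[seg]
             for seg in acoustic_likelihood}
    total = sum(joint.values())
    return {seg: p / total for seg, p in joint.items()}

# Bottom-up: the acoustics of an ambiguous token marginally favour /p/.
acoustic_likelihood = {"b": 0.45, "p": 0.55}

# Top-down: in the context "_eef", the lexicon strongly favours /b/
# ("beef" is a word, "peef" is not).
lexical_prior = {"b": 0.90, "p": 0.10}

print(posterior(acoustic_likelihood, lexical_prior))
# {'b': 0.88..., 'p': 0.11...} -- lexical knowledge reverses the acoustic preference.
```

On these illustrative numbers the lexical prior overturns the weak acoustic preference for /p/, which is the signature pattern of the lexical influences on perception reviewed in the sections that follow.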

In the first section of the paper we will review behavioural evidence for interactive processes playing a critical role in speech perception. Any neural account of speech perception must accommodate the background provided by these several decades of behavioural evidence, which therefore constitutes the majority of the evidence presented here. Building on this behavioural evidence, the second section of the paper describes two types of representation that are integral to the implementation of an interactive account of speech perception. These multiple, parallel representations of the speech input make distinct contributions to the robustness of speech perception. In the third and final section of the paper we briefly review evidence from the anatomy of the auditory system that is consistent with this computational account, reviewing evidence both for interactive processes and for multiple perceptual pathways.

Section snippets

Evidence for interactivity in speech perception

In this section, we will discuss four processes that contribute to speech perception: (1) perceptual grouping of speech sounds into a single coherent stream, (2) segmentation of speech into meaningful (lexical) units, (3) perceptual learning mechanisms by which distorted and degraded speech is perceived and comprehended, and (4) mechanisms for perceiving variable forms of speech in a categorical fashion. For each of these four cases we suggest that evidence supports highly interactive processes …

Computational requirements for interactive processes in speech perception

We have reviewed four domains in which top-down processes appear to contribute to speech perception: in promoting perceptual grouping, in achieving lexical segmentation, in supporting perceptual learning of distorted speech, and in maintaining categorical perception of speech segments. In this section, we will address the computational implications of such interactions and suggest that: (1) top-down influences act on auditory, echoic representations of incoming speech, and (2) top-down …

Towards a neuroanatomical account of speech perception

This section will discuss the neural basis of the two central propositions that we make concerning speech perception: (1) that bidirectional, interactive connectivity allows higher-level constraints to influence ongoing speech perception and support the rapid retuning of perceptual processes, and (2) that parallel processing pathways support both an auditory-echoic record of incoming speech and the mapping of heard speech onto somatomotor representations involved in speech production. In …

Concluding remarks

“Whereas elementary functions of a tissue can, by definition, have a precise localization in particular cell groups, there can of course be no question of the localization of complex functional systems in limited areas of the brain or of its cortex.” Luria (1976), p. 30.

In this paper we have proposed a multiple-pathway account of auditory processes that are critically important for a complex and uniquely human function – the comprehension of spoken language. As the quotation from Luria …

Acknowledgements

Preparation of this paper was supported by the UK Medical Research Council, and the Canada Research Chairs program. We thank Maggie Kemmner, Sarah Hawkins and two anonymous reviewers for comments on an earlier draft of the paper.

References (176)

  • J.A. Fodor et al., The psychological reality of linguistic segments, J. Verb. Learn. Verb. Behav. (1965)
  • A.D. Friederici et al., Auditory language comprehension: an event-related fMRI study on the processing of syntactic and lexical information, Brain Lang. (2000)
  • S. Garrod et al., Why is conversation so easy? Trends Cogn. Sci. (2004)
  • N. Golestani et al., Learning new sounds of speech: reallocation of neural substrates, Neuroimage (2004)
  • F.H. Guenther et al., Neural modeling and imaging of the cortical interactions underlying syllable production, Brain Lang. (2006)
  • T.A. Hackett et al., Prefrontal connections of the parabelt auditory cortex in macaque monkeys, Brain Res. (1999)
  • T. Hartley et al., A linguistically constrained model of short term memory for nonwords, J. Mem. Lang. (1996)
  • S. Hawkins, Roles and representations of systematic fine phonetic detail in speech understanding, J. Phonetics (2003)
  • G. Hickok et al., Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language, Cognition (2004)
  • R.F. Huffman et al., The descending auditory pathway and acousticomotor systems: connections with the inferior colliculus, Brain Res. Brain Res. Rev. (1990)
  • J.H. Kaas et al., Auditory processing in primate cerebral cortex, Curr. Opin. Neurobiol. (1999)
  • D. Kersten et al., Bayesian models of object perception, Curr. Opin. Neurobiol. (2003)
  • T. Kraljic et al., Perceptual learning for speech: is there a return to normal? Cogn. Psychol. (2005)
  • I. Lehiste, Isochrony reconsidered, J. Phonetics (1977)
  • A.M. Liberman et al., On the relation of speech to language, Trends Cogn. Sci. (2000)
  • D.G. MacKay et al., Relations between word perception and production: new theory and data on the verbal transformation effect, J. Mem. Lang. (1993)
  • J.S. Magnuson et al., Lexical effects on compensation for coarticulation: the ghost of Christmash past, Cogn. Sci. (2003)
  • V.A. Mann et al., Some differences between phonetic and auditory modes of perception, Cognition (1983)
  • E. Ahissar et al., Speech comprehension is correlated with temporal response patterns recorded from auditory cortex, Proc. Natl. Acad. Sci. USA (2001)
  • A.D. Baddeley, Working Memory (1986)
  • P.J. Bailey et al., Information in speech: observations on the perception of [s]-stop clusters, J. Exp. Psychol. Hum. Percept. Perform. (1980)
  • S.A. Brady et al., Range effect in the perception of voicing, J. Acoust. Soc. Am. (1978)
  • A.S. Bregman, Auditory Scene Analysis (1990)
  • M.R. Brent, Towards a unified model of lexical acquisition and lexical access, J. Psycholinguist. Res. (1997)
  • J.F. Brugge et al., Functional connections between auditory cortex on Heschl’s gyrus and on the lateral superior temporal gyrus in humans, J. Neurophysiol. (2003)
  • J. Bybee et al., Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition, Linguist. Rev. (2005)
  • R.P. Carlyon et al., Effects of attention and unilateral neglect on auditory stream segregation, J. Exp. Psychol. Hum. Percept. Perform. (2001)
  • R.P. Carlyon et al., The continuity illusion and vowel identification, Acta Acust. Unit. Acust. (2002)
  • M.H. Christiansen et al., Learning to segment speech using multiple cues: a connectionist model, Lang. Cogn. Process. (1998)
  • C.M. Clarke et al., Rapid adaptation to foreign-accented English, J. Acoust. Soc. Am. (2004)
  • R.G. Crowder, The purity of auditory memory, Philos. Trans. R. Soc. Lond. B Biol. Sci. (1983)
  • R.G. Crowder et al., Precategorical acoustic storage, Percept. Psychophys. (1969)
  • A. Cutler et al., The role of strong syllables in segmentation for lexical access, J. Exp. Psychol. Hum. Percept. Perform. (1988)
  • J.E. Cutting, Aspects of phonological fusion, J. Exp. Psychol. Hum. Percept. Perform. (1975)
  • M.H. Davis, Connectionist modelling of lexical segmentation and vocabulary acquisition
  • M.H. Davis et al., Hierarchical processing in spoken language comprehension, J. Neurosci. (2003)
  • M.H. Davis et al., Leading up the lexical garden path: segmentation and ambiguity in spoken word recognition, J. Exp. Psychol. Hum. Percept. Perform. (2002)
  • M.H. Davis et al., Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences, J. Exp. Psychol. Gen. (2005)
  • Davis, M.H., Coleman, M.R., Absalom, A., Rodd, J.M., Johnsrude, I.S., Matta, B., Owen, A.M., Menon, D.K., in...
  • L.A. de la Mothe et al., Cortical connections of the auditory cortex in marmoset monkeys: core and medial belt regions, J. Comp. Neurol. (2006)