Research paper
Hearing speech sounds: Top-down influences on the interface between audition and speech perception
Introduction
You receive an unexpected call on your mobile phone. Despite the background noise on the line you immediately recognise your colleague’s voice and can hear that she is excited about something. Catching her breath, she tells you that your joint grant application has been approved for funding and that you should meet to celebrate. In the space of a few seconds, this phone conversation has communicated a vital piece of information, conveyed the emotional significance of this news and provided physical information about the talker. While such exciting news is almost certainly not a daily occurrence, the cognitive and neural mechanisms that are at the heart of this scenario are so ubiquitous as to go largely unnoticed in our day-to-day life. We invariably focus on the information being communicated rather than the means by which it is conveyed, even in difficult listening situations.
This paper will focus on the cognitive and neural mechanisms by which a complex time-varying acoustic signal is perceived as sequences of sounds that convey meaning, addressing precisely those stages of processing that occur so rapidly, automatically and effortlessly as to be beneath our notice. We suggest that a complete account of speech perception requires an understanding of both basic auditory and higher-level cognitive processes (see Plomp, 2001, for similar arguments). We will present evidence for an interactive processing system in which bottom-up and top-down processes combine to support speech perception. This interactive account provides mechanisms by which perceptual processing can rapidly change so as to optimally perceive and comprehend speech – including those important mobile-phone calls.
In the first section of the paper we will review behavioural evidence for interactive processes playing a critical role in speech perception. The background provided by these several decades of behavioural evidence must be accounted for by any neural account of speech perception and therefore constitutes the majority of the evidence presented here. Building on this behavioural evidence, the second section of the paper describes two types of representation that are integral to the implementation of an interactive account of speech perception. These multiple, parallel representations of the speech input make distinct contributions to the robustness of speech perception. In the third and final section of the paper we briefly review evidence from the anatomy of the auditory system that is consistent with this computational account, reviewing evidence both for interactive processes, and for multiple perceptual pathways.
Evidence for interactivity in speech perception
In this section, we will discuss four processes that contribute to speech perception: (1) perceptual grouping of speech sounds into a single coherent stream, (2) segmentation of speech into meaningful (lexical) units, (3) perceptual learning mechanisms by which distorted and degraded speech is perceived and comprehended, and (4) mechanisms for perceiving variable forms of speech in a categorical fashion. For each of these four cases we suggest that evidence supports highly interactive processes
Computational requirements for interactive processes in speech perception
We have reviewed four domains in which top-down processes appear to contribute to speech perception: in promoting perceptual grouping, in achieving lexical segmentation, in supporting perceptual learning of distorted speech, and in maintaining categorical perception of speech segments. In this section, we will address the computational implications of such interactions and suggest that: (1) top-down influences act on auditory, echoic representations of incoming speech, and (2) top-down
Towards a neuroanatomical account of speech perception
This section will discuss the neural basis of the two central propositions that we make concerning speech perception: (1) that bidirectional, interactive connectivity allows higher-level constraints to influence ongoing speech perception and support the rapid retuning of perceptual processes, and (2) that parallel processing pathways support both an auditory-echoic record of incoming speech and the mapping of heard speech onto somatomotor representations involved in speech production. In
Concluding remarks
“Whereas elementary functions of a tissue can, by definition, have a precise localization in particular cell groups, there can of course be no question of the localization of complex functional systems in limited areas of the brain or of its cortex.” Luria (1976), p. 30.
In this paper we have proposed a multiple-pathway account of auditory processes that are critically important for a complex and uniquely human function – the comprehension of spoken language. As the quotation from Luria
Acknowledgements
Preparation of this paper was supported by the UK Medical Research Council, and the Canada Research Chairs program. We thank Maggie Kemmner, Sarah Hawkins and two anonymous reviewers for comments on an earlier draft of the paper.
References (176)
- et al. (1994). The effect of subphonetic differences on lexical access. Cognition.
- et al. (1999). Is the sine-wave speech cocktail party worth attending? Speech Commun.
- et al. (1996). Distributional regularity and phonotactic constraints are useful for segmentation. Cognition.
- et al. (2005). Human dorsal and ventral auditory streams subserve rehearsal-based and echoic processes during verbal working memory. Neuron.
- et al. (1997). Bootstrapping word boundaries: a bottom-up corpus based approach to speech segmentation. Cognitive Psychol.
- et al. (2003). Learning-induced neural plasticity associated with improved identification performance after training of a difficult second-language phonetic contrast. Neuroimage.
- et al. (1987). The predominance of strong initial syllables in the English vocabulary. Comput. Speech Lang.
- et al. (2005). Neural correlates of switching from auditory to speech perception. Neuroimage.
- et al. (1969). The projection of the auditory cortex upon the diencephalon and brain stem in the cat. Brain Res.
- et al. (1988). Cognitive penetration of the mechanisms of perception: Compensation for coarticulation of lexically restored phonemes. J. Mem. Lang.