Elsevier

NeuroImage

Volume 24, Issue 4, 15 February 2005, Pages 1052-1057
Analysis of the spectral envelope of sounds by the human brain

https://doi.org/10.1016/j.neuroimage.2004.10.031

Abstract

Spectral envelope is the shape of the power spectrum of sound. It is an important cue for the identification of sound sources such as voices or instruments, and particular classes of sounds such as vowels. In everyday life, sounds with similar spectral envelopes are perceived as similar: we recognize a voice or a vowel regardless of pitch and intensity variations, and we recognize the same vowel regardless of whether it is voiced (a spectral envelope applied to a harmonic series) or whispered (a spectral envelope applied to noise). In this functional magnetic resonance imaging (fMRI) experiment, we investigated the basis for analysis of spectral envelope by the human brain. Changing either the pitch or the spectral envelope of harmonic sounds produced similar activation within a bilateral network including Heschl's gyrus and adjacent cortical areas in the superior temporal lobe. Changing the spectral envelope of continuously alternating noise and harmonic sounds produced additional right-lateralized activation in superior temporal sulcus (STS). Our findings show that spectral shape is abstracted in superior temporal sulcus, suggesting that this region may have a generic role in the spectral analysis of sounds. These distinct levels of spectral analysis may represent early computational stages in a putative anteriorly directed stream for the categorization of sound.

Introduction

Spectral envelope defines the shape of the power spectrum of a complex sound. It is one means by which sound sources such as voices or sound classes such as vowels can be characterized. The spectral envelope of a sound is related to a dimension of timbre, the spectral centroid (Grey, 1977, Krumhansl and Iverson, 1992, McAdams and Cunible, 1992). Timbre is defined operationally as the property that distinguishes two sounds of identical pitch, duration, and intensity (American Standards Association, 1960). The spectral envelope is not the sole determinant of timbre, a property with other dimensions including temporal envelope (the attack and decay of a sound) and a third dimension that is not consistent between studies (Grey, 1977, Krumhansl and Iverson, 1992, McAdams and Cunible, 1992). Everyday experience suggests that we possess mechanisms for the abstraction of spectral envelope from the detailed spectrotemporal structure of the sound with which it is associated. We identify individual voices despite variations in pitch and intensity, and we perceive the same vowel sound whether the vowel is voiced or whispered. In the latter case, the same spectral envelope is applied to a harmonic series or to noise. Fig. 1 demonstrates the common spectral envelope of voiced and whispered vowels.

In this functional magnetic resonance imaging (fMRI) experiment, we investigated the analysis of spectral envelope by the human brain. We used stimuli in which ‘generic’ spectral envelopes were applied to harmonic series or to noise. These stimuli are likely to be processed by brain mechanisms that analyze the spectral envelopes of natural sounds such as vowels (Fig. 1), while avoiding the semantic associations of real vowels or musical instrumental timbres.

The experimental design (Fig. 2) employed sequences of sounds composed either entirely of harmonic sounds or from alternating harmonic sounds and noise. In stimulus sequences consisting entirely of harmonic sounds, spectral envelope or pitch was manipulated (Fig. 2: all-harmonic conditions). This manipulation allowed us to examine a level of analysis that could be based on the detailed spectrotemporal structure of sound. Changing the spectral envelope alters the detailed spectral structure of the sound while changing pitch (in stimuli such as these with unresolved harmonics) is associated with changes in the repetition rate and temporal structure of the sound. In both cases, there are changes to detailed spectrotemporal structure that are likely to be processed by early auditory areas in the superior temporal lobe. Based on previous studies of pitch change alone (Patterson et al., 2002, Warren et al., 2003), we predicted that the analysis of detailed spectrotemporal structure would engage a cortical network including nonprimary auditory cortex in Heschl's gyrus (HG) and planum temporale (PT).

In stimulus sequences consisting of alternating noise and harmonic sounds, spectral envelope was manipulated while the detailed spectrotemporal structure constantly varied (Fig. 2: alternating conditions). This manipulation allowed us to examine a level of analysis in which spectral envelope is abstracted independently of changes in the detailed spectrotemporal structure. We predicted that this level of spectral analysis would engage additional brain areas to those involved in the analysis of detailed spectrotemporal structure: these additional areas are required for the abstraction of spectral shape independently of changes in fine structure. Based on evidence for the involvement of the superior temporal sulcus (STS) in the identification of a variety of natural sounds (Adams and Janata, 2002, Beauchamp et al., 2004, Belin et al., 2000, Binder et al., 2004, Engelien et al., 1995, Lewis et al., 2004, Maeder et al., 2001, Menon et al., 2002, Nakamura et al., 2001, Zatorre et al., 2004), we predicted the specific involvement of STS in this more abstract level of spectral analysis. We hypothesized that the abstraction of the spectral shapes of sounds by STS is a general mechanism of auditory cognition, in addition to any more specific role of STS in the processing of particular sound categories.

Section snippets

Materials and methods

Stimuli were synthesized digitally in the frequency domain from harmonic series or fixed-amplitude, random-phase noise with equivalent passband and intensity (sampling rate 44.1 kHz and 16 bit resolution). Harmonic sounds were in positive Schroeder phase (Schroeder and Strube, 1986) to reduce peak factor. Spectral envelope was specified in the frequency domain for both noise and harmonic stimuli (Fig. 2). The duration of each sound was 500 ms (with 20 ms gating windows). Sounds were combined
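
The synthesis steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code. Only the 44.1 kHz sampling rate, 16 bit resolution, 500 ms duration, 20 ms gating windows, random-phase noise, and Schroeder-phase harmonics come from the text. The Gaussian envelope shape, the 100 Hz fundamental, the 100 to 4000 Hz passband, the raised-cosine ramp, the RMS target, and all function names are assumptions chosen for illustration. The sign convention for "positive" Schroeder phase also varies between sources, so the formula below is one common choice.

```python
import numpy as np

FS = 44_100            # sampling rate in Hz (from the text)
DUR = 0.5              # sound duration in seconds (from the text)
RAMP = 0.02            # gating window length in seconds (from the text)
N = int(FS * DUR)      # 22050 samples, so FFT bins are 2 Hz apart


def envelope(freqs, centre=1500.0, width=600.0):
    """Hypothetical 'generic' spectral envelope: a Gaussian bump (assumed shape)."""
    return np.exp(-0.5 * ((freqs - centre) / width) ** 2)


def synthesize(kind, f0=100.0, f_lo=100.0, f_hi=4000.0, seed=0, rms=0.1):
    """Build one sound in the frequency domain, then gate and level it.

    kind is "harmonic" or "noise"; f0, f_lo, f_hi, and rms are assumed values.
    """
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)
    spectrum = np.zeros(freqs.size, dtype=complex)

    if kind == "harmonic":
        # Harmonics of f0 that fall inside the passband.
        k = np.arange(max(1, int(np.ceil(f_lo / f0))), int(f_hi // f0) + 1)
        bins = np.round(k * f0 * N / FS).astype(int)
        # Schroeder phase (one common convention): pi * j * (j + 1) / J.
        j = np.arange(1, k.size + 1)
        phases = np.pi * j * (j + 1) / k.size
    elif kind == "noise":
        # Every FFT bin in the same passband, with uniformly random phase.
        bins = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
        phases = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, bins.size)
    else:
        raise ValueError("kind must be 'harmonic' or 'noise'")

    # Component magnitude follows the spectral envelope; phase depends on kind.
    spectrum[bins] = envelope(freqs[bins]) * np.exp(1j * phases)
    signal = np.fft.irfft(spectrum, n=N)

    # 20 ms raised-cosine onset and offset gates.
    n_ramp = int(FS * RAMP)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    signal[:n_ramp] *= ramp
    signal[-n_ramp:] *= ramp[::-1]

    # Equate intensity across sound types by fixing the RMS level.
    signal *= rms / np.sqrt(np.mean(signal ** 2))
    return signal


def to_pcm16(signal):
    """Quantize a float signal in [-1, 1] to 16 bit integers."""
    return np.round(np.clip(signal, -1.0, 1.0) * 32767).astype(np.int16)


harmonic = synthesize("harmonic")
noise = synthesize("noise")
```

Calling `synthesize` with the same envelope parameters for both kinds gives a harmonic sound and a noise that share a spectral envelope and passband but differ in fine structure, which is the property the alternating conditions rely on.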

Results

This experiment was designed to examine two levels of auditory analysis using contrasts based on two types of stimuli. In the all-harmonic contrasts, the baseline condition has constant spectrotemporal fine structure, whereas in the contrast between the alternating conditions, the baseline condition has constantly changing spectrotemporal fine structure (harmonic series or noise). Spectral envelope or pitch changes in the all-harmonic sequences can be assessed from changes in spectrotemporal

Discussion

This experiment has demonstrated distinct levels of analysis of spectral envelope that map onto distinct cortical regions in the human auditory brain. Changing the spectral envelope or the pitch of a harmonic sound engages a brain network that includes nonprimary auditory cortex in lateral HG and anterolateral PT. In these harmonic stimuli, both pitch and spectral envelope changes might be analyzed on the basis of the detailed spectrotemporal structure of the stimulus, and it is likely that

Acknowledgment

This work is supported entirely by the Wellcome Trust.

References (38)

  • C. Alain et al. ‘What’ and ‘where’ in the human auditory system. Proc. Natl. Acad. Sci. U. S. A. (2001)
  • American Standards Association. Acoustical Terminology S1.1-1960. (1960)
  • C.L. Barnes et al. Efferent cortical connections of multimodal cortex of the superior temporal sulcus in the rhesus monkey. J. Comp. Neurol. (1992)
  • G.C. Baylis et al. Functional subdivisions of the temporal lobe neocortex. J. Neurosci. (1987)
  • P. Belin et al. Adaptation to speaker's voice in right anterior temporal lobe. NeuroReport (2003)
  • P. Belin et al. Voice-selective areas in human auditory cortex. Nature (2000)
  • J.R. Binder et al. Neural correlates of sensory and decision processes in auditory object identification. Nat. Neurosci. (2004)
  • A. Engelien et al. The functional anatomy of recovery from auditory agnosia. A PET study of sound categorization in a neurological patient and normal controls. Brain (1995)
  • A.C. Evans et al. 3D statistical neuroanatomical models from 305 MRI volumes. Proc. IEEE Nucl. Sci. Symp. Med. Imag. Conf. (1993)