Do ‘Dominant Frequencies’ explain the listener’s response to formant and spectrum shape variations?
Acoustic bases of vowel percepts
Traditionally, vowel quality has been specified acoustically in terms of the first, second and third formant frequencies. F1 and F2 are the main determinants of vowel color. For back vowels the contribution of F3 is negligible, but including it makes it possible to characterize the F2–F3 proximity of retroflection and front rounded vowels. Higher formants seem more linked to individual voice characteristics.
The formant-based approach is validated by, among other things, the success of formant
Listener responses to formant and spectrum variations
The remarkable thing about all these amplitude variations is that, although they can be drastic and can be perceived, they seem to leave the timbre/vowel quality component of the stimulus virtually unaltered.
For instance, in trying to recreate a recorded utterance by means of high-fidelity copy synthesis, phoneticians have noted that the percept of a vowel’s “phonetic quality” – that is, its “timbre”, transmitted in parallel with voice quality and channel characteristics – can be astonishingly
Phase locking
Comparing the auditory system to an analog or simulated spectrograph highlights a significant difference between biological and current technological sound analysis. Whereas conventional speech spectrography averages the temporal output from the analysis filters, the auditory system takes the process further, making ingenious use of this information. Simplifying, we can say that the ear’s operation is similar to a filter followed by a zero-crossing counter.
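The "filter followed by a zero-crossing counter" simplification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than part of the model discussed here: the brick-wall FFT band-pass stands in for an auditory filter, and the function names, sampling rate, passbands and two-tone test signal are hypothetical.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Brick-wall band-pass: zero every FFT bin outside [lo, hi] Hz.

    A crude stand-in for one auditory channel (assumption).
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def zero_crossing_frequency(x, fs):
    """Estimate frequency as the number of upward zero crossings per second."""
    upward = np.count_nonzero((x[:-1] < 0) & (x[1:] >= 0))
    return upward * fs / len(x)


fs = 16000                      # sampling rate in Hz (hypothetical)
t = np.arange(fs // 2) / fs     # 0.5 s of signal
# Two-tone test signal: a strong 500 Hz and a weaker 2000 Hz component.
signal = (1.0 * np.sin(2 * np.pi * 500 * t + 0.3)
          + 0.5 * np.sin(2 * np.pi * 2000 * t + 0.3))

# Each channel "locks" to the component that falls inside its passband.
low_channel = zero_crossing_frequency(bandpass_fft(signal, fs, 400, 600), fs)
high_channel = zero_crossing_frequency(bandpass_fft(signal, fs, 1900, 2100), fs)
print(low_channel, high_channel)
```

Each channel reports a frequency close to the component inside its passband, which is the sense in which the counter preserves temporal information that an amplitude-averaging spectrograph would discard.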
Suppose we examine the output of a
DOMIN: an auditory spectrograph
Most public-domain software tools for speech analysis do not incorporate time-place representations. The “auditory” spectrograph described by Carlson and Granström (1982) is an exception. It was a pioneering effort. Spectrograms were produced using critical-band analysis, and filter outputs were displayed in phon/Bark units. Included in the model was a Dominant Frequency (DF) representation showing the number of channels dominated by (read: ‘phase locked to’) a certain frequency plotted against
Formants
Fig. 4 combines two KONVERT representations: a DF histogram and a sone/Bark pattern. The vowel is [ε]-like, computed with F1 = 500, F2 = 2000, F3 = 2700 and F4 = 3300 Hz and F0 = 100 Hz. The x-axis represents frequency (in Hz). The number of channels is shown on the left ordinate. The sone/Bark values should be read along the second y-axis (right). Ten bins per Bark were used.
It is immediately clear that the frequencies with the largest number of channels are located at the peaks in the spectrum curve. If
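The idea of counting how many channels are dominated by each frequency can be sketched with a toy spectral-domain model. This is not the KONVERT or Carlson and Granström implementation. Phase locking is approximated by letting each channel pick the harmonic with the highest level after a symmetric attenuation of 25 dB per Bark of distance from the channel centre. The formant bandwidths, the −6 dB/octave source tilt, the Traunmüller Bark formula, the attenuation slope and all names are assumptions; only the formant frequencies, F0 and the ten-per-Bark spacing come from the text.

```python
import numpy as np
from collections import Counter


def hz_to_bark(f):
    """Traunmüller (1990) approximation of the Bark scale (assumed here)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def formant_gain(f, formant, bandwidth):
    """Magnitude response of one conjugate pole pair, unity gain at 0 Hz."""
    half_bw_sq = (bandwidth / 2.0) ** 2
    return (formant ** 2 + half_bw_sq) / np.sqrt(
        ((f - formant) ** 2 + half_bw_sq) * ((f + formant) ** 2 + half_bw_sq))


F0 = 100.0
FORMANTS = (500.0, 2000.0, 2700.0, 3300.0)   # values given in the text
BANDWIDTHS = (60.0, 100.0, 120.0, 150.0)     # hypothetical bandwidths (Hz)
SLOPE_DB_PER_BARK = 25.0                     # hypothetical filter skirt
CHANNELS_PER_BARK = 10

# Harmonic levels: -6 dB/octave tilt (assumption) plus the formant resonances.
harmonics = F0 * np.arange(1, 51)            # 100 Hz ... 5000 Hz
level_db = 20 * np.log10(F0 / harmonics)
for formant, bandwidth in zip(FORMANTS, BANDWIDTHS):
    level_db += 20 * np.log10(formant_gain(harmonics, formant, bandwidth))

# Channel centres from 1 to 18 Bark, ten per Bark.
centers = np.arange(CHANNELS_PER_BARK, 18 * CHANNELS_PER_BARK + 1) / CHANNELS_PER_BARK

# Each channel is "dominated" by the harmonic with the highest attenuated level.
distance = np.abs(hz_to_bark(harmonics)[None, :] - centers[:, None])
score = level_db[None, :] - SLOPE_DB_PER_BARK * distance
dominant = harmonics[np.argmax(score, axis=1)]

df_histogram = Counter(dominant.tolist())    # frequency (Hz) -> number of channels
for freq in (1900.0, 2000.0, 2100.0):
    print(freq, df_histogram[freq])
```

Under these assumptions the 2000 Hz harmonic at the F2 peak captures more channels than its neighbours at 1900 and 2100 Hz. The effect is clearest where harmonics are closely spaced on the Bark axis; the widely spaced low harmonics may each capture a comparable number of channels regardless of level.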
Concluding Remarks
Building on the work of Carlson and Granström, we demonstrated how a realistic auditory model of vowel processing (KONVERT) can represent information about both whole spectra and formant patterns (i.e., F1, F2 and F3). Whole spectra are represented as the output of a critical-band filterbank (i.e., excitation patterns), whereas F0 and formants (as carried by their strong harmonics) are captured by Dominant Frequency histograms that model the effects of phase locking in auditory neurons.
Apart
References (27)
- et al. On explaining certain male–female differences in the phonetic realization of vowel categories. J. Phonetics (1996)
- et al. Modeling the perception of concurrent vowels: vowels with the same fundamental frequency. J. Acoust. Soc. Am. (1989)
- et al. Modeling the judgment of vowel quality differences. J. Acoust. Soc. Am. (1981)
- Blomberg, M., Carlson, R., Elenius, K., Granström, B., 1984. Auditory models in isolated word recognition. In:...
- Carlson, R., Granström, B., 1976. Detectability of changes of level and spectral slope in vowels. In: STL-QPSR, vol....
- Carlson, R., Granström, B., 1979. Model predictions of vowel dissimilarity. In: STL-QPSR, vol. 20(3–4). Royal Institute...
- et al. Towards an auditory spectrograph
- Carlson, R., Granström, B., Fant, G., 1970. Some studies concerning perception of isolated vowels. In: STL-QPSR, vol....
- et al. Two-formant models, pitch and vowel perception
- Carlson, R., Granström, B., Klatt, D.H., 1979. Vowel perception: the relative perceptual salience of selected acoustic...
- Speech coding in the auditory nerve I: vowel-like sounds. J. Acoust. Soc. Am.
- Acoustic Theory of Speech Production