Summary statistics in auditory perception

Abstract

Sensory signals are transduced at high resolution, but their structure must be stored in a more compact format. Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics. We measured discrimination of 'sound textures' that were characterized by particular statistical properties, as normally result from the superposition of many acoustic features in auditory scenes. When listeners discriminated examples of different textures, performance improved with excerpt duration. In contrast, when listeners discriminated different examples of the same texture, performance declined with duration, a paradoxical result given that the information available for discrimination grows with duration. These results indicate that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics, which, for different examples of the same texture, converge to the same values with increasing duration. Such statistical representations produce good categorical discrimination, but limit the ability to discern temporal detail.
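The convergence argument in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' texture model or stimulus set: it assumes a "texture" is just stationary Gaussian noise with a given spread, and that the summary statistics are the mean and variance of the rectified signal. All function names and parameters are hypothetical and chosen only for illustration. Under these assumptions, the distance between the statistics of two exemplars of the same texture should shrink as duration grows, whereas the distance between exemplars of different textures should settle at a non-zero value.

```python
import random
import statistics


def make_exemplar(duration, seed, spread=1.0):
    """Draw `duration` samples of stationary Gaussian noise (a toy 'texture')."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, spread) for _ in range(duration)]


def summary_stats(signal):
    """Time-averaged statistics of a toy envelope: mean and variance of |x|."""
    envelope = [abs(x) for x in signal]
    return (statistics.fmean(envelope), statistics.pvariance(envelope))


def stat_distance(a, b):
    """Euclidean distance between the summary statistics of two signals."""
    sa, sb = summary_stats(a), summary_stats(b)
    return sum((x - y) ** 2 for x, y in zip(sa, sb)) ** 0.5


def mean_distance(duration, spread_a, spread_b, trials=50):
    """Average statistic distance over many pairs of independent exemplars."""
    total = 0.0
    for t in range(trials):
        a = make_exemplar(duration, seed=2 * t, spread=spread_a)
        b = make_exemplar(duration, seed=2 * t + 1, spread=spread_b)
        total += stat_distance(a, b)
    return total / trials


if __name__ == "__main__":
    for duration in (50, 500, 5000):
        same = mean_distance(duration, 1.0, 1.0)  # two exemplars, same texture
        diff = mean_distance(duration, 1.0, 2.0)  # exemplars of different textures
        print(f"duration={duration:5d}  same={same:.3f}  different={diff:.3f}")
```

In this toy setting, an observer with access only to such statistics would find different textures easier to tell apart at longer durations and different exemplars of the same texture harder, which is the qualitative pattern the abstract describes.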

Figure 1: Textures and time-averaged statistics.
Figure 2: Texture and exemplar discrimination results.
Figure 3: Exemplar discrimination with mixtures of sources.

Acknowledgements

The authors thank B. Anderson, S. Keshvari and J. Traer for comments on earlier versions of the manuscript. Research was funded by the Howard Hughes Medical Institute.

Author information

Contributions

J.H.M., M.S. and E.P.S. designed the experiments. M.S. conducted the experiments. J.H.M. analyzed the data. J.H.M. and E.P.S. wrote the manuscript.

Corresponding author

Correspondence to Josh H McDermott.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2 and Supplementary Table 1 (PDF 1961 kb)

About this article

Cite this article

McDermott, J., Schemitsch, M. & Simoncelli, E. Summary statistics in auditory perception. Nat Neurosci 16, 493–498 (2013). https://doi.org/10.1038/nn.3347

  • DOI: https://doi.org/10.1038/nn.3347
