Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

International Journal of Computer Vision

Abstract

In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene provides information about its probable semantic category.
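To make the phrase "spectral and coarsely localized information" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation, since the abstract does not specify one. It only illustrates two generic ingredients: a global power spectrum of the whole image, and the mean spectral energy in each cell of a coarse grid. The function names, the 4 x 4 grid, and the choice of mean energy as the per-cell summary are all assumptions made for illustration.

```python
import numpy as np

def global_power_spectrum(image):
    """Squared magnitude of the 2-D DFT, zero frequency shifted to the centre."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()  # remove the mean so the DC term does not dominate
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum) ** 2

def coarse_local_energy(image, grid=4):
    """Mean spectral energy in each cell of a grid x grid partition.

    The grid layout is a hypothetical choice, not taken from the paper.
    Pixels left over when the image size is not divisible by `grid` are ignored.
    """
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    cell_h, cell_w = height // grid, width // grid
    energy = np.empty((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = image[i * cell_h:(i + 1) * cell_h, j * cell_w:(j + 1) * cell_w]
            energy[i, j] = global_power_spectrum(cell).mean()
    return energy

def holistic_descriptor(image, grid=4):
    """Flatten the per-cell energies into one low dimensional vector."""
    return coarse_local_energy(image, grid).ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.random((64, 64))  # stand-in for a grayscale scene image
    print(holistic_descriptor(scene).shape)
```

A descriptor of this kind contains no segmented objects or regions. It records only how much image structure is present and roughly where, which is the sense in which such a representation is holistic.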


Cite this article

Oliva, A., Torralba, A. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision 42, 145–175 (2001). https://doi.org/10.1023/A:1011139631724

  • DOI: https://doi.org/10.1023/A:1011139631724
