Opinion
Untangling invariant object recognition

https://doi.org/10.1016/j.tics.2007.06.010

Despite tremendous variation in the appearance of visual objects, primates can recognize a multitude of objects, each in a fraction of a second, with no apparent effort. However, the brain mechanisms that enable this fundamental ability are not understood. Drawing on ideas from neurophysiology and computation, we present a graphical perspective on the key computational challenges of object recognition, and argue that the format of neuronal population representation and a property that we term ‘object tangling’ are central. We use this perspective to show that the primate ventral visual processing stream achieves a particularly effective solution in which single-neuron invariance is not the goal. Finally, we speculate on the key neuronal mechanisms that could enable this solution, which, if understood, would have far-reaching implications for cognitive neuroscience.

Introduction

Our daily activities rely heavily on the accurate and rapid identification of objects in our visual environment. The apparent ease with which we recognize objects belies the magnitude of this feat: we effortlessly recognize objects from among tens of thousands of possibilities and we do so within a fraction of a second, in spite of tremendous variation in the appearance of each one. Understanding the brain mechanisms that underlie this ability would be a landmark achievement in neuroscience.

Object recognition is computationally difficult for many reasons, but the most fundamental is that any individual object can produce an infinite set of different images on the retina, due to variation in object position, scale, pose and illumination, and the presence of visual clutter (e.g. 1, 2, 3, 4, 5). Indeed, although we typically see an object many times, we effectively never see exactly the same image on our retina twice. Although several computational efforts have attacked this so-called ‘invariance problem’ (e.g. 1, 3, 6, 7, 8, 9, 10, 11, 12), a robust, real-world machine solution still evades us and we lack a satisfying understanding of how the brain solves the problem. We believe that these two achievements will be accomplished nearly simultaneously by an approach that takes into account both the computational issues and the biological clues and constraints.
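A toy illustration of why image variation is so damaging, assuming a 12-pixel one-dimensional "retina", two made-up binary objects, and circular shifting as a stand-in for a change in object position (none of these come from the article): measured pixel by pixel, an object moved a few pixels can be farther from itself than a different object at the same position.

```python
import math

def shift(image, k):
    """Circularly shift a 1D toy 'retinal image' by k pixels (stand-in for a position change)."""
    k %= len(image)
    return image[-k:] + image[:-k] if k else list(image)

def distance(a, b):
    """Euclidean distance between two images, i.e. raw pixel-by-pixel dissimilarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical objects drawn on the toy retina (pixel values are arbitrary).
object_a = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
object_b = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

same_object_moved = distance(object_a, shift(object_a, 4))  # same identity, new position
different_object = distance(object_a, object_b)             # new identity, same position
print(f"A vs A shifted by 4 px: {same_object_moved:.2f}")
print(f"A vs B at same position: {different_object:.2f}")
```

Here the shifted copy of A lies about 2.45 units from A, whereas B lies about 1.41 units away, so similarity in the image itself does not track object identity.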

Because it is easy to get lost in the sea of previous studies and ideas, the goal of this manuscript is to clear the table, bring forth key ideas in the context of the primate brain, and pull those threads together into a coherent framework. Below, we use a graphical perspective to provide intuition about the object recognition problem, show that the primate ventral visual processing stream produces a particularly effective solution in the inferotemporal (IT) cortex, and speculate on how the ventral visual stream approaches the problem. Along the way, we argue that some approaches are only tangential to, or even distract from, understanding object recognition.

What is object recognition?

We define object recognition as the ability to accurately discriminate each named object (‘identification’) or set of objects (‘categorization’) from all other possible objects, materials, textures and other visual stimuli, and to do this over a range of identity-preserving transformations of the retinal image of that object (e.g. image transformations resulting from changes in object position, distance, and pose). Of course, vision encompasses many disparate challenges that may interact with

What computational processes must underlie object recognition?

To solve a recognition task, a subject must use some internal neuronal representation of the visual scene (population pattern of activity) to make a decision (e.g. 15, 16): is object A present or not? Computationally, the brain must apply a decision function [16] to divide an underlying neuronal representational space into regions where object A is present and regions where it is not (Figure 1b; one function for each object to be potentially reported). Because brains compute with neurons, the
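A minimal sketch of such a decision function, assuming it is linear (a hyperplane w·r + b = 0 in the population space; the passage above does not commit to a particular form) and using made-up weights, biases and firing rates for four hypothetical neurons. Each object to be reported gets its own function, and the object is reported as present when its function is positive.

```python
def decide(population, weights, bias):
    """Hypothetical linear decision function: 'present' if w . r + b > 0."""
    score = sum(w * r for w, r in zip(weights, population)) + bias
    return score > 0

# One (hypothetical) decision function per object that could be reported.
decision_functions = {
    "A": ([0.8, -0.2, 0.5, 0.0], -20.0),
    "B": ([-0.3, 0.9, 0.0, 0.4], -25.0),
}

# Made-up firing rates (spikes/s) of four model neurons during one glimpse.
response = [30.0, 5.0, 12.0, 8.0]
reports = {name: decide(response, w, b) for name, (w, b) in decision_functions.items()}
print(reports)  # {'A': True, 'B': False}
```

A linear readout is used here only because it is the simplest way to divide a space into two regions; more complex decision functions would also fit the definition.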

Why is object recognition hard? Object manifold tangling

Object recognition is hard because useful forms of visual representation are hard to build. A major impediment to understanding such representations arises from the fact that vision operates in high-dimensional space. Our eyes fixate the world in ∼300 ms intervals before moving on to a new location. During each brief glimpse, a visual image is projected into the eye, transduced by ∼100 million retinal photoreceptors and conveyed to the brain in the spiking activity pattern of ∼1 million retinal
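To make manifold tangling concrete, the following toy sketch assumes a 2-D stand-in for the high-dimensional retinal space: every "view" of hypothetical object A lies on a circle of radius 1, every view of object B on a circle of radius 2, and the angle plays the role of an identity-preserving transformation. No straight line separates the two manifolds, but after a hypothetical nonlinear re-representation (appending the squared distance from the origin) a simple linear decision function suffices. A perceptron is used as the separability test: it reaches an error-free pass exactly when a separating hyperplane exists and it has been given enough epochs.

```python
import math

def perceptron_separates(points, labels, epochs=100):
    """Return True if a perceptron completes a pass over the data with no errors.

    Labels are +1 or -1. On linearly separable data the perceptron stops making
    mistakes after a bounded number of updates; on non-separable data every
    pass contains at least one error, so this returns False.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # move hyperplane toward x
                b += y
                errors += 1
        if errors == 0:
            return True
    return False

# Toy object manifolds: 16 views each, differing only in a nuisance angle.
angles = [2 * math.pi * k / 16 for k in range(16)]
views_a = [(math.cos(t), math.sin(t)) for t in angles]
views_b = [(2 * math.cos(t), 2 * math.sin(t)) for t in angles]
points = views_a + views_b
labels = [1] * 16 + [-1] * 16

def rerepresent(p):
    """Hypothetical nonlinear re-representation: append squared radius."""
    x, y = p
    return (x, y, x * x + y * y)

print("raw space separable:", perceptron_separates(points, labels))
print("re-represented separable:",
      perceptron_separates([rerepresent(p) for p in points], labels))
```

The raw result is False because object B's views surround object A's, so any half-plane holding all of B also holds A. The re-represented result is True because the added feature places the two manifolds on parallel planes.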

The ventral visual stream transformation untangles object manifolds

In humans and other primates, information processing to support visual recognition takes place along the ventral visual stream (for reviews, see 5, 24, 25). We, and others (e.g. 1, 26), consider this stream to be a progressive series of visual re-representations, from V1 to V2 to V4 to IT cortex (Figure 2). Beginning with the studies of Gross [27], a wealth of work has shown that single neurons at the highest level of the monkey ventral visual stream – the IT cortex – display spiking responses

How does the ventral visual stream untangle object manifolds?

We do not yet know the answer to this question. Hubel and Wiesel's [30] observation that visual cortex complex cells can pool over simple cells to build tolerance to identity-preserving transformations (especially position) has been computationally implemented and extended to higher cortical levels, including IT cortex (1, 12, 33). However, beyond this early insight, systems neuroscience has not provided a breakthrough.
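The pooling idea can be sketched in a few lines, assuming a 1-D toy retina, a hypothetical three-pixel preferred feature, "simple cells" that compute a template dot product at a single position, and a "complex cell" that takes the maximum over simple cells sharing the same template at every position. Max pooling is an assumption here; summing or averaging over the pool is another common choice.

```python
def simple_cell(image, template, position):
    """Toy simple cell: dot product of the template with the image at one position."""
    return sum(t * image[position + i] for i, t in enumerate(template))

def complex_cell(image, template):
    """Toy complex cell: max over simple cells with the same template at all positions."""
    positions = range(len(image) - len(template) + 1)
    return max(simple_cell(image, template, p) for p in positions)

def place(pattern, position, width=12):
    """Draw a pattern onto a blank 1D retina of the given width."""
    image = [0] * width
    image[position:position + len(pattern)] = pattern
    return image

template = [1, 1, 1]  # hypothetical preferred feature
for pos in (0, 4, 9):
    img = place([1, 1, 1], pos)
    # The simple cell at position 0 responds only when the feature is there;
    # the complex cell responds equally wherever the feature appears.
    print(pos, simple_cell(img, template, 0), complex_cell(img, template))
```

This prints `0 3 3`, `4 0 3` and `9 0 3`: position tolerance is gained at the complex-cell stage, which is the property that hierarchical models repeat at successive levels.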

Some neurophysiological effort has focused on characterizing IT neuronal

Acknowledgements

We would like to thank N Kanwisher, N Li, N Majaj, N Rust, J Tenenbaum and three anonymous referees for helpful comments on earlier versions of this manuscript. Support was provided by The National Eye Institute (NIH-R01-EY014970), The Pew Charitable Trusts (PEW UCSF 2893sc) and The McKnight Foundation.

References (65)

  • S. Ullman

    High Level Vision

    (1996)
  • S. Edelman

    Representation and Recognition in Vision

    (1999)
  • E. Ashbridge et al.

    Generalizing across object orientation and size

  • I. Biederman

    Recognition-by-components: a theory of human image understanding

    Psychol. Rev.

    (1987)
  • B.A. Olshausen

    A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information

    J. Neurosci.

    (1993)
  • D. Arathorn

    Map-seeking Circuits in Visual Cognition

    (2002)
  • T. Serre

    Robust object recognition with cortex-like mechanisms

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • S. Thorpe

    Speed of processing in the human visual system

    Nature

    (1996)
  • M.C. Potter

    Short-term conceptual memory for pictures

    J. Exp. Psychol. Hum. Learn. Mem.

    (1976)
  • F.G. Ashby et al.

    Decision rules in the perception and categorization of multidimensional stimuli

    J. Exp. Psychol. Learn. Mem. Cogn.

    (1988)
  • K.O. Johnson

    Sensory discrimination: decision process

    J. Neurophysiol.

    (1980)
  • H. Barlow

    The neuron doctrine in perception

  • R.O. Duda

    Pattern Classification

    (2001)
  • C.P. Hung

    Fast readout of object identity from macaque inferior temporal cortex

    Science

    (2005)
  • J.B. Tenenbaum

    A global geometric framework for nonlinear dimensionality reduction

    Science

    (2000)
  • S.T. Roweis et al.

    Nonlinear dimensionality reduction by locally linear embedding

    Science

    (2000)
  • D. Marr

    Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

    (1982)
  • K. Johnson

    Neural Mechanisms of Tactile Form Recognition

    (1995)
  • N.K. Logothetis et al.

    Visual object recognition

    Annu. Rev. Neurosci.

    (1996)
  • K. Tanaka

    Inferotemporal cortex and object vision

    Annu. Rev. Neurosci.

    (1996)
  • D.J. Felleman et al.

    Distributed hierarchical processing in the primate cerebral cortex

    Cereb. Cortex

    (1991)
  • C.G. Gross

    Visual properties of neurons in inferotemporal cortex of the Macaque

    J. Neurophysiol.

    (1972)