Review
How position dependent is visual object recognition?

https://doi.org/10.1016/j.tics.2007.12.006

Visual object recognition is often assumed to be insensitive to changes in retinal position, leading to theories and formal models incorporating position-independent object representations. However, recent behavioral and physiological evidence has questioned the extent to which object recognition is position independent. Here, we take a computational and physiological perspective to review the current behavioral literature. Although numerous studies report reduced object recognition performance with translation, even for distances as small as 0.5 degrees of visual angle, confounds in many of these studies make the results difficult to interpret. We conclude that there is little evidence to support position-independent object recognition and the precise role of position in object recognition remains unknown.
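To give a sense of scale for translations as small as 0.5 degrees of visual angle, the following minimal sketch converts between visual angle and physical extent on a display. The function names and the 57 cm viewing distance are illustrative assumptions, not values taken from the studies reviewed; only the standard geometric relation between size, distance and angle is used.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent of size_cm
    centered on the line of sight and viewed from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def extent_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg at distance_cm.
    Inverse of visual_angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


if __name__ == "__main__":
    # Assumed viewing distance (hypothetical, chosen because 1 cm
    # then subtends roughly 1 degree).
    distance = 57.0
    shift = extent_cm(0.5, distance)
    print(f"0.5 deg at {distance} cm is a shift of about {shift:.2f} cm")
```

At the assumed viewing distance, a 0.5 degree translation corresponds to a shift of roughly half a centimeter on the display.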

Introduction

One of the biggest challenges faced by the visual object recognition system is to enable rapid and accurate recognition despite vast differences in the retinal projection of an object produced by changes in, for example, viewing angle, size, position in the visual field or illumination [1, 2, 3, 4]. Such ‘invariance’ is often considered one of the key characteristics of object recognition [4, 5, 6, 7]. Changes in position (translations) are among the simplest of these transformations, because only the retinal position of the projection of an object is affected, and not the projection itself [2]. Although it is often assumed that objects can be recognized independently of retinal position [8], the behavioral evidence is limited. In this review, we critically evaluate the behavioral studies of position dependence in visual object recognition from a computational and physiological perspective. We find that the behavioral data on position independence are inconclusive. Furthermore, these studies do not test several key predictions from neurophysiology, including the effect of translations between eccentricities and hemifields, making it difficult to understand the relationship between behavior and the proposed neural substrate. We argue that although the balance of the available evidence weighs against complete position independence [9, 10, 11, 12, 13, 14], the role of position in visual object recognition remains essentially unknown.

Section snippets

What is visual object recognition?

We use a definition of visual object recognition similar to that of previous authors [1, 3]. For our purposes, ‘visual object’ refers to a conjunction of a complex set of visual features. Successful recognition of such an object requires that the response to a current percept be in some way consistent enough with the internal representation of a previous percept to at least partially invoke it [1, 7, 15]. This formulation of visual object recognition defines it, fundamentally, as a process of

The importance of position in object recognition

Given the comparison model of object recognition described above, there are two types of preexisting object representations that might underlie object recognition in the context of position changes. Both make specific behavioral predictions about the degree to which experience with an object at one position will affect recognition during later presentations of that object at different positions (transfer).

The first possibility is that the preexisting representations are specific to the object

Position dependence in the ventral visual pathway

The cortical system supporting object recognition is often described as a ventral visual pathway extending from primary visual cortex (V1) through a series of hierarchical processing stages (V2–V4) to the anterior parts of the inferior temporal (IT) cortex [29], a region crucial for visual object recognition [30, 31]. Here we focus primarily on the response properties of neurons in monkey IT, which respond selectively to visual objects.

Behavioral studies of position dependence

Physiological considerations (RFs, retinal sampling) suggest there should be some effect of translations on object recognition, especially those between hemifields and eccentricities. However, without the behavioral output of the system it is impossible to know whether these characteristics have a role in determining the degree of position dependence. Most of the formal models of object recognition (Box 1) attempt to implement some aspect of the physiology into their architecture. Thus, the

Concluding remarks

A complete understanding of object recognition requires the integration of physiological, computational and behavioral evidence. Although the current behavioral data argue against complete position independence, future research will need to address what factors (such as task or long-term experience) affect the degree of position dependence and which properties of IT neurons are reflected in behavior (Box 2).

Acknowledgements

We would like to thank Hans Op de Beeck, Marlene Behrmann, Lalitha Chandrasekher, Jim DiCarlo, Daniel Dilks, Stephanie Manchin, Alex Martin, Mortimer Mishkin, Julianne Rollenhagen and Rebecca Schwarzlose for their insightful comments on an earlier version of this manuscript. We would also like to thank Hans Op de Beeck for contributing to Figure 2.

References (89)

  • C.G. Gross et al.

    The neural basis of stimulus equivalence across retinal translation

  • I. Biederman

    High level object recognition without an anterior inferior temporal lobe

    Neuropsychologia

    (1997)
  • D.C. Van Essen

    The visual field representation in striate cortex of the macaque monkey: asymmetries, anisotropies, and individual variability

    Vision Res.

    (1984)
  • U. Hasson

    Eccentricity bias as an organizing principle for human high-order object areas

    Neuron

    (2002)
  • F.N. Newell

    The interaction of shape- and location-based priming in object categorisation: Evidence for a hybrid “what plus where” representation stage

    Vision Res.

    (2005)
  • M. Dill et al.

    Display symmetry affects positional specificity in same-different judgment of pairs of novel visual patterns

    Vision Res.

    (1999)
  • K.R. Cave

    The representation of location in visual images

    Cognit. Psychol.

    (1994)
  • M. Ahissar et al.

    The reverse hierarchy theory of visual perceptual learning

    Trends Cogn. Sci.

    (2004)
  • G. Wallis et al.

    Invariant face and object recognition in the visual system

    Prog. Neurobiol.

    (1997)
  • M. Riesenhuber et al.

    Neural mechanisms of object recognition

    Curr. Opin. Neurobiol.

    (2002)
  • S. Edelman

    Constraining the neural representation of the visual world

    Trends Cogn. Sci.

    (2002)
  • G.A. Rousselet

    How parallel is visual processing in the ventral pathway?

    Trends Cogn. Sci.

    (2004)
  • A. Oliva et al.

    The role of context in object recognition

    Trends Cogn. Sci.

    (2007)
  • I. Biederman

    Scene perception: detecting and judging objects undergoing relational violations

    Cognit. Psychol.

    (1982)
  • S. Edelman

    Representations and Recognition in Vision

    (1999)
  • M. Riesenhuber et al.

    Models of object recognition

    Nat. Neurosci.

    (2000)
  • S. Ullman

    High-Level Vision: Object Recognition and Visual Cognition

    (1997)
  • M. Riesenhuber et al.

    Hierarchical models of object recognition in cortex

    Nat. Neurosci.

    (1999)
  • D. Marr et al.

    Representation and recognition of the spatial organization of three-dimensional shapes

    Proc. R. Soc. Lond. B. Biol. Sci.

    (1978)
  • D. Marr

    Vision

    (1982)
  • D.D. Cox

    ‘Breaking’ position-invariant object recognition

    Nat. Neurosci.

    (2005)
  • M. Graf

    Coordinate transformations in object recognition

    Psychol. Bull.

    (2006)
  • M. Dill et al.

    Imperfect invariance to object translation in the discrimination of complex shapes

    Perception

    (2001)
  • M. Dill et al.

    The role of visual field position in pattern-discrimination learning

    Proc. Biol. Sci.

    (1997)
  • M. Dill et al.

    Limited translation invariance of human visual pattern recognition

    Percept. Psychophys.

    (1998)
  • D.H. Foster et al.

    Internal representations and operations in the visual comparison of transformed patterns: effects of pattern point-inversion, position symmetry, and separation

    Biol. Cybern.

    (1985)
  • T.A. Nazir et al.

    Some results on translation invariance in the human visual system

    Spat. Vis.

    (1990)
  • P. Cavanagh

    Size and position invariance in the visual system

    Perception

    (1978)
  • I. Biederman et al.

    Evidence for complete translational and reflectional invariance in visual object priming

    Perception

    (1991)
  • R. Ellis

    Varieties of object constancy

    Q. J. Exp. Psychol.

    (1989)
  • J.J. DiCarlo et al.

    Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position

    J. Neurophysiol.

    (2003)
  • J.E. Hummel et al.

    Dynamic binding in a neural network for shape recognition

    Psychol. Rev.

    (1992)
  • B.A. Olshausen

    A multiscale dynamic routing circuit for forming size-invariant and position-invariant object representations

    J. Comput. Neurosci.

    (1995)
  • T. Serre

    A feedforward architecture accounts for rapid categorization

    Proc. Natl. Acad. Sci. U. S. A.

    (2007)