Elsevier

NeuroImage

Volume 178, September 2018, Pages 172-182

The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks

https://doi.org/10.1016/j.neuroimage.2018.05.037

Highlights

  • Used MEG to reveal lower bound for emergence of conceptual object representations.

  • Two criteria: between-exemplar generalization and relationship to behavior.

  • MEG pattern similarity between exemplars rises after 160 ms.

  • Model of behavior explains unique MEG variance after 150 ms.

  • Earlier MEG response well captured by early layer of deep neural network model.

Abstract

Visual object representations are commonly thought to emerge rapidly, yet it has remained unclear to what extent early brain responses reflect purely low-level visual features of these objects and how strongly those features contribute to later categorical or conceptual representations. Here, we aimed to estimate a lower temporal bound for the emergence of conceptual representations by defining two criteria that characterize such representations: 1) conceptual object representations should generalize across different exemplars of the same object, and 2) these representations should reflect high-level behavioral judgments. To test these criteria, we compared magnetoencephalography (MEG) recordings between two groups of participants (n = 16 per group) exposed to different exemplar images of the same object concepts. Further, we disentangled low-level from high-level MEG responses by estimating the unique and shared contributions of models of behavioral judgments, semantics, and different layers of deep neural networks of visual object processing. We find that 1) both generalization across exemplars and generalization of object-related signals across time increase after 150 ms, peaking around 230 ms, and 2) representations specific to behavioral judgments emerge rapidly, peaking around 160 ms. Collectively, these results suggest a lower bound for the emergence of conceptual object representations around 150 ms following stimulus onset.

Introduction

There is enormous variability in the visual appearance of objects, yet we can rapidly recognize them without effort, even under difficult viewing conditions (DiCarlo & Cox, 2007; Potter et al., 2014). Evidence from neurophysiological studies in humans suggests the emergence of visual object representations within the first 150 ms of visual processing (Thorpe et al., 1996; Carlson et al., 2013; Cichy et al., 2014). For example, the specific identity of objects can be decoded from the magnetoencephalography (MEG) signal with high accuracy around 100 ms (Cichy et al., 2014). However, knowing when discriminative information about visual objects is available does not inform us about the nature of those representations, in particular whether they primarily reflect (low-level) visual features or (high-level) conceptual aspects of the objects (Clarke et al., 2014). To address this issue, in this study we employed multivariate MEG decoding and model-based representational similarity analysis (RSA) to elucidate the nature of object representations over time.
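The time-resolved multivariate decoding referred to above can be sketched as follows. This is a minimal illustration using scikit-learn on simulated data shaped like MEG epochs (trials × sensors × timepoints); the function and variable names are our own, not the authors' actual pipeline:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def time_resolved_decoding(X, y, n_folds=5):
    """Decode stimulus labels at each timepoint of an MEG epoch.

    X: array (n_trials, n_sensors, n_times); y: labels (n_trials,).
    Returns the mean cross-validated accuracy per timepoint.
    """
    n_times = X.shape[2]
    acc = np.empty(n_times)
    for t in range(n_times):
        # Fit and evaluate a linear SVM on the sensor pattern at time t
        clf = SVC(kernel="linear")
        acc[t] = cross_val_score(clf, X[:, :, t], y, cv=n_folds).mean()
    return acc
```

Plotting such an accuracy time course against stimulus onset is what underlies statements like "object identity can be decoded around 100 ms": accuracy is at chance before the information arrives and rises once sensor patterns discriminate the stimuli.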

Previous studies have demonstrated increasing category specificity (van de Nieuwenhuijzen et al., 2013; Cichy et al., 2014), tolerance for position and size (Isik et al., 2014) and semantic information (Clarke et al., 2013) over the first 200 ms following stimulus onset, suggesting some degree of abstraction from low-level visual features. However, identifying the nature of object representations is an inherently difficult problem: low-level features may be predictive of object identity, making it hard to disentangle the relative contribution of low and high-level properties to measured brain signals (Groen et al., 2017). In this study, we addressed this problem by combining tests for the generalization of object representations with methods to separate the independent contributions of low- and high-level properties. We focused on two specific criteria that would need to be fulfilled for a representation to be considered conceptual. First, a conceptual representation should generalize beyond the specific exemplar presented, not just variations of the same exemplar. Second, a conceptual representation should also reflect high-level behavioral judgments about objects (Clarke and Tyler, 2015; Wardle et al., 2016). We consider fulfillment of these two properties to provide a lower bound at which a representation could be considered conceptual.

We collected MEG and behavioral data from 32 participants, allowing us to probe the temporal dynamics of conceptual object representations according to the two criteria above. To test for generalization across specific exemplars, we assessed the reliability of object representations across two independent sets of objects. Further, we assessed the relation of those object representations to behavior by comparing participants' behavioral judgments with the MEG response patterns using RSA. Importantly, to isolate the relative contributions of low-level and conceptual properties to those MEG responses, we identified the variance uniquely explained by behavioral judgments, isolating low-level representations using early layers of a deep neural network, which have been shown to capture low- to mid-level responses in fMRI and monkey ventral visual cortex (Cadieu et al., 2014; Cichy et al., 2016a; Eickenberg et al., 2017; Güçlü and van Gerven, 2015; Khaligh-Razavi and Kriegeskorte, 2014; Yamins et al., 2014; Wen et al., 2017). Finally, to achieve a more interpretable understanding of the contribution of behavior to MEG responses, we identified the unique and shared variance explained in the MEG response by behavior and two high-level conceptual models, one perceptual (upper layers in a deep neural network) and one semantic (based on word co-occurrence statistics).
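The variance-partitioning logic described here can be sketched as a hierarchical regression on representational dissimilarity matrices (RDMs): the unique contribution of one model is the drop in explained variance when that model is removed from the full predictor set. The sketch below is a minimal illustration with names of our own choosing; the paper's actual analysis may differ in estimator and cross-validation details.

```python
import numpy as np
from scipy.spatial.distance import squareform
from sklearn.linear_model import LinearRegression

def unique_variance(meg_rdm, model_rdms, target):
    """Variance in the MEG RDM uniquely explained by one model RDM.

    meg_rdm: (n, n) dissimilarity matrix at one timepoint.
    model_rdms: dict mapping model name -> (n, n) model RDM.
    target: name of the model whose unique contribution is estimated.
    Returns R^2(full model) - R^2(full model without `target`).
    """
    # Vectorize the lower triangle of each RDM for regression
    y = squareform(meg_rdm, checks=False)

    def r_squared(names):
        X = np.column_stack(
            [squareform(model_rdms[name], checks=False) for name in names]
        )
        return LinearRegression().fit(X, y).score(X, y)

    full = r_squared(list(model_rdms))
    reduced = r_squared([n for n in model_rdms if n != target])
    return full - reduced
```

Applied at every MEG timepoint, the same function yields a time course of the variance uniquely explained by, for example, behavioral judgments versus an early DNN layer, which is the quantity behind the "unique and shared contributions" reported in the paper.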

Section snippets

Participants

32 healthy participants (18 female; mean age 25.8 years, range 19–47) with normal or corrected-to-normal vision took part in this study. As part of a pilot experiment used for purely illustrative purposes (see Figure 4a), 8 participants (5 overlapping) completed the same behavioral task with a different set of stimuli. All participants gave written informed consent prior to participation in the study as part of the study protocol (93-M-0170, NCT00001360). The study was approved by the Institutional

Results

Our aim in this study was to characterize the emergence of conceptual representations for visual objects. We applied multivariate decoding and representational similarity analysis to MEG data to examine (1) how object representations generalize across time and object exemplars, and (2) the unique and shared contributions of behavioral judgments to measured MEG responses. The resulting temporal profiles inform us about stages of object processing from low-level visual to conceptual
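The generalization-across-time analysis referenced above (train a classifier at one timepoint and test it at all others, yielding a temporal generalization matrix) can be sketched as follows. This is a minimal illustration on simulated data; the names are our own, not the authors' pipeline:

```python
import numpy as np
from sklearn.svm import SVC

def temporal_generalization(X_train, y_train, X_test, y_test):
    """Train a classifier at each timepoint, test it at every timepoint.

    X_*: arrays (n_trials, n_sensors, n_times); y_*: labels (n_trials,).
    Returns an (n_times, n_times) accuracy matrix; broad off-diagonal
    generalization indicates a sustained representational format.
    """
    n_times = X_train.shape[2]
    gat = np.empty((n_times, n_times))
    for t in range(n_times):
        clf = SVC(kernel="linear").fit(X_train[:, :, t], y_train)
        for u in range(n_times):
            gat[t, u] = clf.score(X_test[:, :, u], y_test)
    return gat
```

A sharply diagonal matrix indicates rapidly changing representations, whereas a square region of above-chance generalization, as reported here after 150 ms, indicates a representational format that is maintained across time.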

Discussion

In this study, we investigated the temporal evolution of visual object representations. In particular we focused on determining a lower bound for the emergence of conceptual representations of objects. We proposed two criteria that would reflect conceptual representations: 1) generalization of representations between different exemplars of the same object, and 2) relationship to high-level behavioral judgments. We find qualitatively different processing of objects over time: Early responses

Conflicts of interest

The authors declare no competing financial interests.

Acknowledgements

This work was supported by the Intramural Research Program of the National Institute of Mental Health (ZIA-MH-002909) - National Institute of Mental Health Clinical Study Protocol 93-M-0170, NCT00001360, a Feodor-Lynen fellowship of the Humboldt Foundation to M.N.H., and a Rubicon Fellowship from the Netherlands Organisation for Scientific Research to I.I.A.G.

References (56)

  • C.F. Cadieu et al.

    Deep neural networks rival the representation of primate IT cortex for core visual object recognition

    PLoS Comput. Biol.

    (2014)
  • T.A. Carlson et al.

    Representational dynamics of object vision: the first 1000 ms

    J. Vis.

    (2013)
  • C.C. Chang et al.

    LIBSVM: a library for support vector machines

ACM Trans. Intell. Syst. Technol.

    (2011)
  • K. Chatfield et al.

    Return of the devil in the details: delving deep into convolutional nets

Br. Mach. Vis. Conf.

    (2014)
  • R.M. Cichy et al.

Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence

    Sci. Rep.

    (2016)
  • R.M. Cichy et al.

    Neural dynamics of real-world object vision that guide behavior

    bioRxiv

    (2017)
  • R.M. Cichy et al.

    Resolving human object recognition in space and time

    Nat. Neurosci.

    (2014)
  • R.M. Cichy et al.

    Similarity-based fusion of MEG and fMRI reveals spatio-temporal dynamics in human cortex during visual object recognition

    Cerebr. Cortex

    (2016)
  • A. Clarke et al.

    Predicting the time course of individual objects with MEG

    Cerebr. Cortex

    (2014)
  • A. Clarke et al.

    From perception to conception: how meaningful objects are processed over time

    Cerebr. Cortex

    (2013)
  • D.D. Coggan et al.

    The role of visual and semantic properties in the emergence of category-specific patterns of neural response in the human brain

    eNeuro

    (2016)
  • M. Davies

    The Corpus of Contemporary American English (COCA): 520 Million Words, 1990-present

    (2008)
  • R. Goldstone

    An efficient method for obtaining similarity data

    Behav. Res. Methods Instrum. Comput.

    (1994)
  • U. Güçlü et al.

    Deep neural networks reveal a gradient in the complexity of representations across the ventral stream

    J. Neurosci.

    (2015)
  • M.R. Greene et al.

    Visual scenes are categorized by function

    J. Exp. Psychol.

    (2016)
  • I.I.A. Groen et al.

    Spatially pooled contrast responses predict neural and perceptual similarity of naturalistic image categories

    PLoS Comput. Biol.

    (2012)
  • I.I.A. Groen et al.

    From image statistics to scene gist: evoked neural activity reveals transition from natural image structure to scene category

    J. Neurosci.

    (2013)
  • I.I.A. Groen et al.

    Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior

    eLife

    (2018)