The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks
Introduction
There is enormous variability in the visual appearance of objects, yet we can rapidly recognize them without effort, even under difficult viewing conditions (DiCarlo & Cox, 2007; Potter et al., 2014). Evidence from neurophysiological studies in humans suggests that visual object representations emerge within the first 150 ms of visual processing (Thorpe et al., 1996; Carlson et al., 2013; Cichy et al., 2014). For example, the specific identity of objects can be decoded from the magnetoencephalography (MEG) signal with high accuracy around 100 ms (Cichy et al., 2014). However, knowing when discriminative information about visual objects is available does not tell us about the nature of those representations, in particular whether they primarily reflect (low-level) visual features or (high-level) conceptual aspects of the objects (Clarke et al., 2014). To address this issue, in this study we employed multivariate MEG decoding and model-based representational similarity analysis (RSA) to elucidate the nature of object representations over time.
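The time-resolved decoding logic can be sketched in a few lines. The study's actual pipeline used support vector machines (LIBSVM appears in the references); the sketch below substitutes a dependency-free correlation-based nearest-centroid classifier, trained and tested separately at each timepoint with cross-validation. Function names and the simulated data are hypothetical.

```python
import numpy as np

def decode_timecourse(X, y, n_folds=4, seed=0):
    """Time-resolved decoding of MEG-like data.

    X : (n_trials, n_channels, n_times) array; y : (n_trials,) labels.
    At each timepoint, a correlation-based nearest-centroid classifier
    (a simple stand-in for the SVM typically used in MEG decoding) is
    trained and tested with k-fold cross-validation.
    Returns per-timepoint classification accuracy.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    n_trials, _, n_times = X.shape
    folds = np.array_split(rng.permutation(n_trials), n_folds)
    acc = np.zeros(n_times)
    for t in range(n_times):
        correct, total = 0, 0
        for k, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != k])
            Xt, yt = X[train, :, t], y[train]
            centroids = np.stack([Xt[yt == c].mean(axis=0) for c in classes])
            for i in test:
                # assign the class whose centroid correlates most with the trial
                r = [np.corrcoef(X[i, :, t], c)[0, 1] for c in centroids]
                correct += classes[int(np.argmax(r))] == y[i]
                total += 1
        acc[t] = correct / total
    return acc
```

On simulated data in which class-specific channel patterns appear only after a given timepoint, accuracy rises from chance to well above it at that latency, mirroring the onset of decodable object information described above.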
Previous studies have demonstrated increasing category specificity (van de Nieuwenhuijzen et al., 2013; Cichy et al., 2014), tolerance for position and size (Isik et al., 2014), and semantic information (Clarke et al., 2013) over the first 200 ms following stimulus onset, suggesting some degree of abstraction from low-level visual features. However, identifying the nature of object representations is an inherently difficult problem: low-level features may themselves be predictive of object identity, making it hard to disentangle the relative contributions of low- and high-level properties to measured brain signals (Groen et al., 2017). In this study, we addressed this problem by combining tests for the generalization of object representations with methods to separate the independent contributions of low- and high-level properties. We focused on two specific criteria that would need to be fulfilled for a representation to be considered conceptual. First, a conceptual representation should generalize beyond the specific exemplar presented, not just variations of the same exemplar. Second, a conceptual representation should also reflect high-level behavioral judgments about objects (Clarke & Tyler, 2015; Wardle et al., 2016). We consider fulfillment of these two criteria a lower bound for regarding a representation as conceptual.
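The first criterion amounts to a cross-decoding test: fit a classifier on trials from one exemplar set and evaluate it on trials from a second, visually different exemplar set of the same object concepts. A minimal sketch under simplified assumptions (hypothetical names, simulated data, and a correlation-based nearest-centroid classifier rather than the study's SVMs):

```python
import numpy as np

def cross_exemplar_score(X_train, y_train, X_test, y_test):
    """Fit object-concept centroids on one exemplar set and classify
    trials from a different exemplar set of the same concepts.
    Above-chance accuracy indicates a representation that generalizes
    beyond the specific images shown. X_* : (n_trials, n_features)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    preds = []
    for x in X_test:
        # pick the concept whose training centroid correlates most with the trial
        r = [np.corrcoef(x, c)[0, 1] for c in centroids]
        preds.append(classes[int(np.argmax(r))])
    return float(np.mean(np.array(preds) == y_test))
```

If responses carry a concept-level component shared across exemplar sets, cross-exemplar accuracy exceeds chance even though the classifier never saw the test exemplars.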
We collected MEG and behavioral data from 32 participants, allowing us to probe the temporal dynamics of conceptual object representations according to the two criteria above. To test for generalization across specific exemplars, we assessed the reliability of object representations across two independent sets of objects. Further, we assessed the relation of those object representations to behavior by comparing participants' behavioral judgments with the MEG response patterns using RSA. Importantly, to isolate the relative contributions of low-level and conceptual properties to those MEG responses, we identified the variance uniquely explained by behavioral judgments while controlling for low-level representations, which we modeled using early layers of a deep neural network; such layers have been shown to capture low- to mid-level responses in fMRI and monkey ventral visual cortex (Cadieu et al., 2014; Cichy et al., 2016a; Eickenberg et al., 2017; Güçlü and van Gerven, 2015; Khaligh-Razavi and Kriegeskorte, 2014; Yamins et al., 2014; Wen et al., 2017). Finally, to achieve a more interpretable understanding of the contribution of behavior to MEG responses, we identified the unique and shared variance explained in the MEG response by behavior and two high-level conceptual models, one perceptual (upper layers of a deep neural network) and one semantic (based on word co-occurrence statistics).
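The variance-partitioning step can be sketched with ordinary least squares on the vectorized upper triangles of representational dissimilarity matrices (RDMs): the variance a model uniquely explains is the drop in R² when that model is removed from the full regression. This is an illustrative reconstruction under simplified assumptions, not the authors' analysis code.

```python
import numpy as np

def upper_tri(rdm):
    """Vectorize the upper triangle of a representational dissimilarity matrix."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

def unique_variance(meg_rdm, target_rdm, control_rdms):
    """Variance in the MEG RDM explained by the target model over and
    above the control models (hierarchical regression)."""
    y = upper_tri(meg_rdm)
    controls = [upper_tri(c) for c in control_rdms]
    full = r_squared(y, [upper_tri(target_rdm)] + controls)
    reduced = r_squared(y, controls)
    return full - reduced
```

With a behavioral RDM and a DNN-layer RDM as predictors, the unique variance of behavior is what remains attributable to behavioral judgments after the DNN model has had its share.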
Participants
Thirty-two healthy participants (18 female; mean age 25.8 years, range 19–47) with normal or corrected-to-normal vision took part in this study. As part of a pilot experiment used for purely illustrative purposes (see Figure 4a), 8 participants (5 overlapping with the main sample) completed the same behavioral task with a different set of stimuli. All participants gave written informed consent prior to participation as part of the study protocol (93-M-0170, NCT00001360). The study was approved by the Institutional Review Board.
Results
Our aim in this study was to characterize the emergence of conceptual representations for visual objects. We applied multivariate decoding and representational similarity analysis to MEG data to examine (1) how object representations generalize across time and object exemplars and (2) the unique and shared contributions of behavioral judgments to measured MEG responses. The resulting temporal profiles inform us about stages of object processing from low-level visual to conceptual representations.
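Generalization across time is typically assessed with a temporal generalization matrix (King & Dehaene, 2014, cited in the references): a classifier trained at each timepoint is tested at every timepoint, so sustained off-diagonal accuracy indicates a maintained representational format. A minimal sketch, again using a correlation-based nearest-centroid classifier on simulated data rather than the study's actual SVM pipeline:

```python
import numpy as np

def temporal_generalization(X, y, train_idx, test_idx):
    """Temporal generalization matrix: fit a correlation-based
    nearest-centroid classifier at each training timepoint and test it
    at every timepoint. X : (n_trials, n_channels, n_times).
    Returns an accuracy matrix of shape (n_times, n_times)."""
    classes = np.unique(y)
    n_times = X.shape[2]
    acc = np.zeros((n_times, n_times))
    for t_train in range(n_times):
        Xt, yt = X[train_idx, :, t_train], y[train_idx]
        centroids = np.stack([Xt[yt == c].mean(axis=0) for c in classes])
        for t_test in range(n_times):
            hits = 0
            for i in test_idx:
                # evaluate the t_train classifier on data from t_test
                r = [np.corrcoef(X[i, :, t_test], c)[0, 1] for c in centroids]
                hits += classes[int(np.argmax(r))] == y[i]
            acc[t_train, t_test] = hits / len(test_idx)
    return acc
```

A narrow diagonal indicates rapidly changing representations, whereas a square block of above-chance accuracy indicates a representation that is stable over that window.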
Discussion
In this study, we investigated the temporal evolution of visual object representations. In particular, we focused on determining a lower bound for the emergence of conceptual representations of objects. We proposed two criteria that a conceptual representation should fulfill: 1) generalization of representations between different exemplars of the same object, and 2) a relationship to high-level behavioral judgments. We find qualitatively different processing of objects over time: early responses predominantly reflected low-level visual features, whereas later responses increasingly reflected conceptual properties related to behavioral judgments.
Conflicts of interest
The authors declare no competing financial interests.
Acknowledgements
This work was supported by the Intramural Research Program of the National Institute of Mental Health (ZIA-MH-002909) - National Institute of Mental Health Clinical Study Protocol 93-M-0170, NCT00001360, a Feodor-Lynen fellowship of the Humboldt Foundation to M.N.H., and a Rubicon Fellowship from the Netherlands Organisation for Scientific Research to I.I.A.G.
References (56)
- Understanding what we see: how we derive meaning from vision. Trends Cognit. Sci. (2015)
- Decoding the time-course of object recognition in the human brain: from visual features to categorical decisions. Neuropsychologia (2017)
- Untangling invariant object recognition. Trends Cognit. Sci. (2007)
- Seeing it all: convolutional network layers map the function of the human visual system. Neuroimage (2017)
- Multivariate pattern analysis for MEG: a comparison of dissimilarity measures. Neuroimage (2018)
- Characterizing the dynamics of mental representations: the temporal generalization method. Trends Cognit. Sci. (2014)
- MEG-based decoding of the spatiotemporal dynamics of visual category perception. Neuroimage (2013)
- Perceptual similarity of visual patterns predicts dynamic neural activation patterns measured with MEG. Neuroimage (2016)
- The Psychophysics Toolbox. Spat. Vis. (1997)
- Concreteness ratings for 40 thousand generally known English word lemmas. Behav. Res. Meth. (2014)
- Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput. Biol.
- Representational dynamics of object vision: the first 1000 ms. J. Vis.
- LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol.
- Return of the devil in the details: delving deep into convolutional nets. Br. Mach. Vis. Conf.
- Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep.
- Neural dynamics of real-world object vision that guide behavior. bioRxiv
- Resolving human object recognition in space and time. Nat. Neurosci.
- Similarity-based fusion of MEG and fMRI reveals spatio-temporal dynamics in human cortex during visual object recognition. Cerebr. Cortex
- Predicting the time course of individual objects with MEG. Cerebr. Cortex
- From perception to conception: how meaningful objects are processed over time. Cerebr. Cortex
- The role of visual and semantic properties in the emergence of category-specific patterns of neural response in the human brain. eNeuro
- The Corpus of Contemporary American English (COCA): 520 Million Words, 1990–present
- An efficient method for obtaining similarity data. Behav. Res. Methods Instrum. Comput.
- Deep neural networks reveal a gradient in the complexity of representations across the ventral stream. J. Neurosci.
- Visual scenes are categorized by function. J. Exp. Psychol.
- Spatially pooled contrast responses predict neural and perceptual similarity of naturalistic image categories. PLoS Comput. Biol.
- From image statistics to scene gist: evoked neural activity reveals transition from natural image structure to scene category. J. Neurosci.
- Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. eLife
1 Equal contribution.