Review
Evidence from auditory and visual event-related potential (ERP) studies of deviance detection (MMN and vMMN) linking predictive coding theories and perceptual object representations☆
Highlights
► A joint review of electric brain potentials of auditory and visual deviance detection is provided.
► Electric brain signals elicited by sensory deviance are suggested to reflect prediction errors.
► Generative models of sensory regularities are proposed to serve as perceptual object representations.
► Sensory deviance detection results are interpreted in terms of predictive coding theories.
Introduction
Helmholtz's (1860/1962) notion of unconscious inference engendered what is arguably the most fruitful line of perceptual research in the relatively short history of psychology: the empiricist tradition. In one of its contemporary variants, Gregory (1980) suggests that perceptions are akin to scientific hypotheses: each is the brain's best-fitting model for the information entering the senses. But, together with Gordon (1997), we can ask how these models are formed, what evidence they are tested against, and how they adapt to an ever-changing environment. To answer these questions, some theories of predictive coding (Creutzig and Sprekeler, 2008, Dayan et al., 1995, Friston, 2005, Friston, 2010, Hohwy, 2007, Mumford, 1992, Rao and Ballard, 1999, Schütz-Bosbach and Prinz, 2007) invoke the principle of free-energy minimization (e.g., Friston, 2005, Friston, 2010).
Predictive coding theories suggest that the perceptual system's primary objective is to minimize the discrepancy between the predictions of its internal generative models of the environment and the actual sensory input. The models form a hierarchy of increasing abstraction: predictions from each level are tested against the data emerging one level lower, and the difference (termed the “error signal” or “prediction error”) is passed upwards in the hierarchy (for non-mathematical descriptions, see Baldeweg, 2007, Hohwy et al., 2008). The error signal then governs model selection/adjustment in order to minimize prediction error throughout the system. Thus, predictive coding theories implement the analysis-by-synthesis principle (Neisser, 1967, Yuille and Kersten, 2006) and conform to the notion of gist-first processing suggested by some recent theories of perception (Ahissar and Hochstein, 2004, Bar, 2004, Bar, 2007), whereby higher-level (more general) models govern the interpretation (model selection) at lower levels. Predictive coding theories acknowledge the stochastic nature of the information entering the senses, a notion long argued by an early theorist of perception, Egon Brunswik (1956). Because they deal with probability distributions instead of discrete values, predictive coding theories assume that the brain follows Bayesian inference rules in model selection (Kersten et al., 2004, Knill and Pouget, 2004, Yuille and Kersten, 2006). Models based on hierarchical Bayesian inference using hierarchical generative models represent a recent development in the field (Friston and Kiebel, 2009, Lee and Mumford, 2003). In a hierarchical setting, the predictions from higher levels play the role of empirical priors on representations at lower levels. This resolves concerns about where priors come from and makes (empirical) priors accountable to sensory data.
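To make the notion of an empirical prior concrete, the sketch below assumes, for illustration only, that both the top-down prediction and the sensory evidence are Gaussian over a single feature such as tone frequency in Hz. Under this assumption the posterior mean moves from the prediction toward the input by the prediction error scaled by a precision-based gain. The names `Gaussian` and `combine` are hypothetical and are not taken from any of the cited models.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float
    precision: float  # inverse variance


def combine(prior: Gaussian, observation: float, sensory_precision: float) -> tuple[Gaussian, float]:
    """Update a top-down prediction (empirical prior) with one sensory sample.

    Assumes Gaussian prior and Gaussian sensory noise (illustrative assumption).
    Returns the posterior belief and the raw prediction error that would be
    passed up the hierarchy.
    """
    prediction_error = observation - prior.mean
    posterior_precision = prior.precision + sensory_precision
    # Gain: share of the error that revises the belief; a precise prior gets a small gain
    gain = sensory_precision / posterior_precision
    posterior_mean = prior.mean + gain * prediction_error
    return Gaussian(posterior_mean, posterior_precision), prediction_error


posterior, error = combine(Gaussian(1000.0, 1.0), 1100.0, 1.0)
print(posterior, error)
```

For example, a prediction of 1000 Hz and an input of 1100 Hz with equal precisions yield a prediction error of 100 Hz and a posterior mean of 1050 Hz. Raising the prior precision shrinks the gain, so a confident higher-level prediction is revised less by the same error.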
Thus, sensory data are used to update the evaluation (the probability of correctness) of existing models. Ultimately, the model with the highest probability of being correct determines the (conscious) percept.
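Selection among discrete generative models can be sketched in the same spirit. The three models below are hypothetical categorical distributions over two tone labels, `"A"` and `"B"`; each observation multiplies a model's probability by the likelihood that model assigns to it, and the most probable model after normalization is taken as the percept. Working in log space is an implementation choice that avoids numerical underflow on long sequences, not a claim about neural coding.

```python
import math


def update_posteriors(priors: dict[str, float],
                      likelihoods: dict[str, dict[str, float]],
                      observations: list[str]) -> dict[str, float]:
    """Sequential Bayesian update of model probabilities, computed in log space."""
    log_post = {m: math.log(p) for m, p in priors.items()}
    for obs in observations:
        for m in log_post:
            log_post[m] += math.log(likelihoods[m][obs])
    # Normalize with the log-sum-exp trick so the posteriors sum to 1
    max_lp = max(log_post.values())
    z = sum(math.exp(lp - max_lp) for lp in log_post.values())
    return {m: math.exp(lp - max_lp) / z for m, lp in log_post.items()}


def percept(posteriors: dict[str, float]) -> str:
    """The model with the highest posterior probability determines the percept."""
    return max(posteriors, key=posteriors.get)


# Hypothetical generative models of a tone sequence
MODELS = {
    "mostly_A": {"A": 0.9, "B": 0.1},
    "mostly_B": {"A": 0.1, "B": 0.9},
    "random": {"A": 0.5, "B": 0.5},
}
priors = {m: 1 / len(MODELS) for m in MODELS}
posteriors = update_posteriors(priors, MODELS, list("AAAAB"))
print(percept(posteriors), posteriors)
```

With equal priors and the sequence AAAAB, the `mostly_A` model wins with a posterior of about 0.68 even though the last tone was a B: one surprising observation lowers, but does not overturn, a well-supported model.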
Thus, according to these theories, the general makeup of the afferent system¹ comprises 1) neuronal circuits that implement the generative models and set up the lower levels of the hierarchy with their predictions and 2) circuits that determine prediction errors and pass them on to higher levels (Friston, 2005). However, whereas the determination of prediction errors is well specified, the makeup of the corresponding generative models is left largely unspecified beyond the principles of Bayesian inference. A consequence of this imbalance in detail between the two assumed functional units of predictive coding theories is that most neuroscience evidence interpreted in favor of predictive coding comes from neuronal activity showing the effects expected of prediction-error processing. The major sources of such evidence are single-cell data and simulations (Grill-Spector et al., 2006, Hosoya et al., 2005, Jehee and Ballard, 2009, Lee and Mumford, 2003, Wang et al., 2006), local field potentials (Kumar et al., in press), and large-scale brain responses (Alink et al., 2010, Aoyama et al., 2005, den Ouden et al., 2010, Murray et al., 2002); each shows reduced activity for predicted compared with unpredicted sensory input. There are also behavioral data compatible with what is expected of a system working on Bayesian principles (den Ouden et al., 2010, Ernst and Banks, 2002, Hohwy et al., 2008, Weiss et al., 2002, Yu, 2007). However, the representation and maintenance of the generative models have so far received less elaboration.
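The division into prediction circuits and error circuits can be illustrated with a linear generative model in the style of Rao and Ballard (1999). The linearity, the weight matrix `W`, the step size, and the function name `settle` are assumptions for this sketch, not the circuitry proposed by Friston (2005). Error units compute the mismatch between the input and the top-down prediction; representation units integrate that error until it is explained away. Input the model can predict leaves almost no residual error activity, whereas input outside its repertoire does not, mirroring the reduced responses to predicted input described above.

```python
import numpy as np


def settle(x, W, steps=200, lr=0.1):
    """Infer hidden causes r for input x under a linear generative model x ≈ W @ r.

    Returns the inferred causes and the final error-unit activity.
    """
    r = np.zeros(W.shape[1])        # representation units (model level)
    for _ in range(steps):
        e = x - W @ r               # error units: input minus top-down prediction
        r = r + lr * (W.T @ e)      # gradient step on 0.5 * ||e||^2 adjusts the representation
    return r, x - W @ r


# Hypothetical generative weights: the model "knows" only the first two input dimensions
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])

predicted = np.array([1.0, 2.0, 0.0, 0.0])    # lies in the span of W
unpredicted = np.array([0.0, 0.0, 1.0, 2.0])  # orthogonal to the span of W

r_pred, e_pred = settle(predicted, W)
r_unpred, e_unpred = settle(unpredicted, W)
print(np.linalg.norm(e_pred), np.linalg.norm(e_unpred))
```

Because the columns of this `W` are orthonormal, the representation converges geometrically for the predicted input and the residual error all but vanishes; for the unpredicted input no representation can account for the signal, so the error units stay fully active.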
Psychological theories agree that the overall function of perception is to discover the sources of the information entering the senses, because knowledge about these objects and events can be used to reach survival and reproduction goals (e.g., Brunswik, 1956). Thus, behavior is influenced by distal objects and events. Even when behavior is apparently controlled by a single feature (e.g., we pick up a cherry by its color), the feature belongs to an object. Therefore, psychological theories have long assumed the existence of brain representations for objects and suggested that incoming sensory information is stored and manipulated in such units in the brain. The question addressed here is how these representations relate to the multi-level generative models of predictive coding theories.
The representations inferred from studies measuring the mismatch negativity (MMN: Näätänen et al., 1978, for a recent review, see Näätänen et al., 2011) event-related brain potential (ERP) and its visual counterpart (vMMN: Tales et al., 1999, Heslenfeld, 2003, for recent reviews, see Czigler, 2007, Czigler, 2010, Kimura, 2012) may provide a useful link between these two views of perception. MMN and vMMN are elicited when the incoming stimulus violates a regularity detected in the preceding sequence. MMN was discovered in the context of the auditory oddball paradigm. Occasionally replacing a repeating sound (termed the “standard”) with a different one (termed the “deviant”) elicited a fronto-centrally negative ERP response (MMN) peaking between 100 and 200 ms from the onset of the deviance (typically the sound onset). MMN was initially described as an ERP correlate of detecting a mismatch between the memory trace of the repeating sound and that of the incoming one (Näätänen et al., 1978). Research in the past thirty years has demonstrated that MMN is also elicited by violations of regularities more complex than stimulus repetition, including regularities in which each sound is specified by the immediately preceding one (Paavilainen et al., 2007, Horváth et al., 2001). This and similar evidence, as well as a detailed analysis of the alternative interpretations (see Winkler, 2007), led to the hypothesis that 1) memory representations of the detected regularities are generative models providing predictions about upcoming sensory events and 2) MMN is elicited when the current stimulus does not match these predictions (Baldeweg, 2006, Baldeweg, 2007, Garrido et al., 2009c, Näätänen et al., 2011, Sinkkonen, 1999, Winkler, 2007, Winkler et al., 1996; see also Bendixen et al., 2012-this issue).
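A regularity in which each sound is specified by the immediately preceding one can be illustrated with a toy symbolic sketch. The rule used here (a short tone predicts a high tone, a long tone predicts a low tone), the deviant probability, and the names `RULE`, `make_sequence`, and `detect_violations` are hypothetical choices in the spirit of, but not copied from, the cited studies. The detector learns which pitch follows each duration and flags tones that contradict its prediction; it stands in for a predictive regularity representation and is not a model of the MMN waveform or its generators.

```python
import random
from collections import Counter, defaultdict

# Hypothetical feature-contingency rule: duration of tone i specifies pitch of tone i + 1
RULE = {"short": "high", "long": "low"}


def make_sequence(n, p_deviant=0.1, seed=0):
    """Return tones as (duration, pitch) pairs plus the indices of rule violations."""
    rng = random.Random(seed)
    seq = [(rng.choice(["short", "long"]), "high")]  # first pitch is arbitrary
    deviants = []
    for i in range(1, n):
        pitch = RULE[seq[-1][0]]  # pitch predicted by the preceding tone's duration
        if rng.random() < p_deviant:
            pitch = "low" if pitch == "high" else "high"  # violate the contingency
            deviants.append(i)
        seq.append((rng.choice(["short", "long"]), pitch))
    return seq, deviants


def detect_violations(seq, warmup=20):
    """Learn pitch counts per preceding duration online; flag tones contradicting the prediction."""
    counts = defaultdict(Counter)
    flagged = []
    for i in range(1, len(seq)):
        prev_duration = seq[i - 1][0]
        pitch = seq[i][1]
        if i > warmup and counts[prev_duration]:
            predicted = counts[prev_duration].most_common(1)[0][0]
            if pitch != predicted:
                flagged.append(i)  # stand-in for a deviance (MMN-like) signal
        counts[prev_duration][pitch] += 1
    return flagged


seq, true_deviants = make_sequence(200, p_deviant=0.1, seed=1)
print(len(true_deviants), detect_violations(seq)[:10])
```

Because the rule is defined over pairs of tones, neither pitch is rare in itself: both occur about equally often, and only the combination with the preceding duration makes a tone deviant.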
Winkler and Czigler (1998) further argued that the function of the MMN signal is to update the regularity representations violated by the deviant stimulus (see also Winkler, 2007). Thus, in terms of predictive coding theories, MMN can be regarded as a signal carrying the prediction error (Garrido et al., 2009c).
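The proposed updating function of the deviance signal can be illustrated with a roving-standard sequence, in which a deviant tone is repeated and thereby becomes the new standard. The sketch assumes a single scalar feature (tone frequency in Hz) and a fixed learning rate; both are illustrative assumptions rather than values from the cited work, and `roving_errors` is a hypothetical name. The absolute prediction error on each tone stands in for the deviance signal and is also used to update the violated regularity, so it is large for the first tone of a new train and shrinks as that tone repeats.

```python
def roving_errors(tones, learning_rate=0.5):
    """Delta-rule update of a single-feature regularity representation.

    The representation starts at the first tone; for every tone the absolute
    prediction error is recorded (a stand-in for the deviance signal) and the
    signed error then updates the representation.
    """
    prediction = tones[0]
    errors = []
    for tone in tones:
        error = tone - prediction
        errors.append(abs(error))
        prediction += learning_rate * error  # error signal updates the violated regularity
    return errors, prediction


errors, final_prediction = roving_errors([1000.0] * 5 + [1200.0] * 5)
print(errors, final_prediction)
```

For five 1000 Hz tones followed by five 1200 Hz tones, the recorded errors are 0 throughout the first train and then 200, 100, 50, 25 and 12.5 Hz, while the representation approaches the new standard (1193.75 Hz after the last tone).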
Based on the above interpretation of MMN, the memory representations reflected in the MMN ERP component may be compatible with the generative models assumed in predictive coding descriptions of perception. We previously suggested (Winkler, 2010, Winkler et al., 2009) that the representations inferred from MMN studies meet the criteria set for auditory object representations. Thus, results obtained in studies of the auditory and visual MMN may provide a link between the predictive coding view of perception and the psychological literature on perceptual object representations.
Here we review results of studies measuring the auditory and visual MMN that offer (indirect) evidence about the nature of perceptual object representations. The aims of the review are 1) to compare the characteristics of object representations in the two modalities and 2) to assess how well they fit into a generalized predictive coding account of perception.²
Object representations and (v)MMN
Objects serve as perceptual units, as was first emphasized by Gestalt psychologists (Köhler, 1947), and they are also the units of attentional selection (e.g., Duncan, 1984). The first difference between the two (auditory and visual) modalities lies in what constitutes this unit of representation. That is, what is a perceptual object? Whereas in vision, object representations unequivocally refer to physical objects in the environment, in the auditory modality, two different perceptual units can
Similarities and differences between auditory and visual object representations
In the previous section, we showed that both auditory and visual memory representations, as inferred from studies of deviance detection, possess the characteristics expected of perceptual object representations. The evidence described above painted a picture of similar representations across the two modalities. Here we briefly review the possible differences between the representations in the two modalities.
The vast majority of vMMN studies adapted designs developed for auditory research. This
Are perceptual object representations compatible with the generative models postulated by predictive coding theories?
In terms of predictive coding theories, the organism's knowledge about the world is encoded in generative models. In hierarchical predictive coding models, the system comprises nested levels, with error signals propagating upwards and predictions propagating downwards. This recurrent or reciprocal message passing among the levels of the hierarchical model enables the model to be optimized or adjusted, thereby selecting the best explanation for the current sensory input. No level has special
Summary
We reviewed evidence suggesting that the memory representations involved in auditory and visual deviance detection meet the criteria set for perceptual object representations. We discussed the similarities and differences between these representations in the two modalities. Finally, we hypothesized that the memory representations involved in deviance detection are closely related to the generative models assumed by predictive coding theories.
Acknowledgments
This work was supported by the European Community's Seventh Framework Programme FP7 (Challenge 2 — Cognitive Systems, Interaction, Robotics) under grant agreement 231168-SCANDLE (to I.W.) and the Hungarian National Research Fund (OTKA) under grant agreement 71600 (to I.C.).
References
The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences (2004).
Neuromagnetic analysis of effect of audition-based prediction on visual information processing. International Cong. Ser. (2005).
Repetition effects to sounds: evidence for predictive coding in the auditory system. Trends in Cognitive Sciences (2006).
The proactive brain: using analogies and associations to generate predictions. Trends in Cognitive Sciences (2007).
Rapid extraction of auditory feature contingencies. NeuroImage (2008).
Early electrophysiological indicators for predictive processing in audition. Int. J. Psychophysiol. (2012).
Visual prediction and perceptual expertise. Int. J. Psychophysiol. (2012).
Color categories affect pre-attentive color perception. Biological Psychology (2010).
Visual temporal window of interaction as revealed by the mismatch negativity event-related potential to stimulus omission. Brain Research (2006).
The novelty P3: an event-related brain potential (ERP) sign of the brain's evaluation of novelty. Neuroscience and Biobehavioral Reviews (2001).
Cortical circuits for perceptual inference. Neur. Net.
The functional anatomy of the MMN: a DCM study of the roving paradigm. NeuroImage.
Repetition suppression and plasticity in the human brain. NeuroImage.
The mismatch negativity: a review of underlying mechanisms. Clinical Neurophysiology.
Repetition and the brain: neural models of stimulus-specific effects. Trends in Cognitive Sciences.
The symbol grounding problem. Physica D.
Predictive coding explains binocular rivalry: an epistemological review. Cognition.
Simultaneously active pre-attentive representations of local and global rules for sound sequences. Cognitive Brain Research.
The temporal window of integration in elderly and young adults. Neurobiology of Aging.
Do N1/MMN, P3a, and RON form a strongly coupled chain reflecting the three stages of auditory distraction? Biological Psychology.
Long-term memory traces facilitate short-term memory trace formation in audition in humans. Neuroscience Letters.
Pre-attentive auditory processing of lexicality. Brain and Language.
The reviewing of object files: object-specific integration of information. Cognitive Psychology.
Visual mismatch negativity and unintentional temporal-context-based prediction in vision. Int. J. Psychophysiol.
The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences.
Processing abstract auditory features in the human auditory cortex. NeuroImage.
Cortical speech processing unplugged: a timely subcortico-cortical framework. Trends in Cognitive Sciences.
Visual mismatch negativity elicited by magnocellular system activation. Vision Research.
Auditory and visual objects. Cognitive
Perceptual magnet effect in the light of behavioral and psychophysiological data. Journal of the Acoustical Society of America.
Stimulus predictability reduces responses in primary visual cortex. Journal of Neuroscience.
Stimulus-specific adaptation occurs in the auditory thalamus. Journal of Neuroscience.
Event-related potentials to task-irrelevant changes in facial expressions. Behavioral and Brain Functions.
ERP repetition effects and mismatch negativity generation: a predictive coding perspective. Journal of Psychophysiology.
Visual objects in context. Nature Reviews Neuroscience.
Regularity extraction and application in dynamic auditory stimulus sequences. Journal of Cognitive Neuroscience.
The attentional blink demonstrates automatic deviance processing in vision. NeuroReport.
Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience.
Auditory streaming: competition among alternative organizations. Perception & Psychophysics.
Auditory Scene Analysis: The Perceptual Organization of Sound.
Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology.
Perception and the Representative Design of Psychological Experiments.
Prediction, cognition and the brain. Front. Human Neurosci.
A kind of auditory ‘primitive intelligence’ already present at birth. European Journal of Neuroscience.
Dysfunction of processing task-irrelevant emotional faces in major depressive disorder patients revealed by expression-related visual MMN. Neuroscience Letters.
Development of language specific phoneme representations in the infant brain. Nature Neuroscience.
Sensory memory — a tutorial review.
On short and long auditory stores. Psychological Bulletin.
Short- and long-term prerequisites of the mismatch negativity in the auditory event related potential (ERP). Journal of Experimental Psychology: Learning, Memory, and Cognition.
☆ Contribution to the Special Issue titled “Predictive information processing in the brain: Principles, neural mechanisms and models” edited by J. Todd, E. Schröger, and I. Winkler.