Brain and Cognition

Volume 65, Issue 2, November 2007, Pages 145-168

Top-down predictions in the cognitive brain

https://doi.org/10.1016/j.bandc.2007.06.007

Abstract

The human brain is not a passive organ simply waiting to be activated by external stimuli. Instead, we propose that the brain continuously employs memory of past experiences to interpret sensory information and predict the immediately relevant future. The basic elements of this proposal include analogical mapping, associative representations and the generation of predictions. This review concentrates on visual recognition as the model system for developing and testing ideas about the role and mechanisms of top-down predictions in the brain. We cover relevant behavioral, computational and neural aspects, explore links to emotion and action preparation, and consider clinical implications for schizophrenia and dyslexia. We then discuss the extension of the general principles of this proposal to other cognitive domains.

Introduction

As military tacticians have known for millennia, surprise and uncertainty are costly in terms of time and energy expenditures (Tzu, 2006). Sharing a view expressed in the past, we propose that a fundamental function of the brain is to predict proximate events, which facilitates interactions with external stimuli, conserves effort, and ultimately increases the chances of survival. Useful predictions typically do not arise de novo. Even innate abilities of the human brain, such as vision or language, usually require development, during which experience-based mapping of sensory inputs to their identities, and to the appropriate responses, takes place. For example, it is only after years of training that a baseball player is able to effectively anticipate a particular type of pitch, predict the trajectory of the ball by combining top-down information about a particular pitch with bottom-up perception of the rotation and speed of the ball, and correctly decide whether and how to swing at it—all in about half of a second. This ability to rely on stored knowledge and learned modes of behavior reduces the need to consider a large number of potential causes or courses of action, which enables quicker interpretation of endogenous and exogenous events, and faster, more precise, and less effortful responses.

Recent computational and theoretical work demonstrates how predictions can be integrated with sensory input to reduce the computational demands in perception (Bar, 2007, Engel et al., 2001, Grossberg, 1980, Ullman, 1995). However, the neuroanatomical mechanism underlying the generation and efficient representation of predictions in the brain is not completely understood. In this review, we will focus on how the brain generates predictions about the visual world and uses these predictions to allow an efficient and accurate interpretation of the environment. Though we largely concentrate on vision, the same general principles can be applied to other sensory modalities, as well as to more complex cognitive domains.

Picture an impatient driver attempting to overtake a slow-moving vehicle on a busy two-lane road. Much information has to be acquired and computed very quickly by the driver’s brain to avoid a collision. The visual system must parse the constantly changing two-dimensional image at the retinae into a coherent three-dimensional representation of the scene, in which the road, the objects on and surrounding it, and the position of the driver’s car are accurately identified. The velocities and trajectories of each vehicle in the vicinity must be estimated and changes in these parameters anticipated. Anticipating the movement of other vehicles on the road also involves judging the road conditions, the state of the traffic signals ahead, as well as the other drivers’ intentions based on their interpretations of these factors and their internal states. Many of these judgments are based not only on visual, but also on auditory, proprioceptive, and even olfactory and social cues (e.g., “will the driver of that junker in front of me emanating noxious smoke and loud music allow me to pass, or try to keep me in the opposite lane?”). Lastly, the forces applied to the car’s controls must be computed correctly and adjusted instantly based on both internal (proprioceptive) and external (the vehicle’s characteristics and road conditions) parameters. This multimodal information must be analyzed correctly and updated on a moment-to-moment basis to avoid an accident.

Even when we leave aside the other sensory and motor aspects of the task and just concentrate on vision, the complexity of the problem seems immense. The complete understanding of how the brain solves this problem so efficiently remains in many respects elusive, exemplified by our inability to implement anything close to human visual abilities in machine vision. Yet such situations are correctly assessed and successfully navigated every day by millions of drivers with diverse driving abilities, underscoring the nearly universal truth that we are “geniuses at vision” (Hoffman, 1998). We propose that this efficiency of vision arises in part from the integration of two interacting and cooperating mechanisms—a fast, coarse subsystem that initiates top-down predictions based on partially processed visual input, and a slower, finer subsystem that is guided by the rapidly activated predictions and refines them based on slower-arriving detailed information. Before delving into the details of how predictions may be effected in the brain to facilitate visual processing, we will briefly review the underlying neuroanatomical structure through which visual information flows, and consider how the various pathways process visual information.
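As a preview of the mechanism developed below, the following minimal sketch in Python (our illustration, not a claim about cortical implementation; the template set, low-pass cutoff, and correlation-based matching rule are simplifying assumptions) shows how a fast, coarse pass over low-spatial-frequency content can shortlist candidate identities that a slower, full-resolution pass then resolves:

import numpy as np

def low_pass(img, cutoff=0.1):
    # Keep only low spatial frequencies (a fraction of Nyquist): the coarse "gist".
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    mask = (xx / (w / 2)) ** 2 + (yy / (h / 2)) ** 2 <= cutoff ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def recognize(img, templates, shortlist_size=2):
    # Fast, coarse subsystem: match the blurred input against blurred templates.
    gist = low_pass(img)
    coarse = {name: np.corrcoef(gist.ravel(), low_pass(t).ravel())[0, 1]
              for name, t in templates.items()}
    shortlist = sorted(coarse, key=coarse.get, reverse=True)[:shortlist_size]
    # Slower, fine subsystem: full-resolution matching, restricted to the shortlist.
    fine = {name: np.corrcoef(img.ravel(), templates[name].ravel())[0, 1]
            for name in shortlist}
    return max(fine, key=fine.get)

rng = np.random.default_rng(0)
templates = {name: rng.standard_normal((64, 64))
             for name in ["stove", "dishwasher", "television", "umbrella"]}
noisy_view = templates["stove"] + 0.5 * rng.standard_normal((64, 64))
print(recognize(noisy_view, templates))  # -> stove

The point of the arrangement is economy: the expensive fine-grained comparison is performed only on the few candidates nominated by the fast, coarse pass.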

The primary output from the retina to the lateral geniculate nucleus (LGN) of the thalamus, and from the LGN to the primary visual cortex, comprises several parallel pathways with differing information-processing characteristics and conduction speeds. The magnocellular (M) pathway originates with the large parasol retinal ganglion cells, which collect input from a number of large (diffuse) bipolar cells (Boycott and Wassle, 1991, Kaplan, 2004). The retinal M cells project to the deep (1–2) layers of the LGN, which in turn send projections to layer 4Cα of the primary visual cortex. The M cells have large receptive fields, high luminance contrast gain, transient and temporally sensitive responses, and fast conduction velocities, but are not color-selective or able to resolve fine details (Kaplan and Shapley, 1986, Merigan and Maunsell, 1993, Schiller et al., 1979, Wiesel and Hubel, 1966). From V1, M neurons project to the thick-stripe regions of V2, continuing to the motion-processing region MT/V5, and then to the higher-order motion and attention regions in the temporal and posterior parietal cortex. The M projections comprise most of the dorsal, or “where”, visual stream, which subserves spatial vision, motion detection, attention, and action planning (Goodale and Milner, 1992, Ungerleider and Mishkin, 1982).

The second major pathway, the parvocellular (P) pathway, originates with the midget retinal ganglion cells, which typically receive information from just two (ON/OFF-center) midget bipolar cells connected to a single cone (Kaplan, 2004). The P ganglion cells project to the superficial (3–6) layers of the LGN and layers 4A and 4Cβ of V1 (Callaway, 2005, Perry et al., 1984). In part because of their anatomical connections, the P cells have small receptive fields, are color-sensitive, and have low gain for luminance contrast (Kaplan & Shapley, 1986). The latter property renders the P cells ineffective for achromatic stimuli below ∼8% contrast (Tootell, Hamilton, & Switkes, 1988). The P cells exhibit tonic response properties, low responsiveness to temporal modulation of stimuli, and low conduction velocities compared with those of the M cells (Kaplan, 2004, Wassle and Boycott, 1991). The P projections form much of the ventral visual (“what”) stream (Goodale and Milner, 1992, Ungerleider and Mishkin, 1982), which nonetheless includes some magnocellular inputs (Merigan & Maunsell, 1993). The ventral stream has a hierarchical architecture in which visual form information is analyzed in an increasingly complex fashion as it propagates anteriorly, from V1 to V2, V4, and lastly, to the complex form-analysis regions in the inferior temporal (IT) cortex. While V1 is sensitive to very basic, local visual properties such as orientation and retinotopic location (Hubel & Wiesel, 1962), neurons in the mid- and anterior IT respond to increasingly larger portions of the visual field and become sensitive to more abstract, viewpoint-, location-, and illumination-invariant properties (Gallant et al., 1996, Qiu and von der Heydt, 2005, Tanaka, 1996, Tanaka, 1997, Vogels et al., 2001). A number of late-stage regions in the lateral occipital and inferior temporal cortex specialize in the recognition of particular classes of visual stimuli, such as objects (Bar et al., 2001, Bar et al., 2006a, Grill-Spector et al., 2001, Ishai et al., 2000, Logothetis and Sheinberg, 1996, Malach et al., 1995, Tanaka, 1996), scenes (Aguirre et al., 1998, Epstein and Kanwisher, 1998), and faces (Allison et al., 1994b, Allison et al., 1994a, Kanwisher et al., 1997, Puce et al., 1995).

Less is known about the third, relatively recently discovered koniocellular (K) visual stream, which originates with the bistratified retinal ganglion cells (Hendry & Yoshioka, 1994). The K retinal ganglion cells terminate between the major (M and P) layers in the LGN, whence geniculate K projections are sent to the “CO blobs” (cytochrome oxidase-rich areas) in layers 2–3 of V1. The K cells are vastly outnumbered in both the LGN and V1 by the M and P cells (Callaway, 2005). The K cells appear to have varied spatial resolution, sensitivity to short-wavelength (blue) light, and perhaps higher luminance contrast gain than P cells (Hendry and Reid, 2000, Hendry and Yoshioka, 1994, Norton and Casagrande, 1982). Their conduction velocity appears to be fairly slow, comparable to that of P cells (Hendry and Reid, 2000, Hendry and Yoshioka, 1994, Livingstone and Hubel, 1988, Merigan and Maunsell, 1993). The diverse anatomical and functional characteristics of K cells make it difficult to isolate these cells and to understand their exact role in vision (Kaplan, 2004).

While the M and P projections still remain largely segregated in V1, the dorsal and ventral streams that they form have cross-connections at virtually every level of their processing hierarchies (DeYoe and Van Essen, 1988, Merigan and Maunsell, 1993). This sharing of different types of information presumably allows the integration of the information carried by the different streams into coherent percepts and recognition of stimuli with impoverished color or luminance contrast, albeit more slowly. Attention to the stimulus is thought to be required for this process (Treisman & Gelade, 1980). In addition to the connections between the two cortical visual streams, the cortical regions in each stream have connections with subcortical nuclei, such as the projections from posterior IT (monkey area TEO) to the superior colliculus, and from anterior IT (monkey area TE) to the pulvinar nucleus and the magnocellular division of the mediodorsal nucleus of the thalamus (Webster et al., 1993, Webster et al., 1995).

Much of the research in vision has justifiably concentrated on information arriving via the primary visual pathway (i.e., retina–LGN–primary visual cortex) and propagating along the cortical visual streams. However, other important visual input and processing routes exist in the brain. Indeed, projections from the retina to structures other than the LGN are hardly insubstantial, with at least nine targets besides the LGN (Weiskrantz, 1996). Among those alternate input routes, the ones most relevant to facilitating cortical visual analysis may be the inputs to the superior colliculus, which projects to the pulvinar nucleus of the thalamus (Diamond & Hall, 1969), as well as a direct retinal input to the pulvinar (Leh, Mullen, & Ptito, 2006). While the exact functions of the pulvinar, the largest visual (and auditory) nucleus in the human thalamus (Grieve, Acuna, & Cudeiro, 2000), remain poorly understood (Casanova, 2004), it seems to be involved in the processing and relay of partially analyzed information, visual memory, and perhaps auxiliary visual function in the absence of cortical vision (Weiskrantz, 1996). Most of the visual inputs to the pulvinar come from the complex cells in layers 5 and 6 (i.e., feedback) of the primary visual cortex (Abramson and Chalupa, 1985, Casanova, 1993), indicating that it is a higher-order relay and association nucleus involved in processing information that has already undergone an initial analysis in the primary sensory cortex. In addition, the various subdivisions of the pulvinar have bidirectional connections with many of the cortical regions in both visual streams (Webster et al., 1993, Webster et al., 1995), allowing the pulvinar to integrate information coming from several cortical processing levels and visual field representations (Casanova, 2004). These cortical connections overlap with inputs from the superior colliculus (Benedek, Norita, & Creutzfeldt, 1983). The pulvinar projects to most of the visual association and prefrontal cortices (Giguere and Goldman-Rakic, 1988, Goldman-Rakic and Porrino, 1985, Grieve et al., 2000, Guillery, 1995, Guillery and Sherman, 2002, Romanski et al., 1997). It has also been implicated in supporting the residual visual abilities seen in “blindsight” (Cowey & Stoerig, 1995), a condition in which cortically blind patients have no perception of visual stimuli but retain latent visual abilities that can be used to guide their actions (Sahraie et al., 2006, Weiskrantz, 1996, Weiskrantz, 2004). The pathways subserving this residual vision system may overlap with the “fast brain” proposed by Bullier, 2001a, Bullier, 2001b and may be part of a key bypass route from the primary visual cortex to the higher-order regions in the prefrontal cortex (Sherman & Guillery, 2004). We elaborate on these and other possibilities as they relate to our model of top-down facilitation in vision in the following sections.

Visual recognition has been traditionally considered a bottom-up, feedforward process. According to this view, visual information flows from lower-level regions to higher-level regions until recognition is accomplished, after which semantic analysis and/or object name information can be activated, depending on situational demands. This unidirectional view of visual processing has been largely influenced by the ascending hierarchical processing architecture of the ventral visual stream as discussed above. However, because of the noise and clutter in natural images, a purely bottom-up architecture has fundamental difficulties in identifying objects in all but the simplest circumstances (Bullier, 2001a, Bullier, 2001b), something that was also learned from efforts in computer vision (Sharon et al., 2006, Ullman et al., 2002). For example, given the tremendous variation in lighting, shadows, occlusions, reflections and the like, it is nearly impossible for bottom-up and lateral processes (i.e., those relying only on short-range, horizontal connections) to “know” which edges should be assigned to one shape vs. another, which makes it difficult to unify locally-processed information into a global percept. Moreover, perceptual completion often requires spanning large regions of the visual field and thereby of the primary visual cortex—too distant for the short reach of local inhibitory and excitatory connections (Angelucci & Bullier, 2003). Furthermore, imperceptible or ambiguous properties of a stimulus, such as features that are occluded, cannot be resolved by simply analyzing sensory input. In this case, the nature of the stimulus can only be resolved by inferring the imperceptible or ambiguous properties of the input based on information derived from previous experience (Friston, 2005).

Consequently, “top-down” feedback projections from higher-level regions would be necessary to account for our visual abilities in all but the simplest of circumstances (Angelucci et al., 2002b, Shmuel et al., 2005, Thorpe and Fabre-Thorpe, 2001). The very nature of the connections between visual regions makes it highly unlikely that only forward processing is employed in analyzing visual input. The majority of the connections between the LGN, V1, and V2, as well as V3, V4, IT, and other extrastriate cortical regions, are bidirectional (Ergenzinger et al., 1998, Felleman and Van Essen, 1991, Ghashghaei et al., 2007, Lund et al., 1993, Pandya, 1995, Rockland and Drash, 1996), and the number of purely feedback connections is estimated to even exceed the number of feedforward inputs (Salin & Bullier, 1995). This supposition is supported by findings showing that basic analysis in the early visual areas is influenced by global projections from higher levels (Porrino, Crane, & Goldman-Rakic, 1981). Even at the lowest cortical processing levels, which have small receptive fields, inactivation of feedback connections to V1 weakens surround suppression in center-surround cells (Angelucci & Bullier, 2003), and feedback connections from V2 seem to shape simple sensitivities in V1 (Shmuel et al., 2005). In addition, under certain viewing conditions, neurons in V1 can react to stimuli outside of their traditional receptive fields (Kapadia, Ito, Gilbert, & Westheimer, 1995), and select neurons in V2 are sensitive to surround stimulation at eccentricities as large as ten times the size of these neurons’ classic receptive fields (Zhang et al., 2005), indicating top-down feedback. Furthermore, attention has been shown to modulate neural responses as early as the primary visual cortex (Somers, Dale, Seiffert, & Tootell, 1999). It seems clear that a full understanding of visual processing cannot be achieved until we understand the role and functional properties of the top-down, feedback processes in the brain.

The differences between the feedforward and feedback connections in the visual processing hierarchy have helped to shape our thinking about the role of bottom-up and top-down processes (for the purposes of this review, we consider feedforward projections to be bottom-up, and feedback projections to be top-down). In particular, connections from lower-level regions tend to project to relatively few regions higher in the processing hierarchy, the projections within these regions are relatively focused, and these feedforward projections tend to terminate in the superficial layers of the cortex. In contrast, projections that originate in higher-level regions tend to target many regions in the processing hierarchy, have wider connection patterns within these regions, and terminate predominantly in the deep cortical layers (Angelucci et al., 2002a, Angelucci et al., 2002b). For example, the top-down projection from V5 to V2 is more widely distributed than the focused bottom-up projection from V2 to V5 (Shipp & Zeki, 1989). Additionally, V1 receives feedback projections from a greater number of regions than the number of regions to which it sends ascending projections; e.g., V1 receives top-down projections from IT while sending no direct bottom-up projections to IT (Barone et al., 2000, Boussaoud et al., 1990, Livingstone and Hubel, 1987, Lund et al., 1975, Perkel et al., 1986, Rockland and Van Hoesen, 1994, Shipp and Zeki, 1989, Suzuki and Eichenbaum, 2000, Ungerleider and Desimone, 1986). These asymmetries in the properties of top-down and bottom-up connections suggest an important distinction in the roles of feedforward and feedback projections. Because feedforward projections are more focused and restricted, bottom-up processing tends to exhibit a structured build-up of information from simple to complex, with activity from lower regions in the processing hierarchy driving activity in the higher regions. In contrast, because feedback projections are more diffuse and dispersed, top-down processes can coordinate and bias local activity across lower-level regions based on global, contextual, and gist information.

These differences in the connectivity patterns of top-down and bottom-up projections are reflected in asymmetries in their function. In particular, while bottom-up projections are driving inputs (i.e., they always elicit a response from target regions), top-down inputs are more often modulatory (i.e., they exert a subtler influence on the response properties of target areas), although they can also be driving (Buchel and Friston, 1997, Girard and Bullier, 1989, Sandell and Schiller, 1982, Sherman and Guillery, 2004). Top-down input might need to be at least partly modulatory, because purely driving feedback could induce neural activity in lower-level regions that would be indistinguishable from activity evoked by external stimuli (Adams and Rutkin, 1970, Moriarity et al., 2001, Nashold and Wilson, 1970, Vignal et al., 2007), which could give rise to hallucinations (Siegel, Kording, & Konig, 2000). While it may be beneficial to sensitize one’s neural representations of a stove, sink, and refrigerator when walking into a kitchen, to aid in the recognition of these objects, it would be counterproductive to “perceive” (i.e., hallucinate) those objects without any sensory input from the environment. The tendency of feedforward and feedback projections to terminate in different cortical layers (deep versus superficial) (Bernardo et al., 1990, Garraghty et al., 1989, Jones, 1986, Schroeder et al., 1995) may help segregate top-down and bottom-up inputs to prevent such “network hallucinations”.

A number of theories have described the computational processes that may be employed by top-down mechanisms to facilitate sensory processing (Friston, 2005, Grossberg, 1980, Hinton et al., 1995, Mumford, 1992, Ullman, 1995). These models, taking into account the functional and structural properties of feedforward and feedback connections described above, generally posit that predictions based on prior experience are generated in higher-level areas and projected to lower-level areas to guide the recognition process driven by sensory information. Guidance is implemented by using these top-down predictions to sensitize bottom-up stimulus-driven processing. Therefore, top-down predictions facilitate the recognition process by reducing the number of candidate representations of an object that need to be considered. For example, when trying to identify a stove in a kitchen, prior experience with kitchens may be used to predict that the most likely identity of the square object standing on the floor is a stove or a dishwasher, which eliminates the need to consider a large number of possible object identities.

But what happens when a top-down prediction does not match the bottom-up representation? For example, what transpires in the brain when the square object on the floor of the kitchen is actually a television, while the prediction is that the most likely identity of the object is a stove or a dishwasher? Many models posit that top-down and bottom-up information might be integrated via an iterative error-minimization mechanism, with information processed in recursive, interacting loops of activity (Friston, 2005, Grossberg, 1980, Hinton et al., 1995, Mumford, 1992, Ullman, 1995). Specifically, the prediction sent from higher-level regions would be matched against stimulus-generated bottom-up activity, and an error signal would be generated reflecting any mismatch between the predicted signal and the actual stimulus-driven activity. If the error signal is substantial, the mismatch is projected to the higher neural region, where a new prediction, updated by the error signal, is generated. When the perceptual task is simple, such as identifying clearly presented, commonly encountered stimuli, the initial prediction or set of predictions is likely to match the sensory input well, rendering multiple iterations of this cycle unnecessary. However, if the identity of the object has not been resolved in the initial pass, the cycle of matching and refining predictions continues until the identity of the input stimulus has been determined. While these models (Friston, 2005, Grossberg, 1980, Hinton et al., 1995, Mumford, 1992, Ullman, 1995) generally fit into the qualitative framework described above, their underlying mathematical, computational, and proposed cortical mechanisms differ in their details. Below we concentrate on elucidating some of the differences between them.
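Before turning to the individual models, the shared error-minimization cycle can be made concrete in a toy example (a sketch under simplifying assumptions we introduce here: a linear generative model, a fixed update step, and a hand-picked error threshold; none of the cited models is literally this simple):

import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((20, 5))      # generative mapping from hidden causes to sensory features
true_cause = np.array([1.0, 0.0, 2.0, 0.0, -1.0])
stimulus = W @ true_cause + 0.05 * rng.standard_normal(20)   # noisy bottom-up input

estimate = np.zeros(5)                # the higher level's current hypothesis
for iteration in range(500):
    prediction = W @ estimate         # top-down: predicted sensory activity
    error = stimulus - prediction     # bottom-up residual: the "error signal"
    if np.linalg.norm(error) < 0.3:   # prediction now matches the input well enough
        break
    estimate += 0.02 * (W.T @ error)  # refine the hypothesis using the mismatch

print(iteration, np.round(estimate, 2))  # converges near true_cause after a few dozen loops

A clear, familiar stimulus corresponds to an initial prediction that nearly satisfies the threshold, so few refinement cycles are needed; degraded or ambiguous input keeps the loop running longer.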

In the adaptive resonance theory (ART) of Grossberg (1980), the matching between top-down and bottom-up information is accomplished through a pattern-matching algorithm that leads to a stable and efficient representation of the input identity. The prediction patterns that match the sensory-based representations lead to an amplification of the matching neural signal. Conversely, mismatches lead to an extinction of the signal and alert the system that a new pattern must be generated. This new prediction pattern is a combination of the parts of the initial prediction that matched the stimulus features and updated features that improve the fit with the stimulus-driven information (Grossberg, 1980).
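The match/mismatch logic of ART can be caricatured in a few lines (a toy, ART-1-flavored sketch for binary feature vectors; the vigilance value and the prototype-update rule are our simplifications, not Grossberg's full formulation):

import numpy as np

def art_step(pattern, prototypes, vigilance=0.75):
    # Try stored predictions in order of their bottom-up support.
    order = sorted(range(len(prototypes)),
                   key=lambda j: (prototypes[j] & pattern).sum(), reverse=True)
    for j in order:
        match = (prototypes[j] & pattern).sum() / pattern.sum()
        if match >= vigilance:        # resonance: the prediction fits the input
            prototypes[j] &= pattern  # sharpen it toward the features that matched
            return j
        # mismatch: this prediction is extinguished; try the next candidate
    prototypes.append(pattern.copy()) # nothing fits: the system learns a new pattern
    return len(prototypes) - 1

prototypes = []
for p in [np.array([1, 1, 1, 0, 0]), np.array([1, 1, 0, 0, 0]), np.array([0, 0, 1, 1, 1])]:
    print(art_step(p, prototypes), [list(q) for q in prototypes])

A prediction whose overlap with the input exceeds the vigilance criterion "resonates" and is amplified and refined; one that falls short is extinguished and replaced, which is exactly the amplify-or-reset behavior described above.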

Mumford (1992) proposed a mechanism based on an architecture of multiple local top-down/bottom-up loops that traverse adjacent levels in the processing hierarchy. These loops carry both the information that has been successfully predicted by higher-level regions and the residual that needs to be explained in further iterations of the loop or at other levels of the processing hierarchy. In this model, the top-down/bottom-up loops are used both for lower-level visual processes, such as segmentation (e.g., between the level that attempts to segment the input image based on local feature properties and the next-higher level that generates predictions about how the larger image should be segmented), and for higher-level processing, such as determining the conjunction between objects. These loops are relatively restricted in how much of the processing network they traverse, like links on a chain; therefore, this model posits that processing builds up in increasing complexity. However, in contrast to bottom-up models, the build-up of processing takes into account information based on prior experience in its top-down/bottom-up loops. These loops demonstrate the type of neural architecture necessary for the integration of top-down and bottom-up information. Mumford’s model has been extended with statistical methods (e.g., Kalman filtering; Kalman, 1960) to address various questions, such as how receptive fields develop (Rao & Ballard, 1997).
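The flavor of this statistical extension can be seen in a scalar Kalman filter, which runs precisely such a local predict-compare-correct loop at every time step (the dynamics and noise values below are illustrative assumptions, not Rao and Ballard's model):

import numpy as np

rng = np.random.default_rng(2)
a, q, r = 0.95, 0.1, 0.5            # state decay, process noise variance, sensor noise variance
x_true, x_est, p = 0.0, 0.0, 1.0    # true state, current estimate, estimate variance
for step in range(50):
    x_true = a * x_true + rng.normal(0.0, q ** 0.5)
    z = x_true + rng.normal(0.0, r ** 0.5)   # noisy bottom-up measurement
    x_pred = a * x_est                       # top-down prediction of the next input
    p_pred = a * a * p + q
    k = p_pred / (p_pred + r)                # gain: how much the residual is trusted
    x_est = x_pred + k * (z - x_pred)        # correct the prediction with the residual
    p = (1.0 - k) * p_pred

print(round(x_est, 2), round(x_true, 2))     # the estimate tracks the state despite noise

Only the residual z - x_pred, the part of the input that the prediction failed to explain, drives the update, mirroring Mumford's division between successfully predicted information and the residual passed on for further explanation.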

The “counter streams” framework of Ullman (1995) proposes a large-scale bidirectional information flow in which sensory input triggers parallel bottom-up and top-down processing streams. These streams simultaneously activate multiple candidate interpretations of the information coming from the senses, leading to rapid and flexible processing. Recognition is accomplished when a match is found and the top-down and bottom-up “counter streams” meet. This model contrasts with Mumford’s (1992) model in that the top-down and bottom-up streams traverse multiple regions and levels of the processing hierarchy. Thus, Ullman’s model posits that different levels of the processing hierarchy are able to influence one another.

Recent work by Friston (2005) employs an empirical Bayesian statistical framework to describe how predictions can guide the recognition process. This framework is largely complementary to Mumford’s work in that it assumes the architecture of top-down/bottom-up loops between adjacent levels of the processing hierarchy. Friston employs a Bayesian statistical framework with the explicit underlying assumption that experience-based information is used to minimize prediction error when processing sensory information. The iterative refining process alters the parameters in the model through Hebbian-like changes in cortical plasticity. According to Friston (2005), experience with a stimulus alters the weights of the top-down and bottom-up connections to reflect the refined representation that results from minimizing prediction error. This learning scheme is able to account for several cognitive phenomena, such as mismatch negativity and priming-related neural response reductions.
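In a common simplification of such schemes (the linear form and notation here are ours; Friston's full treatment uses nonlinear generative models and explicit precision parameters), each level i carries a prediction error, and both the estimated causes and the connection weights descend the same error measure:

\[
\varepsilon_i = x_i - W_i\,\hat{v}_{i+1}, \qquad
E = \tfrac{1}{2}\sum_i \lVert \varepsilon_i \rVert^2 ,
\]
\[
\dot{\hat{v}}_{i+1} \;\propto\; -\frac{\partial E}{\partial \hat{v}_{i+1}}
= W_i^{\top}\varepsilon_i - \varepsilon_{i+1}, \qquad
\Delta W_i \;\propto\; -\frac{\partial E}{\partial W_i}
= \varepsilon_i\,\hat{v}_{i+1}^{\top},
\]

where x_0 is the sensory input and, at higher levels, x_i is the estimate \hat{v}_i itself. Each estimate is thus pushed by the error it leaves unexplained at the level below (the W_i^T ε_i term) and constrained by the error it incurs against the prediction from above (the ε_{i+1} term); the weight change is a product of pre- and postsynaptic activity, which is the Hebbian-like plasticity mentioned above.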

A common feature of the frameworks that support a critical role for top-down influences is the need for a mechanism for matching information in top-down and bottom-up processing streams. Typically, these models, as well as other experimental and theoretical work, suggest that the mechanism by which this matching is accomplished is the precise synchrony of the pattern of neural activity across the processing hierarchy (Bar et al., 2006a, Engel et al., 2001, Friston, 2005, Grossberg, 1980, Mumford, 1992, Siegel et al., 2000, Ullman, 1995).

Synchrony between neural activity has been hypothesized to be a signature of the integration of neural processing, both locally, such as for binding local stimulus features, and across disparate neural regions, such as for integrating across modalities or exchanging information across the brain (Engel et al., 2001, Hummel and Biederman, 1992, Simoes et al., 2003, Tononi et al., 1992, Varela et al., 2001). In the case of top-down/bottom-up integration, synchronized activity may represent the match between higher-level and lower-level representations of the stimulus, while unsynchronized activity would reflect any residual mismatch, or “error”, in the representations. Therefore, the iterative process of refining the representation until a sufficient representation of the stimulus has been achieved can be described as the process by which the neural activity in higher- and lower-level regions is refined until the patterns of activity between these regions synchronize. For example, in Ullman’s “counter streams” model, the “meeting” of the top-down and bottom-up streams that achieves recognition might be reflected in increased synchrony of the activity in higher- and lower-level regions (Ullman, 1995). Recent results support this view, demonstrating that neural synchrony increases with an improved match between top-down and bottom-up processes (Bar et al., 2006a, Ghuman et al., in preparation, von Stein et al., 2000, von Stein and Sarnthein, 2000).

A recent computational model demonstrates a physiologically plausible mechanism describing how synchronized activity may govern top-down/bottom-up integration, and how this interaction can facilitate cortical processing (Siegel et al., 2000). In this computational study, the authors simulated a hierarchical neural network with separate top-down and bottom-up connections and inputs based on empirical in vivo and in vitro findings. A Poisson spike train with added noise was used to simulate naturalistic stimulus input. The key result was that top-down input reduced the noise in the representation of the stimulus compared with bottom-up processing alone. Furthermore, as the match between higher- and lower-level representations improved, the synchrony between the activity in them increased. Increased synchrony between the activity in higher- and lower-level regions (i.e., when the representations matched) decreased the noise of the neural representation of the stimulus far more than non-synchronized (i.e., non-matching) top-down projections did. Moreover, increasing the strength of the top-down signal, particularly if the activity in the higher- and lower-level regions synchronizes, leads not only to less-noisy representations but also to faster processing. The authors also demonstrated how, with two bottom-up inputs (representing two competing stimuli), top-down influences can strengthen the processing of one stimulus over the other, e.g., in response to contextual or attentional biases (Desimone & Duncan, 1995). This model demonstrates how top-down/bottom-up synchrony could lead to the facilitated and biased processing typically observed in priming and cueing paradigms (Brunia, 1993, Buckner et al., 1998, Dale et al., 2000, Desimone, 1996, Grill-Spector et al., 2006, Jennings and van der Molen, 2005, Wiggs and Martin, 1998, Zago et al., 2005). Indeed, a recent study demonstrates increased synchrony between higher- and lower-level neural regions with facilitated processing in repetition priming (Ghuman, Bar, Dobbins, & Schnyer, 2006).
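A stripped-down rate-model version of the central manipulation (our sketch: Siegel et al. simulated a conductance-based spiking network, whereas the time constant, rates, and top-down signal below are illustrative assumptions) reproduces the basic effect, with a matching top-down prediction de-noising the lower-level representation:

import numpy as np

rng = np.random.default_rng(3)
dt = 0.001
t = np.arange(0.0, 2.0, dt)
stim_rate = 40.0 * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))  # underlying stimulus drive (Hz)
spikes = rng.poisson(stim_rate * dt)                        # noisy bottom-up Poisson input

def lower_level(spikes, top_down):
    # Leaky rate unit (tau = 20 ms) driven by input spikes plus a top-down signal.
    r = np.zeros_like(t)
    for i in range(1, t.size):
        drive = spikes[i] / dt + top_down[i]
        r[i] = r[i - 1] + (dt / 0.02) * (drive - r[i - 1])
    return r

for label, td in [("bottom-up only", np.zeros_like(t)),
                  ("matching top-down", stim_rate)]:  # the prediction tracks the stimulus
    r = lower_level(spikes, td)
    print(label, round(np.corrcoef(r, stim_rate)[0, 1], 2))

Because the top-down prediction itself is noiseless, the drive it contributes is pure signal, so the lower-level response tracks the stimulus more faithfully, which is the noise reduction reported in the study.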

Finally, in this computational model, because of the physiological differences between the different inhibitory inputs, synchrony between higher- and lower-level regions occurred specifically in the lower frequency bands when top-down influences were present. The lower-frequency-band synchrony for top-down interactions is consistent with differences in the temporal properties of the receptors in the cortical layers where feedforward and feedback projections terminate. Specifically, feedforward projections are mediated through fast GABA-A and AMPA receptors, while feedback projections are mediated by slower GABA-B and NMDA receptors (Salin & Bullier, 1995). Removing the top-down influence in this model suppressed the lower-frequency synchrony and increased higher-frequency synchrony. This result is consistent with experimental results demonstrating increased cross-cortical synchrony in the lower theta-, alpha-, and beta-frequency bands when top-down influences are greater (Bar et al., 2006a, von Stein and Sarnthein, 2000) and greater local synchrony in the higher, gamma-frequency bands when bottom-up processing predominates (Engel et al., 2001, Tallon-Baudry and Bertrand, 1999, von Stein and Sarnthein, 2000, von Stein et al., 2000). Additionally, theoretical considerations suggest that the slower temporal dynamics of feedback projections are more appropriate for top-down effects, which tend to be modulatory and prolonged, than for the more transient sensory-evoked responses (Friston, 2005). In particular, top-down information, such as global and contextual information, is useful for facilitation during the entire processing sequence and can aid multiple processing levels. On the other hand, it is wasteful to maintain the representation of bottom-up information once a more complete representation has been formed. For example, once the identity of an object has been determined, it would be inefficient to continue to represent every detail about the object, because many of these details are insignificant (Wiggs and Martin, 1998, Zago et al., 2005).
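The kind of measurement behind such findings can be sketched as follows (synthetic signals; the band edges and the phase-locking statistic are common choices in this literature, not those of any one study cited here). Two simulated regions that share a slow "top-down" rhythm but generate their gamma-band activity locally show strong low-frequency, but weak gamma-band, phase locking:

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0
t = np.arange(0.0, 10.0, 1.0 / fs)
rng = np.random.default_rng(4)

def bandpass(x, lo, hi):
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

shared_slow = np.sin(2 * np.pi * 10 * t)  # 10 Hz rhythm common to both regions ("top-down")
lower  = shared_slow + bandpass(rng.standard_normal(t.size), 30, 80) + 0.3 * rng.standard_normal(t.size)
higher = shared_slow + bandpass(rng.standard_normal(t.size), 30, 80) + 0.3 * rng.standard_normal(t.size)

def plv(x, y, lo, hi):
    # Phase-locking value: 1 = perfectly synchronized phases, near 0 = unrelated.
    dphi = np.angle(hilbert(bandpass(x, lo, hi))) - np.angle(hilbert(bandpass(y, lo, hi)))
    return np.abs(np.mean(np.exp(1j * dphi)))

print("alpha band (8-13 Hz):", round(plv(lower, higher, 8, 13), 2))    # high: shared rhythm
print("gamma band (30-80 Hz):", round(plv(lower, higher, 30, 80), 2))  # low: local activity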

In summary, an improved match between higher- and lower-level representations can result in improved synchrony in lower frequency bands and facilitated processing, demonstrating how top-down influences can improve processing at the neural level. It is important to note that, with small modifications to the assumptions about the underlying neural architecture, the work of Siegel et al. (2000) could be adapted to fit the various theoretical frameworks of top-down/bottom-up interactions described above. For example, if the higher-level and lower-level regions in the model are assumed to be adjacent regions along the hierarchy, this model can provide a computational example of how Mumford’s (1992) mechanism may be implemented. On the other hand, this model also fits Ullman’s (1995) “counter streams” mechanism in that the top-down and bottom-up signals are mapped onto separate but interacting neuronal populations. Additionally, if the computations underlying top-down/bottom-up processing fit the assumptions of prediction-error minimization and empirical Bayesian statistics, this model could be adjusted in accordance with Friston’s (2005) framework. Finally, if these computations adopted the pattern-matching algorithm of ART, this computational work could be implemented within Grossberg’s (1980) framework.

Section snippets

A model for the triggering of top-down facilitation in object recognition

The theoretical treatments described above provide a framework for the computations that may be involved in top-down/bottom-up interactions in visual recognition. However, they do not specify what information may be used to initiate top-down predictions, which neural regions would be involved, and how information travels through the brain during top-down facilitation of object recognition. A recent proposal by Bar (2003), building on early top-down/bottom-up models (Grossberg, 1980, Kosslyn,

Top-down facilitation of object recognition is modulated by experience

The studies described above demonstrate that top-down facilitation is a critical part of object recognition (Bar, 2003, Bar et al., 2006a). This facilitation typically depends on accumulated experience with the visual world, as the perceiver has to have learned that a particular blurred shape could be attributed to, for example, a desk lamp or an umbrella (Fig. 1). It is possible, of course, that for certain classes of natural stimuli, especially those signifying threat (e.g., spiders or

Cooperation of OFC and the amygdala in processing of emotional stimuli

Stimuli with emotional cues tend to signify biologically important events, which require rapid evaluation and response. For this reason it is perhaps unsurprising that emotional stimuli are processed differently from neutral stimuli (Adolphs et al., 2005, Carretie et al., 2006, de Gelder et al., 2003, LeDoux, 2000, Morris et al., 2001, Morris et al., 1999, Morris et al., 1998, Nomura et al., 2004, Pourtois et al., 2006, Vuilleumier et al., 2003, Winston et al., 2003). Emotional stimuli tend to

The pathways involved in top-down facilitation are affected in schizophrenia

In recent years, important links have been identified between visual processing deficits in schizophrenia and impaired magnocellular function, while parvocellular processing seems to be mostly preserved in schizophrenia patients. Studies using evoked response potentials (ERPs) in electroencephalography (Butler et al., 2007, Doniger et al., 2002, Schechter et al., 2005) have demonstrated neural and functional deficits in the early visual pathway. Schizophrenia patients exhibit significantly

Discussion

The primary principle that emerges from our proposal and the reviewed findings is that the brain rapidly extracts coarse, gist information from sensory input and uses it to generate predictions that help interpret the rest of that input. The focus here has been on the specific problem of visual object recognition. However, it seems possible to generalize the same principle to many domains and modalities. To make this generalization, we will introduce a more global framework and terminology.

In this framework, the brain is

Acknowledgments

We thank Jasmine Boshyan for assistance with manuscript preparation. Supported by NINDS R01-NS044319 and NS050615, the James S. McDonnell Foundation #21002039, an award from Dart Neuroscience LP, and the MGH Fund for Medical Discovery.

References (259)

  • D.B. Bednarek et al. Latencies of stimulus-driven eye movements are shorter in dyslexic subjects. Brain and Cognition (2006)
  • J.S. Bedwell et al. Functional magnetic resonance imaging examination of the magnocellular visual pathway in nonpsychotic relatives of persons with schizophrenia. Schizophrenia Research (2004)
  • I. Biederman et al. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology (1982)
  • R. Buckner et al. Functional-anatomic correlates of object priming in humans revealed by rapid presentation event-related fMRI. Neuron (1998)
  • J. Bullier. Feedback connections and conscious vision. Trends in Cognitive Sciences (2001)
  • J. Bullier. Integrated model of visual processing. Brain Research Reviews (2001)
  • J. Bullier et al. Parallel versus serial processing: New vistas on the distributed organization of the visual system. Current Opinion in Neurobiology (1995)
  • L. Carretie et al. An electrophysiological study on the interaction between emotional content and spatial frequency of visual stimuli. Neuropsychology (2007)
  • L. Carretie et al. Cortical response to subjectively unconscious danger. NeuroImage (2005)
  • P.L. Cornelissen et al. Magnocellular visual function and children’s single word reading. Vision Research (1998)
  • A.M. Dale et al. Dynamic statistical parametric mapping: Combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron (2000)
  • E.A. DeYoe et al. Concurrent processing streams in monkey visual cortex. Trends in Neurosciences (1988)
  • M.J. Fenske et al. Top-down facilitation of visual object recognition: Object-based and context-based contributions. Progress in Brain Research (2006)
  • C.D. Frith et al. The neural basis of mentalizing. Neuron (2006)
  • H.T. Ghashghaei et al. Pathways for emotion: Interactions of prefrontal and anterior temporal pathways in the amygdala of the rhesus monkey. Neuroscience (2002)
  • H.T. Ghashghaei et al. Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. NeuroImage (2007)
  • M.A. Goodale et al. Separate visual pathways for perception and action. Trends in Neurosciences (1992)
  • J.E. Adams et al. Visual response to subcortical stimulation in the visual and limbic systems. Confinia Neurologica (1970)
  • R. Adolphs et al. Amygdala damage impairs emotional memory for gist but not details of complex stimuli. Nature Neuroscience (2005)
  • R. Adolphs et al. Fear and the human amygdala. Journal of Neuroscience (1995)
  • T. Allison et al. Face recognition in human extrastriate cortex. Journal of Neurophysiology (1994)
  • T. Allison et al. Human extrastriate visual cortex and the perception of faces, words, numbers, and colors. Cerebral Cortex (1994)
  • D.G. Amaral et al. Amygdalo-cortical projections in the monkey (Macaca fascicularis). Journal of Comparative Neurology (1984)
  • Aminoff, E., Gronau, N., & Bar, M. (in press). The parahippocampal cortex mediates spatial and non-spatial...
  • A. Angelucci et al. Circuits for local and global signal integration in primary visual cortex. Journal of Neuroscience (2002)
  • M. Bar. A cortical mechanism for triggering top-down facilitation in visual object recognition. Journal of Cognitive Neuroscience (2003)
  • M. Bar. Visual objects in context. Nature Reviews Neuroscience (2004)
  • M. Bar. The proactive brain: Predicting for perceiving. Trends in Cognitive Sciences (2007)
  • M. Bar et al. The units of thought. Hippocampus (2007)
  • M. Bar et al. Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences of the United States of America (2006)
  • M. Bar et al. Humans prefer curved visual objects. Psychological Science (2006)
  • M. Bar et al. Very first impressions. Emotion (2006)
  • M. Bar et al. Spatial context in recognition. Perception (1996)
  • H. Barbas et al. Projections from the amygdala to basoventral and mediodorsal prefrontal regions in the rhesus monkey. Journal of Comparative Neurology (1990)
  • H. Barbas et al. Diverse thalamic projections to the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology (1991)
  • P. Barone et al. Laminar distribution of neurons in extrastriate areas projecting to visual areas V1 and V4 correlates with the hierarchical rank and indicates the operation of a distance rule. Journal of Neuroscience (2000)
  • A. Bechara et al. Dissociation of working memory from decision making within the human prefrontal cortex. Journal of Neuroscience (1998)
  • A. Bechara et al. Deciding advantageously before knowing the advantageous strategy. Science (1997)
  • A. Bechara et al. Characterization of the decision-making deficit of patients with ventromedial prefrontal cortex lesions. Brain (2000)
  • A. Bechara et al. Failure to respond autonomically to anticipated future outcomes following damage to prefrontal cortex. Cerebral Cortex (1996)