Distributed representations and syntactic structures

Linguistic descriptions consist of symbols for objects, qualities and relations. Through rule-based recombination of these symbols a virtually infinite number of different descriptions can be generated with a relatively small set of symbols. The efficiency of this versatile strategy is further enhanced by chunking. In order to reduce the length of descriptions, frequently occurring constellations are represented by symbols of higher order that summarise sets of component features and their respective relations in a single term. This chunking strategy saves time because it reduces the length of descriptions but it requires increased numbers of symbols. There is, thus, a trade-off between required storage capacity and processing speed. It appears as if the brain uses a similar strategy to represent perceptual objects and motor programs and that it possesses learning mechanisms to optimize the trade-off between storage requirement and processing speed. Perceptual objects can be represented by scripts that consist of symbols for their components and their respective qualities and symbols describing the relations between these items. The same is true for movements which can be decomposed into elementary motion components and a specific set of relations that define the respective combination and sequence of components.

Analyses of the response properties of neurons encountered at the various levels of the processing hierarchy of sensory systems suggest the following sequence of processing steps. Neurons in the sensory organs encode in their responses very elementary, local properties and qualities of the components of perceptual objects and barely any relations. An exception is the retina of the eye. Unlike most other sensory organs it possesses a complex, multi-layered network that permits, firstly, recombination of signals in convergent feed-forward architectures and, secondly, extensive lateral interactions that modulate responses in a context dependent way. Thus, the rate modulated output of ganglion cells does not only signal the presence of a local property of an object, in this case the brightness and spectral composition of a small part of its surface, but also the neighbourhood relations of these particular features. However, chunking at larger scales occurs only once signals are processed at the cortical level. As one proceeds along the hierarchically arranged cortical processing areas, one encounters neurons that respond selectively to increasingly complex constellations of elementary features. It is commonly held that this chunking is achieved by iterative recombination of feed-forward connections from lower to higher order neurons. If the thresholds of the respective higher order neurons are adjusted such that they respond only if the full set of the feeding neurons is simultaneously active, they assume the function of conjunction detectors. Their responses signal not only the presence of certain sets of component features but also the way in which these features are related to each other. The latter information is implicitly encoded in the architecture of feed-forward connections that link selected sets of feeder neurons to higher order conjunction detectors (Tanaka 1997).
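
The thresholding principle behind such conjunction detectors can be illustrated with a minimal sketch. It assumes binary feeder activity and unit synaptic weights, which are simplifications, and the function name is hypothetical. With the threshold set to the number of feeders, the unit responds only if the full set is simultaneously active.

```python
def conjunction_detector(inputs, threshold=None):
    """Return 1 if the summed feeder activity reaches the threshold, else 0.

    Assumes binary inputs (1 = active, 0 = silent) and unit weights.
    By default the threshold equals the number of feeders, so the unit
    fires only when every feeder neuron is active.
    """
    if threshold is None:
        threshold = len(inputs)
    return int(sum(inputs) >= threshold)


# Hypothetical feeder activity patterns
full_set = [1, 1, 1]      # all component features present
partial_set = [1, 0, 1]   # one component feature missing

print(conjunction_detector(full_set))     # 1
print(conjunction_detector(partial_set))  # 0
```

Lowering the threshold below the number of feeders makes the unit respond to incomplete constellations as well, which is one way in which false conjunctions could arise.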

Several theoretical arguments suggest that these chunking operations are complemented at each level of processing by additional mechanisms that permit flexible definition of relations among the responses of distributed neurons. Natural scenes, for example, usually contain a large number of different objects, the contours of which may be overlapping or partially occluded. Hence, a single border may be shared by several objects. This introduces ambiguities that need to be resolved in order to provide appropriately sorted signals to the feed-forward chunking circuits. Without such prior sorting it would be difficult to avoid accidental formation of false conjunctions. Thus, responses to contours belonging to the same object need to be grouped together for further joint processing and chunking, and they need to be segregated from responses to other objects and the embedding background (von der Malsburg 1999). This selection of “chunkable” responses has to occur in a context dependent way and hence needs to be based on an evaluation of neighbourhood relations. Responses evoked by coherent contours then need to be tagged as related in a way that assures their selective binding by subsequent chunking. These grouping operations must occur at early levels of the processing hierarchy because successful scene segmentation is a prerequisite for the later identification of individual objects (Wang 2005). However, context sensitive dynamic definition of relations is also required at the highest levels of processing, where neurons are encountered that respond to very complex constellations of features, e.g. the various components of a face such as the mouth, the eyes, or the nose. The main argument for the need to flexibly define relations also at these high levels of processing derives from the evidence that perceptual objects are represented not only by individual highly complex chunking neurons but also by distributed assemblies of cells (Singer 1999; Tsunoda et al. 2001).
First, chunking neurons whose response properties are sufficiently complex and selective to encode only a single perceptual object are rare and seem to exist only for highly overlearned objects or objects of particular behavioural relevance (Logothetis et al. 1994). Second, it is inconceivable that novel objects can be represented by pre-established chunking neurons because the required feed-forward architectures would have to be specified a priori to support formation of the appropriate chunks. Third, objects that are simultaneously encoded in different sensory modalities elicit responses in several different sensory systems, and these need to be related to each other in order to arrive at a comprehensive polymodal description of the object. These considerations suggest that objects not representable by individual chunking neurons are encoded by assemblies of distributed neurons, each of which represents only a particular component of the object. In assembly coding, however, a relation-defining mechanism is again required that tags as related those responses that are evoked by the components of the same object. The reason is that assemblies, just as the above-mentioned scripts, consist of symbols representing particular features (here neurons tuned to particular components and qualities of perceptual objects) and relation-defining codes that indicate which symbols have actually been recruited into the description of a particular perceptual object.

Synchrony as tag of relatedness

Psychophysical and electrophysiological evidence indicates that attentional mechanisms play an important role in grouping operations both at the level where scene segmentation is accomplished and at the higher levels where object identification is thought to occur (Treisman 1999). However, the mechanisms underlying these relation defining grouping operations are still poorly understood. One proposal is that attentional mechanisms modulate the discharge rate of neurons and enhance the amplitude of responses to attended features (Cook and Maunsell 2002). In this scenario the signature of relatedness is the concomitant enhancement of discharge frequency. Responses selected for chunking or the formation of an assembly would be distinguished from all others by their higher discharge frequency. This interpretation has been challenged by the argument that response amplitude may be an ambiguous signature of relatedness because it depends on too many other stimulus related variables, requires long read-out times and makes it difficult to segregate assemblies from one another that are simultaneously configured within the same neuronal network (Singer 1999). Therefore, it has been proposed that neurons exploit two independent coding strategies in order to convey two messages in parallel: first, they should signal that the feature or the chunk of features for which they code is present, and second, they should indicate with which of the other simultaneously active neurons their responses are related. This latter code should assure that responses tagged as related are processed jointly at subsequent stages, i.e. are routed together into the appropriate chunking channels and/or are recognizable without ambiguity as originating from cells of the same assembly.
It is commonly held that the first message is encoded in the discharge rate of the neurons because at lower levels of processing discharge rate reliably reflects different physical aspects of elementary stimuli and at higher levels the presence of complex chunks. Following the discovery that neurons in the primary visual cortex can synchronize their spike discharges with a precision in the millisecond range (Gray and Singer 1989), it has been proposed that the synchronization of responses could serve as the required tag of relatedness (Gray et al. 1989). One way to select subsets of responses for further joint processing, and thus for binding them together, is to selectively raise their saliency. Most models of the binding function of selective attention are based on such a mechanism but they usually assume that saliency is enhanced by rate increases. However, precise temporal synchronization of spike discharges is an equally efficient means to selectively raise the saliency of neuronal responses (Biederlack et al. 2006). The reason is that synchronized input to target neurons has a much stronger impact than temporally uncoordinated input. Simultaneously arriving EPSPs summate much more effectively than temporally dispersed EPSPs, and this coincidence sensitivity of neurons is further augmented in cortical neurons by a number of specific mechanisms: (1) active dendritic conductances that amplify fast rising depolarisations of large amplitude (Ariav et al. 2003), (2) the frequency adaptation of synaptic release and postsynaptic receptors, which attenuates temporal summation of EPSPs (Markram and Tsodyks 1996), and (3) a dependence of firing threshold on the rising slope of depolarisations, favouring responses to fast rising depolarisations (Azouz and Gray 2003). These mechanisms selectively increase the impact of synchronous inputs, and they do so with a temporal resolution in the millisecond range.
Thus, relations can be defined within narrow temporal windows (<10 ms), and hence different relations can be encoded with less ambiguity and in much more rapid alternation than if relations were expressed by joint rate increases.
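
The effect of temporal dispersion on EPSP summation can be illustrated with a minimal sketch. It assumes a purely passive membrane in which each EPSP is an instantaneous unit step that decays exponentially. The function name, the 5 ms decay constant and the arrival times are illustrative assumptions, and none of the active mechanisms (1) to (3) is modelled.

```python
import math


def peak_depolarisation(arrival_times_ms, epsp_amplitude=1.0, tau_ms=5.0):
    """Peak of the summed potential for EPSPs arriving at the given times.

    Each EPSP is modelled as an instantaneous jump of size epsp_amplitude
    that decays exponentially with time constant tau_ms (an assumption).
    Because the potential only decays between arrivals, the peak must
    coincide with one of the arrival times.
    """
    peak = 0.0
    for t in arrival_times_ms:
        v = sum(
            epsp_amplitude * math.exp(-(t - t0) / tau_ms)
            for t0 in arrival_times_ms
            if t0 <= t
        )
        peak = max(peak, v)
    return peak


synchronous = [10.0] * 5                     # five EPSPs at the same instant
dispersed = [10.0, 15.0, 20.0, 25.0, 30.0]   # same five EPSPs spread over 20 ms

print(peak_depolarisation(synchronous))          # 5.0
print(round(peak_depolarisation(dispersed), 2))  # about 1.57
```

Under these assumptions a firing threshold of, say, three EPSP amplitudes would be crossed only by the synchronous volley, although both input patterns deliver the same number of spikes.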

The proposal that precise temporal synchrony is used as a tag of relatedness in neuronal processing agrees well with the temporal sensitivity of mechanisms supporting synaptic plasticity and Hebbian learning. Known mechanisms of synaptic plasticity exploit temporal correlations among the discharges of input connections and/or the discharges of inputs and those of the postsynaptic target cells. The mechanism that classifies discharges as synchronous or asynchronous, i.e. as related or unrelated, and causes synapses to strengthen or weaken accordingly, also operates with a precision in the millisecond range (Markram et al. 1997; Wespatat et al. 2004). Thus, there is a perfect match between the signatures of relatedness used in signal processing and Hebbian learning. This cannot be otherwise because both processes have to rely on the same relation defining code to avoid learning of false conjunctions.
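
One common way to formalise such a temporally sensitive learning rule is a pair-based window in which the sign of the weight change depends on the order of input and output spikes and its size decays with the interval between them. The sketch below assumes an exponential window. The function name, the amplitudes and the 20 ms time constant are illustrative assumptions and are not taken from the cited studies.

```python
import math


def stdp_weight_change(delta_t_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for a single pair of input and output spikes.

    delta_t_ms = t_post - t_pre. Positive intervals (input precedes the
    postsynaptic spike) strengthen the synapse, negative intervals weaken
    it, and the magnitude decays exponentially with the interval.
    All parameter values are illustrative assumptions.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_ms)
    return 0.0


for dt in (5.0, 50.0, -5.0):
    print(dt, stdp_weight_change(dt))
```

In this toy rule an input arriving 5 ms before the postsynaptic spike changes the weight far more than one arriving 50 ms before it, so only tightly correlated discharges leave a substantial trace.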

The role of oscillations and spike synchronization

Investigations of response synchronization in the visual system have revealed that precise synchronization of discharges is often associated with an oscillatory patterning of the neuronal responses (Gray and Singer 1989). Because individual cells tend to skip cycles, these oscillations are rarely detectable in the spike trains of single cells, but they are readily seen in data representing the responses of large populations of neurons, i.e. in multiunit recordings or recordings of local field potentials. In vitro experiments in cortical slices and simulation studies have since established causal relations between the two phenomena (Volgushev et al. 1998; Whittington et al. 2001). The oscillatory patterning of the responses is mainly due to oscillations generated within the various pools of inhibitory interneurons that are coupled through both chemical and electrical synapses and are capable of sustaining oscillatory activity patterns. These oscillatory inhibitory inputs to pyramidal cells veto their discharges during the inhibitory troughs and favour discharges at the depolarizing peaks, thus causing synchrony in firing. These locally synchronized oscillatory responses can become synchronized over large distances due to reciprocal coupling of the oscillatory networks via excitatory cortico-cortical connections. It follows from this mechanism that the precision with which spikes can be synchronized increases with oscillation frequency. A relation also exists between oscillation frequency and the distance over which synchronization is maintained. Synchronization among remote groups of neurons or among large assemblies of neurons tends to occur at lower oscillation frequencies than synchronization of local clusters of cells.

The duration of synchronized events

Early studies were based mostly on conventional cross-correlation analysis of cell discharges and/or local field potentials. This method reliably detects synchronous firing if it is sustained over prolonged periods of time but it fails if synchronous events occur only a few times in a response. Therefore, more sensitive measures have been developed that allow assessment of brief events of coincident firing. One of these methods, unitary event analysis, uses statistical tests to identify single, non-accidental incidences of coincident firing (Pipa et al. 2007, 2008); the other evaluates consistent phase relations between the discharges of individual neurons and LFP oscillations (spike-field coherence, Fries et al. 2002). Application of these methods to data obtained from awake behaving animals has revealed that episodes of synchronized firing are often restricted to short epochs of particular behavioural sequences and may be as short as a few tens of milliseconds (Maldonado et al. 2008). This agrees with measurements of the minimal time required to segment scenes and identify objects. It was estimated that the grouping operations required for scene segmentation and object identification should not take more than 10–20 ms per processing stage (Thorpe et al. 1996; van Rullen and Thorpe 2001). This implies that a substantial amount of information about the accomplished grouping must be encoded in the precise timing relations between individual discharges of distributed neurons. The reason is that not much information can be encoded in variations of discharge rates of individual cells, as they can generate only a few spikes within such short time windows.
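
The principle of conventional cross-correlation analysis can be sketched as a histogram of time lags between the spikes of two cells. The example below is a bare illustration with hypothetical spike times and a hypothetical function name. It does not include the corrections for chance coincidences that real analyses require, and it is not an implementation of unitary event analysis.

```python
from collections import Counter


def cross_correlogram(spikes_a_ms, spikes_b_ms, max_lag_ms=10, bin_ms=1):
    """Count spike pairs by time lag (spike of b minus spike of a).

    Only lags within +/- max_lag_ms are kept, and each lag is assigned
    to the nearest multiple of bin_ms.
    """
    counts = Counter()
    for ta in spikes_a_ms:
        for tb in spikes_b_ms:
            lag = tb - ta
            if abs(lag) <= max_lag_ms:
                counts[round(lag / bin_ms) * bin_ms] += 1
    return dict(sorted(counts.items()))


# Hypothetical spike times in milliseconds
cell_a = [10, 50, 90, 130]
cell_b = [11, 50, 91, 170]

print(cross_correlogram(cell_a, cell_b))  # {0: 1, 1: 2}
```

With only three near-coincident pairs the histogram contains too few counts to be distinguished from chance, which illustrates why sustained synchrony is needed for this method and why more sensitive measures were developed for brief episodes.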

Synchrony and feature binding

Evidence from studies in the visual system suggests that response synchronization may be used throughout all processing stages, from the retina to the highest cortical areas, in order to establish relations among distributed responses, i.e. to bias grouping of responses for subsequent chunking and to tag responses of assembly members as related. In all cases synchronization probability reflects some of the Gestalt criteria that are used for scene segmentation and perceptual grouping. In the retina, ganglion cell responses synchronize with millisecond precision if evoked by continuous contours or coherent objects (Neuenschwander and Singer 1996), and there is evidence from studies on the escape response of frogs that the synchronicity of ganglion cell firing actually carries behaviourally relevant information. Retinal synchronization is associated with high frequency oscillations (up to 90 Hz) and is based on horizontal interactions within the network of coupled amacrine cells.

In the primary visual cortex synchrony is often associated, especially when it is observed over larger distances, with an oscillatory patterning of spike discharges in the gamma frequency range (30–60 Hz). At this processing stage synchronization probability correlates well with elementary Gestalt rules. It is enhanced between responses evoked by continuous contours, by contours moving with the same speed in the same direction, by collinearly aligned contour segments, and by contours belonging to the same surface (Engel et al. 1991b; Castelo-Branco et al. 2000). It is maximal among responses evoked by coherent patterns such as regular gratings, and it is minimal or absent among responses to incoherent stimuli such as random dot patterns (for review see Singer 1999; Engel et al. 2001). In the cortex, response synchronization among spatially distributed neurons is mediated by the network of tangential horizontal connections, and if it occurs across the midline of the visual field, by callosal connections (Engel et al. 1991a). One of the reasons why synchrony is stronger among responses to continuous or collinearly aligned contours or contours moving in the same direction is the anatomical anisotropy of these tangential connections (Löwel and Singer 1992). Those spanning larger distances preferentially connect columns with similar feature preference. Thus, elementary grouping criteria are implemented in the anisotropies of the network of tangential connections and translated into synchronization probability. In the cat, such stimulus-specific synchronization phenomena have been observed both within and across different visual areas, both within and across hemispheres, and between the visual cortex and the superior colliculus. In primates, especially in the awake behaviourally trained animal, multisite recordings have been applied much less frequently and therefore fewer data are available on response synchronization.
However, the results obtained from primary visual cortex closely resemble those obtained from cats, but oscillation frequencies tend to be higher and the distances over which synchrony is observed tend to be shorter. In the motion sensitive area MT of the dorsal processing stream response synchronization was found to reflect the Gestalt rule of common fate, which is one of the strongest binding cues for perceptual grouping (Kreiter and Singer 1996). Presentation of two spatially overlapping bars moving in different directions led to the formation of two distinct assemblies of neurons whereby those responding to the same contour synchronized their discharges, while those responding to different contours did not. In the inferior temporal cortex of the ventral processing stream, synchronization probability reflected the binding of chunking neurons into assemblies representing individual objects (Tsunoda et al. 2001). Neurons responding to the components of faces (eyes, nose, mouth, etc.) synchronized their responses when the arrangement of these components was such that the animals signalled having recognized a face, while they did not synchronize when the components were scrambled or presented in a way that was judged by the animal as incompatible with the appearance of a normal face. Neither in the case of MT nor IT was it possible to distinguish between the various arrangements of the presented stimuli if only the discharge rate of the neurons was evaluated. This is compatible with the interpretation that discharge rate signals the presence of particular features while the correlations among the discharges of neurons indicate how these features are related to each other.

The role of attention

Grouping operations based on elementary Gestalt rules and the binding of the stereotyped feature constellations of highly familiar objects can occur preattentively. This automatic, attention independent grouping is thought to be based on chunking in fixed feed-forward architectures. However, several arguments suggest that synchronization may also serve as a mechanism for automatic grouping. The synchronization of retinal responses cannot be influenced by attentional mechanisms as there are no efferent projections capable of conveying the required information. The fact that feature specific response synchronization is readily observed in anesthetized preparations also suggests that binding through synchrony can occur preattentively. Interestingly, and this may turn out to be a feature distinguishing automatic grouping by chunking from automatic grouping by synchronization, automatic grouping by synchrony is highly context dependent while chunking is not. Despite anesthesia, grouping by synchronization remains sensitive to the global configuration of stimuli. In cat areas 17 and 18 synchronization probability changes when variations in stimulus context require a change in grouping, and this reorganization of synchrony patterns occurs even if stimulus configurations are changed in a way that leaves the stimuli appearing within the aperture of the classical receptive fields of the recorded neurons unchanged (Castelo-Branco et al. 2000; Engel et al. 1991b).

In addition to this evidence for attention-independent grouping by synchrony more recent results clearly indicate that synchronization is also highly susceptible to top–down, attention-dependent influences and that it plays an important role in attention dependent response selection and binding (Fries et al. 2001b). Various measures have been used to assess the influence of selective attention on neuronal synchrony: correlations among spike discharges, spike-field coherence, correlations in phase locking between oscillatory field potentials, and finally, the amplitude and the phase locking of oscillatory responses in MEG and EEG recordings. As the amplitude of these latter signals depends to a crucial extent on the synchronicity of large populations of neurons, not only variations in phase locking but also in the power of oscillations can be taken as a measure of synchrony. These data indicate that focussing attention on a particular stimulus or on a particular modality increases the synchrony of responses in the neuronal networks that process the attended stimulus. Again, this enhanced synchronization is associated with and most likely caused by an oscillatory patterning of neuronal activity in the beta and especially in the gamma-frequency range. At the same time one observes a reduction of oscillatory activity in lower frequency bands (alpha, delta). Evidence also indicates that anticipation of a particular stimulus or a motor act is associated with the generation of oscillatory activity in the beta- and gamma-frequency band in cortical areas required for the processing of the stimulus or the execution of the task (Roelfsema et al. 1997; Schoffelen et al. 2005). For tasks involving sensory discrimination and motor responses this anticipatory synchronization can extend across widely distributed networks of cortical areas. 
This anticipatory modulation of oscillatory activity is usually not associated with major changes in the discharge activity of neurons, suggesting that it consists mainly of a synchronous pacing of excitability through oscillatory activity generated in the network of inhibitory interneurons. It has been proposed that this subthreshold modulation of excitability facilitates rapid synchronization of responses once stimuli are available, thereby enhancing transmission across multiple cortical stages (Fries et al. 2001a, 2007). Recent results from MEG studies in human subjects take this proposal one step further and suggest that the anticipatory induction of coherent oscillations across distributed cortical areas and executive structures facilitates selective routing of activity and rapid handshaking among the involved processing stages. However, at present, it is unknown which centres coordinate this attention-dependent modulation.

In principle synchronization could be used as an alternative mechanism to rate modulations in order to raise the saliency of responses. Experiments on binocular rivalry support this conjecture (Fries et al. 2002). In cat primary visual cortex the responses to the respective perceived stimulus differed from those to the suppressed stimulus because they were more synchronized and not because they were more vigorous. A similar conclusion is suggested by experiments on perceived brightness (Biederlack et al. 2006). If a small grating is superimposed on a large grating, the perceived contrast of the former increases with increasing orientation or phase offset between the two gratings. This effect is closely related to changes of neuronal responses in primary visual cortex. Neurons responding to the small grating increase their discharge rate but not their synchrony with increasing orientation offset while they increase the synchrony of their discharges but not the rate with increasing phase offset. This indicates that the saliency of responses can be enhanced either by increasing the rate or the synchronicity of discharges. The fact that the effects are perceptually indistinguishable illustrates nicely the complementarity of rate codes and temporal codes.

Oscillations and read out

Self-generated oscillatory activity and the associated synchronization of spike discharges are likely to also play a role in the read-out of information stored in the architecture of neural networks. Data from multisite recordings and optical imaging have revealed that spontaneous activity is not simply noise but exhibits a high degree of spatial and temporal organization (Arieli et al. 1996; Fries et al. 2001a, 2007). In the visual cortex the spontaneous activity fluctuations are coherent among columns sharing similar orientation preferences, probably because these columns are interconnected more strongly through cortico-cortical projections than are columns with dissimilar preferences. These spontaneous fluctuations have a strong impact on the latency and amplitude of light-evoked responses. Multisite recordings of spiking activity and field potentials from primary visual cortex revealed that columns preferring contours with similar orientation and in particular collinearly aligned contours engage in highly synchronized oscillations in the gamma-frequency range when the cortex is in an activated state, i.e. when the EEG exhibits high power in the beta and gamma frequency range. The effect of these self-generated coherence patterns is that columns oscillating in synchrony respond with precisely synchronized latencies when activated by light stimuli while response latencies fluctuate unsystematically and over a wide range for columns that have not been oscillating in synchrony prior to light stimulation. Therefore, the output of columns coding for features that tend to be grouped perceptually (same orientation, collinearity) is more synchronized than the output of columns coding for features that are less likely to be grouped (Fries et al. 2001a).
Thus, self-generated gamma oscillations translate the anisotropies in the network of horizontal connections into spatially selective patterns of coherence which in turn bias grouping by rapid synchronization of the very first components of responses to contours. Further support for this notion comes from multisite recordings in V1 of monkeys trained to freely inspect complex visual scenes. Shortly after the onset of fixation (40–100 ms) one observes a brief burst of highly synchronized high frequency oscillations in the local field potential that are precisely phase-locked across recording sites. These fixation related oscillations are in turn associated with excess synchronization of spike discharges in the responses to the contours of the scene. As these oscillations occur also when the animal scans a blank screen, they are most likely due to corollary activity that is generated in anticipation of having to process new constellations of features once a new segment of the scene is fixated (Maldonado et al. 2008). In analogy to the effect of the spontaneous oscillations this self-generated coherent activity could serve to read out grouping criteria residing in the network of tangential connections and to translate these criteria into specific synchronization patterns. The saccade-related oscillations and the associated spike synchronization precede by several tens of milliseconds the peak of the neurons’ rate responses, which reach a maximum only around 100 ms after the eyes have come to rest. Grouping cues encoded in latency adjustments and spike synchronization are thus available long before the changes in the neurons’ discharge rate can be fully evaluated. Such rapid processing at early stages of the visual system appears desirable given that the animals changed gaze direction on average 4–5 times per second.
This implies that scene segmentation, the eventual resolution of ambiguities, the selection of signals for chunking and the subsequent dynamic grouping of chunks into object representations must have been accomplished within about 200 ms.

Conclusions

The data reviewed in this chapter suggest that sensory systems exploit two complementary ways to evaluate and represent relations between features of perceptual objects. One strategy consists of the generation of specialized neurons in feed-forward architectures that respond selectively to particular constellations of features. The discharge rate of these chunking neurons encodes both the presence of particular features and the way in which they are related to each other. This coding strategy is fast but can encode only relations defined a priori by the convergence patterns of the feed-forward connections. As multisite recordings suggest, there is a second strategy that exploits the precise temporal relations between the discharges of distributed neurons to encode relations. This mechanism permits flexible and context-dependent definition of relations. It exploits the coincidence sensitivity of neurons and uses precise temporal synchronization of discharges as a tag of relatedness. Interestingly, the same tag appears to be exploited by the mechanisms mediating use-dependent synaptic plasticity and associative learning. As synchronization enhances the impact of the synchronized responses, it appears to be used not only to define relations among distributed responses but also in a more general way to select responses for further processing, to raise their perceptual saliency, and to support selective routing of activity under the control of attentional mechanisms. Because temporal codes can only be assessed with multisite recordings and because these have a relatively short history, we are still at the beginning of understanding coding strategies based on the dynamic interactions among large numbers of neurons. It may turn out that precise synchronization is only one, albeit a very important, signature of the many potentially significant dynamical states.
Precisely timed phase offsets and sequences of patterns defined by specific temporal relations are likely to play an equally important role (Fries et al. 2007). To analyse these more complex patterns and to examine whether they contain information that can be related to behaviour is one of the great challenges for future systems neurobiology.