Main

Among all the remarkable tasks that our visual system can perform, it would seem that locating something should be one of the easiest, especially if this 'something' is a mere spot of light, which, unlike a real object, requires no processing for identification. We know that visual cells in the brain have a receptive field, opening a kind of window on part of the world. So why is the selective activation of a set of these cells not sufficient to indicate where a spot of light is located? Many psychophysical experiments, old and new, tell us that visual localization is not that simple. Systematic errors occur, particularly in the presence of motion, whether of gaze or of the objects in our field of vision. Whenever things change, accurate timing of events becomes crucial. Under these circumstances, localization deteriorates because the visual system is too slow. Long and unreliable delays are involved in processing visual information in the retina and, ineluctably, such delays result in spatial errors. Here we consider three conditions in which localization errors happen — some unnoticed, others giving rise to strong illusions. These three conditions are smooth pursuit of a moving object, presentation of a flash in the dark just before a saccadic eye movement, and presentation of a flash in the presence of a moving stimulus. These conditions are usually the preoccupation of specialized research groups, perhaps not always aware that they are facing the same type of problem. Bringing together their findings, and the ideas that they generate, might provide a common perspective.

Mislocalization during smooth pursuit

When ocular smooth pursuit is perfect, eye velocity matches stimulus velocity and, at any instant, the eyes point accurately at the physical location of the pursued stimulus. Were it otherwise, the image of the stimulus would slip off the fovea and, automatically, this position error would trigger a corrective saccade. Remarkably precise, continuous aiming is achieved despite the fact that there is a long delay between the occurrence of a visual stimulus and its perception by the brain. Hazelhoff and Wiersma1 referred to this delay as 'perception time'. Some of it is consumed by processing in the retina and the rest by processing in visual areas. In macaques, responses to visual stimuli can be recorded after 60–100 ms in several cortical visual areas, including the primary visual cortex (V1)2. However, there is no good test to determine where 'visual perception' occurs. In Fig. 1a, the progression of the visual signal at three stages during smooth pursuit is summarized. On the left, the physical signal of target T impinges on the retina at time tn–x. In the centre, the signal has been processed by the retina. This has taken 40 ms, during which the eye has moved as shown. On the right, the image of target T, as it was at time tn–x, reaches the hypothetical level of perception. But by now the eye is looking at the target's current position, Ttn.

Figure 1: Effect of visual delay in smooth pursuit.
figure 1

a | Schematic representation of the progression of the visual signal generated by moving target T while the eye smoothly pursues it. From left to right are shown three selected snapshots in the course of this progression. On the left, the physical signal of target T impinges on the retina at time tn–x. In the centre, the signal has been processed by the retina. This has taken 40 ms, during which the eye has moved as shown. On the right, the image of target T, as it was at time tn–x, reaches the hypothetical level of perception. But by now the eye is looking at the target's current position, Ttn. b | Measurement of mislocalization during smooth pursuit. During pursuit from right to left, a spot of light (flash 1; upper trace) is presented in perfect alignment with moving target T. A second marker (flash 2; lower trace) appears during the return trip of T (from left to right); the subject's task is to position flash 2 with a mouse so that its location coincides with that at which flash 1 was perceived. The subjects never see the two markers at the same time. Subjects reveal their misperception by placing the markers apart. The marker separation theoretically represents the sum of two position errors (the distance travelled by the target in 2x ms): the mislocalization when the target moves in one direction plus the mislocalization when the target moves in the opposite direction. Adapted with permission from Ref. 5 © 2001 Elsevier Science.

Because of the delay, when perfect smooth pursuit keeps our gaze on a moving stimulus, what we perceive at a particular point of its trajectory cannot be the stimulus as it is physically at that point. Rather, we perceive it as it was x ms earlier1,3,4. Imagine that something happened to that stimulus x ms ago (for example, it changed colour, size or shape for a brief instant, or it simply vanished); if the brain matches the perception of the event with the site at which the eyes are now looking, the stimulus change will be seen at the wrong place (ahead in the direction of movement in this case). This is what is meant by 'mislocalization', and is the focus of this review.

In fact, this kind of mislocalization does occur during pursuit. To show it, subjects can be asked to follow a target that is moving predictably back and forth, in a straight horizontal line at a sinusoidal speed (Fig. 1b). To perform the test correctly, the subject's head is immobilized, and the experiment is run in the dark to avoid interference from visual cues (an important point, as will be shown below). During pursuit from right to left, a spot of light (flash 1), used as a marker, is presented in perfect alignment with the moving target T (upper trace in Fig. 1b). A second marker (flash 2 in the lower trace) pops up during the return trip of the target T (from left to right), and the subject's task is to position flash 2 with a mouse so that its location appears to coincide with the location at which flash 1 was perceived. Note that the subjects never see the two markers at the same time. Subjects would be able to match the positions of the markers perfectly if there were no target to pursue. But, in this task, they reveal their misperception by placing the markers apart (Fig. 1b). The marker separation theoretically represents the sum of two position errors: the mislocalization when the target moves in one direction plus the mislocalization when the target moves in the opposite direction. Brenner and his collaborators5, who devised this experiment, found that each localization error corresponds to the distance travelled by the pursuit target in 100 ms. They concluded that the brain probably does not compensate for the delay in the sensorimotor loop that controls pursuit.
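The arithmetic behind this estimate can be sketched in a few lines. The pursuit speed used below is an illustrative assumption, not a value from the original study; only the 100-ms uncompensated delay comes from the text.

```python
# Minimal sketch of the geometry of pursuit mislocalization.
# An uncompensated visual delay turns into a position error equal to
# the distance the pursued target covers during that delay.

def mislocalization(pursuit_speed_deg_s: float, delay_ms: float) -> float:
    """Position error (deg) in one pursuit direction: the distance
    travelled by the target during the uncompensated delay."""
    return pursuit_speed_deg_s * delay_ms / 1000.0

def marker_separation(pursuit_speed_deg_s: float, delay_ms: float) -> float:
    """In the two-flash task the errors for the two pursuit directions
    add, so the distance between the subject's markers is twice the
    single-direction error."""
    return 2.0 * mislocalization(pursuit_speed_deg_s, delay_ms)

# Example: 10 deg/s pursuit (assumed) with the ~100-ms delay inferred
# by Brenner and colleagues.
error = mislocalization(10.0, 100.0)          # 1.0 deg per direction
separation = marker_separation(10.0, 100.0)   # 2.0 deg between markers
```

The point of the sketch is that the error scales linearly with pursuit speed, which is why a fixed delay can be read off from the marker separation at a known target velocity.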

Figure 1a illustrates only the visual delay afferent to perception, but there are other, smaller delays in the sensorimotor loop. We did not include them for two reasons. First, we do not know where the visual signal of 'what-we-see' and the corresponding signal of 'where-our-eyes-are-pointing-to-at-that-time' converge in the brain. Second, we do not know where the signal of eye position originates and how much time it takes to access it. It might be a feedback signal (corollary discharge) arising close to the point of movement initiation or further downstream, closer to the motor neurons, or else it could be a sensory (proprioceptive) feedback signal from the muscles that execute the eye movement. These feedback loops differ in length6. Neurons carrying signals that are modulated as a function of the eye position in the orbit have been recorded in several structures. Those that are found in the dorsomedial aspect of area MST (in the medial superior temporal sulcus in macaques; Fig. 2) are the most likely to be used in pursuit7. If the pursuit system conforms to the general rule, the synchrony of events in the world is judged by the brain on the basis of the synchrony of the inputs that signal these events. This is known as the Paillard–Fraisse hypothesis8,9.

Figure 2: Interconnections between oculomotor centres for saccades and smooth pursuit.
figure 2

From top to bottom: parallel cortico-cortical networks for controlling smooth-pursuit eye movements and saccades, proposed by Tian and Lynch105 on the basis of anatomical and physiological studies in monkeys105,106,107,108,109,110,111. The upper diagram represents a complex communication network that keeps pursuit and saccade channels separate between nodes, but which favours intranode interactions. The latter (not shown) are indicated by the systematic representation of the two oculomotor subsystems in the main eye fields. Reciprocal connections are indicated by green lines for pursuit and red lines for saccades. Arrows from the upper diagram to the brain point to the location of the main cortical oculomotor centres. In a window that is opened to visualize the brainstem are shown: the central thalamus (TH), which relays eye-position signals from subcortical structures to cortical centres112; the superior colliculus (SC), which receives descending projections from the frontal eye field (FEF), the supplementary eye field (SEF) and the parietal eye field (PEF–LIP); the nuclei that innervate the extrinsic eye muscles — III (oculomotor), IV (trochlear) and VI (abducens); and the main pre-oculomotor centres, which carry segregated signals for pursuit (for example, the dorsolateral pontine nuclei) and saccades (for example, the paramedian pontine reticular formation, or PPRF). Functional relationships between saccades and smooth pursuit are discussed in Ref. 113. 7m, medial parietal visual area 7; DM, dorsomedial visual area; LIP, lateral intraparietal area; PSR, principal sulcus region. Adapted with permission from Ref. 105 © 1996 American Physiological Society.

Localization errors during smooth pursuit illustrate the point that synchrony of signals within the brain does not match synchrony of the corresponding events in the outside world. Because the speed of nervous signals is not infinite, we need to be specific when we talk about the 'present time' or 'simultaneity'. Do we mean present time in the world, or time as we perceive it at present? Do we mean simultaneity of events in the world, or simultaneity of their signals in the brain10? In the next two cases, this point will become crucial as we realize how easily uncertainty in timing translates into uncertainty in spatial (and possibly other) dimensions.

The experiment described above characterizes one of two types of localization. When subjects had to indicate where a moving target was at a particular time, specified by a marker, they had to rely on the stored representation of where they were looking at that time. As this localization is made with respect to the observer, it is referred to as 'egocentric'. Egocentric localization should be distinguished from another type of localization — known as 'allocentric', 'exocentric' or 'relative' — which relies on the spatial relationships between spatially and temporally close visual cues.

Perisaccadic mislocalization

Whereas smooth pursuit allows us to track targets that are moving at reasonable speeds, such as aeroplanes, saccades are abrupt movements that point the eyes as fast as possible in a new direction. The two mechanisms differ in several respects. Smooth pursuit is initiated by stimulus velocity (with some acceleration boost11), whereas saccades are programmed to reach a goal. Smooth pursuit operates in a CLOSED LOOP, but saccades cannot, because they are terminated before the visual feedback of the eye displacement has time to reach the brain. These mechanisms are separate anatomically (Fig. 2). At the cortical level, pursuit is controlled mainly by area MST, and saccades to visual targets are controlled mainly by the frontal eye field (FEF). Both areas have separate projections to subcortical structures such as the superior colliculus and pontine nuclei. However, MST and FEF are linked by reciprocal connections with other visual–oculomotor centres, which form nodes in a cortical network that programmes saccades and smooth pursuit (Fig. 2).

Like smooth pursuit, saccades can cause stimulus mislocalization, but, curiously, this mislocalization is not limited to the brief moment when the eyes are in motion. It extends from at least 200 ms before to 100 ms after the saccade. 'Perisaccadic' mislocalization seems to be especially paradoxical when a spot of light is flashed in the dark just before the eyes start to move. Even though both the eyes and the stimulus are stationary at that time, subjects perceive the stimulus as being far from its actual location (in the direction of the saccade by as much as 70% of its amplitude12,13). In the following discussion, we consider stimulus presentations that precede saccades rather than accompany them, because, in the latter case, physical displacements of the eyes blur the image14, making it difficult to distinguish between mislocalization due to mechanical disturbances and that attributable to neural processes.

Measurements of perisaccadic mislocalization were first made in the 1960s15,16. At first, it was thought that this mislocalization was strictly perceptual (illusory) and that movements towards the stimulus (either saccades17 or hand pointing18) remained accurate. However, more recent studies found practically no difference in time course between perceptual errors (by verbal report or comparison with a visual reference) and targeting errors (by looking at the stimulus site12,13,19,20,21,22 or pointing to it23) in experiments performed in total darkness.

To understand the possible source of perisaccadic mislocalization, it is useful to ask how the brain can localize a stimulus in total darkness when the eyes move between the appearance of the stimulus and its capture by a saccade. The classic experiment designed for this purpose is called the 'double-step' (Fig. 3). While the subject fixates point F, two targets, A and B, are flashed in succession within, say, 150 ms. The subject is instructed to look successively at the target sites. As the saccade latency is longer than 150 ms, both stimuli are already off before the first saccade starts. Looking at the first target site is straightforward; it can be done solely on the basis of stored retinal information. Indeed, saccade 1 equals retinal vector 1 in Fig. 3. However, looking at the second target is more difficult, because the eyes are no longer at starting point F, from which the retinal coordinates of the second target (retinal vector 2) were obtained17. There are two possible solutions. The allocentric solution is to store in memory the spatial relationship between A and B, which can provide the vector A→B. The egocentric solution is to sum the retinal coordinates of stimulus B (retinal vector 2) with the coordinates of the effected eye displacement to A (saccade 1) to provide the coordinates of stimulus B in space. The egocentric solution seems to be the only one possible in the double-step situation when subjects perform this task in the absence of visual cues (for example, when saccade F→A is executed voluntarily with no stimulus being presented at point A21). There is good psychophysical evidence for the existence of an internal signal of eye position or eye displacement in the brain24,25,26,27. However, we do not know exactly how the computation that involves an eye-position signal (EPS) is implemented. Two main hypotheses have been offered: vector subtraction (postsaccadic28,29 or presaccadic30) and neuronal network31.
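The egocentric solution is commonly formalized as vector subtraction. A minimal sketch follows; the coordinates and helper functions are illustrative, and a single head-centred frame is assumed throughout.

```python
# Sketch of the egocentric (vector-subtraction) solution to the
# double-step task. All vectors are (horizontal, vertical) in degrees,
# expressed relative to the initial fixation point F. Values are
# illustrative assumptions.

def vadd(a, b):
    """Component-wise vector addition."""
    return (a[0] + b[0], a[1] + b[1])

def vsub(a, b):
    """Component-wise vector subtraction."""
    return (a[0] - b[0], a[1] - b[1])

retinal_vector_1 = (10.0, 0.0)   # target A, as seen from F
retinal_vector_2 = (5.0, 8.0)    # target B, also acquired while fixating F

# Saccade 1 simply reproduces retinal vector 1.
saccade_1 = retinal_vector_1

# Saccade 2 cannot reuse retinal vector 2 unchanged: the eye is now at A.
# Combining the stored retinal vector with the executed eye displacement
# yields the correct movement to B.
saccade_2 = vsub(retinal_vector_2, saccade_1)       # (-5.0, 8.0)

# Using retinal vector 2 directly would instead land the eye at
# C = A + retinal_vector_2, the erroneous end point of Fig. 3.
wrong_endpoint = vadd(saccade_1, retinal_vector_2)  # (15.0, 8.0)
```

The same bookkeeping underlies both hypotheses mentioned in the text; they differ in when and how the subtraction is implemented neurally, not in the geometry itself.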

Figure 3: Layout of the double-step task.
figure 3

A subject fixating point F has to make saccades successively to sites A and B, where spots of light have been flashed in the dark. How is the trajectory of saccade 2 programmed? If based only on the retinal vector 2, it should terminate at C.

When, in the double-step, the second stimulus is flashed shortly before the saccade, its signal might reach the brain after the saccade. Indeed, the afferent visual delay can be as long as 100 ms, whereas a 20° saccade usually lasts less than 50 ms. Such a difference is enough to produce a perisaccadic mislocalization: in fact, it would not be easy to explain why mislocalization should not happen. Human observers can easily locate a brief spot of light that is presented in the dark more than 300 ms before a saccade. They can report its position, look at its site and point to it. But when, in successive trials, the stimulus is presented closer and closer to saccade onset, a localization error develops and grows both in size and in variability. The error peaks with flashes presented at saccade onset, and then sharply declines throughout the duration of the movement and afterwards (Fig. 4a,b).

Figure 4: Steps involved in determining the time course of the hypothetical eye-position signal.
figure 4

a | Saccade trajectories in xy coordinates for a single experimental trial. Dots represent the eye position sampled at 2-ms intervals. The initial saccade is from the fixation point F to the initial target I. The second saccade is the subject's attempt to localize the test flash T (presented when the eye was at the point of the trajectory marked 'Actual'), resulting in an error of roughly 12°. Graphically subtracting the retinal vector (dashed line) from the end point of the second saccade indicates the value of the hypothetical eye-position signal (EPS), marked 'Int. rep' for internal representation. b | Magnitude of the localization errors from a single subject, plotted with respect to the delay between the presentation of the test flash and the onset of the initial saccade (at time 0 ms). Positive errors are errors in the direction of the initial saccade. The solid line depicts the time course of a typical initial saccade. The open circle represents the error value found in the trial shown in a. c | Time course of the hypothetical EPS, obtained by subtracting the actual time course of the saccade from the time course of mislocalization errors shown in b. d | Best-fit sigmoidal curves (blue) showing the time course of the hypothetical EPS for four subjects. Adapted with permission from Ref. 13 © 1993 American Psychological Society.

In the dark, the internal signal that represents eye position is crucial, because no other cues are available. In the past, this signal was often called 'extra-retinal', a label that (unfortunately) stresses what it is not. Here we call it 'eye-position signal' or 'EPS' in a general sense, without implying that it represents an absolute position or a relative displacement. As mentioned before, this signal could arise from an EFFERENCE COPY of the motor command or from proprioceptive (sensory) feedback of the movement. It could also be a signal of the intended saccade. The hypothesis of a signal of intended movement is interesting because it can account for anticipatory responses that have been recorded from neurons in the parietal cortex32, FEF33 and superior colliculus34, which occur before a stimulus site has entered the cells' receptive fields.

If a stimulus presented before a saccade is mislocalized, the error must arise from the visual signal, the EPS or the temporal match of their combination. The visual signal is not likely to be in error. Making this reasonable assumption, one can try to recover, from recorded data, the time course of the hypothetical EPS by subtracting the retinal coordinates of the stimulus from its coordinates in space (Fig. 4c,d). This analysis reveals that instead of reproducing the real time course of the saccade, the hypothetical EPS is considerably distorted: it begins 200 ms before the saccade itself begins and it is stretched in time12,13,19,20,22,23,35 (Fig. 4d).
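The per-trial subtraction behind this analysis can be written out as a minimal sketch. The additive combination assumed below (localized position = retinal vector + EPS) is a simplification in one horizontal dimension, and all numbers are illustrative, not data from the cited experiments.

```python
# Per-trial reconstruction of the hypothetical eye-position signal (EPS),
# following the logic of Fig. 4a. One horizontal dimension, degrees.

def reconstruct_eps(localized_position: float,
                    flash_position: float,
                    eye_at_flash: float) -> float:
    """Assume the brain localizes the flash as
        localized_position = retinal_vector + EPS,
    where retinal_vector = flash_position - eye_at_flash.
    Solving for the EPS gives the internal eye-position estimate that
    must have been used on this trial."""
    retinal_vector = flash_position - eye_at_flash
    return localized_position - retinal_vector

# Illustration: a flash at 10 deg while the eye is still at 0 deg;
# the subject localizes it at 14 deg, a 4-deg error in the saccade
# direction. The implied EPS is 4 deg: the internal signal has already
# started moving although the eye has not.
eps = reconstruct_eps(localized_position=14.0,
                      flash_position=10.0,
                      eye_at_flash=0.0)    # 4.0
```

Repeating this computation for flashes at different times relative to saccade onset traces out the distorted, time-stretched EPS curves of Fig. 4d.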

Is the EPS really damped (stretched in time), or does it just appear to be so because noisy data have been pooled together? No obvious neurophysiological constraints can explain the signal distortion36. We know that the nervous system can handle signals of much higher frequencies than those of saccades. The explanation for the mislocalization might lie in the way very brief visual stimuli are processed. Because such stimuli have long visual persistence, the determination of their occurrence might be artificially delayed37,38. Another explanation could be that the time course of the EPS is correct, but the brain makes random errors about the timing of the stimulus onset22,39. Timing errors are common40 and they occur in other movements; for instance, of the hand41. A merit of this timing-error hypothesis is that it accounts not only for the extended time course of the errors, but also for their wide variation.

Only visual events that have a clear onset near the time of a saccade are mislocalized. If a stimulus is flashed 10 ms before a saccade, it is perceived far away from its real location, but if a stimulus is flashed 200 ms before a saccade, it is localized accurately. How do we perceive a stimulus that starts 200 ms before a saccade and is turned off 10 ms before the saccade? Does it appear to be moving from an initial site, which is veridical, to a final site, which is illusory? Fortunately not; otherwise the world would appear to slide with each saccade. That the stimulus is not perceived to move is strong evidence that the brain does not continuously update the position assignments of objects around us. If they are continuously visible and do not actually move, it is simpler for the brain to assume, by default, that they are still where they were last seen. This is one of the main bases of SPATIAL CONSTANCY. It also means that the brain does not need to access the EPS constantly.

It is sufficient for the EPS to be 'sampled' only once: when a new object appears22. Hershberger has provided a nice demonstration of this sampling42. In his experiment, a stimulus is flashed at a frequency (for example, 120 Hz) that is well above the critical flicker frequency. Therefore, the stimulus is seen as a continuous light as long as the eyes are immobile. But as soon as a saccade starts, the image of the stimulus is displaced on the retina. Immediately, the first flash jumps in the direction of the saccade, and then the subsequent flashes gradually return the stimulus image to its veridical position. At first sight, one might think that perceptual mislocalization starts only with the saccade. However, before the saccade, the EPS must not have been sampled because the stimulus onsets, above critical flicker fusion, were too fast to trigger sampling.

Many neurophysiological single-unit studies concern the double-step paradigm, but we found only one43 that dealt with the stringent timing conditions that produce mislocalization. Practically all double-step experiments show that eye-movement neurons, which discharge in preparation for a saccade of a given amplitude in a particular direction, do so in most circumstances, and those include the second of the double-step saccades. This seems to be true whether these neurons are recorded in the superior colliculus, thalamus, basal ganglia or cerebral cortex. Other neurons, called 'quasi-visual', fire tonically when the site of a previous visual stimulus (which is no longer visible) is brought into their receptive field by a saccade. The existence of such neurons indicates how the nervous system can keep a memory trace of goal locations44.

Microstimulation in oculomotor centres has been important in the study of experimental situations that are similar to the double-step and perisaccadic mislocalization in monkeys. In some central oculomotor structures, such as the superior colliculus and FEF, microstimulation triggers a saccade of a given vector (amplitude and direction) that is specific to the stimulated site. Long ago, it was noticed that if two of these saccades are evoked in rapid succession, the trajectory of the second saccade is often deviated45. Sparks and Mays46 studied the interaction between successive saccades, one visually guided, the other electrically evoked by stimulation of the superior colliculus. They observed that the second saccade is redirected by the first, exactly as in the double-step paradigm. The modified trajectory has all the characteristics of 'compensation' for the deviation in eye position introduced by the first saccade. When the second saccade is evoked by electrical stimulation, this implies that stimulation specifies a goal, not a trajectory47.

If this is so, it might be possible to replace a visual target by electrical stimulation to produce a goal in a double-step task, and try to induce perisaccadic mislocalization47,48,49. The advantage of this strategy is that an electrically evoked signal, which is theoretically equivalent to the visual signal of a real target, bypasses the whole afferent visual pathway, including the retina. In this way it was shown that electrical stimulation, applied at different times during and after natural saccades, evokes compensatory saccades that vary in their dimensions depending on the structure stimulated and the exact timing of stimulation47,48,49,50,51,52. From data collected in the FEF49, the time course of the EPS was computed using the same method as applied in experiments performed with visual targets. The results show the same damping of the EPS as in behavioural experiments. However, the EPS starts earlier (that is, it appears to precede the actual saccade by as much as 50 ms), as expected if the afferent visual delay is bypassed. These experiments confirm the results of perisaccadic mislocalization by using an excitatory drive applied directly within neural structures. They also support the view that the length of the visual afferent delay has an important role in perisaccadic mislocalization.

In the psychophysical experiments discussed so far, human subjects were in total darkness and made saccades to flashed targets. This might be a scientifically useful situation, but certainly not one that fits any realistic set of circumstances. In everyday experience, visual cues are abundant and human subjects rely heavily on them (although with notable individual variations12,53). Two main effects are observed when allocentric visual references are introduced in localization tasks. First, visual cues reduce the systematic bias of mislocalization in the direction of saccades, which is maximal in complete darkness21. This reduction indicates that the brain can combine the use of egocentric and allocentric cues in its spatial computations21,54, and often settles on a compromise between what egocentric or allocentric information alone would advise55. Even though egocentric and allocentric maps in the brain are usually regarded as different and separate, they have much in common. Perhaps an egocentric map is nothing but an allocentric map with a privileged pointer to mark the site on which the eye fixates, and perhaps also with other pointers to indicate how the head is oriented and where the hand is placed. This concept of pointers has already been used in modelling56, but we do not know how the brain would implement the concept of a pointer.

The other observation is that allocentric information itself is strongly affected by the imminence of a saccade. When visual cues are presented in the dark at different times before the movement, their spatial relationships are disturbed57, as predicted by the temporal course of the mislocalization effect that we have just described. Indeed, as mislocalization increases when a stimulus is flashed closer to saccade onset (Fig. 4d), two spots of light that are flashed successively in the dark just before a saccade will appear to be unequally displaced: the second one will be more displaced than the first in the direction of the saccade. However, in a lit environment, a further effect is observed: a general compression of visual space58,59,60,61, which, unlike unidirectional mislocalization, cannot be accounted for simply by visual delays. One of its most dramatic demonstrations is obtained by presenting four equally spaced vertical bars just before a horizontal saccade. The subjects perceive the four bars straddling the saccade target as being fused into one59,61. In addition, for reasons that are not yet understood, the contraction of visual space is asymmetrical, depending on the direction of the saccade15,58,59,62. The critical period during which allocentric cues have this effect is after the saccade60 and, indeed, there are other indications that the postsaccadic period is very important to clear, refresh or update presaccadic information39,63. Saccades appropriately provide occasions to sample the environment. They are crucial moments in which to renew our contacts with the world. As MacKay suggested, we should think of saccades as questions.

In summary, a saccade is a very destabilizing period during which the eyes are moving at high speed. Suddenly, the brain has to shift its egocentric reference, which is its only available cue when movements must be programmed towards a goal in the dark. A long and unreliable afferent visual delay perturbs the timing for combining the visual signal and the EPS, an operation that is crucial for the localization of stimuli. The same kinds of problem that occur in smooth pursuit are encountered here, but they are exacerbated because a saccade is so fast that a mismatch of a few milliseconds can lead to an error of several degrees.

The flash-lag effect

If there is uncertainty about the time (and therefore the location) of an event (for example, a flashed spot of light) when the retina moves40, is there also uncertainty when the stimulus moves (but not the retina)? Several illusions involving stimulus motion have been reported that are strikingly similar to perisaccadic mislocalization (for example, Refs 64–66). In the 1970s, MacKay used to show such an illusion to large audiences. After turning off the room lights, he swung at arm's length the control box of a Grass photostimulator while, with the other hand, illuminating the front panel of the box with a stroboscopic lamp flashing at a rate of 3 Hz. There was a continuously visible red pilot light on the front panel of the Grass box. Everyone in the audience could clearly see the pilot light dancing in and out of the intermittently lit front panel of the box. The red pilot light appeared to be completely dissociated from the panel in which it was embedded.

Today, this illusion is best known by the name of 'flash-lag'. In the flash-lag effect, the perceived spatial relationship between a moving object and a stable one that is briefly visible appears to be distorted. In 1994, Nijhawan67 renewed interest in this phenomenon by designing an ingenious set-up that not only produces the misperception, but allows its objective measurement. The set-up is a rotating bar made of three light segments (Fig. 5). The middle segment is continuously visible in the dark, and the outer two segments are briefly flashed when all three segments are aligned. The two flashed segments appear to lag behind the middle one, as shown on the right in Fig. 5. Most recent experiments have used a similar design with either a rotary or straight, forward-moving stimulus. It is worth noting that the display that consists of a straight, forward-moving stimulus does not differ from the one used in the smooth-pursuit experiment described earlier (Fig. 1b). The only difference is in the instruction to the subjects: instead of pursuing a stimulus, they now have to fixate a steady point while the stimulus moves across the retina.

Figure 5: Nijhawan's flash-lag set-up.
figure 5

A bar made up of three aligned luminous segments rotates clockwise at 30 r.p.m. The middle segment is continuously lit, whereas the external segments are strobed. The segments are perceived to be unaligned (as shown on the right). Adapted with permission from Nature (Ref. 67) © 1994 Macmillan Magazines Ltd.

In the past seven years, the flash-lag has been the object of an intense controversy. First, noting that the brain is getting only outdated information on the position of a moving object (as in smooth pursuit), Nijhawan67,68 interpreted the flash-lag as an attempt at 'latency correction' by the brain to compensate for long visual delays. He argued that to catch a moving object (such as a tennis ball), we cannot rely on the sluggish visual system to tell us where it is. Therefore, he proposed that the brain 'extrapolates' the location at which the moving object is perceived. Clearly, this is one of the most daring proposals of a top–down hypothesis. Here, 'top–down' means that somehow the brain is attempting to correct its own input from the senses (with bold disregard for the venerable maxim “Nihil est in intellectu quod prius non fuerit in sensu”: 'nothing is in the intellect that was not first in the senses').

The parsimony principle does not tolerate top–down hypotheses well unless all 'bottom–up' options have first been eliminated. Bottom–up hypotheses were soon advanced, invoking early visual mechanisms such as differences in visual latency69,70,71,72 or attention73. We know that physiological latencies vary in the visual system. For instance, they are shorter the greater the stimulus luminance69 and the closer the stimulus is to the fovea. Also, in terms of perceptual awareness (which might involve processing time74), there are latency differences75,76, such that colour might seem to be perceived before orientation, which is perceived before motion77. Along these lines, it is possible to explain the flash-lag effect as a consequence of differences in latency, if one assumes that the latency of a stationary stimulus (a flash) is longer than the latency of a moving stimulus69,70,71. There are good justifications for this assumption, including characteristics of retinal processing78. A model based on latency differences works as follows. Imagine that the brain regards as simultaneous those events whose signals arrive at the same instant9,79. If latencies are unequal, visual signals judged to be simultaneous must have departed at different times. Being faster, the signal from a moving stimulus would have left later; that is, from a more recent position along the stimulus trajectory. This is how an error in time could lead to an error in space.
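The quantitative prediction of the differential-latency account is a one-line calculation; the sketch below uses illustrative parameter values (none are taken from the cited studies). The predicted lead of the moving stimulus is simply its velocity multiplied by the latency difference.

```python
# Minimal sketch of the differential-latency account of the flash-lag
# (illustrative values only). If the flash's latency exceeds that of the
# moving stimulus, signals judged simultaneous left the retina at
# different times, so the moving stimulus is seen at a more recent,
# spatially advanced position.

def flash_lag_magnitude(velocity, latency_flash, latency_moving):
    """Predicted spatial lead of the moving stimulus (deg).

    velocity       : stimulus velocity (deg/s)
    latency_flash  : latency of the flashed stimulus (s)
    latency_moving : latency of the moving stimulus (s)
    """
    return velocity * (latency_flash - latency_moving)

# e.g. 15 deg/s with a hypothetical 40-ms latency advantage for motion
lead = flash_lag_magnitude(15.0, 0.100, 0.060)   # 0.6 deg lead
```

Note that the model predicts no illusion when the latencies are equal, and a lead that grows linearly with stimulus velocity, which is one way the hypothesis can be tested psychophysically.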

Several variants of Nijhawan's original experiment were designed to determine how the illusion survives the removal or modification of single elements of the original set-up (such as the flash, the stationary stimulus or part of the moving stimulus). Presenting sequences of flashes, Lappe and Krekelberg80,81 showed that the flash-lag effect decreases when the number, duration, frequency or predictability of the flashes is increased. The effect depends on whether the moving stimulus is perceived as an object of which the flash is a part82. The magnitude of the illusion changes if the velocity of the moving stimulus changes after the flash83, and the illusion completely disappears if the moving object is blanked immediately after the flash, clearly indicating that the critical period is the one that follows the flash70,71,84. On the other hand, the illusion remains intact if the flash is replaced by a continuously lit stimulus and the moving stimulus transiently changes (in size, luminance or colour) when it passes in alignment with the stationary stimulus. In this case, it is still the moving stimulus that appears to be ahead, even though this perception no longer depends on a comparison with a flashed stationary stimulus. One can even remove all stationary references and use two vertical bars that move, one above the other, in opposite directions. If both change transiently (in size, luminance or colour) exactly when they cross each other, the two transient changes appear to be displaced with respect to each other in the direction of their motion85. Clearly, it is not the flash that lags, but the changing stimulus that leads86.

More general questions can be asked. Is real stimulus motion on the retina essential for the flash-lag effect, or can illusory motion produce it? Extended viewing of moving patterns induces a motion aftereffect that causes the perception of motion in the opposite direction in the portion of the retina that has been adapted. If two parts of the retina are adapted to patterns moving in opposite directions, two stationary patterns presented afterwards appear to be displaced with respect to each other in the direction of the illusory movement87, as the flash-lag effect would predict. In another study88, subjects wear a helmet to which the stimulus display is attached in front of the eyes. The subjects fixate the display, which consists of a vertical bar with the upper third flashing and the lower two thirds continuously lit. Its displacement is linked to the movement of the head. Subjects are asked to oscillate their heads horizontally, and soon, in the dark, they perceive the flashing and continuous parts of the stimulus display as being dissociated from each other, just as in MacKay's experiment reported earlier. In a parallel study89, subjects sit in a vestibular chair in darkness and are rotated at 100° s−1 for 1 min while they fixate a stimulus display in front of them. The display, attached to the chair, is similar to the one just described, but contains five vertical flashing lines (Fig. 6a). Subjects are required to decide which of these five flashing lines is aligned with a continuously lit long line. Again, subjects perceive a flash-lag in the seconds that follow the start or stop of their body rotation (Fig. 6b). Most remarkable is the effect triggered by stopping the chair rotation abruptly because, at that particular instant, all motion has stopped (of the eyes, body and stimulus), but the feeling of rotation persists and with it the illusion.

Figure 6: Flash-lag induced by rotation in a vestibular chair.
figure 6

a | Scale reproduction of the actual display of light-emitting diodes (LED), and the perceived misalignment immediately after clockwise rotation has stopped. This represents a misalignment of 22 arcmin. b | Misalignment data from three subjects in a chair rotating at 100° s−1 (blue diamonds) and 60° s−1 (green circles). Time starts at the beginning or end of rotation. Adapted with permission from Ref. 89 © 2000 Springer–Verlag.

Finally, is movement itself even necessary? It can be replaced by the repetitive presentation of an invariant object90, and a phenomenon very similar to the flash-lag occurs in an experiment in which the colours of two spots are compared91. One spot gradually changes colour (say, from green to red), and the other spot is flashed in an intermediate colour (for example, orange) that exactly matches the colour of the changing spot at the instant of the flash. Subjects perceive the flashing spot as greener than it actually is. One could call it a 'colour-lag', but the best term is probably 'change-ahead'. Indeed, the same authors have generalized their observations to changes in luminance, spatial frequency and pattern entropy, instead of changes in colour.

Obviously, the differential-latency hypothesis cannot account for all these diverse observations. Either the flash-lag is an ensemble of different phenomena — and some, indeed, are unusual92 — that have their own explanations, or there is one explanation common to visual change in general. One could go one step further and ask: do changes in modalities other than vision produce similar illusions?

Conclusions

Common to the three cases discussed (smooth pursuit, saccade and stimulus motion, or even stimulus change in general) is the fact that a decision has to be made on the instantaneous state (for example, eye position, stimulus position, colour or luminance) of a variable that is continuously and predictably changing. Of course, 'instantaneous state' is meaningless until the instant is specified86. This is the role of an event and, indeed, there is an event in all three cases discussed (usually a flash in pursuit and flash-lag, or a stimulus onset in perisaccadic mislocalization). To find the instantaneous value of a changing variable, two conditions should ideally be met. One is that the timing of the event should be unambiguous whatever the reference. The second is that once the signal of the event is received by the brain, the information about the state of the changing stimulus should be either immediately available (in other words, already processed) or retrievable by extrapolation back in time. Apparently, neither condition can be fulfilled by the brain. In perisaccadic localization, for instance, subjects have great difficulty in deciding whether a flash occurred before, during or after a saccade, and they make large errors40,93. There is no evidence that timing mechanisms working below the level of awareness can do better.

Once the signal of an event is received by the brain, it can be used for sampling, probing or starting a process. There is good agreement on the necessity of this step22,83,84,94,95. However, some see this step as resetting an ongoing process84, whereas others have proposed that the state of a changing variable is sampled only when an event occurs22,86.

How is the state of a variable at a given instant determined by the brain when this variable changes? In the case of perisaccadic localization, most interpretations are given in terms of receptive fields being shifted on the occurrence of the saccade32,33,34,96. An implicit premise of this hypothesis is that neurons involved in localization directly or indirectly receive inputs from a much larger retinal area (perhaps the whole retina) than is apparent when receptive fields are tested with the eyes fixed. If so, what really determines the position and size of receptive fields is not the span of retinal inputs ultimately impinging on these neurons, but the present or future direction of gaze.

In the case of pursuit and flash-lag, the changing variable is stimulus position. Very early in the visual pathway, signals of position and signals of velocity are segregated. Whereas in physics, velocity is simply the time derivative of position and is treated as such, in physiology, visual velocity is a primary dimension, no less fundamental than position. From the initial stage, our visual system has motion detectors and position detectors, ultimately contributing to motion maps and position maps in the cerebral cortex97. In both types of map, cells have receptive fields. The position of a moving stimulus can be represented in two ways: by an instantaneous peak of neural activity in a position map, or by integration of a velocity signal. In smooth pursuit, Priebe et al.98 postulate that motion-selective neurons with a particular speed tuning are responsible for motion perception, whereas the sequential activation of receptive fields leads to the perception of change in position (but note that the two can be perceptually dissociated in the motion aftereffect).

In their explanation of the flash-lag, Krekelberg and Lappe99 and Eagleman and Sejnowski84 assume that an instantaneous stimulus position is computed by the brain by integrating velocity signals. Theoretically, this process takes more time than finding the peak of activity in a receptive field. First, integration is not instantaneous: it requires some sort of averaging over a defined period99. Second, the results of the computation cannot be immediately available because they are based not on current data, but on data that follow the triggering event. Nonetheless, these results are referred to the only available time marker, which is the event itself (called subjective time100). Eagleman and Sejnowski84 coined the term 'postdiction' to stress the very likely possibility that some of our perceptions depend on future events. Whereas most people tolerate the idea that the memory of their perception might be faulty, they are not ready to accept that their instantaneous perceptions depend on things that have not happened yet. Why not? This might be a natural consequence of the fact that neural processing is not instantaneous.
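The temporal logic of such an account can be sketched numerically (a simplified illustration with assumed values, not the published models' exact form). If the position reported 'at the flash' is in fact an average over a window that follows the flash, a stimulus moving at constant velocity is necessarily reported ahead of its true position at the moment of the event.

```python
# Sketch of position estimation by averaging over a post-event window
# ('postdiction'; illustrative only, with assumed parameter values).
# For a stimulus at position p moving at constant velocity v, the mean
# position over the window [0, w] seconds after the flash is
# (1/w) * integral of (p + v*t) dt = p + v*w/2, i.e. ahead of p.

def postdicted_position(pos_at_flash, velocity, window):
    """Mean position (deg) over `window` s following the flash.

    pos_at_flash : true position at the instant of the flash (deg)
    velocity     : stimulus velocity (deg/s)
    window       : duration of the post-flash averaging window (s)
    """
    return pos_at_flash + velocity * window / 2.0

# 20 deg/s averaged over an 80-ms post-flash window -> 0.8 deg ahead
ahead = postdicted_position(0.0, 20.0, 0.080)
```

The sketch captures why blanking the moving object immediately after the flash abolishes the illusion: with no post-flash trajectory to average, the estimate collapses back to the position at the flash itself.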

A last point: recently, some researchers have expressed reservations about the role of a receptive field map in localization. For instance: “There is more to position encoding than simple spatial receptive fields”95, and “... there is something weird with the view that anatomical position serves as the code for perceived position”101. Throughout this review, the same question arises: why should it be more complicated to determine the spatial relationship of two features if one is moving than if both are static? Two peaks of activity in a map of visual neurons should do the trick, but apparently they do not. Perhaps our conception of maps is too simplistic. Perhaps velocity and position are represented together in the same neuronal population and are not always completely dissociable, as several studies suggest102,103,104. Whitney and Cavanagh104 observed: “Assigning positions to brief stimuli depends on the configuration of motion signals throughout the visual field”. In other words, what happens at one point in a neuronal population is influenced by changes occurring in the whole population. We need models to exploit this idea.