Introduction

The input to our visual system shifts dramatically as we make eye movements several times per second, yet we are able to act successfully on the objects that we encounter in the world. It seems intuitive that we must create a stable representation of their locations, especially in domains such as memory, when the object of interest is not constantly visible in the scene. One way our visual system might accomplish this is by immediately transforming remembered locations from gaze-centered (retinotopic) coordinates into gaze-independent (spatiotopic) coordinates so that the remembered location would be relatively unperturbed by the large number of eye movements we make. Alternatively, our visual system may store remembered locations in retinotopic coordinates and dynamically update them after each eye movement.

A number of studies have examined retinotopic versus spatiotopic processing across a variety of domains (Afraz & Cavanagh, 2009; Golomb, Chun, & Mazer, 2008; Golomb & Kanwisher, 2012b; Hayhoe, Lachter, & Feldman, 1991; Irwin, 1991; Knapen, Rolfs, Wexler, & Cavanagh, 2010; Melcher, 2007; Melcher & Morrone, 2003; Ong, Hooshvar, Zhang, & Bisley, 2009; Pertzov, Zohary, & Avidan, 2010; Prime, Vesia, & Crawford, 2011; Rolfs, Jonikaitis, Deubel, & Cavanagh, 2011). In spatial memory, Golomb & Kanwisher (2012b) tested the two alternatives described above. They found that human participants were better able to remember retinotopic locations than spatiotopic locations across saccades and that there was a greater accumulation of error across saccades when remembering spatiotopic than retinotopic locations. This is consistent with storage of the location in a retinotopic format, with imperfect updating of the location with each saccade (retinotopic-plus-updating account).

However, the fact that spatial memory was better in retinotopic than spatiotopic coordinates is somewhat unintuitive, because spatiotopic coordinates seem more ecologically relevant and useful for human behavior. Why, then, would memory be better preserved in retinotopic coordinates? One possibility is that this finding is a byproduct of the spatial organization of the visual system, which has been shown to be coded in primarily retinotopic coordinates (Cohen & Andersen, 2002; Gardner, Merriam, Movshon, & Heeger, 2008; Golomb & Kanwisher, 2012a; Medendorp, Goltz, Vilis, & Crawford, 2003). Another possibility is that the retinotopic advantage in the Golomb and Kanwisher memory task was due to the method of reporting responses (mouse click), whereas a task that engages more action-based processes (e.g., reaching) might better engage the spatiotopic memory system.

Visual stability is important both for perceiving the world and for effectively acting within it. It is possible that spatial locations are represented differently when observers intend to act on them compared with when they do not. For example, in the case of peri-saccadic mislocalization, observers commonly misperceive the location of a briefly presented stimulus around the time of a saccade (Matin & Pearce, 1965), yet they are still able to point accurately at the location (Burr, Morrone, & Ross, 2001). An intriguing theory in the visual working memory literature is that the visual system is able to flexibly make use of different memory stores according to task demands (Serences, 2016). We tested the possibility that the spatial memory task in Golomb & Kanwisher (2012b) might similarly rely on different memory stores depending on task demands: specifically, that the original task may have implicitly encouraged the use of a retinotopic memory store, whereas a task better optimized to engage the motor system might rely on a more spatiotopic memory store.

In Golomb & Kanwisher’s (2012b) experiment, participants used a mouse to place a cursor over the location where they remembered seeing the object and clicked the mouse when the cursor was in the correct position. Participants were able to adjust the mouse and move the cursor around until they found something that looked right, and it is possible that the opportunity to make use of these fine visual discriminations preferentially recruited more low-level sensory processes. The low-level nature of the task may have made it more likely to reflect retinotopic processing, because it is well-established that early visual areas have retinotopic representations (Gardner et al., 2008; Golomb & Kanwisher, 2012a).

Moreover, the use of the mouse report and fine visual discriminations could have encouraged a reliance on more perceptual rather than motor processes. From an intuitive standpoint, we might see how retinotopic (eye-centered) coordinates could dominate in the perceptual domain, but spatiotopic (e.g., head-centered, body-centered, world-centered) coordinates certainly seem more relevant for executing motor actions in the world. More generally, there may be differences between tasks that engage vision-for-perception versus vision-for-action. A classic neural dissociation is that the ventral visual stream is responsible for vision-for-perception, whereas the dorsal visual stream is responsible for vision-for-action (Goodale et al., 1994; Goodale, Milner, Jakobson, & Carey, 1991; Goodale & Milner, 1992; James, Culham, Humphrey, Milner, & Goodale, 2003; Newcombe, Ratcliff, & Damasio, 1987); patients with damage to one visual stream or the other can exhibit strikingly different visual processing abilities. There has been debate over the extent to which other aspects of visual processing may or may not be altered depending on whether participants directly interact with a target. For example, there have been reports of visual illusions affecting perception but not action (Aglioti, DeSouza, & Goodale, 1995; Gentilucci, Chieffi, Daprati, Saetti, & Toni, 1996), although these have been subsequently challenged (Franz, Gegenfurtner, Bulthoff, & Fahle, 2000). Similarly, there is debate (Firestone, 2013) over a group of studies that claim that perception is affected by actions that the observer intends to perform (Witt, 2011; Witt, Proffitt, & Epstein, 2005).

In the current experiment, we adopted a new task intended to manipulate task demands and better engage the motor system to encourage a more spatiotopic spatial memory store. Specifically, participants interacted directly with the remembered location by reaching out and touching the location on the screen. Reaching is a simple, naturalistic movement that is commonly used in tasks of visually guided action (Bruno, Bernardis, & Gentilucci, 2008; Cohen & Andersen, 2002; Cressman, Franks, Enns, & Chua, 2007; Culham, Gallivan, Cavina-Pratesi, & Quinlan, 2008; Gallivan, Cavina-Pratesi, & Culham, 2009; Song & Nakayama, 2009) and is known to activate parietal, dorsal-stream brain regions (Andersen, Andersen, Hwang, & Hauschild, 2014; Cohen & Andersen, 2002; Johnson et al., 1996; Kertzman, Schwarz, Zeffiro, & Hallett, 1997; Snyder, Batista, & Andersen, 1997). We reasoned that while both the mouse and reaching tasks involve motor processes, the reaching task should more strongly engage the motor system, especially in this task.

In addition, there is another reason that we might predict that the reaching task would engage more spatiotopic motor processing. In Golomb and Kanwisher’s (2012b) task, the mouse movement was always initiated from the final fixation location, which was not known at the time of encoding. Thus, this task might have implicitly encouraged a retinotopic (fixation-relative) memory store, because the spatiotopic motor plan could not be stored in advance. In the new reaching task, subjects execute a reaching movement from a known start position on every trial (finger resting on spacebar), which could allow for the spatiotopic motor plan to be formed and preserved at the time of encoding. If what is remembered across the delay is the motor plan and not the visual representation—or if the visual representation is influenced by the intention to act—then we might expect stronger spatiotopic task performance in the reaching task, because the spatiotopic motor plan may not have to be updated with each eye movement.

Motor and reaching plans are thought to be encoded in parietal and frontal areas, and the evidence for different reference frame representations in these areas is mixed. Some findings advocate for head-centered (Duhamel, Bremmer, BenHamed, & Graf, 1997), hand-centered (Graziano, Yap, & Gross, 1994), or hybrid (Mullette-Gillman, Cohen, & Groh, 2009) reference frame representations in these areas, supporting the idea that parietal cortex encodes movement in multiple coordinate systems depending on the task (Colby, 1998; Graziano, 2001; Mullette-Gillman et al., 2009; Pertzov, Avidan, & Zohary, 2011). However, other reports have found primarily eye-centered representations (Batista, Buneo, Snyder, & Andersen, 1999; Medendorp et al., 2003), arguing for a common eye-centered reference frame for movement plans (Cohen & Andersen, 2002). Behavioral studies of reaching have also found mixed evidence for hand-centered versus eye-centered representations (Graziano, 2001; Henriques, Klier, Smith, Lowy, & Crawford, 1998; Pouget, Ducom, Torri, & Bavelier, 2002; Soechting & Flanders, 1989; Thomas, 2017; Tipper, Howard, & Houghton, 1998; Tipper, Lortie, & Baylis, 1992), although none have tested accuracy directly as we do here, comparing an eye-centered task to a world-, head-, or body-centered task (here all grouped together as “spatiotopic”).

In the present experiment, we tested whether the benefit for retinotopic versus spatiotopic memory found by Golomb and Kanwisher (2012b) was modulated or reversed when participants directly interacted with remembered locations. To do this, we tested participants’ memory for retinotopic versus spatiotopic locations across a variable number of saccades (0 to 2) for two response types: reaching to tap the remembered locations on a touchscreen versus using a mouse to click on the screen location. We predicted that if the intention to act on a remembered location influenced the native reference frame used to store memories for spatial locations, both the overall lower performance and the larger accumulation of errors in the spatiotopic task would be modulated or reversed.

Methods

Participants

Twelve participants were included in the study, which consisted of four sessions each. One additional participant was run but did not meet our predetermined accuracy requirement (see below), so was not included in data analyses. All subjects reported normal or corrected-to-normal vision and gave informed consent. Study protocols were approved by the Ohio State University Behavioral and Social Sciences Institutional Review Board. Participants were compensated with payment.

Experimental setup and stimuli

Participants were seated with their chin in a chinrest, with their eyes approximately 29 cm away from a touchscreen monitor. Screen resolution was 1,280 x 1,024. The screen was cleaned before each session and calibrated before each touchscreen response session. An Eyelink tower-mount eye-tracking camera was mounted above the chinrest, allowing eye-tracking data to be collected throughout each session without occlusion of the camera due to reaching movements. The room was darkened aside from the stimulus computer, and an opaque mask with a circular aperture was placed over the screen to minimize strategic use of the corners or edges of the screen as screen-centered landmarks.

There were four possible fixation locations (upper left, upper right, lower left, lower right; forming the corners of an invisible square 11 degrees VA in width, centered with respect to the screen). Memory cues indicating locations to be remembered were black-outlined squares sized 0.8 x 0.8 degrees VA. The memory cue location on each trial was a randomly chosen location within the central portion of the screen (in an invisible square measuring 5.2 x 5.2 degrees VA between the possible fixation locations). This was done so that, on average, correct answers were an equal distance from the final fixation in the retinotopic and spatiotopic tasks (Figs. 1b and 3).
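As a rough guide to the physical scale of these stimuli, the sketch below converts the reported visual angles into on-screen extents at the reported 29 cm viewing distance. It assumes a flat screen and an extent centered on the line of sight; the function name is illustrative, and no conversion to pixels is attempted because the physical screen size is not reported.

```python
import math

VIEWING_DISTANCE_CM = 29.0  # eye-to-screen distance reported in the setup


def degrees_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Extent on a flat screen subtended by angle_deg.

    Assumes the extent is centered on the line of sight (an approximation
    for stimuli away from the point of gaze).
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    stimuli = [
        ("fixation square width", 11.0),
        ("memory cue side", 0.8),
        ("cue region side", 5.2),
    ]
    for label, deg in stimuli:
        print(f"{label}: {deg} deg VA = {degrees_to_cm(deg):.2f} cm")
```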

Fig. 1

a) Each trial began with a white fixation dot. Participants were instructed to keep their eyes on the white dot whenever it was on the screen. Once participants were fixating, a black square (memory cue) appeared on the screen, and participants remembered its location in either retinotopic or spatiotopic coordinates, depending on the session. After the cue disappeared, participants fixated for another 500 ms. Depending on the saccade condition, the fixation point moved to a new location 0 to 2 times, waiting until the eye-tracker had registered the participants’ correct fixation before moving to the next location. After an 850-ms delay at the final fixation location, participants were signaled to respond, either by a cursor that appeared at the fixation (mouse response sessions) or by a color change in the fixation dot (touchscreen response sessions). After participants responded, a square appeared at their responded location (green), followed by another at the correct location (black) to give participants feedback. b) Examples of correct responses in the spatiotopic task and the retinotopic task.

Experimental procedures

Each participant completed four sessions: spatiotopic memory task - mouse response; spatiotopic memory task - touchscreen response; retinotopic memory task - mouse response; retinotopic memory task - touchscreen response. Each session was performed on a different day; participants always did the two mouse sessions followed by the two touchscreen sessions, or vice versa. Aside from that constraint, the order of the sessions was fully counterbalanced across participants. In each session, participants were instructed to keep their eyes on a white fixation dot while remembering the location of a square that appeared on the screen, either its absolute location on the screen (spatiotopic memory task) or its location with respect to where they were looking (retinotopic memory task). Before beginning the second session (always a different memory task than the first), participants were asked to predict which memory task would be harder.
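The difference between the two memory tasks can be made concrete with a short sketch of how the correct response location follows from the cue and fixation positions. The coordinates are in degrees visual angle relative to screen center; the function name, the task labels, and the example values are illustrative assumptions rather than the experiment code.

```python
def correct_location(cue_xy, initial_fix_xy, final_fix_xy, task):
    """Correct report location for a trial, in screen coordinates.

    spatiotopic: the cue's absolute screen position, regardless of gaze.
    retinotopic: the cue's offset from the fixation at encoding, re-applied
                 to the fixation held at the time of report.
    """
    if task == "spatiotopic":
        return tuple(cue_xy)
    if task == "retinotopic":
        dx = cue_xy[0] - initial_fix_xy[0]
        dy = cue_xy[1] - initial_fix_xy[1]
        return (final_fix_xy[0] + dx, final_fix_xy[1] + dy)
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    # Hypothetical trial: cue near screen center, one horizontal saccade
    # from the upper-left to the upper-right fixation location.
    cue = (1.0, -2.0)
    upper_left, upper_right = (-5.5, 5.5), (5.5, 5.5)
    print(correct_location(cue, upper_left, upper_right, "spatiotopic"))
    print(correct_location(cue, upper_left, upper_right, "retinotopic"))
```

When the final fixation equals the initial one, the two tasks share the same correct location, which is why the 0-saccade and 2-saccade return conditions do not dissociate the reference frames.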

Each trial (Fig. 1a) began when the white fixation dot appeared at one of the four possible fixation locations. Once participants were fixating (verified by the eye-tracker), the memory cue (black square) appeared for 200 ms, followed by a fixation-only delay for 500 ms. Next, participants were cued to make a variable number of saccades. The different saccade conditions were: 0 saccades, 1 horizontal saccade, 1 vertical saccade, 2 saccades (horizontal and vertical), or 2 saccades “return” (saccading away and then returning to the original fixation). This final condition was included mainly as a control for secondary analyses, because retinotopic and spatiotopic coordinates reconverged here. Each saccade condition was equally likely and counterbalanced across trials. Within these saccade conditions, fixation location and saccade direction(s) were equally likely and randomized across trials. To cue each saccade, the fixation dot disappeared from its current location and immediately reappeared in one of the other possible fixation locations on the screen; participants were instructed to move their eyes to the new fixation location as quickly as possible. After the saccade was completed (verified by the eye-tracker), there was a post-saccade delay of 850 ms, and then participants were cued to make a second saccade (if the fixation dot moved elsewhere) or to report the remembered location (cued as described below). In 0-saccade trials, the fixation dot never moved, and the memory report cue appeared after the initial delay. In mouse response sessions, the memory report cue was a cursor that appeared at the current fixation location, and participants dragged it to the remembered location and clicked on their final response. In touchscreen response sessions, the fixation dot turned green, signaling participants that they could lift their right finger off the keyboard and tap the remembered location on the touchscreen. In both response conditions, participants were required to continue fixating while they responded. To provide feedback in all sessions, a green square appeared at the location of the participants’ response, and a black square (identical to the original square) indicated the correct response. To prevent subjects from responding too early or “cheating,” the mouse cursor remained hidden until the memory report cue in mouse response sessions, and participants needed to keep their finger depressed on the spacebar until the memory report cue in touchscreen response sessions.

Participants completed six runs of the task in each session. Each run consisted of 40 trials (8 for each of the 5 saccade conditions), for a total of 48 trials per condition. Throughout each trial, gaze position was tracked, and trials were aborted and repeated later in a run if participants’ eyes deviated more than 2 degrees visual angle for more than 20 ms. Before the first two sessions (the first retinotopic session and the first spatiotopic session), participants completed a sequence of four practice runs. The first practice run consisted of four no-saccade trials, the second consisted of four 1-saccade trials, and the third consisted of four 2-saccade trials. The fourth practice run consisted of eight total trials, with saccade conditions intermixed as in the main experiment. The experimenter was available for questions during this time, and participants had the option to repeat the practice before moving on to the main task or again before the third and fourth sessions if they felt it was necessary.
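The trial counts above can be checked with a minimal sketch of one session's run structure. The condition labels and the simple shuffle are assumptions for illustration only; the text does not describe the actual randomization procedure beyond conditions being equally likely, and the repetition of aborted trials is not modeled here.

```python
import random

# Hypothetical labels for the five saccade conditions described in the text.
SACCADE_CONDITIONS = ["0", "1-horizontal", "1-vertical", "2-new", "2-return"]
TRIALS_PER_CONDITION_PER_RUN = 8
RUNS_PER_SESSION = 6


def build_run(rng):
    """Return one run: 8 trials of each condition in shuffled order."""
    trials = [
        condition
        for condition in SACCADE_CONDITIONS
        for _ in range(TRIALS_PER_CONDITION_PER_RUN)
    ]
    rng.shuffle(trials)
    return trials


def build_session(seed=0):
    """Return six runs generated from a single seeded random generator."""
    rng = random.Random(seed)
    return [build_run(rng) for _ in range(RUNS_PER_SESSION)]


if __name__ == "__main__":
    session = build_session()
    print("trials per run:", len(session[0]))
    for condition in SACCADE_CONDITIONS:
        total = sum(run.count(condition) for run in session)
        print(f"{condition}: {total} trials per session")
```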

Analyses

As in Golomb and Kanwisher (2012b), we planned to discard trials with errors larger than 5.5 degrees visual angle, which corresponded to a response in the wrong quadrant of the screen. We also planned to discard a participant’s data if any of their sessions had 10% or more of trials cut, replacing them with another participant with the same counterbalance order. A total of one participant was excluded for poor performance (two sessions with >10% large errors each) and replaced, and an average of 0.7% of trials for the remaining participants were discarded.
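The two exclusion rules can be expressed as a short sketch. It takes one list of per-trial errors (in degrees visual angle) per session; the function names are hypothetical, and the thresholds are those stated above.

```python
LARGE_ERROR_DEG = 5.5        # errors larger than this are discarded
MAX_DISCARD_FRACTION = 0.10  # a session with 10% or more discarded trials fails


def keep_trials(errors_deg):
    """Drop trials whose error exceeds the large-error threshold."""
    return [e for e in errors_deg if e <= LARGE_ERROR_DEG]


def session_fails(errors_deg):
    """True if 10% or more of a session's trials would be discarded."""
    if not errors_deg:
        raise ValueError("session has no trials")
    discarded = len(errors_deg) - len(keep_trials(errors_deg))
    return discarded / len(errors_deg) >= MAX_DISCARD_FRACTION


def exclude_participant(sessions):
    """True if any one of the participant's sessions fails the criterion."""
    return any(session_fails(errors) for errors in sessions)


if __name__ == "__main__":
    # Made-up errors for illustration: 240 trials, 24 of them large errors.
    bad_session = [1.0] * 216 + [7.0] * 24
    good_session = [1.0] * 239 + [7.0]
    print(session_fails(bad_session), session_fails(good_session))
    print(exclude_participant([good_session, bad_session]))
```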

We calculated memory accuracy on each trial as the absolute value of the difference (distance, in degrees visual angle) between the reported position and the correct position. This “error” measure was averaged across trials for each condition, task, and subject, and statistical analyses were conducted using repeated-measures ANOVAs. Effect size was reported using partial eta-squared (ηp²).
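A minimal sketch of this error measure follows, reading "distance" as the Euclidean distance between the reported and correct positions (an interpretation of the description above). The trial dictionary keys and the example values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def error_deg(reported_xy, correct_xy):
    """Unsigned distance between reported and correct positions (deg VA)."""
    return math.hypot(
        reported_xy[0] - correct_xy[0],
        reported_xy[1] - correct_xy[1],
    )


def mean_error_by_cell(trials):
    """Average error for each (subject, task, condition) cell."""
    cells = defaultdict(list)
    for trial in trials:
        key = (trial["subject"], trial["task"], trial["condition"])
        cells[key].append(error_deg(trial["reported"], trial["correct"]))
    return {key: mean(errors) for key, errors in cells.items()}


if __name__ == "__main__":
    # Made-up trials for illustration only.
    trials = [
        {"subject": 1, "task": "spatiotopic", "condition": "1",
         "reported": (3.0, 4.0), "correct": (0.0, 0.0)},
        {"subject": 1, "task": "spatiotopic", "condition": "1",
         "reported": (0.0, 1.0), "correct": (0.0, 0.0)},
    ]
    print(mean_error_by_cell(trials))
```

The resulting cell means are the values that would enter the repeated-measures ANOVAs.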

Results

Saccade direction

We first verified that there were no differences between the vertical and horizontal one-saccade conditions, so that we could collapse across saccade direction to examine accumulation of error across number of saccades (Golomb & Kanwisher, 2012b). Using a three-way ANOVA with saccade direction (horizontal vs. vertical one-saccade conditions), memory task (retinotopic vs. spatiotopic), and response type (mouse vs. touchscreen), we found no significant main effects or interactions involving saccade direction (all p > 0.57 and F < 0.33). The analyses that follow collapse across saccade direction (a break-down of results by saccade direction can be found in the supplement).

Retinotopic vs. spatiotopic accumulation of error across saccades

Figure 2 shows the average error of the spatial memory report for the retinotopic and spatiotopic tasks as a function of number of saccades, for the mouse and touchscreen-reaching responses. For each response modality, we performed a two-way repeated measures ANOVA with within-participant factors of number of saccades (0-2; collapsed across vertical and horizontal 1-saccade conditions and excluding 2-saccade return condition) and memory task (retinotopic vs. spatiotopic). Note that we did not include the 2-saccade return condition in this primary analysis, because the retinotopic and spatiotopic coordinates reconverged in this condition.

Fig. 2

Error distances plotted by saccade condition, memory task (retinotopic vs. spatiotopic sessions), and response type (mouse response sessions in subpanel a vs. touchscreen response sessions in subpanel b). We collapsed across horizontal and vertical 1-saccade conditions in this plot and in our main analyses. Error bars are across-subject standard error of the mean.

Our mouse response results replicate Golomb and Kanwisher’s (2012b) findings. First, we found a main effect of number of saccades (F(1.27,13.92) = 94.98, p < 0.001, Greenhouse-Geisser-corrected, ηp² = 0.90), indicating an increase in error with more saccades and/or increasing memory delay. Importantly, we found larger overall error in the spatiotopic condition than the retinotopic condition (main effect of memory task: F(1,11) = 17.07, p = 0.002, ηp² = 0.608), as well as a greater accumulation of error in the spatiotopic compared to the retinotopic condition as number of saccades increased (task × number of saccades interaction: F(1.27,13.98) = 16.46, p = 0.001, Greenhouse-Geisser-corrected, ηp² = 0.60).
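The effect sizes reported here follow from the F statistics and their degrees of freedom through the standard identity: partial eta-squared equals F × df_effect / (F × df_effect + df_error). The sketch below applies it to the three tests in the paragraph above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its dfs.

    Equivalent to SS_effect / (SS_effect + SS_error), because
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    reported_tests = [
        ("number of saccades", 94.98, 1.27, 13.92),
        ("memory task", 17.07, 1.0, 11.0),
        ("task x number of saccades", 16.46, 1.27, 13.98),
    ]
    for label, f_value, df_effect, df_error in reported_tests:
        value = partial_eta_squared(f_value, df_effect, df_error)
        print(f"{label}: {value:.3f}")
```

Because the Greenhouse-Geisser correction scales both degrees of freedom by the same factor, the ratio is unchanged by the correction.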

Do we find the same pattern of memory errors when participants must directly reach toward the remembered location during report? In the touchscreen-reach task we again found a main effect of saccade number (F(1.02,11.24) = 104.69, p < 0.001, Greenhouse-Geisser-corrected, ηp² = 0.91). Critically, we also still found greater spatiotopic error and steeper accumulation of spatiotopic error in the touchscreen task (main effect of memory task: F(1,11) = 46.00, p < 0.001, ηp² = 0.81; task × number of saccades interaction: F(1.24,13.66) = 38.32, p < 0.001, Greenhouse-Geisser-corrected, ηp² = 0.78). Thus, both for mouse and reaching responses, not only is retinotopic spatial memory better than spatiotopic spatial memory, but the errors in the spatiotopic task accumulate more with each additional saccade during the memory delay.

To investigate whether there were any differences between the response modalities in terms of these effects, we performed a three-way repeated measures ANOVA with within-participant factors of number of saccades (0-2), memory task (retinotopic vs. spatiotopic), and response type (mouse vs. touchscreen). Contrary to our predictions, neither the worse overall performance in the spatiotopic task nor the larger spatiotopic accumulation of error was decreased in the touchscreen condition. Instead, these effects were significantly amplified (memory task by response type interaction: F(1,11) = 10.65, p = 0.008, ηp² = 0.49; three-way interaction: F(1.30,14.29) = 8.48, p = 0.008, Greenhouse-Geisser-corrected, ηp² = 0.44), indicating an even greater benefit for retinotopic memory in the reaching task.

We also found a main effect of response type (F(1,11) = 26.90, p < 0.0001, ηp² = 0.71), with the magnitude of errors being overall larger for touchscreen responses than mouse responses. This effect is not particularly surprising, both because tapping the screen with a finger is inherently less precise than clicking with a small cursor and because the mouse response condition offered more opportunities for participants to visually fine-tune their responses. Importantly, this difference in overall accuracy cannot explain our key finding that in both tasks, memory for retinotopic locations across saccades is more precise (and accumulates less error) than for spatiotopic locations.

2-Saccade (Return) Condition

In the main analyses above, we compared the 0-, 1-, and 2-saccade (new) conditions and found that error accumulated with increasing number of saccades for all tasks, with the key finding being a greater accumulation of error for the spatiotopic tasks. What is the cause of this accumulation, and why would it be greater for spatiotopic? Memory error could have accumulated from 0 to 2 saccades due to an increase in memory delay duration, an increase in number of saccades executed during the delay, or a saccade-related memory updating process. The fact that spatiotopic and retinotopic tasks were matched for memory delays and number of saccades executed argues against these being critical factors in the steeper spatiotopic accumulation. The main difference between the tasks appears to lie in how spatial memory is updated across saccades; updating (remapping) appeared to occur with each additional saccade for the spatiotopic task but not the retinotopic task (as in Golomb & Kanwisher, 2012b). As a further test of this cumulative updating explanation, we performed a secondary analysis comparing trials that all had the same number of saccades (two) but ended either at a new fixation that had not been visited yet on that trial (2-saccade new condition) or returned to the original fixation (2-saccade return condition). As shown in Fig. 2, only in the 2-saccade new condition did spatiotopic memory deteriorate; when the second saccade returned the eyes back to the original location, there was no need for the updated representation, and spatiotopic task accuracy was improved to retinotopic levels. This memory task by 2-saccade type interaction was significant (F(1,11) = 25.8, p < 0.001, ηp² = 0.70). Consistent with the main findings above, we also found a significant three-way interaction here (F(1,11) = 11.68, p = 0.006, ηp² = 0.52), showing that this pattern was similar but amplified for the touchscreen condition.

Response bias

Finally, to investigate whether participants’ responses were systematically biased relative to the true location, we plotted the average reported locations aligned to saccade direction (Fig. 3) and found similar patterns to those in Golomb and Kanwisher (2012b). For the no-saccade condition and the two-saccade (return) condition, there did not appear to be a bias. For the one-saccade and two-saccade (new) conditions, subjects tended to report locations as closer to the initial fixation (i.e., foveal bias, Sheth & Shimojo, 2001) and/or overestimated relative to the final fixation (Bock, 1986; Henriques et al., 1998). Critically, the bias was larger in magnitude for the spatiotopic than the retinotopic task, and for 2-saccade-new than 1-saccade condition, similar to the overall accuracy pattern. In other words, as the number of saccades increased in the spatiotopic task, participants responded with decreased accuracy and increased bias. As above, the pattern was similar but amplified in the touchscreen version.

Fig. 3

Average response locations (aligned across fixations) plotted by saccade condition (rows), memory task (retinotopic vs. spatiotopic sessions; indicated by diamonds and x’s), and response type (mouse vs. touchscreen response sessions; columns). All trials of a given condition were aligned to the example fixation locations and saccade directions shown. We collapsed across horizontal and vertical 1-saccade conditions in this plot and in our main analyses. Arrows indicate saccade direction and were not actually presented on the screen.

Discussion

Our goal in this experiment was to investigate the underlying mechanisms of visuospatial memory—specifically, what is the native reference frame of spatial representations that are used to act on remembered locations? We hypothesized that the intention to act on a location in the world might influence the reference frame for spatial working memory. Specifically, we predicted that if the visual system is able to flexibly make use of different memory stores according to task demands (Serences, 2016), then a task relying more on vision-for-action might better engage a spatiotopic (world- or body-centered) memory store, whereas a task emphasizing vision-for-perception might rely more on a retinotopic (eye-centered) memory store (Burr et al., 2001). In this study, we replicated a recent study by Golomb and Kanwisher (2012b) which found a benefit for remembering locations in retinotopic rather than spatiotopic coordinates using a computer mouse to report responses, and we compared this to another condition in which participants responded by reaching out and tapping a touchscreen to report the remembered location.

We predicted that reaching to tap directly on a location using a finger might increase reliance on spatiotopic systems, causing a modulation or reversal of Golomb and Kanwisher’s original pattern. Instead, we found the same pattern of retinotopic dominance for both response modalities, suggesting that spatial memory is encoded in retinotopic coordinates and imperfectly updated with each eye movement—even during a reaching task—a surprising finding in light of our subjective experience that we are able to remember effectively and act on locations in real-world (spatiotopic) coordinates.

Our data suggest that not only is retinotopic spatial memory better than spatiotopic spatial memory, but the errors in the spatiotopic task accumulate more with each additional saccade during the memory delay. There was a slight accumulation of retinotopic error as well, likely due to generic effects such as increased memory delay and/or the execution of saccades themselves. What is interesting is that error accumulated far more in the spatiotopic task, and it did so for both mouse responses and reaching. This selective accumulation of spatiotopic error above and beyond that seen for the retinotopic condition—combined with the fact that the differential accumulation was found for the 2-saccade-new but not 2-saccade-return condition—suggests that the most challenging aspect of maintaining a spatial location in memory may be the demands associated with updating (remapping) its spatiotopic position.

It is particularly interesting that we found this pattern of noisier and faster-deteriorating representations in the spatiotopic reaching sessions, given that it was actually possible to encode the spatiotopic motor plan at the beginning of the trial and theoretically maintain this gaze-independent motor plan across the delay. This suggests that participants were either still relying on the natively retinotopic visual representations to perform the task or perhaps even that the reaching motor plans themselves are natively retinotopic. This could be consistent with reports of eye-centered coding of reaching in parietal cortex (Batista et al., 1999; Cohen & Andersen, 2002), as well as behavioral data patterns suggesting eye-centered reaching (Henriques et al., 1998), even for nonvisual cues (Pouget et al., 2002), although other studies have reported both neural and behavioral evidence for gaze-independent representations for reaching (Colby, 1998; Graziano, 2001; Soechting & Flanders, 1989; Tipper et al., 1992). Our spatiotopic task could have been based on any non-gaze-centered coordinate system (world-centered, head-centered, hand-centered, etc.), but we found no evidence spatial memory was encoded better in any of these coordinates compared with the eye-centered (retinotopic) task, even in the visually-guided reaching task.

Rather than being reduced, the patterns of larger spatiotopic errors and larger spatiotopic accumulation of errors were both amplified when participants responded with a touchscreen compared with a mouse. One potential explanation for this amplification of error is that an additional transformation may be involved in the spatiotopic reaching task. For example, it is possible that, in addition to the location being transformed from retinotopic to spatiotopic coordinates with each eye movement, it must be transformed to hand-centered coordinates, resulting in an additional accumulation of error (Andersen, Snyder, Li, & Stricanne, 1993; Pouget et al., 2002). Another possibility is that the updating process and/or the memory representations themselves are noisier for the touchscreen-reaching than mouse-based task. This would be consistent with the main effect of modality that we also found, with touchscreen responses overall less accurate than mouse responses. While this main effect may not be surprising given the differences between the tasks (e.g., mouse task allowing more opportunity to visually fine-tune responses), it is important to emphasize that these differences cannot account for the primary retinotopic versus spatiotopic results. In other words, it is possible that providing participants the opportunity to fine-tune their touchscreen responses (e.g., by placing a marker on the screen before the final response) could conceivably eliminate the main effect of response modality and/or the 3-way interaction, but we would not expect it to eliminate the significant retinotopic vs spatiotopic difference or reference frame by saccade number interaction within each modality.

Our lack of evidence for more efficiently updated spatiotopic representations when reaching invites the question of how memory for objects’ locations is represented in a format conducive to acting on those objects in our environments. Representing both visual locations and motor plans in the same reference frame could be one way to facilitate effective acting on objects in the world (Cohen & Andersen, 2002), and retinotopic representations may simply be more computationally efficient as a common reference frame. It is possible that we tolerate a bit of error in these representations in exchange for neural efficiency, especially given that during real-world processing, a number of external factors can allow us to compensate for this imperfect updating system. For example, it has been shown that when a target is redisplayed after the saccade (Deubel, Bridgeman, & Schneider, 1998; Vaziri, 2006), or when stable visual landmarks are present (Deubel, 2004; Lin & Gorea, 2011; McConkie & Currie, 1996), visual stability is much improved. Visual stability may also benefit from top-down factors and expectations (Rao, Abzug, & Sommer, 2016), indicating that the visual system might not need to rely solely on updating (Churan, Guitton, & Pack, 2011), instead deriving benefits from the largely stable visual information present in our everyday environments. In the current task, we intentionally employed an impoverished visual display devoid of external landmarks, precisely because we wanted to test the native reference frame of spatial memory representations in the absence of such facilitating cues. Thus, while the current results offer new insight into the underlying mechanisms of spatial memory representations, this does not necessarily mean that “spatiotopic” memory would be worse in real-world scenarios filled with rich, spatiotopically stable landmarks. Rather, our results suggest that these landmarks and external visual cues may be even more important for visual (and motor) stability than previously realized.

These findings may have implications for the debate over the extent of processing differences between the dorsal and ventral visual streams (Franz et al., 2000; Goodale et al., 1994, 1991; Goodale & Milner, 1992; James et al., 2003; Newcombe et al., 1987), as well as the extent to which vision-for-perception is separate from vision-for-action (Aglioti et al., 1995; Franz et al., 2000; Gentilucci et al., 1996). The finding of a similar pattern in our two tasks could reflect similar spatial processing in the dorsal and ventral streams: perhaps the idea of flexible memory stores (Serences, 2016) does not apply here, or, even if participants were able to make use of multiple memory stores in the task, neither was very efficiently updated to spatiotopic coordinates. This is consistent with reports of retinotopic representations of spatial location throughout visual areas, including higher-level dorsal and ventral stream areas (Gardner et al., 2008; Golomb & Kanwisher, 2012a; but see Crespi et al., 2011; d’Avossa et al., 2007; McKyton & Zohary, 2007). Of course, different processing may be involved in vision-for-perception versus vision-for-action, with both our mouse-clicking and touchscreen-reaching tasks engaging both systems. Even if this were the case, it is still notable that the naturalistic and well-studied reaching action did not engage more efficiently updated world-centered representations.

That said, while we found no evidence for a more efficiently updated spatiotopic representation in the reaching task, it is possible that other types of actions may be more conducive to finding spatiotopic representations. For example, it has been suggested that visibility of the hand during reaching, as in our experiment, may be preferable for preserving eye-centered coordinates (Batista et al., 1999). Different types of actions (e.g., eye movements vs. reaching movements) may also engage different visual processes (Lisi & Cavanagh, 2015, 2017). Indeed, there is evidence that when participants intend to make an eye movement to a location, there is some involvement of a gaze-independent coordinate system (Karn, Møller, & Hayhoe, 1997), although it is possible that this difference stems from the larger number of intervening eye movements (Sun & Goldberg, 2016). Future studies could investigate these possibilities further.

Conclusions

Our results provide further evidence that visual memory for locations is maintained in retinotopic coordinates and imperfectly updated to spatiotopic coordinates with each saccade. In addition to replicating Golomb and Kanwisher’s (2012b) original finding that memory for retinotopic locations is more accurate than memory for spatiotopic locations, critically, we found no evidence that acting directly on a location via reaching and tapping led to more accurate updating, or to the storage of locations in a natively spatiotopic format. Our results may reflect (1) a visual system that is overwhelmingly coded in retinotopic coordinates, both in the dorsal and ventral streams, for perception, action, and memory, and/or (2) flexible recruitment of memory store(s) that turn out to be preferentially retinotopic across different task demands. More broadly, these results fit into a literature that supports the idea of natively retinotopic representations that must be dynamically updated (Cohen & Andersen, 2002; Duhamel, Colby, & Goldberg, 1992; Golomb et al., 2008; Golomb & Kanwisher, 2012a), sometimes imperfectly (Golomb & Kanwisher, 2012b), to form the basis of our ability to perceive and act effectively in the world.