Research Article: New Research, Cognition and Behavior

Transformed Visual Working Memory Representations in Human Occipitotemporal and Posterior Parietal Cortices

Yaoda Xu
eNeuro 5 June 2025, 12 (7) ENEURO.0162-25.2025; https://doi.org/10.1523/ENEURO.0162-25.2025
Department of Psychology, Yale University, New Haven, Connecticut 06510

Abstract

Recent fMRI studies reported transformed representations between perception and visual working memory (VWM) in the human early visual cortex (EVC). This is inconsistent with the still widely cited original proposal of the sensory account of VWM, which argues for a shared perception-VWM representation based on successful cross-decoding of the two representations. Although cross-decoding was usually lower than within-VWM decoding and consistent with transformed VWM representations, this has been attributed to experimental differences between perceptual and VWM tasks: once they are equated, the same representation is expected to exist in both. Including human participants of both sexes, this study compared target and distractor representations during the same VWM delay period for the same objects, thereby equating experimental differences. Even with strong VWM representations present throughout the occipitotemporal cortex (OTC, including EVC) and posterior parietal cortex (PPC), fMRI cross-decoding revealed significant representational differences between distractors (perception) and targets (VWM) in both regions. Similar differences existed between target encoding (perception) and delay (VWM), being greater in OTC than PPC, indicating more invariant target representations in PPC than OTC. As only part of the sensory input is usually task-relevant, sustaining sensory input in VWM without selection/refinement/consolidation is both taxing and unnecessary. Transformed representations, mediated by task goals and associative areas coding task-relevant information (e.g., PPC), can easily account for these and other recent findings. A task-driven transformed account of VWM thus better captures the nature of VWM representation in the human brain (including EVC) than the sensory representations originally proposed by the sensory account of VWM.

  • fMRI
  • occipitotemporal cortex
  • posterior parietal cortex
  • vision
  • visual object representation
  • visual short-term memory
  • visual working memory

Significance Statement

The original proposal of the sensory account of visual working memory (VWM) argues for a shared representation between perception and VWM in sensory areas. This assumption, however, was not thoroughly tested due to differences in experimental settings in prior studies. Using fMRI cross-decoding and closely matched experimental conditions, this study compared object representations when they were VWM targets and distractors and during the encoding and delay period of VWM. Both comparisons revealed significant representational differences between perception and VWM in human sensory areas. These results are inconsistent with the sensory nature of VWM representations as it was originally proposed. Instead, they support a task-driven transformed account of VWM in which sensory input is selected/refined/consolidated before VWM storage in these areas.

Introduction

Prior research has put forward the sensory account of visual working memory (VWM) representation in the human brain (D’Esposito and Postle, 2015; Serences, 2016; Christophel et al., 2017), stating that “the systems and representations engaged to perceive information can also contribute to the short-term retention of that information” (p. 118; D’Esposito and Postle, 2015). There are thus two critical components of this account: (1) the involvement of sensory regions in VWM and (2) a shared representation between perception and VWM in sensory regions. Regarding the second component, Serences et al. (2009) stated that “the pattern of delay activity was qualitatively similar to that observed during the discrimination of sensory stimuli, suggesting that WM representations in V1 are reasonable “copies” of those evoked during pure sensory processing” (p. 207). Similarly, D’Esposito and Postle (2015) stated that sensory information is retained in working memory in a highly stimulus-specific manner “most parsimoniously explained as the persistent activation of the sensory representations themselves” (p. 119). By sustaining perceptual representations in sensory regions in VWM when stimuli are no longer in view, the sensory account attempts to provide a mechanistic understanding of VWM. Although these two particular studies were published ten or more years ago, they are still highly cited: a quick Google Scholar search shows that the two papers have been cited 283 and 928 times since 2021, 103 and 306 times since 2024, and 17 and 70 times since 2025, respectively. Thus, these studies are still considered highly relevant in today’s working memory research.

While the existence of VWM signals in sensory regions has been well established (Harrison and Tong, 2009; Serences et al., 2009; Bettencourt and Xu, 2016a; Rademaker et al., 2019), whether perception and VWM share the same representation is less clear. For one thing, a shared representation is at odds with goal-directed visual information processing. At any given moment, only part of the sensory input is task-relevant. Sustaining the sensory representation without selection and refinement is both unnecessary and taxing, especially given our limited information processing capacity (Xu, 2018a). Indeed, significant transformations between perception and VWM have been reported in recent studies, inconsistent with the sensory account of VWM. For example, Kwak and Curtis (2022) reported that VWM retains an abstracted version of the original perceptual input in the human early visual cortex (EVC; see also Duan and Curtis, 2024, and Yan et al., 2023). Li and Curtis (2023) further showed that the transformed EVC VWM representation can guide subsequent behavior. Xu (2023) showed that the content of VWM in the human occipitotemporal cortex (OTC) is more similar to the information encoded and maintained in the human posterior parietal cortex (PPC), a goal-directed adaptive visual processing region (Xu, 2018a,b), than to the sensory information initially encoded into OTC.

One key piece of initial evidence supporting the sensory account of VWM comes from cross-decoding, whereby a linear classifier trained to distinguish neural representations in a separate perceptual task can successfully do so for the representations held in VWM (Harrison and Tong, 2009; Albers et al., 2013; Rademaker et al., 2019). Notably, cross-decoding performance, although significant, is usually lower than that of within-decoding, in which the classifier is trained and tested within VWM. Such a cross-decoding drop has been attributed to experimental differences, since sensory and VWM data were typically collected in different fMRI runs under different task environments and attentional engagements. For example, in the sensory task of Harrison and Tong (2009), participants identified centrally presented letters while ignoring a low-contrast grating flashing in the surround, whereas in the VWM task, they encoded a centrally presented high-contrast grating. Experimental differences could thus easily contribute to the observed cross-decoding drop without representations necessarily being different between perception and VWM. If perception and VWM indeed share a representation, then when experimental settings are properly controlled for, very little cross-decoding drop should be expected. Meanwhile, a significant cross-decoding drop could also signal a representational transformation between perception and VWM, consistent with recent reports. Thus, a critical test of the sensory nature of VWM comes down to whether a cross-decoding drop is still present when the testing conditions are properly matched. The present study aims to differentiate between these two possibilities by reanalyzing data from a previous study with well-matched experimental settings for perception and VWM.

Materials and Methods

The present study reanalyzed the data from a recently published study (Xu, 2024) in which the same objects could appear either as VWM targets or distractors during the delay period in different trials. By training a linear classifier on a pair of objects when they were distractors during the delay period to then decode the same objects when they were VWM targets during the delay period, the present study examined if there was still a cross-decoding drop compared with when training and testing were both performed on VWM targets during the delay period. Here, because both types of training and testing occurred during the delay period, the experimental setting was well-matched. Moreover, given that representations were much stronger for the distractors (visible and sensory) than for VWM targets (invisible) in the sensory regions, a sensory account would predict a much greater VWM decoding when the classifier was trained on the distractors than on the VWM targets. If, however, a significant cross-decoding drop or chance-level cross-decoding was obtained, it would provide strong evidence showing distinct sensory and VWM representations and argue against the sensory nature of VWM representations.

Experimental details of the present study have been reported extensively in the original publication (Xu, 2024). They are reproduced/summarized here for the readers’ convenience. New analysis details pertinent to the present study are described in detail.

Participants

Fourteen (nine females) healthy human participants with normal or corrected-to-normal visual acuity, all right-handed and aged between 18 and 35 years, took part in the two scanning sessions of the study. All participants gave their informed consent prior to the study and received payment for participation. The study was approved by the Committee on the Use of Human Subjects at Yale University. Three additional participants also took part in the study but did not complete the second of the two scanning sessions; their data were not analyzed.

Main VWM experiment

This experiment used four types of objects, namely, bikes, couches, hangers, and shoes (sneakers; Fig. 1B). To increase task difficulty and ensure that a visual code was employed to retain VWM representation, similar-looking exemplars of a given object were used, and the probe object at the end of the VWM delay was either the same target object or another exemplar from the same object (see https://osf.io/8rbkh/ for the complete set of images used). All images were placed on a white square (subtended 9.73° × 9.73°) and shown on a larger gray background. Each VWM trial contained a central presentation of a target object, a prolonged blank delay, and a probe object (Fig. 1A). The probe object was either an exact match to the target image or a different exemplar of the same type of object. Each trial was 15 s long, with the timing of the different events as follows: fixation (0.5 s) in the form of a looming red dot to alert the participants to the imminent presentation of the target images, target image (0.5 s), blank delay with a red fixation dot (1.5 s), blank delay or distractor delay with a red fixation dot (10 s), and probe image (2.5 s). In trials with a distractor delay, 20 distractor images were shown, with the 10 unique exemplars from the distractor object each shown twice with no back-to-back repetition of the same exemplar. Each distractor image was shown for 0.3 s, followed by a 0.2 s blank screen before the next distractor image was shown. There were a total of 16 unique trials in each run, including 12 trials for all the target and distractor object combinations (four target objects multiplied by three distractor objects) and 4 trials in which no distractors were shown. Each run started and ended with an 8 s blank period with a blue fixation dot. Successive VWM trials were sandwiched by a blank period with a blue fixation dot. Of the 15 such intertrial blank periods, 3 were 8 s long, and 12 were 2 s long, and they were randomly distributed. In a first scan session, 13 or 14 runs of data were collected from each participant, with each run lasting 5 min and 4 s; in a second scan session, 14 runs of data were collected from each participant.
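To make the trial arithmetic explicit, the sketch below summarizes the trial structure in code form. It is an illustrative Python summary of the timing parameters described above, not part of the original MATLAB/Psychtoolbox experiment code; the variable names are hypothetical.

# Hypothetical summary of the trial timing described above (all durations in seconds)
TRIAL_EVENTS = [
    ("fixation_cue", 0.5),   # looming red dot signaling the upcoming target
    ("target", 0.5),
    ("early_delay", 1.5),    # blank delay with a red fixation dot
    ("late_delay", 10.0),    # blank, or 20 distractor images (0.3 s on + 0.2 s off each)
    ("probe", 2.5),
]

assert abs(sum(d for _, d in TRIAL_EVENTS) - 15.0) < 1e-9   # each trial lasts 15 s
assert abs(20 * (0.3 + 0.2) - 10.0) < 1e-9                  # distractor stream fills the 10 s delay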

The design of the present study was modeled closely after Harrison and Tong (2009), who asked participants to remember different exemplars from two orientation categories (i.e., with exemplars varying ±3 or ±6° from 25 or 115°). Responses in Harrison and Tong (2009) were then averaged over exemplars within each orientation category, and decoding was performed at the category level (25° vs 115°). Following this design, in the present study, target decoding was performed at the object level (e.g., bikes vs couches) by including all trials containing exemplars from the same object in the same condition (e.g., all trials with bike exemplars were treated as bike trials); likewise, distractor decoding was also performed at the object level when the aggregated distractor responses from the delay period were used for distractor decoding. Thus, although targets were remembered at the stimulus level, which requires the encoding of both object- and exemplar-specific information, only the basic-level object information contributed to VWM decoding. Similarly, when participants viewed the distractors and noticed the differences among the distractor exemplars, they also encoded both object- and exemplar-specific information from the distractors; however, only the basic-level object information contributed to distractor decoding from the aggregated distractor response during the delay period. In this regard, target and distractor decoding was well matched, and both involved the decoding of basic-level object information rather than exemplar-specific information.

Localizer experiments

Topographic visual regions

These regions were mapped with flashing checkerboards using standard techniques (Sereno et al., 1995; Swisher et al., 2007) with parameters optimized following Swisher et al. (2007) to reveal maps in the parietal cortex. Specifically, a polar angle wedge with an arc of 72° swept across the entire screen (19.07° × 13.54° of visual angle). The wedge had a sweep period of 32 s, flashed at 4 Hz, and swept for eight cycles in each run (for more details, see Swisher et al., 2007). Participants completed four to six runs, each lasting 4 min and 36 s. All participants were asked to detect a dimming that could occur anywhere within the polar angle wedge, thereby ensuring attention to the whole wedge.

Lateral and ventral occipitotemporal regions

To identify these ROIs, following Kourtzi and Kanwisher (2000) and as we have done previously (Jeong and Xu, 2017; Vaziri-Pashkam and Xu, 2017, 2019; Vaziri-Pashkam et al., 2019; Taylor and Xu, 2022), participants viewed blocks of objects and scrambled objects (all subtended approximately 9.73° × 9.73°). The images were photographs of gray-scaled common objects (e.g., cars, tools, and chairs) and phase-scrambled versions of these objects. Participants monitored for a slight spatial jitter that occurred randomly once in every 10 images. Each run contained four blocks of each of the objects, phase-scrambled objects, and two other conditions that were used to define another brain region. Each block lasted 16 s and contained 20 unique images, with each appearing for 750 ms and followed by a 50 ms blank display. Besides the stimulus blocks, 8 s fixation blocks were included at the beginning, middle, and end of each run. Each participant was tested with two runs, each lasting 4 min and 40 s.

MRI method

Each participant completed two experimental sessions (1.5 h) and a localizer session (1.5 h) containing topographic mapping and functional localizers. MRI data were collected using a Siemens Prisma 3T scanner, with a 32-channel receiver array head coil. Participants lay on their backs inside the scanner and viewed the backprojected display through an angled mirror mounted inside the head coil. The display was projected using an LCD projector at a refresh rate of 60 Hz and a spatial resolution of 1,280 × 1,024. An Apple MacBook Pro laptop was used to create the stimuli and collect the motor responses. Stimuli were created using MATLAB and Psychtoolbox (Brainard, 1997).

A high-resolution T1-weighted structural image (0.8 mm × 0.8 mm × 0.8 mm) was obtained from each participant for surface reconstruction. All blood oxygen level-dependent data were collected via a T2*-weighted echoplanar imaging pulse sequence that employed multiband RF pulses and simultaneous multislice (SMS) acquisition. For both the main experiment and the localizers, 72 axial slices (2 mm isotropic), 0 skip, covering the entire brain were collected (TR, 800 ms; TE, 37 ms; flip angle, 52°; SMS factor, 8).

Data analyses

fMRI data were analyzed using FreeSurfer (surfer.nmr.mgh.harvard.edu), FsFast (Dale et al., 1999), and in-house MATLAB codes. LIBSVM software (Chang and Lin, 2011) was used for the MVPA support vector machine analysis. fMRI data preprocessing included 3D motion correction and linear and quadratic trend removal. After reconstructing the inflated 3D cortical surface of each participant using the high-resolution anatomical data, we projected the fMRI data from that participant onto their native cortical surface. As was done in a recent study (Xu, 2023), all fMRI responses were analyzed directly on the inflated cortical surface (vertices) rather than on the cortical volume (voxels) of each participant, including ROI definition and the main VWM analysis, as surface-based analysis has been shown to exhibit more sensitivity and better spatial selectivity (Oosterhof et al., 2011; Brodoehl et al., 2020).

ROI definitions

Following the detailed procedures described in Swisher et al. (2007) and as was done in our prior publications (Bettencourt and Xu, 2013, 2016a,b; Vaziri-Pashkam and Xu, 2017, 2019; Vaziri-Pashkam et al., 2019), by examining phase reversals in the polar angle maps, we identified areas V1 to V4, V3a, V3b, and IPS0 to IPS4 in each participant (Fig. 1C). Following Kourtzi and Kanwisher (2000) and as was done in our prior studies (Jeong and Xu, 2017; Vaziri-Pashkam and Xu, 2017, 2019; Vaziri-Pashkam et al., 2019; Taylor and Xu, 2022, 2024), LOT and VOT were then each defined as a cluster of contiguous voxels in the lateral and ventral occipital cortex, respectively, that responded more to the intact than to the scrambled object images (Fig. 1C). LOT and VOT loosely correspond to the location of LO and pFs (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi and Kanwisher, 2000) but extend further into the temporal cortex in an effort to capture the continuous activations often seen extending into the ventral temporal cortex.

VWM decoding analysis

With the length of our VWM trials being 15 s, for each surface vertex, we estimated the fMRI response amplitude at each TR from the onset of the trial up to 24 s, totaling 30 TRs (with each TR being 800 ms). This was done separately for the trials in each of the 16 conditions. To obtain these estimates, we first constructed 30 finite impulse response functions (FIRs) corresponding to each TR of each condition's trials. As each condition appeared only once in a run, given the short intertrial interval (mostly 2 s) and the lag in hemodynamic responses, it was not possible to accurately estimate the amplitudes of the FIRs in each run. To obtain reliable amplitude estimates for the FIRs, following the procedure developed in a recent study (Xu, 2023), we combined five or six runs together, as detailed below, before applying GLM modeling to derive the beta weight estimate for each of the 480 FIR functions (30 TRs multiplied by 16 conditions). Because the trial onset times were jittered with respect to TR onsets, trial onset times were rounded to the nearest TR before GLM modeling. To obtain independent training and testing data for pattern decoding, we split the runs into odd and even halves in each of the two scan sessions, with each split containing six or seven runs depending on the total number of runs acquired in a given session. We then applied a GLM to each six-run combination if the split contained seven runs and to each five-run combination plus a combination including all six runs if the split contained six runs. This resulted in seven beta weight estimates for each FIR function in each split for each surface vertex. The beta weights of all the vertices in a given ROI formed our fMRI response pattern for that ROI. For each ROI and across the four data splits, we thus had a total of 28 patterns for each TR and each condition. Note that within a split, the seven patterns were not independent of each other as they were estimated from shared runs; however, the patterns between the different splits came from independent runs and were thus independent of each other.
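As an illustration of the FIR-based GLM estimation described above, the following Python sketch builds one regressor per condition per TR lag and estimates the beta weights for a single vertex by ordinary least squares. It is a simplified stand-in for the in-house MATLAB analysis (run concatenation, nuisance regressors, and onset rounding are omitted), and all names and dimensions are assumptions.

import numpy as np

N_CONDITIONS = 16   # trial conditions in the main experiment
N_FIR = 30          # FIR time points per condition (24 s at TR = 0.8 s)

def build_fir_design(trial_onsets, n_tr_total, n_fir=N_FIR):
    """Build an FIR design matrix with one column per condition per TR lag.

    trial_onsets: dict mapping condition index -> list of trial-onset TR indices.
    Returns an (n_tr_total, n_conditions * n_fir) matrix of 0s and 1s.
    """
    n_cond = len(trial_onsets)
    X = np.zeros((n_tr_total, n_cond * n_fir))
    for cond, onsets in trial_onsets.items():
        for onset in onsets:
            for lag in range(n_fir):
                if onset + lag < n_tr_total:
                    X[onset + lag, cond * n_fir + lag] = 1.0
    return X

def estimate_fir_betas(X, y):
    """Ordinary least-squares beta estimates for one vertex's detrended time course y."""
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas.reshape(-1, N_FIR)   # one row of 30 FIR betas per condition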

To generate a response amplitude time course from each ROI for each condition, we averaged all the beta weights across all the surface vertices within an ROI and from all 28 patterns. Based on the peak responses from all the ROIs and conditions, we defined an encoding period (from 4 to 6.4 s) and a delay period (from 9.6 to 12 s before the onset of the probe). We then averaged the four beta weights within each period to generate an average response for each of these two VWM process stages.
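With a TR of 0.8 s, the encoding and delay windows defined above each cover four TRs of the FIR estimates. A minimal Python sketch of the period averaging follows; the variable names are assumptions for illustration.

import numpy as np

TR = 0.8                          # seconds
ENCODING_WINDOW = (4.0, 6.4)      # seconds from trial onset
DELAY_WINDOW = (9.6, 12.0)

def period_average(fir_betas, window, tr=TR):
    """Average FIR betas (shape: n_vertices x 30 TRs) over a time window, endpoints included."""
    start = int(round(window[0] / tr))
    stop = int(round(window[1] / tr)) + 1      # four TRs per window here
    return fir_betas[:, start:stop].mean(axis=1)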

Prior to our decoding analysis, to remove amplitude differences across categories, ROIs, and VWM processing stages, following our previous studies (Bettencourt and Xu, 2016a; Vaziri-Pashkam and Xu, 2017, 2019; Vaziri-Pashkam et al., 2019; Xu, 2023), we z-normalized each fMRI response pattern. For a given ROI and for a pair of conditions, we used SVM for pattern decoding (LIBSVM; Chang and Lin, 2011). We trained a decoder using all the response patterns from three splits of the data (totaling 21 patterns) and then tested its performance on the left-out data split (7 patterns). Training and testing were thus done on independent data sets. We rotated the training and testing order, with each data split serving as the test split and the remaining three as the training splits, and averaged the results from all four rotations.
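The decoding procedure (z-normalizing each pattern, training on three data splits, and testing on the left-out split, rotated over all four splits) can be sketched as follows. This Python sketch uses scikit-learn's linear SVM in place of the MATLAB/LIBSVM pipeline used in the study, and the data layout (a list of per-split pattern arrays for each condition) is an assumption for illustration. Passing different condition sets for training and testing gives the cross-decoding described below.

import numpy as np
from scipy.stats import zscore
from sklearn.svm import LinearSVC

def split_decoding(train_a, train_b, test_a, test_b, n_splits=4):
    """Leave-one-split-out pairwise decoding accuracy.

    Each argument is a list of n_splits arrays of shape (n_patterns, n_vertices), one per
    independent data split. For within-decoding, the train and test lists are the same;
    for cross-decoding, they come from different conditions (e.g., distractors vs targets).
    """
    accuracies = []
    for test_split in range(n_splits):
        train_splits = [s for s in range(n_splits) if s != test_split]
        X_train = np.vstack([train_a[s] for s in train_splits]
                            + [train_b[s] for s in train_splits])
        n_a = sum(train_a[s].shape[0] for s in train_splits)
        n_b = sum(train_b[s].shape[0] for s in train_splits)
        y_train = np.array([0] * n_a + [1] * n_b)
        X_test = np.vstack([test_a[test_split], test_b[test_split]])
        y_test = np.array([0] * test_a[test_split].shape[0]
                          + [1] * test_b[test_split].shape[0])
        # z-normalize each response pattern across vertices to remove amplitude differences
        clf = LinearSVC().fit(zscore(X_train, axis=1), y_train)
        accuracies.append(clf.score(zscore(X_test, axis=1), y_test))
    return float(np.mean(accuracies))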

Training and testing were performed separately for each pair of conditions of interest, and the decoding results were averaged across all relevant pairs of conditions to derive the average decoding performance for a given analysis. To directly compare the different ROIs, maximize the contrast, and streamline the analysis, based on the anatomical locations, we formed three ROI sectors at the three ends of the ROIs and averaged the decoding performance within the ROIs in each sector: a posterior sector including lower visual areas V1–V4, a ventral sector including object areas LOT and VOT, and a dorsal sector including higher PPC areas IPS2–IPS4. Within each ROI sector, we performed decoding within each ROI and then averaged the results, rather than forming a combined ROI containing the individual ROIs and then performing decoding. This was done to allow equal weighting of the results from the different ROIs: as ROI size differed across ROIs and participants, decoding based on a combined ROI could bias the results toward the larger ROIs, which could further differ across participants. Our main comparisons focused on the differences among the three ROI sectors. We performed two types of decoding.

(1) Cross-decoding between distractors and targets. Here, as illustrated in Figure 2A, we trained a classifier on a pair of objects when they were distractors during the delay period and then used it to decode the same pair of objects when they were targets during the delay period. The irrelevant object in both cases was the same (e.g., training to classify A vs B in the training data when they were distractors with C being the target in both to then decode A vs B in the testing data when they were targets with C being the distractor in both). We then compared this cross-decoding with within-decoding in which training and testing were done within the target objects (e.g., training on A vs B in the training data when they were targets with C being the distractor in both to decode A vs B in the testing data when they were targets with C being the distractor in both). The results are shown in Figure 2B. The results did not differ if the irrelevant object differed between training and testing (e.g., training on A vs B when they were distractors with C being the target in both to decode A vs B when they were targets with D being the distractor in both; Extended Data Fig. 2-1). In addition to decoding targets from trials with distractors during the delay period, the distractor-trained classifier was also asked to decode targets from these trials during the encoding period (Fig. 2C,D) and targets from trials without distractors during the delay period (Fig. 3).
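Using the split_decoding sketch above, the within-decoding and one-directional cross-decoding described here might be set up as follows. The patterns dictionary and its keys are hypothetical; the point is only that the classifier is trained on distractor-delay patterns and tested on target-delay patterns from independent data splits.

def within_and_cross_decoding(patterns, obj_a="bike", obj_b="hanger"):
    """patterns[(object, role, period)] -> list of four per-split pattern arrays (assumed layout)."""
    # Within-decoding: train and test on the two objects as VWM targets during the delay
    within = split_decoding(patterns[(obj_a, "target", "delay")],
                            patterns[(obj_b, "target", "delay")],
                            patterns[(obj_a, "target", "delay")],
                            patterns[(obj_b, "target", "delay")])
    # Cross-decoding (distractor -> target only): train on the same objects as distractors
    # during the delay, test on them as VWM targets during the delay
    cross = split_decoding(patterns[(obj_a, "distractor", "delay")],
                           patterns[(obj_b, "distractor", "delay")],
                           patterns[(obj_a, "target", "delay")],
                           patterns[(obj_b, "target", "delay")])
    return within, cross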

Figure 2-1

Training distractors to decode targets in trials with distractors. A. The irrelevant object was the same across training and test (e.g., training on A vs. B when they were distractors with C being the target in both to decode A vs. B when they were targets with C being the distractor in both). B. The irrelevant object differed across training and test (e.g., training on A vs. B when they were distractors with C being the target in both to decode A vs. B when they were targets with D being the distractor in both). Error bars indicate SEM. *p < 0.05, **0.001 < p < 0.01, ***p < 0.001.

Note that the cross-decoding between distractors and targets was performed only in one direction (from distractors to targets) rather than both ways, which would include training targets to decode distractors. This was done due to the ceiling-level and highly saturated distractor decoding performance, especially in sensory regions (Fig. 1D). In other words, distractor decoding performance no longer tracked the underlying signal strength. As such, performing decoding both ways could significantly distort the results. Instead, a more conservative approach was taken here to use the distractor-trained decoder to decode targets. If perception and VWM share the same representation, cross-decoding should be similar to within-decoding (which was trained and tested on VWM targets), if not greater, as the distractor-trained classifier would be more robust than the VWM-trained classifier given the much stronger distractor than VWM representation during the delay period, especially in sensory areas. However, if cross-decoding is substantially lower than within-decoding or at chance, it would provide strong evidence indicating very little representational overlap between perception and VWM.

Although we could also train on the targets during the delay to decode the distractors (cross-target distractor decoding), it is unclear whether this type of decoding would be informative. The key question here is what we should compare this type of decoding with to draw valid conclusions. If we compare it with within-target decoding during the delay period (i.e., train on targets to decode targets), then because the distractor signal is ultrastrong, we could obtain high distractor decoding for this reason alone, even when there is only partial overlap between target and distractor representations. This would prevent us from comparing distractor and target decoding to draw firm conclusions regarding whether targets and distractors share representations during VWM delay. If we compare cross-target distractor decoding with within-distractor decoding (i.e., train on distractors to decode distractors), because the classifier is weaker when trained with the targets than with the distractors during delay, a drop in performance in cross-target distractor decoding compared with within-distractor decoding could be due to that, rather than a difference in target and distractor representations. Again, we would not be able to draw firm conclusions here. It is thus unclear whether obtaining cross-target distractor decoding would be informative here.

To examine the differences between within- and cross-decoding and to account for decoding performance differences across the different ROIs and the different decoding periods, following prior studies documenting tolerance of visual object representations in OTC to changes such as position and size (Xu and Vaziri-Pashkam, 2022) and documenting VWM representation across different distractor conditions (Xu, 2024), the within- and cross-decoding measures were combined to form a cross-decoding ratio for the two types of decoding. This was done by first subtracting 0.5 from the within- and cross-decoding accuracies and then taking the ratio of the two resulting values. A ratio of 1 would indicate equally good decoding performance within VWM and across distractors and VWM and a complete generalization of perception and VWM, whereas a ratio of 0 would indicate a complete failure of such generalization. The results are shown in Figure 2, E and F.
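In equation form, the ratio is simply the above-chance cross-decoding accuracy divided by the above-chance within-decoding accuracy. A minimal sketch with a worked example follows; the numbers are illustrative, not the study's results.

def cross_decoding_ratio(within_acc, cross_acc, chance=0.5):
    """Ratio of above-chance cross-decoding to above-chance within-decoding accuracy."""
    return (cross_acc - chance) / (within_acc - chance)

# Illustrative numbers only: a within-decoding accuracy of 0.70 and a cross-decoding accuracy
# of 0.55 give (0.55 - 0.5) / (0.70 - 0.5) = 0.25, i.e., weak generalization from perception to VWM.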

(2) Cross-decoding between VWM encoding and delay for targets. In this analysis, a classifier was trained to decode a pair of targets during either VWM encoding or delay, and its performance was then tested during both VWM encoding and delay. The results were averaged across both directions of training and testing (since the within-decoding performance was not saturated/at the ceiling for either). This was done separately for trials with and without distractors (Fig. 4A,C). The results are shown in Figure 4, B and D.

As in (1), to evaluate potential differences in the two types of trials and to account for decoding performance differences across the different ROIs and decoding periods, the within and cross-decoding measures were combined by computing a cross-decoding ratio for the two types of trials following the procedures described in (1). The results are shown in Figure 4, E and F.

It is worth noting that pairwise decoding was conducted for all the analyses here. Since four types of targets and distractors were used in the present study, one might wonder if a single four-way decoder should be used. A four-way decoder, unfortunately, would not work for the present study. Although there were four types of target objects A, B, C, and D, and the same four types of distractor objects, given that target and distractor objects differed in a given trial, the four types of distractor objects did not appear equally often for each type of target object (i.e., target A with distractors B, C, and D, but never with A; target B with distractors A, C, and D, but never with B; and so forth). Thus, when performing a four-way target decoding, differences in the distractor objects across the four types of target objects would necessarily contribute to the results, and the decoding results would reflect differences among the distractors (given their much stronger representation) more than differences among the targets. This would significantly distort the findings. This problem, however, could be avoided with pairwise decoding. Here, the distractor objects were kept constant when performing target decoding, e.g., decode A versus B when C was the distractor for both. Thus, it was essential to perform pairwise decoding in the present study.
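As a concrete illustration of this pairwise scheme, the snippet below enumerates every target pair together with the shared distractor object held constant across the two conditions being compared. The object labels match those used in the study, but the enumeration itself is an illustration, not the study's code.

from itertools import combinations

objects = ["bike", "couch", "hanger", "shoe"]

# Each tuple (A, B, C): decode target A vs target B, with C as the distractor in both conditions.
pairwise_comparisons = [(a, b, c)
                        for a, b in combinations(objects, 2)
                        for c in objects if c not in (a, b)]

# Six target pairs, each with two possible shared distractors: 12 pairwise decodings in total.
assert len(pairwise_comparisons) == 12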

Statistical analyses

t tests were used to assess within and cross-decoding performance against chance (one sample, one-tailed, as only effects above chance were meaningful), cross-decoding drop (paired, one-tailed, as the effect was expected to be either null or with cross-decoding being lower than within-decoding), and pairwise comparisons among the three ROI sectors (paired, two-tailed). Correction for multiple comparisons was applied using the Benjamini–Hochberg method (Benjamini and Hochberg, 1995) for the number of tests of the same type performed within each ROI or sector and for the three pairs of tests performed across the three ROI sectors. Repeated measures ANOVAs were used to assess main effects and interactions between variables of interest.
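For reference, a minimal Python sketch of the Benjamini–Hochberg procedure applied to a family of p values is given below; it illustrates the method itself and is not the study's statistics code.

import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which tests survive FDR correction at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m      # rank-dependent thresholds
    below = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])              # largest rank meeting its threshold
        significant[order[:k + 1]] = True
    return significant

# Example with three pairwise sector comparisons (illustrative p values):
# benjamini_hochberg([0.012, 0.030, 0.200]) -> array([ True,  True, False])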

Results

In this study, human participants retained real-world objects in VWM, either with or without distraction during the delay period, with the same objects serving as either targets or distractors in different trials (Fig. 1A,B). The target and distractor objects came from the same four types of objects sharing roughly the same outline (i.e., bikes, couches, hangers, and shoes). To increase task difficulty and ensure that a visual code was employed to retain VWM representation, similar-looking exemplars of a given object shown in the same viewpoint were used, and the probe object at the end of the VWM delay was either the same target object or another exemplar from the same object.

Figure 1.

Experimental design and example brain regions of interest (ROIs). A, Example trials showing the trial sequence. In each trial, a single target is shown; after an extended delay period filled with either a blank screen or a sequential presentation of exemplars of another object, a probe is shown. The probe was either an exact match or a different exemplar of the same object type. The entire image set is available from the online data deposit. B, The four types of objects used. C, Example ROIs shown on the inflated cortical surfaces. Reproduced from Vaziri-Pashkam and Xu (2019). D, Decoding accuracies during the VWM delay period for targets in trials without and with distractors and for distractors. The colored symbols above the bars mark the decoding significance of each bar compared with chance (0.5). Error bars indicate SEM. *p < 0.05, **0.001 < p < 0.01, ***p < 0.001.

As in a previous study (Bettencourt and Xu, 2016a), behavioral performance did not differ between trials with and without distractors (Xu, 2024). This may not be surprising. Given the ubiquitous presence of distractions in everyday life (Ongchoco and Xu, 2024), for VWM to be useful, it needs to be fairly robust to distraction. Indeed, as reviewed by Xu (2017), the effect of distraction was either absent or very small; when it was present, it occurred in studies in which very precise sensory representations were retained, which is not how information is typically stored and used in VWM in everyday life (e.g., we rarely have to encode the precise object color or orientation in the real world). Thus, it should come as no surprise that VWM is fairly robust to distraction and that no effect of distraction on behavioral performance was found in the present study.

The main analysis of the present study focused on the fMRI responses extracted from regions of interest (ROIs) across OTC and PPC (Fig. 1C), including early visual areas V1 to V4, object processing regions in lateral occipitotemporal (LOT) and ventral occipitotemporal (VOT) cortex, and parietal topographic areas V3a, V3b, and IPS0 to IPS4. As in a recent study (Xu, 2023), fMRI responses were analyzed directly on the inflated cortical surface (vertices) rather than on the cortical volume (voxels) of each participant, as surface-based analysis has been shown to exhibit more sensitivity and better spatial selectivity (Oosterhof et al., 2011; Brodoehl et al., 2020). From the averaged fMRI response time courses of each ROI, encoding and delay periods were defined, fMRI responses were averaged within each period, and fMRI response patterns were generated for each ROI and each condition (see Materials and Methods). Significant VWM decoding was obtained across almost all the ROIs regardless of distractor presence, and decoding did not differ between trials with and without distractors in any of the ROIs (Fig. 1D; see also Xu, 2024). To compare across the ROIs, maximize the contrast, and streamline the analysis, three sectors at the three ends of the ROIs were created, and the decoding performance was averaged within the ROIs in each sector: a posterior sector including lower visual areas V1–V4, a ventral sector including object areas LOT and VOT, and a dorsal sector including higher PPC areas IPS2 to IPS4. Comparison across sectors revealed that the strengths of VWM representations were similar across the three ROI sectors and across trials with and without distractors (Xu, 2024). As in Bettencourt and Xu (2016a) and Rademaker et al. (2019), successful decoding was also obtained for distractors during the delay period in all the ROIs (Fig. 1D; see also Xu, 2024).

Transformation between VWM target and distractor representations

In the present study, because the same objects appeared as both VWM targets and distractors during the delay in different trials, there was a unique opportunity to test the extent to which perception and VWM shared the same representation under the same experimental setting. To accomplish this, a classifier was trained to decode a pair of objects when they were distractors, and its ability to decode the same objects was then tested when these objects were VWM targets during the delay period (Fig. 2A). For example, a classifier would be trained to differentiate a bike and a hanger when they were distractors in trials with couches being the VWM targets. The classifier would then be asked to differentiate a bike and a hanger when they were VWM targets with couches being the distractors. This was then compared with decoding within VWM (e.g., classifying a bike and a hanger target when couches were the distractors during both training and testing). If perception and VWM share the same representation, cross-decoding should be similar to within-decoding, if not greater, as the distractor-trained classifier would be more robust than the VWM-trained classifier given the much stronger distractor than VWM representation during the delay period, especially in sensory areas (Fig. 1D). However, if cross-decoding is substantially lower than within-decoding or at chance, it would indicate very little representational overlap between perception and VWM.

Figure 2.

Cross-distractor decoding of VWM targets for trials with distractors. A, An illustration of within-target and cross-distractor target decoding during the delay period. A classifier was trained either on VWM targets or on distractors to decode VWM targets. The distractors during VWM target training and the targets during distractor training were matched (e.g., couches in the examples shown). See Extended Data Figure 2-1 for additional results showing that the same results were obtained regardless of whether the irrelevant object was the same or different across distractor training and target testing. B, Results of A for all the ROIs (left) and the three ROI sectors (right). C, Same as A, except target training and decoding occurred during the encoding period. Distractor training still occurred during the delay period. D, Results of C for all the ROIs (left) and the three ROI sectors (right). E, Cross-decoding ratios for both types of cross-decoding for all the ROIs. F, The same results as in E for the three ROI sectors. In B and D, the colored symbols above the bars mark the decoding significance of each bar compared with chance (0.5). The black symbols right above the colored symbols mark the significance of the decoding drop between within and cross-decoding. In E and F, the colored symbols above the bars mark the significance of the ratio compared with 1. The black symbols right above the colored symbols mark the significance of the ratio difference between the two types of cross-decoding. In B, D, and F, the black symbols above the brackets mark the significance of the difference in either cross-decoding drop (B and D) or ratio for the two types of cross-decoding (F) between pairs of ROI sectors. Error bars indicate SEM. *p < 0.05, **0.001 < p < 0.01, ***p < 0.001.

While cross-decoding from distractors to VWM targets during the delay period was greater than chance in LOT/VOT and IPS2–IPS4, interestingly, it was no greater than chance in V1–V4 (Fig. 2B; see the asterisk marking the significance level; corrected for the two comparisons made in each ROI sector). Comparison between within- and cross-decoding revealed a significant drop across all the ROI sectors (Fig. 2B; the same results were obtained when the irrelevant object differed between training and testing; Extended Data Fig. 2-1). Comparisons across the sectors using repeated measures ANOVA revealed a main effect of cross-decoding drop and a main effect of sector (F’s > 5.78, p’s < 0.0083), but no interaction between the two (F(2,26) = 1.03, p = 0.37). Pairwise comparisons among the sectors revealed that the amount of the cross-decoding drop did not differ across any pairs of ROI sectors (t’s < 1.30, p’s > 0.43; two-tailed and corrected for the three comparisons made). There were thus significant representational transformations between perception and VWM across all three ROI sectors. While there was still significant overlap between perceptual and VWM representations in LOT/VOT and IPS2–IPS4, such overlap appeared to be absent in V1–V4.

If the cross-decoding drop observed above was a result of information transformation between perception and VWM, then when the distractor-trained classifier was asked to decode the perceptual representations of the targets during VWM encoding instead of delay, no cross-decoding drop should be present in the sensory areas (Fig. 2C). Indeed, significant cross-decoding was found in all the ROI sectors, and critically, no cross-decoding drop was present in V1–V4 or LOT/VOT compared with when training and testing were performed within the targets during encoding (Fig. 2D). In sensory regions (i.e., V1–V4 and LOT/VOT), target representations during VWM encoding and distractor representations during VWM delay were thus similar. Meanwhile, a significant cross-decoding drop was still present in IPS2–IPS4, indicating a difference in representation. Repeated measures ANOVA revealed that, across the sectors, there was no main effect of cross-decoding drop (F(1,13) = 1.49, p = 0.24), a main effect of sector, and an interaction between the two (F’s > 11.08, p’s < 0.001). Further pairwise comparisons revealed that the cross-decoding drop was greater in IPS2–IPS4 than in either V1–V4 or LOT/VOT (t’s > 2.83, p’s < 0.022; two-tailed and corrected); the drop did not differ between LOT/VOT and V1–V4 (t(13) = 1.67, p = 0.12; two-tailed and corrected).

Note that although the baseline decoding performance was higher in OTC than in PPC in the above analysis, simulations from a prior study showed that decoding accuracy followed the underlying signal strength in a linear manner (except when decoding was very close to chance 0.5 or very close to 1; Xu, 2024). This enabled a direct comparison of cross-decoding drops even when the baseline decoding strengths differed. Simulations further showed that a cross-decoding drop was absent when two conditions merely differed in SNR, resulting in different baseline decoding performances, but otherwise shared the same underlying representation (Xu and Chun, 2025). The cross-decoding results reported here thus reflected the nature of the underlying visual representation rather than artifacts of the decoding measure used.

To directly compare cross-decoding during VWM delay and encoding, to streamline the analysis, and to account for decoding performance differences across the different ROIs, following prior studies documenting tolerance of visual object representations in OTC to changes such as position and size (Xu and Vaziri-Pashkam, 2022) and tolerance of VWM representation across different distractor conditions (Xu, 2024), the within- and cross-decoding measures were combined to form a cross-decoding ratio for decoding during VWM encoding and delay. This was done by first subtracting 0.5 from the within- and cross-decoding accuracies and then taking the ratio of the two resulting values. A ratio of 1 would indicate equally good decoding performance within VWM and across distractors and a complete generalization of perception and VWM, whereas a ratio of 0 would instead indicate a complete failure of such generalization. This analysis yielded similar results across the individual ROI and the three ROI sectors (Fig. 2E,F). Comparing the ratios across the sectors using repeated measures ANOVA revealed a main effect of cross-decoding type (VWM encoding vs delay), a main effect of sector, and an interaction between the two (F’s > 11.99, p’s < 0.001). Further pairwise comparisons revealed that the ratio difference between the two types of cross-decoding was smaller in IPS2–IPS4 than in either V1–V4 or LOT/VOT and smaller in LOT/VOT than in V1–V4 (t’s > 3.68, p’s < 0.01; two-tailed and corrected).

Thus, even under an identical experimental setting, there existed a large drop in cross-decoding between perception (i.e., distractors) and VWM across all the ROIs, showing that in neither OTC nor PPC did perception and VWM share the same representation. Notably, while some representational overlap existed between perception and VWM in LOT/VOT and IPS2–IPS4 (i.e., with above-chance cross-decoding), such overlap was not observed in EVC (i.e., with chance-level cross-decoding). As expected, distractor representations were highly similar to target representations during target encoding in OTC, indicating a shared perceptual representation for both. However, distractor and target representations differed even during encoding in higher PPC regions, showing that these regions represent the same perceptual input differently based on the goal of visual processing. This is consistent with the goal-driven and adaptive nature of visual information processing in PPC (for extended reviews, see Xu, 2018a,b; see also Jeong and Xu, 2013; Xu and Jeong, 2015; Bracci et al., 2017; Vaziri-Pashkam and Xu, 2017).

The impact of distraction on VWM representation transformation

The preceding analysis showed that in trials with distractors during the delay period, even when the experimental testing conditions were matched, there were significant differences between VWM and perceptual representations across OTC and PPC. Given that distractions are ubiquitous in everyday life, these results suggest that in real-world vision, VWM and perceptual representations likely differ substantially. Nevertheless, it may be argued that the representational difference between VWM and perception could be a result of distraction, with the presence of distractors altering target representation. Xu (2024) showed that targets and distractors formed orthogonal representations in PPC. Such a representational geometry allowed targets to be read out independently of distraction and effectively resisted distraction in PPC. The same representational scheme, however, was absent in OTC, with target representations being different in different distractor conditions. It is thus possible that, in the absence of distraction, similar VWM and perceptual representations may still be obtained.

To test this idea, in this analysis, after a classifier was trained on a pair of objects when they were distractors, the classifier was tested on the same pair of objects when they were targets both in trials with distractors (as in the previous analysis) and in trials without distractors (Fig. 3A,B), and the results were compared. The results from each ROI are reported in Figure 3C. A significant cross-decoding drop for trials without distractors was still present in a number of ventral sensory areas, including V1, V2, and LOT (although the results from the other ventral areas were noisy, they largely showed the same trend). When a repeated measures ANOVA was performed in each ROI with trial type (with vs without distractors) and decoding (within vs cross-decoding) being the independent variables, a majority of the ROIs showed a main effect of cross-decoding drop; critically, none of the ventral sensory areas showed an interaction between trial type and cross-decoding (Table 1). Confirming these results, the three ROI sectors all showed a main effect of cross-decoding but no interaction effects (Table 2; Fig. 3D).

Figure 3.

Comparing cross-distractor decoding of VWM targets for trials with and without distractors. A, An illustration of within-target and cross-distractor target decoding during the delay period for trials with distractors (same as in Fig. 2A). B, Same as A, but for trials without distractors. C, Results of A and B, plotted separately for each ROI. D, Same as C, for the three ROI sectors. In C and D, the colored symbols above the bars mark the decoding significance of each bar compared with chance (0.5). The black symbols above the colored symbols mark the significance of the decoding drop between within and cross-decoding. Error bars indicate SEM. *p < 0.05, **0.001 < p < 0.01, ***p < 0.001.

Table 1.

Repeated measures ANOVA results for the individual ROIs for trial type (trials with vs without distractors), decoding (within vs cross-decoding), and their interaction

Table 2.

Repeated measures ANOVA results for the three ROI sectors for trial type (trials with vs without distractors), decoding (within vs cross-decoding), and their interaction

Because cross-decoding involved training the classifier on the strong sensory representations of the distractors to decode the much weaker VWM representations, cross-decoding could be high as long as there was some overlap in representation. This worked against finding a cross-decoding drop when within-decoding involved training and testing within the weak VWM signal. Even so, a significant cross-decoding drop was still seen across a number of ventral sensory areas in trials without distractors. Moreover, despite the cross-decoding drop being numerically smaller for trials without than those with distractors, the cross-decoding drop did not significantly differ across the two trial types in the ventral sensory areas. Overall, these results showed the presence of a significant representational transformation between perception and VWM in ventral sensory areas even when distractors were not shown during the delay period.

Transformation between VWM encoding and delay representations

Results from the first analysis showed that, in OTC, a substantial amount of transformation in visual representations occurred between perception and VWM. However, the amount of transformation in PPC remained unknown as distractor and target representations differed during both encoding and delay. Although PPC could contain similar representations during VWM encoding and delay, a transformation could also occur as incoming sensory information was consolidated into VWM representations. To evaluate these two possibilities, in this analysis, a classifier was trained to decode a pair of targets during encoding; it was then asked to decode the same pair of targets during both encoding (within-decoding) and delay (cross-decoding). The same analysis was also carried out in the reverse direction, with training performed during delay and testing performed during both encoding and delay. The results were averaged across both directions of training and testing (Fig. 4A,C). Note that although cross-decoding between VWM encoding and delay has been performed in previous studies (Riggall and Postle, 2012; Emrich et al., 2013; Sreenivasan et al., 2014; Kwak and Curtis, 2022; Xu, 2023), none of these studies included trials with distractors; additionally, some of them either did not include PPC areas or did not explicitly quantify the difference between OTC and PPC areas.

Figure 4.

Within and cross-decoding between VWM encoding and delay periods. A, An illustration of within and cross-decoding between encoding and delay periods for trials with distractors. B, Results of A for all the ROIs (left) and the three ROI sectors (right). C, An illustration of within and cross-decoding between encoding and delay periods for trials without distractors. D, Results of C for all the ROIs (left) and the three ROI sectors (right). E, Cross-decoding ratios for both types of trials for all the ROIs. F, The same results for the three ROI sectors. In B and D, the colored symbols above the bars mark the decoding significance of each bar compared with chance (0.5). The black symbols right above the colored symbols mark the significance of the decoding drop between within and cross-decoding. In E and F, the colored symbols above the lines/bars mark the significance of the ratio compared with 1. The black symbols right above the colored symbols mark the significance of the ratio difference between the two types of trials. In B, D, and F, the black symbols above the brackets mark the significance of the difference in either cross-decoding drop (B and D) or overall ratio (F) between pairs of ROI sectors. Error bars indicate SEM. *p < 0.05, **0.001 < p < 0.01, ***p < 0.001.

For both trials with distractors (Fig. 4A) and trials without distractors (Fig. 4C), significantly above-chance cross-decoding was found in all the ROI sectors; however, there was also a significant cross-decoding drop across all the ROI sectors, with the drop being greater in OTC than in PPC areas (Fig. 4B,D; see also results for the individual ROIs in these figures). Direct comparison across the ROI sectors using repeated measures ANOVA revealed a main effect of cross-decoding drop, a main effect of sector, and an interaction between the two (F’s > 13.15, p’s < 0.001) for both trial types. Further pairwise comparisons revealed that the drop in cross-decoding was much smaller in IPS2–IPS4 than in either V1–V4 or LOT/VOT (t’s > 4.62, p’s < 0.001), with no difference between the latter two (t’s < 0.25, p’s > 0.40) for both trial types.

Similar results were also obtained using the cross-decoding ratio measure (as detailed in the first analysis) for both the individual ROIs and the three ROI sectors (Fig. 4E,F). Specifically, comparisons across the sectors revealed no effect of trial type (F(1,13) = 0.13, p = 0.73), an effect of sector (F(2,26) = 5.30, p = 0.012), and no interaction between the two (F(2,26) = 0.37, p = 0.69). Further pairwise comparisons revealed that the overall cross-decoding ratio was higher in IPS2–IPS4 than in either V1–V4 or LOT/VOT and higher in LOT/VOT than in V1–V4 (t’s > 1.79, p’s < 0.048).

Overall, these results revealed significant cross-decoding between VWM encoding and delay. These results by themselves, however, did not necessarily indicate representational overlap between VWM encoding and delay. Due to the sluggishness of the fMRI signal, partial fMRI signal overlap between encoding and delay could also result in successful cross-decoding between these two stages of VWM processing. A significant cross-decoding drop, however, did signal representational transformation between VWM encoding and delay in both OTC and PPC. Given that substantially less transformation was found in PPC than in OTC, even with transformation, object representations appeared to be more stable in PPC than in OTC across the two stages of VWM processing. Notably, these effects were not impacted by the presence of distractors during the delay period, showing that the change in representation was not a result of distraction in VWM.

Discussion

A host of recent studies reported visual representation transformations between perception and VWM in human EVC (Kwak and Curtis, 2022; Li and Curtis, 2023; Xu, 2023; Yan et al., 2023; Duan and Curtis, 2024; see also behavioral evidence from Harrison and Bays, 2018; Yörük et al., 2020). These results are at odds with one aspect of the sensory account of VWM storage as it was originally proposed, which argues that representations formed during perception are maintained during VWM in sensory regions (D'Esposito and Postle, 2015; Serences, 2016; Christophel et al., 2017). One key piece of evidence supporting this account was the finding that a linear classifier trained to distinguish neural representations in a separate perceptual task could do so successfully for the representations held in VWM (Harrison and Tong, 2009; Albers et al., 2013; Rademaker et al., 2019). Notably, such cross-decoding performance was usually lower than within-decoding performance obtained when the classifier was trained and tested within VWM. This cross-decoding drop, although consistent with a representational transformation between perception and VWM, has been attributed to differences in the experimental settings and stimuli between perceptual and VWM tasks, with the assumption being that, should these factors be properly controlled for, minimal cross-decoding drop would be expected (Harrison and Tong, 2009). In light of recent evidence arguing against sensory representations in VWM, it is critical to reexamine this initial evidence and document whether a cross-decoding drop is still present when experimental conditions are properly matched between perception and VWM. Doing so would clarify an important aspect of the original proposal of the sensory account of VWM, which is still widely cited today, and improve our understanding of the nature of VWM representations.

The present study analyzed the data from a recent study (Xu, 2024) in which the same objects appeared either as VWM targets or as distractors during the delay period in different trials, thereby creating the same experimental setting for decoding perceptual and VWM representations for the same set of objects. Although no task was imposed on the distractors (only passive viewing), because distractors were shown at fixation in rapid succession, it was impossible to ignore them, as evidenced by the ceiling-level distractor decoding performance. Even with strong VWM representations present throughout OTC (including EVC) and PPC, when a linear classifier was trained on a pair of objects shown as distractors and then used to decode the same objects held as VWM targets during the same delay period, a significant cross-decoding drop was obtained in both regions compared with within-decoding, in which training and testing were both performed on VWM targets during the delay. In V1–V4, cross-decoding did not differ from chance, indicating minimal overlap between perception and VWM in these brain areas. Given that visual representations in the sensory regions were much stronger for the distractors (visible and sensory) than for the VWM targets (invisible), if perception and VWM shared a representation, training on the distractors to decode the VWM targets should outperform within-decoding. The presence of a significant cross-decoding drop, accompanied by chance-level cross-decoding in some sensory areas, thus indicates distinct perceptual and VWM representations, inconsistent with the sensory nature of VWM put forward by the original proposal of the sensory account of VWM (see also Duan and Curtis, 2024, for a similar argument).
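The logic of this distractor-to-target cross-decoding comparison can be sketched as follows. The scikit-learn implementation, array names, and leave-one-run-out scheme shown here are assumptions made for illustration and are not the study's actual pipeline.

```python
# Sketch of distractor-to-target cross-decoding with a linear SVM, mirroring
# the within- vs cross-decoding comparison. Array names, shapes, and the use of
# scikit-learn are assumptions; this is not the original analysis code.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut

def pairwise_decode(X_train, y_train, X_test, y_test, groups):
    """Train on patterns from one stimulus role (e.g., distractor) and test on
    another (e.g., VWM target), leaving one run out at a time. Assumes X_train
    and X_test are trial-aligned and share the same run labels (groups)."""
    accs = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X_train, y_train, groups):
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X_train[train_idx], y_train[train_idx])
        accs.append(clf.score(X_test[test_idx], y_test[test_idx]))
    return np.mean(accs)

# hypothetical data: (n_trials, n_voxels) patterns for one object pair per ROI
# X_distractor, X_target_delay, y, runs = load_roi_patterns(...)  # not shown

# within-decoding: train and test on VWM target delay patterns
# within_acc = pairwise_decode(X_target_delay, y, X_target_delay, y, runs)
# cross-decoding: train on distractor patterns, test on target delay patterns
# cross_acc = pairwise_decode(X_distractor, y, X_target_delay, y, runs)
# drop = within_acc - cross_acc
```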

Meanwhile, a distractor-trained classifier was able to successfully decode targets during the encoding period with no cross-decoding drop in OTC (but not in PPC). Thus, similar representations were formed in sensory areas for perceptual input, whether a stimulus was viewed actively during VWM encoding or passively during the VWM delay. This provides further validation of the matched experimental settings for targets and distractors in the present experimental paradigm. Given PPC's involvement in goal-directed visual processing (Xu, 2018a,b) and its ability to encode the same information differently based on task relevance (Xu and Vaziri-Pashkam, 2019; Taylor and Xu, 2024), it appears that PPC "transforms" input earlier in the process to differentiate targets from distractors. It may be argued that the representational difference between VWM and perception is due to the presence of distraction during the delay period, as Xu (2024) showed that target representations differed across distractor conditions in OTC (but not in PPC). Given the ubiquitous presence of distraction in everyday vision (i.e., we do not usually stare at a blank screen when we hold information in VWM in the real world), distraction in VWM is the norm rather than the exception. The results from VWM under distraction would thus more closely resemble VWM in real-world vision than those without distraction. That being said, even in trials without distraction, a significant representational change between perception and VWM was still observed across a number of ventral sensory areas, and there was no statistical evidence for an effect of distraction here.

Assuming that the VWM representation formed during encoding is largely sensory in nature (especially in OTC), the same conclusion can also be reached from the cross-decoding results between VWM encoding and delay. Although prior work examined such cross-decoding (Riggall and Postle, 2012; Emrich et al., 2013; Sreenivasan et al., 2014; Kwak and Curtis, 2022; Xu, 2023), none included trials with distractors; additionally, some did not include PPC or explicitly quantify the difference between OTC and PPC. The present study found a significant cross-decoding drop across OTC and PPC, with the drop being greater in OTC than in PPC and unaffected by the presence of distractors. These results again showed a representational change between perception and VWM, indicating that representational transformation is a natural part of VWM consolidation and echoing the cross-decoding results obtained from training on distractors to decode targets.

By training a classifier on a sensory task to decode the content of VWM, a previous study reported much lower perception-to-VWM cross-decoding in PPC than in EVC (Rademaker et al., 2019), leading to the conclusion that greater transformation between perception and VWM exists in PPC than in EVC. However, participants in that study did not attend to the stimuli in the sensory task, and PPC visual representations are more attention- and task-driven than those in EVC (Xu, 2018a; see also a similar comment made by Rademaker et al., 2019). As noted by Xu (2020), the perceptual classifier employed in that cross-decoding analysis was therefore likely much weaker in PPC than in EVC, which could have distorted the cross-decoding results in an unintended way. Without this complication, the present study found the opposite pattern of results, with PPC representations exhibiting less transformation and more stability and consistency across VWM processing than those in OTC. Meanwhile, because the present study and Rademaker et al. (2019) also differed in other respects, such as stimuli and task demands, it remains possible that factors other than attentional engagement contributed to the observed discrepancy in results.

The results of the present study thus support a transformed account of VWM representation in which perceptual input is transformed before it is stored in VWM, consistent with more recent evaluations of the sensory account of VWM (Adam et al., 2022). What could cause such a transformation? One possibility is that representations are naturally rotated between perception and VWM as part of the memory consolidation process, resulting in a significant cross-decoding drop. Indeed, rotation in visual representation has been observed in a host of other visual processes, including priority-based target coding in VWM (Xie et al., 2022), separating attention-related from VWM-related processing (Panichello and Buschman, 2021; Jahn et al., 2024), separating sensory input from stored long-term memory information (Libby and Buschman, 2021), separating target from distractor representations and separating different target representations in VWM (Xu, 2024), and representing visual information in different tasks (Taylor and Xu, 2024).
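To make the rotation idea concrete, the toy simulation below (purely illustrative and not fit to any data) shows that a linear classifier trained on one representational format generalizes poorly to a rotated copy of the same format, even though decoding within the rotated format remains high, mirroring a cross-decoding drop without any loss of information.

```python
# Toy demonstration that rotating a representational geometry preserves
# within-format decoding but reduces cross-format decoding. Purely illustrative;
# all parameters (dimensions, noise, rotation angle) are arbitrary assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_dims = 200, 20
labels = np.repeat([0, 1], n_trials // 2)

# "Perceptual" patterns: two object classes separated along one axis, plus noise
centers = np.zeros((2, n_dims))
centers[0, 0], centers[1, 0] = 1.0, -1.0
perc = centers[labels] + 0.8 * rng.standard_normal((n_trials, n_dims))

# "VWM" patterns: the same class structure, but rotated into a new dimension
theta = np.deg2rad(75)
rot = np.eye(n_dims)
rot[0, 0], rot[0, 1] = np.cos(theta), -np.sin(theta)
rot[1, 0], rot[1, 1] = np.sin(theta), np.cos(theta)
vwm = (centers @ rot.T)[labels] + 0.8 * rng.standard_normal((n_trials, n_dims))

clf = SVC(kernel="linear")
within = cross_val_score(clf, vwm, labels, cv=5).mean()   # train/test within VWM
cross = clf.fit(perc, labels).score(vwm, labels)          # train perception, test VWM
print(f"within-VWM decoding: {within:.2f}, cross-decoding: {cross:.2f}")
```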

Besides a rotation in the representational space, the transformation may also be caused by a change in the representational content between perception and VWM. At any given moment, only a fraction of the real-world perceptual input is relevant to behavior. A direct transfer of the perceived sensory information to VWM, as championed by the original proposal of the sensory account of VWM, is neither an efficient use of the limited VWM resources nor necessary. From the perspective of goal-directed visual information processing, significant transformations between perception and VWM are expected, allowing the most task-relevant information to be selected and consolidated for storage in VWM. Given PPC's greater ability to encode task-relevant information than OTC during perception (Xu, 2018a) and PPC's ability to drive the representational content of OTC during the VWM delay (Xu, 2023), transformation is expected to be greater in OTC than in PPC. A task-driven transformed account of VWM can thus readily explain the present results and a host of recent findings showing transformed representations in VWM (Kwak and Curtis, 2022; Li and Curtis, 2023; Xu, 2023; Yan et al., 2023; Duan and Curtis, 2024). It is possible to use tailor-made stimuli that elicit only the intended task-relevant features and then store them in VWM, thereby minimizing the transformation between perception and VWM and retaining sensory representations in VWM. Such a scenario, however, is the exception rather than the norm of how visual information is extracted and stored in VWM, especially given the rich and complex nature of real-world visual inputs. The utility of the sensory nature of VWM representation as originally proposed by the sensory account of VWM is thus limited in this regard (see other drawbacks of this account, e.g., Bettencourt and Xu, 2016a; Leavitt et al., 2017; Xu, 2017, 2018c, 2020, 2021).

To conclude, the present study reevaluates critical evidence supporting the sensory nature of VWM representation in human sensory regions and shows that a task-driven transformed account better captures how visual information is retained in VWM in these regions.

Data Availability

All data files and analysis scripts for this study are available at https://osf.io/8rbkh/.

Footnotes

  • The author declares no competing financial interests.

  • This work was supported by the National Eye Institute of the National Institutes of Health under Award Number R01EY030854 to Y.X. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. I thank SuKeun Jeong for creating the visual stimuli, Judy Young Hye Kwon and Hillary Nguyen for their assistance in fMRI data collection, and Marvin Chun for the general support.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

1. Adam KC, Rademaker RL, Serences JT (2022) Evidence for, and challenges to, sensory recruitment models of visual working memory. In: Visual memory (Brady TF, Bainbridge WA, eds), pp 5–25. Routledge.
2. Albers AM, Kok P, Toni I, Dijkerman HC, de Lange FP (2013) Shared representations for working memory and mental imagery in early visual cortex. Curr Biol 23:1427–1431. https://doi.org/10.1016/j.cub.2013.05.065
3. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
4. Bettencourt K, Xu Y (2013) The role of transverse occipital sulcus in scene perception and its relationship to object individuation in inferior intraparietal sulcus. J Cogn Neurosci 25:1711–1722. https://doi.org/10.1162/jocn_a_00422
5. Bettencourt KC, Xu Y (2016a) Decoding under distraction reveals distinct occipital and parietal contributions to visual short-term memory representation. Nat Neurosci 19:150–157. https://doi.org/10.1038/nn.4174
6. Bettencourt KC, Xu Y (2016b) Understanding location- and feature-based processing along the human intraparietal sulcus. J Neurophysiol 116:1488–1497. https://doi.org/10.1152/jn.00404.2016
7. Bracci S, Daniels N, Op de Beeck H (2017) Task context overrules object- and category-related representational content in the human parietal cortex. Cereb Cortex 27:310–321. https://doi.org/10.1093/cercor/bhw419
8. Brainard DH (1997) The Psychophysics Toolbox. Spat Vis 10:433–436. https://doi.org/10.1163/156856897X00357
9. Brodoehl S, Gaser C, Dahnke R, Witte OW, Klingner CM (2020) Surface-based analysis increases the specificity of cortical activation patterns and connectivity results. Sci Rep 10:5737. https://doi.org/10.1038/s41598-020-62832-z
10. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27. https://doi.org/10.1145/1961189.1961199
11. Christophel TB, Klink PC, Spitzer B, Roelfsema PR, Haynes JD (2017) The distributed nature of working memory. Trends Cogn Sci 21:111–124. https://doi.org/10.1016/j.tics.2016.12.007
12. Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage 9:179–194. https://doi.org/10.1006/nimg.1998.0395
13. D'Esposito M, Postle BR (2015) The cognitive neuroscience of working memory. Annu Rev Psychol 66:115–142. https://doi.org/10.1146/annurev-psych-010814-015031
14. Duan Z, Curtis CE (2024) Visual working memories are abstractions of percepts. Elife 13:RP94191. https://doi.org/10.7554/eLife.94191.3
15. Emrich SM, Riggall AC, LaRocque JJ, Postle BR (2013) Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory. J Neurosci 33:6516–6523. https://doi.org/10.1523/JNEUROSCI.5732-12.2013
16. Grill-Spector K, Kushnir T, Edelman S, Itzchak Y, Malach R (1998) Cue-invariant activation in object-related areas of the human occipital lobe. Neuron 21:191–202. https://doi.org/10.1016/S0896-6273(00)80526-7
17. Harrison SA, Tong F (2009) Decoding reveals the contents of visual working memory in early visual areas. Nature 458:632–635. https://doi.org/10.1038/nature07832
18. Harrison WJ, Bays PM (2018) Visual working memory is independent of the cortical spacing between memoranda. J Neurosci 38:3116–3123. https://doi.org/10.1523/JNEUROSCI.2645-17.2017
19. Jahn CI, Markov NT, Morea B, Daw ND, Ebitz RB, Buschman TJ (2024) Learning attentional templates for value-based decision-making. Cell 187:1476–1489. https://doi.org/10.1016/j.cell.2024.01.041
20. Jeong SK, Xu Y (2013) Neural representation of targets and distractors during object individuation and identification. J Cogn Neurosci 25:117–126. https://doi.org/10.1162/jocn_a_00298
21. Jeong SK, Xu Y (2017) Task-context dependent linear representation of multiple visual objects in human parietal cortex. J Cogn Neurosci 29:1778–1789. https://doi.org/10.1162/jocn_a_01156
22. Kourtzi Z, Kanwisher N (2000) Cortical regions involved in perceiving object shape. J Neurosci 20:3310–3318. https://doi.org/10.1523/JNEUROSCI.20-09-03310.2000
23. Kwak Y, Curtis CE (2022) Unveiling the abstract format of mnemonic representations. Neuron 110:1822–1828. https://doi.org/10.1016/j.neuron.2022.03.016
24. Leavitt ML, Mendoza-Halliday D, Martinez-Trujillo JC (2017) Sustained activity encoding working memories: not fully distributed. Trends Neurosci 40:328–346. https://doi.org/10.1016/j.tins.2017.04.004
25. Li HH, Curtis CE (2023) Neural population dynamics of human working memory. Curr Biol 33:3775–3784. https://doi.org/10.1016/j.cub.2023.07.067
26. Libby A, Buschman TJ (2021) Rotational dynamics reduce interference between sensory and memory representations. Nat Neurosci 24:715–726. https://doi.org/10.1038/s41593-021-00821-9
27. Malach R, Reppas JB, Benson RR, Kwong KK, Jiang H, Kennedy WA, Ledden PJ, Brady TJ, Rosen BR, Tootell RB (1995) Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proc Natl Acad Sci U S A 92:8135–8139. https://doi.org/10.1073/pnas.92.18.8135
28. Ongchoco JDK, Xu Y (2024) Visual event boundaries trigger forgetting despite active maintenance in visual working memory. J Vis 24:1–11. https://doi.org/10.1167/jov.24.9.9
29. Oosterhof NN, Wiestler T, Downing PE, Diedrichsen J (2011) A comparison of volume-based and surface-based multi-voxel pattern analysis. Neuroimage 56:593–600. https://doi.org/10.1016/j.neuroimage.2010.04.270
30. Panichello MF, Buschman TJ (2021) Shared mechanisms underlie the control of working memory and attention. Nature 592:601–605. https://doi.org/10.1038/s41586-021-03390-w
31. Rademaker RL, Chunharas C, Serences JT (2019) Coexisting representations of sensory and mnemonic information in human visual cortex. Nat Neurosci 22:1336–1344. https://doi.org/10.1038/s41593-019-0428-x
32. Riggall AC, Postle BR (2012) The relationship between working memory storage and elevated activity as measured with functional magnetic resonance imaging. J Neurosci 32:12990–12998. https://doi.org/10.1523/JNEUROSCI.1892-12.2012
33. Serences JT, Ester EF, Vogel EK, Awh E (2009) Stimulus-specific delay activity in human primary visual cortex. Psychol Sci 20:207–214. https://doi.org/10.1111/j.1467-9280.2009.02276.x
34. Serences JT (2016) Neural mechanisms of information storage in visual short-term memory. Vis Res 128:53–67. https://doi.org/10.1016/j.visres.2016.09.010
35. Sereno MI, Dale AM, Reppas JB, Kwong KK, Belliveau JW, Brady TJ, Rosen BR, Tootell RB (1995) Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science 268:889–893. https://doi.org/10.1126/science.7754376
36. Sreenivasan KK, Vytlacil J, D'Esposito M (2014) Distributed and dynamic storage of working memory stimulus information in extrastriate cortex. J Cogn Neurosci 26:1141–1153. https://doi.org/10.1162/jocn_a_00556
37. Swisher JD, Halko MA, Merabet LB, McMains SA, Somers DC (2007) Visual topography of human intraparietal sulcus. J Neurosci 27:5326–5337. https://doi.org/10.1523/JNEUROSCI.0991-07.2007
38. Taylor J, Xu Y (2022) Representation of color, form, and their conjunction across the human ventral visual pathway. Neuroimage 251:118941. https://doi.org/10.1016/j.neuroimage.2022.118941
39. Taylor J, Xu Y (2024) Using fMRI to examine nonlinear mixed selectivity tuning to task and category in the human brain. Imaging Neurosci 2:1–21. https://doi.org/10.1162/imag_a_00354
40. Vaziri-Pashkam M, Taylor J, Xu Y (2019) Spatial frequency tolerant visual object representations in the human ventral and dorsal visual processing pathways. J Cogn Neurosci 31:49–63. https://doi.org/10.1162/jocn_a_01335
41. Vaziri-Pashkam M, Xu Y (2017) Goal-directed visual processing differentially impacts human ventral and dorsal visual representations. J Neurosci 37:8767–8782. https://doi.org/10.1523/JNEUROSCI.3392-16.2017
42. Vaziri-Pashkam M, Xu Y (2019) An information-driven 2-pathway characterization of occipitotemporal and posterior parietal visual object representations. Cereb Cortex 29:2034–2050. https://doi.org/10.1093/cercor/bhy080
43. Xie Y, et al. (2022) Geometry of sequence working memory in macaque prefrontal cortex. Science 375:632–639. https://doi.org/10.1126/science.abm0204
44. Xu Y (2017) Reevaluating the sensory account of visual working memory storage. Trends Cogn Sci 21:794–815. https://doi.org/10.1016/j.tics.2017.06.013
45. Xu Y (2018a) A tale of two visual systems: invariant and adaptive visual information representations in the primate brain. Annu Rev Vis Sci 4:311–336. https://doi.org/10.1146/annurev-vision-091517-033954
46. Xu Y (2018c) Sensory cortex is nonessential in working memory storage (a reply to commentaries). Trends Cogn Sci 22:192–193. https://doi.org/10.1016/j.tics.2017.12.008
47. Xu Y (2020) Revisit once more the sensory storage account of visual working memory. Vis Cogn 28:433–336. https://doi.org/10.1080/13506285.2020.1818659
48. Xu Y (2021) Towards a better understanding of information storage in visual working memory. Vis Cogn 29:437–445. https://doi.org/10.1080/13506285.2021.1946230
49. Xu Y (2023) Parietal-driven visual working memory representation in occipito-temporal cortex. Curr Biol 33:4516–4523. https://doi.org/10.1016/j.cub.2023.08.080
50. Xu Y (2024) The human posterior parietal cortices orthogonalize the representation of different streams of information concurrently coded in visual working memory. PLoS Biol 22:e3002915. https://doi.org/10.1371/journal.pbio.3002915
51. Xu Y, Chun M (forthcoming 2025) Representing visual objects, attention, and load in human occipitotemporal and posterior parietal cortices. J Cogn Neurosci.
52. Xu Y, Jeong SK (2015) The contribution of human superior intraparietal sulcus to visual short-term memory and perception. In: Mechanisms of sensory working memory: Attention and Performance XXV, pp 33–42. https://doi.org/10.1016/B978-0-12-801371-7.00004-1
53. Xu Y, Vaziri-Pashkam M (2019) Task modulation of the 2-pathway characterization of occipitotemporal and posterior parietal visual object representations. Neuropsychologia 132:107140. https://doi.org/10.1016/j.neuropsychologia.2019.107140
54. Xu Y, Vaziri-Pashkam M (2022) Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 263:119635. https://doi.org/10.1016/j.neuroimage.2022.119635
55. Yan C, Christophel TB, Allefeld C, Haynes J (2023) Categorical working memory codes in human visual cortex. Neuroimage 274:120149. https://doi.org/10.1016/j.neuroimage.2023.120149
56. Yörük H, Santacroce LA, Tamber-Rosenau BJ (2020) Reevaluating the sensory recruitment model by manipulating crowding in visual working memory representations. Psychon Bull Rev 27:1383–1396. https://doi.org/10.3758/s13423-020-01757-0

Synthesis

Reviewing Editor: Alexander Soutschek, Ludwig-Maximilians-Universität München

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Margaret Henderson. Note: If this manuscript was transferred from JNeurosci and a decision was made to accept the manuscript without peer review, a brief statement to this effect will instead be what is listed below.

The current study aims to test whether working memory and sensory representations share identical formats, an idea suggested in some early formulations of the sensory account (also known as the sensory recruitment model). The author shows a significant drop in cross-decoding performance between the working memory delay and distractor encoding, and between the working memory delay and working memory encoding. Although similar effects have been reported in previous studies (e.g., Rademaker et al., 2019; Kwak and Curtis, 2022), the current work provides stronger evidence by carefully matching conditions, as the author highlights. This supports the view that working memory representations are not a direct copy of sensory representations.

That said, I believe a few points could benefit from further clarification. One is how the sensory recruitment account is characterized. While some early studies (e.g., Serences et al., 2009; D'Esposito & Postle, 2015) described the account as involving both the engagement of sensory regions and a shared representational format between perception and working memory, more recent literature (e.g., Kirsten et al., Visual Memory, 2022; Yörük et al., Psychon. Bull. Rev., 2020) appears to focus on only the former, while acknowledging that working memory representations may transform during maintenance. Recent work emphasizes the engagement of sensory regions during WM maintenance, without necessarily assuming that WM codes are identical to sensory codes. Since the present study addresses the question of representational similarity, the conclusions may be more appropriately framed as a challenge to an earlier version of the account rather than testing a core prediction of its current form.

Another point. In the discussion, the author suggests that the difference in PPC decoding between the current study and Rademaker et al. (2019) may be due to differences in attentional engagement across tasks. While this is a reasonable possibility, the two studies also differ in other respects, including stimuli and task demands. It might be worth noting that attentional engagement is likely one of several potential factors contributing to the observed discrepancy.

Author Response

I thank the editor and the reviewers for their time and effort, and their comments. I have addressed in detail below each comment raised by the new reviewer and have revised the manuscript accordingly. The changes made are marked in blue in the revised manuscript. For easy referencing, I include the reviewer's original comments in the italicized text below.

Reviewer comment:

The current study aims to test whether working memory and sensory representations share identical formats, an idea suggested in some early formulations of the sensory account (also known as the sensory recruitment model). The author shows a significant drop in cross-decoding performance between the working memory delay and distractor encoding, and between the working memory delay and working memory encoding. Although similar effects have been reported in previous studies (e.g., Rademaker et al., 2019; Kwak and Curtis, 2022), the current work provides stronger evidence by carefully matching conditions, as the author highlights. This supports the view that working memory representations are not a direct copy of sensory representations.

That said, I believe a few points could benefit from further clarification. One is how the sensory recruitment account is characterized. While some early studies (e.g., Serences et al., 2009; D'Esposito & Postle, 2015) described the account as involving both the engagement of sensory regions and a shared representational format between perception and working memory, more recent literature (e.g., Kirsten et al., Visual Memory, 2022; Yörük et al., Psychon. Bull. Rev., 2020) appears to focus on only the former, while acknowledging that working memory representations may transform during maintenance. Recent work emphasizes the engagement of sensory regions during WM maintenance, without necessarily assuming that WM codes are identical to sensory codes. Since the present study addresses the question of representational similarity, the conclusions may be more appropriately framed as a challenge to an earlier version of the account rather than testing a core prediction of its current form.

I thank the reviewer for raising this point. It is indeed the case that Adam et al. (2022, Visual Memory) focused on the involvement of the sensory areas in WM and discussed instances where perceptual and WM representations could differ. Although they did not emphasize the transformed nature of WM, their overall take on the sensory recruitment theory is consistent with the present manuscript. Yörük et al. (2020, Psychon. Bull. Rev.) showed that task manipulation can modulate whether or not perception and WM share similar representational characteristics. This is again consistent with the present manuscript, showing that retaining a sensory-like representation in WM is not the norm and is highly task-dependent. Given that the earlier version of the sensory account is still highly cited today, much more so than the current version, I thought that it would be useful to make a more definitive evaluation in the present manuscript, stating explicitly what remains valid and what does not in the earlier version in light of new data. All in all, I agree that the field as a whole has reached a consensus on this topic. I have added text throughout the Abstract, Significance, and General Discussion (highlighted in blue) to reflect this discussion and have added these two citations (plus a third relevant citation mentioned in Yörük et al., 2020) to the revised manuscript.

Another point. In the discussion, the author suggests that the difference in PPC decoding between the current study and Rademaker et al. (2019) may be due to differences in attentional engagement across tasks. While this is a reasonable possibility, the two studies also differ in other respects, including stimuli and task demands. It might be worth noting that attentional engagement is likely one of several potential factors contributing to the observed discrepancy.

This is a fair point, although Rademaker et al. (2019) also acknowledged that the lack of attentional engagement in their perceptual task could have contributed to their failure to obtain reliable sensory responses from PPC (p. 1342, 1st paragraph of the right column). I have added texts in General Discussion to reflect this comment (highlighted in blue).


Keywords

  • fMRI
  • occipitotemporal cortex
  • posterior parietal cortex
  • vision
  • visual object representation
  • visual short-term memory
  • visual working memory
