Abstract
In models of visual spatial attention control, it is commonly held that top–down control signals originate in the dorsal attention network, propagating to the visual cortex to modulate baseline neural activity and bias sensory processing. However, the precise distribution of these top–down influences across different levels of the visual hierarchy is debated. In addition, it is unclear whether these baseline neural activity changes translate into improved performance. We analyzed attention-related baseline activity during the anticipatory period of a voluntary spatial attention task, using two independent functional magnetic resonance imaging datasets and two analytic approaches. First, as in prior studies, univariate analysis showed that covert attention significantly enhanced baseline neural activity in higher-order visual areas contralateral to the attended visual hemifield, while effects in lower-order visual areas (e.g., V1) were weaker and more variable. Second, in contrast, multivariate pattern analysis (MVPA) revealed significant decoding of attention conditions across all visual cortical areas, with lower-order visual areas exhibiting higher decoding accuracies than higher-order areas. Third, decoding accuracy was a better predictor of a subject's stimulus discrimination performance than the magnitude of univariate activation. Finally, the MVPA results were replicated across two experimental conditions, where the direction of spatial attention was either externally instructed by a cue or based on the participants' free choice of where to attend. Together, these findings offer new insights into the extent of attentional biases in the visual hierarchy under top–down control and how these biases influence both sensory processing and behavioral performance.
Significance Statement
Attention can be deployed in advance of stimulus processing. Understanding how top–down control of attention facilitates the processing of attended stimuli and enhances task performance has remained a long-standing question in attention research. Here, applying multivariate pattern analysis (MVPA) to functional magnetic resonance imaging data, we showed that throughout the entire visual hierarchy, including the primary visual cortex, there exist distinct neural representations for different attended locations during anticipatory visual spatial attention, and that the distinctiveness of these neural representations is positively associated with behavioral performance. Importantly, the MVPA findings were consistent across two experimental conditions in which the direction of spatial attention was driven either by external instructions or by purely internal decisions.
Introduction
Extensive neurophysiological, neuroimaging, and lesion evidence suggests that frontoparietal cortical areas, including the frontal eye field (FEF) and the intraparietal sulcus (IPS) in humans, are the neural substrate underlying the control of voluntary attention (Hopfinger et al., 2000; Corbetta and Shulman, 2002; Bisley and Goldberg, 2003; Buschman and Miller, 2007; Bressler et al., 2008; Gregoriou et al., 2009; Gunduz et al., 2012; Ibos et al., 2013; Astrand et al., 2015, 2016; Bichot et al., 2015; Meehan et al., 2017; Bowling et al., 2020). According to the prevailing theory, in addition to maintaining the attentional set, these areas also issue top–down signals to bias the responsiveness of sensory neurons in a goal-directed fashion in advance of target processing (Luck et al., 1997; Chawla et al., 1999; Kastner et al., 1999; Hopfinger et al., 2000; Kastner and Ungerleider, 2000; Giesbrecht et al., 2006; Sylvester et al., 2007; Bressler et al., 2008; Carrasco, 2011; Snyder et al., 2021; Park and Serences, 2022). One physiological sign of the influence of top–down control signals in the visual cortex is the increase in baseline neural activity—baseline shifts—in the visual cortex during anticipatory attention.
Attention-related baseline shifts have been observed during anticipatory attention in both humans and nonhuman primates. In single neuron recordings in nonhuman primates, Luck and colleagues (Luck et al., 1997) reported increased baseline neuronal firing rates in V2 and V4 (but not in V1) in neurons that encoded the to-be-attended information. In human electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, a highly replicable finding is that during anticipatory attention, alpha-band (8–12 Hz) activity is modulated by attention-directing cues (Worden et al., 2000; Thut et al., 2006), and the patterns of alpha modulation encode the locus of spatial attention (Rihs et al., 2007; van Gerven and Jensen, 2009; Treder et al., 2011; Trachel et al., 2015; Foster et al., 2016; Samaha et al., 2016; Desantis et al., 2020; Lu et al., 2024). The same holds for slow potential shifts in event-related potentials and EEG/MEG signal patterns during anticipatory attention (Harter et al., 1989; Yamaguchi et al., 1994; Hopf and Mangun, 2000; Hong et al., 2015, 2020; Thiery et al., 2016; Barnes et al., 2022). In human functional magnetic resonance imaging (fMRI) studies, attention-directing cues result in increases in anticipatory baseline BOLD activity in the thalamus (Woldorff et al., 2004) and retinotopic visual cortex (Kastner et al., 1999; Hopfinger et al., 2000; Ress et al., 2000; Serences et al., 2004; Woldorff et al., 2004; Giesbrecht et al., 2006; McMains et al., 2007; Park and Serences, 2022). As in the animal studies of Luck and colleagues, the BOLD signals during anticipatory spatial attention are typically more robust in higher-order visual cortical areas (Hopfinger et al., 2000; O'Connor et al., 2002).
In a more recent study applying multivariate fMRI analysis techniques, Park and Serences (2022) found differences in multivoxel patterns evoked by different attention-directing cues in several retinotopic extrastriate visual areas, but as in prior studies, the effects during the anticipatory period were weaker in V1. Here, we applied both univariate and multivariate analyses to examine baseline modulation during attention control across the entire retinotopic visual hierarchy.
What is the role of baseline activity shifts in the visual cortex during anticipatory visual attention? It is commonly thought that the stronger the top–down attentional modulation of the sensory cortex, the more effective would be the processing of the attended stimulus, which should then lead to better behavioral performance (Stokes et al., 2009). Ress et al. (2000) found that cue-related baseline BOLD increases in V1 through V3 were positively associated with task performance in a visual spatial attention task. Giesbrecht et al. (2006) reported similar findings for both spatial and feature attention. Fannon et al. (2008), however, found that cue-related prestimulus neural activity did not predict the stimulus-evoked activity or the subsequent behavioral performance. McMains et al. (2007) also reported that the poststimulus effects of attentional selection observed in feature-selective regions were not accompanied by increased background activity during the anticipatory (prestimulus) period. Thus, there remains uncertainty regarding the physiological role and behavioral significance of the attentional modulations of baseline activity in the visual cortex during anticipatory attention.
In order to shed new light on the physiological role and behavioral significance of baseline shifts in neural activity in the visual cortex during anticipatory attention, we analyzed anticipatory period BOLD activity in a trial-by-trial spatial attention orienting task. We analyzed the signals across the entire visual cortical hierarchy from V1 through the extrastriate cortex and posterior parietal cortex considering both the strength of attentional biasing and its relationship to behavioral performance. Importantly, we contrasted univariate and multivariate analyses.
Univariate analysis of fMRI data has contributed much to our understanding of how the human brain achieves adaptive behavior. Recent work, however, has shown that multivariate analysis can reveal patterns of activity within regions of the visual cortex that cannot be revealed by univariate methods alone (Davis et al., 2014). This is because in univariate analysis, for a voxel to be reported as activated in an experimental condition, in principle, it needs to be consistently activated across individuals. Individual differences in voxel activation patterns could therefore lead to failure to detect the presence of neural activity in a given region of the brain (Haxby et al., 2001; Haxby, 2012). Moreover, when applied at the region-of-interest (ROI) level, univariate analysis involves averaging across the voxels within an ROI, and as such, important information about the pattern of activation may be lost in the process (Pereira et al., 2009; Greenberg et al., 2010; Mahmoudi et al., 2012; Cohen et al., 2017; Meyyappan et al., 2021). In contrast, multivariate pattern analysis (MVPA) reveals differences in the patterns of neural activity between experimental conditions by considering the variation in patterns across multiple voxels in an ROI at the single-subject level (Haxby et al., 2001; Kamitani and Tong, 2005; Greenberg et al., 2010; Jehee et al., 2011; Meyyappan et al., 2021; Rajan et al., 2021; Park and Serences, 2022). Applying both univariate and multivariate analyses in the same study and contrasting their respective findings allows us to gain comprehensive insights that are not possible with either approach applied separately.
The present study addresses a long-standing question about background activity in the visual cortex during anticipatory attention by integrating univariate, multivariate, and brain–behavior analyses comprehensively across the entire retinotopic visual hierarchy. Notably, it does so within a single experimental design that combines two datasets recorded in separate sessions in different laboratories. Furthermore, in addition to the traditional spatial cueing approach—where participants are directed to covertly attend to left or right hemifield locations using different physical cues—we included a novel condition in which the participants were allowed to freely choose which hemifield to attend when prompted by a choice cue or “prompt” (Bengson et al., 2014). The inclusion of these choice trials offers a robust test of top–down control over baseline visual cortical activity by eliminating potential confounds arising from variations in the physical features of attention-directing cues. This is because in the choice trials, the cues are physically identical regardless of whether the participants chose to attend to the left or right hemifield location.
Materials and Methods
Overview
Two datasets from studies conducted at the University of Florida (UF dataset) and the University of California, Davis (UCD dataset), were analyzed. By design, both datasets used the same experimental conditions. These datasets have been analyzed and published before to address different questions (Bengson et al., 2015; Liu et al., 2016, 2017). None of the previously published studies examined how attention modulated baseline neural activity in the retinotopic visual cortex.
UF Dataset: The Institutional Review Board at the University of Florida approved the experimental procedure. Eighteen healthy participants, between 18 and 22 years of age, provided written informed consent and took part in the experiment. fMRI data were collected from the participants while they performed a cued visual spatial attention task. Two participants were excluded from the analysis: one failed to follow the experimental instructions and another did not achieve behavioral accuracy above 70%. Three additional participants were removed due to excessive head/body movements. Data from the remaining thirteen participants were analyzed and reported here (n = 13; 5 females and 8 males). EEG data were recorded along with the fMRI but not considered here.
UCD Dataset: The Institutional Review Board at the University of California, Davis, approved the experimental procedure. Nineteen healthy participants, between 18 and 22 years of age, provided written informed consent and took part in the experiment. fMRI data were recorded while the participants performed the visual spatial attention task. One participant was removed due to inconsistent behavioral performance (accuracy < 50% during the latter half of the experiment). Data from the remaining 18 participants were analyzed and reported here (n = 18; 2 females and 16 males).
We performed power analyses and found the following: (1) for univariate analysis, the required sample size to detect a significant effect [p < 0.05 false discovery rate (FDR)] with a power of 0.8 was 22 (Durnez et al., 2016), and (2) for the MVPA decoding analysis, the required sample size was 13. Thus, our study with a combined sample size of 31 is adequately powered to test our hypotheses. This sample size was also in line with the general recommendations for similar neuroimaging studies (Desmond and Glover, 2002; Pajula and Tohka, 2016). Similarly, our recent work applying both univariate and multivariate methods to fMRI data further suggests that there is sufficient power for carrying out the intended analyses (Bo et al., 2021; Meyyappan et al., 2021; Rajan et al., 2021). See below for our analytical strategy in which the two datasets were combined via meta-analysis.
Experimental paradigm
As shown in Figure 1, a fixation dot was placed at the center of the monitor, and the participants were instructed to maintain eye fixation on the dot during the experiment. Two white dots in the lower left and lower right visual fields marked the two peripheral spatial locations where stimuli would be presented.
Experimental paradigm. Each trial started with one of three visual cues (200 ms), two of which instructed subjects to covertly attend either the left (A) or right (B) hemifield location, marked by the two white dots. A third cue, the choice cue (C), prompted the participants to choose either the left or right visual field to attend. Following a variable cue-to-target delay (2,000–8,000 ms), a target in the form of a grating patch appeared briefly for 100 ms with equal probability in the left or the right hemifield. Participants were asked to discriminate the spatial frequency of the grating displayed at the cued or chosen location and to ignore the uncued or unchosen side. After a second variable interstimulus interval (ISI; 2,000–8,000 ms), participants were presented with a "?SIDE?" screen and were asked to report the hemifield they had attended on that trial. An intertrial interval (ITI), varying randomly from 2,000 to 8,000 ms, elapsed before the start of the next trial.
At the beginning of each trial, one of three symbolic cues (a T-shaped stimulus, a diamond, or a circle) was presented above the fixation dot for 200 ms. Two of the symbols explicitly instructed (cued) the participants to direct their attention covertly to either the left or right hemifield location (instructed trials), and the third prompted them to freely choose (ad libitum) one of the two hemifields to covertly attend on that trial (choice trials). The three cue/prompt conditions occurred randomly and with equal probability, and the three symbols used as cues/prompts were counterbalanced as to their meanings across participants (i.e., cue-left, cue-right, and free choice). The choice condition, which was introduced originally to investigate willed attention (Bengson et al., 2015; Liu et al., 2016, 2017; Rajan et al., 2019), was important to include for the purpose of this analysis in that it elicited the two attentional states (attend-left vs attend-right) with the same visual cue (the choice cue), thereby eliminating the influence of different visual cues on neural patterns in the visual cortex (especially the early visual cortex) and permitting cross-validation of the findings from the instructional cue trials.
Following a random cue–target interval ranging from 2,000 to 8,000 ms, a black-and-white grating pattern (100% contrast) appeared at one of the two peripheral spatial locations for 100 ms. Participants discriminated via button press between two possible spatial frequencies of the target grating (0.2° per cycle vs 0.18° per cycle for the UF dataset; 0.53° per cycle vs 0.59° per cycle for the UCD dataset) when it appeared in the attended hemifield and withheld responses when it appeared in the unattended hemifield. The grating occurred in either the left or right visual hemifield with equal probability; in other words, there was a 50% chance that the stimulus appeared at the attended location, while in the remaining 50% of the trials, the stimulus appeared at the unattended location. According to this design, withholding a response on unattended trials and accurately identifying the spatial frequency of the stimulus on attended trials were both deemed correct responses. After the target stimulus, following a variable interstimulus interval (ISI) ranging from 2,000 to 8,000 ms, participants were prompted by the visual cue "?SIDE?" to report the spatial location that they attended on that trial. This reporting was needed to track the participant's direction of attention in the choice attention condition and was also included in instructed trials to maintain consistency across conditions. A variable intertrial interval (ITI; 2,000–8,000 ms) was inserted after the subject reported the side attended.
Reporting of the side attended was highly accurate across the group of participants, with the accuracy being 98.4%, averaged across conditions. While side-reporting accuracy could be determined by comparing self-reports with the cue during instructed attention trials, the accuracy for choice attention trials was further determined by examining the consistency between participants’ responses to targets and the self-reports (e.g., responded to the target in the left visual field and reported choosing left).
For the choice or willed attention condition, participants were directed to make impromptu (spontaneous) decisions regarding which side to attend when presented with a choice cue and advised against adhering to any stereotypical strategies such as consistently attending to the same or opposite side as in the previous trial. Choose-left (UF, 50.57 ± 3.17%; UCD, 47.88 ± 1.84%) and choose-right (UF, 49.43 ± 2.69%; UCD, 52.12 ± 1.84%) trials were evenly distributed and not significantly different from each other (UF, p = 0.86; UCD, p = 0.26).
All participants went through a training session prior to scanning to familiarize them with the task and to ensure adequate performance (above 70% correct) and compliance with the procedures (e.g., maintaining fixation on the central dot throughout the experiment). No eye-tracking was performed in the scanner at UF; at UCD, the study team monitored participants for eye movements using an MR-compatible Applied Sciences Laboratory Model 504 eye-tracker. The experiment was divided into blocks of trials, with each block lasting ∼6 min (60 trials).
Data acquisition
UF dataset
MR images were acquired at the University of Florida using a 3 T Philips Achieva scanner equipped with a 32-channel head coil. Functional images were obtained using an echoplanar imaging (EPI) sequence with the following parameters: repetition time (TR), 1,980 ms; echo time (TE), 30 ms; matrix, 64 × 64; field of view, 224 mm; and slice thickness, 3.5 mm. Thirty-six transverse slices parallel to the plane connecting the anterior and posterior commissures were acquired in ascending order, with a voxel resolution of 3.5 × 3.5 × 3.5 mm.
UCD dataset
MR images were collected at the Imaging Research Center at UCD using a 3 T Siemens Skyra scanner with a 32-channel head coil. Functional images were obtained using an EPI sequence with the following parameters: TR, 2,100 ms; TE, 29 ms; matrix, 64 × 64; field of view, 216 mm; and slice thickness, 3.4 mm. Thirty-four transverse slices were acquired in an interleaved order, oriented parallel to the plane connecting the anterior and posterior commissures. The voxel resolution was 3.5 × 3.5 × 3.5 mm.
The slight differences in scanning parameters were due to optimization done at each recording site. In particular, the simultaneous recording of EEG and fMRI at UF necessitated the setting of parameters that were more conducive to the removal of scanner artifacts from the EEG data.
Data preprocessing
The fMRI data were preprocessed in SPM with custom MATLAB scripts. Slice timing differences were corrected with sinc interpolation to account for the acquisition delay. Motion realignment was performed by coregistering all images with the first scan of each session. Six motion parameters (three translational and three rotational) were calculated and regressed out of the fMRI data. All images were spatially normalized to the standard MNI space and resampled to a voxel resolution of 3 × 3 × 3 mm. Finally, the spatially normalized images were smoothed using a Gaussian kernel of 8 mm full-width at half-maximum. A high-pass filter with a cutoff frequency of 1/128 Hz was used to remove low-frequency noise, and global signals were removed by adjusting the intensities of each voxel (Fox et al., 2009).
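The nuisance-removal steps above (motion regression, global signal removal, and 1/128 Hz high-pass filtering) can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' SPM/MATLAB pipeline: the function names are invented here, and the drift basis follows the SPM-style discrete cosine construction as an assumption.

```python
import numpy as np

def dct_highpass_basis(n_scans, tr, cutoff=128.0):
    """SPM-style discrete cosine drift basis for periods longer than `cutoff` seconds."""
    order = int(np.floor(2 * n_scans * tr / cutoff)) + 1
    k = np.arange(n_scans)
    cols = [np.cos(np.pi * (2 * k + 1) * j / (2 * n_scans)) for j in range(1, order)]
    return np.column_stack(cols) if cols else np.zeros((n_scans, 0))

def clean_timeseries(Y, motion, tr, cutoff=128.0):
    """Regress motion, the global signal, and slow drifts out of voxel time series
    Y (n_scans x n_voxels) in a single least-squares step; returns the residuals."""
    n = Y.shape[0]
    gs = Y.mean(axis=1, keepdims=True)  # global signal: mean over voxels per scan
    X = np.column_stack([np.ones(n), motion, gs, dct_highpass_basis(n, tr, cutoff)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ beta  # cleaned time series, orthogonal to all nuisance regressors
```

By construction, the residuals are orthogonal to the motion, global signal, and drift regressors, which is what "regressing out" these signals means in practice.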
Whole-brain univariate analysis
Voxel-level univariate BOLD activation analysis was carried out using the general linear model (GLM) method, in which an experimental event (e.g., cue onset) is assumed to evoke a stereotyped hemodynamic response (i.e., the hemodynamic response function). Seven regressors were used to model the experimental events: two for the instructed attention conditions (attend-left and attend-right), two for the choice attention conditions (choose-left and choose-right), one for incorrect trials, and two for stimuli appearing in the left and right visual fields. The cue-evoked activation was obtained within each subject by contrasting the appropriate regressors using a t test for each voxel. The group-level activation map was obtained by performing a one-sample t test on each subject's contrast maps with a threshold of p < 0.05, corrected for multiple comparisons with FDR (Nichols and Hayasaka, 2003).
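The GLM logic above can be illustrated with a minimal Python/NumPy sketch: event onsets are convolved with a canonical double-gamma hemodynamic response function (peak near 6 s, undershoot near 16 s, an assumed parameterization), a design matrix is fit per voxel, and a contrast t statistic is formed. The function names are hypothetical, not the authors' SPM implementation.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Assumed double-gamma HRF sampled every `tr` seconds, normalized to unit sum."""
    t = np.arange(0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

def build_design(onsets_per_condition, n_scans, tr):
    """One stick-function regressor per condition, convolved with the HRF,
    plus an intercept column."""
    h = canonical_hrf(tr)
    X = np.zeros((n_scans, len(onsets_per_condition)))
    for j, onsets in enumerate(onsets_per_condition):
        sticks = np.zeros(n_scans)
        sticks[(np.asarray(onsets) / tr).astype(int)] = 1.0
        X[:, j] = np.convolve(sticks, h)[:n_scans]
    return np.column_stack([X, np.ones(n_scans)])

def glm_contrast_t(Y, X, c):
    """Fit Y = X @ B + E per voxel; return t statistics for contrast vector c."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    mse = (resid ** 2).sum(axis=0) / dof
    var_c = c @ np.linalg.pinv(X.T @ X) @ c  # contrast variance factor
    return (c @ B) / np.sqrt(mse * var_c)
```

With a contrast such as c = [1, −1, 0], this yields the per-voxel cue-left vs cue-right t statistic analogous to the within-subject contrasts described above.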
Estimation of single-trial BOLD activity
The MVPA approach was applied at the single-trial level. All trials with correct responses were used. A trial was considered correct if either no response was made on unattended trials or the spatial frequency of the stimulus was accurately identified on attended trials. To estimate the trial-by-trial cue-evoked BOLD activity, we applied the beta series regression method (Rissman et al., 2004). Specifically, cue-evoked BOLD activity was modeled using a separate regressor for each instructed and choice trial in the GLM, and one additional regressor was used for all the targets to account for target-evoked activity and to remove its possible influence on the estimation of cue-evoked activity. The coefficient of each regressor, referred to as the beta value, reflected the cue-related BOLD response for that trial.
ROI selection
Sixteen retinotopic visual regions from the probabilistic visual topography map of Wang et al. (2015) were used as ROIs. Ordered from low to high in the visual hierarchy (dorsal and ventral pathways combined), they are as follows: V1v, V1d, V2v, V2d, V3v, V3d, hV4, V3A, V3B, LO1, LO2, VO1, VO2, IPS, PHC1, and PHC2.
Multivariate pattern analysis (MVPA)
The MVPA method uses pattern classification algorithms to distinguish between two brain states based on distributed BOLD activity across multiple voxels in a given ROI. In this study, we used a support vector machine (SVM) from the MATLAB Statistics and Machine Learning Toolbox as the classifier for the MVPA analysis. A 10-fold cross-validation approach was applied to minimize the effects of overfitting. The assignment of trials to folds was fully random. The classifier performance was measured in terms of decoding accuracy. Above-chance decoding accuracy is taken to signify the presence of attention-related signals in the ROI; the higher the decoding accuracy, the larger the difference between the two BOLD activation patterns evoked by the two attention conditions (attend-left vs attend-right), which is hypothesized to indicate stronger top–down attentional modulation. The cross-validation analysis was repeated 25 times over different random fold partitions to avoid any bias that may result from grouping the trials into 10 specific folds. The accuracies from the 25 repetitions of 10-fold cross-validation (250 fold-level accuracies in total) were averaged to obtain the decoding accuracy for each subject.
For statistical analysis, the decoding accuracy was compared against the chance-level accuracy of 50% using one-sample t tests at p < 0.05, corrected for multiple comparisons using FDR.
Relating decoding accuracy with behavior
To understand the functional significance of top–down biasing of the visual cortex, subject-level decoding accuracy was correlated with subject-level behavioral performance in target discrimination across subjects. Given that there are 16 ROIs representing all the areas within the retinotopic visual hierarchy, and expecting that their decoding accuracies were likely to covary to a large degree, we performed a principal component analysis (PCA) on the decoding accuracies from the 16 ROIs for the UF and UCD datasets to extract the underlying common variance and correlated (Spearman's rank correlation) the score on the first PCA component with behavior. Then, for completeness, the relationship between decoding accuracy and behavior was also tested for each ROI individually.
The behavioral performance was characterized by the efficiency score (ES), the ratio of accuracy over reaction time (accuracy/RT), with a higher score indicating better behavioral performance. Prior work has used the inverse ES (Bruyer and Brysbaert, 2011), the ratio RT/accuracy, to measure behavioral performance, with a lower score indicating better behavioral performance. We note that because the instruction to the participants was to respond "as fast as they could" while also being "as accurate as they could," the speed–accuracy trade-off may preclude the use of RT or accuracy alone as the appropriate behavioral measure. Computing the ratio of accuracy and RT accounts for the speed–accuracy trade-off while retaining the main task effect (Bruyer and Brysbaert, 2011; Vandierendonck, 2017; Liesefeld and Janczyk, 2019).
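A hedged Python sketch of this brain–behavior analysis: the first principal component of the subjects × ROIs decoding-accuracy matrix is extracted via SVD, the efficiency score is computed as accuracy/RT, and the two are related by Spearman's rank correlation. Note that the sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful in this toy form; all function names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def first_pc_scores(acc_matrix):
    """Subject scores on the first principal component of a subjects x ROIs
    decoding-accuracy matrix (column-centered; PC sign is arbitrary)."""
    X = acc_matrix - acc_matrix.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[0]  # projection onto PC1

def efficiency_score(accuracy, rt):
    """Efficiency score: accuracy / reaction time (higher = better)."""
    return accuracy / rt

def brain_behavior_correlation(acc_matrix, accuracy, rt):
    """Spearman correlation between PC1 scores and subject-level efficiency."""
    es = efficiency_score(np.asarray(accuracy), np.asarray(rt))
    rho, p = spearmanr(first_pc_scores(acc_matrix), es)
    return rho, p
```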
Combining datasets using meta-analysis
Analysis was done first on the UF and UCD datasets separately; we then combined the two datasets through meta-analysis to enhance statistical power. An effect was considered statistically significant if it was significant in the meta-analysis. The meta-analysis was done using the Lipták–Stouffer method (Lipták, 1958). This method has been used commonly to combine results from different datasets (Cheng et al., 2015; Huang and Ding, 2016). It involves converting the p value of each dataset to its corresponding Z value, Z_i = Φ⁻¹(1 − p_i), where Φ⁻¹ denotes the inverse of the standard normal cumulative distribution function; the dataset-level Z values are then combined as Z = Σ_i w_i Z_i / √(Σ_i w_i²), with weights w_i reflecting the relative sample sizes, and the combined Z is converted back to a p value.
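As a minimal sketch, the Lipták–Stouffer combination can be written in Python with SciPy: each one-sided p value is converted to Z_i = Φ⁻¹(1 − p_i), the weighted sum Z = Σ w_i Z_i / √(Σ w_i²) is formed, and the result is converted back to a combined p value. Equal weights (reducing to Stouffer's method) are the default here, since the text does not specify the weighting; √n weights are a common convention.

```python
import numpy as np
from scipy.stats import norm

def liptak_stouffer(p_values, weights=None):
    """Combine one-sided p values across datasets via the Liptak-Stouffer method.

    Z_i = Phi^{-1}(1 - p_i); Z = sum(w_i * Z_i) / sqrt(sum(w_i^2)); return the
    combined one-sided p value. Equal weights by default (Stouffer's method)."""
    p = np.asarray(p_values, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    z = norm.isf(p)                           # Phi^{-1}(1 - p)
    z_comb = (w * z).sum() / np.sqrt((w ** 2).sum())
    return float(norm.sf(z_comb))             # back to a combined p value
```

For example, two datasets each with p = 0.05 combine to p ≈ 0.01, reflecting the gain in statistical power from pooling consistent evidence.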
Results
Behavioral analysis
UF dataset
For instructed trials, target discrimination accuracy for cue-left trials and cue-right trials was 85.72 ± 2.1% and 85.80 ± 1.2%, respectively (Fig. 2A); there was no statistically significant difference between the two conditions (t(12) = −0.03; p = 0.96; d = −0.01). RT for cue-left trials was 918.56 ± 24.49 ms, and RT for cue-right trials was 911.81 ± 23.28 ms (Fig. 2B); there was no statistically significant difference between the two conditions (t(12) = 0.43; p = 0.67; d = 0.07). For choice trials, accuracy was 88.41 ± 1.9% for choose-left and 90.78 ± 1.3% for choose-right trials (Fig. 2A), and the RT was 942.27 ± 47.75 ms for choose-left and 938.25 ± 29.04 ms for choose-right trials (Fig. 2B). There was no statistically significant difference between choose-left and choose-right for either accuracy or RT (tacc(12) = −1.2; pacc = 0.25; dacc = −0.39; tRT(12) = 0.09; pRT = 0.92; dRT = 0.03). A one-way ANOVA further confirmed that there were no significant differences in behavioral performance between instructed and choice trials (Facc(3,12) = 2.07; pacc = 0.11; ηacc2 = 0.1; FRT(3,12) = 0.21; pRT = 0.89; ηRT2 = 0.01; Fig. 2A,B).
Behavioral results from instructed and choice trials. A, C, Comparison of accuracy (ratio of correct trials and total number of valid trials) between different attention conditions for UF (A) and UCD (C) datasets. B, D, Comparison of RT between different attention conditions for UF dataset (B) and UCD dataset (D).
UCD dataset
For instructed trials, response accuracy for cue-left trials and cue-right trials was 82.45 ± 2.3% and 81.97 ± 2.1% (Fig. 2C); there was no statistically significant difference between the two conditions (t(17) = 0.43; p = 0.67; d = 0.05). RT for cue-left trials was 1,027.02 ± 55.60 ms and for cue-right trials 1,041.87 ± 54.95 ms (Fig. 2D); there was no statistically significant difference between the two conditions (t(17) = −0.68; p = 0.50; d = −0.06). For choice trials, the accuracy was 84.07 ± 2.0% for choose-left and 82.60 ± 2.1% for choose-right trials, and the RT was 1,034.42 ± 48.06 ms for choose-left and 1,043.43 ± 45.84 ms for choose-right trials. There was no statistically significant difference between choose-left and choose-right in either accuracy or RT (tacc(17) = 0.98; pacc = 0.34; dacc = 0.16; tRT(17) = −0.36; pRT = 0.72; dRT = −0.04). A one-way ANOVA further confirmed that there were no significant differences in behavioral performance between instructed and choice trials (Facc(3,17) = 0.2; pacc = 0.90; ηacc2 = 0.008; FRT(3,17) = 0.02; pRT = 0.99; ηRT2 = 0.01; Fig. 2C,D).
Univariate analysis of cue-evoked BOLD activation
Two types of univariate analyses were carried out. First, GLM analysis was performed at the whole-brain level. For instructed trials, the attention-directing cues activated the dorsal attention network (DAN; compared with the baseline), including FEF and IPS/superior parietal lobule (SPL), as well as higher-order visual areas such as the superior occipital cortex, for both UF and UCD datasets (Fig. 3A–D), in line with previous studies (Heinze et al., 1994; Hopfinger et al., 2000; Giesbrecht et al., 2003). Contrasting attention-directing cues (cue-left vs cue-right) revealed no activations at p < 0.05 (FDR; data not shown). Some higher-order visual areas, including the lingual and fusiform cortex, began to appear at a more lenient threshold of p < 0.01 (uncorrected; Fig. 3E–H), suggesting that the attended location is only weakly encoded in the magnitude of univariate activation. Similar results were also found for choice trials (data not shown).
Whole-brain GLM analysis of cue-evoked BOLD activity. Attention cues (left and right combined) evoked significant activation in the DAN (FEF, IPS/SPL) and the extrastriate visual cortex (p < 0.05, FDR corrected) for both UF (A and B) and UCD (C and D) datasets. However, contrasting attention-directing cues (cue-left vs cue-right) revealed no activation when thresholded at p < 0.05 with FDR correction (data not shown). Higher-order visual regions began to appear only after lowering the threshold to p < 0.01 uncorrected: cue-left > cue-right (E and F) and cue-right > cue-left (G and H).
Next, to investigate the univariate activity at the ROI level in the retinotopic visual cortex, we subtracted the BOLD data in a hemisphere when it was ipsilateral to the cued direction from that in the same hemisphere when it was contralateral to the cued direction (i.e., cue-left − cue-right for the right hemisphere and cue-right − cue-left for the left hemisphere), and the difference was averaged across the two hemispheres and within each ROI. As shown in Figure 4, A and B, for higher-order visual cortex such as the lateral occipital (LO) cortex and IPS, covert attention elicited significant univariate BOLD activation, and the effects were consistent across the two datasets, in line with previous studies (Heinze et al., 1994; Hopfinger et al., 2000; Giesbrecht et al., 2003). For early visual areas such as V1 and V2, however, the effects were less consistent across the two datasets. A meta-analysis combining the two datasets showed that there were no significant attention-specific univariate neural modulations in early visual areas except V1v (Fig. 4C). To test whether ROI-level univariate activation exhibited a systematically increasing trend as one ascends the visual hierarchy, we applied a linear regression analysis in which univariate activation was regressed on ROI position, with the ROIs numbered from 1 to 16 in hierarchical order. A marginally significant positive slope in the meta-analysis (p = 0.07) suggested the possibility of such a trend. When correlating the univariate activation difference for each ROI with behavioral efficiency (response accuracy/RT), combining the two datasets via meta-analysis, none of the regions in the visual hierarchy exhibited a significant correlation between attention-related univariate BOLD activity and behavior (p > 0.05 FDR; Fig. 4D).
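The hierarchy-trend test used here and in the later MVPA analyses (regressing an ROI-level measure on ROI position 1–16) can be sketched with scipy.stats.linregress; the function name is an illustrative assumption.

```python
import numpy as np
from scipy.stats import linregress

def hierarchy_trend(roi_values):
    """Regress an ROI-level measure (e.g., activation difference or decoding
    accuracy) on ROI position, numbered 1 (lowest) to 16 (highest) in the
    visual hierarchy; return the slope and its p value."""
    positions = np.arange(1, len(roi_values) + 1)
    result = linregress(positions, roi_values)
    return result.slope, result.pvalue
```

A positive slope indicates a measure that grows toward higher-order areas (as the univariate activation tends to here); a negative slope indicates the reverse (as decoding accuracy does below).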
ROI-level analysis of cue-evoked BOLD activity. BOLD activity differences between attention to the contralateral and ipsilateral visual fields were computed and averaged across the hemispheres for instructed (A–D) and choice trials (E–H) for both datasets: UF (A, E) and UCD (B, F). C, G, Activation differences from the UF and UCD datasets were averaged for instructed (C) and choice (G) trials, with the respective p values obtained via meta-analysis. D, H, Correlation between behavior and activation differences across ROIs for instructed (D) and choice (H) trials. Here the correlation coefficients are averaged across the datasets, and the p values from the two datasets were combined using meta-analysis. The error bars denote the standard error of the mean (SEM). *p < 0.05 FDR.
Applying the same analysis to the choice trials, we found similar results (Fig. 4E–H). As with instructed trials, for choice trials, attention modulations of univariate BOLD activity were mainly found in higher-order visual areas, with V1v being the notable exception. While Figure 4, C and G, exhibited similar patterns, a linear regression analysis combining the two datasets did not yield a statistically significant positive slope (p = 0.13). The smaller number of choice trials likely weakened the statistical power. Furthermore, no significant correlations were found between univariate attention-related BOLD activation and behavioral performance (Fig. 4H).
Multivariate analysis of cue-evoked BOLD activation: instructed trials
MVPA was applied to each ROI in each of the two datasets. The decoding accuracy for cue-left versus cue-right was above chance level in all visual regions in both the UF (Fig. 5A) and UCD (Fig. 5B) datasets. A meta-analysis combining the two datasets confirmed these findings, suggesting that top–down biasing signals are present in all areas of the visual hierarchy. In addition, decoding accuracy tended to be higher in lower-order visual areas than in higher-order areas (Fig. 5C,D), in contrast to the pattern of univariate activation seen in Figure 4. A linear regression analysis treating decoding accuracy for each ROI as a function of ROI position in the hierarchy yielded a significantly negative slope for both datasets, with a meta-analytic p value of 4.7 × 10−7. To further demonstrate the consistency of the results between the two datasets, for each ROI, we plotted the decoding accuracy averaged over all participants from the UF dataset against that from the UCD dataset (Fig. 5E). The Spearman rank correlation across all 16 ROIs was highly significant (r = 0.75, p = 0.001), suggesting a high degree of similarity in the distribution of decoding accuracy across visual cortical regions between the two datasets.
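The ROI-wise decoding and between-dataset consistency check can be sketched as follows. A leave-one-out nearest-centroid classifier stands in here for the SVM classifier used in the actual analysis, and all data are synthetic:

```python
import numpy as np
from scipy.stats import spearmanr

def decode_loo(patterns, labels):
    """Leave-one-out decoding accuracy for cue-left vs cue-right.

    patterns: (n_trials, n_voxels) multivoxel activity for one ROI;
    labels: array of 0/1 cue directions. A nearest-centroid rule stands in
    for the SVM classifier used in the paper.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        train = np.ones(n, dtype=bool)
        train[i] = False
        c0 = patterns[train & (labels == 0)].mean(axis=0)
        c1 = patterns[train & (labels == 1)].mean(axis=0)
        pred = int(np.linalg.norm(patterns[i] - c1) < np.linalg.norm(patterns[i] - c0))
        correct += int(pred == labels[i])
    return correct / n

# Synthetic ROI: 40 trials x 50 voxels with an additive cue-specific pattern
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 20)
pattern_vec = rng.normal(size=50)
patterns = rng.normal(size=(40, 50)) + labels[:, None] * pattern_vec
acc = decode_loo(patterns, labels)

# Between-dataset consistency: Spearman correlation of per-ROI accuracies
acc_uf = rng.uniform(0.55, 0.80, 16)          # illustrative UF accuracies
acc_ucd = acc_uf + rng.normal(0.0, 0.03, 16)  # illustrative UCD accuracies
rho, p = spearmanr(acc_uf, acc_ucd)
```

The key point is that decoding succeeds whenever the two conditions evoke distinct multivoxel patterns, even if the mean activity within the ROI is identical across conditions.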
MVPA of cue-left versus cue-right: instructed trials. A, B, Decoding accuracies in different visual ROIs for UF dataset (A) and UCD dataset (B). C, D, Posterior view of color-coded decoding accuracies for UF dataset (C) and UCD dataset (D). E, Scatterplot comparing UCD versus UF decoding accuracies where each dot represents a visual ROI. The error bars denote the SEM. *p < 0.05 FDR.
The decoding accuracy reflects the degree of distinctness of the neural activity in the two attention conditions (cue-left vs cue-right): the higher the decoding accuracy, the larger the difference between the neural patterns underlying the two conditions, which we took to indicate stronger attention modulation. We thus used decoding accuracy as a quantitative index of the strength of top–down biasing to evaluate the relationship between top–down attention control and behavioral performance. Because decoding accuracies from different ROIs were highly correlated across participants (data not shown), indicating that their variance can be captured by fewer latent variables, a PCA was performed to reduce the dimensionality of the 16 ROI-based decoding accuracies. The first principal component explained 79.75% and 83.49% of the variance for the UF and UCD datasets, respectively (Fig. 6A,B). Thus, each subject's score on the first principal component was used to represent the magnitude of decoding accuracy for that subject and was correlated with the z-scored behavioral efficiency (response accuracy/RT). A significant positive correlation between decoding accuracy and behavioral performance was found for both datasets [UF dataset, r = 0.5824, p = 0.0403 (Fig. 6C); UCD dataset, r = 0.6780, p = 0.0026 (Fig. 6D)]. Combining the two datasets using meta-analysis showed that the overall correlation effect was robust and highly significant (p = 0.0005). At the ROI level, decoding accuracy was significantly correlated with behavior across all retinotopic regions within the visual hierarchy; see Figure 6E for ROI-level correlation coefficients averaged across the two datasets (meta-analysis p < 0.05 FDR). There were no clear differences in the brain–behavior relationship between lower-order and higher-order visual areas.
These findings suggest that the stronger the top–down biasing signals in the retinotopic visual regions, indexed by higher decoding accuracy, the better the behavioral performance (Stokes et al., 2009).
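The dimensionality-reduction step and the brain–behavior correlation can be sketched as follows (PCA computed via SVD; all numbers are synthetic, and because the sign of a principal component is arbitrary, only the magnitude of the correlation is meaningful in this toy example):

```python
import numpy as np
from scipy.stats import pearsonr, zscore

def first_pc(acc_matrix):
    """First principal component of a subjects x ROIs decoding-accuracy matrix.

    Returns per-subject scores on the first component and the fraction of
    variance it explains (PCA via SVD of the mean-centered matrix).
    """
    centered = acc_matrix - acc_matrix.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    return centered @ vt[0], explained[0]

# Synthetic data: 20 subjects x 16 ROIs, accuracies driven by one latent factor
rng = np.random.default_rng(2)
latent = rng.normal(size=20)
acc_matrix = 0.65 + 0.05 * latent[:, None] + 0.01 * rng.normal(size=(20, 16))
scores, var_explained = first_pc(acc_matrix)

# Correlate component scores with z-scored behavioral efficiency (accuracy/RT)
efficiency = zscore(latent + 0.3 * rng.normal(size=20))
r, p = pearsonr(scores, efficiency)
```

Because one latent factor drives all 16 ROI accuracies in this toy example, the first component captures most of the variance, mirroring the ~80% figures reported for the real datasets.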
Decoding accuracy versus behavior: instructed trials. A, B, The percentage of variance explained by the principal components of decoding accuracies across ROIs for (A) UF and (B) UCD datasets. C, D, Decoding accuracy represented by score on first PCA component versus behavioral efficiency (z score) for (C) UF and (D) UCD datasets. E, Correlation between decoding accuracy and behavior across ROIs. Here the correlation coefficients are averaged across the two datasets, and the p values from both the datasets were combined using meta-analysis; the correlation is significant for all ROIs at p < 0.05 FDR.
Multivariate analysis of cue-evoked BOLD activation: choice trials
In instructed trials, the two attentional states (attend-left vs attend-right) were the result of two different symbolic instruction cues. The decoding analysis in the visual ROIs, especially the early visual ROIs, could therefore be influenced by the physical differences between the two symbolic cues. The inclusion of choice trials, where the two attentional states resulted from the same cue (choice cue), helped to address this problem. As shown in Figure 7A, where the two datasets were combined, decoding choose-left versus choose-right yielded accuracies significantly above chance level in all ROIs (meta-analysis p < 0.015 FDR). It is worth noting that because we had half as many choice trials as instructed trials (cue-left, cue-right, and choice cues each accounted for one-third of the trials), there was less statistical power to detect differences between attention conditions in choice trials. Figure 7B shows that the distribution of decoding accuracy across the ROIs was consistent between the two datasets (compare Fig. 5E). Across the visual hierarchy, similar to the instructed trials, the linear regression analysis combining the two datasets yielded a significantly negative slope (meta-analysis p = 0.023), showing that decoding accuracy decreases as we ascend the visual hierarchy.
Decoding accuracy for attend-left versus attend-right for choice trials. A, Decoding accuracies averaged from the UF and UCD datasets for different visual ROIs; p values were determined via meta-analysis. B, Scatterplot comparing UCD versus UF decoding accuracies, where each dot represents a visual ROI. C, D, Decoding accuracy versus behavioral efficiency for the UF (C) and UCD (D) datasets. E, Correlation between decoding accuracy and behavior across retinotopic ROIs. The error bars denote the SEM. *p < 0.05 FDR.
Next, we tested whether the decoding accuracies predict behavioral performance. The first principal component of the 16 ROI decoding accuracies explained 73% and 66% of the variance for the UF and UCD datasets, respectively. Correlating the score on the first principal component with behavioral efficiency (response accuracy/RT), we found a significant positive correlation for the UCD dataset (r = 0.5004; p = 0.0363; Fig. 7D) and a positive but not statistically significant correlation for the UF dataset (r = 0.3242; p = 0.2799; Fig. 7C). When the two datasets were combined, the meta-analytic p value was 0.0361, indicating a significant positive correlation. This decoding accuracy–behavior result from the choice trials agrees with that from the instructed trials: the more distinct the BOLD patterns underlying the two attention conditions in the retinotopic regions, the stronger the top–down biasing signals in these regions, and the better the behavioral performance. Across visual ROIs, the decoding accuracy–behavior correlation was significant in the higher-order areas VO1 and LO2 (Fig. 7E). For the other ROIs, positive correlations were seen but did not survive the statistical threshold of p < 0.05 (FDR). Here, as indicated earlier, the smaller number of choice trials likely weakened the statistical power to detect the brain–behavior relationship in some ROIs.
Comparing instructed and choice trials
The neural patterns underlying the two attentional states (attend-left vs attend-right), whether the result of instruction cues or of choice, were expected to share commonalities. Three analyses were carried out combining the two datasets to compare instructed and choice trials. First, the instructed and choice decoding accuracies (averaged across the two datasets) were compared in Figure 8A, and a significant correlation was found (r = 0.57, p = 0.02), suggesting that each ROI was similarly modulated by top–down attention whether the attentional states were formed from instructional cues or by purely internal decisions of the subjects. Second, for each dataset, we utilized the method of cross-decoding, taking the SVM classifier trained on instructed trials and testing it on choice trials. We found that the cross-decoding accuracies averaged across the two datasets (Fig. 8B) were significantly above chance level in all the ROIs (meta-analysis p < 0.05 FDR), suggesting that the patterns of neural activity resulting from instructional cues and from internal choices reflected attention modulations and were not significantly influenced by the physical properties of the cues. Third, cross-decoding accuracy (model trained on instructed trials and tested on choice trials) and self-decoding accuracy (model trained on choice trials and tested on choice trials) were averaged across the two datasets and compared, and the differences were not statistically significant (meta-analysis p > 0.05 FDR; Fig. 8C), further confirming that the pattern of neural activity observed in the retinotopic visual cortex reflects top–down attentional biasing signals, and not merely the differences in the stimulus properties of the two cues in instructed trials.
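The cross-decoding logic (train on instructed trials, test on choice trials) can be sketched as follows. As before, a nearest-centroid rule stands in for the SVM classifier actually used, and the data are synthetic, with the same attend-left/attend-right pattern shared by both trial types:

```python
import numpy as np

def cross_decode(train_X, train_y, test_X, test_y):
    """Train a centroid classifier on one trial type, test on another.

    train_X/test_X: (n_trials, n_voxels) multivoxel patterns;
    train_y/test_y: arrays of 0/1 attention-direction labels.
    """
    c0 = train_X[train_y == 0].mean(axis=0)
    c1 = train_X[train_y == 1].mean(axis=0)
    d0 = np.linalg.norm(test_X - c0, axis=1)
    d1 = np.linalg.norm(test_X - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == test_y).mean()

# Shared cue-specific pattern across instructed and choice trials
rng = np.random.default_rng(3)
pattern_vec = rng.normal(size=50)
y_instr = np.repeat([0, 1], 30)    # instructed trials
y_choice = np.repeat([0, 1], 15)   # half as many choice trials
X_instr = rng.normal(size=(60, 50)) + y_instr[:, None] * pattern_vec
X_choice = rng.normal(size=(30, 50)) + y_choice[:, None] * pattern_vec
acc_cross = cross_decode(X_instr, y_instr, X_choice, y_choice)
```

Above-chance transfer in this setting requires that the two trial types share an attention-related pattern, which is the logic used to rule out cue-specific physical confounds.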
Comparison between choice trials and instructed trials (UF and UCD datasets combined). A, Choice-left versus choice-right decoding accuracies plotted against cue-left versus cue-right decoding accuracies; each dot represents a visual ROI. B, Cross-decoding accuracy comparison across ROIs where classifiers were trained on instructed trials and tested on choice trials. C, Difference between cross-decoding accuracy (classifiers trained on instructed trials and tested on choice trials) and self-decoding accuracy (classifiers trained on choice trials and tested on choice trials) for different ROIs. The error bars denote the SEM. *p < 0.05 FDR.
Discussion
Visual–spatial attention—the cognitive ability to focus on certain locations while ignoring others—can be driven either reflexively, in a bottom–up manner, by salient stimuli, or voluntarily, in a top–down manner, based on behavioral goals. Understanding the neural mechanisms underlying voluntary attention control has long been a challenge in cognitive neuroscience. Voluntary attention can be studied using cueing paradigms, where an informative cue directs attention to a spatial location in anticipation of an upcoming target stimulus. During the anticipatory cue-to-target period, attention control is actively engaged and can be investigated independently from later sensory processes triggered by the target stimulus.
Motivation, approach, and summary of results
Motivation and approach
We considered two questions in the control of visual spatial attention. First, what is the distribution of top–down biasing signals across different identified visual areas within the visual cortical hierarchy during spatial attention? More specifically, are early visual areas such as V1 more weakly biased during anticipatory spatial attention, as has often been reported (O'Connor et al., 2002)? Second, what is the functional role of top–down control signals? Most models hold that they result in neuronal biasing that facilitates subsequent selective target processing (Chawla et al., 1999). If so, is the magnitude of the observed biasing signals during anticipatory attention predictive of behavioral performance? The answer to this question has remained unclear in the literature (Fannon et al., 2007; Zheng et al., 2024). To answer these questions, we analyzed fMRI data from two different studies in which subjects performed a trial-by-trial visual spatial attention task. Two different methods of triggering the orienting of spatial attention were included: one involved the use of instructional cues to direct the subject's attention to the left or the right visual field, and the other involved the use of a choice cue that prompted the subject to make voluntary decisions on a trial-by-trial basis about whether to attend the left (choose-left) or the right (choose-right) visual field (Bengson et al., 2014, 2015). Applying both univariate and multivariate techniques to analyze cue-related neural activity (i.e., baseline shifts) in the retinotopic visual cortex, we obtained the following results.
Summary of results
Among the 16 retinotopic visual ROIs, differential attend-versus-ignore univariate BOLD activations were found in higher-order visual areas but only weakly or not at all in lower-order visual areas (with V1v being the exception). These differential univariate activations in the visual cortex during the anticipatory period failed to predict task performance. In sharp contrast, multivoxel patterns differentiated the two attention states (attend-left vs attend-right) in all 16 ROIs, with lower-order ROIs (e.g., V1) having higher decoding accuracy than higher-order visual ROIs. The more distinct the attend-left versus attend-right multivariate neural patterns (i.e., the higher the decoding accuracy) in the visual cortex, the better the subject's behavioral performance as measured by behavioral efficiency (accuracy/RT) (Townsend and Ashby, 1983; Liesefeld and Janczyk, 2019). The neural patterns associated with instructional cues and choice prompts were similar and both significant, but decoding accuracy was generally lower for the latter, likely because the choice condition contained only half as many trials.
Attentional modulation of baseline activity in the visual cortex
Prior results from univariate analysis
Attention cueing improves task performance (Posner et al., 1980). The underlying neural mechanisms have been the subject of intense investigation. Kastner et al. (1999) cued subjects on a block-by-block basis to covertly attend to stimuli located in the upper-right quadrant of the visual field or to passively view the stimuli. They reported an increase in BOLD signal during the anticipatory period after each block was cued but before targets were presented. The effect was observed in V1, V2, V4, temporal–occipital regions (fusiform gyrus), and regions within the DAN consisting of the FEF and IPS, with the BOLD activation being stronger in higher-order visual regions. Despite the seminal nature of this work, the cued location was not varied in a fashion that could equate attention demands toward and away from the attended region, so it is difficult to know whether the patterns of baseline activity encoded spatially specific attended information or instead reflected nonspecific effects such as arousal or differences in the eccentricity of the focus of attention (i.e., fixation vs peripheral). Hopfinger et al. (2000) contrasted the background biasing in the visual cortex induced by attend-left versus attend-right cues—thereby equating the attention demands—and reported significant attention-related activations in higher-order visual areas in the fusiform and cuneus gyri (corresponding to V2, VP, and V4) in the hemisphere contralateral to the attended hemifield, but not in V1. Sylvester et al. (2007, 2008, 2009) and Bressler et al. (2008), on the other hand, reported attention modulation in multiple visual areas, including V1, during anticipatory attention.
Our univariate analysis, combining both UF and UCD datasets through meta-analysis, showed baseline increases in BOLD activity during anticipatory attention in high-order visual areas such as LO and parietal regions, but not consistently in the early retinotopic visual cortex, which is in agreement with the results of some prior studies (Hopfinger et al., 2000; O'Connor et al., 2002).
Overcoming the limitations of the univariate analysis
In voxel-based univariate analysis (e.g., whole-brain GLM analysis), for a voxel to be reported as activated by an experimental condition, it needs to be consistently activated across individuals. As such, individual differences in voxel activation patterns could lead to a failure to detect the presence of neural activity in a given brain region. The univariate approach applied at the ROI level, on the other hand, may not have the spatial resolution to detect attention modulation occurring at a sub-ROI level. Regarding the latter, Silver et al. (2007) found that cue-related activity is enhanced in a subregion of V1 corresponding to the attended stimulus and, more importantly, that this enhanced region is surrounded by regions of deactivation, suggesting the formation of a neural pattern in which both excitatory and inhibitory responses coexist within the same ROI (Smith et al., 2000; Muller and Kleinschmidt, 2004). While such attention modulation patterns present a challenge for univariate analysis, they open the possibility for the fruitful application of MVPA. Hypothesizing that attending to different spatial locations leads to different neural patterns, we applied the MVPA approach to the fMRI data from the cue–target period. Our results showed above-chance decoding in all retinotopic visual areas. Decoding accuracy was higher in earlier visual areas and decreased going up the visual hierarchy. This shows that top–down biasing signals influence baseline neural activity in the early visual cortex during anticipatory spatial attention and that this effect is no weaker than that in higher-order visual areas.
Relation to the existing literature
It is tempting to attribute our findings to the use of sensitive multivariate analysis methods, but multivariate analysis was also used by Park and Serences (2022). They also found differences in multivoxel patterns evoked by different attention-directing cues in several retinotopic extrastriate visual areas, but as in prior univariate studies, the effects during the anticipatory period were weaker in V1 than in the extrastriate cortex. It is not clear why our decoding results and those of Park and Serences (2022) should differ, but we offer some thoughts. First, differences in the spatial scale of the target arrays may have been a factor. Hopf et al. (2006) argued that the spatial scale of the stimulus array (e.g., larger vs smaller) influences whether selection occurs early (small scale) or late (large scale) in the visual hierarchy. Although both our study and that of Park and Serences used spatial gratings as target stimuli, the stimulus array in Park and Serences (2022) was spread around the outline of an imaginary circle, with stimuli appearing at 12 possible locations, which may have effectively increased the spatial scale of the array. In our study, the stimuli could appear at only two locations in the lower visual fields, so the spatial scale of our stimulus array was smaller. A smaller spatial scale would favor earlier selection, which might explain why we observed robust decoding of attention conditions in V1. Second, during the cue–target interval in their design, Park and Serences presented a flickering noise stimulus; this bottom–up signal could have interfered with top–down activity in V1 (Uchiyama et al., 2021), resulting in a weaker top–down signal. Future work will be required to resolve these differences between our work and that of Park and Serences.
Behavioral significance of baseline modulation in the visual cortex
Brief literature review
Do the biasing signals in the visual cortex impact behavior? This question has been addressed in previous research. Ress et al. (2000), using logistic regression modeling to relate relative BOLD activity in visual regions to behavior (the signal detection measure d′), observed that activity in V1–V3 was predictive of behavioral performance. Sylvester et al. (2007) found that prestimulus neural activity in visual area V3A, indexed by subtracting ipsilateral-hemisphere BOLD activity (left hemisphere for attend-left; right hemisphere for attend-right) from contralateral activity (right hemisphere for attend-left; left hemisphere for attend-right), was higher for trials with correct responses than for incorrect trials, again suggesting a relationship between cue-evoked cortical activity and behavior. Stokes et al. (2009), applying a multivariate decoding approach to a cued attention experiment, found enhanced shape-specific baseline (prestimulus) neural activity in LO regions and reported that both cue- and stimulus-evoked decoding predicted target detection accuracy. However, in prior work from our group, Fannon et al. (2007), using a cued motion-color attention task, found that the increase in cue-related neural activity in the visual cortex (MT and V8) was not predictive of poststimulus neural activity or subsequent behavioral performance. Similarly, McMains et al. (2007) reported dissociations between baseline activity and target-evoked activity. Thus, despite a growing number of studies reporting associations between behavior and baseline neural activity, whether cue-related neural activity impacts behavior remains an important issue to be better understood (Zheng et al., 2024).
Our findings
We examined this issue using both univariate and MVPA approaches. We hypothesized that the differential activations from the univariate analysis and the decoding accuracy from MVPA (a measure of the strength of anticipatory attentional biasing) should be positively associated with behavioral performance. Our comparison of the univariate activation difference (attend minus ignore) with behavioral efficiency (accuracy/RT), combining the UF and UCD datasets through meta-analysis, failed to find a relationship between univariate BOLD activity in the retinotopic visual regions and behavioral performance. In contrast, the decoding accuracy from all the visual regions—whether taken as a whole, as indexed by the first principal component, or analyzed on an ROI-by-ROI basis—was positively associated with behavioral efficiency, thus establishing a positive relationship between cue-related baseline activity in the visual cortex and subsequent behavioral performance.
Study design considerations
Three aspects of our study design are worth emphasizing. The first is that we included two datasets recorded at two different institutions using two different brands of scanners but with the same experimental paradigm in our analysis. By doing so, we were able to combine the datasets via meta-analysis to enhance statistical power, thus addressing a concern that neuroimaging studies often are underpowered and suffer from low reproducibility (Bennett and Miller, 2010; Noble et al., 2017; Poldrack et al., 2017).
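The combination of p values across the two datasets can be illustrated with SciPy's `combine_pvalues`. The numbers below are illustrative, and the choice of Stouffer's method here is an assumption for demonstration (Fisher's method is another common option; the specific method used is described in Materials and Methods):

```python
from scipy.stats import combine_pvalues

# Illustrative per-dataset p values for one effect of interest
p_uf, p_ucd = 0.040, 0.013

# Stouffer's Z-score method; combine_pvalues also supports method="fisher"
stat, p_combined = combine_pvalues([p_uf, p_ucd], method="stouffer")
```

When two independent datasets show the same directional effect, the combined p value is smaller than either alone, which is how the meta-analysis enhances statistical power.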
The second is that, in our experimental paradigm, we included a choice cue in addition to standard instructional cues. Upon receiving the randomly occurring choice cue intermixed with instructional cues, the subject had to decide ad libitum whether to attend the left or right visual field. While our paradigm was originally designed to investigate willed attention (Bengson et al., 2014; Rajan et al., 2019), the availability of the choice cue allowed us to rule out the possibility that the above-chance decoding of cue-evoked activity in the early visual cortex simply reflected the processing of the different physical attributes of the different visual cues (Carlson and Wardle, 2015; Grootswagers et al., 2017) and thus to rule in that it reflected the differences in neural patterns underlying the two different attentional states (attend-left vs attend-right). Importantly, the neural patterns between cue-left and cue-right and between choice-left and choice-right were highly similar, as indexed by the above-chance cross-decoding in which classifiers trained on instructed trials were tested on choice trials, suggesting that visual cortical biasing follows the same neural mechanisms irrespective of whether the attentional state arose from external instructions or from purely internal decisions. It is also worth noting, though, that because the number of choice trials was half that of the instructed trials, the effects from the choice trials were often weaker, whether in the magnitude of decoding accuracy across the visual hierarchy or in the correlation between decoding accuracy and behavior.
The third aspect of the study design worth considering is the stimulus array. As shown in Figure 1 and described in Materials and Methods, small, single white dots (placeholders) marked the general locations where targets would be presented. While these were stationary (i.e., not flashing or moving), shifting attention to the dots might result in attention-related modulations of sensory signals from the contrast border represented by the dots, rather than baseline shifts of background neural activity in the complete absence of sensory signals. Microsaccades could also cause the dot markers to serve as bottom–up sensory inputs due to movement of the dot stimuli on the retina. Because the dots were present in both the attended and unattended visual hemifields, one might be tempted to assume that the physical sensory features of the display were equivalent for the cue/choose-left and cue/choose-right conditions. While this is true for the display independent of attention, it is not necessarily true given the well known interaction of top–down spatial attention and bottom–up sensory signals. That is, one might predict a pattern of BOLD effects similar to what we observed if the activity reflected attentional modulation of the dot stimuli themselves. There are reasons, however, to believe that the current findings and interpretations of baseline activity are valid. First, we saw no consistent increase in univariate baseline activity in V1d, which would have been expected if the lower visual field dots were in fact being modulated by attention. Second, the dots and the flashed gratings are quite different in their physical features, and it is the task-relevant grating stimuli that are the targets of attention in our design.
Third, even if one were to assume that the dots provided a bottom–up signal that was differentially modulated by attention and therefore could be confounded with pure baseline BOLD shifts in this design, that would not invalidate the basic finding here that the magnitude of top–down attention during the anticipatory period correlates with later behavioral performance for task-relevant target stimuli. Furthermore, our univariate BOLD results are generally consistent with findings from prior work that did (Hopfinger et al., 2000; Serences et al., 2004) and did not (Weissman et al., 2002; Giesbrecht et al., 2006; McMains et al., 2007; Walsh et al., 2011) include background markers of spatial locations, in which higher-order visual regions contralateral to the attended location were more activated following the attention-directing cue. Finally, our multivariate results differ from those of Park and Serences (2022), who also had highly salient placeholder stimuli (outline circles) throughout each trial; because Park and Serences actually found weaker decoding in early visual areas despite the presence of their placeholders, it is unlikely that our strong decoding in lower-order visual areas is simply due to the presence of placeholder stimuli.
Conclusions
In this study, we used multivariate analysis of fMRI data recorded during a trial-by-trial visual spatial attention task to examine how top–down spatial attention control modulates anticipatory activity across the visual hierarchy. We found clear evidence for strong attentional biasing during the anticipatory period in all visual areas, including the primary visual cortex. Importantly, this effect was observed regardless of whether spatial attention was directed by external cues (instructed attention) or based on the participants' internal decisions (choice attention). These results contrast with previous studies that employed univariate methods, where baseline biasing during the anticipatory period was typically found to be stronger in higher-order visual areas, findings we also replicated here using univariate methods. One possible explanation for this discrepancy is that multivariate decoding methods are more sensitive, enabling the detection of neural activity patterns that might not be apparent with univariate approaches.
Importantly, we observed a significant positive relationship between decoding accuracy during the anticipatory period and subsequent behavioral performance on the target discrimination task. In other words, the stronger the decoding accuracy in the anticipatory period, the better participants performed in the task. Overall, these findings provide new insights into how top–down attentional control biases visual processing across the visual hierarchy and highlight the important relationship between anticipatory brain activity and human perception and performance.
Footnotes
The authors declare no competing financial interests.
This work was supported by National Institutes of Health Grant MH117991 and National Science Foundation Grant BCS-2318886 to G.R.M. and M.D. We thank E. Ferrer, Y. Liu, J. Bengson, T. Kelley, K. Bo, Z. Hu, and S. Kim for their discussion and help with data collection and analysis.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.