Moment-to-Moment Fluctuations in Neuronal Excitability Bias Subjective Perception Rather than Strategic Decision-Making

Abstract Perceiving an external stimulus depends not only on the physical features of the stimulus, but also fundamentally on the current state of neuronal excitability, indexed by the power of ongoing alpha-band and beta-band oscillations (8–30 Hz). Recent studies suggest that heightened excitability does not improve perceptual precision, but biases observers to report the presence of a stimulus regardless of its physical presence. It is unknown whether this bias is due to changes in observers’ subjective perceptual experience (perceptual bias) or their perception-independent decision-making strategy (decision bias). We tested these alternative interpretations in an EEG experiment in which male and female human participants performed two-interval forced choice (2IFC) detection and discrimination. According to signal detection theory, perceptual bias only affects 2IFC detection, but not discrimination, while interval decision bias should be task independent. We found that correct detection was more likely when excitability before the stimulus-present interval exceeded that before the stimulus-absent interval (i.e., 8–17 Hz power was weaker before the stimulus-present interval), consistent with an effect of excitability on perceptual bias. By contrast, discrimination accuracy was unaffected by excitability fluctuations between intervals, ruling out an effect on interval decision bias. We conclude that the current state of neuronal excitability biases the perceptual experience itself, rather than the decision process.


Introduction
Ongoing neuronal activity just preceding, or in the absence of, experimental events is ubiquitous in electro-physiological recordings in the form of "spontaneous" or "prestimulus" oscillations. Two prominent types of such spontaneous activity are the ␣ and ␤ rhythm (8 -30 Hz),

Significance Statement
The current state of neuronal excitability, indexed by the power of ongoing low-frequency oscillations , has a strong influence on perception. However, the specific mechanism underlying this influence is a continuing subject of debate in neuroscience. Previous research showed that states of heightened excitability make observers report the presence of a sensory stimulus even when none is present. Heightened excitability may therefore indicate a state of strategic decision-making (i.e., observers prefer to report "Yes, I saw the stimulus") or a state of amplified subjective perception (i.e., observers experience a stimulus even when none is present). Here, we tested these alternative interpretations and found evidence that fluctuations in neuronal excitability bias the perceptual experience itself, rather than the decision strategy.
How do spontaneous neural oscillations interact with the processing of sensory events? Numerous studies have demonstrated that observers are more likely to detect visual targets that are preceded by weak prestimulus low-frequency power (8 -30 Hz), reflecting stronger neuronal excitability (e.g., Ergenoglu et al., 2004;Chaumon and Busch, 2014). But does strong excitability help observers distinguish more accurately between target presence and absence? Or does excitability simply make observers more likely to report the presence of a target, regardless of its physical presence? Recent studies demonstrated that in visual detection tasks, strong excitability increases the hit rate in target-present trials as well as the false alarm rate in target-absent trials (Limbach and Corballis, 2016;Iemi et al., 2017). Moreover, recent studies found that this effect is specific to the detection of target presence versus absence, while the accuracy of the discrimination between two alternative target types is unaffected by excitability (Iemi et al., 2017;Samaha et al., 2017b). In sum, these findings indicate that, contrary to the previously dominant view in the literature (Romei et al., 2008;van Dijk et al., 2008;Payne and Sekuler, 2014), heightened excitability does not lead to an increased perceptual precision, but to a more liberal detection bias.
These findings could be regarded as evidence refuting an effect of excitability on perception proper, showing instead an effect on observers' strategic decision bias: a deliberate preference to report "yes, I saw the target" in both target-present and target-absent trials. Accordingly, false alarms induced by a decision bias during states of strong excitability are due to a shift in decision strategy. However, not every change in bias implies a change in deliberate decision strategy (Witt et al., 2015); alternatively, excitability might modulate perceptual bias, which is a change in the amplification of the neural representation of both target and nontarget stimuli. In tasks requiring the detection of target presence versus absence, this process boosts the perception of target presence in both target-present and target-absent trials. Accordingly, false alarms induced by perceptual bias during states of strong excitability are due to genuine, albeit false, impressions of seeing a target. Evidently, decision bias and perceptual bias each lead to very different theoretical interpretations of how spontaneous brain activity is related to perception and behavior.
The present study was conducted to determine whether spontaneous fluctuations in prestimulus lowfrequency power (i.e., 5-30 Hz, a proxy for neuronal excitability) affect precision, perceptual bias, or decision bias. To this end, we used a two-interval forced choice (2IFC) paradigm, in which each trial includes a target and a nontarget interval. In 2IFC detection, a stimulus is presented in the target interval and no stimulus is presented in the nontarget interval; and observers report the interval comprising the stimulus. In 2IFC discrimination, a target stimulus (e.g., a left-tilted grating) is presented in one interval and a nontarget stimulus (e.g., a right-tilted grating) in the other interval, and observers report the interval comprising the target.
We tested three models that represent alternative hypotheses on how weak prestimulus low-frequency power, reflecting strong neuronal excitability, influences performance (for details, see Materials and Methods). If greater excitability improved perceptual precision, performance in both 2IFC detection and discrimination should be most accurate when power in both intervals is weak. If greater excitability leads to a more liberal perceptual bias, performance in 2IFC detection should be most accurate when power is weak in the target interval (enhancing the correct impression of stimulus presence), but strong in the nontarget interval (inhibiting the false impression of stimulus presence), implying that the excitability during the stimulus-present interval exceeds the excitability during the stimulus-absent interval. Since this effect reflects a bias in the perception of stimulus presence versus absence, it should be specific to the detection task. If greater excitability leads to a more liberal decision bias, performance in both 2IFC detection and 2IFC discrimination should be most accurate when power is weak in the target interval compared with the nontarget interval, indicating a strategic tendency to report the interval with the weakest power.
The results confirmed the predictions of the perceptual bias model, implying that spontaneous fluctuations in excitability, indexed by ␣ and ␤ power, bias subjective perceptual experience rather than strategic decisionmaking or precision.

Participants
Previous studies on the relationship between neuronal excitability and perception (Busch et al., 2009;Lange et al., 2013;Chaumon and Busch, 2014;Iemi et al., 2017) have typically reported samples of 12-33 participants. To ensure a robust estimate of our neurophysiological effect, we recruited a sample of 25 participants (mean age, 29.3 years; SEM, 0.75 years; 16 females, 2 left handed). All participants had normal or corrected-to-normal vision and no history of neurological disorders. Each participant took part in two sessions, one for each task, on 2 separate days within a 7-days period. One participant was excluded before EEG preprocessing and behavioral analysis, because she could not participate in the second experiment. Two participants were excluded after EEG preprocessing because of excessive artifacts. 22 participants were included in the analysis. Before the experiment, written informed consent was obtained from all participants. All experimental procedures were approved by the ethics committee of the German Psychological Society.

Stimuli
The experiment was written in MATLAB (RRID: SCR_001622) using the Psychophysics toolbox 3 (RRID: SCR_002881; Brainard, 1997;Pelli, 1997). Stimuli were presented on a black background, using a gammalinearized cathode ray tube monitor operated at 100 Hz and situated in a dark room. Low-contrast Gabor patches tilted by 10 X clockwise or counterclockwise from the vertical meridian with a diameter of 0.75 X visual angle were displayed at 10 X to the left or to the right (counterbalanced across conditions) of the fixation dot.
Each trial included two successive intervals, separated by a 2 s gap. Each interval lasted two frames (0.02 s) and was indicated by a 50% reduction in the diameter of the fixation point (Fig. 1). A Gabor stimulus was presented in one or both intervals for a duration of two frames (0.02 s). After a delay of 0.4 s following the second interval, the fixation dot turned into a question mark, which instructed participants to deliver a response with their dominant hand via button pressing, in accordance with the task instructions. After the button press, the question mark disappeared and participants received color-coded feedback: correct/incorrect responses were indicated by a green/red fixation dot, lasting 0.2 s. After the feedback, the fixation dot turned white and a new trial started. For each participant and task, an adaptive staircase procedure (QUEST; Watson and Pelli, 1983) was used to find the stimulus contrast yielding 75% accuracy. To ensure that the analysis included only trials of similar contrast, we rejected outlier trials in which the difference between the presented contrast value and the final threshold estimated by QUEST exceeded Ϯ10%. Using this procedure, we ensured that the 2IFC detection and discrimination tasks were equally difficult.

Experimental design
Participants performed a 2IFC detection and a 2IFC discrimination task in two separate sessions. Task order was counterbalanced across participants. Each session lasted ϳ1.5 h with breaks and included 700 trials divided into 14 blocks of 50 trials each. For both tasks, each trial included a target interval and a nontarget interval. The order of target and nontarget intervals within a trial was counterbalanced and randomized across trials, such that half of the trials were target-first (T1) and nontarget-first (nT1), respectively. In the 2IFC detection task, a Gabor stimulus was presented in the target interval and a blank screen was presented in the nontarget interval (Fig. 1a). Participants were informed that each trial contained a stimulus, which could appear in either interval, and were instructed to report in which interval they perceived the stimulus ("first" vs "second").
In the 2IFC discrimination task, each of the two intervals contained a Gabor stimulus: a left-tilted Gabor appeared in the target interval and a right-tilted Gabor appeared in the nontarget interval (Fig. 1c). Participants were informed that each trial contained two stimuli characterized by different tilts and were instructed to report in which interval they perceived the left-tilted target ("first" vs "second").

EEG recording and preprocessing
EEG was recorded with a 64-channel Biosemi Activ-eTwo system at a sampling rate of 1024 Hz. Electrodes were placed according to the international 10-10 system (electrode locations can be found on the Biosemi website: https://www.biosemi.com/download/Cap_coords_all.xls). The horizontal and vertical electro-oculograms were recorded by additional electrodes at the lateral canthi of both eyes and below the eyes, respectively.
The EEGLAB toolbox version 11, running on MATLAB (R2010b), was used to process and analyze the data (Delorme, 2004). Data were rereferenced to the average of all electrodes, epoched from Ϫ3.7 to 0.7 s relative to the second interval onset and downsampled to 256 Hz. The data were then filtered using an acausal bandpass filter between 0.25 and 80 Hz. Gross artifacts (eye blinks and noisy data segments) were removed manually, and entire trials were discarded when a blink occurred within a critical 0.5 s time window preceding interval onset to ensure that participants' eyes were open at interval onset. After rejecting trials with EEG artifacts and contrast outliers, the total number of trials analyzed was 680 (SEM, 5.5) and 645 (SEM, 10.5) for the detection and discrimination session, respectively.
Noisy channels were selected manually on a trial-bytrial basis for spherical spline interpolation (Perrin et al., 1989). In the detection task, we interpolated on average 8.1 channels (SEM, 1.13) in 22.3 trials (SEM, 4.74) in 21 of 22 participants. In the discrimination task, we interpolated on average 8.5 channels (SEM, 0.85) in 21.7 trials (SEM, 3.71) in 19 of 22 participants. Furthermore, the EEG data were transformed using independent component analysis (ICA), and SASICA (Semi-Automated Selection of Independent Components of the electroencephalogram for Artifact correction) (Chaumon et al., 2015) was used to guide the exclusion of independent components related to noisy channels and muscular contractions, as well as blinks and eye movements occurring before or after the critical intervals.
We then re-epoched the trials relative to the onsets of target and nontarget intervals of each trial, enabling us to analyze within-trial, between-interval fluctuations in oscil-latory power. Time-frequency analysis was conducted using a wavelet transform (Morlet wavelets, 30 frequencies; frequency range, 1-30 Hz; number of cycles increas-ing linearly from 1 to 12). Thus, a wavelet at 10 Hz was 4.4 cycles long and had a temporal resolution ( t ) of 0.14 s and a spectral resolution ( f ) of 4.53 Hz. Since wavelet

2IFC Detection
Report T Interval (1 st or 2 nd )

Decreased Excitability
Decreased Excitability b d E T E n T n T T Figure 1. Effects of perceptual bias on 2IFC detection and discrimination performance according to SDT. See Materials and Methods for predictions of the precision and decision bias models. a, In 2IFC detection, a target stimulus (T) appears in either one of two successive intervals and the task is to report which interval contained T. b, According to SDT, the internal response in the target, stimulus-present interval, R T , is compared with the internal response in the nontarget, stimulus-absent interval, R nT , and the interval yielding the stronger response is reported. If R T Ͼ R nT (below diagonal), the report is correct, otherwise it is incorrect. The perceptual bias model predicts that the accuracy of 2IFC detection reports is affected by the balance of excitability and inhibition between the stimulus-present and stimulus-absent intervals: weaker oscillatory power (i.e., stronger excitability) before the stimulus-present interval relative to before the stimulus-absent interval, is expected to boost detection accuracy (green circle); instead, stronger oscillatory power (i.e., stronger inhibition) before the stimulus-present interval relative to before the stimulus-absent interval, is expected to impair accuracy (red circle). c, In 2IFC discrimination, two successive intervals contain either a target (T) or a nontarget (nT) stimulus, and the task is to report which interval contained T. d, According to SDT, for each interval, the difference between the internal responses of the target detector and nontarget detector (i.e., distance from the diagonal; dashed line) represents the evidence of target presence in that interval (E), and the interval yielding stronger target evidence is reported. If there is more target evidence in the target interval, the response is correct, otherwise it is incorrect. The perceptual bias model predicts that increased excitability or inhibition in the target interval (left) or nontarget interval (right) affects both target and nontarget detectors equally, leaving target evidence in that interval unchanged (illustrated by a shift parallel to the diagonal without affecting the distance to the diagonal). Thus, this model predicts that between-interval fluctuations of excitability do not affect 2IFC discrimination accuracy. analysis is computed by convolving the data with a function that is extended in time, it is possible that prestimulus effects close to stimulus onset are actually affected by poststimulus data. Iemi et al. (2017) determined the extent of this contamination by estimating the t of the wavelet (Tallon-Baudry et al., 1996). Thus, we consider effects as truly "prestimulus" only if they occur at time points earlier than interval onset Ϫ t (Figs. 2a,c,3a,e,4a,d,f,h,red line).

Modeling the relationship between oscillatory power and behavioral performance
Signal detection theory (SDT) (Macmillan and Creelman, 2005; Fig. 1b,c) provides an account of behavioral performance and perceptual decision-making in 2IFC detection and discrimination tasks. For a 2IFC detection task (Fig. 1a), SDT posits that observers sample an internal response in each interval, compare the two internal responses, and report whichever interval yielded the stronger response. Thus, if the internal response during the target (stimulus-present) interval exceeds the response during the nontarget (stimulus-absent) interval, the participant makes a correct detection. Across trials, the overall accuracy in 2IFC detection depends on the relative distance between the response distributions for stimulus-present and stimulus-absent intervals (Fig. 1b, two-dimensional Gaussian distributions). For a 2IFC discrimination task (Fig. 1c), SDT posits that in each interval, observers sample the internal responses of two feature detectors selective for the target and nontarget stimulus, respectively. The relative strength of these two responses serves as an index of evidence of target presence in a given interval. Observers compare this evidence between both intervals and report whichever interval yielded the strongest evidence (Fig. 1d). If the strength of evidence for target presence in the target interval exceeds the strength of evidence in the nontarget interval, the participant makes a correct response. Across trials, overall accuracy in 2IFC discrimination depends on the relative distance between the response distributions for target and nontarget intervals (Fig. 1d, two-dimensional Gaussian distributions). Note that 2IFC detection is based on comparing the responses of a single signal detector across two intervals, while 2IFC discrimination is based on comparing the relative strength of two feature detectors across two intervals.
As described in the Introduction, different models hypothesize that prestimulus low-frequency power (i.e., 5-30 Hz, a proxy for neuronal excitability) affects perceptual precision, perceptual bias, or decision bias. Importantly, these models make specific, testable predictions regarding the relationship between prestimulus power and performance in 2IFC detection and discrimination tasks.
According to the precision model, weak prestimulus power (i.e., greater excitability) improves the accuracy in perceptual tasks by increasing the relative distance between the response distributions in the target and nontarget intervals, possibly via reduction of the trial-by-trial response variability (Iemi et al., 2017). In 2IFC detection, stronger excitability results in a greater distance between the response distributions of stimulus-present and stimulusabsent intervals. In 2IFC discrimination, stronger excitability results in a greater difference in target evidence between target and nontarget intervals. Thus, the precision model predicts that greater accuracy is related to weaker overall power in both the target and the nontarget interval in both detection and discrimination.
According to the perceptual bias model, weak prestimulus power (i.e., greater excitability) amplifies internal responses to any kind of stimulus. In 2IFC detection, such an amplification would improve accuracy if amplification happened to be stronger in the stimulus-present compared with the stimulus-absent interval, thus increasing the relative distance between the internal responses in target and nontarget intervals (Fig. 1b). In other words, 2IFC detection accuracy should be influenced by the balance of excitability and inhibition between the stimulus-present and stimulusabsent interval. In 2IFC discrimination, stronger amplification in either the target or nontarget interval would simultaneously increase the internal responses of both target and nontarget feature detectors in that interval, leaving the relative distance between their response distributions (i.e., target evidence) unchanged (Fig. 1d). In sum, the perceptual bias model predicts that 2IFC detection accuracy is greater when prestimulus power is weaker in the stimulus-present interval compared with the stimulus-absent interval, while 2IFC discrimination accuracy is expected to be unaffected by between-interval fluctuations of prestimulus power.
According to the decision bias model, weak prestimulus power influences observers' decision-making strategy rather than their perception, by increasing their tendency to report the interval with weakest power (i.e., strongest excitability). Importantly, this tendency should be task independent. For 2IFC detection, this interval bias improves accuracy if prestimulus power is weaker in the stimuluspresent compared with the stimulus-absent interval (similar to a perceptual bias). Likewise, interval bias improves 2IFC discrimination accuracy if prestimulus power is weaker in the target interval compared with the nontarget interval. Thus, in contrast to the perceptual bias model, the decision bias model predicts that both 2IFC detection and 2IFC discrimination accuracy are greater when prestimulus power is weaker in the stimulus-present/target interval compared with the stimulus-absent/nontarget interval, respectively.

Statistical analysis
The analysis included oscillatory power at all electrodes, at frequencies between 5 and 30 Hz and between Ϫ0.5 and 0 s relative to interval onset.

Generalized linear modeling
The predictions of the precision model concern the overall oscillatory power in the two intervals within a trial. Thus, for each single trial, time point, frequency, and electrode, we computed a measure, P ave , reflecting power averaged across the two intervals within a trial.
The predictions of the perceptual bias and decision bias models concern the relative oscillatory power between the two intervals within a trial. Thus, for each single trial, time point, frequency, and electrode, we computed a measure, P rel , comparing oscillatory power between the two intervals, as follows: where P T is power in the stimulus-present interval in 2IFC detection, and the target interval in 2IFC discrimination, and P nT is power in the stimulus-absent interval in 2IFC detection and the nontarget interval in 2IFC discrimination. Thus, P rel Ͼ 1 indicates that power was relatively stronger in the to-be-reported interval. P rel and P ave were rank scored to mitigate the influence of extreme values. Next, we modeled the relationship between single-trial oscillatory power, response accuracy and interval order using multilevel generalized linear modeling (GLM; Cohen and Cavanagh, 2011;Samaha et al., 2017b). Including interval order as a regressor enabled us to remove any possible contribution of the interval order from the estimation of the accuracy predictor. For each participant and for each electrode, frequency, and time point, we fit a regression model of the following form: where X is a continuous measure of oscillatory power (i.e., P ave for the precision model and P rel for the bias models), A is the accuracy (correct/incorrect) coded as a 1/Ϫ1 variable; O is the interval order (target first or nontarget first) coded as a 1/Ϫ1 variable; ␤ 0 , ␤ 1 , and ␤ 2 are the estimated coefficients; and is the residual error. This GLM corresponds to a linear regression model, where the coefficients ␤ 1 and ␤ 2 represent the independent contributions of the accuracy and the interval order predictors, respectively, in explaining the observed power (P rel or P ave ). GLMs were fit separately for detection and discrimination tasks. Relationship between overall oscillatory power (P ave ) and 2IFC accuracy; the test of the precision model. a, c, Group-level t statistics maps of the regression coefficient ␤ 1 , indicating the relationship between accuracy and P ave , in 2IFC detection and discrimination, respectively. Accuracy in neither 2IFC detection nor discrimination is related to P ave . This null effect is corroborated by the BF analysis, indicating that there is more evidence supporting H 0 than supporting H 1 (bottom insets). The maps in a and c are averaged across the electrodes comprising the cluster of significant effects illustrated in Figure 3, a and c, and masked by a p value of 0.01 using two-sided cluster permutation testing. Time 0 s indicates interval onset. The red line in a and c indicates the time points before which oscillatory activity is not contaminated by activity after interval onset (Iemi et al., 2017). b, d, Group average P ave in detection (b) and discrimination (d), shown separately for correct and incorrect trials, normalized by the P ave in all trials. The bar plots are shown for illustrative purposes for the cluster of significant effects illustrated in Figure 3, a and c. These results refute the prediction of the precision model that low P ave is related to higher accuracy.

Statistical testing and effect size
We then tested whether the regression coefficients ␤ 1 and ␤ 2 at each electrode, frequency, and time point were significantly different from zero within participants, and consistent across the sample of participants, using separate statistical tests at the participant level and at the group level. Again, this procedure was conducted separately for the 2IFC detection and 2IFC discrimination task.
At the participant level, we permuted the mapping between single trial power and single trial accuracy/interval order 1000 times, recomputing the ␤ coefficients each time. This procedure creates a within-participant null hypothesis distribution of the ␤ coefficients. The ␤ coefficients associated with the true data mapping were then converted to a z-statistic relative to the mean and SD of the distribution of the permuted data, resulting in a z-score for each participant and time-frequency-electrode point.
At the group level, we then tested whether z-scores of the ␤ 1 and ␤ 2 coefficients were significant across participants (i.e., whether their signs were consistent) using a nonparametric cluster-based permutation test, which also addresses multiple comparisons across time points, frequencies, and electrodes (Maris and Oostenveld, 2007). We obtained a distribution of z-scores under the null hypothesis by randomly permuting their signs 5000 times.
On each iteration, we tested the resulting z-scores with a two-tailed t test against zero and assessed the sum of the t values within the largest contiguous cluster of significant time-frequency-electrode points (cluster p ϭ 0.01), resulting in a distribution of t sums expected under the null hypothesis. A final p value was calculated as the proportion of t sums under the null hypothesis larger than the sum of t values within clusters in the observed data. Thus, p-values smaller than 0.01 indicate that the observed ␤ coefficients were significantly different from zero (two sided).
We computed Cohen's d to estimate the effect size of significant clusters of interest. For each time-frequencyelectrode point of the significant cluster, Cohen's d was estimated by dividing the t statistics by the square root of the number of participants. Conventionally, Cohen's d indicates whether the effect size is small Cohen, 1988;Lakens, 2013).

Bayes factor analysis
The perceptual bias model predicts a relationship between P rel and accuracy in 2IFC detection, but a null effect in 2IFC discrimination. However, in conventional inferential statistics, a nonsignificant result only indicates that the null hypothesis cannot be rejected. It does not necessarily follow that the null hypothesis is actually true; it is possible that the data might be inconclusive (e.g., due to insufficient statistical power). Thus, to directly estimate evidence supporting the null hypothesis, we estimated the Jeffreys-Zellner-Siow (JZS) Bayes factor (BF; Jeffreys, 1961;Zellner and Siow, 1980;Rouder et al., 2009). The JZS BF is a Bayesian measure of evidence, which takes the form of an odds ratio [i.e., the probability of the data under the alternative hypothesis (H 1 ) relative to that under the null hypothesis (H 0 )]. Conventionally, the BF indicates whether there is evidence supporting H 1 (␤ 0: if BF Ͼ 3) or supporting H 0 (␤ ϭ 0: if BF Ͻ 1 ⁄ 3), or whether the evidence is inconclusive (if 1 ⁄ 3 Ͻ BF Ͻ 3). For example, BF ϭ 1/3 indicates that the data are three times more likely under H 0 than under H 1 . For the significant negative t statistics, found by the cluster test, we set the prior on effect size following a Cauchy distribution with a scale factor of 0.707, as recommended by Rouder et al. (2009). We then computed for each time point the proportion of cluster electrodes and frequencies yielding evidence for H 1 and H 0 , respectively (Figs. 2a,c, 3e, insets below the time-frequency plots).

Behavior
For each participant, an adaptive staircase procedure (see Materials and Methods) adjusted stimulus contrast to ensure a proportion of 75% correct responses in both the 2IFC detection and the discrimination task. The participants included in the analysis had a mean proportion of correct detection responses of 73.2% (SEM, 0.002) and a mean proportion of correct discrimination responses of 72.5% (SEM, 0.007), indicating that the staircase procedure was successful. On average, the stimulus contrast necessary for achieving this level of performance was higher in the 2IFC discrimination task than in the 2IFC detection task (two-tailed paired-sample t test: t (21) ϭ 5.77, p Ͻ 0.001), which is consistent with previous work (Iemi et al., 2017). The group average contrast was 7.2% (SEM, 0.2) and 41.6% (SEM, 6.1) in the detection and discrimination task, respectively.

EEG
This study aimed to test three models of the relationship between low-frequency oscillatory power as a measure of excitability and performance in 2IFC detection and discrimination. To recap, the precision model predicts that correct responses are associated with weak overall prestimulus power in both intervals (i.e., low P ave ). The perceptual bias model predicts that correct responses are associated with relatively weaker prestimulus power in the target interval compared with the nontarget interval (i.e., low P rel ), but only in the detection task. The decision bias model predicts this association for both the detection and discrimination task (for details, see Materials and Methods).
To test these models, we analyzed both tasks using GLM to model within each participant the contributions of response accuracy (␤ 1 regressor) and interval order (␤ 2 regressor; target-first vs nontarget first) on either P ave (to test the precision model) or on P rel (to test the perceptual bias and decision bias models). We then used a cluster permutation test to determine whether regressors were significantly different from 0 across participants.

Precision model
The group-level statistical test of ␤ 1 (accuracy) identified significant clusters in neither 2IFC detection nor discrimination. In other words, 2IFC accuracy did not correlate with P ave across participants. To provide evidence of a true null effect of accuracy on P ave , as opposed to merely inconclusive evidence, we used BF analysis to quantify the proportion of data points providing evidence of H 1 or evidence of H 0 . We restricted this analysis to the significant time-frequency-electrode cluster found for ␤ 1 in the bias model for 2IFC detection (see below; Fig. 3a). The BF analysis revealed that for both 2IFC detection and discrimination, the proportion of data points in favor of a null effect by far outnumbered the proportion of data points in favor of an effect (H 0 Ͼ H 1 ; Fig. 2a,c, bottom insets). In sum, the relationship between P ave and 2IFC detection and discrimination accuracy was not merely weak or inconclusive, but entirely absent. These findings reject the precision model.
The group-level statistical test of ␤ 2 (interval order) identified a significant effect of interval order for 2IFC detection (p ϭ 0.003), starting from Ϫ0.5 s relative to interval onset and at frequencies between 7 and 24 Hz with an occipital topography (Fig. 4a,b). In other words, prestimulus 7-24 Hz P ave was greater in trials when a stimulus was presented in the first interval (T1) relative to . Relationship between relative oscillatory power (P rel ) and 2IFC accuracy; test of the perceptual bias and decision bias models. a, Group-level t statistics maps of the regression coefficient ␤ 1 , indicating the relationship between 2IFC detection accuracy and P rel . Correct performance is related to lower 8 -17 Hz P rel (i.e., reduced ␣ and ␤ power preceding the stimulus-present interval relative to that preceding the stimulus-absent interval). b, Group average power averaged across the window before the target interval (left) or that before the nontarget interval (middle), shown separately for correct and incorrect 2IFC detections, normalized by the average power in all trials. Group average P rel (right) shown separately for correct and incorrect trials, normalized by the average P rel in all trials. c, A time course of topographies of the significant negative cluster. Black dots represent cluster electrodes. d, Histogram of the effect size, estimated as Cohen's d, for the data within the time-frequency-electrode cluster of significant negative effects. The median value is highlighted by the blue vertical bar. e, Group-level t statistics maps of the regression coefficient ␤ 1 , indicating the relationship between 2IFC discrimination accuracy and P rel . 2IFC discrimination accuracy is not related to P rel . This null effect is corroborated by the BF analysis, indicating that there is more evidence supporting H 0 than supporting H 1 (bottom inset). f, Group average P rel shown separately for correct and incorrect 2IFC discriminations, normalized by the average P rel in all trials. The maps in a and e are averaged across the electrodes comprising the cluster of significant effects illustrated in a and c and masked by a p value of 0.01 using two-sided cluster permutation testing. Time 0 s indicates interval onset. The red line in a and e indicates the time points before which oscillatory activity is not contaminated by activity after interval onset (Iemi et al., 2017). The plots in b and f are shown for illustrative purposes for the cluster of significant effects illustrated in a and e. The negative relationship between P rel and accuracy in 2IFC detection, but not in 2IFC discrimination, confirms the perceptual bias model. trials when it was presented in the second interval (nT1; Fig. 4c). By contrast, no significant clusters were found for ␤ 2 in 2IFC discrimination (Fig. 4d,e).

Perceptual and decision bias model
The group-level statistical test of ␤ 1 (accuracy) in the 2IFC detection task yielded a significant negative cluster (p ϭ 0.005) starting from Ϫ0.344 s before interval onset and at frequencies between 8 and 17 Hz (Fig. 3a). In other words, correct detection was associated on a trial-by-trial basis with lower 8 -17 Hz P rel (i.e., reduced ␣ and ␤ power in the time window before the stimulus-present interval relative to the stimulus-absent interval; Fig. 3b). The topography of this negative effect evolved from frontocentral to parieto-occipital electrodes (Fig. 3c). The peak of this cluster was at electrode FC4, at 13 Hz, and at Ϫ0.305 s before interval onset (t (21) ϭ Ϫ5.20). Cohen's d esti-mated within this significant cluster had a median of Ϫ0.290, indicating a medium effect size (Fig. 3d). Note that this effect in 2IFC detection is predicted by both the perceptual bias and the decision bias model. No significant clusters were found for ␤ 1 in the 2IFC discrimination task (Fig. 3e). In other words, discrimination accuracy did not correlate with differences in P rel (Fig. 3f). We then tested whether there was evidence of a true null effect of accuracy on P rel , using BF analysis as described above. We restricted this analysis to the significant timefrequency-electrode cluster found for ␤ 1 in 2IFC detection (Fig. 3a). The BF analysis revealed that the proportion of data points in favor of a null effect on 2IFC discrimination accuracy by far outnumbered the proportion of data points in favor of an effect (H 0 Ͼ H 1 ; Fig. 3e, bottom inset). Hence, the relationship between 2IFC discrimination accuracy and P rel  Figure 4. Relationship between oscillatory power (P ave and P rel ) and interval order in 2IFC detection and discrimination. a, d, Group-level t statistics map of the regression coefficient ␤ 2 , indicating the relationship between interval order and P ave in 2IFC detection and discrimination, respectively. b, Topography of the significant positive cluster in 2IFC detection. Black dots represent cluster electrodes. c, Group average power averaged across the window before the stimulus-present interval (left) or that before the stimulus-absent interval (middle), shown separately for nT1 and T1 trials, normalized by the average power in all trials. Group average P ave (right) shown separately for nT1 and T1 detection trials, normalized by the average P ave in all trials. f, h, Group-level t statistics map of the regression coefficient ␤ 2 , indicating the relationship between interval order and P rel in 2IFC detection and discrimination, respectively. g, i, Group average P rel shown separately for nT1 and T1 trials, normalized by the average P rel in all trials in 2IFC detection and discrimination, respectively. The maps in a, d, f, and h are averaged across the electrodes comprising the cluster of significant effects illustrated in a and b and masked by a p value of 0.01 using two-sided cluster permutation testing. Time 0 s indicates interval onset. The red line in a, d, f, and h indicates the time points before which oscillatory activity is not contaminated by activity after interval onset (Iemi et al., 2017). The plots in c, e, g, and i are shown for illustrative purposes for the cluster of significant effects illustrated in a and b.
was not merely weak or inconclusive, but entirely absent. In sum, the results confirm the prediction of the perceptual bias model that for 2IFC detection, but not for 2IFC discrimination, relatively weak power before the target interval and strong power before the nontarget interval are related to higher accuracy. The group-level statistical test of ␤ 2 (interval order) found significant clusters in neither 2IFC detection (Fig.  4f,g) nor discrimination (Fig. 4h,i).

Excitability modulates perceptual bias rather than decision bias
What are the perceptual consequences of spontaneous fluctuations in neuronal excitability? Accumulating evidence suggests that, during states of strong neuronal excitability, indexed by weak ongoing ␣ and ␤ power, observers are more likely to report the presence of a sensory target, irrespective of its actual physical presence. Thus, contrary to the previously dominant view (Ergenoglu et al., 2004;Payne and Sekuler, 2014), strong excitability reflects a state of liberal detection bias rather than of improved perceptual precision. What is the mechanism linking fluctuations of excitability and bias? According to SDT, two alternative mechanisms are possible. On the one hand, strong excitability could indicate a state of more permissive detection strategy, during which observers prefer to report "yes, I saw the target." This mechanism is referred to as decision bias. On the other hand, strong excitability could reflect a state of increased baseline sensory processing, resulting in an amplification of the neural responses to both target and nontarget stimuli. At the behavioral level, this is paralleled by an amplification of subjective perceptual experience, during which observers "perceive" the presence of a target even when it is not physically present. This mechanism is referred to as perceptual bias. Past studies using single-interval detection tasks (Limbach and Corballis, 2016;Iemi et al., 2017) were unable to distinguish between these alternative interpretations of excitability, because standard SDT analysis cannot determine the underlying source of the bias, be it perceptual or decisional (Wixted and Stretch, 2000;Witt et al., 2015).
In this study, we addressed the issue by analyzing the relationship between low-frequency oscillatory power (5-30 Hz), as a measure of neuronal excitability, and performance in 2IFC detection and discrimination tasks, which are affected differently by perceptual bias and decision bias. The predictions of a SDT model of perceptual bias are twofold. First, 2IFC detection should be most accurate when excitability in the stimulus-present interval exceeds that in the stimulus-absent interval, reflecting an amplification of the stimulus representation in the stimulus-present interval and a dampening of the representation in the stimulus-absent interval. Second, in 2IFC discrimination, fluctuations of excitability between target and nontarget intervals should not affect discrimination accuracy. This is because a change in global excitability (i.e., not specific to a certain feature value) affects the response of all feature detectors equally, without chang-ing their relative strength, which determines the evidence for target presence. By contrast, an SDT model of decision bias posits that fluctuations in excitability influence the observer's strategic decision behavior, rather than perceptual processing. Note that a "yes" bias, as in a single-interval detection task, cannot affect decisions in a 2IFC detection task, because "yes, I saw it" is not among the given options. However, an interval decision bias predicts a tendency to report the interval with stronger excitability, regardless of perceptual task, and should therefore be manifest in both 2IFC detection and 2IFC discrimination.
To test these alternative models, we analyzed how 2IFC detection and discrimination accuracy is related to excitability in target and nontarget intervals (P rel ). We found that detection was most accurate when prestimulus ␣ and ␤ power was lower before the stimulus-present interval relative to the stimulus-absent interval (i.e., low P rel ). This effect rules out a "yes" decision bias that might have affected previous findings from single-interval detection tasks (Chaumon and Busch, 2014;Limbach and Corballis, 2016;Iemi et al., 2017). Moreover, using Bayes factor analysis we found evidence that discrimination accuracy was unaffected by between-interval fluctuations of excitability, ruling out an interval decision bias model. Together, the effect on 2IFC detection and the evidence of a null effect on 2IFC discrimination confirm the predictions of the perceptual bias model.

Excitability does not affect perceptual precision
It is important to note that the effect of perceptual bias on 2IFC detection (i.e., when excitability is specifically strong in the stimulus-present interval) merely represents a serendipitous distortion of subjective perception "in the right direction" rather than an actual improvement in perceptual precision. By contrast, the precision model predicts that overall excitability in both intervals improves detection and discrimination accuracy. However, using Bayes factor analysis we found that overall excitability (P ave ) affected neither 2IFC detection nor discrimination accuracy (Fig. 2). This result replicates, in a 2IFC paradigm, the findings of three recent studies, reporting a null effect of excitability on single-interval yes/no detection sensitivity (Limbach and Corballis, 2016;Iemi et al., 2017) and 2AFC discrimination accuracy (Iemi et al., 2017;Samaha et al., 2017b). Together, these findings challenge the long-held notion that neuronal excitability affects the accuracy of perceptual decisions (Romei et al., 2008;van Dijk et al., 2008;Payne and Sekuler, 2014). This notion has been based on the observation that successful stimulus detection (hit rate) is associated with relatively stronger excitability. Such findings have been obtained with visual (Ergenoglu et al., 2004;van Dijk et al., 2008), auditory (Leske et al., 2015), and somatosensory (Baumgarten et al., 2016) detection, and for detection of transcranial magnetic stimulation-induced phosphenes (Romei et al., 2008;Samaha et al., 2017a). However, without testing for an effect on the false-alarm rate in stimulus-absent trials, it is possible that excitability affects rather the bias to report a stimulus, irrespective of whether or not this is accurate. Indeed, recent studies analyzing signal detection measures found that increased excitability is associated with a more liberal detection bias in both vision (Limbach and Corballis, 2016;Iemi et al., 2017) and somatosensation (Craddock et al., 2017). Our results are also consistent with several experiments that found no effect of excitability on multiple-alternative forced choice (mAFC) discrimination performance (Bays et al., 2015; for a comprehensive literature review, see Iemi et al., 2017). An SDT model of perceptual bias, in fact, predicts these null findings because discrimination performance in mAFC tasks is unaffected by a "yes" bias, and a modulation of excitability does not change the discriminability between response alternatives (Iemi et al., 2017).
Accumulating evidence suggests that spontaneous neural oscillations modulate subjective, rather than objective, measures of performance in perceptual tasks. While replicating the finding that states of heightened excitability do not improve objective perceptual accuracy, two recent studies additionally demonstrated that excitability instead biases observers to report higher confidence in 2AFC discrimination (Samaha et al., 2017b) and higher visibility ratings (Benwell et al., 2017). These results can be reconciled by an SDT model of perceptual bias. For example, in a 2AFC discrimination task, confidence, unlike accuracy, is thought to depend on the absolute amount of evidence in favor of the chosen stimulus alternative, regardless of the amount of evidence against this choice (Zylberberg et al., 2012;Maniscalco et al., 2016). According to a perceptual bias model, heightened excitability amplifies evidence for both stimulus alternatives simultaneously (Iemi et al., 2017). Thereby, a perceptual bias amplifies evidence for the chosen alternative and, in turn, amplifies subjective confidence and subjective visibility, while leaving objective accuracy unchanged. Together, these results provide suggestive evidence for the perceptual bias model.

Spectral and topographical characteristics of the experimental effects
Correct detection was more likely when low-frequency power before the stimulus-present interval was weaker relative to the stimulus-absent interval (Fig. 3a-c). The effect of between-interval power fluctuations (P rel ) on detection accuracy was widely distributed over many electrodes and spanned frequencies between 7 and 18 Hz. The topography of this effect evolved from a frontocentral to parieto-occipital topography (Fig. 3c). This pattern is consistent with previous studies using single-interval perceptual tasks reporting a frontocentral (Busch et al., 2009;Achim et al., 2013), parietooccipital (Ergenoglu et al., 2004Romei et al., 2008;van Dijk et al., 2008;Lange et al., 2013;Chaumon and Busch, 2014;Mathewson et al., 2014;Samaha et al., 2017b), or widespread (Benwell et al., 2017;Iemi et al., 2017) topography. Furthermore, the broad frequency range of this effect (Fig.  3a) is consistent with recent reports (Benwell et al., 2017;Iemi et al., 2017;Samaha et al., 2017b) and is in line with studies showing that ␣ and ␤ oscillations are typically comodulated in time and colocalized in space (Bastos et al., 2015;Lakatos et al., 2016;Michalareas et al., 2016). A recent study (Watson et al., 2018) in rats demonstrated that lowfrequency power (10 -30 Hz) is negatively correlated with firing rate. Therefore, it is possible that ␤ oscillations exert an inhibitory function, similar to ␣ oscillations (Spitzer and Haegens, 2017). Together, these studies suggest that ␣ and ␤ power may reflect a similar function in regulating cortical excitability and perceptual decision-making.
In addition to detection accuracy, interval order (i.e., whether or not a stimulus is presented in the first interval) was also related to low-frequency power (Fig. 4a-c). In the detection task, low-frequency prestimulus power averaged across first and second intervals (P ave ) was significantly greater in trials when a stimulus was presented in the first interval (T1 trials) relative to trials when it was presented in the second interval (nT1 trials; Fig. 4a). During T1 trials, low-frequency synchronization (Kalcher and Pfurtscheller, 1995) induced by the first, stimulus-present interval probably leaked into the prestimulus period of the second, stimulus-absent interval. Conversely, during nT1 trials, low-frequency desynchronization due to temporal expectation and attention (Rohenkohl and Nobre, 2011) following the first, stimulus-absent interval leaked into the prestimulus period of the second, stimulus-present interval. Including the interval order in our regression model ensured that the effect of accuracy on oscillatory power was independent of effects of interval order.

Within-trial fluctuations of excitability
Past studies on the relationship between perception and excitability have typically analyzed how differences in perceptual reports were related to differences in prestimulus power across trials (Busch et al., 2009;Limbach and Corballis, 2016;Iemi et al., 2017). This across-trial approach treats individual trials as independent samples and therefore ignores the fact that data are collected in temporal order. This is potentially problematic, because it is known that both perceptual reports and excitability change over the course of an experiment. Specifically, behavioral measures such as hit rate (Boncompte et al., 2016;Carrasco-López et al., 2017) and sensitivity (Maniscalco et al., 2017) tend to decrease over time, possibly due to progressive fatigue, resulting from an exhaustion of cognitive resources. Likewise, ongoing ␣ power tends to increase over the course of an experiment (van Dijk et al., 2008), suggesting a decrease in excitability, possibly as a result of fatigue (Kaida et al., 2006). Since perceptual reports and excitability both covary across time (e.g., as a function of fatigue), their correlation could be epiphenomenal. Therefore, several studies have tried to rule out that the across-trial correlation between performance and ␣ power is confounded by fatigue by showing that the temporal factor does not explain the effect on performance (van Dijk et al., 2008;Mathewson et al., 2009).
To test the bias models in our study, we used a different approach and quantified the differences in excitability between two intervals within a trial (P rel ), instead of the differences in the absolute magnitude of excitability across trials. This approach ensures that our measure of excitability is not influenced by fatigue-related effects occurring over longer time scales. Crucially, our results show a significant corre-lation between excitability and perceptual reports, even when the effects of fatigue are ruled out. This study thus confirms that the relationship between excitability and perception is not determined by fatigue.