Strategic and Dynamic Temporal Weighting for Perceptual Decisions in Humans and Macaques

Abstract
Perceptual decision-making is often modeled as the accumulation of sensory evidence over time. Recent studies using psychophysical reverse correlation have shown that even though the sensory evidence is stationary over time, subjects may exhibit a time-varying weighting strategy, weighting some stimulus epochs more heavily than others. While previous work has explained time-varying weighting as a consequence of static decision mechanisms (e.g., decision bound or leak), here we show that time-varying weighting can reflect strategic adaptation to stimulus statistics, and thus can readily take a number of forms. We characterized the temporal weighting strategies of humans and macaques performing a motion discrimination task in which the amount of information carried by the motion stimulus was manipulated over time. Both species could adapt their temporal weighting strategy to match the time-varying statistics of the sensory stimulus. When early stimulus epochs had higher mean motion strength than late, subjects adopted a pronounced early weighting strategy, where early information was weighted more heavily in guiding perceptual decisions. When the mean motion strength was greater in later stimulus epochs, in contrast, subjects shifted to a marked late weighting strategy. These results demonstrate that perceptual decisions involve a temporally flexible weighting process in both humans and monkeys, and introduce a paradigm with which to manipulate sensory weighting in decision-making tasks.


Introduction
Perceptual decisions are typically thought of as resulting from some form of accumulation of stimulus samples over time. During this process, a decision variable is updated as evidence is integrated until a choice is made. In both human and nonhuman primates, perceptual decision-making has been studied extensively in the context of motion direction discrimination tasks, where the vast majority of stimuli provide statistically uniform sensory evidence over time (Gold and Shadlen, 2007). Despite a stationary level of expected sensory evidence, subjects often assign more weight to some stimulus epochs than to others. In many instances, subjects have exhibited "early weighting," where sensory evidence presented in early epochs contributes more to choices than that in late epochs (Huk and Shadlen, 2005; Kiani et al., 2008; Nienborg and Cumming, 2009; Yates et al., 2017). In other instances, however, "late weighting" has been observed, where choices were primarily influenced by sensory evidence presented in late stimulus epochs (Tsetsos et al., 2012; Cheadle et al., 2014; Bronfman et al., 2016; Carland et al., 2016). In rodents, a mixture of early and flat weighting profiles has been reported (Scott et al., 2015; Pinto et al., 2017; Licata et al., 2017).
The diverse set of temporal weighting profiles observed across studies and species may be explained in a number of ways. One approach appeals to mechanistic models of decision-making. An early weighting strategy, for example, could be explained as a consequence of bounded accumulation (Huk and Shadlen, 2005), which posits that sensory evidence is accumulated until reaching a bound, whereupon the decision is made. Because the remainder of the stimulus is ignored once the bound has been hit, early stimulus epochs contribute more to decisions than late ones. Late weighting, in contrast, may be interpreted as a consequence of leaky accumulation (Usher and McClelland, 2001), which stipulates that the representation of sensory evidence decays over time. In this model, early sensory evidence contributes less to decisions than late evidence.
An alternative approach to explaining the variety in weighting strategies postulates that the temporal weighting strategy is flexible and is linked to the demands or structure of the task. This notion is supported by experiments in which weighting changes systematically with variable trial length and signal timings (Ghose, 2006; Tsetsos et al., 2012; Ossmy et al., 2013; Bronfman et al., 2016), as well as by studies that explore effects of congruency between serially presented samples (Cheadle et al., 2014). Irrespective of a stipulated model or mechanism, these studies point to similar conclusions: subjects may reweight stimulus information as dictated by the reliability of the evidence and demands of the task.
Without appeal to a specific decision-making mechanism, we set out to manipulate temporal weighting under the hypothesis that weights should be flexible and influenced by the dynamic features of the stimulus itself, either independent of or in addition to constraints imposed by integration mechanisms such as a bound or a leak.
To test this idea, we adopted a motion stimulus designed explicitly for psychophysical reverse correlation in the presence of experimenter-controlled manipulation of temporal stimulus statistics (Katz et al., 2016). The stimulus is similar to classic motion stimuli used in the study of perceptual decisions (Newsome and Paré, 1988; Britten et al., 1992), but with two crucial features: (a) the stimulus consists of seven consecutive motion pulses, each with a predetermined mean motion strength and direction, and thus can be precisely designed to carry more or less motion evidence at different epochs (Fig. 1); (b) the stimulus is amenable to psychophysical reverse correlation analysis such that each subject's temporal weighting strategy may be computed directly. This motion discrimination task was performed under three temporal conditions: (1) "flat-stimulus," in which the mean motion strength per pulse was constant; (2) "early-stimulus," in which early pulses had high mean motion strength and late pulses had low; and (3) "late-stimulus," in which late pulses had high mean motion strength and early pulses had low (Fig. 2A-C). In all conditions, the task was to report the net motion of the trial.
We found that in both time-varied conditions (early-stimulus and late-stimulus), subjects shifted their temporal weighting strategy, placing highest weight on motion pulses with the highest mean motion strength. In flat-stimulus sessions, however, subjects exhibited a large range of temporal weighting strategies despite equal mean motion strength over time. Overall, these results demonstrate that temporal weighting strategies in human and monkey observers are flexible and can be adjusted to suit temporal stimulus statistics.

Subjects and apparatus
Data were collected from both monkeys and humans. Monkey data were collected from two adult rhesus macaques (one female and one male, referred to as M1 and M2 hereafter) aged 10 and 14 years, weighing 7.7 and 10 kg, respectively. All animal procedures were performed in accordance with The University of Texas at Austin animal care committee's regulations. Both M1 and M2 had standard surgery for implantation of a head-holder. Some portion of the monkey data were presented previously (Katz et al., 2016; Yates et al., 2017). Human data were collected from three subjects (all males, referred to as H1, H2, and H3), aged 23-41 years, all with normal or corrected-to-normal vision. Experiments were performed with the written consent of each observer and all procedures were approved by The University of Texas at Austin review board.
For both monkeys and humans, stimuli were presented using the Psychophysics Toolbox with Matlab (Mathworks) using a Datapixx I/O box (Vpixx) for precise temporal registration (Eastman and Huk, 2012). Sample stimulus presentation code is available on request. Eye position was tracked using an Eyelink eye tracker (SR Research), sampled at 1 kHz. Monkeys sat in a primate chair (Crist Instruments) and viewed stimuli on a 55-inch LCD (LG) display (resolution = 1920 × 1080p, refresh rate = 60 Hz, background luminance = 26.49 cd/m²) that was corrected to have a linear gamma function. Monkeys viewed the stimulus from a distance of 118 cm (such that the screen width subtended 54 degrees of visual angle, and each pixel subtended 0.0282 degrees of visual angle).
Auditory feedback was played at the end of every trial, and fluid reward was delivered through a computer-controlled solenoid. Humans viewed stimuli on a linearized 16.5-inch OLED (LG) display (resolution = 1920 × 1080p, refresh rate = 60 Hz, background luminance = 67.22 cd/m²) at a distance of 65.3 cm (such that screen width subtended 31 degrees of visual angle, and each pixel subtended 0.0163 degrees of visual angle).
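The visual angles quoted above can be approximately reproduced from the display diagonal and viewing distance. The following is a minimal sketch, assuming a 16:9 panel with square pixels (inferred from the 1920 × 1080 resolution, not stated explicitly) and taking degrees per pixel as the average over the screen width. The function names are hypothetical and are not part of the original stimulus code.

```python
import math

def screen_width_cm(diagonal_in, aspect_w=16, aspect_h=9):
    """Physical screen width in cm from the diagonal (assumes a 16:9 panel)."""
    return diagonal_in * 2.54 * aspect_w / math.hypot(aspect_w, aspect_h)

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an extent centered on the line of sight."""
    return 2.0 * math.degrees(math.atan(size_cm / (2.0 * distance_cm)))

# Monkey setup: 55-inch display viewed from 118 cm.
monkey_deg = visual_angle_deg(screen_width_cm(55.0), 118.0)
# Human setup: 16.5-inch display viewed from 65.3 cm.
human_deg = visual_angle_deg(screen_width_cm(16.5), 65.3)

# Average degrees per pixel across the 1920-pixel screen width.
monkey_deg_per_px = monkey_deg / 1920
human_deg_per_px = human_deg / 1920

print(round(monkey_deg, 1), round(monkey_deg_per_px, 4))
print(round(human_deg, 1), round(human_deg_per_px, 4))
```

Under these assumptions the computed values fall close to the reported 54° and 31° screen widths; small discrepancies would be expected if the active display area differs from the nominal diagonal.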

Task and stimulus design
Stimulus and task design were identical between monkeys and humans unless otherwise noted. Subjects were required to discriminate the net direction of a motion stimulus and communicate their decision with an eye movement to one of two targets, placed on either side of the motion stimulus (Fig. 1). A trial began with the appearance of a fixation point. Once the subject acquired fixation and held for 400-1200 ms (uniform distribution), two targets appeared and remained visible until the end of the trial. At 200-1000 ms after target onset, the motion stimulus was presented at a range of eccentricities from 4° to 10° for a duration of 1050 ms. The fixation point was extinguished 200-1000 ms after motion offset, and the subject was then required to shift their gaze toward one of the two targets within 600 ms (saccade end points within 3° of the target location were accepted). The timing of each event was randomly and independently jittered from trial to trial (Fig. 1A).
The reverse-correlation motion stimulus contained motion toward one direction or the opposite, with varying motion strength. Spatially, the stimulus consisted of a hexagonal grid of 19 Gabor elements, 5°-7° across, scaled by eccentricity (Fig. 1B). Individual Gabor elements were set to approximate the receptive field (RF) size of a V1 neuron, and the entire motion stimulus approximated the RF size of an MT neuron (Van Essen et al., 1981). Motion was presented by varying the phase of the sine-wave carrier of the Gabors. Each Gabor underwent a sinusoidal contrast modulation over time with independent random phase to prevent perceptual "pop-out" of individual drifting elements. Gabor spatial frequency (0.9 cycles/°, sigma = 0.1 × eccentricity) and temporal frequency (5-7 Hz, yielding velocities of 5.55-7.77°/s, respectively) were selected to match the approximate sensitivity of MT neurons (Bair and Movshon, 2004).
Each motion stimulus presentation consisted of seven consecutive motion pulses lasting 150 ms each (9 frames), producing a motion sequence 1050 ms in duration. For human subjects H2 and H3, each motion pulse lasted 100 ms (6 frames), producing a 700-ms-long stimulus. On any given pulse, a number of Gabor elements would have their carrier sine waves drift in unison to produce motion ("signal elements"), and the remaining elements would counterphase flicker ("noise elements"). Signal elements on any given pulse were assigned at random within the grid, and all signal elements drifted in the same direction. Motion strength on pulse i was defined as the proportion of signal elements out of the total number of elements, the value of which was drawn from a Gaussian distribution, $X_i \sim N(\mu_k, \sigma)$, and rounded to the nearest integer, where k is the distribution index for the five trial types (strong left, weak left, zero-mean, weak right, strong right), $\mu_k$ was one of five values: -50%, -10%, 0%, 10%, and 50% (the sign indicates the direction of motion), and $\sigma$ was set to 15%. Thus, while each pulse within a sequence could take on any value (and either sign/direction) from distribution $N(\mu_k, \sigma)$, the expectation of a sequence would be $\mu_k$ (Fig. 1). The subjects were rewarded for selecting the target consistent with the sign of the motion pulse sequence sum (i.e., the net direction), independent of the distribution k from which the pulses were drawn.
The distributions $N(\mu_k, \sigma)$ were most commonly set to the values listed above but were occasionally varied to better maintain individual subject performance around threshold. Overall, humans performed sessions with strong $\mu_k$ ranging from 35% to 50% and weak $\mu_k$ ranging from 10% to 20%, with $\sigma$ ranging from 10% to 24% coherence. Macaques performed sessions with strong $\mu_k$ ranging from 50% to 70% and weak $\mu_k$ ranging from 10% to 20%, with $\sigma$ ranging from 8% to 24% coherence.
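As an illustration of the pulse-generation scheme described above, the following minimal sketch draws seven Gaussian pulse strengths and computes the rewarded (net) direction. It is not the stimulus code used in the experiments: the function names are hypothetical, values are expressed in percent of elements, and rounding to the nearest integer percent (rather than to an integer number of Gabor elements) is an assumption.

```python
import random

def draw_pulse_sequence(mu_k, sigma=15.0, n_pulses=7, rng=random):
    """Draw signed motion strengths (in %) for one trial from N(mu_k, sigma).

    Assumption: each draw is rounded to the nearest integer percent.
    """
    return [round(rng.gauss(mu_k, sigma)) for _ in range(n_pulses)]

def net_direction(pulses):
    """Return +1 (rightward), -1 (leftward), or 0 if the pulses sum to zero."""
    total = sum(pulses)
    return (total > 0) - (total < 0)

# Example: one "weak right" trial (mu_k = 10%), seeded for reproducibility.
rng = random.Random(0)
trial = draw_pulse_sequence(mu_k=10.0, rng=rng)
print(trial, net_direction(trial))
```

Because reward depends on the sign of the summed pulses rather than on the generating mean, a trial drawn from a rightward distribution can occasionally have a leftward rewarded direction.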

Temporal manipulation of stimulus
In the standard stimulus design described above, the mean of the motion strength distribution $N(\mu_k, \sigma)$ was held constant throughout a stimulus presentation. In other words, the mean of the distribution from which $X_i$ was drawn was fixed at $\mu_k$ for pulses 1-7 (Fig. 2A). We refer to this as the "flat-stimulus" condition and treat it as a baseline, because it is similar to most variants of the classic moving dot stimuli used in the past (Newsome and Paré, 1988; Britten et al., 1992, 1996; Gold and Shadlen, 2007). In the time-varying stimulus conditions (early-stimulus or late-stimulus), the mean was varied over pulses 1-7. Fig. 2B depicts a stimulus condition in which motion strength is reduced substantially in early pulses (relative to baseline levels), but not in late pulses. In this "late-stimulus" condition, the mean is set to 0 for the first pulse (i = 1) and reaches its expected value ($\mu_k$) by pulse 7. The transition from 0 at pulse 1 to $\mu_k$ at pulse 7 is governed by a logistic function with parameters chosen to result in a smooth transition between the first 3 and last 3 pulses (midpoint = 4, slope = 0.3). Although the mean is near zero for the early pulses, $\sigma$ is unchanged, such that although the expectation for motion on pulse one is zero, the motion strength and direction vary from trial to trial (see example trials in Fig. 2B). In other words, random draws of $X_i$ from distribution $N(\mu_k, \sigma)$ where $\mu_k = 0$ still carry motion information, albeit less correlated with the net motion outcome of the trial as a whole. The opposite is done for the "early-stimulus" condition (Fig. 2C), in which the first pulses maintain mean motion strength equal to $\mu_k$, and later pulses have a mean near zero. This stimulus design ensures that pulse sequences drawn from the $\mu_k = 0$ Gaussian (i.e., "zero-mean trials") maintain a 0 mean throughout all 7 pulses, regardless of whether the stimulus condition is flat, early, or late.
These trials were difficult because the motion strength of each pulse was small, its direction was independent of the other pulses in the sequence, and the net motion summed to a small directional outcome. About one quarter of macaque sessions also contained frozen seed trials, in which an identical stimulus was displayed for 5% to 10% of trials. These trials summed to exactly zero and the subject was rewarded at random.
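The per-pulse means in the three conditions can be sketched as follows. The exact parameterization of the logistic function is not given in the text; this sketch assumes the slope parameter acts as a scale (divisor) in the exponent, which yields a mean of approximately zero at pulse 1 and approximately the full value at pulse 7, and it assumes the early-stimulus profile is the mirror image of the late-stimulus profile. Function names are hypothetical.

```python
import math

def logistic_profile(n_pulses=7, midpoint=4.0, slope=0.3):
    """Fraction of the full mean applied on pulses 1..n (late-stimulus shape).

    Assumption: slope is a scale parameter, i.e. 1 / (1 + exp(-(i - midpoint) / slope)).
    """
    return [1.0 / (1.0 + math.exp(-(i - midpoint) / slope))
            for i in range(1, n_pulses + 1)]

def pulse_means(mu_k, condition):
    """Mean motion strength on each pulse for a given stimulus condition."""
    late = logistic_profile()
    if condition == "flat":
        scale = [1.0] * len(late)
    elif condition == "late":
        scale = late
    elif condition == "early":
        scale = late[::-1]  # assumed mirror image of the late profile
    else:
        raise ValueError("condition must be 'flat', 'late', or 'early'")
    return [mu_k * s for s in scale]

for cond in ("flat", "late", "early"):
    print(cond, [round(m, 2) for m in pulse_means(50.0, cond)])
```

Note that for zero-mean trials (a mean of 0) every condition yields a mean of zero on all seven pulses, matching the property described above.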
All subjects began the experiments with the flat-stimulus condition. After multiple sessions of stable psychophysical performance within a condition, the stimulus was changed to either the late- or early-stimulus condition. Finally, after multiple sessions of stable psychophysical performance under the second condition, they began the third and final condition. Subjects were exposed to only one stimulus condition per session and were not informed of which stimulus condition they were viewing before or during any given session.

Data analysis
Sessions with a minimum of 250 successfully completed trials were included in data analysis. Sessions were excluded from analysis if subject accuracy was lower than 85% for the strongest motion values (17/235 sessions for macaques, 0/52 for humans). Additionally, 30 macaque sessions were excluded from analysis for having psychophysical thresholds >2 median absolute deviations about the median. Overall, 188 and 52 sessions were included for macaques and humans, respectively, with median session lengths of 632 and 295 successfully completed trials, netting a total of 129,922 and 15,275 trials overall.
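The inclusion criteria above can be expressed compactly. The sketch below is illustrative only: the function names are hypothetical, the median absolute deviation is left unscaled, and the threshold criterion is treated as two-sided (deviation in either direction from the median), both of which are assumptions.

```python
import statistics

def passes_basic_criteria(n_trials, accuracy_strongest,
                          min_trials=250, min_accuracy=0.85):
    """Session has enough completed trials and adequate accuracy on the strongest motion."""
    return n_trials >= min_trials and accuracy_strongest >= min_accuracy

def mad_outliers(thresholds, n_mads=2.0):
    """Flag thresholds more than n_mads median absolute deviations from the median."""
    med = statistics.median(thresholds)
    mad = statistics.median([abs(t - med) for t in thresholds])
    return [abs(t - med) > n_mads * mad for t in thresholds]

# Toy example with made-up thresholds; the last session is a clear outlier.
example_thresholds = [10.0, 11.0, 12.0, 13.0, 40.0]
print(mad_outliers(example_thresholds))
print(passes_basic_criteria(n_trials=632, accuracy_strongest=0.97))
```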
All analyses were performed in Matlab (Mathworks). Subject choices in the direction-discrimination task were analyzed with a maximum likelihood fit of a three-parameter logistic function (Wichmann and Hill, 2001) assuming a Bernoulli distribution of binary choices, in which the probability of a rightward choice is p and leftward choice is 1 - p, where p is given by

$$p = \gamma + \frac{1 - 2\gamma}{1 + e^{-\beta (x - \alpha)}},$$

where x is the net motion strength value (z-scored over all sessions for each subject separately), $\alpha$ is the bias parameter (reflecting the midpoint of the function in units of motion strength), $\beta$ is the slope (i.e., sensitivity, in units of log-odds per motion strength), and $\gamma$ captures the lapse rate as the offset from the 0 and 1 bounds. Error estimates on the parameters were obtained from the square root of the diagonal of the inverse Hessian (2nd derivative matrix) of the negative log-likelihood. The temporal weighting kernel (which we also refer to as "temporal weighting strategy" or "temporal weighting profile") was computed using ridge regression via maximum likelihood. The log posterior of the psychophysical weights is given by

$$\log p(w \mid X, Y, \lambda) = Y^\top X w - \sum_t \log\left(1 + e^{X_t w}\right) - \frac{\lambda}{2} w^\top w + \text{const},$$

where $Y \in \{0, 1\}$ is a vector of choices on every trial, X is a matrix of the seven pulses on each trial, augmented by a column of ones (to capture bias), and $X_t$ is the row of X for trial t. The ridge parameter $\lambda$ was estimated using evidence optimization (Sahani and Linden, 2003). Psychophysical weights are normalized by the Euclidean norm of the vector of weights. The seven temporal weights assigned to the seven motion pulses, w, were computed by using all trials within a session. These include trials where $\mu_k$ was set to zero (i.e., "zero-mean trials," where motion on a given pulse is temporally independent of all other pulses in the sequence) and trials where $\mu_k$ was set to a non-zero value ("signal trials," where motion is correlated over pulses).
Psychophysical reverse correlation is traditionally performed on noise trials exclusively, but logistic regression effectively whitens the stimulus covariance, such that we could include all trials and increase our statistical power, regardless of whether they have correlated temporal structure. We verified the whitening step by comparing the psychophysical kernel computed on all trials to the kernel computed on only zero-mean trials and calculating the Pearson correlation between the pair of kernels (i.e., between the 7 weights of the all-trials-kernel and the 7 weights of the zero-mean-kernel) for each combination of subject and stimulus condition. This yielded 14 Pearson correlation values with a median of 0.886 ([0.819 to 0.952], 1 SEM) demonstrating a strong agreement between results of the two methods of reverse correlation for the subject-averaged data per condition. We also verified the whitening step at the level of individual sessions, using the same approach. This yielded 240 Pearson correlation values (one for each session) with a median of 0.846 ([0.829 to 0.864], 1 SEM), indicating a strong agreement between reverse correlation methods, even on single sessions.
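To make the two fitted models concrete, the following sketch evaluates a three-parameter psychometric function and a ridge-penalized Bernoulli log posterior for the pulse weights. The functional forms are assumptions chosen to be consistent with the parameter definitions in the text (bias, slope, and a lapse offset from both the 0 and 1 bounds; a Gaussian penalty on the weights), the penalty is applied to all weights including the bias term for simplicity, and no optimization or evidence-based selection of the ridge parameter is shown. All function names are hypothetical.

```python
import math

def psychometric_p(x, alpha, beta, gamma):
    """Probability of a rightward choice (assumed three-parameter logistic form)."""
    return gamma + (1.0 - 2.0 * gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def log_posterior(w, X, Y, lam):
    """Bernoulli log-likelihood of choices plus a ridge penalty on w (up to a constant).

    X: list of rows (seven pulses plus a trailing 1 for bias); Y: list of 0/1 choices.
    """
    total = 0.0
    for row, y in zip(X, Y):
        z = sum(wi * xi for wi, xi in zip(w, row))
        # Numerically stable log(1 + exp(z)).
        log1pexp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += y * z - log1pexp
    return total - 0.5 * lam * sum(wi * wi for wi in w)

def normalize(w):
    """Scale a weight vector to unit Euclidean norm."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# With all-zero weights, every trial contributes log(0.5) to the log posterior.
X_demo = [[1.0, 1.0], [2.0, 1.0]]
Y_demo = [1, 0]
print(log_posterior([0.0, 0.0], X_demo, Y_demo, lam=1.0))
print(psychometric_p(0.0, alpha=0.0, beta=1.0, gamma=0.02))
print(normalize([3.0, 4.0]))
```

In practice the weights would be obtained by maximizing this log posterior with a numerical optimizer; that step is omitted here.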
The vector of weights, w, describes the temporal weighting adopted by the subject for a given set of trials. If the individual weights have a similar value, then that implies that the subject had weighted all pulses equally on average. If some weights are larger than others, that implies uneven weighting over time. We summarized temporal weighting by performing linear regression on the 7 weights and using the slope of the fit as a metric of temporal structure, where negative slopes indicate early psychophysical weighting and positive slopes indicate late. Comparisons of temporal weighting profiles across experimental conditions and species were assessed using the slope of the linear fit ± 95% confidence intervals. Wilcoxon sign tests were used to evaluate whether slopes differed significantly from zero. ANOVA was used to assess differences in mean slopes across experimental conditions. Bartlett's test was used to evaluate differences in variance between distributions of slopes across experimental conditions. Table 1 details the statistical tests.
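The slope summary described above amounts to an ordinary least-squares line fit of the seven weights against pulse number. A minimal sketch follows, with a hypothetical function name and made-up example weights.

```python
def weight_slope(weights):
    """Least-squares slope of temporal weights regressed on pulse number (1..n)."""
    n = len(weights)
    pulses = list(range(1, n + 1))
    mean_x = sum(pulses) / n
    mean_w = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in pulses)
    sxw = sum((x - mean_x) * (w - mean_w) for x, w in zip(pulses, weights))
    return sxw / sxx

# Illustrative (made-up) kernels: steadily decreasing, constant, and increasing.
early_like = [0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
flat_like = [0.4] * 7
late_like = early_like[::-1]

for name, w in (("early", early_like), ("flat", flat_like), ("late", late_like)):
    print(name, round(weight_slope(w), 3))
```

A negative slope corresponds to early weighting and a positive slope to late weighting; a slope near zero does not distinguish flat from symmetric non-flat profiles (for example, U-shaped weights).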

Results
Overall, subjects performed more than 145,000 trials of a one-interval motion direction discrimination task. After viewing a sequence of motion pulses, they indicated the net perceived direction by moving their eyes to one of two targets (Fig. 1). In addition to the usual practice of varying the net strength and direction of motion across trials, the temporal statistics of the motion stimulus were manipulated within trials (in different series of sessions). Thus, sessions varied in whether the motion stimulus offered an equal amount of motion information over time (flat-stimulus condition) or whether some epochs contained more motion information than others (early-stimulus and late-stimulus conditions; Fig. 2A-C). This design is amenable to psychophysical reverse correlation such that in addition to computing standard subject performance as a function of stimulus strength, we calculated the psychophysical weights assigned by the subject to the motion stimulus over each epoch. We refer to the resulting weights as the temporal weighting strategy or temporal weighting profile. We found that both human and monkey observers shifted their temporal weighting profile in response to the differential temporal structure of motion statistics across the three stimulus conditions. We first present our subject-averaged results, followed by an examination of the differences between species and individual subjects.

Temporal weighting strategies shift in response to stimulus statistics
Changes in temporal stimulus statistics led to clear shifts in the psychophysical weighting strategy in all subjects. We consider the flat-stimulus condition as a baseline, both because of the stationary statistics of the stimulus over time, and because the vast majority of stimuli used in the study of perceptual decision-making have temporally stationary statistics. In the flat-stimulus condition, subjects exhibited an inclination toward early weighting, with the highest weight on the first three pulses and then a steady decrease as time went on (Fig. 2D). The temporal weighting measurements were complemented by a standard analysis of subject psychometric performance. This analysis indicates that observers were well engaged in the task and based their choices on the net strength and direction of the motion stimulus (Fig. 2G).
During late-stimulus sessions, subjects shifted their strategy to place higher weight on the later pulses, which more often carried high motion information and were therefore more reliably correlated with the final trial outcome. Temporal weights in the late-stimulus condition started low, increasing to a peak at the fifth or sixth motion pulse, followed by a decreased weight on the seventh (final) pulse (Fig. 2E). Although the late-stimulus condition had less motion information in early pulses, and consequently, less motion information overall compared to the flat-stimulus condition, subjects still exhibited standard psychometric performance, basing their choices on the net motion strength and direction (Fig. 2H).
In sharp contrast to the late-stimulus sessions, during early-stimulus sessions, subjects showed steep early weighting, where the first three pulses were weighted the highest followed by a large decrease (Fig. 2F). As with the late-stimulus condition, although the temporal weighting profile shifted markedly, both species exhibited standard psychometric performance (Fig. 2I).
The differences in temporal weighting strategies as a function of stimulus condition were robust and consistent across species (Fig. 3). Temporal weighting in the late-stimulus condition was significantly different from the weighting in the baseline flat-stimulus condition in macaques (Fig. 3A). In addition, no differences in temporal weighting strategy were observed between species within either the early- or late-stimulus conditions. In the flat-stimulus condition, in contrast, macaques exhibited an early weighting that was substantially steeper than that exhibited by the human observers (Fig. 3A). Lastly, the species-averaged psychometric functions exhibit a standard sigmoidal relationship between motion strength and choices in all stimulus conditions, demonstrating that subjects were properly engaged in the task. In the flat-stimulus condition, however, psychophysical performance was slightly decreased relative to performance in the early- and late-stimulus conditions, in both macaques and humans (Fig. 3C, D). In summary, observers performing perceptual decisions shifted their temporal weighting strategy dynamically and placed the most value on pulses with the highest motion expectation, wherever they were located in time.

Ruling out extrema detection as a behavioral strategy
In all experiments, every trial was rewarded based on the true net direction of motion presented across the seven pulses, regardless of the underlying, generating distribution. Thus, integration of the motion information over all pulses would be ideal to maximize accuracy and reward. However, the possibility exists that subjects were not performing conventional temporal integration. For example, subjects could base their decisions on the strongest motion pulse within a trial as opposed to incorporating information from all pulses. Our stimulus design enabled us to perform a post hoc analysis to test whether subjects were performing this strategy of extrema detection (Fig. 3E).
We selected trials in which the direction of the strongest motion pulse (i.e., the pulse with the largest number of signal-carrying Gabor elements) was in conflict with the net direction of motion of the full trial (termed "inconsistent trials"). Most choices in these trials were in favor of the net direction of motion, as opposed to the direction of the extreme single pulse, in both human and macaque subjects (Fig. 3E). We then compared these inconsistent trials to trials that were matched for difficulty but in which the direction of the strongest pulse was the same as the trial's net direction (termed "consistent trials"). If subjects were performing extrema detection, then performance should be worse on inconsistent trials (where the strongest pulse was in the opposite direction of the net) compared to consistent trials. In contrast to this prediction, no subject performed significantly worse on inconsistent trials, demonstrating that extreme pulse strengths did not influence subject choices nonlinearly in their favor and rendering an extrema detection strategy unlikely in this task.
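The trial-sorting step of this analysis can be sketched as follows. This is an illustration rather than the analysis code used in the study: the function name is hypothetical, trials with zero net motion or with equally strong pulses in opposite directions are labeled "undefined" (an assumption, since the text does not say how such trials were handled), and the difficulty-matching step is not shown.

```python
def classify_trial(pulses):
    """Label a trial by whether its strongest pulse agrees with the net direction."""
    net = sum(pulses)
    peak = max(abs(p) for p in pulses)
    has_pos_peak = any(p == peak for p in pulses)
    has_neg_peak = any(p == -peak for p in pulses)
    # Undefined if there is no net direction, no motion at all, or the
    # strongest magnitude occurs in both directions.
    if net == 0 or peak == 0 or (has_pos_peak and has_neg_peak):
        return "undefined"
    strongest_is_rightward = has_pos_peak
    if (net > 0) == strongest_is_rightward:
        return "consistent"
    return "inconsistent"

# Made-up example trials (signed motion strengths in %).
print(classify_trial([30, -10, -10, -10, -5, 0, 0]))  # strongest right, net left
print(classify_trial([30, 10, -5, 0, 5, -10, 5]))     # strongest right, net right
```

Under an extrema detection strategy, choices on "inconsistent" trials would tend to follow the strongest pulse rather than the net direction, which is the prediction the comparison above tests.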

Variability in temporal weighting strategy depends on stimulus condition
When averaged across sessions and subjects, temporal weighting profiles tell a fairly straightforward story: subjects adopt a late weighting strategy for the late-stimulus, an early weighting strategy for the early-stimulus, and a flat-to-early weighting strategy for the flat-stimulus. Here we sought to quantify the weighting strategy at a higher resolution by looking at performance for individual subjects and sessions.
When each subject is considered individually, results were largely consistent with the average weighting profiles reported above. In the late-stimulus condition, human and macaque subjects' weighting was extremely similar (Fig. 4A). All observers exhibited a single-humped psychophysical weighting profile in which peak weight was at pulse five or six, before a drop-off on pulse seven. Even the unexpected drop in weighting of the last pulse was shared. In the early-stimulus condition (Fig. 4B), subject M1 and subject H2 exhibited fairly linear early weighting patterns, and the remaining two human subjects showed slightly higher weights on the second pulse than on the first, though still globally consistent with early weighting. Individual performance in the flat-stimulus condition (Fig. 4C), however, was more variable than in the late and early conditions. In monkey subjects, M1 showed very strong early weighting, while M2 exhibited U-shaped weights. Human subjects deployed generally flat weights on average but did so in idiosyncratic ways compared to the very stereotyped strategies of the early and late conditions. On average, each subject changed their temporal weighting as dictated by early- and late-stimulus conditions compared to the flat-stimulus condition (Fig. 4D). Overall, temporal weighting strategies adopted in the flat-stimulus condition were more variable than those adopted in the early- or late-stimulus conditions at the level of individual subjects.
When each session is considered individually, variability in temporal weighting strategy is evident both between and within each of the three stimulus conditions. To quantify the degree of early versus late single-session weighting, we fitted a line to the seven temporal weights of the observer for each session and used the slope of this fit to summarize the temporal weighting profile: a positive slope indicates late weighting, a negative slope indicates early weighting, and a slope around zero indicates flat (or equal) weighting over time. The distribution of weighting slopes for all experimental sessions in the early-stimulus condition had an average of -0.079 (significantly less than zero, Wilcoxon sign test, p < 0.0001), with no individual session having a slope greater than zero (Fig. 5A). The average slope for all late-stimulus sessions was 0.051 (significantly greater than zero, Wilcoxon sign test, p < 0.0001), with only 2 of 42 sessions having a slope less than zero. These distributions of weighting slopes reveal distinct populations across conditions (ANOVA, p < 0.0001), indicating that even at the resolution of single sessions, distinct strategies were adopted during the early- and late-stimulus conditions. The distribution of weighting slopes from the flat-stimulus condition had a mean of -0.0356, denoting slight early weighting (significantly less than zero, Wilcoxon sign test, p < 0.0001), but also differed in that it had a considerably larger range of results. The standard deviation of flat-stimulus weighting slopes was more than double that of the early- or late-stimulus weighting slope distributions (Bartlett's test, flat-to-early, p < 0.0001; flat-to-late, p < 0.0001), indicating that subjects adopted a larger variety of temporal weighting strategies in this condition.
It is worth noting that some of the variance in all three of the distributions comes from noise inherent to fitting a two-parameter linear model to the seven weights that constitute the temporal weighting strategy; nevertheless, the difference in distribution widths is substantial and therefore likely meaningful.

Relationship between temporal weighting and psychometric performance
We next sought to examine the relationship between temporal weighting strategies and psychometric performance in the direction discrimination task. We compared the slope of the temporal weights to psychophysical threshold (i.e., the motion strength at which subjects performed at 75% correct) for each stimulus condition (Fig. 5B). During the flat-stimulus condition, a negative correlation was present (r = -0.29, p < 0.001), indicating that adopting an early weighting strategy is detrimental to psychophysical performance. The early-stimulus sessions exhibited a positive correlation between temporal weighting slope and psychophysical threshold (r = 0.46, p = 0.038), indicating that in the early-stimulus condition, an early weighting strategy is preferable. Little to no correlation was observed in the late-stimulus sessions (r = 0.05, p = 0.75).
Perhaps more compelling was the relationship between psychophysical threshold and the energy of the temporal weights, where energy was measured as the sum of the squared residuals of each weight from the mean of the seven weights (Fig. 5C). This measurement gives us an estimate of the deviation from a consistent, flat weighting scheme. Here, flat-stimulus sessions showed a strong positive relationship between threshold and weighting energy (r = 0.40, p < 0.0001), demonstrating that during flat-stimulus sessions, employing weights that deviate strongly from temporal uniformity (i.e., have high energy) is detrimental to psychophysical performance. Late-stimulus sessions showed a moderate positive correlation (r = 0.31, p = 0.048), and early-stimulus sessions showed no obvious linear relationship (r = -0.004, p = 0.99). Taken together, larger variability in weighting and higher energy appear to be detrimental to psychometric performance. These effects were most pronounced in the flat-stimulus condition, offering a potential explanation for the slight and unexpected decrease in psychophysical performance during the flat-stimulus relative to early- and late-stimulus conditions (Fig. 3C, D).
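The energy measure defined above is the sum of squared deviations of the seven weights from their mean. A minimal sketch with a hypothetical function name and made-up weights:

```python
def weight_energy(weights):
    """Sum of squared deviations of the weights from their mean."""
    mean_w = sum(weights) / len(weights)
    return sum((w - mean_w) ** 2 for w in weights)

# Made-up kernels: perfectly flat weights have zero energy, while
# monotonic and U-shaped kernels both have non-zero energy.
flat_like = [0.4] * 7
early_like = [0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
u_shaped = [1.0, 0.5, 0.2, 0.1, 0.2, 0.5, 1.0]

for name, w in (("flat", flat_like), ("early", early_like), ("U", u_shaped)):
    print(name, round(weight_energy(w), 3))
```

Unlike the slope of a linear fit, this quantity captures departures from flat weighting regardless of their shape, so a U-shaped kernel with zero slope still has high energy.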

Discussion
We used psychophysical reverse correlation in the context of manipulations of temporal stimulus statistics to examine observers' ability to update their temporal weighting strategy to match the time course of available evidence in a dynamic motion discrimination task. First, we found that when motion strength was systematically varied over time within a stimulus presentation, subjects changed their temporal weighting strategy to weight the periods of strong motion more heavily than those with weak motion. Second, weighting strategies were rather consistent across species and subjects, with the exception of the flat-stimulus condition. Third, session-to-session variability in strategy was greater in the flat-stimulus condition than in the late- and early-stimulus conditions. Each of these findings is discussed in more detail below.

Temporal weighting likely reflects a combination of dynamic sensory reweighting and decision-making mechanisms
The observation of early sensory evidence exerting a larger effect on decisions than late evidence (i.e., early weighting) has been identified in prior work and has been interpreted within the context of a drift diffusion decision-making model. Early weighting is often interpreted as a straightforward consequence of accumulation to a decision bound: sensory data arriving after the bound has been hit does not impact the accumulator (Huk and Shadlen, 2005; Okazawa et al., 2018; Kawaguchi et al., 2018). Just as past work has taken such early weighting as a signature of bounded accumulation, late weighting has been posited to reflect leaky integration. However, such models have been increasingly updated to accommodate either sort of behavioral signature (Usher and McClelland, 2001; Tsetsos et al., 2012; Bronfman et al., 2016). Thus, while time-varying weighting has been identified before, it is almost always discussed as diagnostic of the structure of a decision-making mechanism, i.e., perfect or leaky integration to a bound (fixed or collapsing).
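The bounded-accumulation account of early weighting can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model fitted in this study: pulses are zero-mean unit-variance Gaussian samples, the accumulator is a perfect integrator that ignores all pulses after an absorbing bound is reached, and temporal weights are approximated by the choice-conditioned mean of each pulse. The function name and all parameter values (bound height, trial count, seed) are hypothetical.

```python
import random


def simulate_bounded_weights(n_trials=20000, n_pulses=7, bound=1.5, seed=0):
    """Estimate per-pulse weights for a toy bounded accumulator.

    Returns the mean of (pulse value * choice sign) for each pulse,
    a simple reverse-correlation style estimate of temporal weighting.
    """
    rng = random.Random(seed)
    sums = [0.0] * n_pulses
    for _ in range(n_trials):
        pulses = [rng.gauss(0.0, 1.0) for _ in range(n_pulses)]
        dv = 0.0  # decision variable
        for p in pulses:
            dv += p
            if abs(dv) >= bound:
                break  # bound reached: later pulses no longer matter
        choice = 1.0 if dv >= 0 else -1.0
        for i, p in enumerate(pulses):
            sums[i] += p * choice
    return [s / n_trials for s in sums]


bounded = simulate_bounded_weights()
# An infinite bound is never reached, so every pulse is integrated.
unbounded = simulate_bounded_weights(bound=float("inf"))
```

Under these assumptions the bounded accumulator yields weights that decline across pulses, whereas removing the bound yields an approximately flat profile. The sketch shows only how a static mechanism can produce early weighting; it does not capture the strategic reweighting discussed below.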
The shifts we identified in temporal weighting strategies show that time-varying weighting of a stimulus is a flexible strategy that adapts to the statistical structure of the stimulus. This flexibility highlights the possibility of a more direct reweighting of the sensory signal itself, regardless of downstream impacts, such as a bound or a leak in the sensory integration system. Temporal weighting strategies need not be solely the result of static decision-making mechanisms, but rather could reflect a dynamic strategy for directly weighting the incoming stimulus. Another group made a similar observation (Cheadle et al., 2014), but in contrast to our findings, their results highlighted sequential dependencies within single trials and were interpreted via an appeal to normalization. Such normalization of evidence could be a part of many decision mechanisms, while the strategic shifts we identified here point to the possibility of a more general and flexible mechanism of dynamic reweighting of sensory evidence.
Figure 5 (panels B and C). B, The relationship between psychometric performance (75% psychophysical threshold) and temporal weighting (slope of linear fit to temporal weights), over all sessions across the three stimulus conditions. C, The relationship between psychometric performance (75% psychophysical threshold) and temporal weighting energy (sum of squared errors of temporal weight values from their mean), over all sessions across the three stimulus conditions.
By demonstrating an adaptive weighting strategy that easily shifts toward the most reliable motion information, we suggest that temporal weighting strategies could be interpreted as a gain on the incoming stimulus, rather than byproducts of mechanisms beyond the sensory stage of processing. Indeed, even when presenting a temporally uniform (flat) stimulus, the neural representation of that stimulus will impose its own time-varying signal-to-noise properties on whatever downstream circuits may receive that information for integration or other such computations (Osborne et al., 2004; Churchland et al., 2010; Yates et al., 2017). It is therefore possible that changes in temporal weighting strategy in the presence of temporally dynamic stimuli are due to direct reweighting of the time-varied responses in sensory circuits. It remains to be seen whether the observed time-varying weighting in sensory brain areas can be changed in response to temporal manipulations of the stimulus of the sort we employed, but the well-documented effects of temporal attention in multiple visual cortical areas (Ghose and Maunsell, 2002) lend credence to this hypothesis. Likewise, changes in spike-count correlation structure with task instruction have been shown to reflect feedback in early sensory areas (Bondy et al., 2018), suggesting a possible source for context-dependent reweighting in the current experiments as well. Notably, our data do not rule out the impacts of decision mechanisms. The existence of a bound at later stages of decision formation could still interact with stimulus reweighting. This could be further sculpted by urgency signals or time-varying bounds (Ditterich, 2006; Bogacz et al., 2006; Churchland et al., 2008; Cisek et al., 2009; Okazawa et al., 2018).
In fact, a potential example of such an interaction between stimulus reweighting and a bounded decision mechanism might be present in the late weighting behavior we observed, which often manifested with a seemingly idiosyncratic, low weight on the final pulse. Although subjects clearly down-weighted the first few pulses, and up-weighted pulses 5 and 6, the low weight on the final pulse could be explained as a byproduct of achieving the bound before the end of the stimulus, even in the late-stimulus condition.

Increased variability during the flat-stimulus condition provides insights into previous variability in the literature
Variability in temporal weighting strategy during the flat-stimulus condition was far larger than in either the early- or late-stimulus conditions. This substantial variability is of general relevance to the study of evidence accumulation, because it is typically performed using stimuli that are similar to our flat-stimulus condition, in that their expectation is stationary over time. Although the average weighting strategies for both humans and macaques in the flat-stimulus condition trend toward early weighting, session-by-session analysis of weighting slopes revealed robust variability (Fig. 5). Few if any prior studies have characterized individual session strategies, likely owing to low statistical power of alternate designs that rely on post hoc characterization or infrequent probe trials. Our results suggest that even individual subject averages may gloss over strategic variability within the observer that occurs over sessions. Likewise, even the relatively high-resolution session averages we present here may mask variability over single trials, variability that current trial-based psychophysical methods lack the resolution to resolve. Consequently, all temporal weighting strategies presented here (and elsewhere, as far as we know) are computed as an average over multiple trials, each with a potentially unique temporal weighting strategy.
The large session-by-session variability in weighting strategies observed here may serve to reconcile those presented elsewhere. In the flat-stimulus condition, all time points (i.e., pulses) are equally informative of the trial outcome, and thus the flat-stimulus condition is more forgiving of different temporally biased weighting strategies compared to the early and late conditions, for which only approximately half of the stimulus contained informative evidence on average. Thus, increased variability in weighting strategies during the flat-stimulus condition compared to early- and late-stimulus conditions is likely a consequence of temporally uniform stimulus statistics, a feature of most evidence accumulation studies.
The consistency of temporal weighting across species displayed in the late and early stimulus conditions also suggests that, at least for humans and macaques, interspecies differences need not be a major player in variability of weighting. This is of possible broader interest, for example, in linking to rodent work (Scott et al., 2015; Morcos and Harvey, 2016; Pinto et al., 2017; Odoemene et al., 2017; Licata et al., 2017).
One discrepancy across species was present in the flat-stimulus condition, in which macaque subjects (on average, but most pronounced in M1) displayed an early-weighting strategy (despite flat stimulus expectation) compared to the flat-weighting strategy displayed by humans. This could be for a number of reasons. Macaques performed many more trials and sessions than human subjects, raising the possibility that extensive training may result in faster decisions, based on early epochs of the stimulus. This may be further accentuated by a desire to perform more trials and obtain more liquid reward (a factor not included in experiments with human subjects). While such a strategy does not in fact change the trial duration or, in turn, the speed-accuracy trade-off, it might factor into macaques' behavior. It is noteworthy that the species difference is present only in the flat-stimulus condition, and not the time-variant conditions. We believe this is because the flat-expectation and fixed-duration design is lenient with respect to temporal weighting, granting subjects the liberty to adopt any number of temporal weighting strategies (Fig. 5). This is very different from the time-varying conditions, which place clear constraints on the temporal weighting strategies that would benefit the subject. These considerations may serve to reconcile past conflicting results in different task designs and species and inform new work going forward. ... integration and temporal integration (Katz et al., 2015).
In summary, past work has used reverse correlation and time-varied stimuli to probe temporal integration. In the present study, we used a reverse correlation task in the context of tractable manipulations of stimulus statistics, allowing for direct control over a subject's temporal weighting strategy. Although the neural correlates of such changes remain uncertain, the ability to both manipulate and characterize temporal weighting strategies should provide a powerful tool for neurophysiological experiments to come.