Cross-Modal Integration of Reward Value during Oculomotor Planning

Abstract Reward value guides goal-directed behavior and modulates early sensory processing. Rewarding stimuli are often multisensory, but it is not known how reward value is combined across sensory modalities. Here we show that the integration of reward value critically depends on whether the distinct sensory inputs are perceived to emanate from the same multisensory object. We systematically manipulated the congruency in monetary reward values and the relative spatial positions of co-occurring auditory and visual stimuli that served as bimodal distractors during an oculomotor task performed by healthy human participants (male and female). The amount of interference induced by the distractors was used as an indicator of their perceptual salience. Our results across two experiments show that when reward value is linked to each modality separately, the value congruence between vision and audition determines the combined salience of the bimodal distractors. However, the reward value of vision wins over the value of audition if the two modalities are perceived to convey conflicting information regarding the spatial position of the bimodal distractors. These results show that in a task that highly relies on the processing of visual spatial information, the reward values from multiple sensory modalities are integrated with each other, each with their respective weights. This weighting depends on the strength of prior beliefs regarding a common source for incoming unisensory signals based on their congruency in reward value and perceived spatial alignment.


Introduction
Sensory perception is not merely driven by the incoming sensory inputs but is also affected by top-down information (Gilbert and Sigman, 2007). Among the top-down influences on perception, the effect of reward is particularly important to motivate an agent, facilitate learning, and help the agent to behave adaptively given the limited capacity of both sensory and motor systems. It has been shown that reward acts to modulate selective attention when the subject's performance was directly linked to the monetary reward (Small et al., 2005;Mohanty et al., 2008;Engelmann et al., 2009), even when the monetary reward was no longer task relevant (Della Libera and Chelazzi, 2006;Hickey and van Zoest, 2012;Pooresmaeili et al., 2014;Asutay and Västfjäll, 2016;Luque et al., 2017). Rewards may act as guiding signals for learning and optimizing specific attentional operations (Chelazzi et al., 2013), and not only can increase the salience of associated stimuli (Hickey and van Zoest, 2012), but also enhance the suppression of the distractors (Della Libera and Chelazzi, 2009), change the priority maps of space (Chelazzi et al., 2014), and reduce the intrinsic neural noise in the motor and cognitive control (Manohar et al., 2015).
Nevertheless, although there is a plethora of studies on various aspects of reward, to date evidence showing the effect of reward association in one sensory modality on the perception of another modality and, more importantly, how reward from different sensory modalities interact with each other remains scarce (but see Leo and Noppeney, 2014;Pooresmaeili et al., 2014). This is despite the fact that in natural environments objects are typically multisensory and comprise multiple attributes that could potentially have either similar or distinct associative values, which underscores the importance of understating how information related to reward value is integrated across senses. From a sensory integration point of view, it is well known that our brain combines multiple sensory signals based on cue integration principles into coherent percepts to reduce uncertainty (Ernst and Banks, 2002;Kersten et al., 2004;Knill and Pouget, 2004) or use information in one sensory modality to prioritize information processing in another sensory modality that better serves as an "expert system" according to the task at hand (Macaluso and Driver, 2005). In other cases, however, perception may be dominated by information from one sensory modality, with other modalities being partially or completely disregarded (Colavita, 1974;Spence, 2009).
Critically, in addition to the physical characteristics of the input stimuli, such as their spatial or temporal properties, recent studies also provided evidence for the effect of cognitive factors on audiovisual integration. For example, expectations of stimulus characteristics can reduce reaction times in a task that requires audiovisual integration (Van Wanrooij et al., 2010;Zuanazzi and Noppeney, 2018), and emotional (Maiworm et al., 2012) and motivational (Bruns et al., 2014) factors can both influence cross-modal binding processes.
Together, the physical and cognitive characteristics of multisensory objects have been shown to affect the processing of information across senses. Although several computational models have been put forward to explain the principles governing cross-modal processing based on the physical characteristics of stimuli (Stein and Meredith, 1993;Calvert, 2001;Stein and Stanford, 2008;Chandrasekaran, 2017), the role of cognitive factors and their possible interactions with physical stimulus characteristics has remained underexplored.
In the present study, our aim was to examine the effect of (1) associated reward values and (2) spatial alignment of auditory and visual signals in the integration of reward value across sensory modalities. We systematically manipulated these factors during the course of the following two experiments: experiment 1 manipulated the congruence of reward values between the auditory and visual stimuli, and experiment 2 manipulated both reward value congruency and spatial congruency.
The experiments were based on a visually driven saccade task that involved oculomotor interference created by a distractor. Saccades provide a reliable readout of cross-modal interactions, and previous studies have investigated the influence of bimodal targets or distractors on saccade planning (Corneil et al., 2002;Doyle and Walker, 2002;Campbell et al., 2010;Heeman et al., 2016). Importantly, the transiently associated reward value of a visual distractor has been shown to modulate the magnitude of its interference with the planning of the saccades to a target (Hickey and van Zoest, 2012). We hypothesized that in such a task that highly relies on the processing of visual spatial information, visual rewards dominate the effect of bimodal distractors on saccade planning but reward value is integrated across senses if there is ambiguity as to whether unisensory auditory and visual inputs have a common source or not.

Materials and Methods of experiment 1 Participants
Twenty-four participants (18-34 years old; mean age = 25.0 years, SD = 4.0; 11 males) took part in study 1. One subject was excluded because .25% of trials had to be discarded (see the following analyses paragraph for the trial exclusion criteria). All participants were without any neurologic or psychiatric disorders and had no recent use of drugs, medications, or alcohol dependence, and all had normal or corrected-to-normal sight and normal hearing. Participants gave written informed consent after the experimental procedures were clearly explained to them. They received basic payment plus the payment that was proportional to the accumulated reward value they gained during the reward association part of the experiment. The experiment took ;1.5 h, and participants were compensated by e14. The study was conducted in full accordance with the Declaration of Helsinki and was approved by the local Ethics Committee of the Medical University Goettingen (proposal 15/7/15).

Apparatus
Eye movements were measured using the EyeLink 1000 eye tracker system (SR Research) in a desktop mount configuration, recording the right eye, with a sampling rate of 1000 Hz. The visual stimuli were presented on a calibrated LCD monitor with a refresh rate of 120 Hz, and the auditory stimuli were presented with over-ear headphones (HAD 280 audiometry headphones, Sennheiser). Stimuli were produced by MATLAB and the Psychtoolbox with custom-made scripts.

Materials and procedure
Participants were comfortably seated in a dimly lit room, with their head rested on a chinrest. The experiment included a pavlovian conditioning part to familiarize the participants with the reward associations, and a postconditioning part that required the participant to perform saccadic eye movements after they had learned the reward associations (Fig. 1). Each participant performed one block consisting of 120 trials in the conditioning part, 9 training trials to familiarize with the task needed for the postconditioning part, and five blocks consisting of 135 trials (15 repetitions of each of the nine experimental conditions) for the postconditioning part. Participants were allowed to take a break for exactly 3 min between the blocks. The eye tracker was calibrated at the beginning of the experiment and after each break by the participant fixating at 13 randomly presented targets spanning the display (EyeLink calibration type HV13).
In the conditioning part, participants learned the reward associations of two sounds (600 or 1000 Hz, sawtooth waveform) and two colors (light magenta and light mustard colors, RGB values [171,136,0] and [239,77,255]) by performing a localization task (Fig. 1A). The loudness level of the sounds and the luminance of the two colors had also been equalized (55 dB and 70 cd/m 2 , respectively), and the background color was mid-gray (RGB: [128,128,128]).
At the beginning of each trial, the participant was instructed to maintain eye position on a dot (0.3°) at the center of the screen for 1750-2250 ms. The stimulus display also contained two square-shaped frames to the left and right side of the fixation point (12°eccentricity, 10°s ize, RGB: [80,80,80] with a luminance of 70 cd/m 2 ). After the fixation period, either a sound (600 or 1000 Hz, counterbalanced across trials) was played on the right or left side of a headphone or a color change occurred in one of the two square frames. All sounds and colors were presented for a fixed duration of 250 ms. Participants were instructed to indicate the correct side of the sound or the color change by pressing the right or left arrow on the keyboard within 3.25 s from the onset of the sounds or colors. Participants received feedback about the amount of obtained reward in each trial, presented graphically as the filled area within a bar as well as a sentence indicating the amount of reward in Euro cents (all with a dark gray color, RGB: [80,80,80]). Incorrect answers led to a reward of zero, whereas correct answers were rewarded by a number drawn from a normal distribution, with high/low mean reward values of 24/4, and an SD of 0.5. The pairing of reward (high or low) with the sound or color was counterbalanced across subjects. Presentation of each modality (sound/color), reward (high/low), and side (right/left side relative to the fixation point) was pseudorandomized for each subject, and was programmed in a way that none of them repeated consecutively for more than three trials. At the end of the conditioning part, the participant was required to indicate the sound and the color that gave higher reward by pressing number "1" or "2" on a keyboard. The orders for the presentation of the two sounds and two colors were both randomized. The auditory pure tones in both of our experiments had an intensity of 55 dB and frequencies of 600 or 1000 Hz. We decided to use these frequencies since they produce approximately the same loudness based on their location on equal-loudness contours (Robinson and Dadson, 1956). In our pilot experiments, we tested a range of sound frequencies but did not observe a difference in terms of the strength of saccade trajectory deviations induced by auditory distractors in the main saccade task. The intensity of the sounds was selected based our subjective estimation of a sound level that is most comfortable for the participants, but we did not explore the effect of different sound intensities in terms of their effect on saccade curvatures.
In the postconditioning part, we used a modified version of the task used by a previous study (Hickey and van Zoest, 2012), where participants had to make an eye movement from the fixation point to a target (Fig. 1B). The participant first pressed the space bar to begin a trial. Following 1000 ms of successful fixation (eye position maintained within 1°from the center of fixation point with a size of 0.3°), the fixation point shrank in size (0.15°) and a second fixation period that jittered between 300 and 600 ms had to be fulfilled before the presentation of the experimental stimuli. If eye position offsets .1°from the center of the fixation point were detected any time within the 1300-1600 ms of the designated fixation period, this interval started over from the beginning. Successful fixations were followed by the presentation of a target (a dark gray ring, RGB: [80,80,80], diameter = 1°) that was presented either 7.38°above or below the fixation point, simultaneously with one of the following conditions: (1) with no distractor; (2) with a visual distractor (colored circle with a radius = 0.8°) either 3.6°to the right or left side of the horizontal position of the fixation point; (3) with an auditory distractor presented on the right or left side of the headphone; and (4) with a bimodal distractor, which was composed of both the visual and auditory distractors presented. The visual, auditory, and bimodal distractors were based on the stimuli used in the conditioning part (i.e., the colors and sounds that had been associated with reward values). Thus, distractors comprised nine conditions: no distractor, visual (high or low reward), auditory (high or low reward), and bimodal (high or low visual Â high or low auditory, four conditions in total). Each condition was repeated 75 times across all trials in a pseudorandomized sequence. Upon a saccade to the target (within 3°from the center of the target ring), the ring was filled in and stayed in view for another 100 ms before the disappearance of the target and distractors and the start of the next trial.
Same as the conditioning part, after each postconditioning block, the participant was also required to indicate the sound and the color that gave high reward by pressing the number 1 or 2 on the keyboard, and the orders for the presentation of the two sounds and two colors were both randomized. After the whole experiment, the participants had to fill out a questionnaire regarding the reward values of the stimuli and whether they had any preference toward any stimulus. This questionnaire was used to exclude participants that potentially had misunderstood reward associations (e.g., indicated that high reward sound or color was always presented on the right side).

Data analyses
Data were processed and analyzed by using MATLAB (version R2015a). Trials from the training sessions were excluded from the analyses. For each trial, the first saccade after the target onset was analyzed. To detect the saccades, eye position samples in a trial were smoothed using a Savitzky-Golay lowpass filter with an order of 2 and length of 10 ms (Nyström and Holmqvist, 2010). Saccade onsets were defined as the moment when a sample exceeded an angular velocity threshold of 35°/s and an acceleration of 9500°/s 2 . Saccade offsets were calculated as the first sample where the eye position velocity and acceleration fell below the aforementioned thresholds.
Trials with a saccadic latency ,80 ms or .400 ms, with a saccadic duration of .120 ms were excluded from the analyses. Additionally, trials where the angular deviation of the saccade end point or the saccade amplitude was .2.5 SDs away from the subjects' grand mean (across all trials of the session) were defined as outliers and were removed from the analysis. 1 subject was excluded because .25% of trials were discarded based on these criteria. In the remaining 23 subjects, 94.78 6 2.33% of trials were taken into the analysis.   Figure 1. Behavioral tasks. A, Reward-conditioning task: participants learned the reward associations of two colors and to sounds through a pavlovian conditioning paradigm. They were instructed to indicate the side of a sound or a color change by key presses. Correct answers received a nonzero reward, the amount of which depended on the preceding sound pitch (600 or 1000 Hz) or color (magenta or mustard). The reward was displayed graphically as the filled area of a bar as well as a number that corresponded to the amount of monetary reward in Euro cents. Incorrect answers received a reward of zero. B, Saccadic task in the postconditioning part of experiments 1 and 2: participants were instructed to make an eye movement from the fixation point to the target (a ring). In each trial, the participant started by maintaining eye position on the fixation point for a random duration between 1300 and 1600 ms before a target circle was presented either above or below the fixation point, simultaneously with one of the following conditions: (1) with no distractor; (2) with a visual distractor either to the right or left side of the horizontal position of the fixation point; (3) with an auditory distractor presented on the right or left side of the headphone; and (4) with a bimodal distractor, which was composed of both the visual and auditory distractors presented on the same side relative to the fixation point. In experiment 2, one more condition was included, in which the auditory distractor was presented on the opposite side relative to the visual distractor.
Our statistical inferences and conclusions throughout are based on the mean angular deviation of saccade trajectories, i.e., similar to previous studies of the effect of reward on oculomotor salience (Hickey and van Zoest, 2012). The calculation of the angular deviation of the saccade trajectories for a typical trial is demonstrated in Figure 2, A and B. Angular deviation was defined as the angular distance between the line joining each sampled eye position along the saccade trajectory to the saccade starting point and the straight path between the saccade starting point and the target. The mean angular deviation is thus the average angular deviation of all samples along the saccade trajectory.
Since the target position could be either in the upper or lower hemifield, and the distractor could be either on the left or right side relative to the fixation point, to examine the angular deviation, we rectified the target position to the upper hemifield, and the distractor to the right side relative to the fixation point. In each participant, the no-distractor condition served as baseline, and their averaged Positive values indicate that the saccadic trajectories deviate toward the distractor, while negative values indicate deviations away from the distractor. Visual distractors associated with the high reward value resulted in significantly larger deviations away from the distractor, compared with low-value distractors. D, Baseline corrected saccadic trajectory deviations in all bimodal conditions. From left to right: congruent-reward/both high-value, congruent-reward/both low-value, incongruent-reward/visual low-value and auditory high-value, and incongruent-reward/visual high-value and auditory lowvalue conditions. Saccade trajectories had significantly larger deviations when reward values were congruent compared with when they were incongruent across modalities. Error bars indicate the SEM. Asterisk corresponds to a significance level of p , 0.05. angular deviation was subtracted from the average angular deviation of each distractor condition with the corresponding target location (i.e., in upper or lower hemifield). Even in the absence of distractors, saccades may exhibit trajectory deviations, which are predominantly toward one of the quadrants. Subtraction of the no-distractor trajectories remedies this directional bias. However, our estimation of the baseline using the no-distractor condition could be noisy. Therefore, the baseline was determined by averaging the angular deviation of no-distractor trials that were within a 68% confidence interval around the mean in this condition (absolute z-score, ,1). We obtained the same statistical results as reported here, when the data were analyzed without the subtraction of the baseline. We analyzed the saccade average angular deviations by carrying out a series of two-tailed, paired t tests and repeated-measures ANOVAs (RANOVAs). The first preplanned analysis examined whether the effect of reward on visual distractors observed in previous studies can be reproduced in our setup and in the case of stimuli tested during the nonreward phase. In a second analysis, we focused on the effect of reward across modalities (in cases where the reward was unambiguous; i.e., congruent reward for the bimodal conditions), with factors for modalities (visual, auditory, bimodal) and reward (high vs low). A third analysis focused on reward congruency in the bimodal distractors, with factors for visual reward (high vs low), and auditory reward (high vs low). For each RANOVA, a Mauchly sphericity test (default test in MATLAB) assessed whether the assumption of sphericity is violated. In case of a violation of sphericity, a conservative lower-bound correction was applied to the df and p values of the RANOVAs. Significant effects from the omnibus RANOVAs were further investigated by performing post hoc tests (using the multcompare function in MATLAB with a Bonferroni correction for multiple comparisons). Effect sizes in RANOVAs are reported as partial h 2 (h 2 p ) and in pairwise comparisons as Cohen's d (Lakens, 2013).
Latencies of the saccades (the time between the onset of the target and distractors and the onset of the first saccade), saccade duration (the interval between the onset and the offset of a saccade), and the distance of the end point of the saccades from the target are also reported in Table 1-Table 4.

Results of experiment 1 Conditioning task
In the conditioning part, participants only had to identify the location of the presented color change/sound in each trial. The mean correct responses were 99.6% across 120 trials, with an SD of 0.6% indicating that subjects could accurately localize the colors/sounds. All subjects had learned the reward value associated with each color and sound (based on their responses at the end of conditioning block and the questionnaire data).

Saccadic task The effect of reward value of visual distractors on saccade trajectories
We first examined whether we obtained results similar to those of previous studies (Hickey and van Zoest, 2012) for visual distractors. A preplanned, paired-samples t test on high versus low visual reward showed a significant   Fig. 2A,C), similar to the results observed previously (Hickey and van Zoest, 2012).

Reward congruency
For the analysis of reward congruency, we took all bimodal conditions and performed a RANOVA with visual reward (high or low) and auditory reward (high or low) as independent factors. Note that in this analysis our bimodal conditions are represented as high/high, low/low, low/high, and high/low, with respect to visual or auditory rewards, with congruent conditions represented as high/ high or low/low (Figs. 1B, 2D). The results of RANOVA (lower bound corrected) showed a significant interaction between the visual and auditory rewards (F (1,22) = 4.72, p = 0.040, h 2 p = 0.17), indicating a significant effect of reward congruency. As can be seen in Figure 2D, saccadic trajectories exhibited stronger deviations away from the distractor when the visual and auditory reward values were congruent (mean 6 SD = À0.90 6 0.79) compared with when they were incongruent (mean 6 SD = À0.69 6 0.66). None of the main effects reached statistical significance (all F values , 1 and all p values . 0.5).
Having shown that learned reward associations of distractors affect the trajectory of visually guided saccades, in the following experiment 2 we further examined the interaction between cognitive factors (reward values of the visual and auditory components of the bimodal distractors) and the physical factors (spatial alignment of the visual and auditory components) of the distractor in a similar setup. Furthermore, we included a preconditioning part of the experiment to confirm that the effect of reward exists only after associative learning.

Materials and Methods of experiment 2 Participants
The number of the participants was calculated based on the results of experiment 1 (a two-tailed, paired t test comparing visual high versus low reward value conditions, a = 0.05, b = 0.8; the total number of participants required for this power was 32). In total, 36 participants were recruited, 4 participants were excluded (2 participants did not give correct answers to the questions regarding reward association right after the conditioning part, and were replaced by 2 new participants; 1 participant did not complete the second session; and 1 participant had poor calibration values). The final sample comprised 32 participants (age range, 19-37 years; mean   age = 25.3 years, SD = 4.0; 11 males). The experiment was conducted on 2 consecutive days. On the first day, all participants received e17; on the second day, all participants received basic payment plus the payment that was proportional to the accumulated value they gained during the reward association part of the experiment (with a maximum of e31). All procedures related to the recruitment of the participants were identical to those in experiment 1 and complied with the ethical guidelines as described for that experiment.

Apparatus
Identical to those used in experiment 1.

Materials and procedure
The conditioning tasks used for the learning of reward associations and the saccadic task of experiment 2 were identical to those of experiment 1 (Fig. 1), but the procedures and experimental conditions were modified as follows. Experiment 2 was conducted on 2 consecutive days. On the first day of the experiment, participants first performed a saccadic localization task in which they indicated the location of the stimuli (visual; auditory; or bimodal, with sounds on the same or opposite sides relative to the visual stimulus) by making an eye movement toward their perceived locations. This calibration task allowed us to estimate the perceived location of the target and distractors that were used in the subsequent main saccadic task. Visual stimuli were presented 7.38°a bove or below the fixation point (same as the target location in the saccade task) or 3.6°to the right or left side of the fixation point (same as the distractor location in the saccade task), whereas auditory and bimodal stimuli were presented either to the left or to the right side of the fixation point (same as the distractor locations in the main saccadic task). In total, data on 10 training trials and 200 experimental trials, comprising 20 trials per condition, and location were collected. The stimuli were identical to those used in the later saccade task (and the same as those used in experiment 1), with the exceptions that the color of the disks was always dark gray (RGB: [80,80,80], the same as the target color in experiment 1, and the sound had a pitch (800 Hz) that was different from those used in the later parts.
The saccadic localization task was followed by a preconditioning part that required the participant to perform the saccadic task before the learning of the reward values (Fig. 1B). Each participant performed 13 training trials to familiarize with the saccade task followed by 10 blocks consisting of 150 trials for the preconditioning part (i.e., 1500 trials in total). In the saccadic task, the distractor conditions were one of the following: (1) no distractor; (2) with a visual distractor (0.8°) either 3.6°to the right or left side of the horizontal position of the fixation point; (3) with an auditory distractor presented on the right or left side of the headphone; (4) with a bimodal distractor, which is composed of both the visual and auditory distractors presented on the same side; and (5) with a bimodal distractor, which is composed of both the visual and auditory distractors presented on the opposite side (Fig. 1B). Note that visual and auditory distractors could have been associated with either high or low reward values. Hence, bimodal distractors comprised four conditions with respect to the associated reward value in each modality (i.e., both modalities high, both modalities low, vision high and auditory low, and vision low and auditory high reward values). Since bimodal distractors could be on the same or opposite sides, this resulted in a total of eight bimodal conditions (i.e., four reward pairings Â two sides). Therefore, experimental stimuli comprised 13 conditions, each repeated 120 times (except for the no-distractor condition, which was repeated for 60 trials). The number of trials per distractor condition was increased compared with experiment 1 (120 trials per distractor condition in experiment 2 compared with 75 trials per condition used in experiment 1) to have a more robust estimation of the deviations in the trajectory of the saccades.
On the second day of the experiment, participants first performed the reward conditioning task (160 trials; i.e., 40 trials per reward condition in each modality), followed by the saccadic task (postconditioning part, same procedure, and number of trials as the preconditioning part). At the end of the session on day 2, participants performed the saccadic localization task where they indicated the perceived location of the experimental stimuli with eye movements again (10 training trials and 200 trials of the main task identical to day 1). This was to test whether training on the saccade task alters the perceived location of the target and the distractors.
Similar to experiment 1, we analyzed the mean angular deviations of the saccades by carrying out a series of RANOVAs. The first analysis focused on the effect of reward (in the cases where reward was unambiguous), with factors for modalities (visual, auditory, bimodal same side, bimodal opposite side) and reward (high vs low) and the second analysis focused on contrasting reward congruency and spatial congruency in the bimodal distractors, with factors for visual reward (high vs low), auditory reward (high vs low), and spatial congruency of the visual and auditory distractors (same vs opposite). In all RANOVAs, an extra factor time was included to model preconditioning to postconditioning changes in the angular deviation of the saccades. All statistical procedures (e.g., computation of effect sizes, post hoc tests) were identical to those used in experiment 1.

Results of experiment 2 Conditioning part
The mean correct responses were 99.4% across 160 trials, with an SD of 1%.

Saccadic localization of distractors (calibration task)
In a saccadic localization task, we asked participants to report the perceived location of the distractors by making a saccadic eye movement to them. The main purpose of this task was to estimate how participants localized the bimodal stimuli. Estimations of how well participants discriminated between bimodal stimuli on the same and opposite sides, derived from an independent saccadic localization task, could then be used as a predictor of the observed effects in the main task (see the Results of experiment 2; Fig. 3C).
The data files of one participant in this task were not properly stored by the eye tracker, and therefore this participant was removed from the following analyses (N = 31). Comparison of the distance of the end point of the saccade from the center of the visual distractor (i.e., X = 3.6°and Y = 0) revealed that participants localized bimodal stimuli on the same and opposite sides to the same location as the visual distractors (the median distance in visual, bimodal same side and bimodal opposite side conditions were: 0.36°6 0.1°, 0.34°6 0.1°, 0.36°6 0.11°before learning reward associations, and 0.41°6 1.29°, 0.67°6 1.77°, 0.68°6 1.76°after learning reward associations). A RANOVA analysis with time (two levels: before and after learning reward associations), modality (three levels: visual, bimodal same side, bimodal opposite side) revealed no significant main or interaction effect (all p values . 0.1).
Since in our experiments auditory stimuli were not spatially localized and since localization of auditory stimuli is in general imprecise, we analyzed auditory stimuli separately. As expected, auditory stimuli were localized less accurately, as demonstrated by a larger distance of the end points of saccades from the center of the visual distractor (mean 6 SD of the median distance across all trials in Pre: 5.99°6 5.53°; in Post: 4.05°6 5.35°). There was a significant reduction in the end point distance of the saccade from Pre to the Post (p = 0.002, Cohen's d = 0.6). This finding is likely due to the fact that after extensive training and experience with co-occurring audiovisual stimuli, auditory distractors were assumed to be localized to the same location as the visual distractors.

Saccadic task (main task)
Based on the exclusion criteria (identical to those in experiment 1), 5% of the trials were excluded in the Pre, and 4% of the trials were excluded in the Post.

Reward associations across different modalities
To examine whether reward value has a significant effect on saccade deviations after learning of reward associations, we next performed a RANOVA analysis with time (two levels: Pre and Post), modality (four levels: visual, auditory, bimodal same side, bimodal opposite side), and reward (two levels: high and low) as independent factors and angular deviations of saccades as the dependent factor (Fig. 3A). Note that in this analysis only conditions with unambiguous reward values (i.e., reward-congruent conditions) were included. RANOVA (lower bound corrected) revealed a significant main effects of time (Pre vs Post: F (1,31) = 11.353, p = 0.002, h 2 p = 0.26) and modality (F (3,93) = 26.73, p = 1.31 Â 10 À5 , h 2 p = 0.46). The main effect of time corresponds to significantly larger trajectory deviations in Pre (mean 6 SD across all modalities, À0.24 6 0.3) compared with Post (mean 6 SD, À0.07 6 0.3), indicating that training on the task leads to a reduction in active inhibition of distractor locations and hence decreased trajectory deviations of saccades.
The significant main effect of modality corresponded to a difference across modalities in the magnitude of Most importantly, the RANOVA also revealed a significant interaction between reward and time (F (3,93) = 5.5402, p = 0.02, h 2 p = 0.15). Post hoc analysis revealed that whereas after associative learning high-reward distractors led to significantly stronger trajectory deviations compared with low-reward stimuli (mean 6 SD: À0.12 6 0.32 and À0.03 6 0.34 for high and low reward, respectively; p = 0.047, Cohen's d = 0.36), no difference between these conditions was observed in the preconditioning phase (mean 6 SD: À0.24 6 0.31 and À0.25 6 0.31 for high and low reward, respectively, p = 0.79, Cohen's d = 0.04). This result demonstrates that reward associative learning changes the perceived salience of distractors across all modalities, and that this change in salience cannot be due to the differences in the physical attributes of stimuli before the associative learning.

Reward effects in bimodal conditions: the effect of spatial congruency and reward congruency
We next focused on the reward effects in bimodal stimuli (Fig. 3B). A four-way RANOVA with time (Pre vs Post), visual reward (high vs low), auditory reward (high vs low), and spatial location (same vs opposite sides) as factors was performed on the data of all bimodal conditions. This analysis (lower-bound correction) revealed a significant main effect of time (F (1,31) = 24.292, p = 2.63 Â 10 À5 h 2 p = 0.44) corresponding to a significant reduction in the interference of distractors in Post compared with Pre (mean 6 SD across all bimodal conditions: À0.30 6 0.36 and 0.07 6 0.31 in Pre and Post, respectively). Moreover, significant interactions were found between spatial location and time (F (1,31) = 5.68, p= 0.023, h 2 p = 0.15), and between visual reward and time (F (1,31) = 4.81, p = 0.035, h 2 p = 0.13). As mentioned in the previous section, the significant effect of time suggests that training on the task leads to a reduction in the interference of distractors. This change in the curvature of saccades across time was stronger for bimodal stimuli with auditory and visual components on the same side compared with opposite sides (mean 6 SD of angular deviations in Pre compared with Post: À0.28 6 0.42 vs 0.02 6 0.45 for same side and À0.33 6 0.38 vs À0.17 6 0.34 for opposite sides, respectively). Accordingly, post hoc tests for the significant interaction between spatial location and time showed that whereas before learning of reward values saccade curvatures of bimodal same and opposite sides were not different (p = 0.35, Cohen's d = 0.16), after associative learning these conditions significantly diverged from each other (p = 0.024, Cohen's d = 0.42), perhaps because participants paid more attention to the spatial characteristics of reward-associated sounds and visual stimuli.
Likewise, the significant interaction between visual reward and time was further examined by post hoc tests comparing bimodal distractors comprising a high-reward compared with a low-reward visual stimulus in Post and Pre. There was a trend for a significant difference between bimodal distractors with visual high-reward stimuli compared with low-reward stimuli in Post (mean 6 SD: À0.12 6 0.30 and À0.03 6 0.37, respectively; p = 0.08, Cohen's d = 0.31), whereas in Pre the differences were in the opposite direction and nonsignificant (mean 6 SD: À0.28 6 0.34 and À0.33 6 0.46 for high and low visual reward, respectively, p = 0.35, Cohen's d = 0.16). Although the difference between visual high-and low-reward stimuli in Post did not reach statistical significance, the significant interaction between visual reward and time indicates that the influence of visual reward on saccade trajectories significantly increased after associative learning. Furthermore, the effect of visual reward did not depend on whether bimodal stimuli were on the same or opposite sides (F (1,31) = 1.57, p = 0.22, h 2 p = 0.048, for the three-way interactions among visual reward, side, and time), indicating that across all bimodal conditions, the perceived salience of the bimodal distractors is predominantly determined by the reward value of the visual stimuli. One explanation for the dominance of visual reward is that the reward information of auditory stimuli is disregarded once a substantial conflict in spatial positions of the two modalities is detected. To examine this possibility, we next tested whether the strength of the influence of visual reward was correlated with the degree to which participants discriminated between bimodal stimuli on the same and opposite-sides before the conditioning. Note that, as discussed above, saccade trajectory deviations did not significantly differ between these bimodal conditions in Pre. Nevertheless, for each individual the amount of difference between bimodal-same and bimodal-opposite sides could reveal how well that participant could discriminate between the two configurations. To this end, two robust regression analyses were performed. The first analysis (Fig. 3C, top) examined the degree to which the difference in saccade deviations of same-side versus opposite-side bimodal stimuli in Pre (i.e., a proxy of how well participants discriminated the spatial characteristics of the two types of stimuli) predicted the strength of the visual reward effect (i.e., the difference in trajectory deviations of all bimodal conditions comprising a high-reward visual stimulus and those comprising a low-reward visual stimulus in Post while correcting for the differences in Pre). This analysis revealed a positive slope (regression slope = 0.35, t (30) = 2.13, p = 0.041), indicating that the effect of visual reward was stronger in participants who were more sensitive to the difference in spatial alignment of bimodal stimuli. To ensure that this effect is not driven by intrinsic correlations between saccadic parameters of different distractor conditions in the main task, we performed a second regression analysis, where the data from an independent task (saccade localization task performed before learning reward associations) was used to derive a measure of the discrimination between bimodal stimuli (Fig. 3C, bottom). To this end, we calculated the median distance of the end points of the saccades (relative to the visual distractor location) in bimodal same and opposite sides. Again, we found a significant positive slope between the magnitude of the difference between bimodal same and opposite sides and the visual reward effect (regression slope = 2.32, t (29) = 2.06, p = 0.046). These results indicate that the more a conflict in the spatial information between the two modalities was perceived, the higher the weight that was assigned to the visual compared with the auditory reward value became.

Discussion
In the present study, our aim was to examine the integration of reward value across sensory modalities during a visually guided saccade task. More specifically, we examined whether pavlovian conditioning has an effect similar to those in the previous studies that used transient rewards, and further examined whether factors such as reward congruency and spatial congruency would influence the integration of learned reward values. In experiment 1, we showed that the reward values learned via pavlovian conditioning permeated the stimuli even during the nonreward part, with a significant effect of reward value on the trajectory deviations of visual distractors and a significant effect of the value congruence in bimodal distractors. In experiment 2, we reconfirmed the effect of pavlovian conditioning on the following nonreward part as there was a significant effect of reward value across all modalities, which exclusively occurred after learning of reward associations compared with a preconditioning phase. Importantly, after extensive training with bimodal stimuli and experiencing stimuli that were clearly misaligned and those that were not, bimodal distractors were predominantly modulated by the visual reward value.
Pavlovian conditioning leads to long-lasting changes in perceived salience of reward associated stimuli Our experimental paradigm was similar to a previous study that specifically examined the impact of transient rewards of visual distractors (i.e., the color of a target associated with high or low reward value in a previous trial could serve as a distractor on next, switch trials) on saccadic trajectories (Hickey and van Zoest, 2012). However, in our study, stimulus features were never shared between the target and the distractors, and moreover, the distractors acquired their associative values through a separate pavlovian reward-conditioning task. It is therefore, remarkable that completely task-irrelevant distractors that were never rewarded during the oculomotor task could still compete with the processing of the target and lead to changes in the saccade trajectories.
This pattern of result is in line with a series of recent studies showing that associative reward learning can enhance the salience of rewarded stimuli and lead to valuedriven attentional capture, even in the following nonreward phase (Anderson et al., 2011;Theeuwes and Belopolsky, 2012;Yantis et al., 2012;Chelazzi et al., 2013;Hickey and van Zoest, 2013;Bucker and Theeuwes, 2018;Mine and Saiki, 2018). The impact of associative value during the nonreward phase depends on whether the previously rewarded stimuli serve as targets or distractors. Accordingly, stronger attentional capture by high-reward distractors can interfere with the task goals and decrease performance. Our results extend these previous findings to the domain of cross-modal associative reward value.

Reward value guides multisensory integration during oculomotor planning
In audiovisual integration, there are several bottom-up and top-down factors that help the brain decide whether audio and visual information should be integrated or segregated (Chen and Spence, 2017; but see Cao et al., 2019 for neural correlates). Over decades, numerous studies have shown that stimuli spatial misalignment and temporal asynchrony (Stein and Meredith, 1993) are the main bottom-up factors that disrupt cross-modal integration. Nevertheless, recent studies show emerging evidence that top-down factors such as expectations of stimulus characteristics (Van Wanrooij et al., 2010), emotional and motivational factors (Bruns et al., 2014), and semantic congruence (Doehrmann and Naumer, 2008) can all influence audiovisual integration (Macaluso et al., 2016). When incongruent information is provided by auditory and visual signals in the same event, conflicts may be resolved based on their individual modality precision or modality appropriateness (Welch and Warren, 1980) or the system would call for cognitive control (Pessoa, 2009).
Our results are in line with previous studies suggesting a role of top-down factors in cross-modal binding. In experiment 1, participants were asked to perform the saccadic task without prior experience with our experimental bimodal stimuli. Furthermore, the associated reward values of the visual and auditory stimuli were learned separately, without them being bound as a unity in the conditioning part. Consequently, during postconditioning phase top-down information related to the congruence in associated reward values was instrumental in determining whether co-occurring visual and auditory signals are likely to have a common source. This resulted in the significant effect of reward congruency in experiment 1, which is similar to the findings of a recent study (Sanz et al., 2018), demonstrating the importance of cross-modal semantic congruency in determining the perceptual salience of audiovisual stimuli.
Unlike experiment 1, in experiment 2, participants were asked to perform a localization task and the saccadic task before the conditioning part; hence, they were already exposed to the bimodal stimuli even before learning the reward associations of the stimuli in each modality respectively, and had extensive practice for the saccadic task. Moreover, since auditory stimuli were presented either on the same or the opposite side to the visual distractors, their reliability in predicting a common source for the two modalities was reduced. Accordingly, in experiment 2 the information related to the congruence in reward value was neither novel nor informative for deciding whether unisensory signals had common or separate sources. This led to downweighting of the auditory reward and dominance of visual reward information. The correlation of the visual reward effect with the sensitivity of participants in detecting a spatial misalignment in bimodal stimuli supports the idea that once a strong conflict in spatial relationships of unisensory signals is detected, visual reward wins over the auditory reward. This flexible weighting of reward-related information across sensory modalities is akin to the findings of previous studies that investigated the balance between multisensory integration and unisensory dominance by manipulating physical characteristics of stimuli, demonstrating the pivotal role of the reliability of the signals (Yuval-Greenberg and Deouell, 2009;Noppeney, 2015, 2018).
When a multisensory object is encountered, our brain should solve a binding problem, deciding whether inputs from different sensory modalities emanate from the same multisensory object or not (Körding et al., 2007). In doing so, different types of signals related to the physical properties of stimuli (e.g., their spatial and temporal overlap) or their high-level associative or task-related properties (reward or attention) can inform the brain of whether signals should be integrated or remain segregated. We show a gradient of integration based on the reward values that is strongest when auditory and visual signals are likely to originate from the same object and are congruent in their reward values, and weakest when a spatial misalignment is introduced leading to the dominance of visual rewards. We propose that the integration of reward values across sensory modalities that we observed follows the principles of Bayesian cue integration (Knill and Pouget, 2004;Angelaki et al., 2009). Statistically optimal, Bayesian integration has been repeatedly shown to underlie the integration of information across senses (Ernst and Banks, 2002;Alais and Burr, 2004;Ernst and Bülthoff, 2004). Bayes optimal integration entails a linear averaging between signals weighted by their reliability (i.e., inverse of the variance), resulting in the combined signal having a higher reliability than either of the individual inputs. Accordingly, if one of the input signals has higher uncertainty, more weight is assigned to the other signal. In line with this principle, we observed that whereas in experiment 1 reward values were combined across sensory modalities (demonstrated by an interaction effect between auditory and visual rewards), visual rewards dominated oculomotor responses once audition delivered unreliable information regarding the side of the audiovisual distractors (being either on the same side or opposite sides in experiment 2). This is further supported by showing that the strength of the dominance of visual rewards was correlated with the perceived misalignment of the auditory and visual stimuli (Fig. 3C). Assigning a higher weight to the reward values of vision, which in general has a higher precision in spatial localization (Welch and Warren, 1980), is an optimal strategy to deal with the uncertainty introduced by the spatial misalignment. We note, however, that our conclusion is preliminary since we did not systematically manipulate the spatial reliability of each individual unisensory signal in our current study. This could be done, for instance, by introducing different degrees of spatial noise to each signal (Heron et al., 2004). This way a gradual modulation of the weights assigned to the auditory and visual signals and their rewards as a function of their spatial uncertainty could be induced that allows the testing of predictions of a Bayes optimal integration model.

Limitations and future directions
The strength of the trajectory deviations of saccades in the distractor paradigm used in our study depends on the spatial position of the target, the distractor, and most importantly the target-distractor spatial separation, being stronger for near (i.e., a small angular separation between target and distractor ,20°) than far distractors (Van der Stigchel, 2010). Furthermore, previous studies have shown that the direction of saccade curvatures, being toward or away from the distractor, depends on the target-distractor spatial separation, with deviations toward mainly associated with close distractors and deviations away with far distractors (McSorley et al., 2009). In our experiments, visual distractors were presented on the horizontal meridian, thus having a 90°angular separation from the target location (always presented on the vertical meridian). Auditory distractors were lateralized to the left or right but were not spatially localized. Hence, both visual and auditory distractors were presented at a location that is not typically associated with the strongest saccadic deviation (i.e., far distractors). Nevertheless, we observed a significant deviation of the saccades for visual distractors, irrespective of the rewards. Auditory distractors had, however, no impact on their own, presumably because they were localized to even farther locations than visual distractors (as also supported by the saccadic localization task). An important direction for future studies is to test whether the reward modulation of the oculomotor responses to bimodal distractors depends on the strength of trajectory deviations produced by individual unimodal stimuli (e.g., compared between far and near distractors or between high-or lowsaliency auditory and visual unisensory stimuli).
The literature on cross-modal interactions suggests that audiovisual binding is strongest when visual and auditory stimuli are completely overlapping, in time as well as in space (Stein and Meredith, 1993). In our experiments, however, auditory and visual stimuli were overlapping in time, but auditory stimuli were only lateralized to the left and right but were not spatially localized. While the lack of spatial localization may account for the relatively weak effect of auditory distractors on saccade trajectories, we note that previous studies that attempted to optimally localize auditory stimuli reported comparable effect sizes for trajectory deviations of far auditory and bimodal distractors (Doyle and Walker, 2002;Walker and McSorley, 2008;Heeman et al., 2016). Furthermore, there are a number of naturalistic situations where auditory and visual stimuli are not completely colocalized but nevertheless coherent multisensory experience is elicited, as exemplified by the spatial ventriloquism effect (Chen and Vroomen, 2013). Our findings show that in these latter cases reward value can be used as an additional source of information to decide whether unisensory signals should be combined or separated. Notwithstanding, the lack of spatial localization might have resulted in an underestimation of the rewardrelated effects during oculomotor planning, and future studies will be needed to determine whether and how spatial localization of sounds impacts on the reward effects observed in our study. Moreover, further research is required to know whether the effects we observed are generalizable to other tasks (e.g., perceptual or value-based decision tasks), training procedures (e.g., training with operant conditioning, assignment of reward to combined audiovisual stimuli as opposed to separate reward values assigned to each modality), and task-contingence of reward (e.g., task-relevant vs task-irrelevant rewards).

Possible neural underpinnings
Distractors composed of different sensory modalities had been shown to have an influence on the curvature of the saccades made to a target (Doyle and Walker, 2002;Campbell et al., 2010;Heeman et al., 2016), and we provide evidence for the modulation of these effects by cross-modal reward value. Deviation of saccades toward or away from a visual distractor is proposed to be due to the top-down inhibition of the distractor-evoked responses at the level of superior colliculus, a midbrain structure that determines saccade vectors (Munoz et al., 2000). This inhibition could lead to the concurrent inhibition of neural populations that program the saccades to the target and are under a common motor map with the distractor (McPeek et al., 2003;McPeek, 2006). Our current results suggest that cross-modal reward may provide top-down information for such inhibition. Given the dense connectivity between the brain structures that encode reward value in basal ganglia and superior colliculus (Hikosaka et al., 2014), it is likely that cross-modal integration of reward value occurs at the level of superior colliculus. Future studies are needed to explore this possibility.

Conclusion
In conclusion, our results from the two experiments demonstrate that in a saccadic task that highly relies on the processing of visual spatial information, the reward values from a different sensory modality that does not render reliable spatial information can still be integrated with the reward value of the visual modality. The weighting of reward information depends on the assumptions regarding the source of visual and auditory stimuli. In case of uncertainty regarding the spatial colocalization, the congruence of reward value guides the audiovisual interactions. However, if a conflict in spatial information of unisensory signals is perceived, the modality that is more task relevant, is assigned with a higher weight for its reward information.