Pharmacological Dopamine Manipulation Does Not Alter Reward-Based Improvements in Memory Retention during a Visuomotor Adaptation Task

Abstract
Motor adaptation tasks investigate our ability to adjust motor behaviors to an ever-changing and unpredictable world. Previous work has shown that punishment-based feedback delivered during a visuomotor adaptation task enhances error-reduction, whereas reward increases memory retention. While the neural underpinnings of the influence of punishment on the adaptation phase remain unclear, reward has been hypothesized to increase retention through dopaminergic mechanisms. We directly tested this hypothesis through pharmacological manipulation of the dopaminergic system. A total of 96 young healthy human participants were tested in a placebo-controlled double-blind between-subjects design in which they adapted to a 40° visuomotor rotation under reward or punishment conditions. We confirmed previous evidence that reward enhances retention, but the dopamine (DA) precursor levodopa (LD) or the DA antagonist haloperidol failed to influence performance. We reason that such a negative result could be due to experimental limitations or it may suggest that the effect of reward on motor memory retention is not driven by dopaminergic processes. This provides further insight regarding the role of motivational feedback in optimizing motor learning, and the basis for further decomposing the effect of reward on the subprocesses known to underlie motor adaptation paradigms.


Introduction
Motor adaptation tasks have traditionally been considered as investigating an exclusively implicit mechanism, driven by sensory prediction errors (Tseng et al., 2007) and unaffected by motivational feedback. Contrary to this assumption, beneficial effects of reward and punishment during motor adaptation paradigms have been shown (Shmuelof et al., 2012; Nikooyan and Ahmed, 2015; Gajda et al., 2016; Song and Smiley-Oyen, 2017). Specifically, by using reward- or punishment-based monetary feedback, it was previously shown that the latter accelerated error reduction, while the former increased retention, findings that have recently been replicated, at least partially (Song and Smiley-Oyen, 2017). These results point toward the existence of independent mechanisms underpinning learning and retention, but also toward differential neural processes driving the effects of reward and punishment during motor adaptation tasks.
The reward system relies heavily on dopamine (DA), with DA neurons firing in response to reward and reward predictors (Volman et al., 2013; Schultz, 2016). In rodents, dopaminergic projections to the motor cortex (M1) are required for successful motor skill learning, and in particular for long-lasting storage of motor memories (Molina-Luna et al., 2009; Hosp et al., 2011, 2013). These projections originate mainly from the rostro-lateral ventral tegmental area (VTA) and the rostro-medial portion of the substantia nigra, and thus form part of the reward mesocortico-limbic system (Hosp et al., 2011). Based on this work, it has been hypothesized that reward may improve motor memory retention by promoting plastic changes in M1 through the release of DA (Hosp and Luft, 2013). In addition, administration of levodopa (LD), a precursor of DA, improves motor learning in elderly healthy adults (Flöel et al., 2005a, 2008a, 2008b) and stroke patients (Flöel et al., 2005b; Rösser et al., 2008). Indeed, dopaminergic stimulation coupled with motor rehabilitation has been proposed as a possible tool for improving motor recovery after stroke (Scheidtmann et al., 2001).
While DA is important for learning from rewards, its role in mediating the effect of punishment on adaptation is unclear. Indeed, the "single-dimension" hypothesis proposes that DA (but also any other reward-sensitive circuit) is also sensitive to punishment (Wang and Tsien, 2011), whereas the "two-dimension" hypothesis suggests that some dopaminergic neurons are sensitive only to reward, and others only to punishment (Mirenowicz and Schultz, 1996; Matsumoto and Hikosaka, 2009; Fiorillo, 2013). Moreover, another neuromodulator, namely serotonin, has been associated with the anticipation and/or the delivery of punishment (Deakin and Graeff, 1991; Amo et al., 2014; Dayan and Huys, 2015), thus making the study of punishment-related effects even more complex.
A deeper understanding of the neural mechanisms underpinning the effect of reward and punishment during motor adaptation tasks could inform attempts to potentiate the beneficial impact of motivational feedback on motor learning in health and in clinical rehabilitation. Indeed, the need to target motor recovery at multiple sites along the motor learning network by combining motor robotic therapy with pharmacotherapy and reward learning has already been pointed out (Tran et al., 2016).
We sought to investigate the role of DA during a motor adaptation task under reward or punishment conditions. To this end, we tested young healthy participants in the presence of reward- or punishment-based monetary feedback. In a placebo-controlled double-blind design, we examined the role of DA by either increasing DA availability with LD (DA precursor) or decreasing DA effects with haloperidol (DA antagonist). We predicted that manipulating the dopaminergic system would specifically alter the impact of reward-based feedback on motor memory retention.

Participants
A total of 96 participants [age 18-40 years, 23.34 ± 4.39 years (mean ± SD), n = 60 females] were recruited from the University College London Psychology pool. All fulfilled the following criteria: (1) right-handed (as assessed with the Edinburgh handedness inventory; Oldfield, 1971); (2) 18-45 years old; (3) no self-reported history of major medical disorders or drug abuse; (4) normal or corrected-to-normal vision; (5) no drug allergies; (6) currently taking no medication that would affect the central nervous system or interfere with the absorption of LD; and (7) not pregnant (self-report). The suitability of the participants for the pharmacological protocol was evaluated based on a review of their clinical history by a medical doctor. All participants were naïve to the experimental aims and provided written informed consent. The experiment was approved by the University Research Ethics Committee and was conducted in accordance with the principles expressed in the Declaration of Helsinki.

Cognitive scales
All participants underwent a battery of validated neuropsychological tests. The mini-mental state examination (Folstein et al., 1975) was used as a general cognitive screening tool, while the frontal assessment battery (Dubois et al., 2000) and the Stroop test (Stroop, 1935) assessed executive functions. We also evaluated apathy (apathy evaluation scale; Marin et al., 1991), depression (Beck depression inventory; Beck et al., 1961), and sensitivity to punishment and reward (SPSRQ-20; Aluja and Blanch, 2011). To control for the effect of sleep, participants were asked to sleep at least 6.5 h the night before the study day (Al-Sharman and Siengsukon, 2013). After completion of the session, participants reported whether they thought they had taken the active drug or placebo and scored their levels of alertness on a 10-point visual analog scale (0 = very sleepy, 10 = fully alert). All this information allowed us to control for trait and state differences across groups.

Experimental task
We used a standard visuomotor adaptation reaching task (Taylor and Ivry, 2014). Participants sat with their forehead supported in front of a workstation while holding the handle of a two-joint robotic manipulandum with their dominant right arm. The forearm was stabilized by straps to a molded cast. A horizontal mirror, suspended 2 cm above the hand, prevented direct vision of the arm, but showed a reflection of a screen mounted above. Online visual feedback regarding hand position was provided by a white cursor (0.3 cm in diameter) projected onto the screen. In some blocks, the online visual feedback of the cursor was removed (no vision).
The task consisted of center-out fast ballistic movements to visual targets. Participants had to initially bring the cursor within a 1 cm² starting box located in front of the body's midline. Once the cursor was within the starting box, a white 0.5 cm² target appeared pseudorandomly in one of six positions arrayed radially at 6 cm from the start (15°, 75°, 135°, 195°, 255°, and 315° clockwise, with 0° representing 12 on a clock). Participants were instructed that, when ready, they should make a fast, accurate, "shooting" movement through the target, avoiding corrections. As the cursor crossed an imaginary 6-cm radius circle centered at the starting position, a green dot appeared at the endpoint. After 500 ms, the manipulandum returned the hand back to the start. Participants were instructed to try to maintain a constant and relatively fast speed across the whole experiment. To encourage this, the target turned red or blue if the movement duration was >300 or <100 ms, respectively. This time criterion was used only as feedback; trials were not removed based on it (see below). In the adaptation trials, the manipulandum introduced a visuomotor perturbation, in which the cursor position was rotated 40° clockwise from the actual hand position (Fig. 1A,C).
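The geometry of the perturbation can be illustrated with a short Python sketch. It is not the authors' task code: the function names are hypothetical, and the coordinate frame (x to the right, y away from the body, angles measured clockwise from 12 o'clock) is an assumption chosen to match the clock convention described above.

```python
import math

TARGET_ANGLES_DEG = [15, 75, 135, 195, 255, 315]  # clockwise from 12 o'clock
TARGET_RADIUS_CM = 6.0
ROTATION_DEG = 40.0  # clockwise visuomotor rotation used in adaptation trials


def clock_to_xy(angle_cw_deg, radius=TARGET_RADIUS_CM):
    """Convert a clockwise-from-12-o'clock angle to (x, y) in cm."""
    a = math.radians(angle_cw_deg)
    return radius * math.sin(a), radius * math.cos(a)


def xy_to_clock(x, y):
    """Convert (x, y) to a clockwise-from-12-o'clock angle in [0, 360)."""
    return math.degrees(math.atan2(x, y)) % 360.0


def rotate_clockwise(x, y, angle_deg):
    """Rotate a point clockwise about the start position (the origin)."""
    a = math.radians(angle_deg)
    return (x * math.cos(a) + y * math.sin(a),
            -x * math.sin(a) + y * math.cos(a))


def cursor_from_hand(hand_x, hand_y, rotation_deg=ROTATION_DEG):
    """Cursor position shown to the participant for a given hand position."""
    return rotate_clockwise(hand_x, hand_y, rotation_deg)
```

Under these assumptions, a hand movement aimed straight at the 15° target produces a cursor that crosses the 6-cm circle at 55°, so hitting that target requires aiming the hand at 335° (40° counterclockwise of the target).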
Both the points received on a trial-by-trial basis and the cumulative score of the block were shown. Participants were informed that points had a monetary value (3.47 pence/point) and depended on performance. Participants in the reward groups started with £0 and could earn up to £30 based on the accumulated points, while those in the punishment groups were given an initial amount of £30 and lost money based on the cumulative negative points.
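As a plausibility check on these figures, the following Python sketch works through the arithmetic for the reward groups. It assumes that points could be earned only on the 216 perturbation trials of the adaptation phase and that the maximum per trial was four points (the value given in the definition of success rate); the function name is hypothetical.

```python
PENCE_PER_POINT = 3.47
MAX_POINTS_PER_TRIAL = 4   # assumed per-trial maximum in the reward groups
FEEDBACK_TRIALS = 216      # assumed: perturbation trials of the adaptation phase


def max_reward_payout_gbp(trials=FEEDBACK_TRIALS,
                          points_per_trial=MAX_POINTS_PER_TRIAL,
                          pence_per_point=PENCE_PER_POINT):
    """Largest possible payout in GBP if every trial earned the maximum."""
    return trials * points_per_trial * pence_per_point / 100.0


payout = max_reward_payout_gbp()
```

Under these assumptions the ceiling is 864 points, or about £29.98, which is consistent with the stated maximum of £30.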

Experimental protocol
The study was composed of four phases (Fig. 1B). Participants initially performed a baseline (baseline 1) composed of one block (72 trials) with visual feedback and one block with no visual feedback (no vision) of the cursor (nor of the endpoint green dot). After the drug/placebo administration and the waiting time, a second equivalent baseline (baseline 2) was performed. The cursor was then rotated 40° clockwise and reward/punishment feedback was provided as described above for three blocks (adaptation). To avoid the perturbation beginning at the start of a block, the first adaptation block started with six baseline trials with veridical visual feedback and no reward/punishment feedback, followed by 72 trials with the perturbation. Finally, participants were exposed to 216 trials (three blocks) with no perturbation and no visual feedback (retention). Again, to avoid this change in context starting at the beginning of a block, the last adaptation block finished with six retention trials (i.e., there were 78 trials in the last adaptation block, followed by two retention blocks of 72 trials and one block of 66 trials). The removal of visual feedback of the cursor restricts re-learning and therefore the observed gradual drift back to baseline performance represents memory retention (Kitago et al., 2013). Each block was separated by a short (<1 min) rest period.
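The block structure described above can be written out as a small Python table to check that the trial counts add up. This is only a bookkeeping sketch; the block labels and trial-type names are hypothetical, and the order of the vision and no-vision blocks within baseline 2 is assumed to mirror baseline 1.

```python
from collections import Counter

# Each entry: (block label, [(trial type, number of trials), ...])
SCHEDULE = [
    ("baseline 1, vision",    [("baseline_vision", 72)]),
    ("baseline 1, no vision", [("baseline_no_vision", 72)]),
    ("baseline 2, vision",    [("baseline_vision", 72)]),
    ("baseline 2, no vision", [("baseline_no_vision", 72)]),
    ("adaptation 1",          [("baseline_vision", 6), ("adaptation", 72)]),
    ("adaptation 2",          [("adaptation", 72)]),
    ("adaptation 3",          [("adaptation", 72), ("retention", 6)]),
    ("retention 1",           [("retention", 72)]),
    ("retention 2",           [("retention", 72)]),
    ("retention 3",           [("retention", 66)]),
]


def block_sizes(schedule):
    """Number of trials in each block, in order."""
    return [sum(n for _, n in segments) for _, segments in schedule]


def trials_per_type(schedule):
    """Total number of trials of each type across the session."""
    totals = Counter()
    for _, segments in schedule:
        for trial_type, n in segments:
            totals[trial_type] += n
    return totals
```

With this layout, the adaptation and retention phases each contain 216 trials, and the first and last adaptation blocks each contain 78 trials.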

Randomization and blinding procedure
Participants were randomly allocated to one of six groups (n = 16 per group): reward-LD (R-LD), punishment-LD (P-LD), reward-haloperidol (R-Halo), punishment-haloperidol (P-Halo), reward-placebo (R-Pl), and punishment-placebo (P-Pl). After baseline 1, subjects received either 100 mg of the DA precursor LD (plus 25 mg of carbidopa), 2.5 mg of the D1/D2-antagonist haloperidol, or placebo. We used a nonselective DA-receptor antagonist as motor learning depends on both D1- and D2-receptor mechanisms (Molina-Luna et al., 2009), probably through the activation of the intracellular phospholipase-C pathway in M1. To coincide with the peak plasma concentration of LD (Nutt and Fellman, 1984) and haloperidol (Tomassini et al., 2016), the task was restarted after a 60-min wait for the LD and placebo groups and after a 120-min wait for the Halo groups. During the waiting period participants sat quietly in the laboratory. The randomization and administration of the drug were performed by a medical doctor, whereas the examiner and participants were naïve to the aim of the experiment and blinded to the drug/placebo status. All participants were told that they would receive either a placebo tablet or an active drug (LD or haloperidol). The doses and administration times were similar to previous studies that have shown clear behavioral and neurophysiological effects for LD and haloperidol. All participants fasted for at least 2 h preceding drug/placebo intake to prevent interference with drug absorption (Nutt and Fellman, 1984). No adverse events were reported.

Data analyses
The 2D (x, y) position of the hand was collected through custom C++ code at a sampling rate of 100 Hz. Movement onset was defined as the point at which radial velocity crossed 10% of peak velocity. Movements were considered terminated when the cursor breached the 6-cm target perimeter. Performance was quantified using angular reach direction (AD, °), i.e., the difference between the target angle and the angular hand position at the end of the movement (Hadipour-Niktarash et al., 2007). During veridical feedback, the goal was for reach direction to be 0°. With the visuomotor perturbation, reach direction had to compensate; i.e., for a +40° (clockwise) visuomotor rotation, a reach direction of -40° (counterclockwise) was required. To adjust for between-subject baseline directional biases (Ghilardi et al., 1995), AD was corrected by subtracting the average AD of the first baseline 1 block from the trials with cursor vision, and the average AD of the second baseline 1 block ("no vision") from the trials with no visual feedback of the cursor (Krakauer et al., 2005).
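A minimal Python sketch of the AD measure and the baseline correction follows. The sign convention (hand angle minus target angle, clockwise positive, wrapped to [-180°, 180°)) is an assumption chosen so that full compensation for a +40° rotation yields -40°, as stated above; the function names and coordinate frame are hypothetical and do not reproduce the authors' analysis code.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def clock_angle_deg(x, y):
    """Clockwise-from-12-o'clock angle of a point (x right, y away from body)."""
    return math.degrees(math.atan2(x, y)) % 360.0


def angular_direction(end_x, end_y, target_deg):
    """AD in degrees: hand angle at movement end relative to the target.

    Positive values are clockwise of the target, negative counterclockwise.
    """
    return wrap_deg(clock_angle_deg(end_x, end_y) - target_deg)


def baseline_correct(ad_values, baseline_ad_values):
    """Subtract the mean AD of the matching baseline block from each trial."""
    bias = sum(baseline_ad_values) / len(baseline_ad_values)
    return [ad - bias for ad in ad_values]
```

For example, a hand endpoint on the 6-cm circle at 335° when reaching to the 15° target gives an AD of -40° under this convention. Vision trials would be corrected with the baseline 1 vision block and no-vision trials with the baseline 1 no-vision block.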
Figure 1. Task and paradigm. A, Task. Participants made 6 cm reaching movements to a target. Visual feedback was perturbed by a 40° clockwise rotation (R) in the adaptation phase (rotation). In no vision trials, the cursor and the hand position corresponded but there was no visual feedback. B, Study protocol. Participants completed 72 trials of baseline training with veridical visual feedback, followed by 72 baseline trials with no visual feedback (no vision). Drug (LD/haloperidol/placebo) was then administered and participants waited the corresponding waiting time (1 h for LD or placebo, 2 h for haloperidol). After that, the two baseline blocks were repeated (baseline 2). During adaptation, visual feedback was perturbed 40° clockwise for 216 trials (three blocks). To avoid this starting abruptly at the beginning of a block, the first adaptation block started with six baseline trials with veridical visual feedback, followed by 72 trials with the perturbation. Then, participants were exposed to 216 trials (retention, three blocks) with no perturbation and no visual feedback. Again, to avoid a context change at the beginning of a block, the last adaptation block finished with six retention trials (i.e., a total of 78 trials in the last adaptation block, followed by two retention blocks of 72 trials and one block of 66 trials). C, Hand trajectories toward each target of one representative subject in the R-Pl (violet) and P-Pl (blue) groups. From left to right: last trial toward each target of baseline 1, last trial toward each target of adaptation, last trial toward each target of retention.
Reaction time (RT; time between target appearance and movement onset) and movement time (MT; time between movement onset and movement end) were calculated for each trial. Trials in which AD exceeded 20° or was less than -60° (Tanaka et al., 2009), or in which MT or RT exceeded 1000 ms or was <100 ms, were removed. This accounted for 1.67% of trials. Epochs of all kinematics were created by averaging across six consecutive trials (Krakauer et al., 2005). For the purpose of analysis, the first six trials of the first adaptation block (which were still without perturbation, as described in Experimental protocol) were annexed to baseline 2, while the final six trials of the last adaptation block (without vision and no perturbation, see Experimental protocol) were considered as retention.
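The exclusion criteria and the epoch averaging can be sketched in Python as follows. The thresholds are those stated above; how removed trials were handled within an epoch is not specified, so this sketch assumes that an epoch is the mean of whatever valid trials remain in each run of six. Function names are hypothetical.

```python
def is_valid_trial(ad_deg, rt_ms, mt_ms):
    """Apply the stated exclusion criteria to one trial."""
    ad_ok = -60.0 <= ad_deg <= 20.0
    rt_ok = 100.0 <= rt_ms <= 1000.0
    mt_ok = 100.0 <= mt_ms <= 1000.0
    return ad_ok and rt_ok and mt_ok


def epoch_means(values, valid_flags, epoch_size=6):
    """Average consecutive trials into epochs, skipping removed trials.

    Assumption: an epoch with no valid trials is returned as None.
    """
    epochs = []
    for start in range(0, len(values), epoch_size):
        chunk = values[start:start + epoch_size]
        flags = valid_flags[start:start + epoch_size]
        kept = [v for v, ok in zip(chunk, flags) if ok]
        epochs.append(sum(kept) / len(kept) if kept else None)
    return epochs
```

With this definition, a 72-trial block yields 12 epochs.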
Data and statistical analyses were performed using MATLAB (version R2013a, The MathWorks) and IBM SPSS (version 21.0). Differences between demographics, cognitive scores, baseline MT, RT, and AD were evaluated by separate one-way ANOVAs (quantitative data) or by χ² or Fisher's exact test (proportions).
We first performed repeated-measure ANOVAs for each study phase (adaptation, retention) by comparing AD with drug (placebo*LD*haloperidol) and feedback (reward*punishment) as between-subject factors, and blocks as a within-subject factor (three blocks in adaptation, three blocks in retention).
A model-based analysis was also performed. Specifically, we applied a single-rate state-space model (SSM; Thoroughman and Shadmehr, 2000; Donchin et al., 2003; Tanaka et al., 2009) to each participant's entire dataset. This has the advantage of estimating learning and retention rates from all available data, with no arbitrary selection of time points or trials of interest. The SSM took the following form:

$$y_n = -z_n^t$$
$$z_{n+1}^t = A z_n^t + B (r_n - z_n^t)$$

Here $y_n$ represents the angular direction (relative to target) on trial $n$; $z_n^t$ is the state of the learner, i.e., the current estimated visuomotor mapping (rotation) associated with target $t$; $r_n$ represents the visuomotor rotation that was imposed on trial $n$; and $r_n - z_n^t$ is the error in the visuomotor mapping (i.e., cursor error). The learning rate ($B$) determines how much of the cursor error $(r_n - z_n^t)$ is adapted for. In addition, the visuomotor mapping slowly forgets at a rate determined by the scalar parameter $A$ (decay rate). During blocks with no visual feedback (no vision, retention phase) we assume that $B = 0$. Therefore, in this case, the system forgets with constant $A$ (with larger values signifying increased retention). Using the MATLAB function fmincon, for each subject we estimated $A$ and $B$ to minimize the squared error between the trial-by-trial predicted hand direction ($y_n$) and the actual trial-by-trial hand direction, subject to the constraints $0 < A < 1$ and $-1 < B < 1$. The model's goodness of fit was determined using $R^2$. As the assumption of normality was violated, we examined between-groups differences for the A and B parameters using an adjusted rank transform (ART) test (Leys and Schumann, 2010; Chan, 2014), with feedback (reward*punishment) and drugs (placebo*LD*haloperidol) as independent variables.
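The following Python sketch illustrates how such a single-rate model can be simulated and fitted. It is not the authors' implementation: it collapses the target-specific states into a single state, and it replaces MATLAB's fmincon with a coarse grid search over the same bounds purely to keep the example dependency-free. All function and argument names are hypothetical.

```python
def simulate_ssm(A, B, rotation, has_feedback):
    """Predicted angular direction y_n for each trial of a single-rate SSM.

    rotation[n]     : imposed visuomotor rotation r_n in degrees
    has_feedback[n] : False on no-vision trials, where B is set to 0
    Assumption: one state shared across targets, starting at 0.
    """
    z = 0.0
    predicted = []
    for r, fb in zip(rotation, has_feedback):
        predicted.append(-z)              # y_n = -z_n
        b = B if fb else 0.0              # no error-based learning without vision
        z = A * z + b * (r - z)           # z_{n+1} = A z_n + B (r_n - z_n)
    return predicted


def squared_error(A, B, rotation, has_feedback, observed):
    """Sum of squared differences between predicted and observed direction."""
    predicted = simulate_ssm(A, B, rotation, has_feedback)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_ssm_grid(rotation, has_feedback, observed, step=0.02):
    """Grid search for A in (0, 1) and B in (-1, 1); a stand-in for fmincon."""
    n = int(round(1.0 / step))
    best = None
    for i in range(1, n):                 # A = step, ..., 1 - step
        A = i * step
        for j in range(-n + 1, n):        # B = -1 + step, ..., 1 - step
            B = j * step
            err = squared_error(A, B, rotation, has_feedback, observed)
            if best is None or err < best[0]:
                best = (err, A, B)
    return best[1], best[2]
```

A quick sanity check is to simulate data with known parameters (for instance A = 0.98 and B = 0.20 over a short baseline, adaptation, and no-vision retention schedule) and confirm that the fit recovers them. With real, noisy data a gradient-based constrained optimizer would be the more natural choice than a grid.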
All data were tested for normality using the Shapiro-Wilk test and nonparametric tests were used when warranted (as indicated in the tables and text). Homogeneity of variance was evaluated using the Levene test, and the Welch test was used when this assumption was violated. Greenhouse-Geisser (if ε < 0.75) or Huynh-Feldt (if ε > 0.75) corrections were used when sphericity was violated (Mauchly's test). The Tukey post hoc test was used when warranted. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those reported in previous studies. Significance level was set at p < 0.05. Effect sizes were provided by phi for the χ² test, Cohen's d for t tests or r score for the Mann-Whitney test, partial η² for ANOVA, and η² for the Kruskal-Wallis H test.
AD was similar across groups during baseline 1 and baseline 2 (Table 2; Fig. 2A). Apart from the R-Pl group showing slower RTs than the P-Pl group during baseline 2 (p = 0.017, Tukey post hoc test), MTs and RTs were similar across groups for baseline 1 and 2 ( [F(2,90) = 3.92, p = 0.023, η² = 0.08]. A post hoc Tukey test revealed that this was due to longer MTs in the haloperidol versus the LD groups (p = 0.020). Therefore, although we observed a significant drug effect on RT and MT during retention, this was consistent across reward and punishment.
Figure 2A shows the AD across epochs in the six groups. All groups showed clear error-reduction in response to the visuomotor perturbation with a main effect of block [F(1.1,101.8) = 708.9, p < 0.001, η² = 0.89, Greenhouse-Geisser corrected]. However, contrary to our expectations, this was not differentially affected by punishment versus reward [F(1,90) = 1.69, p = 0.196, η² = 0.018], or by drug status [F(2,90) = 0.69, p = 0.505, η² = 0.015].
R-LD, n = 16; P-LD, n = 16; R-Halo, n = 16; P-Halo, n = 16; R-Pl, n = 16; P-Pl, n = 16; Education, participants with ≥15 years of education; BMI, body mass index (kg/m²); MMSE, mini-mental state examination; FAB, frontal assessment battery; AES-S, apathy evaluation scale, self-administered version; BDI, Beck depression inventory; SP, sensitivity to punishment; SR, sensitivity to reward; Money, GBP (£) received at the end of the session; Success rate, number of trials in which the maximum amount of points was received (i.e., four points in the reward groups and zero points in the punishment groups).
Values depict the mean ± SEM by averaging over consecutive epochs for each participant and group. A one-way ANOVA was used to compare mean values across groups during baseline 1 and baseline 2.
A multifactorial ANOVA was used to compare mean values across groups, with feedback (reward*punishment) and drug (LD*haloperidol*placebo) as between-groups factors. R-LD, n = 16; P-LD, n = 16; R-Halo, n = 16; P-Halo, n = 16; R-Pl, n = 16; P-Pl, n = 16; RT, in ms; MT, in ms; AD, in °; Fb, feedback; D, drug. Significant results are bold.
Figure 2. Reward was associated with greater retention than punishment, independently of LD, haloperidol or placebo. A, Epoch (average across six trials) AD (°) during baseline, adaptation, and retention for the six groups (n = 16 each). The x-axis indicates the number of epochs. The plots represent mean ± SEM. The solid vertical line indicates the wait period after the administration of drug or placebo. The dashed vertical lines indicate the actual beginning and end of the first and last adaptation blocks (i.e., the first adaptation block started with six baseline "vision" trials, and the last adaptation block finished with six retention no vision trials). B, Bar graph on the left: average (±SEM) AD (°) for each group during the retention phase. Black dots represent average AD for each participant. The reward groups retained significantly more than the punishment groups [F(1,90) = 9.8, p = 0.002, η² = 0.098] irrespective of drug status. Bar graph on the right: model parameter A (decay rate, higher values signifying larger retention, average ± SEM) across groups [ART test, F(1,90) = 5.51, p = 0.021, η² = 0.058]. Black dots represent average decay rate for each participant; *p < 0.05. C, Epoch (average across six trials) AD (°) during baseline, adaptation, and retention for the combined reward groups (n = 48) versus the combined punishment groups (n = 48).
These results did not change when average MTs and RTs during retention were added as covariates; specifically, there still was a nonsignificant effect of drug [MANOVA: F(1,84) = 0.51, p = 0.602, η² = 0.011]. In addition, a power analysis (G*Power 3.1.9.2) revealed that our sample size gave us 91% power (1-β) to detect a significant block*feedback*drug interaction effect (n = 96, η² = 0.037, effect size f = 0.196). This suggests that the nonsignificant effect of drug status on retention was unlikely to be due to an insufficient sample size or to drug-related differences in RT and MT.
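The effect size f quoted for the power analysis follows from the reported variance-explained value through the standard conversion f = sqrt(η² / (1 - η²)). The Python sketch below reproduces only that arithmetic; it does not reproduce the G*Power computation itself, and the function name is hypothetical.

```python
import math


def cohens_f_from_eta_squared(eta_squared):
    """Convert (partial) eta squared to Cohen's f."""
    return math.sqrt(eta_squared / (1.0 - eta_squared))


f_interaction = cohens_f_from_eta_squared(0.037)
```

Rounded to three decimals, the value 0.037 gives f = 0.196, matching the effect size reported for the interaction.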

Model-based analysis confirmed model-free results
To estimate learning and retention rates from all available data, we also performed a model-based analysis by applying a single-rate SSM to each participant's entire dataset (Thoroughman and Shadmehr, 2000; Donchin et al., 2003; Tanaka et al., 2009). The model was able to explain a substantial amount of variance (R²: 0.79, range 0. 0.80, 0.80, 0.79, 0.78, 0.80, , with a similar goodness of fit across groups [F(5,90) [ART test, F(2,90) = 1.08, p = 0.344, η² = 0.023] but was influenced by feedback [ART test, F(1,90) = 5.51, p = 0.021, η² = 0.058], with reward leading to greater retention than punishment (Fig. 2B). The interaction between feedback*drug status was also not significant [ART test, F(2,90)
In summary, we showed that reward caused greater retention of the newly acquired motor memory relative to punishment. However, LD and haloperidol had no effect on either error-reduction or retention.

Discussion
The aim of this study was to investigate the role of DA during a visuomotor adaptation task under reward or punishment conditions. Although we showed that reward-based feedback enhanced motor memory retention relative to punishment, this was unaffected by dopaminergic medication that either increased (LD) or decreased (haloperidol) DA availability in the brain. In particular, it did not decrease the effect of reward on motor memory retention.
Various hypotheses, not necessarily mutually exclusive, could explain these results. First of all, the lack of significance could be due to a small sample size. However, as described previously, a power analysis revealed that we achieved 0.91 power to detect a significant block*feedback*drug interaction effect, thus suggesting that the nonsignificant effect of drug status on retention was not simply due to an insufficient sample size.
Secondly, it could be that the doses of LD and/or haloperidol used here were too low to have a behavioral effect. Indeed, previous evidence has suggested a dose-response effect of LD in regard to learning enhancement (Knecht et al., 2004). However, the oral doses used here have previously been employed in a range of studies, demonstrating clear behavioral and neurophysiological effects for both LD and haloperidol (Knecht et al., 2004; Pleger et al., 2009; de Vries et al., 2010; Adam et al., 2013). Despite this, as we did not observe any consistent global drug effect on behavior, it is possible that the doses used here were not sufficient to modulate the dopaminergic system. To address this possibility, future studies should investigate at least two tasks: an "experimental" one and another in which a consistent drug effect has already been demonstrated. Additionally, the between-subjects pharmacological approach, despite the advantage of directly manipulating the dopaminergic system, is nonspecific, and the administered drugs have widespread effects (Crockett and Fehr, 2014). In particular, it is well known that haloperidol acts at all levels of the central nervous system, primarily at subcortical levels, and that it also has strong antiadrenergic and weaker peripheral anticholinergic activity. Therefore, strictly speaking, our approach did not selectively examine the dopaminergic pathways, and more studies are needed to directly and specifically investigate the dopaminergic circuitry in motor learning. Moreover, the genetic variability of DA receptors and DA cleaving or metabolizing enzymes could influence the effect of exogenous dopaminergic stimulation (Pearson-Fuhrhop et al., 2013). This confound could have been ruled out by using a within-subjects design; however, this is not advisable in motor learning tasks as it introduces the problem of powerful carry-over effects (Crockett and Fehr, 2014; Huberdeau et al., 2015a).
Thirdly, as all participants in this study received a tablet (either a placebo or an active drug), a placebo effect on retention and error-reduction in the placebo groups cannot be ruled out. Future work might wish to include a group in which no tablet is provided to discount this possibility.
Finally, it could be that the effect of reward on motor memory retention observed here is not DA dependent. On this point, we have to highlight that the current adaptation task does not disentangle the differential effects of positive or negative reinforcement on the multiple learning processes now known to influence performance (Smith et al., 2006; Taylor et al., 2014; Bond and Taylor, 2015; Huberdeau et al., 2015b; McDougle et al., 2015). For example, when participants made no vision movements we instructed them to "reach toward the target even without vision." As this instruction was relatively ambiguous, the effect of reward on retention could reflect either participants maintaining the use of an explicit strategy or a highly stable reinforcement-based learning process (Smith et al., 2006; Taylor et al., 2014; Bond and Taylor, 2015; Huberdeau et al., 2015b; McDougle et al., 2015). Although the role of DA in reinforcement-based mechanisms is well known (Schultz, 2013), its importance for other cognitive processes is less clear. For example, Anguera et al. (2010) showed that visuomotor adaptation performance was correlated with a participant's mental rotation working memory capacity. Interestingly, LD medication does not seem to improve the ability of patients with Parkinson's disease (PD) to perform a mental rotation working memory task (Crucian et al., 2014). Therefore, it is possible that the positive effects of reward on motor memory retention are dependent on a cognitive ("frontal") process unaffected by DA.
Punishment showed no effect on error-reduction during visuomotor adaptation
Contrary to previous findings, we found no benefit of punishment on error-reduction in response to the perturbation. In both studies, we used a visuomotor perturbation, but the magnitude of the perturbation was larger here than in our previous paper (40° vs 30°). As the degree of explicit awareness is known to increase as a function of perturbation size (Werner et al., 2015), error-reduction here may have involved a greater use of explicit strategies. With smaller perturbations, the motivational salience of punishment (Kahneman and Tversky, 1979; De Martino et al., 2010) may motivate participants to use a strategy (and thus show faster error-reduction) in circumstances in which strategies are more difficult to develop. Conversely, in the present study punishment may have been unable to further potentiate an already well-represented explicit strategy. Therefore, we think that punishment may enhance performance during adaptation paradigms by increasing the use of a cognitive strategy, and that this becomes overtly beneficial in cases where this strategy is not yet optimally implemented. However, we are aware that this would not explain all the results in the literature (Song and Smiley-Oyen, 2017), and further examination of the effects of punishment on motor learning is clearly warranted. Additionally, the lack of effect makes it hard to evaluate the role of DA in motor learning under punishment.

Implications and conclusions
This is the first direct pharmacological investigation of the role of DA in motor adaptation tasks under reward or punishment. Our results failed to support the hypothesis that reward increases motor retention through dopaminergic pathways. We here provide further evidence for a role of reward-feedback in adaptation tasks, but future work is needed to decompose the impact of reward on the various subprocesses involved in motor adaptation, and on the neural pathways underlying these mechanisms. In particular, this study highlights the critical role played by task instructions in investigating learning processes. In our specific case, for example, making subjects aware that the rotation was removed in the retention phase would have allowed us to decompose, and individually measure, the explicit component (disengaged by such explicit instructions) from the implicit one (Werner et al., 2015). Alternatively, we could have restricted the expression of explicit strategies through the use of a forced-RT paradigm. Although we suggest that reward could be acting on the explicit component, there is also evidence that reward can modulate implicit adaptation processes (Kojima and Soetedjo, 2017). Therefore, how reward and dopaminergic pharmacological manipulation influence the explicit and implicit components of adaptation is an exciting question for future research.