The relationship between reinforcement and explicit control during visuomotor adaptation

Codol, Olivier; Holland, Peter J.; Galea, Joseph M.

doi:10.1038/s41598-018-27378-1

Article
Open access
Published: 14 June 2018

Scientific Reports volume 8, Article number: 9121 (2018)

Abstract

The motor system’s ability to adapt to environmental changes is essential for maintaining accurate movements. Such adaptation recruits several distinct systems: cerebellar sensory-prediction error learning, success-based reinforcement, and explicit control. Although much work has focused on the relationship between cerebellar learning and explicit control, there is little research regarding how reinforcement and explicit control interact. To address this, participants first learnt a 20° visuomotor displacement. After reaching asymptotic performance, binary, hit-or-miss feedback (BF) was introduced either with or without visual feedback, the latter promoting reinforcement. Subsequently, retention was assessed using no-feedback trials, with half of the participants in each group being instructed to stop aiming off target. Although BF led to an increase in retention of the visuomotor displacement, instructing participants to stop re-aiming nullified this effect, suggesting explicit control is critical to BF-based reinforcement. In a second experiment, we prevented the expression or development of explicit control during BF performance, by either constraining participants to a short preparation time (expression) or by introducing the displacement gradually (development). Both manipulations strongly impaired BF performance, suggesting reinforcement requires both recruitment and expression of an explicit component. These results emphasise the pivotal role explicit control plays in reinforcement-based motor learning.

Introduction

In a constantly changing environment, our ability to adjust motor commands in response to novel perturbations is a critical feature for maintaining accurate performance¹. These adaptive processes have often been studied in the laboratory through the introduction of a visual displacement during reaching movements². The observed visuomotor adaptation, characterized by a reduction in performance errors, was believed to be primarily driven by a cerebellar-dependent process that gradually reduces the mismatch between the predicted and actual sensory outcome (sensory prediction error) of the reaching movement^1,3,4. Cerebellar adaptation is a stereotypical, slow and implicit process and therefore does not require the individual to be aware of the perturbation to take place^5,6. However, a single-process framework cannot account for the great variety of results observed during visuomotor adaptation tasks⁷. Specifically, it has recently been shown that several other non-cerebellar learning mechanisms also play a pivotal role in shaping behaviour during adaptation paradigms such as explicit control^8,9 and reward-based reinforcement^{10,11,12,13,14,15}.

Explicit control usually consists of employing simple heuristics such as aiming off target in the direction opposite to a visual displacement, to quickly and accurately account for it⁵. However, this requires explicit knowledge of the perturbation, which in turn usually requires experiencing large and unexpected errors^8,16,17,18. Explicit control contrasts with cerebellar adaptation in that it is idiosyncratic⁹, volitional, and can lead to fast adaptation rates¹⁹. Importantly, in this work, we consider explicit control as the contribution to performance that can be suppressed (or expressed) by participants upon request²⁰, as opposed to the additional requirement of being able to verbalise a strategy. Critically, cerebellar adaptation takes place regardless of the presence or absence of any explicit process, even at the cost of accurate performance⁵.

More recently, another putative mechanism contributing to motor adaptation has been proposed, through which the memory of actions that led to successful outcomes (hitting the target) is strengthened, and therefore more likely to be re-expressed^14,21. Such reinforcement is considered to be an implicit process, but distinct from cerebellar adaptation in that it is driven not by sensory prediction error but by task success or failure^10,11. To examine this phenomenon, several studies employed a binary, hit-or-miss feedback (BF) paradigm, which promotes reinforcement over cerebellar processes^11,12,22. For example, in one study, participants receiving only binary feedback following successful adaptation expressed stronger retention than participants who had received a combination of visual and binary feedback¹². The authors argued this could be due to greater involvement of a reinforcement-based process that is less susceptible to forgetting¹².

With the multiple processes framework of motor adaptation, the question of interaction between the distinct systems becomes central to understanding the problem as a whole, and it remains an under-investigated question for reward-based reinforcement. In the decision-making literature, it has long been suggested that two distinct “model-based” and “model-free” systems interact^23,24 and even require communication to be optimal^25,26. Interestingly, model-based processes share many characteristics with explicit control during motor adaptation, in that they are both more explicit, rely on an internal model of the world (explicit control^27,28; model-based decision-making²⁹), and are closely related to working memory capacity (explicit control^30,31; model-based decision-making^32,33) and pre-frontal cortex processes (explicit control³⁰; model-based decision-making^26,34). On the other hand, the concept of reinforcement in motor adaptation comes directly from the model-free systems described in the decision-making literature²⁸, and is often labelled as such. It is considered more implicit, relies on immediate action-reward contingencies and is thought to recruit the basal ganglia in both cases (visuomotor adaptation²²; decision-making²³). Despite these interesting similarities, and unlike the interaction between model-based and model-free decision-making, the relationship between explicit control and reinforcement during visuomotor adaptation paradigms remains largely unknown. Some evidence of this relationship comes from a recent study which showed that participants needed to experience a large reaching error in order to express a reinforcement-based memory¹⁸. In addition, there is a wealth of evidence showing that explicit control also requires experiencing large errors^16,17,27. Thus, it is possible that the formation of a reinforcement-based memory requires, or at least benefits from, some form of explicit control³⁵.

To address this possibility, we first examined the contribution of explicit control to the reinforcement-based improvements in retention following binary feedback^12,22. Secondly, we used a forced reaction time (forced RT) paradigm³⁶ to investigate the importance of being able to express explicit control when encountering binary (reinforcement-based) feedback.

Results

Experiment 1: Explicit control occurs during reinforcement-based retention

We first sought to investigate the role of explicit control in the retention of a reinforced visual displacement memory. In experiment 1, participants made fast ‘shooting’ movements towards a single target (Fig. 1a). After a baseline block involving veridical vision (60 trials) and an adaptation block (75 trials) where a 20° counter-clockwise (CCW) visuomotor displacement was learnt with online visual feedback (VF), participants experienced the same displacement for 2 blocks (asymptote blocks; 100 trials each) with either only binary feedback (BF group, Fig. 1b, top) to promote reinforcement, or BF and VF together (VF group, Fig. 1b, bottom). Following this, retention was assessed through 2 no-feedback blocks (100 trials each), during which both BF and VF were removed. Before these no-feedback blocks, half of the participants were told to “carry on” as they were (“Maintain” group) and the remaining ones were informed of the nature of the perturbation, and to stop re-aiming off target to account for it (“Remove” group). Thus, there were four groups: BF-Maintain, BF-Remove, VF-Maintain and VF-Remove (N = 20 for each group).

Group performance is shown in Fig. 2a. All groups showed similar baseline performance (Fig. 2b; H(3) = 4.59, p = 0.20; see Methods for detailed information on statistical analysis), and had fully adapted to the visuomotor displacement prior to the asymptote/reinforcement blocks (average reach angle in the last 20 trials of adaptation, Fig. 2c; H(3) = 2.56, p = 0.46). Interestingly, at the start of the first asymptote block, participants in both BF groups showed a dip in performance, effectively drifting back toward baseline before adjusting back and returning to plateau performance. This “dip effect” was completely absent in the VF groups, and has previously been observed independently of our study when switching to BF after a displacement is abruptly introduced¹². Therefore, success rate was compared across groups separately for the first 30 trials (Fig. 2d) and the remaining 170 trials (Fig. 2e) of the asymptote blocks. Both BF groups exhibited lower success rates than the VF groups in the early asymptote phase (H(3) = 46.79, p < 0.001, Tukey’s test p < 0.001 for BF-Maintain vs VF-Maintain and vs VF-Remove, and for BF-Remove vs VF-Maintain and vs VF-Remove). This was also seen in the late asymptote phase (H(3) = 31.29, p < 0.001, Tukey’s test p < 0.001 for BF-Maintain vs VF-Maintain and vs VF-Remove, and for BF-Remove vs VF-Maintain and vs VF-Remove), although performance greatly improved for both BF groups compared to the early phase (Z = 3.692 and Z = −3.81 for BF-Remove and BF-Maintain, respectively, p < 0.001 for both). Of note, both BF groups expressed a slight decrease in reach angle at the beginning of the second asymptote block, but removing this second dip does not qualitatively alter the result (H(3) = 27.46, p < 0.001, Tukey’s test p < 0.001 for BF-Maintain vs VF-Maintain and vs VF-Remove, and p < 0.01 for BF-Remove vs VF-Maintain and vs VF-Remove). Finally, no across-group difference in RTs or movement duration was found during the asymptote blocks (Supplementary fig. S1a,b).
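
As a rough illustration of the early/late split used for the success-rate comparison, the sketch below computes one participant's success rate over the first 30 asymptote trials and over the remaining 170. The function name `split_success_rates`, the boolean trial encoding, and the example hit counts are assumptions made for illustration; they are not taken from the study's analysis code.

```python
def split_success_rates(hits, n_early=30):
    """Return (early, late) success rates for one participant.

    hits: sequence of booleans, one per asymptote trial (True = target hit).
    n_early: number of initial trials treated as the early phase.
    """
    if len(hits) <= n_early:
        raise ValueError("need more trials than the early window")
    early = hits[:n_early]
    late = hits[n_early:]
    return sum(early) / len(early), sum(late) / len(late)


# Hypothetical participant: 200 asymptote trials, with 12 hits among the
# first 30 trials and 136 hits among the remaining 170.
hits = [True] * 12 + [False] * 18 + [True] * 136 + [False] * 34
early_rate, late_rate = split_success_rates(hits)
```

Computing the two rates per participant before any group comparison keeps the early dip from being averaged away by the much longer late phase.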

Participants then performed a series of 2 no-feedback blocks. Similar to Shmuelof et al.¹², we assessed retention by looking at the last 20 trials of the second block. However, our results are fundamentally the same irrespective of the trials used to represent retention. Overall, the BF-Maintain group showed greater retention relative to all other groups, largely maintaining the reach angle values achieved during the asymptote phase, whereas there was no difference between the other groups (Fig. 2f; H(3) = 27.66, p < 0.001, Tukey’s test p = 0.001 for BF-Remove vs BF-Maintain and p < 0.001 for BF-Maintain vs both VF groups; p = 0.6 for BF-Remove vs VF-Remove; p = 1 for BF-Remove vs VF-Maintain; p = 0.68 for VF-Maintain vs VF-Remove). We therefore replicated previous work which showed that BF led to enhanced retention of a visual displacement when compared to VF¹². However, this effect of BF was abolished by asking participants to remove any re-aiming strategy they had developed (BF-Remove). This suggests the increase in retention following BF was mainly a consequence of the greater development and expression of explicit control.

Experiment 2: Re-aiming is necessary for maintaining performance under binary feedback

If the conclusion from our first experiment is correct, then successful asymptote performance under BF only should be dependent on the ability to develop and express explicit control. Therefore, in experiment 2 we restricted participants’ capacity to recruit an explicit component by using a forced RT adaptation paradigm^36,37,38 (Fig. 1c, see Methods for details). Specifically, two groups adapted to a 20° CCW visuomotor displacement by performing reaching movements to 4 targets (Fig. 1d), with the amount of available preparation time (i.e. time between target appearance and movement onset) being restricted. A first group was allowed to express slow RTs (SRT; RT constraints were 930 to 1100 ms after target onset; N = 10), while the second group was only allowed very fast RTs (FRT; 130 to 300 ms; N = 10; Fig. 1c and Supplementary fig. S2a). The latter condition has been shown to prevent time-demanding explicit processes such as mental rotations necessary to express re-aiming in reaching tasks^36,38,39. Critically, this paradigm prevented expression of re-aiming, but may not reliably prevent development of an explicit component. Therefore, to ensure any between-group difference was task-dependent and not related to inter-individual differences in awareness or understanding of the task, we explained in detail the nature of the perturbation and the optimal policy to counter it. In addition, a third condition was designed in which participants were kept unaware of the visual displacement by introducing the perturbation gradually^16,18 (N = 10; Fig. 1d, bottom), and were not informed of any optimal policy to employ. Participants in this group were given no RT constraint whatsoever. Finally, it should be mentioned that a large portion of participants in the Gradual group reported noticing a slight perturbation by the end of the adaptation block when informally asked after the experiment. However, they substantially underestimated its amplitude, reporting effects of the order of 5° at best. Nevertheless, for the sake of simplicity we will qualify this group as “unaware”, although we acknowledge they reported very partial, reduced awareness of the perturbation.
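
The preparation-time constraint can be expressed as a simple window check. This is a minimal sketch, assuming the quoted windows are inclusive at both ends and that an out-of-window reaction time means the trial has to be repeated; `RT_WINDOWS_MS` and `rt_is_valid` are hypothetical names, not part of the study's software.

```python
# Allowed reaction-time windows in ms after target onset, per group.
# None marks a group with no RT constraint (the Gradual group).
RT_WINDOWS_MS = {
    "SRT": (930, 1100),
    "FRT": (130, 300),
    "Gradual": None,
}


def rt_is_valid(group, rt_ms):
    """Return True if a reaction time falls inside the group's window."""
    window = RT_WINDOWS_MS[group]  # raises KeyError for an unknown group
    if window is None:
        return True
    low, high = window
    # Assumption: both bounds are inclusive.
    return low <= rt_ms <= high
```

For example, `rt_is_valid("FRT", 450)` is `False`, so under the assumption above that trial would be repeated, whereas any reaction time is accepted for the Gradual group.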

Overall group performance is displayed in Fig. 3a. During baseline, average reach direction was similar for all groups (Fig. 3b; H(2) = 0.45, p = 0.79). To examine whether the FRT and SRT groups displayed different rates of learning during adaptation, we applied an exponential model to each participant’s adaptation data. Note that this was not done for the Gradual group, whose adaptation rate was restricted by the incremental visuomotor displacement. Surprisingly, we found no significant difference between the FRT and SRT groups’ learning rates (U = 74; p = 0.34; Supplementary fig. S2b). Indeed, one would expect the SRT group to express faster learning since they can express strategies to account for the perturbation^19,36,38,40. This is most likely a consequence of the small size of the perturbation encountered (i.e. 20°), which leaves less margin for strategic re-aiming^20,40,41. At the end of the adaptation block, all groups adapted successfully, with no significant difference in reaching direction (Fig. 3c; H(2) = 2.34, p = 0.31). However, despite the lack of statistical significance, the mean reach direction for the FRT group was slightly under 15° (mean: 14.87°), which represents the limit of the reward region in the subsequent block. We discuss the implications of this later.
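
The exact form of the exponential model is not given in this section, so the sketch below assumes a common saturating form, y(t) = a * (1 - exp(-b * t)), where a is the asymptotic reach angle and b the learning rate. It fits the model to one participant's reach angles by scanning b over a grid and solving for a in closed form; the function name, the grid, and the synthetic data are all illustrative assumptions rather than the authors' procedure.

```python
import math


def fit_exponential(reach_angles, rates=None):
    """Least-squares fit of y_t = a * (1 - exp(-b * t)) for t = 0, 1, 2, ...

    Returns (a, b). The rate b is searched on a grid; for each candidate b
    the best amplitude a is the ordinary least-squares solution.
    """
    if len(reach_angles) < 2:
        raise ValueError("need at least two trials to fit the model")
    if rates is None:
        rates = [i / 1000 for i in range(1, 1001)]  # 0.001 to 1.000
    best = None
    for b in rates:
        basis = [1 - math.exp(-b * t) for t in range(len(reach_angles))]
        denom = sum(x * x for x in basis)
        a = sum(x * y for x, y in zip(basis, reach_angles)) / denom
        sse = sum((y - a * x) ** 2 for x, y in zip(basis, reach_angles))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]


# Noise-free synthetic adaptation curve over a 75-trial block (hypothetical).
true_a, true_b = 20.0, 0.1
data = [true_a * (1 - math.exp(-true_b * t)) for t in range(75)]
a_hat, b_hat = fit_exponential(data)
```

The fitted rates, one per participant, are the quantities that would then be compared between the FRT and SRT groups.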

Participants then experienced an asymptote block with BF, similar to the first experiment, with the exception that hit-miss feedback was provided with a green tick and a red cross onscreen, because audio BF would potentially temporally align with movement initiation cues and confuse participants. Several other studies have already employed visual BF successfully^11,22,42. During asymptotic performance, where participants were restricted to binary feedback, the SRT group showed a striking ability to maintain performance within the rewarded region whereas the two other groups clearly could not (Fig. 3d; H(2) = 17.5, p < 0.001, Bonferroni-corrected (see Methods), Tukey’s test p < 0.001 vs FRT and p = 0.001 vs Gradual). Next we compared success rates across groups for early BF trials (Fig. 3e) and the remainder of BF trials (Fig. 3f) independently. Early success rates were significantly lower for the Gradual group compared to the SRT group (H(2) = 9.2, p = 0.02, Bonferroni-corrected, Tukey’s test p = 0.011), and a similar but non-significant trend was observed between the FRT and SRT groups (Tukey’s test p = 0.059). The absence of a significant difference in early success rate between the FRT and SRT groups cannot be explained by average reach angles, as the FRT group actually expressed a larger decrease in reach angle during that timeframe compared to the Gradual group (Fig. 3a). Rather, the greater variability in reach angle within individuals in the FRT as opposed to the Gradual group is likely to cause this result (average individual variance; FRT: 47.5; Gradual: 18.9). However, the difference in success rate during the remaining trials reached significance for both the FRT and Gradual groups compared to the SRT group (H(2) = 16.67, p < 0.001, Bonferroni-corrected, Tukey’s test p < 0.001 for both FRT and Gradual). Surprisingly, no dip in performance was observed for the SRT group in the early phase of the BF blocks, suggesting that informing participants of the perturbation and how to overcome it at the beginning of the experiment is sufficient to prevent this drop in reach angle.
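
The "average individual variance" quoted above is a within-participant measure: a variance is computed for each participant's reach angles and those variances are then averaged across the group. A minimal sketch follows, assuming the sample (n - 1) variance estimator, which this section does not specify; the function name and data are hypothetical.

```python
from statistics import mean, variance


def average_individual_variance(group_angles):
    """Mean of per-participant reach-angle variances.

    group_angles: list with one list of reach angles (deg) per participant.
    Assumption: sample variance (n - 1 denominator) is used per participant.
    """
    return mean(variance(angles) for angles in group_angles)


# Two hypothetical participants: one variable, one perfectly consistent.
group = [[10.0, 12.0, 14.0], [20.0, 20.0, 20.0]]
avg_var = average_individual_variance(group)
```

Unlike a variance pooled over all trials in a group, this measure is insensitive to differences in mean reach angle between participants, which is why it isolates trial-to-trial variability.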

Next, to ensure the low end adaptation reach angles expressed by the FRT group did not explain the low success rates, we removed every participant who expressed less than 15° reach angle at the end of the adaptation from each group (e.g.⁴³). Henceforth, we refer to those participants as non-adapters, as opposed to adapters. This procedure resulted in 1, 5 and 2 participants being removed in the SRT, FRT and Gradual groups, respectively. Performance for the adapters was fundamentally the same as the original groups (Fig. 4a), except for end adaptation reach angles, which were now all above 15° (Fig. 4b; SRT 17.0 ± 1.2; FRT 16.9 ± 1.2; Gradual 16.7 ± 1.4). Specifically, the SRT-adapter group still showed a clear ability to remain in the rewarded region during binary feedback performance (asymptotic blocks), whereas the other two adapter groups could not (Fig. 4c; H(2) = 14.0, p = 0.002, Bonferroni-corrected, Tukey’s test p = 0.028 vs FRT-adapter and p = 0.001 vs Gradual-adapter). Because the full groups (i.e. non-Adapters included) did not express a drop in success rate during early asymptote trials, we compared Adapters’ success rates during asymptote as a whole, rather than splitting them between early and late performance. The SRT-adapter group still displayed greater success than the Gradual-adapter group (Fig. 4d; H(2) = 13.74, p = 0.002, Bonferroni-corrected, Tukey’s test p < 0.001). However, the difference between the SRT-adapter and the FRT-adapter group was now non-significant (Tukey’s test p = 0.12). Despite this, the reach angle differences clearly show that successful binary performance remained strongly affected by one’s capacity to develop and express explicit control even for the successful adapters, as shown by the Gradual-adapter and FRT-adapter groups, respectively (Fig. 4a).
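
The adapter/non-adapter split amounts to a threshold on each participant's end-of-adaptation reach angle. The sketch below assumes those angles are stored in a dictionary keyed by participant label; `split_adapters`, the labels and the values are hypothetical.

```python
REWARD_LOWER_BOUND_DEG = 15.0  # lower limit of the rewarded region


def split_adapters(end_adaptation_angles, threshold=REWARD_LOWER_BOUND_DEG):
    """Split participants by end-of-adaptation reach angle (deg).

    Participants below the threshold are treated as non-adapters;
    everyone else is an adapter.
    """
    adapters = {p: a for p, a in end_adaptation_angles.items() if a >= threshold}
    non_adapters = {p: a for p, a in end_adaptation_angles.items() if a < threshold}
    return adapters, non_adapters


# Hypothetical end-of-adaptation angles for four participants.
end_angles = {"p01": 17.2, "p02": 14.6, "p03": 15.0, "p04": 9.8}
adapters, non_adapters = split_adapters(end_angles)
```

Applying the same threshold to every group, rather than only to the FRT group, keeps the comparison between adapter groups symmetric.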

Finally, since trials were reinitialised if participants failed to initiate reaching movements within the allowed timeframe, we compared the average occurrence of these failed trials between the FRT and SRT groups (Supplementary fig. S2c) to ensure any between-group difference cannot be explained by this. Both groups expressed similar amounts of failed attempts per trial (U = 100, p = 0.73). In addition, movement times were significantly faster across all blocks for the FRT group compared to the SRT group (Supplementary fig. S2d; H(2) = 11.78, p = 0.005, Tukey’s test p = 0.002), although they remained strictly under 400 ms for all groups as in the first experiment (Fig. 1c). This difference is to be expected due to the tendency to express faster velocities in movements with rapid initiation⁴⁴. RTs expressed by the Gradual group were between the SRT and FRT constraints (Supplementary fig. S2a; Gradual group RT range 385 to 1610 ms).

Overall these findings demonstrate that preventing explicit control by restricting its expression or making participants unaware of the nature of the task results in the partial incapacity of participants to perform successfully during binary feedback performance. It should be noted, however, that performance did not reduce back to baseline entirely, as participants in both the FRT and Gradual groups were still able to express intermediate reach angle values in the order of 10 to 15°.

Discussion

Previous work has led to the idea that BF induces the recruitment of a model-free reinforcement system that strengthens and consolidates the acquired memory of a visuomotor displacement^10,12,22. Here, we investigated the role of explicit control in the context of BF, and our results suggest that it may have a more central role in explaining some BF-induced behaviours than previously expected. In the first experiment, the increased retention observed in the BF-Maintain group was suppressed if participants were told to “stop aiming off target” (BF-Remove group). In the second experiment, preventing expression of explicit control by using a secondary task or preventing its development with a gradual introduction of the perturbation resulted in participants being unable to maintain accurate performance during BF blocks. This suggests an explicit component is necessary for performing a BF reaching task, at least within the present study’s experimental design.

The initial performance drop observed at the introduction of BF for both BF groups suggests that participants cannot immediately account for a visuomotor displacement they have already successfully adapted to¹². A possible explanation is that the cerebellar memory is not available anymore, most likely because removing VF results in a context change, which is known to prevent retrieval and expression of an otherwise available memory^45,46,47. Considering this, the restoration of performance observed after this dip could not be explained by recollection of the cerebellar memory, suggesting another mechanism took place. Two possible candidates to explain this drift back are model-free reinforcement^10,11,12,22 and explicit processes^7,8,41.

Reinforcement learning is usually considered to operate through experiencing success^10,11,48. It is thus difficult to argue for a reinforcement-based reversion to good performance during BF because participants in the trough of the dip did not experience a large amount of success (Supplementary fig. S3), if any. Furthermore, participants experienced little “plateau” performance during the previous block, making formation of a model-free reinforcement memory unlikely, because it is considered a rather slow learning process as opposed to model-based reinforcement^10,49; though the adaptation block remains longer compared to Shmuelof and colleagues¹². On the other hand, both BF groups experienced a large amount of unexpected errors during this drop, which may promote a more explicit approach^{16,17,18,27,35}. In line with this, the SRT group in the forced RT task, which had been informed of the displacement and of the right policy to counter it, did not express such a dip when starting the BF block.

The forced RT task addresses this question more directly, and shows that impeding explicit control with a secondary task^36,38 prevents participants from restoring performance over BF blocks, confirming our interpretation. Interestingly, both the FRT and Gradual groups did not show a return to baseline during asymptote. Likely, the FRT group was aware of the optimal policy, and could partially express it, leading to these intermediate reach angles. In line with this, previous work on forced RT paradigms shows that adapting the constraints based on each individual’s baseline proficiency at this task more efficiently prevents explicit control³⁸. Furthermore, even in the presence of BF, the Gradual group showed a striking inability to find the optimal policy, suggesting the lack of structural understanding of the task strongly impeded their exploration^35,48. This overall incapacity of the Gradual group to express an efficient explorative approach is consistent with previous findings showing that rewarding success alone, without providing any explanation of the task structure, is not sufficient to make participants reliably learn an optimal policy^48,50.

Previous studies employing the forced RT paradigm have shown it usually leads to slower learning rates during adaptation because participants can less easily employ explicit control from the beginning^19,36,38. In contrast, no such difference in learning rate was observed in our forced RT groups. This is possibly due to the difference in size of the perturbation between our study (20°) compared to others^36,38 (30°), making the explicit contribution potentially smaller during the adaptation phase⁷.

Our findings qualitatively replicate results from a previous study employing a similar design¹². However, it should be noted that our paradigm differs in several ways. First, retention was assessed using feedback removal rather than visual error clamps, although there is evidence that both methods lead to quantitatively similar results⁵¹. Second, our displacement was only 20° of amplitude and no additional displacement was introduced after the asymptote blocks. There is now a growing wealth of evidence that the cerebellum cannot account for more than 15 to 20° displacements^20,38,52, with the remaining discrepancy usually being accounted for through explicit re-aiming⁴¹. Therefore, the absence of a second, larger displacement, if anything, should only result in less explicit performance. Nevertheless, instructing participants to remove any explicit re-aiming policy (Remove groups) resulted in a near-complete nullification of the binary feedback effect, suggesting it is mainly underpinned by a simple re-aiming process. However, the Maintain instruction alone was not sufficient to produce this high retention profile, as the VF-Maintain group did not express it. We believe this can be explained in two ways. First, experiencing no feedback may result in a stronger context change for the VF groups compared to the BF groups, because the latter experienced the absence of VF during the asymptote blocks beforehand. Thus, this should lead to a stronger drop in reaching angle at the beginning of the no-feedback trials for the VF groups, as observed here. Alternatively, the VF-Maintain group experienced 200 more trials with visual feedback at asymptote. Consequently, it is very likely that the cerebellar memory at the beginning of the no-feedback blocks was stronger¹¹, and the explicit contribution was less for this group compared to the BF-Maintain group^7,19,41,53. This would therefore result in the slow drop in reach angle observed during early no-feedback trials due to gradual decay of the cerebellar memory^45,51,54. Critically, the two possibilities are not incompatible, and may well occur together.

A notable feature of retention performance is that both BF- and VF-Remove groups show a residual bias of around 5° in their reach angle in the direction opposite to the displacement. When asked after the experiment, participants in the Remove conditions were not aware of this. This has been reliably observed in studies using no-feedback blocks to assess retention^21,55 (but see⁵¹). Possible explanations include use-dependent plasticity-induced bias^56,57, perceptual bias⁵⁸ or an implicit model-free reinforcement-based memory, although the present study cannot arbitrate between these accounts. Note however that although the BF-Remove group expressed slightly more bias than its VF counterpart, this clearly did not reach statistical significance, meaning this cannot be explained by feedback type alone. Regardless, the implicit and lasting nature of this phenomenon makes it a promising focus for future research with clinical applications^13,15.

Overall, our findings point towards a central role of explicit control during BF-induced behaviours in this study. In line with this, 14/54 participants had to be removed from the BF groups in the feedback-instruction task (experiment 1) because of poor performance in the asymptote blocks (see Methods), suggesting that structural learning was required to perform accurately^35,48,50. Though this is a significant proportion of participants, it should be noted that other studies using BF-based reaching also found a similar percentage of “learners” and “non-learners”^42,43. Although not expected in our study, this seemingly consistent outcome across a variety of BF experimental designs raises questions regarding either the reliability of this learning mechanism across individuals or the tasks used to examine it. The possibility that this dichotomy between participants is due to structural learning is in line with the dip observed in the BF groups and the absence of a dip in the informed SRT group. If correct, then predictors of structural learning capacity should also predict an individual’s ability to learn a visuomotor displacement under BF, a hypothesis that will be tested in future studies. Finally, our view is that implicit, model-free reinforcement takes a great amount of time and practice to form^49,59, and usually arises from initially model-based performance in behavioural literature^23,60, as illustrated by popular reinforcement models (e.g. DYNA^61,62). Two interesting possibilities are that 200 trials of BF alone are not sufficient to result in a strong, habit-like enhancement of retention⁶⁰, or that such behavioural consolidation must take place through sleep^60,63. Future work is required to address these hypotheses.

In conclusion, this study provides further insight into the use of reinforcement during motor learning, and suggests that successful reinforcement is tightly coupled to the development and expression of explicit control. We suggest that explicit control bears many similarities with model-based reinforcement, thus creating important questions regarding the link between model-based and model-free reinforcement systems during motor learning. At the very least, future studies investigating reinforcement during visuomotor adaptation should proceed with care in order to map which behaviour is the consequence of implicitly reinforced memories or explicit control.

Methods

Participants

Eighty participants (20 males) aged 18–37 (M = 20.9 years) and 30 participants (11 males) aged 18–34 (M = 22.1 years) were recruited for experiments 1 and 2, respectively, and pseudo-randomly assigned to a group after providing written informed consent. All participants were enrolled at the University of Birmingham. They were remunerated either with course credits or money (£7.50/hour). They were free of psychological, cognitive, motor or auditory impairment and were right-handed. The study was approved by and done in accordance with the local research ethics committee of the University of Birmingham.

General procedure

Participants were seated before a horizontal mirror reflecting a screen above (refresh rate 60 Hz) that displayed the workspace and their hand position (Fig. 1a), represented by a green cursor (diameter 0.3 cm). Hand position was tracked by a sensor taped on the right index finger of each participant and connected to a Polhemus 3SPACE Fastrak tracking device (Colchester, Vermont, U.S.A.; sampling rate 120 Hz). Programs were run under MATLAB (The MathWorks, Natick, MA), with Psychophysics Toolbox 3⁶⁴. Participants performed the reaching task on a flat surface under the mirror, with the reflection of the screen matching the surface plane. All movements were hidden from the participant’s sight. When each trial started, participants entered a white starting box (1 cm width) at the centre of the workspace with the cursor, which triggered target appearance. Targets (diameter 0.5 cm) were 8 cm away from the starting position. Henceforth, the target position directly in front of the participant will be defined as the 0° position and other target positions will be expressed with this reference. Participants were instructed to perform a fast “swiping” movement through the target. Once they reached 8 cm away from the starting box, the cursor disappeared and a yellow dot (diameter 0.3 cm) indicated their end position. When returning to the starting box, a white circle displaying their radial distance appeared to guide them back.

Task design

Experiment 1: Feedback-instruction

For each trial, participants reached to a target located 45° counter-clockwise (CCW). Participants first performed a baseline block (60 trials) with veridical cursor feedback, followed by a 75-trial adaptation block in which a 20° CCW displacement was applied (Fig. 1b). In the following 2 blocks (100 trials each), participants experienced the same perturbation either with BF only or with both BF and VF. BF consisted of a pleasant sound, selected before the task according to each participant’s preference from a series of 26 sounds, without participants knowing its eventual purpose. When participants’ cursor ended less than 5° away from the centre of the target, the sound was played, indicating a hit; otherwise no sound was played, indicating a miss. For the BF group, no cursor feedback was provided, except for one “refresher” trial every 10 trials where VF was present. Participants in the VF group could see the cursor position during the outbound reach of the trial, along with the BF. Finally, participants went through 2 no-feedback blocks (100 trials each) with BF and VF completely removed. Before those blocks, participants were either told to “carry on” (“Maintain” group) or informed of the nature of the perturbation and asked to stop using any explicit approach to account for it (“Remove” group). Therefore, we had four groups in a 2 × 2 factorial design (BF versus VF and Maintain versus Remove). In addition, if a trial’s reaching movement lasted longer than 400 ms or less than 100 ms, the starting box turned red or green, respectively, to ensure participants performed ballistic movements and did not make anticipatory movements. Participants whose success rate was below 40% during asymptote blocks were excluded (BF-Remove N = 6; BF-Maintain N = 8). Although this exclusion rate was high, it was crucial to exclude participants who were unable to maintain asymptote performance in order to reliably measure retention.
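The hit-or-miss rule and the movement-duration cue described above can be sketched as follows. This is a minimal illustration, not the authors' task code: it assumes angles are in degrees with CCW positive, and the function names (`wrap_deg`, `cursor_direction`, `is_hit`, `duration_cue`) are hypothetical.

```python
ROTATION_DEG = 20.0    # CCW visuomotor displacement (from the text)
HIT_WINDOW_DEG = 5.0   # cursor must end less than 5 deg from the target centre
TARGET_DEG = 45.0      # Experiment 1 target direction (assumed CCW-positive)


def wrap_deg(angle):
    """Wrap an angle to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def cursor_direction(hand_deg, rotation_deg=ROTATION_DEG):
    """Direction of the cursor when it is rotated CCW relative to the hand."""
    return wrap_deg(hand_deg + rotation_deg)


def is_hit(hand_deg, target_deg=TARGET_DEG, rotation_deg=ROTATION_DEG):
    """Binary feedback: True if the rotated cursor ends within the hit window."""
    error = wrap_deg(cursor_direction(hand_deg, rotation_deg) - target_deg)
    return abs(error) < HIT_WINDOW_DEG


def duration_cue(duration_ms):
    """Colour of the starting box after a movement, or None if within limits."""
    if duration_ms > 400:
        return "red"     # movement too slow
    if duration_ms < 100:
        return "green"   # movement too fast
    return None
```

Under these assumptions, a hand direction of 25° fully compensates for the displacement (the cursor lands at 45°), whereas reaching straight at the target puts the cursor 20° off and yields a miss.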

Experiment 2: Forced RT

In this experiment, participants were forced to perform the same reaching task at slow (SRT) or fast reaction times (FRT), the latter condition preventing explicit re-aiming by enforcing movement initiation before any mental rotation can be applied to the motor command³⁶,³⁹. A third group (Gradual) also performed the task with no RT constraints.

In the SRT/FRT groups, for each trial, entering the starting box with the cursor triggered a series of five 100 ms long pure tones (1 kHz) every 500 ms (Fig. 1c). Before the fifth tone, a target appeared at one of four possible locations equally spaced around 360° (0–90–180–270°). Participants were instructed to initiate their movement exactly on the fifth tone (Fig. 1c). Targets appeared 1000 ms (SRT) or 200 ms (FRT) before the beginning of the fifth tone. Movement initiations shorter than 130 ms are likely anticipatory movements³⁷, and explicit control starts to be difficult to express under 300 ms³⁶,³⁸. Therefore, in both conditions, movements were successful if participants exited the starting box between 70 ms before the start of the fifth tone and the end of the fifth tone, that is, from 130 ms to 300 ms after target appearance in the FRT condition. If movements were initiated too early or too late, a message “too fast” or “too slow” was displayed and the cursor did not appear upon exiting the starting box. The trial was then reinitialised and a new target selected. Finally, if participants repeatedly missed movement initiation, making trial duration over 25 seconds, RT constraints were removed, to allow trial completion before time-dependent decay of cerebellar memory⁵¹,⁵⁴,⁶⁵. Participants in the SRT and FRT groups were informed of the displacement and of the optimal policy to counter it, to ensure that any effect was related to expression, rather than development, of explicit control. They were also instructed to attempt using the optimal policy as much as possible when sensible, but not at the expense of the secondary RT task, so as to preserve the pace of the experiment and prevent time-dependent memory decay.
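The timing arithmetic behind the accepted initiation window can be made explicit with a short sketch. It is an illustration under stated assumptions rather than the authors' implementation: the window boundaries are treated as inclusive, and the names `TARGET_LEAD_MS`, `initiation_window` and `classify_initiation` are hypothetical.

```python
TONE_DURATION_MS = 100   # each tone lasts 100 ms (from the text)
EARLY_MARGIN_MS = 70     # movements may start up to 70 ms before the fifth tone
TARGET_LEAD_MS = {"SRT": 1000, "FRT": 200}  # target onset before the fifth tone


def initiation_window(condition):
    """Accepted movement-initiation window in ms, relative to target onset."""
    lead = TARGET_LEAD_MS[condition]
    earliest = lead - EARLY_MARGIN_MS    # 70 ms before fifth-tone onset
    latest = lead + TONE_DURATION_MS     # end of the fifth tone
    return earliest, latest


def classify_initiation(rt_ms, condition):
    """Return the on-screen message for a given initiation time, or 'ok'."""
    earliest, latest = initiation_window(condition)
    if rt_ms < earliest:
        return "too fast"
    if rt_ms > latest:
        return "too slow"
    return "ok"
```

For the FRT condition this gives 200 − 70 = 130 ms and 200 + 100 = 300 ms after target appearance, matching the window stated in the text; the same rule gives 930 to 1100 ms in the SRT condition.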

To attain proficiency in the RT task, SRT and FRT participants performed a training block (pseudo-random order of VF and BF trials) of at least 96 trials, or until they could reliably initiate movements on the fifth tone (at the first attempt) for at least 75% of the previous 8 trials. All participants achieved this in 96 to 157 trials. Once this was achieved, participants first performed a 40-trial baseline (Fig. 1d), followed by introduction of a 20° CCW displacement for 260 trials. Participants then underwent a 200-trial asymptote block with only BF (1 “refresher” trial every 10 trials). The BF consisted of a green tick or a red cross if participants hit or missed the target, respectively. Visual (instead of audio) BF was used to prevent BF sounds from lining up with the tones, which could potentially confuse participants. The Gradual group underwent the same schedule, except that no tones or RT constraints were used, and the perturbation was introduced gradually from the 41st to the 240th trial (increment of 0.4°/trial), occurring independently for each target. This ensured participants experienced as few large errors as possible, to prevent awareness of the perturbation and therefore explicit control. After the experiment, participants in the Gradual group were informed of the displacement, and subsequently asked if they had noticed it. If they answered positively, they were asked to estimate the size of the displacement.
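One way to read the gradual schedule is that each target carries its own counter, so that with four targets and 200 trials each target reaches 50 × 0.4° = 20°. The sketch below follows that reading. It assumes the increment is applied on every presentation of a target from trial 41 onward and is capped at 20°; the function name `gradual_schedule` and the cycling target order in the usage example are hypothetical (the experiment selected targets differently).

```python
from collections import defaultdict


def gradual_schedule(targets, first_trial=41, step_deg=0.4, max_deg=20.0):
    """Perturbation size (deg) on each trial, incremented per target.

    targets: sequence of target identifiers, one per trial (first item = trial 1).
    """
    presentations = defaultdict(int)  # perturbed presentations seen per target
    schedule = []
    for trial, target in enumerate(targets, start=1):
        if trial >= first_trial:
            presentations[target] += 1
        schedule.append(min(max_deg, step_deg * presentations[target]))
    return schedule


# Usage with an assumed fixed cycling order of the four targets over 300 trials.
example_targets = [(0, 90, 180, 270)[i % 4] for i in range(300)]
example_schedule = gradual_schedule(example_targets)
```

With this cycling order, the perturbation is 0° up to trial 40, 0.4° on trial 41, and 20° for every target by trial 240.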

Data analysis

All data and analysis code are available on our Open Science Framework page (osf.io/hrgzq). All analyses were performed in MATLAB. We used the Lilliefors test to assess whether data were normally distributed, and we compared groups using Kruskal-Wallis or Wilcoxon signed-rank tests when appropriate, as most data were not. Post-hoc tests were done using Tukey’s procedure. As we analysed the data from experiment two twice (Figs 3 and 4), p-values for success rates and reach angles during asymptote were Bonferroni-corrected (multiplied by 2).
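The correction for analysing the same data twice amounts to multiplying each p-value by the number of analyses. A minimal sketch follows; the cap at 1 is a common convention assumed here (the text only states the multiplication), and the function name `bonferroni` is hypothetical.

```python
def bonferroni(p_values, n_comparisons=2):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]
```

For example, an uncorrected p-value of 0.03 becomes 0.06 after correction and would no longer pass a 0.05 threshold.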

Learning rates were obtained by fitting an exponential function to the adaptation-block reach angle curves using a non-linear least-squares method with a maximum of 1000 iterations (average R² = 0.86 ± 0.14 for the feedback-instruction task and R² = 0.58 ± 0.26 for the forced-RT task):

$$y = a\,e^{\beta x} + b$$

where y is the hand direction for trial x, a is a scaling factor, b is the starting value and β is the learning rate. Reach angles were defined as the angular error to target of the real hand position at the end of a movement. Trials were considered outliers and removed if movement duration was over 400 ms or less than 100 ms, if endpoint reach angle was over 40° off target, or, for the SRT and FRT groups in the forced-RT task, if failed initiation attempts continued for more than 25 s. In total, outliers accounted for 3755 trials (8%) in the feedback-instruction task and 1013 trials (6%) in the forced-RT task.
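The exponential fit can be illustrated with a small, dependency-free sketch. Note that it does not reproduce the MATLAB non-linear least-squares routine used in the study: it swaps in a simpler technique, in which a and b are solved in closed form by linear least squares for each candidate β and β itself is found by a repeatedly refined grid search. The search bounds for β, the grid sizes and the function names are assumptions made for illustration.

```python
import math


def fit_given_beta(x, y, beta):
    """Closed-form least-squares a and b for fixed beta, plus the squared error."""
    u = [math.exp(beta * xi) for xi in x]
    n = len(x)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu if s_uu > 0 else 0.0
    b = mean_y - a * mean_u
    sse = sum((a * ui + b - yi) ** 2 for ui, yi in zip(u, y))
    return a, b, sse


def fit_exponential(x, y, beta_lo=-1.0, beta_hi=-1e-3, n_grid=200, n_refine=4):
    """Fit y = a * exp(beta * x) + b by a refined grid search over beta."""
    best = None
    lo, hi = beta_lo, beta_hi
    for _ in range(n_refine):
        step = (hi - lo) / (n_grid - 1)
        for i in range(n_grid):
            beta = lo + i * step
            a, b, sse = fit_given_beta(x, y, beta)
            if best is None or sse < best[3]:
                best = (a, b, beta, sse)
        # Narrow the search to one grid step either side of the current best.
        lo = max(beta_lo, best[2] - step)
        hi = min(beta_hi, best[2] + step)
    a, b, beta, _ = best
    return a, b, beta


# Usage on synthetic, noise-free data over a 75-trial adaptation block.
trials = list(range(1, 76))
angles = [-20.0 * math.exp(-0.1 * t) + 20.0 for t in trials]
a_hat, b_hat, beta_hat = fit_exponential(trials, angles)
```

On this synthetic curve the sketch recovers values close to a = −20, b = 20 and β = −0.1. With real, noisy reach angles the squared error may have several local minima in β, which is one reason a coarse grid is used before refining.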

Even though 4 targets were used during the forced-RT task, trials were reset and a new random target was selected when participants failed to initiate movements on the fifth tone. Therefore, not all target positions would be represented in each epoch, and epochs were consequently not used.

References

Tseng, Y.-w., Diedrichsen, J., Krakauer, J. W., Shadmehr, R. & Bastian, A. J. Sensory Prediction Errors Drive Cerebellum-Dependent Adaptation of Reaching. J. Neurophysiol. 98, 54–62 (2007).
Krakauer, J. W. Motor Learning and Consolidation: The Case of Visuomotor Rotation. In Progress in Motor Control (ed. Sternad, D.) 629, 405–421 (Springer US, 2009).
Wolpert, D. M. & Miall, R. C. Forward Models for Physiological Motor Control. Neural Netw. Off. J. Int. Neural Netw. Soc. 9, 1265–1279 (1996).
Wolpert, D. M., Miall, R. C. & Kawato, M. Internal models in the cerebellum. Trends Cogn. Sci. 2, 338–347 (1998).
Mazzoni, P. & Krakauer, J. W. An Implicit Plan Overrides an Explicit Strategy during Visuomotor Adaptation. J. Neurosci. 26, 3642–3645 (2006).
Shadmehr, R. & Krakauer, J. W. A computational neuroanatomy for motor control. Exp. Brain Res. 185, 359–381 (2008).
Taylor, J. A., Krakauer, J. W. & Ivry, R. B. Explicit and Implicit Contributions to Learning in a Sensorimotor Adaptation Task. J. Neurosci. 34, 3023–3032 (2014).
Taylor, J. A. & Ivry, R. B. Flexible Cognitive Strategies during Motor Learning. PLoS Comput. Biol. 7, e1001096 (2011).
Taylor, J. A. & Ivry, R. B. Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning. In Progress in Brain Research 210, 217–253 (Elsevier, 2014).
Huang, V. S., Haith, A., Mazzoni, P. & Krakauer, J. W. Rethinking Motor Learning and Savings in Adaptation Paradigms: Model-Free Memory for Successful Actions Combines with Internal Models. Neuron 70, 787–801 (2011).
Izawa, J. & Shadmehr, R. Learning from Sensory and Reward Prediction Errors during Motor Adaptation. PLoS Comput. Biol. 7, e1002012 (2011).
Shmuelof, L. et al. Overcoming Motor ‘Forgetting’ Through Reinforcement Of Learned Actions. J. Neurosci. 32, 14617–14621a (2012).
Quattrocchi, G., Greenwood, R., Rothwell, J. C., Galea, J. M. & Bestmann, S. Reward and punishment enhance motor adaptation in stroke. J. Neurol. Neurosurg. Psychiatry 88, 730–736 (2017).
Kojima, Y. & Soetedjo, R. Selective reward affects the rate of saccade adaptation. Neuroscience 355, 113–125 (2017).
Goodman, R. N. et al. Increased reward in ankle robotics training enhances motor control and cortical efficiency in stroke. J. Rehabil. Res. Dev. 51, 213–228 (2014).
Leow, L.-A., de Rugy, A., Marinovic, W., Riek, S. & Carroll, T. J. Savings for visuomotor adaptation require prior history of error, not prior repetition of successful actions. J. Neurophysiol. 116, 1603–1614 (2016).
Malfait, N. Is Interlimb Transfer of Force-Field Adaptation a Cognitive Response to the Sudden Introduction of Load? J. Neurosci. 24, 8084–8089 (2004).
Orban de Xivry, J.-J. & Lefèvre, P. Formation of model-free motor memories during motor adaptation depends on perturbation schedule. J. Neurophysiol. 113, 2733–2741 (2015).
Huberdeau, D. M., Krakauer, J. W. & Haith, A. M. Dual-process decomposition in human sensorimotor adaptation. Curr. Opin. Neurobiol. 33, 71–77 (2015).
Werner, S. et al. Awareness of Sensorimotor Adaptation to Visual Rotations of Different Size. PLOS ONE 10, e0123321 (2015).
Galea, J. M., Mallia, E., Rothwell, J. & Diedrichsen, J. The dissociable effects of punishment and reward on motor learning. Nat. Neurosci. 18, 597–602 (2015).
Therrien, A. S., Wolpert, D. M. & Bastian, A. J. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise. Brain 139, 101–114 (2016).
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-Based Influences on Humans’ Choices and Striatal Prediction Errors. Neuron 69, 1204–1215 (2011).
Sun, R., Slusarz, P. & Terry, C. The Interaction of the Explicit and the Implicit in Skill Learning: A Dual-Process Approach. Psychol. Rev. 112, 159–192 (2005).
Huys, Q. J. M. et al. Bonsai Trees in Your Head: How the Pavlovian System Sculpts Goal-Directed Choices by Pruning Decision Trees. PLoS Comput. Biol. 8, e1002410 (2012).
Gläscher, J., Daw, N., Dayan, P. & O’Doherty, J. P. States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning. Neuron 66, 585–595 (2010).
Hwang, E. J., Smith, M. A. & Shadmehr, R. Dissociable effects of the implicit and explicit memory systems on learning control of reaching. Exp. Brain Res. 173, 425–437 (2006).
Haith, A. M. & Krakauer, J. W. Model-Based and Model-Free Mechanisms of Human Motor Learning. In Progress in Motor Control (eds Richardson, M. J., Riley, M. A. & Shockley, K.) 782, 1–21 (Springer New York, 2013).
Daw, N. D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
Anguera, J. A., Reuter-Lorenz, P. A., Willingham, D. T. & Seidler, R. D. Contributions of spatial working memory to visuomotor learning. J. Cogn. Neurosci. 22, 1917–1930 (2010).
Christou, A. I., Miall, R. C., McNab, F. & Galea, J. M. Individual differences in explicit and implicit visuomotor learning and working memory capacity. Sci. Rep. 6, (2016).
Otto, A. R., Gershman, S. J., Markman, A. B. & Daw, N. D. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol. Sci. 24, 751–761 (2013).
Otto, A. R., Skatova, A., Madlon-Kay, S. & Daw, N. D. Cognitive Control Predicts Use of Model-based Reinforcement Learning. J. Cogn. Neurosci. 27, 319–333 (2015).
Simon, D. A. & Daw, N. D. Neural Correlates of Forward Planning in a Spatial Decision Task in Humans. J. Neurosci. 31, 5526–5539 (2011).
Chen, X., Mohr, K. & Galea, J. M. Predicting explorative motor learning using decision-making and motor noise. PLOS Comput. Biol. 13, e1005503 (2017).
Haith, A. M., Huberdeau, D. M. & Krakauer, J. W. The Influence of Movement Preparation Time on the Expression of Visuomotor Learning and Savings. J. Neurosci. 35, 5109–5117 (2015).
Haith, A. M., Pakpoor, J. & Krakauer, J. W. Independence of Movement Preparation and Movement Initiation. J. Neurosci. 36, 3007–3015 (2016).
Leow, L.-A., Gunn, R., Marinovic, W. & Carroll, T. J. Estimating the implicit component of visuomotor rotation learning by constraining movement preparation time. J. Neurophysiol. https://doi.org/10.1152/jn.00834.2016 (2017).
Fernandez-Ruiz, J., Wong, W., Armstrong, I. T. & Flanagan, J. R. Relation between reaction time and reach errors during visuomotor adaptation. Behav. Brain Res. 219, 8–14 (2011).
Morehead, J. R., Qasim, S. E., Crossley, M. J. & Ivry, R. Savings upon Re-Aiming in Visuomotor Adaptation. J. Neurosci. 35, 14386–14396 (2015).
Bond, K. M. & Taylor, J. A. Flexible explicit but rigid implicit learning in a visuomotor adaptation task. J. Neurophysiol. 113, 3836–3849 (2015).
Holland, P. J., Codol, O. & Galea, J. M. The contribution of explicit processes to reinforcement-based motor learning. J. Neurophysiol. https://doi.org/10.1152/jn.00901.2017 (2018).
Saijo, N. & Gomi, H. Multiple Motor Learning Strategies in Visuomotor Rotation. PLoS ONE 5, e9399 (2010).
Orban de Xivry, J.-J., Legrain, V. & Lefèvre, P. Overlap of movement planning and movement execution reduces reaction time. J. Neurophysiol. 117, 117–122 (2017).
Brennan, A. E. & Smith, M. A. The Decay of Motor Memories Is Independent of Context Change Detection. PLOS Comput. Biol. 11, e1004278 (2015).
Pekny, S. E., Criscimagna-Hemminger, S. E. & Shadmehr, R. Protection and Expression of Human Motor Memories. J. Neurosci. 31, 13829–13839 (2011).
Smith, M. A., Ghazizadeh, A. & Shadmehr, R. Interacting Adaptive Processes with Different Timescales Underlie Short-Term Motor Learning. PLoS Biol. 4, e179 (2006).
Chen, X., Holland, P. & Galea, J. M. The effects of reward and punishment on motor skill learning. Curr. Opin. Behav. Sci. 20, 83–88 (2018).
Sutton, R. S. & Barto, A. Reinforcement Learning: An Introduction. (A Bradford Book, 1998).
Manley, H., Dayan, P. & Diedrichsen, J. When Money Is Not Enough: Awareness, Success, and Variability in Motor Learning. PLoS ONE 9, e86580 (2014).
Kitago, T., Ryan, S. L., Mazzoni, P., Krakauer, J. W. & Haith, A. M. Unlearning versus savings in visuomotor adaptation: comparing effects of washout, passage of time, and removal of errors on motor memory. Front. Hum. Neurosci. 7 (2013).
Morehead, J. R., Taylor, J. A., Parvin, D. & Ivry, R. B. Characteristics of Implicit Sensorimotor Adaptation Revealed by Task-irrelevant Clamped Feedback. J. Cogn. Neurosci. 1–14, https://doi.org/10.1162/jocn_a_01108 (2017).
McDougle, S. D., Bond, K. M. & Taylor, J. A. Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning. J. Neurosci. 35, 9568–9579 (2015).
Yang, Y. & Lisberger, S. G. Role of Plasticity at Different Sites across the Time Course of Cerebellar Motor Learning. J. Neurosci. 34, 7077–7090 (2014).
Galea, J. M., Vazquez, A., Pasricha, N., Orban de Xivry, J.-J. & Celnik, P. Dissociating the Roles of the Cerebellum and Motor Cortex during Adaptive Learning: The Motor Cortex Retains What the Cerebellum Learns. Cereb. Cortex 21, 1761–1770 (2011).
Bütefisch, C. M. et al. Mechanisms of use-dependent plasticity in the human motor cortex. Proc. Natl. Acad. Sci. 97, 3661–3665 (2000).
Classen, J., Liepert, J., Wise, S. P., Hallett, M. & Cohen, L. G. Rapid plasticity of human cortical movement representation induced by practice. J. Neurophysiol. 79, 1117–1123 (1998).
Vindras, P., Desmurget, M., Prablanc, C. & Viviani, P. Pointing Errors Reflect Biases in the Perception of the Initial Hand Position. J. Neurophysiol. 79, 3290–3294 (1998).
Wunderlich, K., Dayan, P. & Dolan, R. J. Mapping value based planning and extensively trained choice in the human brain. Nat. Neurosci. 15, 786–791 (2012).
Tricomi, E., Balleine, B. W. & O’Doherty, J. P. A specific role for posterior dorsolateral striatum in human habit learning. Eur. J. Neurosci. 29, 2225–2232 (2009).
Sutton, R. S., Szepesvári, C., Geramifard, A. & Bowling, M. Dyna-style planning with linear function approximation and prioritized sweeping. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (2008).
Sutton, R. S. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. In Proceedings of the Seventh International Conference on Machine Learning, 216–224 (Morgan Kaufmann, 1990).
Reis, J. et al. Noninvasive cortical stimulation enhances motor skill acquisition over multiple days through an effect on consolidation. Proc. Natl. Acad. Sci. 106, 1590–1595 (2009).
Brainard, D. H. The Psychophysics Toolbox. Spat. Vis. 10, 433–436 (1997).
Kim, S., Oh, Y. & Schweighofer, N. Between-Trial Forgetting Due to Interference and Time in Motor Adaptation. PLOS ONE 10, e0142963 (2015).

Acknowledgements

We thank Raphael Schween for helpful discussions on interpretation of the data. This work was supported by the European Research Council grant MotMotLearn 637488.

Author information

Authors and Affiliations

School of Psychology, University of Birmingham, Birmingham, UK
Olivier Codol, Peter J. Holland & Joseph M. Galea

Contributions

O.C., P.J.H. and J.M.G. designed the experiments, O.C. implemented and ran the experiments, O.C. and P.J.H. analysed the data, O.C., P.J.H. and J.M.G. interpreted the results, O.C. wrote the paper, O.C., P.J.H. and J.M.G. approved the final version of the manuscript.

Corresponding author

Correspondence to Olivier Codol.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary figures

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Codol, O., Holland, P.J. & Galea, J.M. The relationship between reinforcement and explicit control during visuomotor adaptation. Sci Rep 8, 9121 (2018). https://doi.org/10.1038/s41598-018-27378-1

Received: 24 October 2017
Accepted: 01 June 2018
Published: 14 June 2018
DOI: https://doi.org/10.1038/s41598-018-27378-1
