Introduction

In a constantly changing environment, our ability to adjust motor commands in response to novel perturbations is a critical feature for maintaining accurate performance1. These adaptive processes have often been studied in the laboratory through the introduction of a visual displacement during reaching movements2. The observed visuomotor adaptation, characterized by a reduction in performance errors, was believed to be primarily driven by a cerebellar-dependent process that gradually reduces the mismatch between the predicted and actual sensory outcome (sensory prediction error) of the reaching movement1,3,4. Cerebellar adaptation is a stereotypical, slow and implicit process and therefore does not require the individual to be aware of the perturbation to take place5,6. However, a single-process framework cannot account for the great variety of results observed during visuomotor adaptation tasks7. Specifically, it has recently been shown that several other non-cerebellar learning mechanisms also play a pivotal role in shaping behaviour during adaptation paradigms such as explicit control8,9 and reward-based reinforcement10,11,12,13,14,15.

Explicit control usually consists of employing simple heuristics such as aiming off target in the direction opposite to a visual displacement, to quickly and accurately account for it5. However, this requires explicit knowledge of the perturbation, which in turn usually requires experiencing large and unexpected errors8,16,17,18. Explicit control contrasts with cerebellar adaptation in that it is idiosyncratic9, volitional, and can lead to fast adaptation rates19. Importantly, in this work, we consider explicit control as the contribution to performance that can be suppressed (or expressed) by participants upon request20, as opposed to the additional requirement of being able to verbalise a strategy. Critically, cerebellar adaptation takes place regardless of the presence or absence of any explicit process, even at the cost of accurate performance5.

More recently, another putative mechanism contributing to motor adaptation has been proposed, through which the memory of actions that led to successful outcomes (hitting the target) is strengthened, and therefore more likely to be re-expressed14,21. Such reinforcement is considered to be an implicit process, but distinct from cerebellar adaptation in that it is driven not by sensory prediction error but by task success or failure10,11. To examine this phenomenon, several studies employed a binary, hit-or-miss feedback (BF) paradigm, which promotes reinforcement over cerebellar processes11,12,22. For example, in one study, participants receiving only binary feedback following successful adaptation expressed stronger retention than participants who had received a combination of visual and binary feedback12. The authors argued this could be due to greater involvement of a reinforcement-based process that is less susceptible to forgetting12.

With the multiple processes framework of motor adaptation, the question of interaction between the distinct systems becomes central to understanding the problem as a whole, and it remains an under-investigated question for reward-based reinforcement. In the decision-making literature, it has long been suggested that two distinct “model-based” and “model-free” systems interact23,24 and even require communication to be optimal25,26. Interestingly, model-based processes share many characteristics with explicit control during motor adaptation, in that they are both more explicit, rely on an internal model of the world (explicit control27,28; model-based decision-making29), and are closely related to working memory capacity (explicit control30,31; model-based decision-making32,33) and pre-frontal cortex processes (explicit control30; model-based decision-making26,34). On the other hand, the concept of reinforcement in motor adaptation comes directly from the model-free systems described in the decision-making literature28, and is often labelled as such. It is considered more implicit, relies on immediate action-reward contingencies and is thought to recruit the basal ganglia in both cases (visuomotor adaptation22; decision-making23). Despite these interesting similarities, unlike model-based and model-free decision-making, the relationship between explicit control and reinforcement during visuomotor adaptation paradigms is currently unknown. Some indirect evidence for such a relationship comes from a recent study which showed that participants needed to experience a large reaching error in order to express a reinforcement-based memory18. In addition, there is a wealth of evidence which shows explicit control also requires experiencing large errors16,17,27. Thus, it is possible that the formation of a reinforcement-based memory requires, or at least benefits from, some form of explicit control35.

To address this possibility, we first examined the contribution of explicit control to the reinforcement-based improvements in retention following binary feedback12,22. Secondly, we used a forced reaction time (forced RT) paradigm36 to investigate the importance of being able to express explicit control when encountering binary (reinforcement-based) feedback.

Results

Experiment 1: Explicit control occurs during reinforcement-based retention

We first sought to investigate the role of explicit control in the retention of a reinforced visual displacement memory. In experiment 1, participants made fast ‘shooting’ movements towards a single target (Fig. 1a). After a baseline block involving veridical vision (60 trials) and an adaptation block (75 trials) where a 20° counter-clockwise (CCW) visuomotor displacement was learnt with online visual feedback (VF), participants experienced the same displacement for 2 blocks (asymptote blocks; 100 trials each) with either only binary feedback (BF group, Fig. 1b, top) to promote reinforcement, or BF and VF together (VF group, Fig. 1b, bottom). Following this, retention was assessed through 2 no-feedback blocks (100 trials each), during which both BF and VF were removed. Before these no-feedback blocks, half of the participants were told to “carry on” as they were (“Maintain” group) and the remaining ones were informed of the nature of the perturbation, and to stop re-aiming off target to account for it (“Remove” group). Thus, there were four groups: BF-Maintain, BF-Remove, VF-Maintain and VF-Remove (N = 20 for each group).
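As an illustration only, the block structure and the hit rule described above can be written down as a short Python sketch. The names `Block`, `experiment1_schedule` and `is_hit` are hypothetical, the reach angle is assumed to be expressed so that full compensation of the displacement equals 20°, the tolerance boundary is assumed to be inclusive, and `None` marks binary-feedback settings that this section does not state.

```python
from dataclasses import dataclass
from typing import List, Optional

DISPLACEMENT_DEG = 20.0   # size of the CCW visuomotor displacement
HIT_TOLERANCE_DEG = 5.0   # tolerance on each side of the target


@dataclass(frozen=True)
class Block:
    name: str
    n_trials: int
    visual_feedback: bool
    binary_feedback: Optional[bool]  # None: not stated in this section


def experiment1_schedule(group: str) -> List[Block]:
    """Block schedule for the 'BF' or 'VF' feedback groups of experiment 1."""
    if group not in ("BF", "VF"):
        raise ValueError("group must be 'BF' or 'VF'")
    vf_at_asymptote = group == "VF"  # only the VF groups keep the cursor
    return [
        Block("baseline", 60, True, None),
        Block("adaptation", 75, True, None),
        Block("asymptote 1", 100, vf_at_asymptote, True),
        Block("asymptote 2", 100, vf_at_asymptote, True),
        Block("no-feedback 1", 100, False, False),
        Block("no-feedback 2", 100, False, False),
    ]


def is_hit(reach_angle_deg: float) -> bool:
    """Hit if the reach compensates the displacement to within the tolerance.

    Assumption: reach_angle_deg is signed so that 20 means full compensation,
    and the boundary of the rewarded region counts as a hit.
    """
    return abs(reach_angle_deg - DISPLACEMENT_DEG) <= HIT_TOLERANCE_DEG
```

Under these assumptions the rewarded region spans 15° to 25° of reach angle, and each participant completes 535 trials in total.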

Figure 1

Experimental design. (a) Experiment 1: feedback-instruction. Screen display and hand-cursor coupling before and after introduction of the visuomotor displacement (right and left, respectively). The rightmost part shows a small cartoon representation of the experimental setup. (b) Feedback-instruction task perturbation and feedback schedule for the BF groups (top) and VF groups (bottom). The white and grey areas represent blocks where VF was available or not available, respectively, as indicated with a crossed or non-crossed eye. Blocks in which hits (with 5° tolerance on each side of the target) were followed by a pleasant sound are indicated with a small speaker symbol. The y-axis represents the value of the discrepancy between hand movement and task feedback. The double dashed vertical lines represent the time point at which “Maintain” or “Remove” instructions were given. The number of trials and names for each block are indicated at the bottom of each schedule. (c) Experiment 2: forced RT. Schedule of tone playback and target appearance before each trial during the forced RT task (SRT and FRT conditions). The green area represents the allowed movement initiation timeframe, and the red dots represent target onset times for each condition. The grey areas represent the tones. (d) Forced RT task perturbation and feedback schedule for the SRT and FRT groups (top) and for the Gradual group (bottom). Grey areas represent blocks without VF. The green tick and red cross represent binary feedback cues for a hit (5° tolerance on each side of the target) and miss, respectively. The white and grey areas represent blocks in which VF was available or not available, respectively, as indicated with a crossed or non-crossed eye, and the y-axis represents the value of the discrepancy between hand movement and task feedback. The number of trials and names for each block are indicated at the bottom of each schedule.
BF: binary feedback; VF: visual feedback; RT: reaction time; SRT: slow reaction time; FRT: fast reaction time.

Group performance is shown in Fig. 2a. All groups showed similar baseline performance (Fig. 2b; H(3) = 4.59, p = 0.20; see Methods for detailed information on statistical analysis), and had fully adapted to the visuomotor displacement prior to the asymptote/reinforcement blocks (average reach angle in the last 20 trials of adaptation, Fig. 2c; H(3) = 2.56, p = 0.46). Interestingly, at the start of the first asymptote block, participants in both BF groups showed a dip in performance, effectively drifting back toward baseline before adjusting back and returning to plateau performance. This “dip effect” was completely absent in the VF groups, and has previously been observed independently of our study when switching to BF after a displacement is abruptly introduced12. Therefore, success rate was compared independently across groups in the first 30 trials (Fig. 2d) and the remaining 170 trials (Fig. 2e) of the asymptote blocks. Both BF groups exhibited lower success rates than the VF groups in the early asymptote phase (H(3) = 46.79, p < 0.001, Tukey’s test p < 0.001 for BF-Maintain vs VF-Maintain and vs VF-Remove, and for BF-Remove vs VF-Maintain and vs VF-Remove). This was also seen in the late asymptote phase (H(3) = 31.29, p < 0.001, Tukey’s test p < 0.001 for BF-Maintain vs VF-Maintain and vs VF-Remove, and for BF-Remove vs VF-Maintain and vs VF-Remove), although performance greatly improved for both BF groups compared to the early phase (Z = 3.692 and Z = −3.81 for BF-Remove and BF-Maintain, respectively, p < 0.001 for both). Of note, both BF groups expressed a slight decrease in reach angle at the beginning of the second asymptote block, but removing this second dip did not qualitatively alter the result (H(3) = 27.46, p < 0.001, Tukey’s test p < 0.001 for BF-Maintain vs VF-Maintain and vs VF-Remove, and p < 0.01 for BF-Remove vs VF-Maintain and vs VF-Remove).
Finally, no across-group difference in RTs or movement duration was found during the asymptote blocks (Supplementary fig. S1a,b).
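The early/late split of the asymptote phase used above can be sketched as follows. This is a minimal illustration rather than the analysis code of the study; the function name `split_success_rate` and the representation of a participant's asymptote trials as a list of booleans are assumptions.

```python
from typing import Sequence, Tuple


def split_success_rate(hits: Sequence[bool], n_early: int = 30) -> Tuple[float, float]:
    """Return (early, late) success rates in percent for one participant.

    `hits` holds one boolean per asymptote trial (True = hit); the first
    `n_early` trials form the early phase and the rest the late phase.
    """
    early, late = hits[:n_early], hits[n_early:]
    if not early or not late:
        raise ValueError("need trials in both the early and late phase")
    early_rate = 100.0 * sum(early) / len(early)
    late_rate = 100.0 * sum(late) / len(late)
    return early_rate, late_rate
```

For the 200 asymptote trials of experiment 1, this yields one rate over the first 30 trials and one over the remaining 170.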

Figure 2

Experiment 1: feedback-instruction. (a) Reach angles with respect to target (°) of each group during the visuomotor displacement task. Values are averaged across epochs of 5 trials. Vertical bars represent block limits. The binary feedback consisted of a pleasant sound in the rewarded region. The black solid line represents the hand-to-cursor discrepancy (the perturbation) for all groups across the task. The upper and lower horizontal axes represent block-relative and absolute trial number, respectively. Coloured lines represent group mean and shaded areas represent s.e.m. (b) Average reach angle of participants during baseline. Of note, the y axis is on a smaller scale than the following figures. (c) Average reach angle during the last 20 trials of the adaptation phase. The shaded area represents the region to be rewarded in the subsequent asymptote phase. (d) Success rate (%) during the first 30 trials of the asymptote phase. (e) Success rate during the remainder of the asymptote phase (trials 166–335). (f) Average reach angle during the last 20 trials of the second no-feedback (retention) phase. Each dot represents one participant. The yellow dot represents the same participant across all plots, who expressed atypical end adaptation reach angle values; however this was not seen across the other variables. For the distribution plots, horizontal black lines are group medians and the shaded areas indicate distribution of individual values. BF: binary feedback; VF: visual feedback. ***p < 0.001, **p < 0.01.

Participants then performed a series of 2 no-feedback blocks. Similar to Shmuelof et al.12, we assessed retention by looking at the last 20 trials of the second block. However, our results are fundamentally the same irrespective of the trials used to represent retention. Overall, the BF-Maintain group showed greater retention relative to all other groups, largely maintaining the reach angle values achieved during the asymptote phase, whereas there was no difference between the other groups (Fig. 2f; H(3) = 27.66, p < 0.001, Tukey’s test p = 0.001 for BF-Remove vs BF-Maintain and p < 0.001 for BF-Maintain vs both VF groups; p = 0.6 for BF-Remove vs VF-Remove; p = 1 for BF-Remove vs VF-Maintain; p = 0.68 for VF-Maintain vs VF-Remove). We therefore replicated previous work which showed that BF led to enhanced retention of a visual displacement when compared to VF12. However, this effect of BF was abolished by asking participants to remove any re-aiming strategy they had developed (BF-Remove). This suggests the increase in retention following BF was mainly a consequence of the greater development and expression of explicit control.

Experiment 2: Re-aiming is necessary for maintaining performance under binary feedback

If the conclusion from our first experiment is correct, then successful asymptote performance under BF only should be dependent on the ability to develop and express explicit control. Therefore, in experiment 2 we restricted participants’ capacity to recruit an explicit component by using a forced RT adaptation paradigm36,37,38 (Fig. 1c, see Methods for details). Specifically, two groups adapted to a 20° CCW visuomotor displacement by performing reaching movements to 4 targets (Fig. 1d), with the amount of available preparation time (i.e. time between target appearance and movement onset) being restricted. A first group was allowed to express slow RTs (SRT; RT constraints were 930 to 1100 ms after target onset; N = 10), while the second group was only allowed very fast RTs (FRT; 130 to 300 ms; N = 10; Fig. 1c and Supplementary fig. S2a). The latter condition has been shown to prevent time-demanding explicit processes such as mental rotations necessary to express re-aiming in reaching tasks36,38,39. Critically, this paradigm prevented the expression of re-aiming, but may not reliably prevent the development of an explicit component. Therefore, to ensure any between-group difference was task-dependent and not related to inter-individual differences in awareness or understanding of the task, we explained in detail the nature of the perturbation and the optimal policy to counter it. In addition, a third condition was designed in which participants were kept unaware of the visual displacement by introducing the perturbation gradually16,18 (N = 10; Fig. 1d, bottom), and were not informed of any optimal policy to employ. Participants in this group were given no RT constraint whatsoever. Finally, it should be mentioned that a large portion of participants in the Gradual group reported noticing a slight perturbation by the end of the adaptation block when informally asked after the experiment.
However, they substantially underestimated its amplitude, reporting effects of the order of 5° at most. Nevertheless, for the sake of simplicity we will refer to this group as “unaware”, although we acknowledge they reported partial, reduced awareness of the perturbation.
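The reaction-time constraints of the three conditions can be summarised in a small Python sketch. The window values are those given above; the names `RT_WINDOWS_MS` and `rt_allowed`, and the choice of inclusive window boundaries, are assumptions made for illustration.

```python
# Allowed movement-initiation windows, in ms after target onset.
RT_WINDOWS_MS = {
    "SRT": (930, 1100),  # slow reaction time group
    "FRT": (130, 300),   # fast reaction time group
}


def rt_allowed(group: str, rt_ms: float) -> bool:
    """True if a reaction time falls inside the group's allowed window.

    The Gradual group had no RT constraint, so every RT is accepted.
    Window boundaries are treated as inclusive (an assumption).
    """
    if group == "Gradual":
        return True
    low, high = RT_WINDOWS_MS[group]
    return low <= rt_ms <= high
```

A trial for which `rt_allowed` returns False would correspond to a failed initiation attempt in the SRT or FRT group.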

Overall group performance is displayed in Fig. 3a. During baseline, average reach direction was similar for all groups (Fig. 3b; H(2) = 0.45, p = 0.79). To examine whether the FRT and SRT groups displayed different rates of learning during adaptation, we applied an exponential model to each participant’s adaptation data. Note, this was not done for the gradual group whose adaptation rate was restricted by the incremental visuomotor displacement. Surprisingly, we found no significant difference between the FRT and SRT group’s learning rates (U = 74; p = 0.34; Supplementary fig. S2b). Indeed, one would expect the SRT group to express faster learning since they can express strategies to account for the perturbation19,36,38,40. This is most likely a consequence of the small size of the perturbation encountered (i.e. 20°), which leaves less margin for strategic re-aiming20,40,41. At the end of the adaptation block, all groups adapted successfully, with no significant difference in reaching direction (Fig. 3c; H(2) = 2.34, p = 0.31). However, despite the lack of statistical significance, the mean reach direction for the FRT group was slightly under 15° (mean: 14.87°), which represents the limit of the reward region in the subsequent block. We discuss the implications of this later.
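The exact form of the exponential model is not given in this section. As a hedged sketch, one common choice is y(t) = a(1 − exp(−bt)), where a is the asymptote and b the learning rate. The Python function below fits this assumed form by grid search over b, solving for the least-squares a in closed form at each candidate rate. The function name `fit_exponential` and the grid of rates are hypothetical, and this is not the fitting procedure of the study.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_exponential(
    reach_angles: Sequence[float],
    rates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Fit y(t) = a * (1 - exp(-b * t)) for trials t = 0, 1, 2, ...

    Returns (a, b). For a fixed rate b the model is linear in a, so the
    best a is sum(g * y) / sum(g * g) with g(t) = 1 - exp(-b * t).
    """
    if rates is None:
        rates = [k / 1000 for k in range(1, 1001)]  # 0.001 to 1.000 per trial
    best = None  # (a, b, sum of squared errors)
    for b in rates:
        basis = [1.0 - math.exp(-b * t) for t in range(len(reach_angles))]
        denom = sum(g * g for g in basis)
        if denom == 0.0:
            continue  # happens when there is only a single trial
        a = sum(g * y for g, y in zip(basis, reach_angles)) / denom
        sse = sum((y - a * g) ** 2 for g, y in zip(basis, reach_angles))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    if best is None:
        raise ValueError("not enough data to fit the model")
    return best[0], best[1]
```

The fitted b values of the FRT and SRT participants could then be compared between groups, as is done with the learning rates reported above.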

Figure 3

Experiment 2: forced RT. (a) Reach angles with respect to target (°) of each group during the visuomotor displacement task. Values are averaged across epochs of 4 trials. Vertical bars represent block limits. The binary feedback consisted of a large green tick displayed on top of the screen if participants were within the reward region (see figure), and of a red cross if they were not (not shown). The black solid line represents the hand-to-cursor discrepancy (the perturbation) for the SRT and FRT group across the task, and the grey dashed line represents the perturbation for the Gradual group only. The upper and lower horizontal axes represent block-relative and absolute trial number, respectively. Coloured lines represent group mean and shaded areas represent s.e.m. (b) Average reach angle of participants during baseline. Of note, the y axis is on a smaller scale than the following figures. (c) Average reach angle during the last 20 trials of the adaptation phase. The shaded grey area represents the region to be rewarded in the subsequent asymptote phase. (d) Average reach angle during the asymptote block. The shaded grey area represents the rewarded region. (e) Success rate during the first 30 trials of the asymptote phase. (f) Success rate during the remainder of the asymptote phase (trials 331–500). Each dot represents one participant. For the distribution plots, horizontal black lines are group medians and the shaded areas indicate distribution of individual values. SRT: slow reaction time; FRT: fast reaction time. #p = 0.059; ***p < 0.001; **p < 0.01; *p < 0.05.

Participants then experienced an asymptote block with BF, similar to the first experiment, with the exception that hit-miss feedback was provided with a green tick and a red cross onscreen, because audio BF would potentially temporally align with movement initiation cues and confuse participants. Several other studies have already employed visual BF successfully11,22,42. During asymptotic performance, where participants were restricted to binary feedback, the SRT group showed a striking ability to maintain performance within the rewarded region whereas the two other groups clearly could not (Fig. 3d; H(2) = 17.5, p < 0.001, Bonferroni-corrected (see Methods), Tukey’s test p < 0.001 vs FRT and p = 0.001 vs Gradual). Next we compared success rates across groups for early BF trials (Fig. 3e) and the remainder of BF trials (Fig. 3f) independently. Early success rates were significantly lower for the Gradual group compared to the SRT group (H(2) = 9.2, p = 0.02, Bonferroni-corrected, Tukey’s test p = 0.011), and a similar but non-significant trend was observed between the FRT and SRT groups (Tukey’s test p = 0.059). The absence of a significant difference in early success rate between the FRT and SRT groups cannot be explained by average reach angles, as the FRT group actually expressed a larger decrease in reach angle during that timeframe compared to the Gradual group (Fig. 3a). Rather, the greater variability in reach angle within individuals in the FRT as opposed to the Gradual group is likely to cause this result (average individual variance; FRT: 47.5; Gradual: 18.9). However, the difference in success rate during the remaining trials reached significance for both the FRT and Gradual groups compared to the SRT group (H(2) = 16.67, p < 0.001, Bonferroni-corrected, Tukey’s test p < 0.001 for both FRT and Gradual).
Surprisingly, no dip in performance was observed for the SRT group in the early phase of the BF blocks, suggesting that informing participants of the perturbation and how to overcome it at the beginning of the experiment is sufficient to prevent this drop in reach angle.

Next, to ensure the low end adaptation reach angles expressed by the FRT group did not explain the low success rates, we removed every participant who expressed less than 15° reach angle at the end of the adaptation from each group (e.g.43). Henceforth, we refer to those participants as non-adapters, as opposed to adapters. This procedure resulted in 1, 5 and 2 participants being removed in the SRT, FRT and Gradual groups, respectively. Performance for the adapters was fundamentally the same as the original groups (Fig. 4a), except for end adaptation reach angles, which were now all above 15° (Fig. 4b; SRT 17.0 ± 1.2; FRT 16.9 ± 1.2; Gradual 16.7 ± 1.4). Specifically, the SRT-adapter group still showed a clear ability to remain in the rewarded region during binary feedback performance (asymptotic blocks), whereas the other two adapter groups could not (Fig. 4c; H(2) = 14.0, p = 0.002, Bonferroni-corrected, Tukey’s test p = 0.028 vs FRT-adapter and p = 0.001 vs Gradual-adapter). Because the full groups (i.e. non-Adapters included) did not express a drop in success rate during early asymptote trials, we compared Adapters’ success rates during asymptote as a whole, rather than splitting them between early and late performance. The SRT-adapter group still displayed greater success than the Gradual-adapter group (Fig. 4d; H(2) = 13.74, p = 0.002, Bonferroni-corrected, Tukey’s test p < 0.001). However, the difference between the SRT-adapter and the FRT-adapter group was now non-significant (Tukey’s test p = 0.12). Despite this, the reach angle differences clearly show that successful binary performance remained strongly affected by one’s capacity to develop and express explicit control even for the successful adapters, as shown by the Gradual-adapter and FRT-adapter groups, respectively (Fig. 4a).
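The exclusion rule for non-adapters can be expressed as a short Python sketch. The 15° threshold is taken from the text; averaging over the last 20 adaptation trials is an assumption based on how end-of-adaptation reach angle is reported in the figures, and the function name `is_adapter` is hypothetical.

```python
from typing import Sequence


def is_adapter(
    adaptation_angles: Sequence[float],
    threshold_deg: float = 15.0,
    n_last: int = 20,
) -> bool:
    """True if the mean reach angle at the end of adaptation is at least 15 deg.

    Participants below the threshold (non-adapters) would already start the
    binary feedback block outside the rewarded region.
    """
    tail = adaptation_angles[-n_last:]
    if not tail:
        raise ValueError("no adaptation trials provided")
    return sum(tail) / len(tail) >= threshold_deg
```

Applying such a filter per participant before the group comparison corresponds to the adapter subgroups analysed here.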

Figure 4

Performance of successful adapters during the forced RT task. (a) Reach angles with respect to target (°) of each group’s successful adapters exclusively. Values are averaged across epochs of 4 trials. Vertical bars represent block limits. The binary feedback consisted of a large green tick displayed on top of the screen if participants were within the reward region (see figure), and of a red cross if they were not (not shown). The black solid line represents the hand-to-cursor discrepancy (the perturbation) for the SRT and FRT group across the task, and the grey dashed line represents the perturbation for the Gradual group only. The upper and lower horizontal axes represent block-relative and absolute trial number, respectively. Coloured lines represent group mean and shaded areas represent s.e.m. (b) Average reach angle during the last 20 trials of the adaptation phase. The shaded area represents the region to be rewarded in the subsequent asymptote phase. (c) Average reach angle during the binary feedback (BF) block. (d) Success rate during the asymptote phase. The black dashed line represents 50% success rate. Each dot represents one participant. For the distribution plots, horizontal black lines are group medians and the shaded areas indicate distribution of individual values. >15° and <15° indicate the average reach angle during the end of the adaptation phase (i.e. adapter and non-adapter, respectively). SRT: slow reaction time; FRT: fast reaction time. ***p < 0.001; **p < 0.01; *p < 0.05.

Finally, since trials were reinitialised if participants failed to initiate reaching movements within the allowed timeframe, we compared the average occurrence of these failed trials between the FRT and SRT groups (Supplementary fig. S2c) to ensure any between-group difference cannot be explained by this. Both groups expressed similar amounts of failed attempts per trial (U = 100, p = 0.73). In addition, movement times were significantly faster across all blocks for the FRT group compared to the SRT group (Supplementary fig. S2d; H(2) = 11.78, p = 0.005, Tukey’s test p = 0.002), although they remained strictly under 400 ms for all groups as in the first experiment (Fig. 1c). This difference is to be expected due to the tendency to express faster velocities in movements with rapid initiation44. RTs expressed by the Gradual group were between the SRT and FRT constraints (Supplementary fig. S2a; Gradual group RT range 385 to 1610 ms).

Overall these findings demonstrate that preventing explicit control by restricting its expression or making participants unaware of the nature of the task results in the partial incapacity of participants to perform successfully during binary feedback performance. It should be noted, however, that performance did not reduce back to baseline entirely, as participants in both the FRT and Gradual groups were still able to express intermediate reach angle values in the order of 10 to 15°.

Discussion

Previous work has led to the idea that BF induces the recruitment of a model-free reinforcement system that strengthens and consolidates the acquired memory of a visuomotor displacement10,12,22. Here, we investigated the role of explicit control in the context of BF, and our results suggest that it may have a more central role in explaining some BF-induced behaviours than previously expected. In the first experiment, the increased retention observed in the BF-Maintain group was suppressed if participants were told to “stop aiming off target” (BF-Remove group). In the second experiment, preventing expression of explicit control by using a secondary task or preventing its development with a gradual introduction of the perturbation resulted in participants being unable to maintain accurate performance during BF blocks. This suggests an explicit component is necessary for performing a BF reaching task, at least within the present study’s experimental design.

The initial performance drop observed at the introduction of BF for both BF groups suggests that participants cannot immediately account for a visuomotor displacement they have already successfully adapted to12. A possible explanation is that the cerebellar memory is not available anymore, most likely because removing VF results in a context change, which is known to prevent retrieval and expression of an otherwise available memory45,46,47. Considering this, the restoration of performance observed after this dip could not be explained by recollection of the cerebellar memory, suggesting another mechanism took place. Two possible candidates to explain this drift back are model-free reinforcement10,11,12,22 and explicit processes7,8,41.

Reinforcement learning is usually considered to operate through experiencing success10,11,48. It is thus difficult to argue for a reinforcement-based reversion to good performance during BF because participants in the trough of the dip did not experience a large amount of success (Supplementary fig. S3), if any. Furthermore, participants experienced little “plateau” performance during the previous block, making formation of a model-free reinforcement memory unlikely, because it is considered a rather slow learning process as opposed to model-based reinforcement10,49; though the adaptation block remains longer compared to Shmuelof and colleagues12. On the other hand, both BF groups experienced a large amount of unexpected errors during this drop, which may promote a more explicit approach16,17,18,27,35. In line with this, the SRT group in the forced RT task, which had been informed of the displacement and of the right policy to counter it, did not express such a dip when starting the BF block.

The forced RT task addresses this question more directly, and shows that impeding explicit control with a secondary task36,38 prevents participants from restoring performance over BF blocks, confirming our interpretation. Interestingly, both the FRT and Gradual groups did not show a return to baseline during asymptote. Likely, the FRT group was aware of the optimal policy, and could partially express it, leading to these intermediate reach angles. In line with this, previous work on forced RT paradigms shows that adapting the constraints based on each individual’s baseline proficiency at this task more efficiently prevents explicit control38. Furthermore, even in the presence of BF, the Gradual group showed a striking inability to find the optimal policy, suggesting the lack of structural understanding of the task strongly impeded their exploration35,48. This overall incapacity of the Gradual group to express an efficient explorative approach is consistent with previous findings showing that rewarding success alone, without providing any explanation of the task structure, is not sufficient to make participants reliably learn an optimal policy48,50.

Previous studies employing the forced RT paradigm have shown it usually leads to slower learning rates during adaptation because participants can less easily employ explicit control from the beginning19,36,38. In contrast, no such difference in learning rate was observed in our forced RT groups. This is possibly due to the difference in size of the perturbation between our study (20°) compared to others36,38 (30°), making the explicit contribution potentially smaller during the adaptation phase7.

Our findings qualitatively replicate results from a previous study employing a similar design12. However, it should be noted that our paradigm differs in several ways. First, retention was assessed using feedback removal rather than visual error clamps, although there is evidence that both methods lead to quantitatively similar results51. Second, our displacement was only 20° of amplitude and no additional displacement was introduced after the asymptote blocks. There is now a growing wealth of evidence that the cerebellum cannot account for more than 15 to 20° displacements20,38,52, with the remaining discrepancy usually being accounted for through explicit re-aiming41. Therefore, the absence of a second, larger displacement, if anything, should only result in a less explicit performance. Nevertheless, instructing participants to remove any explicit re-aiming policy (Remove groups) resulted in a near-complete nullification of the binary feedback effect, suggesting it is mainly underlain by a simple re-aiming process. However, the Maintain instruction alone was not sufficient to produce this high retention profile, as the VF-Maintain group did not express it. We believe this can be explained in two ways. First, experiencing no feedback may result in a stronger context change for the VF groups compared to the BF groups, because the latter experienced the absence of VF during the asymptote blocks beforehand. Thus, this should lead to a stronger drop in reaching angle at the beginning of the no feedback trials for the VF groups, as observed here. Alternatively, the VF-Maintain group experienced 200 more trials with visual feedback at asymptote. Consequently, it is very likely that the cerebellar memory at the beginning of the no-feedback blocks was stronger11, and the explicit contribution was less for this group compared to the BF-Maintain group7,19,41,53. 
This would therefore result in the slow drop in reach angle observed during early no-feedback trials due to gradual decay of the cerebellar memory45,51,54. Critically, both possibilities are not incompatible, and may well occur together.

A notable feature of retention performance is that both BF- and VF-Remove groups show a residual bias of around 5° in their reach angle in the direction opposite to the displacement. Participants in the Remove conditions were not aware of this upon asking them after the experiment. This has been reliably observed in studies using no-feedback blocks to assess retention21,55 (but see51). Possible explanations include use-dependent plasticity-induced bias56,57, perceptual bias58 or an implicit model-free reinforcement-based memory, although this study cannot provide any account toward one or the other. Note however that although the BF-Remove group expressed slightly more bias than its VF counterpart, this clearly did not reach statistical significance, meaning this cannot be explained by feedback type alone. Regardless, the implicit and lasting nature of this phenomenon makes it a promising focus for future research with clinical applications13,15.

Overall, our findings point towards a central role of explicit control during BF-induced behaviours in this study. In line with this, 14/54 participants had to be removed from the BF groups in the feedback-instruction task (experiment 1) because of poor performance in the asymptote blocks (see methods), suggesting that structural learning was required to perform accurately35,48,50. Though this is a substantial proportion of participants, it should be noted that other studies using BF-based reaching also found a similar percentage of “learners” and “non-learners”42,43. Although not expected in our study, this seemingly consistent outcome across a variety of BF experimental designs raises questions regarding either the reliability of this learning mechanism across individuals or the tasks used to examine it. The possibility that this dichotomy between participants is due to structural learning is in line with the dip observed in the BF groups and the absence of a dip in the SRT (i.e. informed) group. If correct, then predictors of structural learning capacity should also predict an individual’s ability to learn a visuomotor displacement under BF, a hypothesis that will be tested in future studies. Finally, our view is that implicit, model-free reinforcement takes a great amount of time and practice to form49,59, and usually arises from initially model-based performance in the behavioural literature23,60, as illustrated by popular reinforcement models (e.g. DYNA61,62). Two interesting possibilities are that 200 trials of BF alone are not sufficient to result in a strong, habit-like enhancement of retention60, or that such behavioural consolidation must take place through sleep60,63. Future work is required to address these hypotheses.

In conclusion, this study provides further insight into the use of reinforcement during motor learning, and suggests that successful reinforcement is tightly coupled to the development and expression of explicit control. We suggest that explicit control bears many similarities with model-based reinforcement, thus creating important questions regarding the link between model-based and model-free reinforcement systems during motor learning. At the very least, future studies investigating reinforcement during visuomotor adaptation should proceed with care in order to map which behaviour is the consequence of implicitly reinforced memories or explicit control.

Methods

Participants

80 participants (20 males) aged 18–37 (M = 20.9 years) and 30 participants (11 males) aged 18–34 (M = 22.1 years) were recruited for experiments one and two, respectively, and pseudo-randomly assigned to a group after providing written informed consent. All participants were enrolled at the University of Birmingham. They were remunerated either with course credits or money (£7.50/hour). They were free of psychological, cognitive, motor or auditory impairment and were right-handed. The study was approved by, and conducted in accordance with, the local research ethics committee of the University of Birmingham.

General procedure

Participants were seated before a horizontal mirror reflecting a screen above (refresh rate 60 Hz) that displayed the workspace and their hand position (Fig. 1a), represented by a green cursor (diameter 0.3 cm). Hand position was tracked by a sensor taped to the right index finger of each participant and connected to a Polhemus 3SPACE Fastrak tracking device (Colchester, Vermont, U.S.A.; sampling rate 120 Hz). Programs were run under MATLAB (The MathWorks, Natick, MA), with Psychophysics Toolbox 364. Participants performed the reaching task on a flat surface under the mirror, with the reflection of the screen matching the surface plane. All movements were hidden from the participant’s sight. At the start of each trial, participants moved the cursor into a white starting box (1 cm width) at the centre of the workspace, which triggered target appearance. Targets (diameter 0.5 cm) were 8 cm away from the starting position. Henceforth, the target position directly in front of the participant will be defined as the 0° position and other target positions will be expressed relative to this reference. Participants were instructed to perform a fast “swiping” movement through the target. Once they reached 8 cm away from the starting box, the cursor disappeared and a yellow dot (diameter 0.3 cm) indicated their end position. When returning to the starting box, a white circle displaying their radial distance appeared to guide them back.

Task design

Experiment 1: Feedback-instruction

For each trial, participants reached to a target located 45° counterclockwise (CCW). Participants first performed a baseline block (60 trials) with veridical cursor feedback, followed by a 75-trial adaptation block in which a 20° CCW displacement was applied (Fig. 1b). In the following 2 blocks (100 trials each), participants experienced the same perturbation either with BF only, or with both BF and VF. BF consisted of a pleasant sound that each participant selected, according to preference, from a series of 26 sounds before the task, without knowing its final purpose. When the participant’s cursor reached less than 5° away from the centre of the target, the sound was played, indicating a hit; otherwise no sound was played, indicating a miss. For the BF group, no cursor feedback was provided, except for one “refresher” trial every 10 trials where VF was present. Participants in the VF group could see the cursor position during the outbound reach of the trial, along with the BF. Finally, participants went through 2 no-feedback blocks (100 trials each) with BF and VF completely removed. Before those blocks, participants were either told to “carry on” (“Maintain” group) or informed of the nature of the perturbation, and asked to stop using any explicit approach to account for it (“Remove” group). Therefore, we had four groups in a 2 × 2 factorial design (BF versus VF and Maintain versus Remove). In addition, if the duration of a reaching movement was greater than 400 ms or less than 100 ms, the starting box turned red or green, respectively, to ensure that participants performed ballistic movements and did not make anticipatory movements. Participants whose success rate was below 40% during the asymptote blocks were excluded (BF-Remove N = 6; BF-Maintain N = 8). Although this exclusion rate was high, it was crucial to exclude participants who were unable to maintain asymptote performance in order to reliably measure retention.
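The trial-level rules of the feedback-instruction task can be summarised in a short sketch. It is an illustration only, written in Python rather than the MATLAB used for the experiment. All function and constant names are hypothetical, and the "white" return value assumes that the starting box simply keeps its default colour when movement duration is acceptable.

```python
# Illustrative sketch of the trial rules described above (names are hypothetical).
HIT_TOLERANCE_DEG = 5.0    # hit if the cursor ends less than 5 deg from target centre
MIN_DURATION_MS = 100.0    # faster movements are flagged (box turns green)
MAX_DURATION_MS = 400.0    # slower movements are flagged (box turns red)
MIN_SUCCESS_RATE = 0.40    # exclusion threshold during asymptote blocks


def is_hit(cursor_error_deg):
    """Binary feedback: True (sound played) if the cursor is within tolerance."""
    return abs(cursor_error_deg) < HIT_TOLERANCE_DEG


def start_box_colour(duration_ms):
    """Colour of the starting box after a reach, given movement duration."""
    if duration_ms > MAX_DURATION_MS:
        return "red"
    if duration_ms < MIN_DURATION_MS:
        return "green"
    return "white"  # assumption: box keeps its default colour otherwise


def is_excluded(asymptote_cursor_errors_deg):
    """True if the success rate over the asymptote trials is below 40%."""
    hits = sum(is_hit(e) for e in asymptote_cursor_errors_deg)
    return hits / len(asymptote_cursor_errors_deg) < MIN_SUCCESS_RATE
```

Under these assumptions, a participant hitting the target on exactly 40% of asymptote trials would be retained, since the criterion in the text is a success rate below 40%.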

Experiment 2: Forced RT

In this experiment, participants were forced to perform the same reaching task at slow (SRT) or fast reaction times (FRT), the latter condition preventing explicit re-aiming by enforcing movement initiation before any mental rotation can be applied to the motor command36,39. A third group (Gradual) also performed the task with no RT constraints.

In the SRT/FRT groups, for each trial, entering the starting box with the cursor triggered a series of five 100 ms long pure tones (1 kHz), one every 500 ms (Fig. 1c). Before the fifth tone, a target appeared at one of four possible locations equally spaced around 360° (0°, 90°, 180° and 270°). Participants were instructed to initiate their movement exactly on the fifth tone (Fig. 1c). Targets appeared 1000 ms (SRT) or 200 ms (FRT) before the beginning of the fifth tone. Movement initiations shorter than 130 ms are likely anticipatory movements37, and explicit control starts to be difficult to express under 300 ms36,38. Therefore, in both conditions, movements were successful if participants exited the starting box between 70 ms before the start of the fifth tone and the end of the fifth tone, that is, from 130 ms to 300 ms after target appearance in the FRT condition. If movements were initiated too early or too late, a message “too fast” or “too slow” was displayed and the cursor did not appear upon exiting the starting box. The trial was then reinitialised and a new target selected. Finally, if participants repeatedly missed movement initiation, so that trial duration exceeded 25 seconds, RT constraints were removed to allow trial completion before time-dependent decay of the cerebellar memory51,54,65. Participants in the SRT and FRT groups were informed of the displacement and of the optimal policy to counter it, to ensure that any effect was related to the expression, rather than the development, of explicit control. They were also instructed to attempt to use the optimal policy as much as possible when sensible, but not at the expense of the secondary RT task, so as to preserve the pace of the experiment and prevent time-dependent memory decay.
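The arithmetic behind the initiation window can be made explicit with a small sketch. The Python below is illustrative only, with hypothetical names. It expresses the window relative to target appearance and assumes that the window boundaries are inclusive, which the text does not specify.

```python
# Illustrative sketch of the movement-initiation window (names are hypothetical).
TONE_DURATION_MS = 100               # each pure tone lasts 100 ms
EARLY_MARGIN_MS = 70                 # window opens 70 ms before the fifth tone
TARGET_LEAD_MS = {"SRT": 1000, "FRT": 200}  # target onset before the fifth tone


def initiation_window_ms(condition):
    """Accepted initiation times, in ms after target appearance."""
    lead = TARGET_LEAD_MS[condition]
    earliest = lead - EARLY_MARGIN_MS    # 70 ms before fifth-tone onset
    latest = lead + TONE_DURATION_MS     # end of the fifth tone
    return earliest, latest


def classify_initiation(condition, rt_ms):
    """Return 'too fast', 'too slow' or 'ok' for a given reaction time."""
    earliest, latest = initiation_window_ms(condition)
    if rt_ms < earliest:                 # assumption: boundaries are inclusive
        return "too fast"
    if rt_ms > latest:
        return "too slow"
    return "ok"
```

With these values, the FRT window runs from 200 − 70 = 130 ms to 200 + 100 = 300 ms after target appearance, matching the text, and the same rule gives 930 ms to 1100 ms in the SRT condition.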

To attain proficiency in the RT task, SRT and FRT participants performed a training block (pseudo-random order of VF and BF trials) of at least 96 trials, or until they could reliably initiate movements on the fifth tone (at the first attempt) in at least 75% of the previous 8 trials. All participants achieved this in 96 to 157 trials. Once this was achieved, participants first performed a 40-trial baseline (Fig. 1d), followed by the introduction of a 20° CCW displacement for 260 trials. Participants then underwent a 200-trial asymptote block with only BF (1 “refresher” trial every 10 trials). The BF consisted of a green tick or a red cross if participants hit or missed the target, respectively. Visual (instead of audio) BF was used to prevent BF sounds from lining up with the tones, which could potentially confuse participants. The Gradual group underwent the same schedule, except that no tones or RT constraints were used, and the perturbation was introduced gradually from the 41st to the 240th trial (increment of 0.4°/trial), with the increment occurring independently for each target. This ensured that participants experienced as few large errors as possible, to prevent awareness of the perturbation and therefore explicit control. After the experiment, participants in the Gradual group were informed of the displacement, and subsequently asked if they had noticed it. If they answered positively, they were asked to estimate the size of the displacement.
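Two of the rules above lend themselves to a brief sketch: the training criterion and the gradual perturbation schedule. The Python below is illustrative, with hypothetical names. It assumes that training required both a minimum of 96 trials and the 75% criterion over the last 8 trials, and that the gradual displacement grew linearly by 0.4° on each visit to a given target until capped at 20°.

```python
# Illustrative sketch of the training criterion and gradual schedule
# (function names and the exact reading of the rules are assumptions).

def training_complete(first_attempt_ok, min_trials=96, window=8, criterion=0.75):
    """first_attempt_ok: one boolean per training trial, True if movement
    was initiated on the fifth tone at the first attempt."""
    if len(first_attempt_ok) < min_trials:
        return False
    recent = first_attempt_ok[-window:]
    return sum(recent) / window >= criterion


def gradual_displacement_deg(visit_number, step_deg=0.4, max_deg=20.0):
    """Displacement on the k-th perturbed trial to one target (k starts at 1)."""
    return min(max_deg, step_deg * visit_number)
```

Under this reading, the 200 trials of the ramp spread over four targets give roughly 50 visits per target, and 50 × 0.4° = 20°, so each target reaches the full displacement by about the end of the ramp.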

Data analysis

All data and analysis code are available on our Open Science Framework page (osf.io/hrgzq). All analyses were performed in MATLAB. We used the Lilliefors test to assess whether data were normally distributed and, as most data were not, we compared groups using Kruskal-Wallis or Wilcoxon signed-rank tests as appropriate. Post-hoc tests were done using Tukey’s procedure. As we analysed the data from experiment two twice (Figs 3 and 4), p-values for success rates and reach angles during asymptote were Bonferroni-corrected (multiplied by 2).
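The Bonferroni correction described above amounts to multiplying each p-value by the number of analyses. A minimal sketch follows, in Python for illustration only. The function name is hypothetical, and capping corrected values at 1 is a common convention assumed here rather than something stated in the text.

```python
# Illustrative Bonferroni correction (hypothetical helper, not the authors' code).

def bonferroni(p_values, n_comparisons=2):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]
```

For example, an uncorrected p-value of 0.03 would become 0.06 after correction for two analyses and would no longer fall below a 0.05 threshold.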

Learning rates were obtained by fitting an exponential function to the reach angle curves of the adaptation block using a non-linear least-squares method with a maximum of 1000 iterations (average R² = 0.86 ± 0.14 for the feedback-instruction task and R² = 0.58 ± 0.26 for the forced-RT task):

$$y = a\,e^{\beta x} + b$$

where y is the hand direction for trial x, a is a scaling factor, b is the starting value and β is the learning rate. Reach angles were defined as the angular error to target of the real hand position at the end of a movement. Trials were considered outliers and removed if movement duration was over 400 ms or less than 100 ms, if the end-point reach angle was over 40° off target, or, for the SRT and FRT groups in the forced-RT task, if failed initiation attempts continued for more than 25 s. In total, outliers accounted for 3755 trials (8%) in the feedback-instruction task and 1013 trials (6%) in the forced-RT task.
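The outlier rule and the exponential fit can be sketched as follows. This is not the analysis code, which was written in MATLAB and is available at the repository cited above. It is a dependency-free Python illustration with hypothetical names. Instead of an iterative non-linear least-squares solver, it uses the fact that the model is linear in a and b once β is fixed, and searches over β with a coarse grid followed by a golden-section refinement. It also assumes that β is negative and lies between −2 and −0.001.

```python
import math


def is_outlier(duration_ms, reach_angle_deg, failed_initiation_s=None):
    """Outlier rule from the text; failed_initiation_s applies to SRT/FRT only."""
    if duration_ms > 400 or duration_ms < 100:
        return True
    if abs(reach_angle_deg) > 40:
        return True
    return failed_initiation_s is not None and failed_initiation_s > 25


def _fit_a_b(xs, ys, beta):
    """For fixed beta, y = a*exp(beta*x) + b is linear in a and b."""
    us = [math.exp(beta * x) for x in xs]
    n = len(xs)
    mean_u, mean_y = sum(us) / n, sum(ys) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    cov_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = cov_uy / var_u
    b = mean_y - a * mean_u
    sse = sum((y - (a * u + b)) ** 2 for u, y in zip(us, ys))
    return a, b, sse


def fit_exponential(xs, ys, beta_lo=-2.0, beta_hi=-1e-3, n_grid=200, n_refine=60):
    """Return (a, beta, b, r_squared) minimising the squared error."""
    def sse(beta):
        return _fit_a_b(xs, ys, beta)[2]

    # Coarse search on a log-spaced grid of negative beta values.
    log_lo, log_hi = math.log(-beta_hi), math.log(-beta_lo)
    grid = [-math.exp(log_lo + (log_hi - log_lo) * i / (n_grid - 1))
            for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse(grid[i]))
    lo = grid[min(best + 1, n_grid - 1)]  # more negative neighbour
    hi = grid[max(best - 1, 0)]           # less negative neighbour

    # Golden-section refinement between the two neighbours.
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(n_refine):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if sse(c) < sse(d):
            hi = d
        else:
            lo = c
    beta = (lo + hi) / 2

    a, b, err = _fit_a_b(xs, ys, beta)
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return a, beta, b, 1 - err / sst
```

In practice, trials flagged by `is_outlier` would be dropped before calling `fit_exponential` on the remaining trial numbers and reach angles of the adaptation block.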

Even though 4 targets were used during the forced-RT task, trials were reset and a new random target was selected when participants failed to initiate movements on the fifth tone. Therefore, not every target position would be represented in each epoch, and epochs were consequently not used.