Continuous Tracking of Task Parameters Tunes Reaching Control Online

Abstract A hallmark of human reaching movements is that they are appropriately tuned to the task goal and to the environmental context. This was demonstrated by the way humans flexibly respond to mechanical and visual perturbations that happen during movement. Furthermore, it was previously showed that the properties of goal-directed control can change within a movement, following abrupt changes in the goal structure. Such online adjustment was characterized by a modulation of feedback gains following switches in target shape. However, it remains unknown whether the underlying mechanism merely switches between prespecified policies, or whether it results from continuous and potentially dynamic adjustments. Here, we address this question by investigating participants’ feedback control strategies in presence of various changes in target width during reaching. More specifically, we studied whether the feedback responses to mechanical perturbations were sensitive to the rate of change in target width, which would be inconsistent with the hypothesis of a single, discrete switch. Based on movement kinematics and surface EMG data, we observed a modulation of feedback response clearly dependent on dynamical changes in target width. Together, our results demonstrate a continuous and online transformation of task-related parameters into suitable control policies.


Introduction
Humans can execute reaching movements in various environments in the presence of unexpected disturbances such as visual or mechanical perturbations, that can interfere with their ability to succeed. Indeed, a large body of work characterized human control policies during reaching in presence of step mechanical (Knill et al., 2011;Nashed et al., 2012;Lowrey et al., 2017;Cross et al., 2019), visual (Georgopoulos et al., 1981;Soechting and Lacquaniti, 1983;Prablanc and Martin, 1992;Sarlegna and Mutha, 2015), or vestibular perturbations (Keyser et al., 2017;Oostwoud Wijdenes et al., 2019). Crucially, the perturbations used in these experiments recruited feedback circuits without altering the limb dynamics, which allowed establishing the dependency of the control policy on task requirements. These results highlighted that reaching control policies flexibly adapted to a wide variety of contexts while relying on different sensory modalities.
To capture this feature, the control of upper limb reaching movements can be modeled in the framework of optimal feedback control (OFC). This theory posits that reaching control policies optimize a performance index captured by a cost-function consisting of a weighted combination of motor cost and state-dependent movement penalties. This cost-function encompasses the task requirements by determining how to control the limb optimally with respect to this goal (Todorov and Jordan, 2002;Todorov, 2004). OFC has been used to model a diverse set of perturbation paradigms and established the flexibility of goal-directed feedback control in humans (Diedrichsen, 2007;Izawa and Shadmehr, 2008;Diedrichsen and Dowling, 2009;Omrani et al., 2013;Nashed et al., 2014;Scott, 2016).
It is important to realize that in most studies, the planning and control phases have been dissociated. Indeed, it was often assumed that the movement goal is selected before executing the corresponding control policy (Wong et al., 2015). In the OFC framework, the dissociation of planning and execution corresponds to the assumption that the feedback gains, and therefore the control policy, are derived before movement. In this view, it is unclear whether and based on which variables can the nervous system update control of an ongoing movement following changes in task-related parameters altering the movement goal, thereby implying a novel cost and requiring an adjustment of the policy. Crucially, we must distinguish perturbations as target jumps or mechanical loads, which computationally can be handled by altering the state vector without changing the controller, from changes in task requirements such as the structure of the target, that impose a change in the controller itself.
We recently demonstrated that the goal-directed policy used during reaching was adjusted online in response to changes in target width (De Comite et al., 2021). Here, we sought to investigate whether such adjustments reflected participants' ability to switch between two prespecified control strategies, or whether they resulted from a feedback system considering continuous changes in the goal structure, and responded accordingly.
We addressed this question in two experiments where participants had to perform reaching movements toward a target the width of which could gradually decrease at different rates during movement, corresponding to a continuous modification of the target redundancy along its main axis. Two alternative hypotheses can be formulated: if adjustments in control policy do not integrate the dynamical changes in target, we expect to see stereotyped switches in behavior and feedback responses across conditions reflecting switches between two extreme cases (corresponding to maximal and minimal target widths). On the contrary, if dynamic changes are monitored, different rates of changes in target width should evoke different amounts of modulation in feedback responses. In agreement with the second alternative, we observed across the two experiments that participants adjusted their response to the rate of change in target width. Together, our results demonstrate the existence of a feedback mechanism conveying continuous information about task-parameters and adjusting the control policies dynamically.

Materials and Methods Participants
A total of 24 right-handed participants were recruited for this study and were enrolled in one of the two experiments. Fourteen participants (10 females) ranging in age from 18 to 30 years old took part to experiment 1. The second group performed experiment 2 and included 10 right-handed participants (5 females) ranging in age from 19 to 27 years old. Participants were naive to the purpose of the study, had normal or corrected vision, and had no known neurologic disorder. The ethics committee of the local university approved the experimental procedures and participants provided their written informed consent before the experiment.

Experimental paradigm
Participants were seated on an adjustable chair in front of a Kinarm end-point robotic device (KINARM) and grasped the handle of the right robotic arm with their right hand. The robotic arm allowed movements in the horizontal plane and direct vision of both the hand and the robotic arm was blocked. Participants sat such that, at rest, their arm was approximately vertical and their elbow formed an angle of ;90°.Their forehead rested on a soft cushion attached to the frame of the robot. A semi-transparent mirror, located above the handle and reflecting a virtual reality display (VPixx, 120 Hz) allowed participants to interact with visual targets. A white dot of 0.5-cm radius aligned to the position of the right handle was displayed throughout the whole experiment.

Experiment 1
In this experiment, participants (N = 14) were instructed to perform reaching movements to a visual target initially represented as a wide rectangle (30 Â 2.5 cm) located 20 cm away from the home target in the y-direction. The home target was a circle of 1.5 cm in diameter. The main axis of the rectangle was aligned with the x-axis and was orthogonal to the straight-line path from the home target to the center of the goal target (see Fig. 1A). Participants first had to bring the hand-aligned cursor in the home target displayed as a red circle that turned green as they reached it. After a random delay (uniform, between 1 and 2 s), the goal target was projected as a gray rectangle and participants could begin their movement whenever they wanted. There was no constraint on the reaction time. The exit from the home target was used as an event to determine reach onset, and starting then participants had to complete their movement between 350 and 600 ms to successfully complete the trial. The trial was successfully completed if (1) they reached the goal target within the prescribed time window; and (2) they were able to stabilize the cursor in it for 500 ms. The goal target turned green at the end of successful trials and red otherwise. To motivate the participants, a score corresponding to their number of successful trials was projected next to the goal target.
During movements, two types of perturbations could occur. The first one was a mechanical load consisting of a lateral step force applied by the robot to participants' hand (53.6% of trials). The magnitude of this force was 69 N aligned with the x-axis, with a 10-ms linear build-up. This force was triggered when the hand-aligned cursor crossed a virtual line parallel to the x-axis and located at 6 cm from the center of the home target (see Fig. 1A, horizontal black dashed line). This step force was switched off at the end of the trial. The second type of perturbation was a visual change in target width starting when participants exited the home target (51.2% of trials). Hereafter, we refer to the visual perturbation as the target condition. Participants had no information about the target condition before movement initiation. This change could either be an instantaneous change from a wide rectangle to a narrow square (switch condition; Fig. 1B, magenta) or a continuous change in target width either at a speed of -30 cm/s (slow condition; Fig. 1B, green) or at a speed of À45.8 cm/s (fast condition; Fig. 1B, blue). The speed of the fast condition was selected such that the target width at the end of the movement was similar to the switch conditions for the slowest correct movements. This was done to assess whether participants could anticipate the final width of the target and select a corresponding controller, which would produce identical responses in the fast and switch conditions. The decrease in target width stopped as participants entered the goal target. Importantly, the location of the center of the goal target did not change across conditions and was always aligned with the home target. Unperturbed and perturbed trials were randomly interleaved such that participants could not predict the occurrence and the nature of disturbances. Participants were instructed to reach the target as it was actually displayed. They started with a 25-trials training block to become familiar with the task, the timing constraints, and the force intensity of perturbation loads. Crucially, this training block did not contain any visual perturbation. After completing this training block, participants performed six blocks of 82 trials. Each 82-trials block contained: 38 trials without mechanical perturbation (20 with no target change and 6 for each target condition) and 44 trials with mechanical perturbation (20 with no target change and 8 for each target condition, equally likely for rightward and leftward mechanical perturbations). Participants performed a total of 492 trials, including 24 of each combination of perturbed condition (direction of the mechanical perturbation and target condition), see Table 1. Participants were compensated for their participation.

Experiment 2
We designed a second experiment which was a variant of the first one to assess reproducibility of the results in a slightly different version of the protocol, and also to investigate possible influence of the delay between the visual and mechanical perturbations on the modulation of feedback responses. Experiment 2 was almost identical to experiment 1, except that the mechanical perturbation was triggered when the hand-aligned cursor crossed a virtual line parallel to the x-axis and located at 8 cm (instead of 6) from the center of the home target (see Fig. 1A, gray dashed line). The intensity of this mechanical load was reduced compared with the main experiment (7 vs 9 N) to keep a similar success rate. All the other experimental parameters (target conditions, number of trials, and time constraints) were identical to those of experiment 1, see Table 1.
Since both visual and mechanical perturbations were triggered based on position threshold (respectively, when participants exited the home target and when they crossed a virtual line located at 6 or 8 cm from the center of the home target), there was some variability in the time span between these two perturbation triggers. The variability in this time span is represented in Figure 1C, black and gray for experiments 1 and 2, respectively, and had a median value of 96 6 6.41 ms for Figure 1. Experimental paradigms. A, Schematic representation of the task paradigm. Participants had to perform reaching movement from the home target to the goal target, initially represented as a 30-cm-wide rectangle. During movement, they could experience mechanical step forces triggered in position at 6 cm (experiment 1, black line) or 8 cm (experiment 2, gray line) from the home target and visual changes in target width (triggered as they exited the home target). B, Evolution of the target width with respect to time in the different target conditions. The time axis is aligned on the visual perturbation onset defined by the onset of movement. The vertical dashed lines represent the median force onsets for experiments 1 and 2. C, Histograms of the distribution of the time interval between the visual and mechanical perturbation onset (respectively, target and force onset) across all participants and conditions in experiments 1 (black) and 2 (gray). D, Histograms of the distribution of the target width at force onset for the two dynamical conditions (fast in blue and slow in green) across all participants in experiments 1 (top) and 2 (bottom). experiment 1 and 145 6 21.73 ms for experiment 2. As the target width in the slow and fast conditions were continuously changing with time, some variability was also present in the target width at the mechanical perturbation onset. In the fast condition, we observed a median value of 25.6 6 0.3 and 23.31 6 0.14 cm, while in the slow condition, we observed a median value of 27.1 6 0.15 and 26.3 6 0.54 cm, respectively, for experiments 1 and 2, represented in Figure 1D in blue (fast) and green (slow).

Data collection and analysis
Raw kinematics data were sampled at 1 kHz and low-pass filtered using a fourth order double-pass Butterworth filter with cutoff frequency of 20 Hz. Hand velocity, acceleration and jerk were computed from numerical differentiation of the position using a fourth order centered finite difference.
Surface EMG electrodes (Bagnoli surface EMG sensor, Delsys INC.) were used to record muscles activity during movements. We measured the pectoralis major (Pect. Maj.) and the posterior deltoid (Post. Delt.) based on previous studies (Crevecoeur et al., , 2020bDe Comite et al., 2021) showing in the same configuration that these muscles were stretched by the application of lateral forces, and therefore strongly recruited for feedback responses. Before applying the electrodes, the skin of participants was cleaned and abraded with cotton wool and alcohol. Conduction gel was applied on the electrodes to improve the quality of the signals. The EMG data were sampled at a frequency of 1 kHz and amplified by a factor of 10,000. A reference electrode was attached to the right ankle of the participant. Raw EMG data from Pect. Maj. and Post. Delt. were bandpass filtered using a fourth order double-pass Butterworth filter (cut-offs: 20 and 250 Hz), rectified, aligned to force onset, and averaged across trials or time as specified in Results. EMG data were normalized for each participant to the average activity collected when they maintained postural control against a constant force of 9 N (rightward for Pect. Maj., leftward for Post. Delt.) This calibration procedure was applied after the second and the fourth blocks.

Statistical analyses
Data processing and parameter extractions were performed using MATLAB 2019a. We fitted linear mixed models (Brown and Prescott, 2006) to infer the effect of target conditions on different kinematics parameters and on the EMG activities. These models were fitted using the fitlme function and the formula used was the following: In this formula, the fixed predictors were the intercept (b 0 ) and the target condition (b 1 ) while participants were included as a random offset (a i ). The individual residual of trial j for participant i, captured by e ij followed a normal distribution. Each target condition was associated with an integer number such that they were ordered in decreasing order of constraints on the final target (no change,slow,fast,switch) and that positive/negative values for the regressor b 1 indicate a decrease/ increase of the measured parameter with the task difficulty. For these linear mixed model analyses that we performed, we reported the mean estimate of b 1 , its SD, the t statistics for this estimate and the corresponding p value.
The continuous predictor for the condition can be seen as a nonlinear transform of task difficulty and the parameter b 1 can be interpreted as a slope, meaning that the more difficult the task is (with narrower target), the larger the feedback response. However, this approach can be criticized as the condition may also be considered as a categorical predictor. To address this concern, we also ran a discrete version of the linear mixed models where the target condition was defined as categorical. This categorical model confirmed the conclusion of the continuous one in all the conditions (results not shown). Post hoc tests between pairs of target conditions were performed using similar linear mixed model applied on the two compared target conditions. For these post hoc tests, we reported the mean estimate of b 1 , its SD, the t statistics for this estimate, the corresponding p value, and the effect size defined as the standardized mean difference between two groups of independent observations (Lakens, 2013).
In order to determine whether the timing of the mechanical perturbation relative to the onset of visual change could modulate the feedback responses, we compared the results of experiments 1 and 2 as follows. We normalized the EMG activity by the intensity of the mechanical perturbations (9 and 7 N for experiments 1 and 2, respectively), and binned them within trial in the long-latency (LL; 50-100 ms following perturbation onset) and earlyvoluntary (VOL; 100-180 ms following perturbation onset; Pruszynski et al., 2008;Pruszynski and Scott, 2012). We then ran the following linear mixed effect models for each of these binned response value: The fixed predictors are the intercept (b 0 ), the target condition (b 1 ) and the experiment (b 2 , a proxy for the onset of mechanical perturbation) while participants were included as random offset (a i ). The individual residual of trial j for participant i, captured by e ij followed a normal distribution. As above, we verified that continuous and categorical definitions of the target condition yielded similar results and reported the statistics corresponding to the continuous predictor. Post hoc tests between pairs of conditions were performed using linear mixed models applied on the two compared target conditions.
We also investigated a potential learning or habituation effect across fast and slow conditions as participants did not encounter those trials in the training phase. In order to investigate the lag between the first and last trials in the dynamical conditions (namely, the slow and fast conditions that were not met during the training phase), we used a cross-correlation analysis applied on resampled data. We generated 1000 bootstrap samples from the individual acceleration profiles. For each of these samples, we computed the mean acceleration traces for the first and last trials and computed the cross-correlation between these two mean traces. We then extracted the peak value of this crosscorrelation, corresponding to the lag between the two signals. The bootstrap resampling allowed us to obtain a distribution for this lag such that we could perform statistical analyses on it. Wilcoxon signed rank test was used to assess whether differences in lag were statistically different or not from zero.
In all our analyses, significance was considered at the level of p = 0.05 although we decided to exactly report any p-value that was larger than p = 0.005 as previously proposed (Benjamin et al., 2018).

Experiment 1
Participants were asked to perform reaching movements to a target that was initially a 30-cm-wide rectangle, in all cases. During movement and in a random subset of trials, the target could either instantaneously turn into a 2.5-cm-wide square target (switch condition) or gradually decrease in width either at a high (fast condition) or low (slow condition) speed. Additionally, unexpected mechanical perturbations were used during movements to evoke rapid motor responses and investigate their properties in relation with the change in target width

Kinematics
We observed that the target condition clearly influenced participants' behavior. Indeed, the mean hand path trajectories in the mechanically perturbed conditions ( Fig. 2A) Figure 2. Experiment 1, hand kinematics during movement. A, Group mean of the hand path for unperturbed and perturbed trials in the no change (black), slow (green), fast (blue), and switch (magenta) conditions. B, Group mean and SEM of the x-position of participants' hand as a function of time (aligned on force onset) for trials perturbed with rightward mechanical perturbations in the four target conditions. The black dashed line represents the onset of the mechanical perturbation. C, Group mean and SEM of the x-position of participants' hand as a function of time for trials perturbed with leftward mechanical perturbations in the four target conditions. The black dashed line represents the onset of the mechanical perturbation. D, Group mean (black) and individual means (gray) of the maximal hand deviation in presence of a rightward perturbation for the four target conditions. E, Group mean (black) and individual means (gray) of the final position for trials with rightward perturbation for the four target conditions. F, Group mean (black) and individual means (gray) of the maximal hand deviation in presence of a rightward perturbation for the four target conditions. G, Group mean (black) and individual means (gray) of the final position for trials with rightward perturbation for the four target conditions; **p , 0.005. differed across conditions. Consistent with our previous findings (De Comite et al., 2021) we observed online adjustments in the behavior in the switch condition (magenta) compared with the no change condition (black). These adjustments consisted of smaller lateral deviations in the switch condition ( Fig. 2B,C, black and magenta traces). Interestingly, the behaviors in the dynamical conditions (slow and fast, green and blue, respectively) differed from both the no change and switch conditions. In order to quantify these differences, we investigated the maximal hand deviation induced by the mechanical perturbations and the final hand position defined as the x-position of the hand as its velocity dropped below 2 cm/s. The maximal lateral hand deviation induced by rightward mechanical perturbations (Fig. 2D) varied significantly across the target conditions. A linear mixed model (see Materials and Methods) revealed a significant effect of target condition (b 1 = 0.0417 6 0.0022, t = 18.14, p , 0.005) on the maximal hand deviation with larger deviations for slower changes in target width. Post hoc pairwise analyses revealed that the maximal hand deviation was larger in the no change condition than in slow and fast conditions (slow b 1 = À0.006 6 0.0006, t = À10.46, p , 0.005, d = 0.60 and fast b 1 = À0.007 6 0.0006, t = À12.50, p , 0.005, d = 0.71). The hand deviation was larger in these dynamical conditions than it was in the switch condition (fast b 1 = À0.0021 6 0.0008, t = À2.56, p , 0.005, d = 0.11 and slow b 1 = À0.0017 6 0.0008, t = À3.15, p , 0.005, d = 0.21). Finally, we even observed that the hand deviation was larger in the slow than in the fast condition (b 1 = À0.0013 6 0.0006, t = À2.14, p = 0.0319, d = 0.1527). Similar results were observed for leftward mechanical perturbations (see Similarly, we observed that the final hand position along the x-axis, computed as the hand position when the total velocity dropped below 2 cm/s, exhibited similar dependency on the target condition. Indeed, a linear mixed model analysis (see Materials and Methods) revealed a significant effect of the target condition (b 1 = À0.008 6 0.0003, t = À25.75, p , 0.005). As for the maximal hand deviation, post hoc pairwise analyses revealed that both dynamical conditions were characterized by less eccentric final hand positions than the no change condition (slow, b 1 = À0.014 6 0.0008, t = À16.37, p , 0.005, d = 0.80 and fast b 1 = À0.025 6 0.0008, t = À28.74, p , 0.005, d = 1.27). These final hand positions in the slow condition were more eccentric than the one in the switch condition, no differences were found between the fast and switch conditions (fast b 1 = À0.0018 6 0.0013, t = 1.36, p = 0.17, d = 0.09 and slow b 1 = À0.008 6 0.0013, t = À6.49, p , 0.005, d = 0.43). The final hand positions in the slow condition were significantly more eccentric than those in the fast condition (b 1 = À0.01 6 0.0008, t = À13.10, p , 0.005, d = 0.83). Trials that included a leftward mechanical perturbation (see Fig. 2G) contained the same effects (linear mixed models: b 1 = 0.008 6 0.0003, t = 27.89, p , 0.005).

Muscle activity
The kinematics results that we reported indicated that participants were able to adjust their control strategy during movements according to dynamical changes in movement goal. They were even able to tune their adjustment to the speed of these dynamical changes. We hypothesized that the stretched EMG activity in Pect. Maj. and Post. Delt. should also depend on the target condition. If such modulation exists in the LL epoch, 50-100 ms following the onset of the mechanical perturbation, it would indicate that the adjustment in behavior did not only reflect changes in voluntary intent but also changes in reflexive responses previously associated with goal-directed state-feedback control (Pruszynski and Scott, 2012;Crevecoeur and Kurtzer, 2018).
We observed that the target condition modulated the EMG activity of the muscles stretched by the mechanical perturbation. Figure 3A,B represent the mean EMG activities collapsed across participants for trials perturbed by rightward or leftward perturbation in all target conditions in the stretched (full lines) and shortened muscles (dashed lines). Visual inspection of target specific responses for the stretched muscles, obtained by subtracting the no change condition, confirmed this modulation of the EMG response (see Fig. 3C,D, respectively, for Pect. Maj. and Post. Delt.). In order to characterize this modulation, the EMG activity of the stretched muscle was averaged in the LL (50-100 ms after force onset) and VOL time epochs (100-180 ms after force onset) for each perturbation direction. The deviations from the mean activity in these time bins are reported in Figure 3E,F for stretched Pect. Maj. in the LL and VOL windows at population (black) and individual (gray) levels.
Altogether, these results indicate that participants adjusted their behavior during movements in response to dynamical changes in target shape. Indeed, we showed that the hand deviation induced by the mechanical perturbations was different in the dynamical (slow and fast) and in the static conditions (no change and switch). Moreover, we reported larger hand deviation for the slow than for the fast condition: indicating that the rate of change in target width was integrated in the control strategy. The differences observed in acceleration profiles and EMG correlates confirmed this finding. The sensitivity of the online adjustments of control policy to dynamical changes and speed of changes suggest the existence of a mechanism able to finely tune to control strategies within movement.

Experiment 2
Although there was, in experiment 1, a significant effect in the LL window, the pairwise comparisons did not allow to conclude that the modulation was as gradual as in the Figure 3. Experiment 1, EMG activity. A, Group mean for the stretched (Pect. Maj., full lines) and shortened (Post. Delt., dashed lines) responses to rightward perturbations in the different target conditions. The gray rectangles represent the LL and VOL epochs where the EMG activity was averaged to perform statistical analyses. The black dashed line represents mechanical perturbation onset and the time axis is aligned with force onset. B, Group mean for the stretched (Post. Delt., full lines) and shortened (Pect. Maj., dashed lines) responses to leftward perturbations in the different target conditions. C, Group mean of target-specific EMG responses to perturbation for Pect. Maj. in the presence of rightward perturbation for the switch (magenta), fast (blue), and slow (green) target conditions. Time axis is aligned with force onset. The small insets represent the mean and SEM target-specific EMG responses at the end of the movement. D, Group mean of targetspecific EMG responses to perturbation for Post. Delt. in the presence of leftward perturbation for the switch (magenta), fast (blue), and slow (green) target conditions. Time axis is aligned with force onset. The small insets represent the mean and SEM target-specific EMG responses at the end of the movement. E, F, Group mean (black) and individual means (gray) of the binned EMG activity in the LL (E) and VOL (F) time windows for Pect. Maj. in the presence of rightward perturbations for the different target conditions. G, H, Group mean (black) and individual means (gray) of the binned EMG activity in the LL (G) and VOL (H) time windows for Post. Delt. in the presence of leftward perturbations for the different target conditions; *p , 0.05, **p , 0.005. VOL epoch. We designed experiment 2 to test the possibility that the shallower modulation in the LL, compared with that in the VOL epoch (see Fig. 4 panels E vs F and G vs H), was because of a too short delay between the onset of visual changes and the mechanical perturbation, which could therefore leave too little time to develop a clear modulation in the LL epoch. In experiment 2, the onset of the visual perturbation was similar as in experiment 1 but that of the mechanical perturbations occurred later (150 ms after the visual onset instead of 100 ms in experiment 1) which allowed more time to adjust control policies as a delay of 150 ms was previously reported between the onset of change in target width and changes in control (De Comite et al., 2021).
The impact of the different target conditions on the behavior was qualitatively similar to that of experiment 1 described in Figure 2. We then investigated the modulation of the EMG activity during experiment 2 in the LL and VOL epochs with the same linear mixed model as in experiment 1. We observed significant modulation in both the LL (Pect. Maj.: b 1 = À0.021 6 0.04, t = À4.43, p , 0.005 and Post. Delt.: b 1 = À0.014 6 0.006, t = À2.09, p = 0.036) and VOL (Pect. Maj.: b 1 = À0.041 6 0.006, t = À6.17, p , 0.005 and Post. Delt.: b 1 = À0.062 6 0.011, t = À5.83, p , 0.005) time epochs during this control experiment. Similar to what was found in experiment 1, we observed a shallower modulation in the LL time epoch than in the VOL epoch indicating that the design of experiment 1 did not unintentionally reduce the modulation of the response in the LL time epoch.
We investigated whether the differences in responses observed between the fast and the slow conditions result from differences in rates of change in target width or from the instantaneous target width at perturbation onset, by comparing the normalized EMG responses observed in Experiments 1 and 2. If the hypothesis whereby these modulations of the feedback responses are mediated by the width of the target at perturbation onset holds, we should observe larger responses in experiment 2 as the mechanical perturbations were triggered later resulting in smaller target width at perturbation onset (see Fig. 4A,B). To test this hypothesis, we grouped the normalized stretch muscle activities, binned in the LL and VOL time epochs, from both experiment and investigated a potential effect of the experiment (see Materials and Methods). We did not observe any differences between the normalized responses of experiments 1 and 2, neither in the LL epoch (b 2 ¼ 0:1260:08, t = 1.58, p = 0.1134; Fig. 4D) nor in the  4F). Since we did not observe differences between the feedback responses across the two experiments, we decided to pool these responses to gain a more robust statistical description of the main effect in the LL and VOL time windows. We grouped the muscle activity of the stretched muscles from both experiments (Fig. 4G for the mean traces) and used linear mixed models that considered target conditions and experiments as fixed factors (see Materials and Methods). We found a main effect of the target condition in both LL (b 1 = À0.03 6 0.003, t = À9.62, p , 0.005) and VOL (b 1 = À0.09 6 0.003, t = À25.61, p , 0.005) epochs. Post hoc pairwise comparisons performed between conditions in these two time epochs also reported differences demonstrating larger feedback responses for more constrained movements (LL: switch vs fast b 1 = À0.024 6 0.029, t = À0.89, p = 0.39, d = 0.02 switch vs slow b 1 = À0.039 6 0.013, t = À2.80, p , 0.005, d = 0.10 and fast vs slow b 1 = À0.063 6 0.028, t = À2.24, p = 0.005, d = 0.07 VOL: switch vs fast b 1 = À0.24 6 0.035, t = À7.04, p , 0.005, d = 0.35, switch vs slow b 1 = À0.19 6 0.017, t = À11.504, p , 0.005, d = 0.21 and fast vs slow b 1 = À0.15 6 0.031, t = À4.89, p , 0.005, d = 0.14).
This second experiment revealed that the modulation of EMG activity in the LL epoch was small but robust and reproducible. We also found across the two experiments, for which the target width at perturbation onset was different, that the responses were very similar. Observe that the perturbation in experiment 2 were triggered a bit later, which potentially increase the response gains (Poscente et al., 2021). Thus, this effect should add to a potential sensitivity to target width. Nevertheless, we found essentially similar normalized EMG despite (slightly) later occurrence and smaller instantaneous width. This result suggests that the underlying neural pathways may consider the speed or rate of change of target width, which is clearly consistent with our hypothesis that continuous change in task parameters modulate control gains dynamically.

Differences between the first and last trials in dynamical conditions
Interestingly, we observed that participant's behavior during the fast and slow conditions changed across blocks. Figure 5A,B represents the mean and SEM of the position along the x-axis for the first (full line) and last trials (dashed line) in the fast condition for rightward and leftward mechanical perturbations, respectively. We observed that these first and last trials differed and decided to take a look at their acceleration profiles to quantify these differences. The corresponding acceleration profiles are represented in Figure 5C,D. We observed a consistent and significant lag of the last trial with respect to the first one. This lag was computed by taking the median of the lags distribution that was obtained from the maximal values of the cross-correlation between the first and last fast trials of the fourteen subjects with 10,000 bootstrap samples (see Materials and Methods). The resulting distribution of this lag, obtained through this resampling method is represented in Figure 5E. This method revealed a median lag of -18 ms (Fig. 5E, blue vertical line) that was significantly smaller than zero (signrank test z = À32.18, p , 0.005).
The first and last trials of each dynamical condition also differed in the smoothness of their acceleration profile as shown in Figure 5C,D for the fast condition. This difference in smoothness was quantified by comparing the integral of the absolute values of the derivatives of these acceleration profiles: the jerk. We reported in Figure 5F,G, these integrals for all participants in the slow and fast conditions, respectively. In the fast condition, the final state was less jerky than the first one as reported by a signrank test (z = À2.835, p , 0.005). Similar results were obtained in the slow condition (signrank test z = À3.19, p , 0.005) indicating an increase in the smoothness of the acceleration profiles.
Thus, there were measurable behavioral changes that could be related to practice; however, they did not interfere with the interaction between target condition and behavior. Indeed, we still observed a significant modulation of the EMG activity in both LL (linear mixed models, b 1 = À0.041 6 0.009, t = À4.21, p , 0.005) and VOL epochs (linear mixed models, b 1 = À0.144 6 0.014, t = À10.11, p , 0.005) when we only considered the last twelve trials of each dynamical condition for all participants.

Discussion
We investigated how humans responded to continuous changes in target width during reaching. More specifically, we studied participants' behavior as they were reaching to a target, initially represented as a wide rectangle, with time varying width. We observed that the way participants responded to unexpected mechanical perturbations depended on the target condition and specifically on the rate of change in target width during movement. This demonstrated that the control policies used to perform reaching movements were adjusted online to the specific change in target width, which captures participants' ability to continuously track and respond to task parameters during movement.
Here, we leveraged an experimental paradigm developed in a previous work (De Comite et al., 2021), consisting of abrupt changes in target structure within movements, to dynamically alter the task constraints and investigate whether participants' control policies were adjusted online. This paradigm exploits the minimum intervention principle (Todorov and Jordan, 2002), which states that participants only correct deviations that interfere with the task success during reaching movements. This means that participants exploit the target redundancy when available, even in the absence of perturbations (Scholz et al., 2000;Vetter et al., 2002;Berret et al., 2011;Knill et al., 2011;Nashed et al., 2012;Togo et al., 2017). The observed behavior and the feedback responses to mechanical perturbations confirmed that these control policies were adjusted during movement. Indeed, we reported modulations induced by the different dynamical changes in target width, corresponding to different alterations of the cost-function and that this mechanism considered the rate of change in target width. In our view, these results demonstrate the existence of a mechanism that adjusts the control policy during movement thanks to a continuous tracking of target width. In the present study, this task-specific adjustment in control relied on visual processing of task-parameters and impacted LL and VOL responses to the mechanical perturbations.
We must emphasize a critical difference between a feedback response to an external perturbation and the results that we highlighted here. In standard perturbation paradigms, visual or mechanical events alter the state of the system, including limb and target position, velocity, and higher order derivatives. These perturbation paradigms allowed showing that the control policy used to perform movement is tuned to the task-goal as demonstrated by the goal-dependent characteristics of the feedback responses to disturbances (Knill et al., 2011;Nashed et al., 2012;Sarlegna and Mutha, 2015;Keyser et al., 2017;Lowrey et al., 2017). These feedback responses are defined by rapid feedback loops (Fig. 6, inner loop, gray) whose latencies depend on the sensory modalities involved (Franklin and Wolpert, 2008;Knill et al., 2011;Pruszynski and Scott, 2012;Scott, 2016). In the case of mechanical disturbances applied to the limb, this inner feedback loop is mediated by LL feedback pathways that have a latency of 50 ms (Pruszynski and Scott, 2012). Here, we probed not only the feedback responses to changes in the state of the system, but also the change in the controller itself in response to dynamic changes in task parameters during movement. Taken in the context of OFC (Todorov and Jordan, 2002;Scott, 2004;Shadmehr and Krakauer, 2008), the task-parameters (such as target width) define the cost-function and the control law (Nashed et al., 2012), which is derived from these cost parameters. We demonstrated that this selection of the control policy based on the task parameters is itself continuous and must be considered in closed loop control models of human reaching movements (Fig. 6, outer loop, black). The latency of this outer loop was ;150 ms as reported in previous work (De Comite et al., 2021), here we used this number to design the task, and highlighted that indeed dynamic changes This interpretation implies a possible overlap of movement planning and execution, as participants may alter their motor plan during an ongoing movement. Such overlap of planning and execution has been suggested in studies reporting that reaction times before movement initiation could be shortened at the price of a reduced accuracy (Haith et al., 2016;Orban de Xivry et al., 2017). More recently, it has been suggested that such overlap of planning and execution could occur during movement in presence of visual perturbations (Dimitriou et al., 2013;Franklin, 2020, 2021). However, these results could also be explained by other mechanisms such as an infinite horizon controller (Li et al., 2018). In our previous study, we provided clear evidence for this overlap of movement planning and execution in presence of perturbations that altered the cost-function from which the control policy is derived (De Comite et al., 2021). This overlap is also necessary to explain the present results as participants have to continuously adjust their control strategy in response to the change in target width.
These continuous adjustments of control policy are reminiscent of the theoretical framework of model predictive control (for review, see Lee, 2011). This framework posits that the control policy is continuously adjusted during movement to integrate any change in the cost-function or in the environmental context. An alternative hypothesis to explain these online adjustments in control policy is that participants switched between several prespecified control policies. A similar process was suggested to account for the selection of the most appropriate strategy specified in parallel (Chapman et al., 2010;Gallivan et al., 2016Gallivan et al., , 2017Wong and Haith, 2017) but was questioned and compared with a single optimal intermediate motor plan Alhussein and Smith, 2021). Our experimental paradigm differed as multiple options were never presented at the same time and changes between targets occurred within movements.
We favor an interpretation that assumes dynamical adjustments of the control strategy although we cannot formally rule out the possibility of discrete switches between different predefined controllers. However, there are observations that do plead for a continuous and dynamic monitoring. First, we observed that the first slow and fast trials where different, although participants had not encountered these conditions during training and therefore could not have acquired a controller tuned to these specific conditions at that time. This was observed despite the fact that these first trials were jerkier than the later ones, which could indicate that even when participants had not familiarized with the dynamical conditions, they seemed to exploit well the outer feedback loop as input to the controller relative to target structure (Fig. 6, outer loop). One caveat to the hypothesis of continuous monitoring was that the normalized feedback responses across experiments 1 and 2 (Fig. 4) did not change much with longer viewing time, which suggests that there may be constraints on the amount of modulation that can take place. Nevertheless, this absence of modulation of the feedback responses within movement must be interpreted with caution because other factors such as time or urgency also modulate these feedback responses within movement (Crevecoeur et al., 2013;Dimitriou et al., 2013;Poscente et al., 2021).
A parallel can be drawn between the present study and a series of studies that reported within-trials tuning of feedback corrections when exposed to velocity-dependent force fields randomly (Crevecoeur et al., 2020a,b;Mathew et al., 2020;Mathew and Crevecoeur, 2021). It proposed that continuous tracking of model parameter also happens during movement, suggesting that adaptation to an altered plant dynamics also happens online (Crevecoeur et al., 2020b). In the present study, we reported the online tracking of cost parameters that define the movement goal. Interestingly, besides the conceptual similarity between these two processes linked to online evaluation of task or dynamical parameters, they were associated to different latencies: ;150 ms for updating the control policies following changes in movement goal (based on the present and on De Comite et al., 2021), while a latency of ;250 ms was associated with the online tuning of the feedback controller (Crevecoeur et al., 2020b). It is therefore conceivable that they engage dissociable neural operations that remain to be investigated.
An interesting question is to determine which neural substrates are involved in the online modulation of the reaching controller. Sensorimotor control is mostly supported by multiple cortical areas, the basal ganglia, and the cerebellum (Shadmehr and Krakauer, 2008;Scott, 2016;Haar and Donchin, 2020). We identify two neural  Figure 6. Modified goal-directed feedback control architecture. The inner loop (in gray), combining sensory feedbacks and efference copy is the one responsible for the goal-directed feedback responses observed behaviorally. The latency of this inner loop is 50-100 ms depending on the sensory modalities involved. The outer feedback loop (in black) that modifies the task definition and the feedback control policy captures the online adjustments in control policy elicited by continuous alteration of the target structure during movement.
pathways that likely underlie the online changes in behavior documented in the present study. The first is that the parametric feedback controller supported by the LL feedback (Pruszynski and Scott, 2012;Crevecoeur and Kurtzer, 2018) is modulated online. Such modulation must have occurred based on visual input which has a fast route to the network supporting LL responses through associative areas in the parietal cortex (Cross et al., 2021). The second pathway that is likely involved is related to the definition of the task demands, which includes the basal ganglia known to represent motor costs (Mazzoni et al., 2007;Shadmehr and Krakauer, 2008). Since our task paradigm altered the motor costs by modulating the target width, it is conceivable that the adjustment in control policy depended on a signal originating from the basal ganglia inducing the change of controller or a selection of a different controller. Note that this cannot happen independently of the visual input, and thus any interaction between rapid feedback pathways conveying information about the target structure, and selection of controllers based on a representation of motor costs in basal ganglia may support the observed change in behavior.
To sum up, we reported here that humans are able to dynamically adjust their control policy when they experience a dynamical change in task demands. These findings highlight the existence of a continuous monitoring of task-related parameters which supports dynamic changes in online feedback control.