Evolution of Gross Forelimb and Fine Digit Kinematics during Skilled Reaching Acquisition in Rats

Abstract The ability to learn dexterous motor skills is a fundamental aspect of human behavior. However, the underlying neural circuit mechanisms for dexterous skill learning are unclear. Advancing our understanding of motor skill learning requires the integration of modern neuroscientific techniques with a rigorously characterized dexterous task. The development of automated rodent skilled reaching with paw tracking allows detailed analysis of how reach-to-grasp kinematics evolve during learning. We assessed how both “gross” forelimb and “fine” digit kinematics changed as rats learned skilled reaching. Rats whose success rates increased (learners) consistently reduced the variability in their reach trajectories. Refinement of fine digit control generally continued after consistency in gross hand transport to the pellet plateaued. Interestingly, most rats whose success rates did not increase (non-learners) also converged on consistent reach kinematics. Some non-learners, however, maintained substantial variability in hand and digit trajectories throughout training. These results suggest that gross and fine motor components of dexterous skill are, on average, learned over different timescales. Nonetheless, there is significant intersubject variability in learning rates as assessed by both reaching success and consistency of reach kinematics.


Introduction
Dexterous motor skills, or coordinated multijoint and digit movements, are acquired through practice and essential to normal daily function. However, the neural mechanisms of motor skill learning are poorly understood. Such knowledge is important for improving de novo skill acquisition, understanding disorders of motor learning (e.g., dystonia), and optimizing rehabilitation after neurologic injury (e.g., poststroke). To determine mechanisms of motor learning at the neuronal/circuit level, animal models of skill acquisition are needed. Such models allow the application of modern circuit dissection techniques, which are especially well developed in rodents.
Rodent skilled reaching is a common paradigm for studying dexterous skill learning. Rodent reach-to-grasp movements are strikingly similar to those of humans (Klein et al., 2012). This is true when comparing well-trained rats to healthy human subjects or rodent disease models (e.g., dopamine-depleted or cortically lesioned rats) to humans with motor system dysfunction (e.g., Parkinson's disease or cortical strokes; Miklyaeva et al., 1994; Whishaw et al., 1994; Klein et al., 2011, 2012). Therefore, this task is well suited for studying the neural circuit mechanisms that regulate dexterous motor control under "normal" conditions and in disease states. Furthermore, rodents are not naturally adept at this task, but gradually improve over several training sessions. The rodent skilled reaching task is therefore useful for studying motor skill learning as well as performance. However, how reach-to-grasp movements evolve as rats learn this task has not been carefully characterized. Learning in the skilled reaching task has typically been assessed by success rate (i.e., whether the pellet was grasped or not). Therefore, studies could describe if, but not how, reach-to-grasp movements changed. To address this, a semi-quantitative movement element rating system was developed, wherein each submovement of the reach is evaluated as "normal," "abnormal," or "absent" (Alaverdashvili and Whishaw, 2010). This system is useful for assessing how experimental manipulations (e.g., targeted lesions) affect reaching movements in well-trained animals, but is less informative for learning. Additionally, it still only provides information on whether an individual submovement is abnormal or missing, not how the submovement changed. Finally, this scoring system is time-intensive and subjective, as it requires experimenters to evaluate twelve movement elements individually for each video.
Recent studies have leveraged automated skilled reaching systems and position tracking software to assess changes in reach-to-grasp kinematics over learning with greater detail. Rats' forelimb trajectories became less variable and the duration of reaching movements decreased as success rate increased. However, success rate continued to improve even once these measures of "gross" forelimb kinematics had plateaued (Li et al., 2017; Lemke et al., 2019). Therefore, it appears that "fine" motor components (i.e., individual digit control) evolve on a different timescale than the "gross" motor components of reach-to-grasp movements. However, neither of these studies directly measured changes in fine digit control. It is, therefore, unknown exactly how individual digit control changes as rats learn to perform skilled reaching.
In this study, we assessed how "gross" and "fine" motor components of reach-to-grasp movements evolve as rats learn skilled reaching. By using DeepLabCut (Mathis et al., 2018; Nath et al., 2019) to track individual digit joints and hands, we measured changes in both forelimb and digit kinematics. Consistent with the above findings, we show that for rats whose success rates improve with training, the control of fine digit movements evolves on a longer timescale than that of gross forelimb movements. However, there was significant variability among rats in the timing and magnitude of changes in both gross hand movements and fine digit movements. These results suggest that gross and fine movements are refined over different timescales to improve performance.

Animals
All animal procedures were performed in accordance with the University of Michigan Institutional Animal Care and Use Committee regulations. Male (n = 9) and female (n = 5) tyrosine hydroxylase-Cre+ (n = 9) and wild-type (n = 5) rats were housed in groups of two to three on a reverse light/dark cycle. All behavioral training was conducted during the dark phase. Food restriction was imposed on all animals during the training period for no more than 6 d in a row such that rats' weights were kept at ~85-90% of their free-feeding weight. Rats were only tested on food-restricted days. Water was available ad libitum in their home cages. Data collected after seven of these rats were well trained have been reported previously. However, none of the "early learning" kinematics described here have been reported previously.

Skilled reaching

Automated reaching system
Training was conducted in custom-built skilled reaching chambers housed within soundproof, ventilated cabinets. Custom LabVIEW software controlled the experiment. The chambers were rectangular prisms with a reaching slot (1.1 × 7 cm) cut in the front panel 3.5 cm from the floor and a photobeam (HoneyWell) at the back of the chamber (Fig. 1A). Before each session, a linear actuator (Creative Werks Inc) was positioned so that the pellet delivery rod was aligned with the right or left edge of the slot according to each rat's hand preference, 15 mm from the front of the reaching slot. Each training session began with the pellet delivery rod in the "ready" position, part of the way between the bottom of the reaching chamber and the reaching slot. Rats "requested" a pellet by breaking the photobeam, which caused the pellet delivery rod to rise to the bottom of the reaching slot. Mirrors angled at either side of the front of the reaching chamber allowed side views of the hand during reaches. When the reaching hand passed the front plane of the chamber into the "region of interest" in the side mirror view (Fig. 1A, red box) and surpassed the minimum threshold of pixel intensity, video acquisition was triggered, time-stamped, and labeled with the trial number. Two seconds after the video was triggered, the pellet delivery arm lowered into the pellet funnel to pick up a new pellet and then reset to the "ready" position, allowing the rat to initiate a new trial.
Videos were recorded at 300 frames/s and 2400 × 1024 pixels by a high-definition color camera (acA2000-340kc, Basler) mounted in front of the reaching slot. A cameralink field-programmable gate array (FPGA) frame-grabber card (PCIe 1473R, National Instruments) acquired the images, and an FPGA data acquisition (DAQ) task control card (NI PCIe 7841R) provided an interface with the behavior chamber. The real-time FPGA card detected pixel intensity changes within a "region of interest" in front of the reaching slot in the side mirror views (Fig. 1A), allowing videos of the reaching event ("video trigger") to be captured. Video frames were stored in RAM in a rolling buffer. When the trigger event occurred, 300 frames before the trigger event and 1000 frames posttrigger were saved.
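The rolling-buffer capture described above can be illustrated with a minimal Python sketch. This is not the LabVIEW/FPGA implementation used in the study: the function name `capture_trial` and the `is_trigger` callable are hypothetical, and the sketch assumes that the trigger frame counts as the first of the 1000 posttrigger frames.

```python
from collections import deque

PRE_TRIGGER_FRAMES = 300    # frames kept from before the trigger (1 s at 300 frames/s)
POST_TRIGGER_FRAMES = 1000  # frames saved from the trigger onward (about 3.33 s)


def capture_trial(frames, is_trigger):
    """Return the frames saved around the first trigger event, or None.

    frames     -- iterable of frames (hypothetical stand-in for the camera stream)
    is_trigger -- hypothetical callable; True for the frame that trips the trigger
    """
    buffer = deque(maxlen=PRE_TRIGGER_FRAMES)  # rolling buffer: oldest frames drop off
    stream = iter(frames)
    for frame in stream:
        if is_trigger(frame):
            saved = list(buffer)   # up to 300 pre-trigger frames
            saved.append(frame)    # assumption: trigger frame is posttrigger frame 1
            # Pull the remaining posttrigger frames from the same stream.
            for _, later in zip(range(POST_TRIGGER_FRAMES - 1), stream):
                saved.append(later)
            return saved
        buffer.append(frame)
    return None  # the stream ended without a trigger event
```

With integer frame indices standing in for images, a trigger at frame 500 yields frames 200 through 1499.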

Pretraining
"Pretraining" consists of familiarizing the rats with the reaching chamber, evaluating them for hand preference, training them to reach for the linear actuator, and training them to request a pellet by moving to the back of the chamber. A week before pretraining, rats were placed on food restriction and introduced to the sucrose reward pellets in their home cages. On day 1 of pretraining, piles of five pellets each were placed in the front and rear of the skilled reaching chamber to encourage exploration of the entire chamber. Once rats ate these pellets, they were evaluated for hand preference.
Rats were allowed to eat three pellets (held in forceps through the reaching slot) with their tongues. The experimenter then began to pull the pellet away from the rat so that it could not be obtained by licking. Therefore, the rat was forced to reach with its hand to retrieve the pellet. Hand preference was assigned to the hand used for the majority of the first eleven reaches. Once hand preference was determined, animals were trained to reach for the pellet delivery rod. As the rat reached, the experimenter pulled the forceps back so that the rat's hand would extend to a pellet on the delivery rod. Once rats reached for the delivery rod 10 times without being baited by the experimenter, they began training to request pellets.

Figure 1. A, (1) Rat breaks IR beam at the back of the chamber to request a sugar pellet ("beam break"); (2) real-time analysis detects the hand breaching the reaching slot to trigger 300-fps video from 1 s before to 3.33 s after the trigger event ("video trigger"); (3) 2 s after the trigger event, the pellet delivery rod resets and the rat can initiate a new trial ("intertrial interval"). B, Average number of trials per day. Learners (green) and non-learners (pink) did not differ in the number of trials performed per day (linear mixed model: effect of group: t(28) = 0.49, p = 0.63). However, both groups increased the number of trials performed over days (linear mixed model: effect of day: t(124) = 3.58, p = 4.88 × 10⁻⁴; group × day interaction: t(124) = −0.47, p = 0.64). Black line represents the average number of trials per day for both groups combined. C, Average first attempt success rate per day. Although both groups had similar success rates on day 1, only learners increased their success rates over days (linear mixed model: effect of group: t(25) = 0.44, p = 0.66; effect of day: t(123) = 5.55, p = 1.70 × 10⁻⁷; group × day interaction: t(123) = −4.78, p = 4.98 × 10⁻⁶). Error bars in B, C represent SEM. Extended Data Figure 1-1 shows individual rat data for each group and additional performance measures (number of attempts per trial and breakdown of reach outcomes across days); ***p < 0.001 for the day term in the linear mixed model in B and ###p < 0.001 for the group × day interaction in the linear mixed model in C. Data and code to generate this figure are contained in Extended Data 1, 2.
Rats began training in the center of the chamber with the pellet delivery rod set to the "ready" position. The experimenter placed a pellet in the rear of the chamber to bait the rat to break the rear IR beam, causing the delivery rod to rise so that the rat could move to the front and reach for the pellet. This was repeated until the rat began to quickly move to the front of the chamber to reach for the pellet after breaking the IR beam. At this point, the experimenter would stop baiting the rat to the rear of the chamber. Pretraining was complete once the rat requested a pellet and then immediately moved to the front to reach for the pellet 10 times.

Training
Once pretrained, rats began 30-min training sessions with the automated system. Rats were trained for one session per day for 10 d, with 1 d off when rats were taken off food restriction. The day number during training when the break occurred was varied (e.g., not all rats had a break between days 5 and 6) to minimize effects of a day off on skilled reaching performance.

Analysis of skilled reaching data
Analyses were performed using custom-written scripts and functions in MATLAB 2019a (MathWorks) on a MacBook Pro (2018) with macOS Mojave (version 10.14.6; Apple).

Performance outcomes
Reach outcome was scored by visual inspection as follows: 0, no pellet presented or other mechanical failure ("no pellet"); 1, obtained pellet on initial limb advance ("first success"); 2, obtained pellet, but not on first attempt ("multiple success"); 3, forelimb advanced, pellet was grasped then dropped in the box ("drop in box"); 4, forelimb advance, but the pellet was knocked off the shelf ("pellet knocked off"); 5, pellet was obtained using its tongue ("tongue"); 6, the rat approached the slot but retreated without advancing its forelimb or the video triggered without a reach ("trigger error"); 7, the rat reached, but the pellet remained on the shelf ("pellet remained"); 8, the rat used its non-preferred hand to reach ("non-preferred hand"); or 9, used preferred hand after obtaining or moving pellet with tongue ("tongue and hand"). An additional outcome evaluated by kinematic analysis was defined as videos which began with the rat's hand through the slot (i.e., the video triggered late; "hand through slot"). Outcome percent was calculated by dividing the number of trials of each outcome by the total number of trials per training day (Extended Data Fig. 1-1D). For comparisons of kinematics between successful and failed reaches, successes were defined as "first success" (1) and failed reaches were defined as "pellet knocked off" (4) or "pellet remained" (7).
First reach success was calculated for each training day by dividing the total number of scores of 1 by the total number of trials (sum of scores of 1, 2, 3, 4, and 7; Fig. 1C; Extended Data Fig. 1-1B). Success rate was not calculated for individual days if <10 trials were performed (only one such instance occurred).
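As an illustration of this calculation, the following Python sketch computes the first reach success rate from a list of outcome scores for one training day. The function name is hypothetical, and the sketch assumes that the 10-trial minimum applies to the counted trials (scores of 1, 2, 3, 4, and 7) rather than to all triggered videos.

```python
COUNTED_SCORES = {1, 2, 3, 4, 7}  # outcomes that count as trials in the denominator
MIN_TRIALS = 10                   # days with fewer counted trials are not scored


def first_reach_success_rate(scores):
    """Fraction of counted trials scored 1 ("first success"), or None if too few trials."""
    trials = [s for s in scores if s in COUNTED_SCORES]
    if len(trials) < MIN_TRIALS:
        return None  # assumption: the 10-trial minimum applies to counted trials
    return trials.count(1) / len(trials)


# Example day: 13 videos, of which 10 count as trials and 3 are first successes.
day_scores = [1, 1, 2, 4, 7, 0, 5, 1, 3, 4, 7, 7, 6]
rate = first_reach_success_rate(day_scores)
```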
Rats were divided into learners and non-learners based on the change in their performance between the first two training days and last two training days. The proportion of first success reaches from days 1 and 2 combined was compared with the proportion of first success reaches from days 9 and 10 combined using a χ² test. Rats with p < 0.05 whose success rate increased between days 1-2 and 9-10 were considered learners (n = 4); all others were considered non-learners (n = 10). All subsequent analyses of reach-to-grasp kinematics were performed with rats divided into learners and non-learners, with averages across all rats also shown.
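The classification rule can be sketched as follows. This is an illustrative Python version, not the code used in the study: it assumes a Pearson χ² test on the 2 × 2 table of first successes versus other trials, without continuity correction, and the function names are hypothetical. For 1 degree of freedom, the p value equals erfc(sqrt(χ²/2)).

```python
import math


def chi_square_2x2(success_a, total_a, success_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) on two proportions.

    Returns (statistic, p_value). Assumes every expected cell count is nonzero.
    """
    table = [[success_a, total_a - success_a],
             [success_b, total_b - success_b]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def is_learner(early_success, early_total, late_success, late_total, alpha=0.05):
    """Learner: success rate rose from days 1-2 to days 9-10 and p < alpha."""
    _, p_value = chi_square_2x2(early_success, early_total, late_success, late_total)
    improved = late_success / late_total > early_success / early_total
    return improved and p_value < alpha
```

Under these assumptions, a rat that goes from 10/100 to 40/100 first successes is classified as a learner, whereas a rat that goes from 20/100 to 22/100 is not.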
To assess how success rate changed within individual sessions, a moving average was calculated as the fraction of "1" (first success) scores in a moving block of 10 reaches. For averages within a group, the last data point for each individual was carried forward to the maximum number of reaches for any rat on that training day. This avoided sudden changes in the average caused by dropout (Fig. 2D).

3D reconstruction of reach trajectories
Body parts/objects identified in the direct and mirror views were triangulated to 3D points using custom MATLAB software. Before each training session, several images of a cube with checkerboards (4 × 4 mm squares) on its sides were taken so that the checkerboards were visible in the direct and mirror views. These images were used to determine the essential matrix relating the direct and mirror views, which was used to determine how the real camera and "virtual" camera behind the mirror were translated and rotated with respect to each other (Hartley and Zisserman, 2003). By assuming a 3D coordinate system centered at the camera lens with the z-axis perpendicular to the lens surface, camera matrices were derived for the real and virtual cameras. These matrices were used to triangulate matching points in the camera and mirror views using the MATLAB triangulate function in the Computer Vision toolbox. 3D points with large reprojection errors were excluded from the analysis, which could happen if an object was identified accurately in one view but misidentified in the other. The coordinate system was set with the pellet at the origin, positive x to the right of the pellet, positive y above the pellet, and positive z on the other side of the pellet from the reaching chamber (Fig. 3A).
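The reprojection-error check can be illustrated with a short Python sketch. It is not the MATLAB code used in the study: the 3 × 4 camera matrices are taken as given, the function names are hypothetical, and the error threshold `max_error_px` is a free parameter because no threshold is stated above.

```python
import math


def project(camera_matrix, point_3d):
    """Project a 3D point to pixel coordinates with a 3x4 camera matrix."""
    homogeneous = (point_3d[0], point_3d[1], point_3d[2], 1.0)
    u, v, w = (sum(row[k] * homogeneous[k] for k in range(4)) for row in camera_matrix)
    return u / w, v / w


def reprojection_error(camera_matrix, point_3d, observed_2d):
    """Pixel distance between the projected 3D point and the labeled 2D point."""
    u, v = project(camera_matrix, point_3d)
    return math.hypot(u - observed_2d[0], v - observed_2d[1])


def keep_point(point_3d, views, max_error_px):
    """Keep a triangulated point only if it reprojects well in every view.

    views        -- list of (camera_matrix, observed_2d) pairs, e.g., the direct
                    view and the mirror ("virtual" camera) view
    max_error_px -- hypothetical threshold; not specified in the text
    """
    return all(reprojection_error(matrix, point_3d, observed) <= max_error_px
               for matrix, observed in views)
```

A point that was misidentified in one view triangulates to a 3D location that reprojects far from its label in at least one view, so it fails this check.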

Processing reach kinematics
To place reach kinematics in a common reference frame, the pellet location before reaching was identified and set as the origin. For left-handed reaches, x-coordinates were negated to allow direct comparison with right-handed reaches. The initial reach on each trial was identified by finding the first frame in which digits were visible outside the box (Fig. 2B), and then looking backwards in time until the hand started moving forward. The end of a reach was defined as the frame at which the tip of the second digit began to retract; multiple reaches could be counted in a single trial (though only first reaches are analyzed here). The start of the grasp was defined as the frame at which flexion of the second digit started to increase after reaching minimum flexion (i.e., maximum extension). Grasp end was defined as the first frame with maximum digit flexion after grasp start. The end of a reach and beginning of a grasp could overlap if rats started to flex their digits as the hand was still advancing. However, this was rare and the number of overlapping frames was small.

Figure 2. Refinement of "gross" forelimb movements. A, All 3D reaching trajectories on days 1 and 10 from an exemplar rat. Colored lines represent individual trials and black lines represent average trajectories of the hand and digit tips. Sugar pellet (black dot) is at (0,0,0). B, Hand trajectories are separated into "reach" and "grasp" components. The reach begins when the digits are first visible outside the reaching box and ends when the second digit begins to retract. The grasp begins when the second digit begins to flex and ends when the second digit is maximally flexed after grasp start. C, Average hand trajectory variability for the reach and grasp components represented as the mean distance from the average trajectory (mm). Trajectory variability did not differ significantly between learners and non-learners for either component (linear mixed model: effect of group: reach: t(23) = 0.53, p = 0.60; grasp: t(26) = −0.08, p = 0.94; group × day interaction: reach: t(124) = 0.37, p = 0.71; grasp: t(124) = 1.08, p = 0.28). Trajectory variability of the grasp, but not reach, decreased significantly over days for both groups (linear mixed model: effect of day: reach: t(124) = −1.85, p = 0.07; grasp: t(124) = −2.75, p = 6.91 × 10⁻³). Individual rat data are shown in Extended Data Figure 2-1. D, Moving average of hand trajectory variability for the reach component (black) and first attempt success rate (green learners and pink non-learners) within individual days. Days 2, 4, 5, 6, and 8 had significant negative correlations between trajectory variability and success rate for the learners group (day 2: r = −0.55, p = 1.70 × 10⁻³; day 4: r = −0.68, p = 4.25 × 10⁻⁵; day 5: r = −0.53, p = 2.50 × 10⁻³; day 6: r = −0.82, p = 2.16 × 10⁻⁸; day 8: r = −0.70, p = 1.51 × 10⁻⁵). Days 1, 4, 6, 8, and 10 had significant negative correlations between trajectory variability and success rate for the non-learners group (day 1: r = −0.88, p = 1.78 × 10⁻¹⁰; day 4: r = −0.67, p = 4.33 × 10⁻⁵; day 6: r = −0.38, p = 0.04; day 8: r = −0.42, p = 0.02; day 10: r = −0.83, p = 1.53 × 10⁻⁸). E, Average reach duration for learners (green) and non-learners (pink). Linear mixed model: effect of group: t(21) = 0.29, p = 0.78; effect of day: t(124) = −1.03, p = 0.31; group × day interaction: t(124) = −0.11, p = 0.91. Individual rat data are shown in Extended Data Figure 2-1. F, Average maximum reach velocity (mm/s) for learners (green) and non-learners (pink). Linear mixed model: effect of group: t(15) = 0.16, p = 0.87; effect of day: ...

"Digit flexion" was calculated as the angle between a line connecting the second metacarpophalangeal (MCP) joint and the second digit tip and a line connecting the hand dorsum and the second MCP joint (Fig. 4K). "Aperture" was calculated as the Euclidean distance between the tips of the first and fourth digits (in frames for which both were visible or could be estimated based on epipolar geometry; Fig. 4C). "Orientation" was calculated as the angle between a line connecting the first and fourth digits and a horizontal line in the direct camera view (for left-handed rats, orientation was calculated using the negated x-values to compare with right-pawed rats; Fig. 4G). Reach velocity was calculated as the Euclidean distance between the dorsum of the reaching hand in consecutive frames divided by the interframe interval (1/300 s). Reach duration was calculated as the number of frames between when the hand first passes through the reaching slot and reach end (defined above), divided by 300 fps (video frame rate).
The 3D trajectories of the hand and digit tips were interpolated across 100 evenly spaced points from reach start to reach end or from grasp start to grasp end using the function interparc (version 1.3.0.0 from the MATLAB file exchange) with the piecewise cubic Hermite polynomials (pchip) method. This created evenly spaced trajectories that could be analyzed independently of reach velocity. In other words, slow and fast reaches tracing the same trajectory have matched corresponding points in the interpolated trajectories. Variability of hand and digit trajectories within each session was calculated as the mean distance of each interpolated trial trajectory from the average interpolated session trajectory (Figs. 2C,D, 4A,B). In this measure, D is the mean distance from the mean trajectory, X_i are the ith 3D points along each interpolated trial trajectory, X̄_i are the ith 3D points in the interpolated session trajectory, and n is the number of trials.
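The kinematic measures and the trajectory variability measure defined above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MATLAB analysis code: the function names are hypothetical, digit flexion is taken as 0° when the digit is collinear with the hand axis, trajectories are assumed to be already resampled to a common number of evenly spaced points (the interparc/pchip step is not reproduced), and variability is averaged first over points and then over trials.

```python
import math

FRAME_RATE = 300.0  # frames/s, from the video acquisition settings


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def digit_flexion(dorsum, mcp2, tip2):
    """Angle (degrees) between the dorsum->MCP line and the MCP->digit-tip line.

    Assumed convention: 0 degrees means the digit is collinear with the hand axis.
    """
    hand_axis = _sub(mcp2, dorsum)
    digit_axis = _sub(tip2, mcp2)
    cosine = sum(h * d for h, d in zip(hand_axis, digit_axis))
    cosine /= _norm(hand_axis) * _norm(digit_axis)
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def aperture(tip1, tip4):
    """Euclidean distance between the tips of the first and fourth digits."""
    return _norm(_sub(tip1, tip4))


def reach_velocity(dorsum_positions):
    """Frame-to-frame speed of the hand dorsum (position units per second)."""
    return [_norm(_sub(b, a)) * FRAME_RATE
            for a, b in zip(dorsum_positions, dorsum_positions[1:])]


def trajectory_variability(trials):
    """Mean distance of resampled trial trajectories from their mean trajectory.

    trials -- list of trajectories, each a list of the same number of 3D points
    """
    n_points = len(trials[0])
    mean_trajectory = [[sum(t[i][k] for t in trials) / len(trials) for k in range(3)]
                       for i in range(n_points)]
    per_trial = [sum(_norm(_sub(t[i], mean_trajectory[i])) for i in range(n_points)) / n_points
                 for t in trials]
    return sum(per_trial) / len(per_trial)
```

Because every trial is resampled to the same number of points before averaging, a slow reach and a fast reach along the same path contribute the same distances to the variability measure.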
To quantify hand and digit 3D endpoint position variability, we used the generalized variance, which provides a single measure of the dispersion of points in multiple dimensions (Fig. 3D). Covariance matrices were calculated for the 3D paw/digit endpoints from each trial in each session. The determinant of these covariance matrices defines the generalized variance for reach endpoints.
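A minimal Python sketch of the generalized variance is given below for illustration. The function names are hypothetical, and the sketch assumes the sample covariance (normalized by n − 1).

```python
def covariance_matrix(points):
    """3x3 sample covariance matrix (n - 1 normalization, assumed) of 3D points."""
    n = len(points)
    means = [sum(p[k] for p in points) / n for k in range(3)]
    return [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in points) / (n - 1)
             for j in range(3)]
            for i in range(3)]


def determinant_3x3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def generalized_variance(endpoints):
    """Determinant of the covariance matrix of the 3D reach endpoints in a session."""
    return determinant_3x3(covariance_matrix(endpoints))
```

The generalized variance shrinks toward zero whenever endpoints collapse along any direction, so endpoints that lie on a single line give a value of zero even if they are widely spread along that line.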
To determine whether differences in endpoint position variability between the hand and digit tips were because of inconsistencies in DeepLabCut labeling, we compared the number of frames from reach start to grasp end with mislabeled hand and digit tip points on training days 1 and 10 (Fig. 3E). Points were first evaluated as correctly or incorrectly labeled based on confidence values generated by DeepLabCut. Points assigned confidence values above 0.97 were deemed correctly labeled, while points assigned confidence values below 0.85 were deemed mislabeled. Points with confidence values between 0.85 and 0.97 were evaluated based on the distance they moved between consecutive frames. If the distance moved between frames surpassed 50 pixels, the point was deemed mislabeled. Importantly, these criteria were used throughout the kinematic analysis so that invalid points were not included in the reconstructed 3D trajectories.
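These labeling criteria can be written as a short decision rule. The Python sketch below is illustrative only: the function name is hypothetical, confidence values exactly equal to 0.85 or 0.97 are assumed to fall in the intermediate band, and a point with no preceding frame is assumed to be accepted.

```python
import math

HIGH_CONFIDENCE = 0.97  # above this, the point is deemed correctly labeled
LOW_CONFIDENCE = 0.85   # below this, the point is deemed mislabeled
MAX_JUMP_PX = 50.0      # largest allowed frame-to-frame movement in the middle band


def is_valid_label(confidence, xy, previous_xy=None):
    """Apply the confidence and frame-to-frame distance criteria to one tracked point."""
    if confidence > HIGH_CONFIDENCE:
        return True
    if confidence < LOW_CONFIDENCE:
        return False
    if previous_xy is None:
        return True  # assumption: no preceding frame to compare against
    jump = math.hypot(xy[0] - previous_xy[0], xy[1] - previous_xy[1])
    return jump <= MAX_JUMP_PX
```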

Within-session kinematics
To assess how reach kinematics (i.e., aperture, paw orientation, and digit flexion) changed within individual sessions, moving averages were calculated across blocks of 10 trials. To average these data across rats, the last value calculated in each session was carried forward until sessions had the same number of trials. For example, if rat A performed 35 trials and rat B performed 40 trials, the last value calculated for rat A (the average of trials 26-35) would be carried forward as the average of "trials" 27-36, 28-37, ... 31-40 so that rats A and B would have the same number of trials. This avoided sudden changes in the average caused by rats performing different numbers of trials (Fig. 4E,I,M). Within-session changes in trajectory variability were calculated as described above (see Processing reach kinematics), but average trajectories in moving blocks of 10 trials were used as the reference, rather than the average of all trials in a session (Figs. 2D, 4B).
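The moving average with carry-forward can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that every rat performed at least one full block of trials in the session.

```python
def moving_average(values, block=10):
    """Mean of every consecutive block of `block` trials (trials 1-10, 2-11, ...)."""
    return [sum(values[i:i + block]) / block
            for i in range(len(values) - block + 1)]


def carry_forward(series, length):
    """Pad a series to `length` by repeating its last value."""
    return series + [series[-1]] * (length - len(series))


def group_average(per_rat_values, block=10):
    """Average the per-rat moving averages after carrying last values forward."""
    averaged = [moving_average(values, block) for values in per_rat_values]
    longest = max(len(series) for series in averaged)
    padded = [carry_forward(series, longest) for series in averaged]
    return [sum(column) / len(column) for column in zip(*padded)]
```

In the example above, rat A (35 trials) yields 26 block averages and rat B (40 trials) yields 31, so rat A's final value is repeated five times before the two series are averaged.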

Analysis of reach-to-grasp coordination
To monitor aperture, hand orientation, and digit flexion as a function of the z-coordinate of the tip of the second digit (z_digit2; Fig. 5), the first reach of each trial was isolated. The 3D trajectory of each digit tip for the initial reach was interpolated using piecewise cubic Hermite polynomials (pchip in MATLAB) so that the 3D location of each digit was estimated for z_digit2 = −20.0, −19.9, −19.8, ..., +14.9, +15.0 mm from the pellet (positive numbers are past the pellet, negative numbers as the hand approaches the pellet). This allowed us to average aperture, orientation, and digit flexion as a function of hand advancement (assessed by z_digit2). Points at z_digit2 values missing from shorter reaches were excluded from the average. To compare the evolution of aperture, hand orientation, and digit flexion across the 10 training sessions, we compared digit aperture, paw orientation, and digit flexion at z_digit2 = 13 mm (Fig. 5B,D,F).

Figure 3. Reach endpoint accuracy improves with training. A, X, Y, and Z reach endpoints are measured in reference to the pellet position. Top, Mirror view of the dorsal surface of the hand. Bottom, Direct view from the camera. The inset shows the mirror view of the palmar surface of the hand to demonstrate how reach endpoint of the second digit tip in the Z direction (z_digit2) is measured. The end of the reach is defined as the moment z_digit2 begins to decrease (the digit tip moves back toward the box). B, Average reach endpoints of the hand (left) and digit 2 (right) in the X, Y, and Z directions. Pellet is at (0,0,0). Reach endpoint in the X direction increased over days for learners but not non-learners. Linear mixed model (hand): effect of group: t(18) = 1.30, p = 0.21; effect of day: t(124) = 2.13, p = 0.04; group × day interaction: t(124) = −2.40, p = 0.02; digit 2: effect of group: t(17) = 1.80, p = 0.09; effect of day: t(124) = 3.12, p = 2.25 × 10⁻³; group × day interaction: t(124) = −3.16, p = 1.97 × 10⁻³. Reach endpoint of the hand in the Y plane decreased with training for both groups. Linear mixed model (hand): effect of group: t(18) = −0.94, p = 0.36; effect of day: t(124) = −2.45, p = 0.02; group × day interaction: t(124) = 1.45, p = 0.15; digit 2: effect of group: t(19) = −0.90, p = 0.38; effect of day: t(124) = −1.80, p = 0.07; group × day interaction: t(124) = 1.16, p = 0.25. Reach endpoint in the Z plane did not change significantly with training for either group. Linear mixed model (hand): effect of group: t(23) = −0.38, p = 0.71; effect of day: t(124) = 1.73, p = 0.09; group × day interaction: t(124) = −1.23, p = 0.22; digit 2: effect of group: t(19) = −0.63, p = 0.53; effect of day: t(124) = 0.99, p = 0.33; group × day interaction: t(124) = −0.58, p = 0.57. Extended Data Figure 3-1 shows individual rat data. C, 3D covariance of hand and digit reach endpoints across all days from an exemplar rat. Sugar pellet (*) is at (0,0,0). D, Average determinant of the covariance matrix (generalized variance) of reach endpoints for the hand and digits. Top row shows endpoint variability of "raw" digit and hand position data ...

Statistics
See Table 1 for a detailed description of statistical analyses. For Figures 1-5, linear mixed-effects models were used to evaluate whether performance outcomes and kinematics changed over training days and/or differed between learners and non-learners. We implemented linear mixed-effects models (using R lmer) with random intercepts/effects for each rat and fixed effects of training day and group (interaction). Linear mixed-effects models included averages for all 10 training days for all rats. To assess whether reach kinematics differed between successful and failed reaches, we implemented linear mixed-effects models (using R lmer) with random intercepts/effects for each rat and fixed interaction effects of training day and outcome. Separate linear mixed-effects models were run for each group (Fig. 6).
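For orientation, a model of the kind described above can be written out explicitly. The form below is a sketch that assumes a random intercept for each rat only (no random slopes), which the text does not specify; y_{ij} denotes the outcome measure for rat i on training day j, and the symbols \beta, u_i, and \varepsilon_{ij} are introduced here for illustration.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{day}_{j} + \beta_2\,\mathrm{group}_{i}
       + \beta_3\,(\mathrm{day}_{j} \times \mathrm{group}_{i}) + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this form, \beta_1 corresponds to the reported "effect of day," \beta_2 to the "effect of group," and \beta_3 to the "group × day interaction." For the comparison of successful and failed reaches, reach outcome would take the place of group, with a separate model fitted for learners and non-learners.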

Code accessibility
The code described in this paper is freely available online at https://github.com/LeventhalLab/Bova_etal_eNeuro_2021.git. The code is available as Extended Data 1.

Results
Rats were trained for 10 d of 30-min sessions in an automated skilled reaching task in which high-definition, high-frame-rate video is recorded from multiple angles. Trials started with rats breaking a photobeam at the back of the chamber, which caused a pellet to be delivered in front of the reaching slot (Fig. 1A). Rats could make multiple reaches until the pellet delivery arm descended 2 s after the video trigger event. Rats showed significant improvement in skilled reaching performance, as first reach success rate gradually increased (Fig. 1C). However, there was a high level of variability in learning rates. Some rats reached asymptotic performance quickly, while others did not improve at all (Extended Data Fig. 1-1B). Therefore, we divided rats into learners and non-learners based on whether their performance improved significantly from days 1-2 to days 9-10 (see Materials and Methods, Performance outcomes; Table 2). Although success rate was similar on training day 1, performance improved significantly over days for learners, but not non-learners. Learners showed a significant reduction of trials in which the pellet was knocked off of the pellet delivery rod over training days, whereas this outcome did not change significantly over days for non-learners (Extended Data Fig. 1-1D). Of the other trial outcomes (e.g., dropped pellet in box, pellet remained on pedestal; see Materials and Methods, Performance outcomes), only multiple success outcomes, where rats did not successfully retrieve the pellet on the first reach attempt but did on a subsequent attempt, decreased significantly for learners. Learners seemed to convert "multiple reach successes" and "pellet knocked off" trials into "first reach successes." Importantly, rats in the two groups performed similar numbers of trials over the 10 training days, indicating that the difference between learners and non-learners was not in practice or motivation (Fig. 1B; Extended Data Fig. 1-1A).
between rats in how quickly trajectory variability changed (Extended Data Fig. 2-1A,B). Changes in hand trajectory variability and first reach success rate within individual sessions often occurred in tandem, such that when success rate increased, trajectory variability decreased and vice versa (Fig. 2D). Reaches also became faster with more training in both groups. The average reach duration decreased (Fig. 2E), while maximum reach velocity increased (Fig. 2F).
As hand trajectories stabilized, rats also improved their reach endpoint accuracy. In the first two training days, reaches terminated at positions above and short of the pellet for learners and non-learners (Fig. 3A,B). However, by the third session, reaches became lower and longer, with reach endpoints in the Y and Z directions stabilizing for the remaining sessions.
Reach endpoint variability of the hand and digits decreased considerably for both groups (Fig. 3C,D; Extended Data Fig. 3-2). One potential explanation for the decrease in digit endpoint variability is that they are attached to the hand. To determine whether digit endpoint variability decreased more than expected from the decrease in hand endpoint variability, we calculated digit location with respect to the hand (i.e., 3D hand location was subtracted from the digit position; Fig. 3D, bottom row). As expected, digit endpoint variability was much smaller when corrected for hand position. While only changes in digit 2 endpoint variability achieved significance, there was a consistent decrease in digit endpoint variability out to day 10 for the learner group. Therefore, variability in digit endpoint location is largely accounted for by variability in hand trajectory, but additional refinement of digit movement may account for some of the differences in reach success across rats.
Figure. Kinematic measures separated by reach success or failure. A, Average hand trajectory variability (mean distance from the average trajectory, mm) for the reach component in learners (green) and non-learners (pink). Trajectory variability was significantly higher for failed reaches ("miss," black) than successful reaches ("hit," green/pink) for non-learners, but not for learners. Linear mixed model (learners): effect of outcome: t(72) = 1.26, p = 0.21; outcome × day interaction: t(72) = −0.46, p = 0.64. Linear mixed model (non-learners): effect of outcome: t(183) = 4.44, p = 1.56 × 10^−5; outcome × day interaction: t(183) = −1.25, p = 0.21. Individual rat data are shown in Extended Data Figure 6-1. B, Average hand trajectory variability (mean distance from the average trajectory, mm) for the grasp component in learners (green) and non-learners (pink). Trajectory variability was significantly higher for failed reaches ("miss," black) than successful reaches ("hit," green/pink) for both groups. Linear mixed model (learners): effect of outcome: t(72) = 4.77, p = 9.34 × 10^−6; outcome × day interaction: t(72) = −2.23, p = 0.03. Linear mixed model (non-learners): effect of outcome: t(183) = 5.58, p = 8.47 × 10^−8; outcome × day interaction: t(183) = −1.02, p = 0.31. Individual rat data are shown in Extended Data Figure 6-1. C, Average reach endpoint of digit 2 in the X, Y, and Z directions for learners (top) and non-learners (bottom). Reach endpoint in the X direction was significantly different for successful (green/pink) versus failed (black) reaches for learners but not non-learners. Linear mixed model (learners): effect of outcome: t(72) = −2.31, p = 0.02; outcome × day interaction: t(72) = 1.94, p = 0.06. Linear mixed model (non-learners): effect of outcome: t(185) = −1.53, p = 0.13; outcome × day interaction: t(185) = −0.03, p = 0.97. Reach endpoint in the Y direction did not differ between successful and failed reaches for either group. Linear mixed model (learners): effect of outcome: t(72) = −0.95, p = 0.35; outcome × day interaction: t(72) = 0.23, p = 0.82. Linear mixed model (non-learners): effect of outcome: t(185) = −1.49, p = 0.14; outcome × day interaction: t(185) = 0.77, p = 0.44. Failed reaches were significantly shorter (Z direction) than successful reaches for both groups, although the Z endpoint did not differ between successful …
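The hand-relative correction (subtracting 3D hand location from digit position) can be illustrated with a short sketch. The variability measure used here, mean distance of endpoints from their centroid, is an assumption chosen for illustration. The text does not state the exact statistic, and the function names are hypothetical.

```python
import numpy as np

def endpoint_variability(endpoints):
    """Mean distance (mm) of reach endpoints from their centroid.

    endpoints: array-like of shape (n_trials, 3). This particular statistic
    is an illustrative assumption, not necessarily the one used in the study.
    """
    endpoints = np.asarray(endpoints, dtype=float)
    centroid = endpoints.mean(axis=0)
    return float(np.linalg.norm(endpoints - centroid, axis=1).mean())

def hand_relative_endpoints(digit_endpoints, hand_endpoints):
    """Digit endpoint expressed relative to the hand (digit minus hand, per trial)."""
    return np.asarray(digit_endpoints, dtype=float) - np.asarray(hand_endpoints, dtype=float)

# Toy data: the digit sits at a fixed offset from a hand whose endpoint varies.
hand = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
digit = hand + np.array([1.0, 1.0, 1.0])

raw_variability = endpoint_variability(digit)
relative_variability = endpoint_variability(hand_relative_endpoints(digit, hand))
```

In this toy case all of the raw digit variability comes from the hand, so the hand-relative variability is zero. Any variability that survives the subtraction reflects movement of the digit with respect to the hand.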
To evaluate whether decreases in endpoint variability could be artifacts of inconsistencies in DeepLabCut labeling, we calculated the percentage of frames from reach start to reach grasp that were mislabeled for each part on days 1 and 10 (Fig. 3E; see Materials and Methods, Processing reach kinematics). More mislabeled frames early in training would suggest that the apparent decrease in variability across sessions actually reflects improved DeepLabCut reliability. For the most part, the number of mislabeled frames was consistent between days 1 and 10. There were significantly more mislabeled frames on day 10 compared with day 1 for digit 4, but if anything, this should artificially increase endpoint variability on day 10.
The orientation of the paw at reach end (θ in Fig. 4G) also changed over the 10 training sessions, such that the paw became less pronated (Fig. 4H,I). Accordingly, the variability of the paw angle at reach end [measured by mean resultant length (MRL)] decreased over learning (Fig. 4J). Finally, the degree of digit flexion decreased over learning (δ in Fig. 4K), so that the digits were more extended at the end of the reach in later sessions (Fig. 4L,M). Again, the variability (MRL) in digit flexion across trials also decreased with training. Thus, learning to shape the paw and digits in preparation for grasping appears to require more practice than learning to efficiently guide the paw toward the pellet.
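Mean resultant length is a standard circular statistic: each angle is treated as a unit vector, and the MRL is the length of the vectors' average. A minimal sketch follows, assuming angles are supplied in degrees. The function name and input convention are illustrative choices.

```python
import numpy as np

def mean_resultant_length(angles_deg):
    """Mean resultant length of a set of angles given in degrees.

    Each angle is mapped to a unit vector on the circle. The MRL is the
    length of the mean vector, ranging from 0 (angles spread evenly)
    to 1 (all angles identical).
    """
    angles_rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return float(np.abs(np.mean(np.exp(1j * angles_rad))))

consistent_mrl = mean_resultant_length([10.0, 10.0, 10.0])  # identical angles
opposed_mrl = mean_resultant_length([0.0, 180.0])           # opposite directions cancel
```

Because the MRL approaches 1 as angles cluster more tightly, higher values correspond to more consistent paw orientation or digit flexion across trials.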
The reach-to-grasp movement requires not only that rats spread and extend their digits, orient their paw, and grasp, but also that these digit movements be properly timed with respect to paw advancement. Therefore, we evaluated how the coordinated execution of paw and digit shaping changed by plotting digit aperture, paw orientation, and digit flexion as a function of paw advancement toward the pellet. To quantify changes in forelimb-digit coordination, we tested differences in these quantities when the tip of digit 2 was 3 mm past the pellet. Overall, there was a suggestion that rats in both groups spread their digits (Fig. 5A,B), pronated their paws (Fig. 5C,D), and extended their digits (Fig. 5E,F) earlier along the reach trajectory with training. However, only the changes in hand orientation achieved statistical significance in the linear mixed model.
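Reading a kinematic measure at a fixed point of paw advancement can be sketched as linear interpolation between the two video frames that bracket that point. The sign convention (positive values meaning the digit 2 tip is past the pellet), the function name, and the use of the first crossing are assumptions made for illustration.

```python
import numpy as np

def value_at_advancement(advance_mm, values, target_mm=3.0):
    """Interpolate a kinematic measure at a given paw advancement.

    advance_mm: per-frame position of the digit 2 tip along the reach axis,
        relative to the pellet (assumed convention: positive = past the pellet).
    values: per-frame measure (e.g., digit aperture) on the same frames.
    Returns the value at the first crossing of target_mm, or NaN if the
    reach never crosses it.
    """
    advance_mm = np.asarray(advance_mm, dtype=float)
    values = np.asarray(values, dtype=float)
    for i in range(1, len(advance_mm)):
        before, after = advance_mm[i - 1], advance_mm[i]
        if before < target_mm <= after:
            # Fraction of the way from the earlier frame to the later frame.
            fraction = (target_mm - before) / (after - before)
            return float(values[i - 1] + fraction * (values[i] - values[i - 1]))
    return float("nan")

# Toy reach: aperture grows as the paw advances from 10 mm short to 6 mm past.
aperture_at_3mm = value_at_advancement([-10.0, -2.0, 2.0, 6.0], [4.0, 8.0, 12.0, 16.0])
short_reach = value_at_advancement([-10.0, -5.0], [4.0, 6.0])
```

Evaluating each measure at the same advancement on every trial makes sessions comparable even when reach speed and duration change with training.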

Discussion
Differences in skilled reaching success across rats are partially, but not completely, explained by changes in forelimb and digit kinematics. Rats that improved their reaching performance as assessed by increased success rate (learners) converged on consistent gross forelimb kinematics, after which they continued to refine fine digit movements. However, individual rats may refine different aspects of reaching in different orders. Most non-learners also adapted their gross reach kinematics at a rate similar to learners, though some maintained a high degree of gross kinematic variability throughout training. This further emphasizes the importance of measuring forelimb kinematics, and not just success rates, when assessing the effects of interventions on skilled reaching. We found two subtle, but important, differences between learners and non-learners. First, learners showed continued decreases in digit endpoint variability (even accounting for variability in hand position). Second, there was more persistent variability in failed reach trajectories for non-learners.
In general, learning curves were shallower and success rates lower for our rats compared with similar tasks (Alaverdashvili and Whishaw, 2008; Wong et al., 2015; Lemke et al., 2019). This could be because of relatively low trial counts in early sessions, possibly related to the transition from manual pretraining to the automated task. Even after well over 100 trials, however, success rates did not increase for 10/14 rats despite significant changes in limb kinematics. This task is inherently difficult, as slight perturbations to the pellet cause it to fall off the narrow pedestal. This contrasts with tasks in which the pellet is on a tray or shelf, allowing the rat to slide the pellet before securely grasping it. We may have found a higher proportion of learners if the task were more forgiving.
There are several potential explanations for why many non-learners did not improve their success rates despite changes in reach kinematics similar to learners. One possibility is that learners perform more trials per session, which accelerates learning rates (Wong et al., 2015). However, learners and non-learners performed similar numbers of trials (Fig. 1B; Extended Data Fig. 1-1A), and kinematics changed on roughly the same timescale for both groups. It therefore seems unlikely that differences in motivation or practice account for differences in success rates. Biomechanical differences between rats could also limit success. For example, rats with small hands may not be capable of achieving reach apertures sufficient to improve success rates. Differences in sensory acuity, whether in olfaction to direct the initial reach (rats rely heavily on olfaction to direct reaches; Whishaw and Tomie, 1989) or in somatosensation to refine the grasp, could affect success rates and kinematic adaptation. Another possibility is that these rats would have increased their success rates with additional training, as was observed in an "extended training" group trained for several weeks in a previous study (Lemke et al., 2019). Finally, and perhaps most interesting, is the possibility that there are innate differences in neural circuits/plasticity between learners, non-learners that refine their reach kinematics, and non-learners in which trajectory variability persists. The possible neural substrates for such variability are broad. One candidate is differences in dopamine signaling/receptor profiles, as dopamine plays an important role in motor adaptation (Panigrahi et al., 2015; Bova et al., 2020). Other possibilities include differences in cortical plasticity (Li et al., 2017; Hyland et al., 2019) and corticostriatal coherence (Lemke et al., 2019).
Regardless of mechanism, our data suggest that non-learners did not make final, subtle adjustments in digit kinematics necessary to firmly grasp and retrieve sugar pellets. This is consistent with previous work suggesting that gross forelimb movements for guiding the hand to the pellet and fine digit movements for grasping the pellet are learned as separate "reach" and "grasp" modules (Li et al., 2017; Lemke et al., 2019). However, in these studies, refinement of fine motor control was inferred from increases in success rate, rather than direct measures of digit kinematics. We show that fine digit movements to prepare for grasping (i.e., precisely adjusting final digit location, spreading and extending digits, pronating the hand) are refined after gross movement kinematics (i.e., forelimb trajectory variability, reach velocity) have stabilized, at least in learners. Therefore, learners first learn to accurately and efficiently guide their hands toward the pellet, then to appropriately shape their digits and orient their hands at the correct moments to grasp the pellet. This timing may be reflected in corticostriatal coherence at low frequencies (3-6 Hz), which are phase-locked to grasp onset in an almost identical task (Lemke et al., 2019). That gross and fine motor components of reaching are learned sequentially, rather than concurrently, suggests that distinct neural mechanisms regulate and implement these aspects of motor control. This concept of modular circuits for reaching and grasping is supported by several physiologic experiments. Reach and grasp movements are elicited by microstimulation in distinct areas of cortex (caudal and rostral forelimb areas, respectively; Brown and Teskey, 2014). Furthermore, perturbations at different downstream loci of cerebral motor circuits selectively disrupt either reaching or grasping movements.
Chemogenetic silencing of spinal interneurons that receive inputs from either motor or sensory cortex interfered with reaching or grasping, respectively (Ueno et al., 2018). This suggests that distinct circuits for reaching and grasping are maintained at all levels of motor circuitry.
However, reach-to-grasp movements require that the digits shape in preparation for grasping as the paw advances toward the pellet. Therefore, improving skilled reaching performance requires not only refinement of independent gross and fine motor components, but also their temporal and spatial integration. We found that coordinated execution of digit movements (e.g., spreading and extending the digits) with paw advancement continued to change across all 10 d of training. This suggests that fine digit movements are not regulated by circuits entirely distinct from those used to refine gross movements, or at least that communication between "reach" and "grasp" modules also evolves with practice.
Temporospatial integration of fine and gross motor control could be implemented in cortico-basal ganglia pathways. As rats learned skilled reaching, coordinated activity between dorsolateral striatum and motor cortex emerged, and inactivation of dorsolateral striatum disrupted movement-related cortical activity (Lemke et al., 2019). This suggested that coordinated corticostriatal activity is essential for refining reach-to-grasp movements. However, this coordinated activity only appeared to correlate with changes in gross movement kinematics and not with success rate (i.e., coordinated activity did not differ between successful and failed reaches), and it was concluded that striatum regulates gross, not fine, motor control. The present findings, in contrast, show that fine digit movements continued to change even after success rate (and presumably corticostriatal coherence) had stabilized in individual rats (Figs. 3, 4). Thus, coordinated corticostriatal activity could be correlated with changes in fine digit kinematics not evident from success rate. Indeed, previous work found that optogenetic manipulations of substantia nigra pars compacta dopamine neurons during skilled reaching disrupted the coordinated execution of digit movements with gross forelimb movements (Bova et al., 2020). Therefore, basal ganglia circuitry does regulate some aspects of fine digit control, or at least its timing with respect to gross forelimb movements. This may be accomplished through basal ganglia regulation of motor cortex activity.
Fine digit movements continued to be refined after success rate had stabilized in learners, suggesting that aspects of motor control that do not affect task outcome are optimized over a longer period of time. One possibility is that movements were refined further to maximize efficiency in terms of either time or energy consumption. In this view, the execution of reach-to-grasp movements requires balancing the desire to obtain as many rewards as possible with minimizing effort and maximizing accuracy. Humans and primates increase movement vigor when those movements are paired with reward, suggesting that reward mitigates the cost of effort (Takikawa et al., 2002; Shadmehr et al., 2010; Manohar et al., 2015; Summerside et al., 2018). However, movements often have a speed-accuracy trade-off, as movements that are faster or larger become less accurate (Shmuelof et al., 2012). Reach duration decreased and reach velocity increased over the first few training sessions, but then plateaued, suggesting that movement speed reached a threshold at which maximizing reward was balanced with maintaining accuracy. However, it is possible that once the speed of reaching movements was optimized, rats then attempted to reduce energy cost by refining digit movements. One way of minimizing effort is to efficiently distribute work across multiple muscles or joints (Diedrichsen et al., 2010). Improving digit shaping before grasping may reduce energetic costs by minimizing the number of corrective movements. Despite overall improvements in performance, there was still significant variability in reach-to-grasp movements after success rate stabilized, even in learners. Because the extrinsic features of the task do not change (e.g., the pellet is always in the same position, shape and size of pellet are consistent, etc.), it is likely that performance variability in late training sessions arises from changes in internal states.
There are several potential sources of this internal variability. Manohar et al. (2015) found that increased reward enhanced saccade accuracy, and that saccade accuracy in patients with Parkinson's disease was insensitive to reward. This suggests that increased motivation reduces motor noise to improve movement accuracy, possibly through dopaminergic mechanisms (Manohar et al., 2015). Similarly, male song in zebra finches is more stereotyped when performing for a female during courtship than when practicing alone (Kao et al., 2005). Furthermore, female-directed song is associated with enhanced extracellular dopamine levels in Area X (Ihle et al., 2015). Thus, variability in rodent skilled reaching may arise from varying motivation during a session. Another potential source of variability is the degree to which stimuli unrelated to the current task are attended (Roberts et al., 2021). While ignoring irrelevant information may improve task performance, ignoring potentially salient information carries a cost. For example, sensory information could warn of impending danger or signal the existence of a better opportunity for reward. Therefore, even in a controlled laboratory environment, the brain may not be conditioned to persistently filter competing distractors for extended periods of time. Finally, movement variability may persist so that motor circuits can adapt if task rules change. Variability during early motor skill acquisition contributes to learning by allowing exploration of optimal motor patterns (Dhawale et al., 2017). Although extrinsic features of our skilled reaching task do not change, some level of variability could be maintained to allow fast adaptation if they did.
Whether reaching kinematics remain variable or become more stereotyped could depend on tonic dopamine levels in striatum, which are thought to control the trade-off between exploration (i.e., trying different strategies) and exploitation (i.e., continuing to use the same strategy; Humphries et al., 2012). Performance improvement during skilled reaching learning was prevented by ventral tegmental area dopamine lesions (Hosp et al., 2011). However, it is not clear whether learning was impaired because low dopamine levels prevented exploration for optimal reaching strategies, prevented rats from learning to exploit the optimal reaching strategy, or for other reasons. Experiments that directly assess changes in reach kinematics with dopamine manipulations in different brain regions during learning could address these questions. Similarly, it will be important to determine how dopamine signaling changes during skill acquisition across relevant brain regions (e.g., motor cortex and striatum), and how such changes correlate with task performance. Finally, it is not known whether differences in motor learning are correlated with differences in other domains (e.g., instrumental or Pavlovian conditioning).
Future studies will need to address potential explanations for continued refinement of fine digit kinematics and variability after the stabilization of performance outcomes. However, our results provide a foundational understanding of how skilled reach-to-grasp kinematics in rats evolve during learning, and illustrate the importance of intersubject variability. These findings are essential for interpreting results of studies applying neural circuit manipulations during learning or performance of skilled reaching, and are a crucial step toward understanding how neurologic disorders disrupt dexterous motor skill learning and performance.