Abstract
Human multisensory grasping movements (i.e., seeing and feeling a handheld object while grasping it with the contralateral hand) are superior to movements guided by each separate modality. This multisensory advantage might be driven by the integration of vision with either the haptic position cue only or with both position and size cues. To contrast these two hypotheses, we manipulated visual uncertainty (central vs peripheral vision) and the availability of haptic cues during multisensory grasping. We showed a multisensory benefit regardless of the degree of visual uncertainty, suggesting that the integration process involved in multisensory grasping can be flexibly modulated by the contribution of each modality. Increasing visual uncertainty revealed the role of the distinct haptic cues. The haptic position cue was sufficient to promote multisensory benefits, as evidenced by faster actions with smaller grip apertures, whereas the haptic size cue was fundamental in fine-tuning the grip aperture scaling. These results support the hypothesis that, in multisensory grasping, vision is integrated with all haptic cues, with the haptic position cue playing the key part. Our findings highlight the important role of nonvisual sensory inputs in sensorimotor control and hint at the potential contributions of the haptic modality in developing and maintaining visuomotor functions.
Significance Statement
The longstanding view that vision is the primary sense we rely on to guide grasping movements relegates the equally important haptic inputs, such as touch and proprioception, to a secondary role. Here, we show that, when visual uncertainty increases during visuo-haptic grasping, the central nervous system exploits distinct haptic inputs about the object position and size to optimize grasping performance. Specifically, we demonstrate that haptic inputs about the object position are fundamental to support vision in enhancing grasping performance, whereas haptic size inputs can further refine hand shaping. Our results provide strong evidence that nonvisual inputs serve an important, previously underappreciated, functional role in grasping.
Introduction
A large proportion of grasping actions are directed toward objects we can sense with multiple modalities. For instance, when grasping with one hand an object we already hold in the other hand, the properties of the object, such as its size and position in space, are provided by both vision and haptics (touch and proprioception). The integration of these redundant sensory cues fosters a consistently superior grasping performance compared with when movements are guided by each modality alone (Camponogara and Volcic, 2019a,b). Even more intriguingly, the same superior grasping performance is achieved when the haptic size cue is not provided and vision is complemented by only the haptic position cue (Camponogara and Volcic, 2021b).
The elusive effect of the haptic size cue in the multisensory integration process might result from two different causes. The superior performance in multisensory grasping might arise from visual and haptic integration at the level of the position cues only, which would reduce the uncertainty about the position of the object in space (Carey and Allan, 1996; Battaglia et al., 2010; Sperandio et al., 2013; Chen et al., 2018). As a consequence, the object size estimation would be solely determined by vision (Camponogara and Volcic, 2021b). Alternatively, the visuo-haptic integration might occur both at the level of the position cues and at the level of the size cues, but the dominance of the more reliable visual size cue would completely overshadow the haptic size cue, making it hard to determine whether the multisensory size information is truly integrated.
The main aim of this study was to contrast these two alternative explanations by disrupting visual information during multisensory grasping. The quality of visual information was manipulated by modulating the participants' gaze direction so that the grasping actions were executed in either central (foveal) or peripheral vision. Because visual acuity sharply declines with retinal eccentricity (Strasburger et al., 2011; Rosenholtz, 2016), estimates of object size and position are noticeably impaired in peripheral compared with central vision (Collier, 1931; Newsome, 1972; Schneider et al., 1978; Thompson and Fowler, 1980; Bock, 1993; Goodale and Murphy, 1997; Brown et al., 2005; Baldwin et al., 2016). Moreover, multisensory integration studies in perception have shown that, as the quality of visual information gradually declines, object size estimation shifts toward more haptically based perceptual judgments (Derrick and Dewar, 1970; Heller, 1983; Ernst and Banks, 2002; Gepshtein and Banks, 2003; Helbig and Ernst, 2007; Van Doorn et al., 2010). It might thus be expected that increasing visual uncertainty through peripheral vision should let the haptic size cue effect emerge also in multisensory grasping.
Compared with movements in central vision, grasping movements in peripheral vision are generally slower, with larger grip apertures and poorer grip aperture scaling (Sivak and MacKenzie, 1990, 1992; Goodale and Murphy, 1997; Watt et al., 2000; Brown et al., 2005; Schlicht and Schrater, 2007; Hesse et al., 2012). Introducing additional haptic cues might thus refine grasping movements in several ways, depending on the contribution of haptic position and size cues. The integration of the haptic position cue would reduce the overall positional uncertainty, which would translate into faster movements and narrower grip apertures. Analogously, the contribution of the haptic size cue would diminish the uncertainty about the object size and would be revealed by an improved grip aperture scaling. However, if the haptic size cue is not part of the integration process, the sensitivity to changes in object size should remain unaffected.
We tested these predictions in two experiments. In the first experiment, we contrasted grasping performance under peripheral vision conditions, with (pVH) or without (pV) additional haptic cues, along with the central vision conditions (V, VH) and a haptic only (H) condition. In the second experiment, we further teased apart the contribution of haptic cues when grasping handheld objects in peripheral vision by selectively withdrawing the haptic size cue and providing the haptic position cue only (pVHP).
Experiment 1
Materials and methods
Participants
Eighteen participants took part in this experiment (four male, age 25.3 ± 8.2). All had normal or corrected-to-normal vision and no known history of neurologic disorders. All of the participants were naive to the purpose of the experiment and were provided with a subsistence allowance. The experiment was undertaken with the understanding and informed written consent of each participant and the experimental procedures were approved by the Institutional Review Board of New York University Abu Dhabi.
Apparatus
The set of stimuli consisted of three 3D-printed rectangular cuboids with depths of 40, 50, and 60 mm, all of the same height (120 mm) and width (25 mm). A chin rest was positioned at the edge of the experimental table and its height was adjusted such that the participants' eyes were 440 mm above the table surface. During the experiment, the three target objects were positioned 350 mm in the sagittal direction with respect to the table's edge. Thus, in the peripheral vision condition, the top of the objects was at ∼45° of eccentricity with respect to the participants' gaze (Fig. 1A). This eccentricity allowed us to increase the visual uncertainty without completely eliminating the availability of visual cues (Goodale and Murphy, 1997; Schlicht and Schrater, 2007). A custom-made eye-tracker was attached to the left rod of the chin rest with a locking arm (JB01291-BWW). The eye-tracker consisted of a modified webcam (Vivitar V49252) with a sampling frequency of 30 Hz. An array of 25 infrared LEDs was positioned on the table, 40 cm in front of and 30 cm to the left of the participant. The activation and deactivation of the LEDs were controlled by an Arduino Yún board via a custom MATLAB (MathWorks Inc) program, which also computed the pupil coordinates from the sampled eye images. The start position of the right hand was defined by a 5-mm-high rubber bump with a diameter of 9 mm attached at the edge of the table, 450 mm to the right of the participants' midline. The experiment was conducted in a dark room with the experimental table illuminated by an LED desk light (5 W) positioned on the left side of the participant.
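The reported eccentricity follows from the setup geometry. As a rough check, here is a minimal R sketch; the assumption that gaze is directed horizontally at eye height (as toward the fixation point described below) is ours:

```r
# Approximate eccentricity of the object's top relative to a horizontal gaze.
# All values are taken from the apparatus description above.
eye_height    <- 440   # mm, eyes above the table surface
object_dist   <- 350   # mm, sagittal distance of the object
object_height <- 120   # mm, height of the cuboids

drop <- eye_height - object_height            # vertical drop to object top: 320 mm
ecc  <- atan2(drop, object_dist) * 180 / pi   # ~42.4 deg, i.e., roughly 45 deg
```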
A black panel (600 mm wide, 500 mm high) was positioned 450 mm from the participants' position (i.e., behind the object). A small white square (5 × 5 mm) was positioned at the center of the panel, at a height of 440 mm, and acted as the fixation point in the peripheral vision block. A cardboard panel (400 mm wide, 300 mm high) was used to prevent vision of the workspace (but not of the board with the fixation point) between trials in the central and peripheral vision blocks, whereas a pair of occlusion goggles (Red Scientific) was used to prevent vision in the Haptic condition. A 1000-Hz pure tone of 100-ms duration was used to signal the start of the trial, while a 600-Hz tone of the same duration was used to signal its end.
Index, thumb, and wrist movements were acquired on-line at 200 Hz with submillimeter resolution by using an Optotrak Certus system (Northern Digital Inc.). The position of the tip of each digit was calculated during the system calibration phase with respect to two rigid bodies defined by three infrared-emitting diodes attached on each distal phalanx (Nicolini et al., 2014). An additional marker was attached on the styloid process of the radius to monitor the movement of the wrist. The Optotrak system was controlled by the MOTOM toolbox (Derzsi and Volcic, 2018).
Procedure
Participants sat comfortably at the table with their torso touching its edge. All the trials started with the thumb and index digit of the right hand positioned on the start position, the left hand positioned on the left side of the chin rest and the head on the chin rest (Fig. 1A). The height of the chair was adjusted to keep the eyes at a fixed height to maintain the object at a fixed visual angle. Participants were required to perform a precision grip with their right thumb and index digit along the depth axis of the stimulus.
Before each trial, the cardboard panel was placed in front of the participant to cover the workspace, and the object was placed in its position 350 mm in front of the participant. The experimenter then removed the cardboard panel and, after a variable period, the start tone was delivered. The participant had to perform a right-handed reach-to-grasp action toward the object at a natural speed. No reaction time constraints were imposed. Three seconds after the start tone, the end sound was delivered, and the participant had to move the right hand back to the start position. The cardboard was then placed in front of the participant, the object was set to the new required size, and the next trial started.
Five different conditions (Fig. 1B) were performed: Haptic (H), Visual (V), Visuo-Haptic (VH), Peripheral Vision (pV), and Peripheral Vision plus Haptic (pVH). In the H condition, vision was prevented for the whole duration of the condition. Before each trial, the experimenter signaled to the participant to hold the object with their left hand along its depth axis at its base (i.e., sense its size and position by means of touch and proprioception). In the V condition, as soon as the cardboard was removed, the experimenter instructed the participant to look at the object, which was in the central visual field (the left hand was kept on the table close to the chin rest). In the VH condition, the participant had to hold the object at its base with their left hand and look at the object. The pV and pVH conditions were identical to the V and VH conditions except that participants were instructed to look at the fixation point instead of foveating the object, so that the target object was always in the visual periphery (Fig. 1A). Whereas in the pV condition only peripheral vision was available, in the pVH condition, participants were asked to also hold the object at its base with their left hand. Eye fixations in these two conditions were monitored with the eye-tracker, which started sampling as soon as the experimenter placed the cardboard panel between the participant and the object (the cardboard height was lower than the fixation point, but high enough to cover the target object), and stopped when the end-of-trial sound was delivered. If the algorithm detected an eye movement of ∼10 mm (∼1.3° of visual angle) in the horizontal or vertical direction from the fixation point, the trial was discarded and repeated later in the condition. The five conditions were divided into two main experimental blocks. The H, V, and VH conditions were part of the Central vision block, whereas the pV and pVH conditions were part of the Peripheral vision block.
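For illustration, a minimal sketch of such a fixation check in R (the actual implementation ran online in MATLAB; the variable and function names here are hypothetical):

```r
# Deviation threshold: ~10 mm on the fixation panel corresponds to ~1.3 deg
# of visual angle at the 450 mm panel distance reported above.
panel_dist    <- 450                                          # mm
threshold_mm  <- 10
threshold_deg <- atan2(threshold_mm, panel_dist) * 180 / pi   # ~1.27 deg

# Flag a trial if gaze deviates beyond the threshold in either direction.
# gaze_xy and fixation_xy: 2-element vectors of horizontal/vertical position (mm).
fixation_broken <- function(gaze_xy, fixation_xy, limit = threshold_mm) {
  any(abs(gaze_xy - fixation_xy) > limit)
}
```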
The Central and Peripheral vision blocks were performed in sequence, while the order of the conditions (H, V, VH, pV, and pVH) was randomized within blocks and across participants. The differently sized objects were presented in a random order and ten repetitions were performed for each object size and condition, which led to a total of 150 trials per participant. To get accustomed to the task, participants underwent a training session of ten trials before each condition, for a total of 50 trials.
Data analysis
Kinematic data were analyzed in R (R Core Team, 2020). The raw data were smoothed and differentiated with a third-order Savitzky–Golay filter with a window size of 21 points. These filtered data were then used to compute velocities and accelerations in three-dimensional space for each digit and the wrist. Movement onset was defined as the moment of the lowest, nonrepeating wrist acceleration value before the continuously increasing wrist acceleration values (Volcic and Domini, 2016; Camponogara and Volcic, 2019b), while the end of the grasping movement was defined on the basis of the Multiple Sources of Information method (Schot et al., 2010). We used the criteria that the grip aperture is close to the size of the object, that the grip aperture is decreasing, that the second derivative of the grip aperture is positive, and that the velocities of the wrist, thumb, and index finger are low. Moreover, the probability of a moment being the end of the movement decreased over time to capture the first instance in which the above criteria were met. Trials in which the end of the movement was not captured correctly or in which missing marker samples could not be reconstructed using interpolation were discarded from further analysis. The exclusion of these trials (158 trials, 5.8% in total) left us with 2542 trials.
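As an illustration of this pipeline, here is a minimal R sketch. It assumes 200-Hz position data and uses the filter settings reported above; the per-criterion probability mappings in the movement-end function are illustrative placeholders in the spirit of the Multiple Sources of Information method, not the authors' exact formulation:

```r
library(signal)  # provides sgolayfilt()

fs <- 200  # Optotrak sampling rate (Hz)

# Smooth (m = 0) or differentiate (m = 1, 2) a trace with a third-order
# Savitzky-Golay filter over a 21-sample window; fs^m rescales to seconds.
sg <- function(x, m = 0) sgolayfilt(x, p = 3, n = 21, m = m) * fs^m

# xyz: matrix with one row per sample and columns x, y, z (in mm)
speed_3d <- function(xyz) sqrt(rowSums(apply(xyz, 2, sg, m = 1)^2))
grip_aperture <- function(thumb, index)
  sqrt(rowSums((apply(thumb, 2, sg) - apply(index, 2, sg))^2))

# Movement end: soft criteria are combined multiplicatively and the most
# probable sample is taken as the movement end (Schot et al., 2010).
movement_end <- function(ga, wrist_speed, obj_size) {
  ga_vel <- sg(ga, m = 1)
  ga_acc <- sg(ga, m = 2)
  n <- length(ga)
  p <- plogis(-(abs(ga - obj_size) - 5)) *   # grip aperture close to object size
       plogis(-ga_vel / 10) *                # grip aperture decreasing
       plogis(ga_acc / 100) *                # second derivative positive
       plogis(-(wrist_speed - 50) / 25) *    # low velocity
       (n:1) / n                             # earlier moments more probable
  which.max(p)
}
```

The dependent variables defined below then follow directly, e.g., the peak grip aperture as max(grip_aperture(thumb, index)) and the peak velocity as max(speed_3d(wrist)) up to the movement end.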
We focused our analyses on two dependent variables: the peak grip aperture, defined as the maximum Euclidean distance between the thumb and the index finger, and the peak velocity of the hand movement, defined as the highest wrist velocity along the movement. We analyzed the data using Bayesian linear mixed-effects models, estimated using the brms package (Bürkner, 2017), which implements Bayesian multilevel models in R using the probabilistic programming language Stan (Carpenter et al., 2017). The models included as fixed-effects (predictors) the categorical variable Condition (H, V, VH, pV, and pVH) in combination with the continuous variable Size. The latter was centered before being entered in the models; thus, the estimates of the Condition parameters (βCondition) correspond to the average performance of each Condition. The estimates of the parameter Size (βSize) correspond instead to the change in the dependent variables as a function of the object size. All models included independent random (group-level) effects for subjects. Models were fitted considering weakly informative prior distributions for each parameter to provide information about their plausible scale. We used Gaussian priors for the Condition fixed-effect predictor (peak grip aperture βCondition: mean = 90 and SD = 40; peak velocity βCondition: mean = 1100 and SD = 200). For the Size fixed-effect predictors, we used a Cauchy prior distribution centered at 0 with a scale parameter of 2.5. For the group-level standard deviation parameters and sigmas, we used Student t-distribution priors (peak grip aperture, all SD parameters and sigma: df = 3, scale = 10; peak velocity, all SD parameters and sigma: df = 3, scale = 170). Finally, we set a prior over the correlation matrix that assumes that smaller correlations are slightly more likely than larger ones (LKJ prior set to 2).
For each model we ran four Markov chains simultaneously, each for 16,000 iterations (1000 warm-up samples to tune the MCMC sampler) with the adapt_delta parameter set to 0.9, for a total of 60,000 postwarm-up samples. Chain convergence was assessed using the R̂ statistic.
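A minimal sketch of what such a model specification could look like in brms, using the peak grip aperture priors reported above (the formula, cell-means coding, and variable names are our assumptions, not the authors' code; the peak velocity model would swap in its own Gaussian and Student-t scales):

```r
library(brms)

# d: one row per trial, with pga (peak grip aperture, mm), condition (factor
# with levels H, V, VH, pV, pVH), size_c (object size, centered), subject (id).
f <- bf(pga ~ 0 + condition + condition:size_c +
          (0 + condition + condition:size_c | subject))

priors <- c(
  set_prior("normal(90, 40)", class = "b"),             # condition averages
  # Cauchy(0, 2.5) on each Size slope (one such line per condition level):
  set_prior("cauchy(0, 2.5)", class = "b", coef = "conditionH:size_c"),
  set_prior("student_t(3, 0, 10)", class = "sd"),       # group-level SDs
  set_prior("student_t(3, 0, 10)", class = "sigma"),    # residual SD
  set_prior("lkj(2)", class = "cor")                    # correlation matrix
)

m_pga <- brm(f, data = d, prior = priors,
             chains = 4, iter = 16000, warmup = 1000,
             control = list(adapt_delta = 0.9))
```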
The posterior distributions we obtained represent the probabilities of the parameters conditional on the priors, model, and data; they quantify our belief that the "true" parameter lies within some interval with a given probability. We summarize these posterior distributions by computing their medians and 95% highest density intervals (HDIs). The 95% HDI specifies the interval that includes the true value of a specific parameter with 95% probability. To evaluate the differences between parameters of two conditions, we subtracted the posterior distributions of the βCondition and βSize weights between the specific conditions. The resulting distributions are denoted as credible difference distributions and are again summarized by computing the medians and the 95% HDIs.
For statistical inferences about βSize, we assessed the overlap of the 95% HDI with zero. A 95% HDI that does not span zero indicates that the predictor has an effect on the dependent variable. For statistical inferences about the differences in the model parameters, βCondition and βSize, between conditions, we applied an analogous approach. A 95% HDI of the credible difference distribution that does not span zero is taken as evidence that the model parameters in the two conditions differ from each other. Data and code are available at https://osf.io/dfycg/.
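In R, these summaries can be sketched as follows (coefficient names follow brms defaults for the hypothetical model above; the hdi() helper from the HDInterval package is one of several equivalent options):

```r
library(posterior)   # as_draws_df()
library(HDInterval)  # hdi()

draws <- as_draws_df(m_pga)

# Median and 95% HDI of the average peak grip aperture in the VH condition
median(draws$b_conditionVH)
hdi(draws$b_conditionVH, credMass = 0.95)

# Credible difference distribution: VH versus V
diff_vh_v <- draws$b_conditionVH - draws$b_conditionV
median(diff_vh_v)
hdi(diff_vh_v, credMass = 0.95)  # an HDI excluding zero -> credible difference
```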
Results and discussion
Based on previous results (Camponogara and Volcic, 2019a,b, 2021b), we predict that the multisensory condition in central vision (VH) should exhibit faster grasping movements with smaller peak grip apertures than the V and H unisensory conditions. Likewise, we expect the peripheral vision conditions (pV, pVH) to show a decline in performance with respect to their corresponding central vision conditions (V, VH), because peripheral vision is characterized by a higher visual uncertainty. However, two main scenarios are considered for the peripheral vision conditions. If haptic size is largely involved in the control of grasping, we expect faster movements, with narrower peak grip apertures and a better grip aperture scaling in pVH compared with pV. If haptic size does not play a relevant role, we expect actions in pVH to be faster and with narrower peak grip apertures than in pV, but with no improvement in grip aperture scaling, that is, the sensitivity to changes in object size would be equivalent to the pV condition.
We confirmed that movements performed in central vision were faster and with a narrower peak grip aperture in the multisensory compared with each unisensory condition (Fig. 2A,C; Camponogara and Volcic, 2019a,b, 2021b). Interestingly, the same pattern of results was found also in peripheral vision (Fig. 2B,D), confirming that haptics and vision are integrated also when vision is degraded. As expected, actions were slower and were performed with a wider grip aperture in peripheral compared with central vision, in both unisensory and multisensory conditions (V vs pV and VH vs pVH; Fig. 2). Interestingly, while the peak grip aperture scaled similarly in V and VH (Fig. 2C), the scaling was stronger in pVH compared with pV (Fig. 2D), suggesting that haptics supports actions differently in central and peripheral vision.
Central vision
In central vision, the peak velocity was modulated according to the available sensory information (Fig. 3A), with an advantage of multisensory over unisensory grasping, and of vision over haptics. The peak velocity was credibly higher in VH compared with V and H, and tended to be credibly higher in V compared with H (Fig. 3B). The peak velocity was not affected by changes in object size in any of the conditions, with slope values ranging between –0.1 and –0.65, corresponding to minimal variations in peak velocity between the smallest and the largest object (∼10 mm/s difference, equivalent to ∼1% of the average peak velocity).
The peak grip aperture was also clearly affected by the available sensory inputs (Fig. 3C). Peak grip aperture was credibly smaller in the VH condition compared with the H condition, and in V compared with the H condition (Fig. 3D). Also, the peak grip aperture in VH tended to be smaller than in the V condition. These results replicate previous findings and further corroborate that the simultaneous availability of visual and haptic inputs leads to a multisensory advantage (Camponogara and Volcic, 2019b, 2021b).
The peak grip aperture scaled with object size in all conditions (Fig. 3E). The scaling was equivalent in the VH and V conditions, and stronger compared with the H condition (Fig. 3F). This can be considered as a sign that, in central vision, the peak grip aperture modulation in multisensory grasping is mainly based on the visual size cue, as suggested by previous studies (Camponogara and Volcic, 2021b).
Comparisons between central and peripheral vision
Additional haptic inputs affected both peak velocity and peak grip aperture also in peripheral vision (Fig. 3A,C). As observed for central vision, holding the object with the contralateral hand facilitated faster movements and reduced grip apertures, again highlighting the beneficial role of haptics.
The concurrent availability of peripheral vision and haptics enabled faster movements compared with when either of the two modalities was presented in isolation (Fig. 3B, pV–pVH and H–pVH comparisons). As expected, reach-to-grasp actions toward peripherally seen objects were slower than those toward centrally seen objects. The peak velocity was credibly lower in pVH compared with VH, and there was a tendency for a credibly lower peak velocity in pV compared with V (Fig. 3B, pV–V and pVH–VH comparisons).
The peak grip aperture credibly increased when the object was in peripheral compared with central vision, both with or without the support of concurrent haptic information (Fig. 3D, pV–V and pVH–VH comparisons). However, the switch from central to peripheral vision increased peak grip apertures more strongly when the grasping behavior was not supported by additional haptic information (pV–V vs pVH–VH). The effect of adding haptic information to peripheral vision resulted in credibly narrower peak grip apertures (Fig. 3D, pV–pVH comparison), whereas adding peripheral vision to haptics led to only a minor improvement (Fig. 3D, H–pVH comparison).
The availability of concurrent haptic size and position cues also partially prevented the typical worsening of the scaling of the grip aperture when grasping is guided only by peripheral vision (Fig. 3E). Object size scaling was credibly weaker in peripheral compared with central vision (Fig. 3F, pV–V and pVH–VH comparisons), but the scaling of the peak grip aperture was credibly stronger in the pVH condition compared with the pV condition (Fig. 3F, pV–pVH comparison), which was, in turn, identical to the H condition (Fig. 3F, H–pVH comparison). It is interesting to note that while in central vision the grip aperture scaled similarly in the unisensory visual and in the multisensory conditions (Fig. 3F, V–VH), the grip aperture in visual periphery scaled more strongly in the multisensory compared with the unisensory visual condition (Fig. 3F, pV–pVH). This suggests that haptic object position and size information are flexibly used according to the quality of visual information.
Figure 4A summarizes all the conditions in terms of peak velocity, peak grip aperture, and the scaling of the peak grip aperture as a function of object size. Conditions from worst (larger grip apertures and lower velocity) to best (smaller grip apertures and higher velocity) grasping performance lie along the diagonal line connecting the top-left to the bottom-right corner, with dot sizes, from smaller to larger, indicating their respective peak grip aperture slopes. Two aspects are again evident here. First, both conditions with peripheral vision (pV and pVH) are inferior to their respective central vision conditions (V and VH). Second, complementing peripheral vision with haptic inputs leads to superior grasping performance compared with when actions are guided only by peripheral vision (pVH vs pV). Interestingly, in peripheral vision, haptics improved the grip aperture, the peak velocity, and the overall scaling of the peak grip aperture to a greater extent than in central vision.
Figure 4B represents the covariation of the wrist velocity and the grip aperture from the start to the end of the movement. The highest value reached by each curve along the horizontal axis represents the point of the movement trajectory at which the peak velocity occurred, and, similarly, the highest value of each curve along the vertical axis represents the peak grip aperture. Just after movement start, the curves clustered into two groups, one including the conditions with haptic information (H, VH, and pVH) and one including those without haptics (V and pV); a sign that the initial movement velocity and grip aperture in multisensory conditions were mainly under haptic control. These groups dissolved before the curves reached the peak velocity and the evolution of each curve was affected by the available sensory information. In contrast, the curves representing the changes in the scaling of the peak grip aperture formed three groups of conditions which stayed separated until movement end (Fig. 4C). The slopes were similar between H and pVH, and between V and VH, with flatter slopes for the first (H, pVH) than for the second group (V, VH). Instead, the pV condition showed a distinct slope profile with very weak scaling which persisted almost until movement end.
Experiment 2
The results of experiment 1 show that, as in central vision, actions toward handheld objects in peripheral vision are performed faster and with narrower grip apertures than those toward only (peripherally) seen objects. This suggests that visual and haptic inputs are successfully integrated even when vision is disrupted. However, the partially restored grip aperture scaling observed in peripheral multisensory grasping could have two different origins, one that incorporates haptic size cues and one that does not. If the haptic size cue is critical for hand shaping in peripheral multisensory grasping, we expect that its removal would yield a peak grip aperture and scaling resembling those observed in the pV condition. Instead, if hand shaping is mainly determined by visual size cues which are improved by the availability of haptic positional information, as seen in central vision (Camponogara and Volcic, 2021b), the haptic position cue should be sufficient to attain the same level of peak grip aperture and scaling as when all haptic cues are provided. As long as the haptic position cue is available, the presence or absence of the haptic size cue should not affect peak velocities, which should be higher than when only peripheral vision is available. To tease apart the relative contribution of these haptic inputs, we systematically manipulated the availability of the haptic size cue. In the Peripheral Vision plus Haptic Position condition (pVHP), we introduced a new set of objects which were identical to those used in experiment 1, but had the lower half replaced by a post whose size did not co-vary with the size of the objects (Fig. 5A). Thus, in the pVHP condition, participants were holding the post with their left hand (Fig. 5B), which provided only haptic positional but no relevant size information, while simultaneously seeing the object in the periphery. This pVHP condition was performed by a new group of participants together with the pV and pVH conditions, which were the same as in experiment 1.
Materials and methods
Participants
Eighteen new participants took part in experiment 2 (six male, age 20.7 ± 3.5). All had normal or corrected-to-normal vision and no known history of neurologic disorders. All of the participants were naive to the purpose of the experiment and were provided with a subsistence allowance. The experiment was undertaken with the understanding and informed written consent of each participant and the experimental procedures were approved by the Institutional Review Board of New York University Abu Dhabi.
Apparatus
The experimental setup was the same as in experiment 1 (Fig. 1A), except that two sets of stimuli were used: the first set was the same as in the first experiment (Fig. 1A), whereas the second set consisted of three rectangular cuboids of 60-mm height supported by a 60-mm-high post which was 10 mm deep and 25 mm wide (Fig. 5A). The upper part of these stimuli was identical to the first set of stimuli and thus varied in depth across trials. The post supporting the upper part had instead a fixed depth.
Procedure
The procedure was the same as for the Peripheral vision block of experiment 1. In the pV and pVH conditions, the first set of objects was presented (Fig. 1A). In the pVHP condition, the second set of objects was used (Fig. 5A). In this case, participants held the base of the post that supported the target object with their left hand (Fig. 5B). Thus, while in the pVH condition haptic inputs were informative of both the object size and position, in the pVHP condition haptic inputs provided only positional object information. Therefore, peripheral vision was the only source of object size information.
The order of the conditions (pV, pVH, pVHP) was randomized across participants. Object sizes were randomized within each condition and 15 trials were performed for each object size and condition, which led to a total of 135 trials per participant. Before each condition, participants underwent a training session of ten trials to get accustomed to the task, for a total of 30 trials.
Data analysis
The raw data processing and the statistical analyses were identical to those of experiment 1. Based on the same exclusion criteria, a total of 276 trials (11.3% in total) were excluded, which left us with 2154 trials for the final analysis. As in experiment 1, we focused our analyses on the peak grip aperture and the peak velocity of the hand movement.
Results and discussion
Results showed that movements were performed faster and with a narrower grip aperture in the multisensory conditions (pVH, pVHP) compared with the unisensory (pV) condition. Interestingly, movements were equally fast and with a similar grip aperture either with (pVH) or without (pVHP) the haptic size cue (Fig. 5C,D). However, removing the haptic size cue considerably weakened the grip aperture scaling compared with when both the size and position haptic cues were available (Fig. 5D).
Movements supported by haptic inputs were faster than in the unisensory visual condition (Fig. 6A), with a credibly higher peak velocity in pVH and in pVHP compared with pV (Fig. 6B). Interestingly, as we have observed for central vision (Camponogara and Volcic, 2021b), no differences in peak velocity were found between the pVH and pVHP conditions, confirming that the integration of vision and haptics is mainly concerned with the position of the object. As in experiment 1, peak velocity was insensitive to changes in object size. The size effect spans the [0, –0.13] range, which corresponds to a variation in peak velocity of at most 2.6 mm/s from the smallest to the largest object (∼0.2% of the average peak velocity).
The analysis of the peak grip aperture reaffirmed the advantage of multisensory over unisensory conditions (Fig. 6C). Peak grip aperture was credibly larger in the pV condition than in the pVH condition (Fig. 6D). The peak grip aperture was also credibly larger in pV compared with pVHP, and similar between the pVH and pVHP conditions. Most importantly, providing only haptic positional information was not sufficient to accurately scale the grip aperture according to the object size (Fig. 6E). We found that the peak grip aperture increased credibly less as a function of object size in pV and pVHP compared with pVH, and it was similar between the pV and pVHP conditions (Fig. 6F). Thus, in degraded visual conditions, haptic positional information speeds up movements and decreases grip aperture, but haptic size is essential to modulate the grip aperture according to the object size (Fig. 7A). Notably, the grip apertures and the movement velocities in the pVH and pVHP conditions were almost indistinguishable from the beginning to the end of the movement and clearly separated from the pV condition, emphasizing the specific role of the haptic position cue in improving action performance (Fig. 7B). However, as can be seen in Figure 7C, the haptic size cue was crucial to refine the hand shaping around the object by improving the grip aperture scaling along the whole movement trajectory.
Discussion
There are two key findings of the present research. First, we found that the integration of visual and haptic object features for multisensory guided grasping occurs not only when vision is superior to haptics, but also when vision is disrupted to the extent that it becomes the less reliable modality. Second, we found that the integration of vision and haptics for multisensory guided grasping comprises both position and size cues, with the greater benefits gained by the contribution of the haptic position cue.
Visually guided grasping in central vision clearly outperformed haptically guided grasping, but it was severely degraded when vision was only peripheral. Irrespective of the quality of visual information, we observed pronounced improvements when both vision and haptics were simultaneously available. Multisensory guided movements were faster than movements in the fastest of the unisensory conditions, and grip apertures tended to be smaller than in the smallest of the unisensory conditions. These findings show that the process of multisensory integration for grasping actions obeys the same rules observed in studies on visuo-haptic reaching (Camponogara and Volcic, 2021a) and visuo-haptic perception (Derrick and Dewar, 1970; Heller, 1983; Ernst and Banks, 2002; Gepshtein and Banks, 2003; Helbig and Ernst, 2007; Wijntjes et al., 2009; Van Doorn et al., 2010). Thus, there is evidence in both perception and action that multisensory integration is not a rigid process in which vision simply dominates over haptics (Rock and Victor, 1964; Hay et al., 1965; Rock and Harris, 1967; Power and Graham, 1976), but is instead a flexible process balancing the contributions of vision and haptics depending on the quality of each source of information.
With regard to the role of the separate haptic cues, we found that enriching peripheral visual information with only the haptic position cue was sufficient to increase movement velocity and reduce grip aperture as much as when also the haptic size cue was available. It is known that the localization of objects can be strongly impaired when they are placed in visually eccentric (peripheral) positions (Bock, 1993; Henriques et al., 1998; Henriques and Crawford, 2000; Bartolo et al., 2018). This increased positional uncertainty could be the primary cause of the worsened grasping performance usually observed when only peripheral vision is available (Sivak and MacKenzie, 1990, 1992; Goodale and Murphy, 1997; Watt et al., 2000; Brown et al., 2005; Schlicht and Schrater, 2007; Hesse et al., 2012). Our results clearly support the view that visual and haptic position cues are integrated to reduce the overall positional uncertainty, which positively influences the quality of grasping movements even when visual information is severely degraded (Chen et al., 2018; Camponogara and Volcic, 2021b). This does not exclude though that the uncertainty about object size also affects grasping movements.
The role of the haptic size cue was indeed revealed by how the grip aperture scaled according to object size. When both the haptic position and haptic size cues were provided together with peripheral vision, the scaling of the grip aperture improved with respect to the peripheral vision only condition and was comparable to the scaling observed in the haptics only condition. This could have been an indication that the refined scaling resulted either from a reduced uncertainty about the object size driven by the availability of the haptic size cue, or from a reduced uncertainty about the object position driven by the availability of the haptic position cue. Our results exclude the latter explanation. Providing only the haptic position cue with peripheral vision was not sufficient to induce the level of scaling observed when the haptic size cue was also available. Thus, the haptic size cue played a necessary role, because its removal weakened the scaling of the grip aperture to the level of the peripheral vision only condition. The contributing role of the haptic size cue in reducing the overall size uncertainty is further reinforced by observing the evolution of grip aperture scaling along the whole movement trajectory. Scaling along the trajectory in multisensory peripheral vision conditions was identical to the haptics only condition when the haptic size cue was present and identical to the peripheral vision only condition when the haptic size cue was absent.
An additional aspect worth commenting on concerns the relationship between the peak grip aperture and its scaling. The fact that peak grip aperture scales reliably with changes in object size (with a slope of ∼0.7) is an established property of normal grasping movements (Marteniuk et al., 1990; Jakobson and Goodale, 1991). It has also been shown that in degraded visual conditions (e.g., by removing visual feedback or by switching from binocular to monocular vision) the peak grip aperture increases and the grip aperture scaling weakens (Churchill et al., 2000; Watt and Bradshaw, 2000; Melmoth and Grant, 2006; Keefe and Watt, 2009; Hesse et al., 2016; Keefe et al., 2019). All our results conform with this behavior except for the multisensory condition in which only the haptic position cue was provided together with peripheral vision. Here, the grip aperture scaling decreased heavily without a parallel increase of the peak grip aperture. This means that, if needed, the grip aperture and its scaling can be controlled independently according to the demands of a specific situation, leading to grasping movements of generally higher quality in which collisions with objects are strategically avoided.
The interpretation of the present results is based on the idea that an estimate of object size is necessary for the formation of reach-to-grasp movements. An alternative view, the digit-in-space framework, posits that grasping kinematics follow from the movements of the individual digits toward specific positions in space, which correspond to the grasping points of the digits on the object (Smeets and Brenner, 1999; Verheij et al., 2012; Smeets et al., 2019). Variations in grasping movement execution should thus be expected if haptics, vision, and peripheral vision provide estimates of grasping points that differ in accuracy and/or precision. And, when more than one sense is simultaneously available, movement execution should be expected to improve compared with movements guided by each modality alone. Additionally, when the haptically sensed grasping points are closer to each other than those sensed by vision, the jointly estimated grasping points should be drawn toward the center of the object, making the differences in object size appear less distinct than they actually are, which should directly affect the emerging peak grip aperture and its scaling. Thus, the results presented here are also compatible with the digit-in-space framework. However, Camponogara and Volcic (2021b) previously reported an instance in which the results do not seem to be fully captured by this line of reasoning: the observed benefits on grasping movements in central vision were equal regardless of the congruence between the positions of the haptic and visual grasping points. A further element to be considered is that the improvements observed in multisensory grasping could also be a consequence of more effective sensorimotor transformations (Tagliabue and McIntyre, 2014; Kuling et al., 2016, 2017). Future studies will need to single out edge conditions in multisensory grasping for which these views predict different outcomes.
The associations between the visual and the haptic modality are not innate, but rather characterized by a high degree of plasticity. Vision and haptics achieve calibration during development through constant cross-sensory comparisons (Gori et al., 2008). Moreover, studies on cataract-treated participants showed that the restoration of visual object recognition (Held et al., 2011; Chen et al., 2016) and the acquisition of multisensory integration (Senna et al., 2021) are possible within a brief period after surgery by exploiting the cross-modal interactions between vision and touch. The haptic and visual recalibration is also visible in adults following visuomotor adaptation tasks (Volcic et al., 2013; Wiesing et al., 2021), which might be related to the strong couplings that exist between the senses and movement control (Steinbach and Held, 1968; Bock, 1987; Maiello et al., 2018). Our results complement these findings and raise the intriguing possibility that the haptic modality available during sensorimotor interactions with the environment could be effective in learning or restoring visuomotor functions during development and throughout the lifespan.
In sum, our results clearly support the view that visuo-haptic integration for grasping occurs both at the level of the position cues and at the level of the size cues, confirming the hypothesis that, in optimal visual conditions, the effect of the haptic size cue is usually masked by the dominance of the more reliable visual size cue. When vision was disrupted, both haptic position and haptic size cues played a relevant role in shaping the grasping movements. It is, however, important to note that most of the advantages in multisensory grasping stem from the contribution of the haptic position cue. As previously suggested (Camponogara and Volcic, 2021b), a sensorimotor system can achieve greater robustness if it relies on the integration of visuo-haptic object features that systematically co-occur (e.g., position) rather than on features that can frequently differ between the two sensory modalities because of variations in object shape (e.g., size).
Footnotes
The authors declare no competing financial interests.
This work was supported by the New York University Abu Dhabi (NYUAD) Research Enhancement Fund Grant RE183 and the NYUAD Center for Artificial Intelligence and Robotics, funded by Tamkeen under the NYUAD Research Institute Award CG010.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.