Cognition

Volume 126, Issue 2, February 2013, Pages 135-148
Transfer of object category knowledge across visual and haptic modalities: Experimental and computational studies

https://doi.org/10.1016/j.cognition.2012.08.005

Abstract

We study people’s abilities to transfer object category knowledge across visual and haptic domains. If a person learns to categorize objects based on inputs from one sensory modality, can the person categorize these same objects when the objects are perceived through another modality? Can the person categorize novel objects from the same categories when these objects are, again, perceived through another modality? Our work makes three contributions. First, by fabricating Fribbles (3-D, multi-part objects with a categorical structure), we developed visual-haptic stimuli that are highly complex and realistic, and thus more ecologically valid than objects that are typically used in haptic or visual-haptic experiments. Based on these stimuli, we developed the See and Grasp data set, a data set containing both visual and haptic features of the Fribbles, and are making this data set freely available on the world wide web. Second, complementary to previous research such as studies asking if people transfer knowledge of object identity across visual and haptic domains, we conducted an experiment evaluating whether people transfer object category knowledge across these domains. Our data clearly indicate that we do. Third, we developed a computational model that learns multisensory representations of prototypical 3-D shape. Similar to previous work, the model uses shape primitives to represent parts, and spatial relations among primitives to represent multi-part objects. However, it is distinct in its use of a Bayesian inference algorithm allowing it to acquire multisensory representations, and sensory-specific forward models allowing it to predict visual or haptic features from multisensory representations. The model provides an excellent qualitative account of our experimental data, thereby illustrating the potential importance of multisensory representations and sensory-specific forward models to multisensory perception.

Highlights

► We address people’s abilities to transfer category knowledge across sensory domains.
► We introduce the See and Grasp data set, the first visual-haptic data set.
► Experiment shows that object category knowledge transfers across sensory domains.
► Bayesian inference algorithm is proposed for learning componential 3-D shapes.
► Forward models predict sensory-specific features from multisensory representations.

Introduction

When recording neural activity in the human medial temporal lobe, Quiroga, Kraskov, Koch, and Fried (2009) found individual neurons that explicitly encode multisensory percepts. For example, one neuron responded selectively when a person viewed images of the television host Oprah Winfrey, viewed her written name, or heard her spoken name. (To a lesser degree, the neuron also responded to the actress Whoopi Goldberg.) Another neuron responded selectively when a person saw images of the former Iraqi leader Saddam Hussein, saw his name, or heard his name. Clearly, our brains encode abstract representations of objects that are multisensory in the sense that they are activated by perceptual inputs spanning multiple sensory formats or modalities.

Why would our brains acquire abstract representations that are activated by inputs from a variety of sensory modalities? One possible answer to this question is that these representations facilitate the transfer of knowledge across modalities. Consider, for instance, a person who learns to categorize a set of objects based solely on tactile or haptic inputs. Would the person be able to categorize these same objects when the objects are viewed but not grasped? Would the person also be able to categorize novel objects from the same categories when these are viewed?

Here, we report experimental and computational studies of the acquisition of multisensory representations of object category, and the role these representations play in the transfer of knowledge across visual and haptic modalities. Our work includes three contributions. First, our experiment used an unusual set of visual-haptic stimuli known as “Fribbles”. Fribbles are complex, 3-D objects with multiple parts and spatial relations among the parts (see Fig. 1). Moreover, they have a categorical structure—that is, each Fribble is an exemplar from a category formed by perturbing a category prototype. Fribbles have previously been used in the study of visual object recognition (Hayward and Williams, 2000, Tarr, 2003, Williams, 1997). An innovation of our work is that we have fabricated a large set of Fribbles using a 3-D printing process and, thus, our Fribbles are physical objects which can be both seen and grasped. Based on this set of stimuli, we have created a data set, referred to as the See and Grasp data set, containing both visual and haptic features of the Fribbles. We are making this data set freely available on the world wide web with the hope that it will encourage quantitative research on computational models of visual-haptic perception.

Second, we conducted an experiment evaluating whether people can transfer knowledge of object category across visual and haptic modalities. Previous researchers have considered the transfer of knowledge of object identity across visual and haptic modalities (e.g., Lacey et al., 2007; Lawson, 2009; Norman et al., 2004). They have also compared similarity and categorization judgements based solely on visual input with those based solely on haptic input (Gaißert & Wallraven, 2012; Gaißert et al., 2011; Gaißert et al., 2008; Gaißert et al., 2010). To our knowledge, our experiment is the first to focus on the transfer of object category knowledge across visual and haptic modalities.

Lastly, we developed a computational model, referred to as the MVH (Multisensory-Visual-Haptic) model, accounting for how multisensory representations of prototypical 3-D shape might be acquired, and for the role these representations might play in the transfer of category knowledge across visual and haptic modalities. Like some previous models in the literature (Biederman, 1987; Marr & Nishihara, 1978), the model makes use of part-based representations of prototypes. However, it goes beyond previous work by introducing a learning mechanism for the acquisition of these representations. Using its acquired multisensory representations along with sensory-specific forward models for predicting visual or haptic features from multisensory representations, the model transfers object category knowledge between visual and haptic modalities, thereby providing a qualitative account of our experimental data.
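To make this architecture concrete, the sketch below shows one way a multisensory, part-based prototype could feed two sensory-specific forward models. It is an illustration only, not the MVH model's actual parameterization: the primitive labels, the single size parameter per part, and the particular visual and haptic feature choices are all simplifying assumptions.

# Illustrative sketch only: the primitives, features, and mappings below are assumptions,
# not the MVH model's actual parameterization.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Part:
    primitive: str   # e.g. "cylinder" or "ellipsoid" (hypothetical primitive labels)
    size: float      # one scale parameter standing in for a richer shape description

@dataclass
class Prototype:
    """Modality-independent, part-based representation of a category's prototypical shape."""
    parts: List[Part]
    centers: np.ndarray  # (n_parts, 3) part centers encoding spatial relations among parts

def visual_forward_model(proto: Prototype) -> np.ndarray:
    """Sensory-specific forward model: map the multisensory prototype to predicted visual
    features; an orthographic projection of part centers stands in for rendering an image."""
    return proto.centers[:, :2].ravel()

def haptic_forward_model(proto: Prototype) -> np.ndarray:
    """Map the same prototype to predicted haptic features: part sizes plus pairwise
    distances between part centers, standing in for grasp-derived measurements."""
    sizes = np.array([p.size for p in proto.parts])
    dists = np.linalg.norm(proto.centers[:, None] - proto.centers[None, :], axis=-1)
    return np.concatenate([sizes, dists[np.triu_indices(len(proto.parts), k=1)]])

The point of the sketch is that a single prototype feeds both forward models, so category knowledge learned from one modality's features can later be queried through the other.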

Section snippets

Previous research on visual-haptic object perception

Previous research has shown that knowledge of object identity transfers (at least in part) across visual and haptic domains (e.g., Lacey et al., 2007, Lawson, 2009, Norman et al., 2004). For example, Lacey, Peters, et al. (2007) trained subjects to identify objects either visually or haptically. Following training, subjects were tested on the same task using the untrained sensory modality. Subjects showed excellent transfer to the novel modality when objects were presented at the same

Fribbles and the See and Grasp data set

A key component of our research is the unusual visual-haptic stimuli that we used in both our experimental and computational studies. These stimuli are a subset of a larger set of stimuli known as “Fribbles”. Fribbles have previously been used in the vision sciences to study visual object recognition.

Experiment

Questions about categorization and generalization are fundamental to cognitive science, yet many open questions about them remain, particularly in the context of multisensory perception. Important questions include: To what extent does knowledge of object categories gained through one modality transfer to another modality? Is the amount of transfer the same for familiar and novel objects? For example, if a person learns to visually categorize a set of objects, can the person categorize these same objects, as well as novel objects from the same categories, when the objects are grasped but not viewed?

Preliminary remarks regarding the MVH model

Our data show that participants transferred object category knowledge between visual and haptic modalities. How did they do this? To address this question, we propose a novel computational model, referred to as the MVH (Multisensory-Visual-Haptic) model, with several important properties. This model uses multisensory representations of prototypical 3-D shape. Like some previous models in the literature (Biederman, 1987; Marr & Nishihara, 1978), the model makes use of part-based representations of prototypes.

MVH (Multisensory-Visual-Haptic) model

This section provides the mathematical details of the MVH model. We describe the model from the perspective of a participant from Group V–H in our experiment. During training, the model is provided with images of Fribbles along with the Fribbles’ corresponding category labels. The model learns a multisensory representation of each category’s prototypical 3-D shape on the basis of this information. The model is provided with Fribbles’ haptic features during testing, and it estimates the Fribbles’ category labels from these haptic features.
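The test-time computation described here can be illustrated with a simple Bayesian classifier: score each category by the likelihood of the observed haptic features under the haptic forward model applied to that category's learned prototype, then normalize. The Gaussian noise model, uniform category prior, and toy feature vectors below are assumptions made for illustration, not the paper's actual likelihood or data.

import numpy as np

def log_gaussian(x, mu, sigma):
    """log N(x; mu, sigma^2 I): log-likelihood of observed features around a prediction."""
    return -0.5 * np.sum((x - mu) ** 2) / sigma ** 2 - x.size * np.log(sigma * np.sqrt(2 * np.pi))

def posterior_over_categories(haptic_obs, predicted_haptics, sigma=0.1):
    """p(category | haptic features) is proportional to p(haptic | prototype_c) p(c), uniform p(c).
    predicted_haptics maps category name -> haptic features predicted by the haptic forward
    model applied to that category's learned prototype."""
    names = list(predicted_haptics)
    log_post = np.array([log_gaussian(haptic_obs, predicted_haptics[c], sigma) for c in names])
    log_post -= log_post.max()              # subtract the max for numerical stability
    post = np.exp(log_post)
    return dict(zip(names, post / post.sum()))

# Toy usage with made-up feature vectors for four categories (not See and Grasp data):
rng = np.random.default_rng(0)
preds = {c: rng.normal(size=8) for c in ["A", "B", "C", "D"]}
obs = preds["B"] + rng.normal(scale=0.05, size=8)   # a noisy "grasp" of a category-B object
print(posterior_over_categories(obs, preds))        # posterior mass should concentrate on "B"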

Simulation results

In the simulations reported here, we used a slightly modified version of the See and Grasp data set for the four categories used in the experiment. We used three images of each Fribble rendered from three orthogonal viewpoints—a top view, a front view, and a right view. In addition, we simplified the images by using low-resolution images (80 pixels × 80 pixels) and by converting pixel values to binary numbers using a thresholding scheme. Therefore, the visual representation of a Fribble was a set of three binary images, one per viewpoint, each of size 80 × 80 pixels.
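A minimal sketch of this preprocessing is given below, assuming block-averaging for downsampling and a fixed 0.5 threshold for binarization; the paper's exact rendering and thresholding scheme is not specified here, so both choices are stand-ins.

import numpy as np

def preprocess_view(image, out_size=80, threshold=0.5):
    """Downsample one grayscale view to out_size x out_size by block averaging, then
    binarize with a fixed threshold (an illustrative stand-in for the paper's scheme)."""
    h, w = image.shape
    assert h % out_size == 0 and w % out_size == 0, "expects an integer downsampling factor"
    low_res = image.reshape(out_size, h // out_size, out_size, w // out_size).mean(axis=(1, 3))
    return (low_res > threshold).astype(np.uint8)

# Toy usage: three synthetic 320 x 320 views standing in for the top, front, and right renders.
rng = np.random.default_rng(0)
views = [rng.random((320, 320)) for _ in range(3)]
binary_views = [preprocess_view(v) for v in views]
visual_features = np.concatenate([v.ravel() for v in binary_views])  # one long binary vector
print(visual_features.shape)  # (19200,) = 3 views x 80 x 80 pixels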

Discussion

In summary, this article has addressed people’s abilities to transfer object category knowledge across visual and haptic domains. Our work has made three contributions. First, by fabricating Fribbles (3-D, multi-part objects with a categorical structure), we developed (and are making freely available on the web) visual-haptic stimuli that are highly complex and realistic. Second, we conducted an experiment evaluating whether people transfer object category knowledge across visual and haptic modalities; our data clearly indicate that they do.

Acknowledgements

We thank M. Tarr for making the 3-D object files for Fribbles available on his web pages. This work was supported by research grants from the National Science Foundation (DRL-0817250) and the Air Force Office of Scientific Research (FA9550-12-1-0303).

References (45)

  • L.W. Barsalou

    Grounded cognition

    Annual Review of Psychology

    (2008)
  • I. Biederman

    Recognition-by-components: A theory of human image understanding

    Psychological Review

    (1987)
  • C.M. Bishop

    Pattern recognition and machine learning

    (2006)
  • D.H. Brainard

    The psychophysics toolbox

    Spatial Vision

    (1997)
  • R.D. Easton et al.

    Do vision and haptics share common representations? Implicit and explicit memory within and between modalities

    Journal of Experimental Psychology: Learning, Memory and Cognition

    (1997)
  • J. Feldman et al.

    Bayesian estimation of the shape skeleton

    Proceedings of the National Academy of Sciences

    (2006)
  • I. Fine et al.

    Long-term deprivation affects visual perception and cortex

    Nature

    (2003)
  • N. Gaißert et al.

    Categorizing natural objects: A comparison of the visual and the haptic modalities

    Experimental Brain Research

    (2012)
  • N. Gaißert et al.

    Analyzing perceptual representations of complex, parametrically-defined shapes using MDS

  • N. Gaißert et al.

    Visual and haptic perceptual spaces show high similarity in humans

    Journal of Vision

    (2010)
  • F.E. Grubbs

    Sample criteria for testing outlying observations

    Annals of Mathematical Statistics

    (1950)
  • S. Haag

    Effects of vision and haptics on categorizing common objects

    Cognitive Processing

    (2011)