Abstract
Information about the material from which objects are made provides rich and useful clues that enable us to categorize and identify those objects, know their state (e.g., ripeness of fruits), and properly act on them. However, despite its importance, little is known about the neural processes that underlie material perception in nonhuman primates. Here we conducted an fMRI experiment in awake macaque monkeys to explore how information about various real-world materials is represented in the visual areas of monkeys, how these neural representations correlate with perceptual material properties, and how they correspond to those in human visual areas that have been studied previously. Using a machine-learning technique, the representation in each visual area was read out from multivoxel patterns of regional activity elicited in response to images of nine real-world material categories (metal, wood, fur, etc.). The congruence of the neural representations with either a measure of low-level image properties, such as spatial frequency content, or the visuotactile properties of materials, such as roughness, hardness, and warmness, was tested. We show that monkey V1 shares a common representation with human early visual areas reflecting low-level image properties. By contrast, monkey V4 and the posterior inferior temporal cortex represent the visuotactile properties of materials, as in human ventral higher visual areas, although there were some interspecies differences in the representational structures. We suggest that, in monkeys, V4 and the posterior inferior temporal cortex are important stages for constructing information about the material properties of objects from their low-level image features.
Introduction
In our daily life, we visually recognize what objects are made of based on their surface attributes, which can include color, gloss, and texture. Information about material composition helps us to categorize and identify objects, know their state (e.g., freshness of fruits; Arce-Lopera et al., 2012), and decide how to interact with them (Buckingham et al., 2009). In the past few years, the neural mechanisms underlying material perception have attracted attention in the field of visual psychophysics (Motoyoshi et al., 2007; for review, see Anderson, 2011), and more recently in the field of human neuroimaging. There is now growing evidence that the medial portion of the human ventral higher visual cortex is responsible for processing surface texture, an important attribute indicative of material (Cant and Goodale, 2007; Cant et al., 2009; Cavina-Pratesi et al., 2010a,b; Cant and Xu, 2012); indeed, this region represents such material properties as roughness and hardness in a perceptually relevant way (Hiramatsu et al., 2011) and is involved in making judgments about the hardness of materials (Cant and Goodale, 2011).
Our aim in the present study was to clarify how visual information about real-world materials is processed in the visual cortex of nonhuman primates. It has been demonstrated that neurons in V4 and the inferior temporal (IT) cortex of monkeys can discriminate natural textures (Arcizet et al., 2008; Köteles et al., 2008), and that they are sensitive to surface gloss, an important attribute for material perception (Nishio et al., 2012; Okazawa et al., 2012). These findings, together with the well-documented color sensitivity in these areas, raise the possibility that material perception involves V4 and the IT in monkeys. That said, the discriminability of material textures might be ascribable not to differences in the material properties, but to low-level image features, because material textures differ with respect to their image features, such as spatial frequency (Arcizet et al., 2008; Köteles et al., 2008). To date, no study has examined whether material properties, per se, are represented in these areas.
To address that issue, we took an approach that involved assessing the content of the information represented in multivoxel patterns of fMRI activity (Kriegeskorte et al., 2008), and examined where in the visual cortex of the monkey the material representation emerges. Specifically, we extended our earlier human fMRI analysis (Hiramatsu et al., 2011) to monkeys. This entailed reading out the neural similarity between materials from the activity patterns elicited by images of real-world materials, and asking whether the neural similarity is related to the similarity of visuotactile material properties (e.g., roughness and hardness) or to the similarity of low-level image properties. Our results provide the first evidence that monkey V4 and the posterior IT (PIT) represent real-world materials in a way that reflects their visuotactile properties. This is in contrast to the early visual areas, which closely reflect the low-level image properties. We also present representational similarities and differences between monkeys and humans, which provide new insights for linking neural representations across species, as well as between neural and perceptual representations.
Materials and Methods
Subjects
Two male macaque monkeys were used in this study (M1 and M2; Macaca fuscata, 6–7 kg). During training and scanning, each monkey was seated in the “sphinx” position in a horizontally oriented, custom-made monkey chair, as originally described by Vanduffel et al. (2001). The monkey's head was fixed to the chair using an implanted, MR-compatible headpost. Each monkey was extensively trained to perform a fixation task in a mock scanner environment. Detailed descriptions of the surgery and training have been provided previously (Harada et al., 2009; Okazawa et al., 2012). All experimental procedures were in accordance with NIH guidelines and were approved by the Animal Experiment Committee of Okazaki National Research Institutes.
Visual stimuli
We used virtual 3D images of nine material categories (metal, ceramic, glass, stone, bark, wood, leather, fabric, and fur) rendered using NewTek LightWave 3D. Each category consisted of eight exemplars (Fig. 1), which had typical, but varied, surface attributes (texture, color, glossiness, and transparency/translucency) of their material category. The images were identical to those used in our earlier human study (Hiramatsu et al., 2011), except that they were resized and converted to 8-bit color images. Human subjects could accurately classify the images into the nine categories (mean accuracy across the 9 categories = 0.84; chance level = 0.11; Hiramatsu et al., 2011). The material image (7.5° × 7.5°), in which an elongated virtual object subtending ∼4.5° in width and 7.5° in height was placed in the middle, was presented at the center of a uniform gray background (26° × 20°). The stimulus was displayed using a calibrated projection display system (Harada et al., 2009; Okazawa et al., 2012).
Experimental design
The stimuli were presented to the monkeys using a block design while they performed a fixation task. One scanning run consisted of nine category blocks interleaved with fixation-only blocks. Each block consisted of four fixation trials (each lasting ∼2500 ms) interleaved with short intervals (>700 ms). Each fixation trial began with the onset of a small central spot (∼0.2° × 0.2°) on which the monkey had to fixate, and ended with the offset of the spot. A liquid reward was given at the end of the trial. Two exemplar images from the same material category were presented during the fixation period in each trial (each exemplar image for 500 ms, interleaved with a 1000 ms interval), so that all eight exemplar images were presented during the four successive trials in one category block. Both the order of the exemplars within each category block and the order of the category blocks within each run were randomized. During scanning, each trial continued even when a saccade occurred during the fixation period, and a reward was given at the end of every trial to maintain the monkeys' motivation. We analyzed the fixation performance offline and discarded the data from runs in which the monkey performed poorly (see below). Because the monkeys were overtrained for fixation, performance during scanning was generally good. The monkey's eye position was continuously recorded using an eye-tracking system based on an infrared CCD camera (60 Hz; Sony), and the task was controlled using custom-made software (Harada et al., 2009; Okazawa et al., 2012).
Data acquisition
Images were acquired with a Siemens 3T Allegra scanner using a surface coil (Takashima Seisakusyo). Functional images were collected using a gradient-echo EPI pulse sequence sensitive to BOLD contrast (TE/TR = 30/2000 ms, flip angle = 80°, 1.25 mm in-plane resolution, slice thickness 1.6 mm, slice gap 0.32 mm). The images covered almost the entire occipital, temporal, and parietal lobes, and part of the frontal lobe. T2-weighted anatomical images (inversion recovery turbo spin-echo, 0.75 mm in-plane resolution) were also acquired at the same locations as those used for the functional images.
A high-resolution anatomical image (MPRAGE; 0.5 mm isovoxel) was collected from each monkey under anesthesia in a separate scanning session (Harada et al., 2009; Okazawa et al., 2012), and the cortical surface was reconstructed from this image using CARET (http://www.nitrc.org/projects/caret/). The anatomical images and cortical surfaces from the two monkeys were registered to a common template space, which was created from the anatomical images of the two monkeys using the Dartel toolbox (Ashburner, 2007) with the 112-RM macaque atlas (McLaren et al., 2009).
Data analysis
Each monkey performed >100 runs over 7–8 scanning sessions. The functional images in a given run were used for analyses only if the monkey fixated well (the eye position remained inside the 1.5° × 1.5° fixation window for at least 95% of the total fixation period) and did not move too much (fewer than 5% of the image volumes in the run contained >0.6 mm of translation). The number of analyzed runs was 94 and 85 for M1 and M2, respectively. The functional images were then split into two independent datasets, one for the main analysis (72 runs for each monkey) and another for the estimation of visual responsivity to the material images (22 and 13 runs for M1 and M2, respectively). The runs for the second dataset were evenly selected from all available runs concatenated across scanning sessions (every fourth run for M1 and every eighth run for M2), and the remaining runs were used as the first dataset for the main analysis.
Data preprocessing.
The functional images from the two monkeys were preprocessed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm). After eliminating the first and last several volumes (in the fixation-only blocks) in each run to allow for stabilization of the magnetization, the images were motion-corrected and registered with the anatomical images. They were then spatially normalized to the common space using the Dartel toolbox and resampled in 1.0 mm isotropic voxels. The images were then spatially smoothed using a 2 mm full-width at half-maximum (FWHM) Gaussian kernel, globally scaled, and temporally high-pass filtered (cutoff 1/128 Hz).
Estimation of voxel responses to the materials.
To estimate the magnitudes of voxelwise responses to each material category, we used SPM8 to conduct a GLM analysis of the main dataset for each monkey. The model consisted of nine stimulus regressors, one for each of the nine categories, plus six head-motion regressors of no interest (translation and rotation in 3 dimensions) per run. Each stimulus regressor was modeled by convolving the time series of the stimulus presentation with the macaque BOLD HRF measured by Leite et al. (2002). The spatial pattern of the estimated response magnitude (β values) for each of the nine categories was used for the following multivoxel pattern analysis (a total of nine patterns per run).
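To make the regressor construction concrete, the following Python sketch illustrates the logic. The actual analysis was run in SPM8; the gamma-shaped hrf below is a generic placeholder for the measured macaque HRF of Leite et al. (2002), and all function and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import gamma

def hrf(t, shape=3.0):
    # Generic gamma-shaped impulse response; a placeholder for the
    # macaque BOLD HRF measured by Leite et al. (2002).
    return gamma.pdf(t, shape)

def stimulus_regressor(onsets_s, duration_s, n_scans, tr=2.0, dt=0.1):
    """Boxcar for one material category convolved with the HRF and
    sampled once per volume (TR = 2000 ms, as in the scanning protocol)."""
    t = np.arange(0, n_scans * tr, dt)
    boxcar = np.zeros_like(t)
    for onset in onsets_s:
        boxcar[(t >= onset) & (t < onset + duration_s)] = 1.0
    conv = np.convolve(boxcar, hrf(np.arange(0, 32, dt)))[: len(t)]
    return conv[:: int(round(tr / dt))]  # downsample to one value per volume

# Nine such columns (one per category) plus six motion regressors per run
# form the design matrix X; beta values follow by least squares,
# e.g., betas = np.linalg.pinv(X) @ Y for voxel time series Y.
```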
For the second dataset, we estimated the voxelwise response to each category using the same GLM as for the main dataset, and estimated the average response to all categories by contrasting all categories versus the fixation-only baseline (voxelwise t test). We regarded the obtained t value as the visual responsivity.
Functional localizer and ROI definition.
We defined nine regions on each hemisphere: V1, V2, V3, V3A, V4, and the PIT, central IT (CIT), middle temporal complex (MT+), and fundus of the superior temporal area (FST). These regions were defined based on the results of retinotopic (meridian and center-periphery) mapping and a motion localizer, which were conducted separately from the main material experiment. Detailed descriptions of these localizer experiments have been provided previously (Harada et al., 2009; Okazawa et al., 2012). Briefly, the meridian mapping run consisted of blocks of horizontal and vertical wedges, and the center-periphery mapping run consisted of blocks of a central circular checkerboard patch (eccentricity <3°) and a peripheral annulus (eccentricity 3–5°). The motion localizer run consisted of blocks of moving (expanding and contracting) random dots and stationary dots. In the retinotopic mapping and localizer experiments, the monkeys performed the fixation task as in the material experiment and completed at least 14 runs in each experiment. The data were preprocessed and analyzed using SPM8 with the GLM as described above.
The borders of V1, V2, V3, V3A, and V4 were determined based on the meridian representation derived by contrasting horizontal versus vertical wedges (Fize et al., 2003). The MT+ was determined as a motion-responsive cluster in the posterior superior temporal sulcus (STS), defined by contrasting moving versus stationary dots (Vanduffel et al., 2001; Nelissen et al., 2006). The PIT and CIT were defined with reference to the areal partitioning scheme of Felleman and Van Essen (1991) in the CARET F99 atlas, which was registered with the individual hemispheres. The FST was also defined based on the atlas of Felleman and Van Essen (1991), because its boundary was not evident in our motion localizer and retinotopy data (Kolster et al., 2009). We used five ROIs for the main analyses: V1, V2, V3, V4, and the PIT. We did not analyze V3A or the CIT because these regions contained only small numbers of visually responsive voxels in some hemispheres (Fig. 2A). We also used six additional ROIs for detailed analyses: the central visual field representations of V1, V4, and the PIT; the MT+/FST (MT+ plus FST, combined because of their relatively small size); and the PITd and PITv (dorsal and ventral parts of the PIT, respectively). The central visual field representations of V1, V4, and the PIT were defined based on the borders of 3° eccentricity derived by contrasting central versus peripheral stimuli. The PITd and PITv were separated anatomically at the lip of the STS according to the atlas of Felleman and Van Essen (1991). Furthermore, for detailed analyses, we defined functional clusters within the PIT selective for face, place, object category, and object shape, based on a GLM analysis of the data obtained in a separate face/place/object localizer experiment (Tsao et al., 2003; Denys et al., 2004; Pinsk et al., 2005; Bell et al., 2009; Ku et al., 2011; Nasr et al., 2011; Rajimehr et al., 2011; Lafer-Sousa and Conway, 2013). The face/place/object localizer run consisted of blocks of achromatic images of monkey faces, places (scenes), objects (fruits and man-made tools), and grid-scrambled objects (Okazawa et al., 2012). The face-, place (scene)-, object-category-, and object-shape-selective clusters were derived for each hemisphere by contrasting face versus object and place, place versus face and object, object versus face and place, and object versus grid-scrambled object, respectively (voxelwise t test; p < 0.01, uncorrected for multiple comparisons). We also defined color-selective clusters using data from our previous fMRI experiment, in which responses to chromatic and achromatic Mondrian images were measured in monkeys M1 and M2 (Harada et al., 2009). The color-selective clusters were derived by contrasting chromatic versus achromatic images for each hemisphere (voxelwise t test; p < 0.01, uncorrected for multiple comparisons). Because the analysis in that previous study was performed in the native subject space, the t values were spatially transformed to the common template space for the present study using the Dartel toolbox. These functional clusters, except for the place (scene)-selective clusters, which were not evident in the PIT in some hemispheres, were used for the detailed analyses.
In each of the ROIs, the same number of voxels was selected for each hemisphere based on the visual responsivity determined using the second dataset, as described above. We selected the 500 most visually responsive voxels (i.e., the 500 voxels with the highest t values) in each ROI (nearly the maximal number of voxels with t > 0 in V3 in some hemispheres) for the main analysis, and the 250 most visually responsive voxels for the detailed analyses.
Pattern classification analysis.
Multivoxel pattern analysis was performed using the Princeton MVPA toolbox (http://www.pni.princeton.edu/mvpa/) in combination with LIBLINEAR (http://www.csie.ntu.edu.tw/∼cjlin/liblinear/), which implements a linear support vector machine (SVM). We examined how accurately the nine material categories could be classified using the linear SVM based on the activity patterns in each ROI. The activity patterns from the 72 runs in the main dataset were z-scored for each voxel in each run and split into 12 datasets (6 runs each). The classifier was trained using the activity patterns from 11 datasets (total, 594 patterns/66 runs) and tested on the remaining dataset (54 patterns/6 runs) to determine the accuracy of the nine-category classification (Crammer–Singer multiclass classification method; chance level = 1/9). This cross-validation procedure was repeated 12 times while changing the training and test datasets, and the mean accuracy over the 12-fold cross-validation was computed. This accuracy was obtained separately for each of the four hemispheres, after which the mean accuracy across the four hemispheres and the t value (mean accuracy across hemispheres minus chance, divided by the SE) were computed. We used a permutation-based t test (Nichols and Holmes, 2002) to assess whether the mean accuracy across hemispheres was significantly above the chance level; the significance was determined by comparing the actual t value with the t values under a null hypothesis, generated by computing the classification accuracy using data with randomly shuffled category labels (2000 times). Results were considered significant at p < 0.05.
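The cross-validation scheme can be summarized in a short Python sketch. The actual analysis used the Princeton MVPA toolbox with LIBLINEAR; scikit-learn's LinearSVC with the Crammer–Singer multiclass method serves here as an equivalent stand-in, and the function and variable names are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

def nineway_accuracy(patterns, labels, run_ids, n_folds=12):
    """Mean 12-fold cross-validated accuracy of the nine-way material
    classification for one ROI in one hemisphere (chance = 1/9).

    patterns : (n_samples, n_voxels) z-scored activity patterns
    labels   : (n_samples,) material category index, 0-8
    run_ids  : (n_samples,) run index; the 72 runs are split into
               12 folds of 6 runs each
    """
    folds = np.array_split(np.unique(run_ids), n_folds)
    accs = []
    for test_runs in folds:
        test = np.isin(run_ids, test_runs)
        clf = LinearSVC(multi_class="crammer_singer")
        clf.fit(patterns[~test], labels[~test])                # 594 patterns
        accs.append(clf.score(patterns[test], labels[test]))   # 54 patterns
    return float(np.mean(accs))
```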
The accuracy of the nine-category classification was also obtained using an activity pattern that was combined for each run and for each category across the four hemispheres (i.e., 2000 voxels per ROI; Brouwer and Heeger, 2009; Hiramatsu et al., 2011; Popivanov et al., 2012). In this case, the accuracy was computed using 12-fold cross-validation as above, and the significance was assessed by comparing the accuracy with those under a null hypothesis generated using data with randomly shuffled category labels (2000 times, a random permutation test).
Representational similarity analysis.
We computed the neural dissimilarities between all pairs of categories (neural dissimilarity matrix) based on the activity patterns in each ROI, and compared them with the dissimilarities in the low-level image properties and the visuotactile material properties between categories. We defined the pairwise classification accuracy as the neural dissimilarity between pairs of categories (Weber et al., 2009; Said et al., 2010; Hiramatsu et al., 2011). The pairwise classification accuracy was computed using the linear SVM with the 12-fold cross-validation procedure, as with the nine-category classification. The accuracy was obtained for each hemisphere and then averaged across the four hemispheres to obtain a group-averaged neural dissimilarity matrix, which was used for the main analyses. For a complementary analysis, we also obtained the neural dissimilarity matrix for each monkey by averaging the matrices from its left and right hemispheres.
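A sketch of how the neural dissimilarity matrix could be assembled from pairwise classification accuracies, again substituting scikit-learn for the LIBLINEAR-based pipeline; pairwise_accuracy mirrors the run-wise 12-fold cross-validation above, and all names are hypothetical.

```python
import itertools
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_accuracy(patterns, labels, run_ids, cat_a, cat_b, n_folds=12):
    # 12-fold cross-validated accuracy of a linear SVM discriminating
    # two material categories; folds are split by run, as in the text.
    keep = np.isin(labels, [cat_a, cat_b])
    X, y, runs = patterns[keep], labels[keep], run_ids[keep]
    folds = np.array_split(np.unique(runs), n_folds)
    accs = []
    for test_runs in folds:
        test = np.isin(runs, test_runs)
        clf = LinearSVC().fit(X[~test], y[~test])
        accs.append(clf.score(X[test], y[test]))
    return np.mean(accs)

def neural_rdm(patterns, labels, run_ids, n_categories=9):
    """9 x 9 neural dissimilarity matrix for one hemisphere/ROI:
    higher pairwise accuracy = more dissimilar categories."""
    rdm = np.zeros((n_categories, n_categories))
    for a, b in itertools.combinations(range(n_categories), 2):
        rdm[a, b] = rdm[b, a] = pairwise_accuracy(patterns, labels,
                                                  run_ids, a, b)
    return rdm
```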
We used dissimilarity matrices of the image and material properties that were defined in our earlier human study (Hiramatsu et al., 2011). The dissimilarity in the image properties was based on 20 low-level image statistics of central square regions (3.2° × 3.2°). The image statistics were 8 pixel statistics of CIELAB coordinates (mean and SD of L*, a*, and b*, and skewness and kurtosis of L*) and 12 sub-band statistics (log mean magnitudes of 3 spatial frequency × 4 orientation bands), which were derived using a steerable pyramid transform (Portilla and Simoncelli, 2000). The dissimilarity in the material properties was based on the results of a human psychological experiment in which five human subjects were asked to rate their visual, tactile, or conceptual impressions of each image using 12 bipolar adjective scales: matte–glossy, opaque–transparent, simple–complex, regular–irregular, colorful–colorless, smooth–rough, dry–wet, cold–warm, soft–hard, light–heavy, elastic–inelastic, and natural–artificial. The dissimilarities in the image and material properties between categories were calculated as Euclidean distances between the centroids of each category (mean across the 8 exemplars) in the multivariate spaces of the 20 low-level image features and the 12 visuotactile/conceptual impressions, respectively.
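The property dissimilarity matrices thus reduce to Euclidean distances between category centroids, as in the following sketch (the feature arrays and names are placeholders):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def property_rdm(features, labels):
    """Euclidean distances between category centroids in a feature space.

    features : (72, n_features) one row per exemplar, e.g., 20 image
               statistics or 12 rated visuotactile/conceptual impressions
    labels   : (72,) category index, 0-8 (8 exemplars per category)
    Returns a 9 x 9 dissimilarity matrix.
    """
    centroids = np.stack([features[labels == c].mean(axis=0)
                          for c in np.unique(labels)])
    return squareform(pdist(centroids, metric="euclidean"))
```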
The neural dissimilarity matrix for each ROI was tested to determine whether it was related to the dissimilarity matrix of the image properties or the material properties by computing partial correlation coefficients while excluding the correlation between the dissimilarity matrices of the image and material properties. We opted for the Spearman rank correlation as the correlation measure; the choice of measure did not affect the interpretation of the results in the present study or in our earlier human study. The Spearman simple correlation coefficient between the dissimilarity matrices of the image and material properties was 0.289.
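A minimal sketch of the partial correlation computation, assuming the standard first-order partial correlation formula applied to Spearman rank correlations of the off-diagonal matrix entries (function names are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def partial_spearman(neural_rdm, target_rdm, nuisance_rdm):
    """Partial Spearman correlation between the neural dissimilarity
    matrix and one property matrix (e.g., material properties), while
    excluding the other (e.g., image properties)."""
    # Vectorize the 36 unique off-diagonal entries of each 9 x 9 matrix.
    x = squareform(neural_rdm, checks=False)
    y = squareform(target_rdm, checks=False)
    z = squareform(nuisance_rdm, checks=False)
    r_xy = spearmanr(x, y).correlation
    r_xz = spearmanr(x, z).correlation
    r_yz = spearmanr(y, z).correlation
    # First-order partial correlation formula.
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```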
Interspecies comparisons.
We assessed whether the neural dissimilarity matrix for each of the monkey ROIs was congruent with those computed for the human ROIs measured in our earlier study (Hiramatsu et al., 2011) by computing Spearman simple correlation coefficients. We computed dissimilarity matrices for five human ROIs: V1/V2 (V1 plus V2), V3/V4 (V3 plus hV4; Wandell et al., 2007), FG/CoS (the ventral higher visual area around the fusiform gyrus, FG, and the collateral sulcus, CoS), LOS/pITS (the lateral higher visual area around the lateral occipital sulcus, LOS, and the posterior inferotemporal sulcus, pITS), and V3AB/IPS (the dorsal higher visual area that included V3A, V3B, and the regions around the intraparietal sulcus, IPS). FG/CoS and LOS/pITS overlap the object-selective lateral occipital complex (LOC). Each ROI contained the 500 most visually responsive voxels for each human subject. The neural dissimilarity matrices for these human ROIs were derived based on the SVM classification accuracy between each pair of material categories, averaged across the five human subjects. In addition, the relationships among the dissimilarity matrices for the monkey and human ROIs, as well as those for the image and material properties, were visualized in a common low-dimensional space using nonmetric multidimensional scaling (MDS; Kruskal's normalized stress criterion). In the MDS analysis, the distance between each pair of dissimilarity matrices was defined as one minus the Spearman simple correlation coefficient between them. Human V3AB/IPS was excluded from the MDS analysis, because inclusion of this ROI required >3 dimensions to approximate the distances.
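The MDS step could be sketched as follows. Here scikit-learn's nonmetric MDS is used as a stand-in; note that its stress normalization is not identical to the Kruskal criterion used in the analysis, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS

def embed_rdms(rdms, names):
    """Embed a set of dissimilarity matrices (monkey ROIs, human ROIs,
    image and material properties) in a common 2D space, with the
    distance between two matrices defined as 1 - Spearman correlation."""
    vecs = [m[np.triu_indices_from(m, k=1)] for m in rdms]
    n = len(vecs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho = spearmanr(vecs[i], vecs[j]).correlation
            dist[i, j] = dist[j, i] = 1.0 - rho
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=0)
    return dict(zip(names, mds.fit_transform(dist)))
```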
Statistical tests of representational similarity.
We used a one-tailed random permutation test (Mantel test) to assess whether the partial/simple correlation between dissimilarity matrices was significantly positive. The significance was determined by comparing the actual value (partial/simple correlation coefficient) with the distribution of values under a null hypothesis, which was generated by recomputing the values using neural dissimilarities with randomly shuffled category labels (10,000 times). Correction for multiple comparisons was made using the maximum statistics method, which compares the actual correlation with the distribution of the maximum correlation over the multiple comparisons under the null hypothesis (Nichols and Holmes, 2002). We report uncorrected p values unless otherwise stated, and results were considered significant at p < 0.05.
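The permutation scheme amounts to jointly permuting the rows and columns of the neural matrix, as in this sketch (the stat_fn argument would be the partial or simple correlation of interest; all names are hypothetical):

```python
import numpy as np

def mantel_p(neural_rdm, stat_fn, n_perm=10000, seed=0):
    """One-tailed random permutation (Mantel) test.

    stat_fn : maps a 9 x 9 neural dissimilarity matrix to the statistic
              of interest (a partial or simple correlation coefficient).
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(neural_rdm)
    n = neural_rdm.shape[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)                    # shuffle category labels
        null[i] = stat_fn(neural_rdm[np.ix_(perm, perm)])
    # Proportion of null statistics at least as large as the observed one.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```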
Similarity searchlight analysis.
A spherical searchlight analysis (Kriegeskorte et al., 2006) was performed to examine the correlation between neural activity and the image or material properties throughout the visual cortex without predefined ROIs. For each voxel in the visual cortex in each hemisphere, which covered V1, V2, V3, V3A, V4, MT+/FST, PIT, and CIT, the neural dissimilarity matrix was computed using the local pattern of activity within a sphere (4 mm radius) centered at that voxel. The neural dissimilarity was based on the SVM pairwise classification accuracy, which was obtained using the same procedure as in the ROI analysis. The partial correlation (Spearman rank correlation) between the neural dissimilarity and the dissimilarity of the image or material properties was then computed for each sphere, resulting in a map of partial correlations for each hemisphere. The maps were Fisher-transformed to z values, spatially smoothed (4 mm FWHM), and averaged across the four hemispheres (left was flipped to right) to generate a group-averaged map of the partial correlations and a map of the t values (mean across hemispheres divided by the SE).
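The core of the searchlight loop is the selection of voxels within each sphere, sketched below; the RDM and partial correlation steps would reuse the procedures sketched above, and the names are hypothetical.

```python
import numpy as np

def sphere_indices(center_mm, voxel_coords_mm, radius_mm=4.0):
    """Indices of voxels within a sphere around a center voxel.

    voxel_coords_mm : (n_voxels, 3) voxel coordinates in mm
                      (1 mm isotropic after spatial normalization)
    """
    d = np.linalg.norm(voxel_coords_mm - center_mm, axis=1)
    return np.where(d <= radius_mm)[0]

# For each center voxel: compute the pairwise-classification RDM from the
# local activity patterns, then store the partial Spearman correlation
# with the image or material properties at that voxel, producing one
# correlation map per hemisphere.
```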
We used the one-tailed permutation-based t test to assess statistical significance. We obtained 10,000 maps of t values under a null hypothesis by shuffling the category labels of the neural dissimilarity matrix in the same way for all spheres. We then computed a p value at each voxel in the group-averaged map by comparing the actual t value (observed when using the correct labels) with the t values under the null hypothesis. The voxels were initially thresholded at p < 0.005 and corrected for multiple comparisons at the cluster level (p < 0.05). The minimum cluster sizes were estimated from the null distribution of suprathreshold cluster sizes generated using the shuffled data (Nichols and Holmes, 2002). This group analysis was constrained to the voxels within the intersection of the visual cortices of the four hemispheres.
Results
Material information in monkey visual areas
We presented 72 images from nine different real-world material categories to two fixating monkeys using a block design. The material categories were metal, ceramic, glass, stone, bark, wood, leather, fabric, and fur (Fig. 1). The material image set activated wide regions of the visual cortex encompassing the early visual areas and the PIT, as well as regions in the more anterior part of the IT (Fig. 2A). We divided these visually responsive regions into five ROIs (V1, V2, V3, V4, and PIT) based on separate retinotopic mapping and localizer data (Fig. 2A,B), and selected the 500 most visually responsive voxels from each ROI in each hemisphere (nearly the maximal number of visually responsive voxels available in V3).
We first tested for discriminability between the nine material categories in these regions by asking how well they could be classified based on their activity patterns. We used the linear SVM to compute the accuracy of the nine-way classification for each ROI in each hemisphere and then averaged the accuracies across the four hemispheres. The mean classification accuracies obtained were significantly greater than chance for all ROIs (Fig. 3, dark bars; accuracy = 0.171 and 0.172, p < 0.0005 for V1 and V2; accuracy = 0.145, 0.137, and 0.137, p = 0.004, 0.039, and 0.045, for V3, V4, and PIT, respectively; one-tailed permutation-based t test). We also computed the accuracy of the nine-way classification using the activity patterns concatenated across all hemispheres (Popivanov et al., 2012), because this method has been shown to improve classification accuracy (Brouwer and Heeger, 2009; Hiramatsu et al., 2011). Consistent with those earlier reports, the classification accuracies were improved in all ROIs (Fig. 3, light bars) and were highly significant (accuracy >0.176, p < 0.0005 for all ROIs; one-tailed permutation test). These results indicate that information about materials is distributed across a wide region of the visual cortex, from the lower to the higher visual areas. With both methods of classification, the accuracy tended to be higher in the earlier areas. It should be noted, however, that classification accuracy depends on many, not yet fully understood, factors, such as the clustering of neurons with similar preferences. The relatively low accuracy in the higher areas could reflect information at the neuronal level being distributed relatively uniformly over the region, either on a fine or a coarse scale.
Representational structures in monkey visual areas
We next explored the content of the information represented in each visual area by assessing the similarity/dissimilarity of the activity patterns evoked by the material images. Some categories (e.g., metal and glass) would give us similar visual and tactile impressions of the material properties (e.g., smooth and hard), but the low-level image properties of the images in those categories (e.g., spatial frequency magnitudes) would differ. This raises the question: does the similarity of the activity patterns in an area reflect similarity in the perceptual material properties, such as roughness, hardness, and warmness, or instead similarity in the low-level image properties? To address that question, we computed the neural dissimilarities between all pairs of material categories for each ROI and assessed how the neural dissimilarity was related to the measures of dissimilarity in the perceptual material properties or in the low-level image properties. We used the linear SVM to evaluate the neural dissimilarity between pairs of material categories based on the pairwise classification accuracy; pairs of categories with higher accuracy were regarded as more dissimilar to each other (Weber et al., 2009; Said et al., 2010; Hiramatsu et al., 2011). For the dissimilarities in the image and material properties, we used the same measures as in our earlier human study, assuming a commonality between material perception in humans and monkeys; the issue of interspecies differences will be considered later. Briefly, the dissimilarity in the perceptual material properties was determined based on 11 visual/tactile impressions and one conceptual impression of the material images measured using 12 bipolar adjective scales, whereas the dissimilarity in the low-level image properties was determined based on 20 low-level image statistics (magnitudes of 3 spatial frequency × 4 orientation sub-bands plus 8 luminance/color pixel statistics) for the material images. The obtained dissimilarities between all pairs of categories were displayed as a matrix using a pseudocolor scale (Fig. 4, top row). This dissimilarity matrix summarizes the similarity/dissimilarity between different categories; for example, the two matrices show that metal and glass have relatively dissimilar low-level image properties (Fig. 4, top left, light color) but similar perceptual material properties (Fig. 4, top right, dark color).
We then examined whether the dissimilarity matrix obtained from the neural activity (neural dissimilarity matrix; Fig. 4, left column) was related to the low-level image properties or the perceptual material properties (Fig. 4, top row). Figure 4 shows that the neural dissimilarity matrix for V1 was remarkably similar to the matrix of the image properties. By contrast, the neural dissimilarity matrices for V4 and the PIT shared some common tendencies with the matrix of the material properties. We quantified whether the neural dissimilarity matrix was congruent with the dissimilarity matrix of the image properties or the material properties by computing the correlation (Spearman rank correlation) between them (Fig. 4a–f). Because there was a weak correlation between the dissimilarity matrices of the image and material properties, we evaluated partial correlation coefficients as the measure of congruence, excluding the correlation between the image and material properties. The partial correlation analysis revealed a marked difference in the representational structure between the early and higher visual areas (Fig. 5A). The neural dissimilarity matrix for V1 correlated highly with the dissimilarity matrix of the low-level image properties (p = 0.0004; one-tailed permutation test) but not with that of the perceptual material properties (p = 0.271). By contrast, V4 and the PIT showed the opposite pattern: the dissimilarity matrices for these ROIs correlated significantly with the dissimilarity matrix of the perceptual material properties (p = 0.017 and 0.012, for V4 and the PIT, respectively) but not with that of the low-level image properties (p = 0.159 and 0.311, for V4 and the PIT, respectively). These differences remained significant after correction for multiple comparisons (image properties: p = 0.0001 for V1; material properties: p = 0.046 and 0.042, for V4 and the PIT, respectively; maximum statistics method). Furthermore, the patterns of partial correlation were generally consistent when the data from the individual monkeys were analyzed separately: the neural dissimilarity matrix for V1 correlated highly with the dissimilarity matrix of the image properties (M1: r = 0.690, p = 0.0003; M2: r = 0.748, p = 0.0006), whereas those for V4 and the PIT tended to correlate with the dissimilarity matrix of the material properties (M1: r = 0.287 and 0.319, p = 0.068 and 0.041, for V4 and the PIT, respectively; M2: r = 0.395 and 0.285, p = 0.021 and 0.063, for V4 and the PIT, respectively). These results indicate that activity in V1 represents simple, low-level image properties of the material images, whereas activity in V4 and the PIT represents material properties manifested in visual/tactile and conceptual impressions. The extrastriate areas V2 and V3 showed patterns of partial correlation similar to that of V1: strongly significant correlations with the image properties (p = 0.003 and 0.013, for V2 and V3, respectively) and weaker correlations with the material properties (Fig. 5A). The correlation with the material properties was, however, significantly positive in V2 (p = 0.042). This pattern of results suggests that these extrastriate areas, V2 in particular, lie at a midway point between the image-based representation in V1 and the perceptual material representation in V4 and the PIT.
In the above analysis, the voxels in each area were selected according to visual responsivity, so there may be differences between areas in the parts of the visual field represented. We investigated whether such a retinotopic bias could explain the observed correlations with the image or material properties by conducting the partial correlation analysis for the voxels representing the central visual field (eccentricity <3°) alone. The analysis revealed that the pattern of results for the central visual field representation was consistent with that for the entire area (Fig. 5B). The neural dissimilarity matrix for V1 correlated highly with the dissimilarity matrix of the image properties (p = 0.0029) but not with that of the material properties, and the neural dissimilarity matrices for V4 and the PIT correlated significantly with the matrix of the material properties (p = 0.019 and 0.024, for V4 and the PIT, respectively), although the correlations with the image properties (p = 0.039 and 0.139, for V4 and the PIT, respectively) tended to be higher than those for the entire area. These results indicate that the significant correlation with the material properties in V4 and the PIT does not arise because the voxels in these areas represent parts of the visual field different from those represented in the earlier areas.
Representations of anatomical/functional subdivisions in and around the PIT
It is known that there are anatomical and functional subdivisions in and around the PIT. We next investigated whether the material representation observed in the PIT could be localized to these anatomical/functional subdivisions. We first tested whether the representation differed between the dorsal and ventral parts of the PIT (PITd and PITv, respectively; Fig. 2A), which have often been assumed to be separate areas (Felleman and Van Essen, 1991; Kolster et al., 2009). We selected the 250 most visually responsive voxels (the maximum attainable number in some hemispheres) from each subregion and computed the neural dissimilarity matrices separately using the activity patterns in the subregions. We then ran the partial correlation analysis as described above. We also analyzed the activity patterns in the MT+/FST, which is situated dorsal to the PIT (Fig. 2A). The results showed that the neural dissimilarity matrices for the PITd and PITv both correlated significantly with the dissimilarity matrix of the material properties (p = 0.020 and 0.013, for the PITd and PITv, respectively; Fig. 6A) but not with that of the image properties (p > 0.284, for both ROIs). These results indicate that the dorsal and ventral parts of the PIT similarly represent perceptual material properties. In contrast to these PIT subdivisions, the MT+/FST showed a significantly positive correlation with the image properties (p = 0.006) but not with the material properties (Fig. 6A). The representational structure in this region is therefore quite different from that in the PIT.
We then asked whether the representational structures differed among functionally defined clusters in the PIT. It has been reported that images of objects evoke object-related fMRI activity in a large portion of the IT, and that images of faces and scenes evoke clustered category-selective activations within the IT (Tsao et al., 2003; Denys et al., 2004; Pinsk et al., 2005; Bell et al., 2009; Ku et al., 2011; Nasr et al., 2011; Rajimehr et al., 2011; Lafer-Sousa and Conway, 2013). Consistent with those earlier reports, our functional localizer experiment using images of faces, scenes, objects, and grid-scrambled objects revealed that the PIT contains a large number of voxels responsive to objects (Fig. 2C, cyan). Because we defined this object-related activity by contrasting the activities evoked by objects versus grid-scrambled objects, it should mainly reflect selectivity for object shape. Based on these localizer data, we classified voxels in the PIT as selective or nonselective for object shape (object-shape-selective or nonselective; Fig. 6B, white and gray regions), and then selected the 250 most visually responsive voxels from each of those groups. The selected object-shape-selective and nonselective voxels were distributed across both the PITd and PITv, although there was a bias toward a larger number of object-shape-selective voxels in the PITd (Fig. 2D). The partial correlation analysis indicated that the dissimilarity matrix derived from the object-shape-nonselective voxels within the PIT showed a strongly significant correlation with the material properties (p = 0.003; Fig. 6C) but not with the image properties (p = 0.089), whereas the matrix derived from the object-shape-selective voxels within the PIT showed a nonsignificant correlation with the material properties (p = 0.178). These results suggest that the object-shape-nonselective voxels play the main role in representing material properties within the PIT.
The localizer data also revealed clusters of face-selective voxels around the lip of the STS (Fig. 2C, green) and clusters of voxels more responsive to objects than to the other categories (Fig. 2C, gray). Based on these data and the color-selectivity data obtained in our previous study (Fig. 2C, orange; Harada et al., 2009), we then investigated whether these feature/category-selective voxels within the PIT are involved in the material representation. The numbers of these voxels were, however, much smaller than the number of object-shape-selective voxels (Fig. 6B), and in some hemispheres they might not be sufficiently numerous for decoding information in the monkey IT (Ku et al., 2008). For that reason, we assessed the contribution of a given set of voxels by eliminating those voxels and examining the effect (feature perturbation technique; Etzel et al., 2013): if a particular set of voxels were important for the material representation, the neural dissimilarity computed without those voxels would show a degraded correlation with the material properties.
We defined the object-category-nonselective, face-nonselective, and color-nonselective voxels within the PIT in the same way as the object-shape-nonselective voxels. Each of these sets contained the 250 most visually responsive voxels, drawn from among the 500 most visually responsive voxels in the PIT. The partial correlation analysis showed that the object-category-nonselective, face-nonselective, and color-nonselective voxels in the PIT exhibited marginal or nonsignificant correlations with the material properties (p = 0.063, 0.042, and 0.10, respectively; Fig. 6C). In other words, excluding the voxels selective for object category, face, or color from the PIT degraded the correlation with the material properties. This pattern suggests that the voxels selective for object category, face, or color make some contribution to the perceptual material representation within the PIT.
Dependencies on dissimilarity measures
The low-level image properties that we used consisted of sub-band magnitudes and color/luminance statistics. To assess which image features best reflect the neural dissimilarities, we conducted additional analyses for V1, V2, V3, V4, and the PIT using dissimilarity matrices of the image properties computed separately from the 12 sub-band magnitudes (3 spatial frequencies × 4 orientations), the luminance statistics (mean, SD, skewness, and kurtosis), or the color statistics (mean and SD of a* and b*). For each ROI, we evaluated the partial correlation coefficients between the neural activities and each of these three types of image properties after excluding the correlation between the image and material properties, as in the main analysis (Fig. 5A). The results revealed that the image properties computed from the sub-band magnitudes explained the neural activities in the early areas to a degree similar to the original image properties computed using all low-level image features, whereas the image properties computed from the luminance statistics and from the color statistics did not (Fig. 7A, left column). Thus, differences in the sub-band magnitudes made the dominant contribution to the neural dissimilarities. Although the color-selective voxels would be involved in the material representation to some degree (Fig. 6C), those voxels would represent more complex color features than those used in this analysis. We also evaluated the partial correlation coefficients between the neural activities and the material properties after excluding the correlation between the image and material properties for each type of image properties, because these estimates might change depending on the image properties used. Figure 7A, right column, shows the partial correlations between the neural activities and the material properties computed after excluding the effect of the sub-band magnitudes (“sub-band” row), the luminance statistics (“luminance” row), or the color statistics (“color” row), respectively. The results showed that the high partial correlation with the material properties in V4 and the PIT was reliably observed in all cases (V4: r ≥ 0.424, p ≤ 0.016; PIT: r ≥ 0.424, p ≤ 0.013; Fig. 7A, right column), confirming the material representation in these regions. In addition, the lack of change in the partial correlation indicates that the high correlation between the neural activities and the material properties cannot be explained by the contribution of simple image features, such as sub-band magnitudes, luminance statistics, or color statistics.
In the analyses described so far, we used the classification accuracy as the metric of the neural dissimilarity between material categories. We next examined how the results of the partial correlation analysis depend on the neural dissimilarity metric. We tested two additional metrics of neural dissimilarity: the Euclidean distance and the correlation-based distance (1 − Spearman simple correlation coefficient) between the multivoxel response patterns (Kriegeskorte et al., 2008; Hiramatsu et al., 2011). The neural dissimilarity matrices were computed using these metrics from the average response patterns to each of the material categories (β values averaged across all runs). The matrix was obtained for each ROI in each hemisphere and then averaged across hemispheres. With both metrics, the neural dissimilarity matrices for V1 and V2 showed high correlations with the image properties (V1: r ≥ 0.698, p ≤ 0.006; V2: r ≥ 0.460, p ≤ 0.014; Fig. 7B, left, dark blue colors), and the matrix for the PIT showed a significant correlation with the material properties (r ≥ 0.403, p ≤ 0.020; Fig. 7B, right, dark red colors, C, right), as observed with the classification-based neural dissimilarity (Fig. 7A, top row). Therefore, the pattern of partial correlation in these areas did not depend on the metric of the neural dissimilarity. V3 and V4 tended to show variability depending on the metric: in these areas, the Euclidean distance between the response patterns showed a correlation with the material properties (V3: r = 0.43, p = 0.015; V4: r = 0.401, p = 0.026; Fig. 7B, right, C, left), but the correlation-based distance did not. One important difference between these metrics is the contribution of the mean response amplitudes. The Euclidean distance, like the classification accuracy, reflects differences in the mean response amplitudes between material categories, but the correlation-based distance ignores them. The results thus suggest that the contribution of the regional mean responses to the representation differs between V3/V4 and the higher area, as observed in humans (Hiramatsu et al., 2011).
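For reference, the two additional metrics can be written compactly, as in this sketch over hypothetical average response patterns; note that only the Euclidean distance is sensitive to differences in mean response amplitude.

```python
import numpy as np
from scipy.stats import spearmanr

def euclidean_rdm(mean_patterns):
    """mean_patterns: (9, n_voxels) average beta pattern per category."""
    diff = mean_patterns[:, None, :] - mean_patterns[None, :, :]
    return np.linalg.norm(diff, axis=-1)  # retains mean-amplitude differences

def correlation_rdm(mean_patterns):
    """1 - Spearman correlation; insensitive to mean response amplitude."""
    n = mean_patterns.shape[0]
    rdm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho = spearmanr(mean_patterns[i], mean_patterns[j]).correlation
            rdm[i, j] = rdm[j, i] = 1.0 - rho
    return rdm
```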
In relation to the observation above, we performed a univariate analysis to investigate the regional mean responses (Fig. 7D). The mean response amplitudes (β weights) varied depending on the material category in V1, V4, and the PIT (F(8,24) = 19.3 and 16.5, p < 10⁻⁸, for V1 and V4, respectively; F(8,24) = 4.57, p = 0.002, for the PIT; repeated-measures ANOVA for each ROI). V1 responded strongly to metal, probably based on its low-level image features. The modulation of the response amplitudes across categories was larger in V4 than in the PIT, which may be related to the dependency on the neural dissimilarity metrics described above. On the other hand, the average of the mean response amplitudes across all categories did not differ significantly among these ROIs (F(2,6) = 4.87, p = 0.055; repeated-measures ANOVA). Thus, the representational differences between ROIs cannot be ascribed to differences in the average level of activation.
Similarity searchlight
To complement the analyses with our predefined ROIs, we conducted a spherical searchlight analysis to map the representational similarity with the low-level image properties and the perceptual material properties throughout the visual cortex (see Materials and Methods, Similarity searchlight analysis). Figure 8A shows the group-averaged map of the centers of spheres in which the partial correlation with the image or material properties was significantly positive (p < 0.05, corrected for multiple comparisons; one-tailed permutation-based t test). The partial correlation with the image properties (blue regions) was significantly positive in the posterior visual cortex around V1 and V2. Significant partial correlation with the material properties (red regions) was found in more anterior regions: in parts of the lunate and inferior occipital sulci overlapping with V4 and in the IT gyrus within the PIT, as well as in posterior regions around V2 and V3. We examined the locations of the centers of the spheres showing significant partial correlation with the image or material properties in the group-averaged map (Fig. 8A) by counting their number in each visual area. The numbers were calculated for each individual hemisphere and then averaged across hemispheres. The results indicated that, in V1 and V2, the number of spheres showing correlation with the image properties was much larger than the number showing correlation with the material properties (Fig. 8B). On the other hand, in V3, V4, and the PIT, the number of spheres showing correlation with the material properties was much larger than the number showing correlation with the image properties. It should be noted that the spatial resolution of the searchlight analysis is limited (Etzel et al., 2013), because each sphere can contain voxels from multiple visual areas when it is located near an areal border or inside a sulcus where different areas face each other. The high correlation with the material properties (but not with the image properties) in V3 is possibly due to this limitation. Overall, the searchlight results are consistent with the results of the ROI analysis, providing further evidence for material representations in V4 and the PIT.
Representation of a specific material category
So far, we have opted to measure the dissimilarity in perceptual material properties based on ratings by human subjects and found that some areas in monkeys showed significant correlation with this measure. This would not be expected if material perception were substantially different across species. Thus, humans and monkeys likely share some degree of material perception in common. Nevertheless, it is also likely that monkeys and humans recognize some categories somewhat differently. If so, that difference may affect the estimates of the correlation between the activity pattern and the perceptual material properties, and the correlation between these measures may vary across material categories. Based on this idea, we investigated how the results of the partial correlation analysis shown in Figure 5A varied when one of the nine categories was excluded from the data.
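This leave-one-category-out analysis simply removes one category's row and column from every dissimilarity matrix before recomputing the partial correlations, as sketched below (the names are hypothetical):

```python
import numpy as np

def drop_category(rdm, c):
    """Remove category c's row and column from a 9 x 9 dissimilarity
    matrix, leaving the 8 x 8 matrix used in this analysis."""
    keep = np.delete(np.arange(rdm.shape[0]), c)
    return rdm[np.ix_(keep, keep)]

# The partial correlation (see Materials and Methods) is then recomputed
# on the reduced neural, image-property, and material-property matrices
# for each excluded category in turn.
```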
The results revealed that the patterns of partial correlation were generally stable even when one category was excluded from the dissimilarity matrices; the partial correlations with the image properties were high in the early areas, V1 in particular (Fig. 9, left, dark blue colors), whereas the partial correlations with the material properties were generally high in V4 and the PIT (Fig. 9, middle, dark red colors). This is consistent with a general commonality of material perception in humans and monkeys. On the other hand, some categories do appear to influence the patterns of correlation (Fig. 9, right). For example, the neural dissimilarity matrix computed by excluding the ceramic category tended to show a higher correlation with the material properties in V4 and the PIT (p = 0.019 and 0.006, for V4 and the PIT, respectively; one-tailed permutation test). This implies that monkeys and humans recognize this material differently. Conversely, the neural dissimilarity matrix computed by excluding the metal category tended to show a lowered correlation with the material properties (p = 0.151 and 0.060, for V4 and the PIT, respectively). Thus, the neural and perceptual data for this material would make a relatively important contribution to the neural-perceptual correlation in the original analysis, probably because monkeys and humans recognize metal similarly.
Interspecies comparisons of the representational structures
In the present study and in our earlier human study, we used a common image set, essentially the same task, and the same measurement and analysis techniques. This enabled us to directly compare the neural representations across species and to investigate how the neural representations in different visual areas in monkeys relate to those in humans. We first examined the representational similarity across species by computing the interspecies correlations of the dissimilarity matrices between the five monkey ROIs and five human ROIs. The human ROIs were V1/V2, V3/V4, FG/CoS (the ventral higher visual area around the FG and CoS), LOS/pITS (the lateral higher visual area around the LOS and pITS), and V3AB/IPS (the dorsal higher visual area including V3A, V3B, and the regions in the IPS); neighboring early visual areas (e.g., V1 and V2) were combined to equate the numbers of voxels across ROIs. As in the present study, the voxels were selected for each ROI based on the visual responsivity to the material image set, and the neural dissimilarity matrix for each ROI was obtained based on the classification accuracy between material categories. Among these ROIs, FG/CoS was shown to reflect human perception well (Hiramatsu et al., 2011).
It is widely assumed that monkey early visual areas, V1 in particular, are functionally similar to the corresponding human areas, and that the monkey IT is a homolog of part of the human lateral and ventral higher visual areas (Kriegeskorte et al., 2008). The pattern of the interspecies correlations obtained was generally consistent with those ideas (Fig. 10A): monkey V1 and V2 showed strong correlations with human V1/V2, and the monkey PIT tended to correlate with the human ventral higher area FG/CoS, although monkey V3 and V4 did not show clear correlations with human V3/V4. We tested the significance of the correlation between monkey V1 and human V1/V2, both of which have been shown to reflect low-level image properties well, and confirmed that the representations in these areas were significantly correlated (r = 0.64, p = 0.002; one-tailed permutation test). We also tested whether the representations in monkey V4 and the PIT correlated significantly with that in the human FG/CoS, as all of these areas have been shown to be involved in perceptual material representation. The results showed that the correlation was significant only between the monkey PIT and human FG/CoS (V4: r = 0.30, p = 0.063; PIT: r = 0.45, p = 0.005). Thus, the representation in the human FG/CoS would be more similar to that in the monkey PIT than to that in V4.
We next applied nonmetric MDS to visualize the relationships between the representational structures in the visual areas of monkeys and humans in a common low-dimensional space (Fig. 10B). Within this space, strongly correlated pairs (i.e., those similar in representation) lie in close proximity, and weakly correlated pairs are widely separated. This analysis took into account not only the interspecies correlations shown in Figure 10A, but also the intraspecies correlations (e.g., between monkey V1 and V2). The neural dissimilarity matrices for the five monkey ROIs and four human ROIs, as well as the dissimilarity matrices of the image and material properties, were used in this analysis, which enabled us to visualize the distances between the dissimilarity matrices in a two-dimensional space (stress < 0.1; Fig. 10B, inset). Consistent with the results summarized above, within the MDS-derived space, monkey V1, human V1/V2, and the image properties are situated close to one another, whereas the monkey PIT and human FG/CoS are both close to the perceptual material properties. The overall configuration in this space closely reflects the hierarchy from early to higher visual areas in both species, although the monkey areas and human areas follow separate paths. This suggests that, although there are some representational differences between the species, the image-based representation in the early areas is transformed into a perceptual material representation along the ventral path in both species.
Discussion
Our findings demonstrate that the activity patterns in the early and higher visual areas of monkeys carry information about materials. Importantly, the early and higher visual areas differ in the way they represent material information. Whereas activity patterns in the early visual areas, particularly V1, closely reflect low-level image properties, those in V4 and the PIT reflect perceptual material properties. This suggests that, in monkeys, V4 and the PIT are important stages for constructing information about the material properties of objects from low-level image features. In a separate analysis, we also found concordant representations between species: the early areas (monkey V1 and human V1/V2) and the higher areas (monkey PIT and human FG/CoS) share similar representational structures across species. Further analysis suggested that, within the PIT, voxels selective for object category, face, and color contribute to some degree to the material representation. Interestingly, information about material properties is carried by the activities of functional clusters with little selectivity for object shape, rather than by those selective for object shape. This is in line with the observation in human imaging studies that information about material/texture and shape is represented in separate regions within the human ventral higher areas; whereas material/texture involves the medial/ventral parts, shape involves the lateral/dorsal parts (Peuskens et al., 2004; Cant and Goodale, 2007; Cant et al., 2009; Cavina-Pratesi et al., 2010a,b; Cant and Goodale, 2011; Cant and Xu, 2012). Our results suggest that the monkey PIT is functionally organized for separate processing of object surface and shape, as in humans, although the anatomical segregation (e.g., medial/ventral vs lateral/dorsal) between surface and shape may be less distinct than in humans.
Our approach in this study was to investigate the content of the information represented in cortical areas by analyzing the similarity of multivoxel patterns of activity (for review, see Kriegeskorte and Kievit, 2013). This analysis assumes that information in a region can be read out from the activity pattern, because information at the neuronal level is not uniformly distributed over the region. Previous studies have suggested that multivoxel patterns of fMRI activity in the monkey IT carry information about object category (Tsao et al., 2003; Ku et al., 2008; Popivanov et al., 2012; Liu et al., 2013), shape (Op de Beeck et al., 2008), and facial expressions (Furl et al., 2012). Our present findings provide new evidence that information about materials also resides in the activity patterns of part of the monkey IT and of earlier areas. Importantly, ours is the first evidence that fMRI activity patterns in the monkey higher areas reflect perceptual material categories. Furthermore, we observed interspecies commonalities and differences in the neural and perceptual representations, adding new insight to previous attempts to link object representations at the levels of single-neuron activity in monkeys, fMRI activity in monkeys and humans, and human perception (Kiani et al., 2007; Kriegeskorte et al., 2008; Liu et al., 2013; Mur et al., 2013).
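To make the logic of such a read-out concrete, the sketch below shows one standard way to derive a neural dissimilarity matrix from multivoxel patterns: decode the material category of each trial with a cross-validated linear classifier and treat confusability as similarity. The choice of classifier, the cross-validation scheme, and the conversion from confusions to dissimilarities are assumptions made for illustration, not the exact method of this study.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

def material_rdm_from_patterns(X, y, n_categories=9):
    """Illustrative read-out of material information from multivoxel
    patterns. Categories that the decoder confuses often are treated
    as being represented similarly.

    X : (n_trials, n_voxels) activity patterns from one ROI
    y : (n_trials,) material-category labels in 0..n_categories-1"""
    clf = LinearSVC()                                 # linear decoder (assumed)
    y_hat = cross_val_predict(clf, X, y, cv=5)        # held-out predictions
    cm = confusion_matrix(y, y_hat, labels=np.arange(n_categories))
    cm = cm / cm.sum(axis=1, keepdims=True)           # rows -> classification rates
    sim = (cm + cm.T) / 2.0                           # symmetric confusability
    rdm = 1.0 - sim                                   # dissimilarity matrix
    np.fill_diagonal(rdm, 0.0)
    return rdm
```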
Processing of surface attributes in monkey V4 and the IT
Neurons in monkey V4 and the IT exhibit selectivity for artificial textures (Komatsu and Ideura, 1993; Kobatake and Tanaka, 1994; Hanazawa and Komatsu, 2001) and for real-world natural textures, such as leaves (Arcizet et al., 2008; Köteles et al., 2008). Neurons in these areas can also distinguish natural textures independently of their shape and the direction of illumination, though the population responses in these areas could be explained to some extent by low-level image features (Arcizet et al., 2008; Köteles et al., 2008). We tested whether these areas do indeed represent information about materials, and our results provide clear evidence that the activity patterns in V4 and the PIT cannot be ascribed merely to low-level image features; instead, they reflect material properties. We suggest that the low-level features are transformed, probably through V2 (Freeman et al., 2013), into information about material properties at the level of V4 and the PIT.
Material perception requires both texture information and surface reflectance information, such as color and gloss. V4 and the PIT as defined in the present study encompass the gloss-selective regions identified in our recent fMRI experiment (Okazawa et al., 2012). These regions also exhibit color-selective fMRI activity (Conway and Tsao, 2006; Conway et al., 2007; Wade et al., 2008; Harada et al., 2009; Lafer-Sousa and Conway, 2013), suggesting that information about gloss and color resides in both V4 and the PIT. Information about these surface properties, as well as about texture, in these regions is likely related to the activity reflecting material properties. Consistent with this idea, our analysis suggested that color-selective voxels in the PIT contribute to material representation to some degree (Fig. 6C). In some hemispheres, gloss- and color-selective fMRI activity has also been observed in the CIT, a region anterior to the PIT (Harada et al., 2009; Okazawa et al., 2012; Lafer-Sousa and Conway, 2013). Moreover, neurons in the CIT have been found to respond selectively to particular types of gloss (Nishio et al., 2012). Thus, the CIT could potentially carry information about gloss and color. In the present study, we did not analyze material representation in the CIT because of the weak response to material images in this region (Fig. 2A). However, because the weakness of the response in the CIT is due in part to susceptibility artifacts (Harada et al., 2009), further research will be necessary to conclusively determine whether the CIT represents material properties. In such studies, high-sensitivity techniques (e.g., contrast agents and high magnetic fields) would be helpful.
Representations of materials, objects, and scenes in the IT
It has been suggested that, in humans, various object categories are represented semantically and hierarchically in the higher visual areas, with the distinction between animate/living and inanimate/nonliving object classes being one important semantic dimension (Kriegeskorte et al., 2008; Haxby et al., 2011; Connolly et al., 2012). One might therefore argue that the representation we observed in the IT reflects not materials but the object classes with which the materials are associated (e.g., leather and fur might be associated with the animate/living class, metal and stone with the inanimate/nonliving class). We suggest this is not the case, however. First, it remains controversial whether the animate-inanimate dimension is important for object representation in the monkey IT (Popivanov et al., 2012; Liu et al., 2013). Second, such a representational structure has so far been suggested only for objects with a typical shape. We used virtual objects with nonsense shapes and found that information about the materials was represented in a region that was not selective for object shape, as argued earlier. Thus, the representation observed in this study was based on information about the surface, not the shape.
It is worth considering the relationship between the representation of materials and that of scenes, since some human studies have reported that a material/texture-selective region overlaps a scene-selective region (the parahippocampal place area) in the medial portion of the ventral visual cortex (Cant and Goodale, 2011; Cant and Xu, 2012). Recent monkey fMRI studies have reported scene-selective activity on the lateral/ventral surface of the IT gyrus and around dorsal V4 and V3A (Nasr et al., 2011; Rajimehr et al., 2011). These regions also respond to high spatial frequency components, such as surface bumps (Rajimehr et al., 2011). The monkey PIT examined in the present study could overlap part of the scene-selective region, although this remains unclear because scene-selective clusters were not evident in our localizer data. It would be of interest to know whether material information is represented in these scene-selective regions.
Commonality and differences in neural representation across species
Our analysis revealed representational similarity across species in the early visual areas and in higher areas, but also showed a general tendency toward representational differences between the species (Fig. 10). In particular, there was little correlation between the representation in monkey V4 and that in human V3/V4 (Fig. 10A). It was recently suggested that activity in monkey V4 does not functionally correlate with that in human V4 (hV4), but does correlate with that in higher areas, such as the LOC (Mantini et al., 2012a,b). Our results are in line with that finding, which suggests that the correspondence between the visual areas of humans and monkeys becomes complex at this midlevel of the hierarchy. This idea is also supported by our MDS analysis of the relationship between the representational structures in monkeys and humans (Fig. 10B).
We also suggest that there are some interspecies differences in material perception (Fig. 9). For example, the representation of metal might be similar in the two species, whereas the representation of ceramic might differ. This is interesting given that the prior experience of the monkey subjects with these materials differs substantially: they have been visually and haptically exposed to metallic objects in the animal facilities for several years, but they have probably had little or no exposure to ceramic. It will be important in the future to clarify how monkeys categorize these material images, and whether the observed interspecies differences are attributable to differences in the subjects' visuohaptic experience or to other factors, such as behavioral and/or evolutionary significance.
Footnotes
This study was supported by Grants-in-Aid for Scientific Research (22500248, 25330179) from the Japan Society for the Promotion of Science (JSPS), Japan, to N.G., and a Grant-in-Aid for Scientific Research on Innovative Areas “Shitsukan” (22135007) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, to N.G. and H.K. We thank T. Ohta for help with monkey training and experiments, M. Takagi for technical assistance, and K. Matsuda for providing the eye-tracking software.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Naokazu Goda, Division of Sensory and Cognitive Information, National Institute for Physiological Sciences, Myodaiji, Okazaki 444-8585, Japan. ngoda@nips.ac.jp