PT - JOURNAL ARTICLE
AU - Nobuhiko Wagatsuma
AU - Akinori Hidaka
AU - Hiroshi Tamura
TI - Correspondence between monkey visual cortices and layers of a saliency map model based on a deep convolutional neural network for representations of natural images
AID - 10.1523/ENEURO.0200-20.2020
DP - 2020 Nov 24
TA - eneuro
PG - ENEURO.0200-20.2020
4099 - http://www.eneuro.org/content/early/2020/11/23/ENEURO.0200-20.2020.short
4100 - http://www.eneuro.org/content/early/2020/11/23/ENEURO.0200-20.2020.full
AB - Attentional selection is a function that allocates the brain's computational resources to the most important part of a visual scene at a specific moment. Saliency map models have been proposed as computational models to predict attentional selection at spatial locations. Recent saliency map models based on deep convolutional neural networks (DCNNs) exhibit the highest performance for predicting the location of attentional selection and human gaze, which reflect overt attention. Trained DCNNs potentially provide insight into the perceptual mechanisms of biological visual systems. However, the relationship between the artificial and neural representations used to determine attentional selection and gaze location remains unknown. To understand the mechanism underlying DCNN-based saliency map models and the neural system of attentional selection, we investigated the correspondence between layers of a DCNN saliency map model and monkey visual areas for natural image representations. We compared the characteristics of the responses in each layer of the model with those of the neural representations in the primary visual (V1), intermediate visual (V4), and inferior temporal cortices. Regardless of the DCNN layer level, the characteristics of the responses were consistent with those of the neural representation in V1. We found marked peaks of correspondence between V1 and the early-level and higher-intermediate-level layers of the model.
These results provide insight into the mechanism of the trained DCNN saliency map model and suggest that the neural representations in V1 play an important role in computing the saliency that mediates attentional selection, which supports the V1 saliency hypothesis.
Significance Statement: Trained deep convolutional neural networks (DCNNs) potentially provide insight into the perceptual mechanisms of biological visual systems. However, the relationship between the artificial and neural representations that determine attentional selection and gaze location has not been identified. We compared the characteristics of the responses in each layer of a DCNN model for predicting attentional selection with those of the neural representations in visual cortices. We found that the characteristics of the responses in the trained DCNN model for attentional selection were consistent with those of the representation in the primary visual cortex (V1), suggesting that activity in V1 underlies the neural representation of saliency in the visual field that exogenously guides attentional selection. This study supports the V1 saliency hypothesis.
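The layer-by-area comparison described in the abstract can be illustrated with representational similarity analysis (RSA), a standard way to quantify correspondence between model-layer responses and cortical responses to the same image set. The abstract does not specify the paper's exact metric, so this is a minimal sketch under that assumption; the data here are synthetic and all names are illustrative.

```python
# Hedged RSA sketch: compare a model layer's response geometry to a
# (hypothetical) cortical area's response geometry over the same images.
import numpy as np
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson r between the
    response patterns evoked by each pair of images.
    responses: (n_images, n_units) array."""
    return 1.0 - np.corrcoef(responses)

def rsa_score(layer_responses, neural_responses):
    """Spearman correlation between the upper triangles of the two RDMs;
    a higher rho indicates closer model-brain correspondence."""
    n = layer_responses.shape[0]
    iu = np.triu_indices(n, k=1)
    rho, _ = spearmanr(rdm(layer_responses)[iu], rdm(neural_responses)[iu])
    return rho

# Synthetic stand-ins: 50 natural images, 120 recorded neurons, 256 model units.
rng = np.random.default_rng(0)
v1 = rng.normal(size=(50, 120))
layer = v1 @ rng.normal(size=(120, 256)) + 0.5 * rng.normal(size=(50, 256))
print(rsa_score(layer, v1))
```

In a study like this one, `rsa_score` would be computed for every model layer against each area (V1, V4, IT), and peaks in the resulting layer-wise profile would indicate which processing stages best match that area.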