2012 Special Issue
Multi-column deep neural network for traffic sign classification
Introduction
The human visual system efficiently recognizes and localizes objects within cluttered scenes. For artificial systems, however, this is still difficult, due to viewpoint-dependent object variability, and the high in-class variability of many object types. Deep hierarchical neural models roughly mimic the nature of mammalian visual cortex, and are among the most promising architectures for such tasks. The most successful hierarchical object recognition systems all extract localized features from input images, convolving image patches with filters. Filter responses are then repeatedly pooled and re-filtered, resulting in a deep feed-forward network architecture whose output feature vectors are eventually classified. One of the first hierarchical neural systems was the Neocognitron by Fukushima (1980), which inspired many of the more recent variants.
Unsupervised learning methods applied to patches of natural images tend to produce localized filters that resemble off-center-on-surround filters, orientation-sensitive bar detectors, and Gabor filters (Hoyer and Hyvärinen, 2000, Olshausen and Field, 1997, Schmidhuber et al., 1996). These findings, in conjunction with experimental studies of the visual cortex, justify the use of such filters in the so-called standard model for object recognition (Mutch and Lowe, 2008, Riesenhuber and Poggio, 1999, Serre et al., 2005), whose filters are fixed, in contrast to those of Convolutional Neural Networks (CNNs) (Behnke, 2003, LeCun et al., 1998, Simard et al., 2003), whose weights (filters) are randomly initialized and learned in a supervised way using back-propagation (BP). A DNN, the basic building block of our proposed MCDNN, is a hierarchical deep neural network, alternating convolutional with max-pooling layers (Riesenhuber and Poggio, 1999, Scherer et al., 2010, Serre et al., 2005). A single DNN of our team won the offline Chinese character recognition competition (Liu, Yin, Wang, & Wang, 2011), a classification problem with 3755 classes. Ciresan, Meier, Gambardella, and Schmidhuber (2011) report state-of-the-art results on isolated handwritten character recognition using a MCDNN with 7 columns. Meier, Ciresan, Gambardella, and Schmidhuber (2011) show that there is no need to optimize the combination of different DNNs: simply averaging their outputs generalizes just as well or even better on the unseen test set.
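The averaging scheme of Meier et al. mentioned above is simple enough to sketch directly. The following minimal NumPy illustration (function and variable names are my own, not from the paper) averages the class-probability vectors of several DNN columns and picks the winning class:

```python
import numpy as np

def mcdnn_predict(column_outputs):
    """Combine several DNN 'columns' by simply averaging their
    class-probability vectors, then return the winning class index.
    Illustrative sketch of the averaging scheme described in the text."""
    avg = np.mean(np.asarray(column_outputs), axis=0)
    return int(np.argmax(avg))

# Three hypothetical columns voting over four classes:
columns = [
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.30, 0.30, 0.30, 0.10],
]
print(mcdnn_predict(columns))  # class 1 wins the averaged vote
```

No weights or trained combiner are involved; each column contributes equally, which is precisely why no combination optimization is needed.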
Despite the hardware progress of the past decades, computational speed is still a limiting factor for deep architectures characterized by many building blocks. For our experiments we therefore rely on a fast implementation on Graphics Processing Units (GPUs) (Ciresan, Meier, Masci, Gambardella, & Schmidhuber, 2011a). Our implementation is flexible and fully online (i.e., weight updates after each image). It allows for training large DNNs within days instead of months, thus making MCDNNs feasible.
Recognizing traffic signs is essential for the automotive industry’s efforts in the field of driver assistance, and for many other traffic-related applications. The German traffic sign recognition benchmark (GTSRB) (Stallkamp, Schlipsing, Salmen, & Igel, 2011), a 43-class classification challenge, consisted of two phases: an online preliminary evaluation followed by an on-site final competition at the International Joint Conference on Neural Networks in 2011. We won the preliminary phase (Ciresan, Meier, Masci, & Schmidhuber, 2011b) using a committee of Multi-Layer Perceptrons (MLPs) trained on provided features, and a DNN trained on raw pixel intensities. Here we present the method that won the on-site competition using a MCDNN, instead of a committee of MLPs and a DNN. Our new approach no longer uses handcrafted features, relying only on the raw pixel images.
We first give a brief description of our MCDNN architecture, then describe the creation of the training set and the data preprocessing. We conclude by summarizing the results obtained during the on-site competition.
Multi-column deep neural networks
As a basic building block we use a deep hierarchical neural network that alternates convolutional with max-pooling layers, reminiscent of the classic work of Hubel and Wiesel (1962) and Wiesel and Hubel (1959) on the cat’s primary visual cortex, which identified orientation-selective simple cells with overlapping local receptive fields and complex cells performing down-sampling-like operations. Such architectures vary in how simple and complex cells are realized and how they are …
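The alternation of convolution (simple-cell-like filtering) and max-pooling (complex-cell-like down-sampling) can be sketched in plain NumPy. This is an illustrative, unoptimized forward pass for a single channel and a single filter, not the paper's GPU implementation:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most
    CNN implementations) of a single-channel image with one filter."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: keep the maximum of each
    size x size block, discarding any ragged border."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    return fmap[:H2*size, :W2*size].reshape(H2, size, W2, size).max(axis=(1, 3))

# One conv + pool stage on a toy 4x4 image with a 2x2 averof-ones filter:
fmap = conv2d_valid(np.ones((4, 4)), np.ones((2, 2)))   # shape (3, 3), all 4.0
pooled = max_pool(fmap)                                  # shape (1, 1)
```

In a real DNN column, several such stages are stacked, each with many learned filters, before fully connected layers produce the class probabilities.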
Experiments
We use a system with a Core i7-950 (3.33 GHz), 24 GB DDR3, and four graphics cards of type GTX 580. Images from the training set might be translated, scaled and rotated, whereas only the undeformed, original or preprocessed images are used for validation. Training ends once the validation error is zero (usually after 15–30 epochs). Initial weights are drawn from a uniform random distribution in the range [−0.05, 0.05]. Each neuron’s activation function is a scaled hyperbolic tangent (e.g. LeCun …
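The training details quoted above (uniform weight initialization in [−0.05, 0.05], scaled tanh activations, translation of training images) can be sketched as follows. The translation helper is an illustrative assumption: the exact deformation ranges and the scaling/rotation steps are not given in this excerpt.

```python
import numpy as np

def init_weights(shape, r=0.05, rng=None):
    """Draw initial weights uniformly from [-r, r], as stated in the text."""
    rng = rng or np.random.default_rng()
    return rng.uniform(-r, r, size=shape)

def scaled_tanh(x):
    """LeCun's scaled hyperbolic tangent, f(x) = 1.7159 * tanh(2x/3)."""
    return 1.7159 * np.tanh(2.0 * x / 3.0)

def translate(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, zero-padding the border.
    During training, dy/dx would be drawn at random per image; the
    shift range itself is an assumption, not the paper's value."""
    H, W = img.shape
    out = np.zeros_like(img)
    ys = slice(max(dy, 0), H + min(dy, 0))
    xs = slice(max(dx, 0), W + min(dx, 0))
    yd = slice(max(-dy, 0), H + min(-dy, 0))
    xd = slice(max(-dx, 0), W + min(-dx, 0))
    out[ys, xs] = img[yd, xd]
    return out
```

Because the deformations are applied online (fresh per epoch), the network effectively never sees exactly the same training image twice, which acts as a strong regularizer.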
Conclusion
Our MCDNN won the German traffic sign recognition benchmark with a recognition rate of 99.46%, better than that of humans on this task (98.84%), with three times fewer mistakes than the second-best competing algorithm (98.31%). Forming a MCDNN from 25 nets, 5 per preprocessing method, increases the recognition rate from an average of 98.52% to 99.46%. None of the preprocessing methods is superior in terms of single-DNN recognition rates, but combining them into a MCDNN increases robustness to …
Acknowledgment
This work was partially supported by an FP7-ICT-2009-6 EU Grant under Project Code 270247: A Neuro-dynamic Framework for Cognitive Robotics: Scene Representations, Behavioral Sequences, and Learning.
References (27)
- Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Computation.
- Ciresan, D. C., et al. (2011). Convolutional neural network committees for handwritten character classification.
- Ciresan, D. C., et al. (2011). Flexible, high performance convolutional neural networks for image classification.
- Ciresan, D. C., et al. (2011). A committee of neural networks for traffic sign classification.
- Duin, R. P. W. (2002). The combining classifier: to train or not to train?
- Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics.
- Hashem, S., & Schmeiser, B. (1995). Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks.
- Hochreiter, S., et al. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.
- Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems.
- Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. Journal of Physiology (London).
- Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research.