2012 Special Issue
Multi-column deep neural network for traffic sign classification
Introduction
The human visual system efficiently recognizes and localizes objects within cluttered scenes. For artificial systems, however, this is still difficult, due to viewpoint-dependent object variability, and the high in-class variability of many object types. Deep hierarchical neural models roughly mimic the nature of mammalian visual cortex, and are among the most promising architectures for such tasks. The most successful hierarchical object recognition systems all extract localized features from input images, convolving image patches with filters. Filter responses are then repeatedly pooled and re-filtered, resulting in a deep feed-forward network architecture whose output feature vectors are eventually classified. One of the first hierarchical neural systems was the Neocognitron by Fukushima (1980), which inspired many of the more recent variants.
Unsupervised learning methods applied to patches of natural images tend to produce localized filters that resemble off-center-on-surround filters, orientation-sensitive bar detectors, and Gabor filters (Hoyer and Hyvärinen, 2000, Olshausen and Field, 1997, Schmidhuber et al., 1996). These findings, in conjunction with experimental studies of the visual cortex, justify the use of such filters in the so-called standard model for object recognition (Mutch and Lowe, 2008, Riesenhuber and Poggio, 1999, Serre et al., 2005), whose filters are fixed, in contrast to those of Convolutional Neural Networks (CNNs) (Behnke, 2003, LeCun et al., 1998, Simard et al., 2003), whose weights (filters) are randomly initialized and learned in a supervised way using back-propagation (BP). A DNN, the basic building block of our proposed MCDNN, is a hierarchical deep neural network, alternating convolutional with max-pooling layers (Riesenhuber and Poggio, 1999, Scherer et al., 2010, Serre et al., 2005). A single DNN of our team won the offline Chinese character recognition competition (Liu, Yin, Wang, & Wang, 2011), a classification problem with 3755 classes. Ciresan, Meier, Gambardella, and Schmidhuber (2011) report state-of-the-art results on isolated handwritten character recognition using a MCDNN with 7 columns. Meier, Ciresan, Gambardella, and Schmidhuber (2011) show that there is no need to optimize the combination of different DNNs: simply averaging their outputs generalizes just as well or even better on the unseen test set.
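The averaging scheme of Meier et al. mentioned above is simple enough to sketch directly. The following minimal NumPy illustration (function and variable names are my own, not from the paper) averages the class-probability vectors of several DNN columns and picks the winning class:

```python
import numpy as np

def mcdnn_predict(column_outputs):
    """Combine several DNN 'columns' by simply averaging their
    class-probability vectors, then return the winning class index.
    Illustrative sketch of the averaging scheme described in the text."""
    avg = np.mean(np.asarray(column_outputs), axis=0)
    return int(np.argmax(avg))

# Three hypothetical columns voting over four classes:
columns = [
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.30, 0.30, 0.30, 0.10],
]
print(mcdnn_predict(columns))  # class 1 wins the averaged vote
```

No weights or trained combiner are involved; each column contributes equally, which is precisely why no combination optimization is needed.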
Despite the hardware progress of the past decades, computational speed is still a limiting factor for deep architectures characterized by many building blocks. For our experiments we therefore rely on a fast implementation on Graphics Processing Units (GPUs) (Ciresan, Meier, Masci, Gambardella, & Schmidhuber, 2011a). Our implementation is flexible and fully online (i.e., weight updates after each image). It allows for training large DNNs within days instead of months, thus making MCDNNs feasible.
Recognizing traffic signs is essential for the automotive industry’s efforts in the field of driver assistance, and for many other traffic-related applications. The German traffic sign recognition benchmark (GTSRB) (Stallkamp, Schlipsing, Salmen, & Igel, 2011), a 43-class classification challenge, consisted of two phases: an online preliminary evaluation followed by an on-site final competition at the International Joint Conference on Neural Networks in 2011. We won the preliminary phase (Ciresan, Meier, Masci, & Schmidhuber, 2011b) using a committee of Multi-Layer Perceptrons (MLPs) trained on provided features, and a DNN trained on raw pixel intensities. Here we present the method that won the on-site competition using a MCDNN, instead of a committee of MLPs and a DNN. Our new approach no longer uses handcrafted features, relying only on the raw pixel images.
We first give a brief description of our MCDNN architecture, then describe the creation of the training set and the data preprocessing. We conclude by summarizing the results obtained during the on-site competition.
Multi-column deep neural networks
As a basic building block we use a deep hierarchical neural network that alternates convolutional with max-pooling layers, reminiscent of the classic work of Hubel and Wiesel (1962) and Wiesel and Hubel (1959) on the cat’s primary visual cortex, which identified orientation-selective simple cells with overlapping local receptive fields and complex cells performing down-sampling-like operations. Such architectures vary in how simple and complex cells are realized and how they are …
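The alternation of convolution (simple-cell-like filtering) and max-pooling (complex-cell-like down-sampling) can be sketched in plain NumPy. This is an illustrative, unoptimized forward pass for a single channel and a single filter, not the paper's GPU implementation:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most
    CNN implementations) of a single-channel image with one filter."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: keep the maximum of each
    size x size block, discarding any ragged border."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    return fmap[:H2*size, :W2*size].reshape(H2, size, W2, size).max(axis=(1, 3))

# One conv + pool stage on a toy 4x4 image with a 2x2 averof-ones filter:
fmap = conv2d_valid(np.ones((4, 4)), np.ones((2, 2)))   # shape (3, 3), all 4.0
pooled = max_pool(fmap)                                  # shape (1, 1)
```

In a real DNN column, several such stages are stacked, each with many learned filters, before fully connected layers produce the class probabilities.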
Experiments
We use a system with a Core i7-950 (3.33 GHz), 24 GB DDR3, and four graphics cards of type GTX 580. Images from the training set might be translated, scaled and rotated, whereas only the undeformed, original or preprocessed images are used for validation. Training ends once the validation error is zero (usually after 15–30 epochs). Initial weights are drawn from a uniform random distribution in the range [−0.05, 0.05]. Each neuron’s activation function is a scaled hyperbolic tangent (e.g. LeCun …
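The training details quoted above (uniform weight initialization in [−0.05, 0.05], scaled tanh activations, translation of training images) can be sketched as follows. The translation helper is an illustrative assumption: the exact deformation ranges and the scaling/rotation steps are not given in this excerpt.

```python
import numpy as np

def init_weights(shape, r=0.05, rng=None):
    """Draw initial weights uniformly from [-r, r], as stated in the text."""
    rng = rng or np.random.default_rng()
    return rng.uniform(-r, r, size=shape)

def scaled_tanh(x):
    """LeCun's scaled hyperbolic tangent, f(x) = 1.7159 * tanh(2x/3)."""
    return 1.7159 * np.tanh(2.0 * x / 3.0)

def translate(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, zero-padding the border.
    During training, dy/dx would be drawn at random per image; the
    shift range itself is an assumption, not the paper's value."""
    H, W = img.shape
    out = np.zeros_like(img)
    ys = slice(max(dy, 0), H + min(dy, 0))
    xs = slice(max(dx, 0), W + min(dx, 0))
    yd = slice(max(-dy, 0), H + min(-dy, 0))
    xd = slice(max(-dx, 0), W + min(-dx, 0))
    out[ys, xs] = img[yd, xd]
    return out
```

Because the deformations are applied online (fresh per epoch), the network effectively never sees exactly the same training image twice, which acts as a strong regularizer.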
Conclusion
Our MCDNN won the German traffic sign recognition benchmark with a recognition rate of 99.46%, better than that of humans on this task (98.84%), with three times fewer mistakes than the second-best competing algorithm (98.31%). Forming a MCDNN from 25 nets, 5 per preprocessing method, increases the recognition rate from an average of 98.52% to 99.46%. None of the preprocessing methods is superior in terms of single-DNN recognition rates, but combining them into a MCDNN increases robustness to …
Acknowledgment
This work was partially supported by an FP7-ICT-2009-6 EU Grant under Project Code 270247: A Neuro-dynamic Framework for Cognitive Robotics: Scene Representations, Behavioral Sequences, and Learning.
References (27)
- Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Computation.
- Ciresan, D. C., et al. (2011). Convolutional neural network committees for handwritten character classification.
- Ciresan, D. C., et al. (2011). Flexible, high performance convolutional neural networks for image classification.
- Ciresan, D. C., et al. (2011). A committee of neural networks for traffic sign classification.
- Duin, R. P. W. (2002). The combining classifier: to train or not to train?
- Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics.
- Hashem, S., & Schmeiser, B. (1995). Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks.
- Hochreiter, S., et al. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.
- Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems.
- Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. Journal of Physiology (London).
- Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research.