Deep learning in neural networks: An overview
Preface
This is the preprint of an invited Deep Learning (DL) overview. One of its goals is to assign credit to those who contributed to the present state of the art. I acknowledge the limitations of attempting to achieve this goal. The DL research community itself may be viewed as a continually evolving, deep network of scientists who have influenced each other in complex ways. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond.
Introduction to Deep Learning (DL) in Neural Networks (NNs)
Which modifiable components of a learning system are responsible for its success or failure? What changes to them improve performance? This has been called the fundamental credit assignment problem (Minsky, 1963). There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses (Section 6.8). The present survey, however, will focus on the narrower, but now commercially important, subfield of Deep Learning (DL) in Artificial Neural Networks (NNs).
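To make the credit assignment question concrete, here is a minimal sketch (my own illustration, not code from the survey; the toy model, its parameter values, and the finite-difference helper are invented): for a differentiable system, the partial derivative of the loss with respect to each modifiable parameter measures how responsible that component is for the current error.

```python
# Illustrative sketch (not from the survey): gradient-based credit assignment
# for a tiny differentiable model. d(loss)/d(w[i]) measures how responsible
# the modifiable component w[i] is for the current error.

def model(w, x):
    return w[0] * x + w[1]          # toy model: y = w0 * x + w1

def loss(w, x, target):
    return (model(w, x) - target) ** 2

def numeric_gradient(f, w, eps=1e-6):
    # Central finite differences give per-component credit signals.
    grads = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grads.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grads

w = [0.5, -0.2]
print(numeric_gradient(lambda w_: loss(w_, x=2.0, target=1.0), w))
# -> approximately [-0.8, -0.4]: w[0] gets twice the blame of w[1]
```

Gradient-based methods such as backpropagation (Section 5.5) compute exactly these per-component credit signals, analytically and far more efficiently.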
Event-oriented notation for activation spreading in NNs
Throughout this paper, let i, j, k, t, p, q, r denote positive integer variables assuming ranges implicit in the given contexts. Let n, m, T denote positive integer constants.
An NN’s topology may change over time (e.g., Sections 5.3, 5.6.3). At any given moment, it can be described as a finite subset of units (or nodes or neurons) N = {u_1, u_2, ...} and a finite set H ⊆ N × N of directed edges or connections between nodes. FNNs are acyclic graphs, RNNs cyclic. The first (input) layer is the set of input units, a subset of N.
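To illustrate this graph view, the following sketch (my own toy encoding, not the paper's notation) represents a network as a node set plus directed edges and reduces the FNN/RNN distinction to an acyclicity test:

```python
# A sketch under my own toy encoding (not the paper's notation): a network
# as a finite node set and directed edge set; FNN iff the graph is acyclic.
from collections import defaultdict, deque

def is_acyclic(nodes, edges):
    """Kahn's topological sort: True iff the directed graph has no cycle."""
    indegree = {u: 0 for u in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(u for u in nodes if indegree[u] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return visited == len(nodes)

nodes = {"in", "hidden", "out"}
fnn_edges = {("in", "hidden"), ("hidden", "out")}
rnn_edges = fnn_edges | {("hidden", "hidden")}   # recurrent self-connection
print(is_acyclic(nodes, fnn_edges))   # True  -> feedforward (FNN)
print(is_acyclic(nodes, rnn_edges))   # False -> recurrent (RNN)
```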
Depth of Credit Assignment Paths (CAPs) and of problems
To measure whether credit assignment in a given NN application is of the deep or shallow type, I introduce the concept of Credit Assignment Paths or CAPs, which are chains of possibly causal links between the events of Section 2, e.g., from input through hidden to output layers in FNNs, or through transformations over time in RNNs.
Let us first focus on SL. Consider two events x_p and x_q (1 ≤ p < q ≤ T). Depending on the application, they may have a Potential Direct Causal Connection (PDCC), expressed by the Boolean predicate pdcc(p, q).
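A minimal sketch of the resulting depth measure (the event indices and pdcc edge set below are invented for illustration; the survey defines the concept, not this code): if the PDCCs form a directed acyclic graph over events, the depth of credit assignment is the length of the longest chain of such links.

```python
# Invented pdcc edges and event indices, for illustration only: CAP depth
# as the longest chain of Potential Direct Causal Connections (a DAG).
from functools import lru_cache

pdcc = {(1, 2), (2, 3), (1, 3), (3, 4)}   # (p, q): event p may cause event q
events = {1, 2, 3, 4}

@lru_cache(maxsize=None)
def cap_depth(q):
    """Length of the longest PDCC chain ending at event q."""
    predecessors = [p for (p, r) in pdcc if r == q]
    if not predecessors:
        return 0
    return 1 + max(cap_depth(p) for p in predecessors)

print(max(cap_depth(q) for q in events))   # 3: the chain 1 -> 2 -> 3 -> 4
```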
Dynamic programming for Supervised/Reinforcement Learning (SL/RL)
One recurring theme of DL is Dynamic Programming (DP) (Bellman, 1957), which can help to facilitate credit assignment under certain assumptions. For example, in SL NNs, backpropagation itself can be viewed as a DP-derived method (Section 5.5). In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth (Section 6.2). DP algorithms are also essential for systems that combine concepts of NNs and graphical models, such as Hidden Markov Models (HMMs).
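As a hedged illustration of the DP theme (classic value iteration on an invented toy MDP, not anything specific to the survey): under Markovian assumptions, the Bellman optimality equation lets values be computed by iterating local one-step updates, so credit never has to be propagated along full deep paths at once.

```python
# A hedged sketch of the DP theme: value iteration on an invented toy MDP
# (deterministic 3-state chain). Under Markovian assumptions the Bellman
# update needs only one-step lookahead, regardless of problem depth.
states = [0, 1, 2]            # state 2 is terminal
actions = ["stay", "go"]
gamma = 0.9                   # discount factor

def step(s, a):
    """Deterministic toy transition: 'go' advances toward the goal."""
    s_next = min(s + 1, 2) if a == "go" else s
    reward = 1.0 if (s_next == 2 and s != 2) else 0.0
    return s_next, reward

V = {s: 0.0 for s in states}
for _ in range(100):          # iterate the Bellman optimality equation
    V = {s: 0.0 if s == 2 else
            max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
         for s in states}
print(V)                      # {0: 0.9, 1: 1.0, 2: 0.0}
```

Each update looks only one step ahead, which is how DP-derived methods reduce the effective problem depth mentioned above.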
Supervised NNs, some helped by unsupervised NNs
The main focus of current practical applications is on Supervised Learning (SL), which has dominated recent pattern recognition contests:
- Section 5.17, 2009: first official competitions won by RNNs, and with MPCNNs
- Section 5.18, 2010: plain backprop (+ distortions) on GPU breaks the MNIST record
- Section 5.19, 2011: MPCNNs on GPU achieve superhuman vision performance
- Section 5.20, 2011: Hessian-free optimization for RNNs
- Section 5.21, 2012: first contests won on ImageNet, object detection, segmentation
- Section 5.22, 2013-: more contests and benchmark records
DL in FNNs and RNNs for Reinforcement Learning (RL)
So far we have focused on Deep Learning (DL) in supervised or unsupervised NNs. Such NNs learn to perceive/encode/predict/classify patterns or pattern sequences, but they do not learn to act in the more general sense of Reinforcement Learning (RL) in unknown environments (see surveys, e.g., Kaelbling et al., 1996, Sutton and Barto, 1998, Wiering and van Otterlo, 2012). Here we add a discussion of DL FNNs and RNNs for RL. It will be shorter than the discussion of FNNs and RNNs for SL and UL (Section 5).
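For contrast with the DP sketch of Section 4, here is a minimal model-free RL example (generic tabular Q-learning on an invented chain environment, a textbook method rather than one attributed to the survey): the learner never sees the transition model and must assign credit to actions purely from sampled interaction.

```python
# A generic textbook sketch, not a method from the survey: tabular
# Q-learning on an invented chain environment. Unlike the DP example,
# the learner never sees the transition model; it acts, observes, and
# assigns credit to actions from sampled experience alone.
import random

def env_step(s, a):            # hidden from the learner
    s_next = min(s + a, 2)     # action 1 = advance, action 0 = stay
    done = (s_next == 2)
    return s_next, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    for _ in range(50):                           # cap episode length
        if random.random() < eps:                 # explore
            a = random.choice((0, 1))
        else:                                     # exploit current estimate
            a = max((0, 1), key=lambda a_: Q[(s, a_)])
        s_next, r, done = env_step(s, a)
        best_next = 0.0 if done else max(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        if done:
            break
        s = s_next

print(max((0, 1), key=lambda a: Q[(0, a)]))       # 1: learned to advance
```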
Conclusion and outlook
Deep Learning (DL) in Neural Networks (NNs) is relevant for Supervised Learning (SL) (Section 5), Unsupervised Learning (UL) (Section 5), and Reinforcement Learning (RL) (Section 6). By alleviating problems with deep Credit Assignment Paths (CAPs, Sections 3, 5.9), UL (Section 5.6.4) can not only facilitate SL of sequences (Section 5.10) and stationary patterns (Sections 5.7, 5.15) but also RL (Sections 6.4, 4.2). Dynamic Programming (DP, Section 4.1) is important for both deep SL (Section 5.5) and traditional RL with deep NNs (Section 6.2).
Acknowledgments
Since 16 April 2014, drafts of this paper have undergone massive open online peer review through public mailing lists, including [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], and the Google+ machine learning forum. Thanks to numerous NN/DL experts for valuable comments, and to SNF, DFG, and the European Commission for partially funding my DL research group over the past quarter-century.