Deep learning in neural networks: An overview
Preface
This is the preprint of an invited Deep Learning (DL) overview. One of its goals is to assign credit to those who contributed to the present state of the art. I acknowledge the limitations of attempting to achieve this goal. The DL research community itself may be viewed as a continually evolving, deep network of scientists who have influenced each other in complex ways. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond.
Introduction to Deep Learning (DL) in Neural Networks (NNs)
Which modifiable components of a learning system are responsible for its success or failure? What changes to them improve performance? This has been called the fundamental credit assignment problem (Minsky, 1963). There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses (Section 6.8). The present survey, however, will focus on the narrower, but now commercially important, subfield of Deep Learning (DL) in Artificial Neural Networks (NNs).
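To make the credit assignment question concrete, here is a minimal sketch (my own illustration, not code from the survey; the toy model, its parameter values, and the finite-difference helper are invented): for a differentiable system, the partial derivative of the loss with respect to each modifiable parameter measures how responsible that component is for the current error.

```python
# Illustrative sketch (not from the survey): gradient-based credit assignment
# for a tiny differentiable model. d(loss)/d(w[i]) measures how responsible
# the modifiable component w[i] is for the current error.

def model(w, x):
    return w[0] * x + w[1]          # toy model: y = w0 * x + w1

def loss(w, x, target):
    return (model(w, x) - target) ** 2

def numeric_gradient(f, w, eps=1e-6):
    # Central finite differences give per-component credit signals.
    grads = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grads.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grads

w = [0.5, -0.2]
print(numeric_gradient(lambda w_: loss(w_, x=2.0, target=1.0), w))
# -> approximately [-0.8, -0.4]: w[0] gets twice the blame of w[1]
```

Gradient-based methods such as backpropagation (Section 5.5) compute exactly these per-component credit signals, analytically and far more efficiently.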
Event-oriented notation for activation spreading in NNs
Throughout this paper, let i, j, k, t, p, q, r denote positive integer variables assuming ranges implicit in the given contexts. Let n, m, T denote positive integer constants.
An NN’s topology may change over time (e.g., Sections 5.3, 5.6.3). At any given moment, it can be described as a finite subset of units (or nodes or neurons) N = {u_1, u_2, ...} and a finite set H ⊆ N × N of directed edges or connections between nodes. FNNs are acyclic graphs, RNNs cyclic. The first (input) layer is the set of input units, a subset of N.
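To illustrate this graph view, the following sketch (my own toy encoding, not the paper's notation) represents a network as a node set plus directed edges and reduces the FNN/RNN distinction to an acyclicity test:

```python
# A sketch under my own toy encoding (not the paper's notation): a network
# as a finite node set and directed edge set; FNN iff the graph is acyclic.
from collections import defaultdict, deque

def is_acyclic(nodes, edges):
    """Kahn's topological sort: True iff the directed graph has no cycle."""
    indegree = {u: 0 for u in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(u for u in nodes if indegree[u] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return visited == len(nodes)

nodes = {"in", "hidden", "out"}
fnn_edges = {("in", "hidden"), ("hidden", "out")}
rnn_edges = fnn_edges | {("hidden", "hidden")}   # recurrent self-connection
print(is_acyclic(nodes, fnn_edges))   # True  -> feedforward (FNN)
print(is_acyclic(nodes, rnn_edges))   # False -> recurrent (RNN)
```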
Depth of Credit Assignment Paths (CAPs) and of problems
To measure whether credit assignment in a given NN application is of the deep or shallow type, I introduce the concept of Credit Assignment Paths or CAPs, which are chains of possibly causal links between the events of Section 2, e.g., from input through hidden to output layers in FNNs, or through transformations over time in RNNs.
Let us first focus on SL. Consider two events x_p and x_q (1 ≤ p < q ≤ T). Depending on the application, they may have a Potential Direct Causal Connection (PDCC), expressed by the Boolean predicate pdcc(p, q).
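A minimal sketch of the resulting depth measure (the event indices and pdcc edge set below are invented for illustration; the survey defines the concept, not this code): if the PDCCs form a directed acyclic graph over events, the depth of credit assignment is the length of the longest chain of such links.

```python
# Invented pdcc edges and event indices, for illustration only: CAP depth
# as the longest chain of Potential Direct Causal Connections (a DAG).
from functools import lru_cache

pdcc = {(1, 2), (2, 3), (1, 3), (3, 4)}   # (p, q): event p may cause event q
events = {1, 2, 3, 4}

@lru_cache(maxsize=None)
def cap_depth(q):
    """Length of the longest PDCC chain ending at event q."""
    predecessors = [p for (p, r) in pdcc if r == q]
    if not predecessors:
        return 0
    return 1 + max(cap_depth(p) for p in predecessors)

print(max(cap_depth(q) for q in events))   # 3: the chain 1 -> 2 -> 3 -> 4
```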
Dynamic programming for Supervised/Reinforcement Learning (SL/RL)
One recurring theme of DL is Dynamic Programming (DP) (Bellman, 1957), which can help to facilitate credit assignment under certain assumptions. For example, in SL NNs, backpropagation itself can be viewed as a DP-derived method (Section 5.5). In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth (Section 6.2). DP algorithms are also essential for systems that combine concepts of NNs and graphical models, such as Hidden Markov Models (HMMs).
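As a hedged illustration of the DP theme (classic value iteration on an invented toy MDP, not anything specific to the survey): under Markovian assumptions, the Bellman optimality equation lets values be computed by iterating local one-step updates, so credit never has to be propagated along full deep paths at once.

```python
# A hedged sketch of the DP theme: value iteration on an invented toy MDP
# (deterministic 3-state chain). Under Markovian assumptions the Bellman
# update needs only one-step lookahead, regardless of problem depth.
states = [0, 1, 2]            # state 2 is terminal
actions = ["stay", "go"]
gamma = 0.9                   # discount factor

def step(s, a):
    """Deterministic toy transition: 'go' advances toward the goal."""
    s_next = min(s + 1, 2) if a == "go" else s
    reward = 1.0 if (s_next == 2 and s != 2) else 0.0
    return s_next, reward

V = {s: 0.0 for s in states}
for _ in range(100):          # iterate the Bellman optimality equation
    V = {s: 0.0 if s == 2 else
            max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
         for s in states}
print(V)                      # {0: 0.9, 1: 1.0, 2: 0.0}
```

Each update looks only one step ahead, which is how DP-derived methods reduce the effective problem depth mentioned above.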
Supervised NNs, some helped by unsupervised NNs
The main focus of current practical applications is on Supervised Learning (SL), which has dominated recent pattern recognition contests:
- Section 5.17, 2009: first official competitions won by RNNs, and with MPCNNs
- Section 5.18, 2010: plain backprop (+ distortions) on GPU breaks the MNIST record
- Section 5.19, 2011: MPCNNs on GPU achieve superhuman vision performance
- Section 5.20, 2011: Hessian-free optimization for RNNs
- Section 5.21, 2012: first contests won on ImageNet, object detection, segmentation
- Section 5.22, 2013-: more contests and benchmark records
DL in FNNs and RNNs for Reinforcement Learning (RL)
So far we have focused on Deep Learning (DL) in supervised or unsupervised NNs. Such NNs learn to perceive/encode/predict/classify patterns or pattern sequences, but they do not learn to act in the more general sense of Reinforcement Learning (RL) in unknown environments (see surveys, e.g., Kaelbling et al., 1996, Sutton and Barto, 1998, Wiering and van Otterlo, 2012). Here we add a discussion of DL FNNs and RNNs for RL. It will be shorter than the discussion of FNNs and RNNs for SL and UL (Section 5).
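For contrast with the DP sketch of Section 4, here is a minimal model-free RL example (generic tabular Q-learning on an invented chain environment, a textbook method rather than one attributed to the survey): the learner never sees the transition model and must assign credit to actions purely from sampled interaction.

```python
# A generic textbook sketch, not a method from the survey: tabular
# Q-learning on an invented chain environment. Unlike the DP example,
# the learner never sees the transition model; it acts, observes, and
# assigns credit to actions from sampled experience alone.
import random

def env_step(s, a):            # hidden from the learner
    s_next = min(s + a, 2)     # action 1 = advance, action 0 = stay
    done = (s_next == 2)
    return s_next, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    for _ in range(50):                           # cap episode length
        if random.random() < eps:                 # explore
            a = random.choice((0, 1))
        else:                                     # exploit current estimate
            a = max((0, 1), key=lambda a_: Q[(s, a_)])
        s_next, r, done = env_step(s, a)
        best_next = 0.0 if done else max(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        if done:
            break
        s = s_next

print(max((0, 1), key=lambda a: Q[(0, a)]))       # 1: learned to advance
```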
Conclusion and outlook
Deep Learning (DL) in Neural Networks (NNs) is relevant for Supervised Learning (SL) (Section 5), Unsupervised Learning (UL) (Section 5), and Reinforcement Learning (RL) (Section 6). By alleviating problems with deep Credit Assignment Paths (CAPs, Sections 3, 5.9), UL (Section 5.6.4) can not only facilitate SL of sequences (Section 5.10) and stationary patterns (Sections 5.7, 5.15) but also RL (Sections 6.4, 4.2). Dynamic Programming (DP, Section 4.1) is important for both deep SL (Section 5.5) and traditional RL with deep NNs (Section 6.2).
Acknowledgments
Since 16 April 2014, drafts of this paper have undergone massive open online peer review through public mailing lists, including [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], and the Google+ machine learning forum. Thanks to numerous NN/DL experts for valuable comments, and to SNF, DFG, and the European Commission for partially funding my DL research group over the past quarter-century.