2002 Special issue
Metalearning and neuromodulation
Introduction
Neurotransmitters that have spatially distributed, temporally extended effects on recipient neurons and circuits are called neuromodulators (Katz, 1999, Saper, 2000, Marder and Thirumalai, 2002). The best known examples are dopamine (DA), serotonin (5-HT), noradrenaline (NA; also called norepinephrine, NE), and acetylcholine (ACh). Neuromodulators have traditionally been assumed to be involved in the control of general arousal (Robbins, 1997, Saper, 2000). Recent advances in molecular biological techniques have provided rich data on the spatial localization and physiological effects of different neuromodulators and their receptors. These data prompted us to build a more specific yet still comprehensive theory of the functions of neuromodulators. This paper proposes a computational theory of the roles of these four major neuromodulators, from the viewpoint that neuromodulators are the media for signaling specific global variables and parameters that regulate distributed learning modules in the brain (Doya, 2000b).
The computational theory for the acquisition of goal-directed behaviors has been formulated under the name of reinforcement learning (RL) (Barto, 1995b, Sutton and Barto, 1998, Doya, 2000c, Doya et al., 2001). The theory has been successfully applied to a variety of dynamic optimization problems, such as game programs (Tesauro, 1994), robotic control (Morimoto & Doya, 2001), and resource allocation (Singh & Bertsekas, 1997). In practical applications of reinforcement learning theory, a critical issue is how to set the parameters of the learning algorithms, such as the speed of learning, the magnitude of noise for exploration, and the time scale of future reward prediction. Because such parameters globally affect the way many system parameters change through learning, they are called metaparameters or hyperparameters.
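To make the notion of metaparameters concrete, here is a minimal tabular Q-learning sketch (ours, for illustration; the toy environment sizes and the specific values of the constants are assumptions, not taken from the paper) in which the three metaparameters named above each appear as a single global knob:

```python
import numpy as np

# Three global metaparameters of a simple reinforcement learner:
ALPHA = 0.1   # learning rate: speed of learning
BETA = 2.0    # inverse temperature: controls the size of exploration noise
GAMMA = 0.9   # discount factor: time scale of future reward prediction

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # action value table

def softmax_action(q_values):
    """Select an action stochastically; larger BETA -> greedier choice."""
    p = np.exp(BETA * (q_values - q_values.max()))
    p /= p.sum()
    return rng.choice(len(q_values), p=p)

def q_update(s, a, r, s_next):
    """One-step Q-learning update; ALPHA and GAMMA shape the change."""
    td_error = r + GAMMA * Q[s_next].max() - Q[s, a]
    Q[s, a] += ALPHA * td_error
    return td_error
```

Tuning ALPHA, BETA, and GAMMA changes how every entry of `Q` evolves, which is exactly what distinguishes metaparameters from ordinary learned parameters.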
In statistical learning theory, the need for setting the right metaparameters, such as the degrees of freedom of statistical models and the prior distribution of parameters, is widely recognized. Theories of metaparameter setting have been developed from the viewpoints of risk minimization (Vapnik, 2000) and Bayesian estimation (Neal, 1996). However, many applications of reinforcement learning have depended on heuristic search by human experts to set the right metaparameters. The need for tuning metaparameters is one of the major reasons why sophisticated learning algorithms, which perform successfully in the laboratory, cannot be practically applied in highly variable environments at home or on the street.
Compared to current artificial learning systems, the learning mechanisms implemented in the brain appear to be much more robust and flexible. Humans and animals can learn novel behaviors under a wide variety of environments. This suggests that the brain has a certain mechanism for metalearning, a capability of dynamically adjusting its own metaparameters of learning. This paper presents a hypothesis stating that the ascending neuromodulatory systems (Fig. 1) are the media of metalearning for controlling and coordinating the distributed learning modules in the brain (Doya, 1999). More specifically, we propose the following set of hypotheses to explain the roles of the four major ascending neuromodulators (Doya, 2000b):
- 1.
Dopamine represents the global learning signal for prediction of rewards and reinforcement of actions.
- 2.
Serotonin controls the balance between short-term and long-term prediction of reward.
- 3.
Noradrenaline controls the balance between wide exploration and focused execution.
- 4.
Acetylcholine controls the balance between memory storage and renewal.
In order to state the above hypotheses in a more computationally well-defined manner, we first review the basic algorithms of reinforcement learning and the roles of major metaparameters. We then propose a set of hypotheses on how such metaparameters are regulated by the above neuromodulators. Finally, we discuss the possible neural mechanisms of metaparameter control and the possible interactions between neuromodulatory systems predicted from the hypotheses.
In this paper, our main focus is on the roles of neuromodulators within the circuit of the basal ganglia, which has been suggested as the major locus of reinforcement learning (Houk et al., 1995, Montague et al., 1996, Doya, 2000a). However, we also discuss how their roles can be generalized to other brain areas, including the cerebral cortex and the cerebellum.
Reinforcement learning algorithm
Reinforcement learning is a computational framework in which an agent learns to take actions in response to the state of the environment so that the acquired reward is maximized in the long run (Fig. 2) (Barto, 1995b, Sutton and Barto, 1998, Doya, 2000c, Doya et al., 2001). What makes reinforcement learning difficult yet interesting is that the selection of an action affects not only the immediate reward but also future rewards, through the dynamic evolution of future states.
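The standard discrete-time formulation can be sketched as follows (a common textbook summary, using the same symbols δ, γ, β, and α that appear in the hypotheses below; not a verbatim reproduction of the paper's equations):

```latex
% Value of a state: expected discounted sum of future rewards
V(s(t)) = E\!\left[\, r(t+1) + \gamma\, r(t+2) + \gamma^{2}\, r(t+3) + \cdots \,\right]

% Temporal-difference (TD) error
\delta(t) = r(t+1) + \gamma\, V(s(t+1)) - V(s(t))

% Value update with learning rate alpha
V(s(t)) \leftarrow V(s(t)) + \alpha\, \delta(t)

% Softmax (Boltzmann) action selection with inverse temperature beta
P(a \mid s) = \frac{\exp\bigl(\beta\, Q(s,a)\bigr)}{\sum_{a'} \exp\bigl(\beta\, Q(s,a')\bigr)}
```

Here γ sets the time scale of reward prediction, α the speed of learning, and β the sharpness of action selection; δ is the global learning signal that drives the value update.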
Hypothetical roles of neuromodulators
Now we restate our hypotheses on the roles of neuromodulators in terms of the global learning signal and metaparameters introduced in the above reinforcement learning algorithm (Doya, 2000b):
- 1.
Dopamine signals the TD error δ.
- 2.
Serotonin controls the discount factor γ.
- 3.
Noradrenaline controls the inverse temperature β.
- 4.
Acetylcholine controls the learning rate α.
Below, we review the experimental findings and theoretical models that support these hypotheses.
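Hypothesis 2 can be made concrete with a small numerical sketch (ours, not the paper's; the reward magnitudes and delays are arbitrary): the discount factor γ determines whether a learner prefers a small immediate reward or a large delayed one.

```python
# Illustrative check of hypothesis 2: the discount factor gamma (proposed
# to be controlled by serotonin) trades off short- vs long-term reward.
def discounted_value(reward, delay, gamma):
    """Present value of a reward received after `delay` time steps."""
    return (gamma ** delay) * reward

# Compare an immediate small reward (2.0 now) against a delayed large
# one (10.0 after 5 steps) under a short and a long time scale.
for gamma in (0.5, 0.99):
    immediate = discounted_value(2.0, delay=0, gamma=gamma)
    delayed = discounted_value(10.0, delay=5, gamma=gamma)
    choice = "delayed" if delayed > immediate else "immediate"
    print(f"gamma={gamma}: prefers the {choice} reward")
```

With γ = 0.5 the delayed reward is worth only 0.5⁵ × 10 ≈ 0.31 and the immediate reward wins; with γ = 0.99 the delayed reward retains most of its value and wins, which is the short-term/long-term balance the serotonin hypothesis refers to.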
Dynamic interactions of neuromodulators
Based on the above hypotheses on the specific roles of neuromodulators in reinforcement learning, it is possible to theoretically predict how the activities of those modulators should depend on each other. Fig. 9 shows the possible interactions between the neuromodulators, the experience of the agent represented in the form of value functions, and the environment.
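One such dependency, the exploration/execution balance attributed to noradrenaline (hypothesis 3), can be illustrated with a softmax action selector (a generic sketch with arbitrary action values, not a model from the paper):

```python
import math

# Illustrative check of hypothesis 3: the inverse temperature beta
# (proposed to be controlled by noradrenaline) shifts action selection
# between wide exploration (low beta) and focused execution (high beta).
def softmax(q_values, beta):
    """Probability of each action under Boltzmann selection."""
    exps = [math.exp(beta * q) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

q = [1.0, 0.5, 0.0]  # arbitrary action values; action 0 is the best
for beta in (0.1, 10.0):
    p = softmax(q, beta)
    print(f"beta={beta}: P(best action)={p[0]:.2f}")
```

At low β the three actions are chosen almost uniformly (exploration); at high β nearly all probability mass concentrates on the best action (execution).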
Conclusion
This paper proposed a unified theory of the roles of neuromodulators in mediating the global learning signal and the metaparameters of the distributed learning mechanisms of the brain. We considered how such regulatory mechanisms could be implemented in the neural circuit centered around the basal ganglia. However, many other brain areas and functions require further consideration, for example, the roles of the amygdala and hippocampus in reinforcement learning.
Acknowledgements
The author is grateful to Peter Dayan, Barry Everitt, Takeshi Inoue, Sham Kakade, Go Okada, Yasumasa Okamoto, and Shigeto Yamawaki for valuable discussions on the roles of serotonin and also thanks Nicolas Schweighofer for his comments on the manuscript.
References (83)
- Serotonin receptors in cognitive behaviors. Current Opinion in Neurobiology (1997)
- Opponent interactions between serotonin and dopamine. Neural Networks (2002)
- What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex? Neural Networks (1999)
- Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology (2000)
- The computational role of dopamine D1 receptors in working memory. Neural Networks (2002)
- Effects of combined or separate 5,7-dihydroxytryptamine lesions of the dorsal and median raphe nuclei on responding maintained by a DRL 20 s schedule of food reinforcement. Brain Research (1995)
- Simplified dynamics in a model of noradrenergic modulation of cognitive performance. Neural Networks (2002)
- A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks (1990)
- Acetylcholine and memory. Trends in Neurosciences (1993)
- Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience (2000)