2002 Special issue
Metalearning and neuromodulation
Introduction
Neurotransmitters that have spatially distributed, temporally extended effects on the recipient neurons and circuits are called neuromodulators (Katz, 1999, Saper, 2000, Marder and Thirumalai, 2002). The best known examples of neuromodulators are dopamine (DA), serotonin (5-HT), noradrenaline (NA; also called norepinephrine, NE), and acetylcholine (ACh). Neuromodulators are traditionally assumed to be involved in the control of general arousal (Robbins, 1997, Saper, 2000). Recent advances in molecular biological techniques have provided rich data on the spatial localization and physiological effects of different neuromodulators and their receptors. This prompted us to build a more specific yet still comprehensive theory of the functions of neuromodulators. This paper proposes a computational theory of the roles of these four major neuromodulators from the viewpoint that neuromodulators are media for signaling specific global variables and parameters that regulate distributed learning modules in the brain (Doya, 2000b).
The computational theory for acquisition of goal-directed behaviors has been formulated under the name of reinforcement learning (RL) (Barto, 1995b, Sutton and Barto, 1998, Doya, 2000c, Doya et al., 2001). The theory has been successfully applied to a variety of dynamic optimization problems, such as game programs (Tesauro, 1994), robotic control (Morimoto & Doya, 2001), and resource allocation (Singh & Bertsekas, 1997). In practical applications of reinforcement learning theory, a critical issue is how to set the parameters of the learning algorithms, such as the speed of learning, the size of noise for exploration, and the time scale in prediction of future reward. Such parameters globally affect the way many system parameters change by learning, so they are called metaparameters or hyperparameters.
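As a rough illustration of why a metaparameter such as the speed of learning matters, the following minimal sketch tracks a reward estimate with an error-driven update. The function name `running_estimate`, the step size `alpha`, and the constant reward sequence are all hypothetical choices made for this example, not part of the original text.

```python
from typing import Iterable

def running_estimate(rewards: Iterable[float], alpha: float, initial: float = 0.0) -> float:
    """Track the mean reward with an error-driven update of step size alpha."""
    estimate = initial
    for r in rewards:
        # Move the estimate a fraction alpha of the way toward the new sample.
        estimate += alpha * (r - estimate)
    return estimate

# Hypothetical constant reward of 1.0 delivered for three steps.
rewards = [1.0, 1.0, 1.0]
fast = running_estimate(rewards, alpha=0.5)  # large step: approaches 1 quickly
slow = running_estimate(rewards, alpha=0.1)  # small step: still far from 1
print(fast, slow)
```

A large step size adapts quickly but would also follow noise closely, whereas a small one is stable but slow. The same trade-off applies to every parameter the update touches, which is why the setting is treated as a metaparameter.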
In statistical learning theory, the need for setting the right metaparameters, such as the degrees of freedom of statistical models and the prior distribution of parameters, is widely recognized. Theories of metaparameter setting have been developed from the viewpoints of risk-minimization (Vapnik, 2000) and Bayesian estimation (Neal, 1996). However, many applications of reinforcement learning have depended on heuristic search by human experts to set the right metaparameters. The need for the tuning of metaparameters is one of the major reasons why sophisticated learning algorithms, which perform successfully in the laboratory, cannot be practically applied in highly variable environments at home or on the street.
Compared to current artificial learning systems, the learning mechanisms implemented in the brain appear to be much more robust and flexible. Humans and animals can learn novel behaviors in a wide variety of environments. This suggests that the brain has some mechanism for metalearning, a capability of dynamically adjusting its own metaparameters of learning. This paper presents a hypothesis stating that the ascending neuromodulatory systems (Fig. 1) are the media of metalearning for controlling and coordinating the distributed learning modules in the brain (Doya, 1999). More specifically, we propose the following set of hypotheses to explain the roles of the four major ascending neuromodulators (Doya, 2000b):
1. Dopamine represents the global learning signal for prediction of rewards and reinforcement of actions.
2. Serotonin controls the balance between short-term and long-term prediction of reward.
3. Noradrenaline controls the balance between wide exploration and focused execution.
4. Acetylcholine controls the balance between memory storage and renewal.
In order to state the above hypotheses in a more computationally well-defined manner, we first review the basic algorithms of reinforcement learning and the roles of major metaparameters. We then propose a set of hypotheses on how such metaparameters are regulated by the above neuromodulators. Finally, we discuss the possible neural mechanisms of metaparameter control and the possible interactions between neuromodulatory systems predicted from the hypotheses.
In this paper, our main focus is on the roles of neuromodulators within the circuit of basal ganglia, which have been suggested as the major locus of reinforcement learning (Houk et al., 1995, Montague et al., 1996, Doya, 2000a). However, we also discuss how their roles can be generalized to other brain areas, including the cerebral cortex and the cerebellum.
Reinforcement learning algorithm
Reinforcement learning is a computational framework in which an agent learns to take actions in response to the state of the environment so that the acquired reward is maximized in the long run (Fig. 2) (Barto, 1995b, Sutton and Barto, 1998, Doya, 2000c, Doya et al., 2001). What makes reinforcement learning difficult yet interesting is that the selection of an action affects not only the immediate reward but also future rewards, through the dynamic evolution of future states.
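One way to make "reward in the long run" concrete is the discounted sum of future rewards. The sketch below assumes the standard discounted-return formulation, in which a reward delayed by k steps is weighted by γ^k, with γ being the discount factor that appears in the hypotheses. The function name and the two reward sequences are hypothetical and chosen only for illustration.

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of rewards, each weighted by gamma**k for a delay of k steps."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical outcomes of two actions available in the same state.
immediate = [1.0, 0.0, 0.0, 0.0]  # small reward now, nothing afterwards
delayed = [0.0, 0.0, 0.0, 4.0]    # nothing now, larger reward three steps later

for gamma in (0.5, 0.9):
    print(gamma, discounted_return(immediate, gamma), discounted_return(delayed, gamma))
```

With a small γ the immediate option has the higher return, and with a γ closer to 1 the delayed option does, so the preferred action depends on the time scale of prediction.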
In order
Hypothetical roles of neuromodulators
Now we restate our hypotheses on the roles of neuromodulators in terms of the global learning signal and metaparameters introduced in the above reinforcement learning algorithm (Doya, 2000b):
1. Dopamine signals the TD error δ.
2. Serotonin controls the discount factor γ.
3. Noradrenaline controls the inverse temperature β.
4. Acetylcholine controls the learning rate α.
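To show where the four quantities in the list above would enter a learning step, here is a minimal sketch that assumes the textbook TD(0) value update and softmax (Boltzmann) action selection. The algorithm section is truncated in this excerpt, so these forms are an assumption rather than a reproduction of the paper's equations. The function names and the toy numbers are hypothetical.

```python
import math
from typing import List

def softmax_policy(action_values: List[float], beta: float) -> List[float]:
    """Action probabilities proportional to exp(beta * value)."""
    top = max(action_values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - top)) for q in action_values]
    norm = sum(weights)
    return [w / norm for w in weights]

def td_update(values: List[float], s: int, r: float, s_next: int,
              alpha: float, gamma: float) -> float:
    """Apply one TD(0) update to values[s] in place and return the TD error."""
    # delta: reward plus discounted next-state value, minus current prediction.
    delta = r + gamma * values[s_next] - values[s]
    # alpha scales how far the prediction moves in response to delta.
    values[s] += alpha * delta
    return delta

# Toy two-state example with hypothetical numbers.
state_values = [0.0, 1.0]
delta = td_update(state_values, s=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(delta, state_values[0])

# beta = 0 gives uniform random choice; a larger beta concentrates on the best action.
print(softmax_policy([1.0, 2.0], beta=0.0))
print(softmax_policy([1.0, 2.0], beta=5.0))
```

In this sketch δ is computed once and drives the update, while γ, β, and α each appear as a single scalar that shapes how prediction, action selection, and updating behave.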
Below, we review the experimental findings and theoretical models that support these hypotheses.
Dynamic interactions of neuromodulators
Based on the above hypotheses on the specific roles of neuromodulators in reinforcement learning, it is possible to theoretically predict how the activities of those modulators should depend on each other. Fig. 9 shows the possible interactions between the neuromodulators, the experience of the agent represented in the form of value functions, and the environment.
Conclusion
This paper proposed a unified theory on the roles of neuromodulators in mediating the global learning signal and metaparameters of distributed learning mechanisms of the brain. We considered how such regulatory mechanisms can be implemented in the neural circuit centered around the basal ganglia. However, there are many other brain areas and functions that require further consideration, for example, the roles of the amygdala and hippocampus in reinforcement learning and the roles of
Acknowledgements
The author is grateful to Peter Dayan, Barry Everitt, Takeshi Inoue, Sham Kakade, Go Okada, Yasumasa Okamoto, and Shigeto Yamawaki for valuable discussions on the roles of serotonin and also thanks Nicolas Schweighofer for his comments on the manuscript.
References (83)
Serotonin receptors in cognitive behaviors
Current Opinion in Neurobiology
(1997)
et al. Opponent interactions between serotonin and dopamine
Neural Networks
(2002)
What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex
Neural Networks
(1999)
Complementary roles of basal ganglia and cerebellum in learning and motor control
Current Opinion in Neurobiology
(2000)
et al. The computational role of dopamine D1 receptors in working memory
Neural Networks
(2002)
Effects of combined or separate 5,7-dihydroxytryptamine lesions of the dorsal and median raphe nuclei on responding maintained by a DRL 20s schedule of food reinforcement
Brain Research
(1995)
et al. Simplified dynamics in a model of noradrenergic modulation of cognitive performance
Neural Networks
(2002)
A stochastic reinforcement learning algorithm for learning real-valued functions
Neural Networks
(1990)
et al. Acetylcholine and memory
Trends in Neurosciences
(1993)
Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events
Neuroscience
(2000)
Control of exploitation–exploration meta-parameter in reinforcement learning
Neural Networks
Actor–critic models of the basal ganglia: New anatomical and computational perspectives. Dopaminergic modulation in the basal ganglia
Neural Networks
Dopamine bonuses
Neural Networks
Internal models for motor control and trajectory planning
Current Opinion in Neurobiology
Learning policies for partially observable environments: Scaling up
Cellular, synaptic, and network effects of neuromodulators
Neural Networks
Basal ganglia and cerebellar loops: Motor and cognitive circuits
Brain Research Reviews
Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning
Robotics and Autonomous Systems
Striatal, pallidal, and pars reticulata evoked inhibition of nigrostriatal dopaminergic neurons is mediated by GABA A receptors in vivo
Neuroscience
Acetylcholine in mind: A neurotransmitter correlate of consciousness?
Trends in Neurosciences
Decision making and neuropsychiatry
Trends in Cognitive Sciences
The role of acetylcholine in cortical synaptic plasticity
Behavioural Brain Research
Is the short-latency dopamine response too short to signal reward error?
Trends in Cognitive Sciences
Dopamine-dependent plasticity of cortico-striatal synapses
Neural Networks
Arousal systems and attentional processes
Biological Psychology
Serotonin-mediated striatal dopamine release involves the dopamine uptake site and the serotonin receptor
Brain Research Bulletin
TD models of reward predictive responses in dopamine neurons
Neural Networks
Neuromodulation of decision and response selection
Neural Networks
Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro
Neuroscience
Neurobiology of addiction
Current Opinion in Neurobiology
Acetylcholine in cortical inference
Neural Networks
Responses of tonically active neurons in the primate's striatum undergo systematic changes during behavioral sensory-motor conditioning
Journal of Neuroscience
Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task
Journal of Neuroscience
Adaptive critics and the basal ganglia
Reinforcement learning
Neuronlike adaptive elements that can solve difficult learning control problems
IEEE Transactions on Systems, Man, and Cybernetics
How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues
Journal of Neuroscience
Impulsive choice induced in rats by lesions of the nucleus accumbens core
Science
Advances in neural information processing systems
Opposite changes of in vivo dopamine release in the rat nucleus accumbens and striatum that follows electrical stimulation of dorsal raphe nucleus: Role of 5-HT3 receptors
Journal of Neuroscience