Keywords
reward, prediction, learning, aversive, movement
The question “What is dopamine doing?” keeps stubbornly popping up after the discovery of the brain’s dopamine system and its relationships to Parkinson’s disease, psychosis, and drug addiction. Although the efficacy of dopamine receptor–stimulating drugs in alleviating Parkinsonian movement disorders pointed initially to a mere tonic, modulatory role, it became increasingly clear that dopamine is a neurotransmitter not unlike other transmitters and has its own synapses and phasic activity related to stimuli and actions. The ensuing research efforts revealed an amazing array of heterogeneous functions at various time courses and levels of specificity that range from general behavioral activation to precise reward signaling for biological learning, machine learning, and economic choice1. The complexity defies the notion of “one neuronal system equals one function” but likely reflects the workings of an evolutionarily ancient system that governs the individual’s requirements for survival.
This overview describes further conceptual, biological, and economic characterizations of the dopamine reward signal in animals from the past few years, its involvement in social processes, and its distinction from aversive, novelty, sensory, and motor processing. I will follow the notion that the function of an information-processing system can be defined by the relationship of its internal signals to behavior. This knowledge would provide a firm basis for investigating molecular, cellular, and circuit mechanisms. However, detailed descriptions of the recently elucidated fine network properties of dopamine neurons would exceed the topic and limits of this brief review, nor will I be able to discuss molecular signaling, human brain signals, and effects of lesions and systemic dopaminergic drugs that indicate tonic permissive rather than phasic driving influences.
Rather than coding rewards and reward-predicting stimuli as they appear in the environment, phasic, sub-second responses in the majority of midbrain dopamine neurons code a reward prediction error. Their activity is increased for one hundred or two hundred milliseconds when a reward or reward-predicting stimulus is better than predicted, their activity is unchanged when these events have the same reward value as their prediction, and their activity is briefly depressed when these events have lower reward value than predicted1.
Electrical or optical stimulation of dopamine neurons serves as a teaching signal for lever pressing, nose poking, place preference, unblocking, and prevention of extinction2–6; conversely, optogenetic dopamine inhibition induces place avoidance and behavioral inhibition7–9. These behavioral effects likely reflect the elicitation of positive and negative reward prediction error signals, respectively. Recent research shows that these behavioral learning functions extend to neuronal learning: monkey dopamine neurons acquire stronger responses to an intrinsically neutral visual stimulus that is followed by optogenetic dopamine stimulation added to juice reward, as compared with a stimulus associated with only that reward (Figure 1A, B)10. Concomitantly, the animal develops choice preference over 20 to 25 repetitions for the stimulation-associated fractal over an alternative, non-stimulated fractal, even without natural reward. In rats, optogenetic dopamine excitation at the time of reward induces dopamine responses to the stimulus along with driving approach and locomotion (Figure 1C, D)11. In a further step, dopamine stimulation serves as reward for operantly controlling cortical firing patterns12. These effects together support the hypothesis that bidirectional dopamine reward prediction error responses influence neuronal and behavioral learning.
Standard reward learning paradigms rely on the contingent association with a stimulus, whereas higher learning theories postulate a role for representations beyond explicit reward contingency. Dopamine neurons follow this latter notion13: during sensory preconditioning, two stimuli (A and B) are first presented sequentially. Then reward occurs only with the later stimulus presented alone (B). Then the earlier stimulus (A) is tested for reward prediction. Indeed, dopamine neurons are activated by the test stimulus (A) although it had never been explicitly paired with the reward. Thus, the neurons access a reward representation via the test stimulus (A) that had earlier been associated with the then-unrewarded stimulus (B), defying the simple requirement for direct stimulus–reward contingency.
The reward prediction error response depends on both the reward and the prediction: reward received minus reward predicted. If we know the reward and measure the dopamine response, we can infer the prediction the neuron is accessing.
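The relation above can be written out as a minimal sketch. The function names and the numeric values are illustrative assumptions, not taken from any cited study; the only thing carried over from the text is the definition of the error as reward received minus reward predicted, and its inversion to recover the prediction.

```python
def prediction_error(reward_received, reward_predicted):
    """Reward prediction error: reward received minus reward predicted."""
    return reward_received - reward_predicted


def infer_prediction(reward_received, observed_error):
    """Invert the relation: with a known reward and a measured error
    (standing in for the dopamine response), recover the prediction."""
    return reward_received - observed_error


# Hypothetical values: a fixed prediction of 0.5 and three possible outcomes.
# The sign pattern is better than predicted (positive), as predicted (zero),
# and worse than predicted (negative).
errors = [prediction_error(r, 0.5) for r in (1.0, 0.5, 0.0)]

# A reward of 1.0 that elicits an error of 0.25 implies a prediction of 0.75.
implied_prediction = infer_prediction(1.0, 0.25)
```

In this sketch the measured error is treated as a direct readout; in practice the neuronal response would first have to be mapped onto the same scale as the reward.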
The idea started with a stimulus sequence that always ends with a reward after a short but random number of steps. A monkey registering only repeated reward omissions would expect progressively less reward, but with experience it would know the reward would come more likely the longer the wait is (increasing hazard rate). Thus, with longer waiting, reward prediction increases and the error when the reward occurs decreases. Indeed, the dopamine response to the reward decreased during waiting, indicating that the neurons accessed the temporally increasing reward prediction derived from the overall task experience (rather than a decreasing prediction derived from the repeating reward omissions)14. A recent experiment confirmed this result in mice but tested also slightly uncertain rewards (probability of P = 0.9). Here, the animal never knew for sure whether the reward would ultimately come and might increasingly expect none as time advances (like humans giving up waiting for an unreliable bus). But when the reward does occur, the prediction error and the dopamine response are higher the longer the wait was15. Thus, the dopamine response reflects access to reward predictions that are inferred from the temporal structure of reward probabilities rather than deriving entirely from the occurrence or omission of last rewards. Interestingly, reward-predicting responses in amygdala reflect also temporal reward probability16, indicating that reward neurons in general may access more sophisticated reward representations than hitherto assumed.
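The waiting argument can be illustrated with a small sketch. It assumes a hypothetical discrete, peaked distribution of reward times (the weights below are invented for illustration) and computes two quantities at each time step, given that no reward has arrived so far: the hazard rate (probability that the reward arrives now) and the belief that a reward is still to come. How these quantities map onto the dopamine response is modeled more elaborately in the cited studies; the sketch shows only the direction of the two trends described in the text.

```python
def waiting_beliefs(timing_weights, p_reward):
    """Return (hazards, beliefs) for each time step, conditional on no reward so far.

    timing_weights: relative likelihood of reward delivery at each step (assumed).
    p_reward: overall probability that a reward is delivered at all in a trial.
    """
    total = sum(timing_weights)
    probs = [w / total for w in timing_weights]
    hazards, beliefs = [], []
    delivered = 0.0  # probability mass of delivery times that have already passed
    for p_t in probs:
        no_reward_yet = 1.0 - p_reward * delivered  # P(no reward before this step)
        remaining = 1.0 - delivered                  # timing mass still ahead
        beliefs.append(p_reward * remaining / no_reward_yet)
        hazards.append(p_reward * p_t / no_reward_yet)
        delivered += p_t
    return hazards, beliefs


weights = [1, 2, 4, 2, 1]  # hypothetical peaked timing distribution

# Certain reward: the hazard rate rises with waiting, so the error at reward falls.
hazard_certain, belief_certain = waiting_beliefs(weights, 1.0)

# Slightly uncertain reward: the belief that a reward is still coming falls with waiting.
hazard_uncertain, belief_uncertain = waiting_beliefs(weights, 0.9)
```

With a certain reward, the belief that a reward is still coming stays at 1 and the hazard rate climbs toward 1. With P = 0.9, each further step without reward makes an omission trial more plausible, which is the "unreliable bus" intuition.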
Reward predictions accessed by dopamine neurons derive from probability distributions of reward amounts. A larger reward compared with the expected value (predicted mean) of a predicted distribution activates dopamine neurons in monkeys, and a smaller reward induces a depression17–20. Dopamine responses change their gain depending on the variance of the distribution21, suggesting access to at least the first two statistical moments of distributions. By contrast, with a predicted distribution of only two fixed reward amounts, something unexpected happens in mice: there is no response when either of the two predicted rewards occurs but a graded response in rare probe trials that tends to increase with the absolute difference to each of the two predicted rewards; the response is positive for amounts slightly above the lower reward, negative for amounts slightly below the upper reward, and zero for amounts right between the two rewards22. For an intuitive example, imagine a restaurant with two randomly alternating chefs with widely different ability: when the food is almost but not quite spectacular, we realize the good chef was cooking but may have overlooked something, thus generating a negative prediction error (relative to the predicted superb meal from that chef), even though the food was better than from the other chef and above the mean from both chefs. Thus, dopamine neurons access rich reward probability distributions via their statistical moments but can access individual elements when distributions are very restricted. As seen during waiting14,15 and reward reversal23, the reward predictions accessed by dopamine neurons derive not only from recent rewards but also from the overall reward structure of the environment.
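The dependence on the first two statistical moments can be sketched as follows. Dividing the error by the standard deviation of the predicted distribution is only one simple way to express a variance-dependent gain and is an assumption of this sketch, not necessarily the formalization used in the cited work; the reward amounts are invented.

```python
from statistics import mean, pstdev


def scaled_prediction_error(reward, predicted_amounts):
    """Error relative to the expected value, with gain set by the spread.

    predicted_amounts: equally likely reward amounts of the predicted
    distribution (hypothetical values).
    """
    expected_value = mean(predicted_amounts)  # first moment
    spread = pstdev(predicted_amounts)        # square root of the second central moment
    return (reward - expected_value) / spread


narrow = [0.4, 0.5, 0.6]  # hypothetical low-variance distribution
wide = [0.1, 0.5, 0.9]    # hypothetical high-variance distribution, same mean

# The largest reward of each distribution yields about the same scaled error,
# although the raw differences from the mean (0.1 versus 0.4) differ fourfold.
top_narrow = scaled_prediction_error(0.6, narrow)
top_wide = scaled_prediction_error(0.9, wide)
```

A reward above the expected value gives a positive scaled error and a reward below it a negative one, in line with the activation and depression described above.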
Perceptual choices help to further reveal what’s on dopamine’s mind. Dopamine responses to a set of choice options reflect the animal’s future choice. When a monkey chooses the more frequently rewarded option, the stimulus response is stronger compared with choosing the less often rewarded option, despite identical option presentation. As reward probability constitutes value, the neurons code “chosen value” (that is, the value of the option the animal chooses) rather than the mean value of all options24,25. The chosen value response occurs to the stimulus and partly precedes and thus predicts the choice. In these straightforward tests, the animal chooses, with some stochasticity, between values that are firmly associated with the options. By contrast, in perceptual random-dot motion choice tasks, the value depends on the animal’s discrimination of motion direction, and the reward probabilities are not firmly associated with constant, unequivocally marked options. Higher motion coherence allows better discrimination and thus increases the probability of getting a reward. Thus, with higher coherence, reward value increases monotonically when choosing the correct motion direction but decreases monotonically when choosing the opposite, incorrect direction. Dopamine neurons in monkeys and mice show exactly this graded chosen value response during random-dot motion and contrast detection tasks26,27. The value responses before each choice derive from the combination of the animal’s stimulus assessment and the subjective probability of making a correct discrimination (“subjective” in the sense of perception rather than individual economic probability weighing). As the targets are not distinctly marked for value, the responses cannot simply reflect the experienced reward probability for a given target.
Taken together, dopamine neurons have access to representations of future rewards that not only are associated with explicit stimuli but also derive from environmental factors like context, task structure, and time. These internal representations may be more globally called belief states and, when they reflect prior probabilities, Bayesian belief states22,26. These representations or beliefs are parts of reward predictions that affect dopamine neurons, which report their deviation from the actual obtained primary and conditioned rewards as “reward prediction error”.
Rewards don’t exist; they are made up by our minds. The third steak during a dinner is not attractive although it is pretty similar to the first two appetizing steaks. Plenty of other examples confirm that reward value is subjective and depends on non-physical factors like satiety, delay, and risk. While we can forever test individual cases of subjective value, economic theory provides concepts for understanding subjective value and preferences and predicting behavioral choices under various conditions, including risk. An example is the utility signal of dopamine neurons that transcends the ad-hoc coding of subjective value19. This neuronal result aligns biological reward to economic choice and constitutes a prerequisite for understanding how individuals maximize utility for momentary and evolutionary benefit.
But what would a dopamine signal for such a theoretical decision variable do in a real-world scenario? One of the most intuitive and reliable phenomena in economics is the price–demand relationship. As the price goes up, consumption goes down; people buy less stuff when it gets more expensive. But if the good becomes more valuable, demand increases, which shifts price–demand curves to the right. Price can be modeled as number of lever presses in rats, and value can be enhanced by dopamine stimulation, although further known factors affecting consumption, such as availability of alternatives, time, and effort, may be too extensive for an initial, well-controlled study. How then would a dopamine economic value (utility) signal affect consumer choice? Indeed, inducing a positive dopamine reward prediction error signal by optogenetic excitation at the reward shifts the curves upward and rightward, indicating that the stimulation enhances value, thereby increasing demand at the same price and maintaining the same consumption despite a higher price (Figure 2)28. Stimulation at the reward-predicting cue has the opposite effect (by lowering reward value due to a negative prediction error elicited by the reward following the enhanced value prediction). This well-conceptualized situation, even with the restrictions imposed on an initial study, demonstrates that the dopamine utility signal has a very practical application; it affects daily consumer choice by influencing the value of a good. This beautiful result, outside the beaten path, suggests many follow-up experiments.
Rewards are fine for me but may not be so great when somebody else receives them instead of me. Monkeys see it the same way; they value rewards more when they occur more frequently for themselves but not so much when they occur for another monkey, as shown by licking and binary choice. Dopamine neurons follow this social reward valuation; higher probability of own water reward elicits stronger responses, confirming standard reward value coding, whereas higher reward probability for the other monkey reduces own dopamine responses29. It seems that this disadvantageous reward inequity has negative reward value for dopamine neurons. Thus, dopamine neurons register everybody’s rewards but value them only relative to their host. Their primary concern with own reward resembles that of most reward neurons in the striatum30, some of which sense disadvantageous reward inequity31.
Environmental rewards and reward-predicting stimuli contain a non-value component that impacts on sensory receptors, but their identification and evaluation take a few tens or hundreds of milliseconds. Dopamine neurons, in analogy to other neuronal systems, show an early unselective activation, which reflects sensory detection of the stimulus32 and constitutes a default signal for any potential reward in the environment; it is quickly replaced, before any behavioral action, by the subsequent prediction error component that codes reward value19,33–35; recent studies confirm this notion36. Thus, the initial, non-reward activation constitutes an integral part of the dopamine reward response. Its identification requires temporal resolution in the ten-millisecond range and is often difficult, in particular with unrewarded, value-less stimuli not allowing independent variation of sensory and reward parameters.
Several factors affect the initial, sensory dopamine activation. First, it increases with physical impact and salience, irrespective of reward or aversive value34. Second, it is elicited and enhanced by neutral or punishment-predicting stimuli that resemble rewards or occur in rewarding contexts37–39. Finally, it occurs with novel stimuli in humans, monkeys, and mice25,40–42. The novelty component decays during conditioning (due to repetition), whereas the reward-predicting component increases25,42. The unpredicted occurrence of an unrewarded picture and positive sensory prediction errors enhance the initial-component response but, in contrast to bidirectional reward prediction error coding, picture omission does not seem to elicit a dopamine depression in monkeys and rats33,38,43 (Figure 3A–D). Thus, the initial dopamine response component seems to code surprise salience rather than a full, bidirectional prediction error. In contrast to the initial sensory component, delivery of different juices with different sensory attributes elicits a bidirectional reward prediction error response that reflects the value of the juices (Figure 3E, F).
For 40 years, many studies, including our own, reported activations by aversive stimuli in some dopamine neurons (for references, see 35). However, aversive events contain several components, as do rewards, and their dissociation concluded that dopamine activations by aversive stimuli reflect physical impact (first component) rather than aversiveness34; aversiveness is coded not at all34 or as depression of activity reflecting negative reward value (second component)44,45. Dopamine reward neurons are also activated by negative punishment prediction error, which has positive value (double negative)39,45,46, by rebound from aversive depression34,45, and by prediction of relief from punishment45–47, which is rewarding48,49. Thus, some of the recently reported activations by aversive air puff, sound, and foot shock44,45 might reflect rewarding relief from the threat these stimuli might pose to the animals, even if these neurons do not code standard reward.
In contrast to these reward responses, recent studies report activations in dopamine subgroups in lateral substantia nigra, striatum tail, and ventro-medial nucleus accumbens shell in response to air puff, intense sound, and foot shock but not with physically less intense aversive quinine nor much with reward42,44,45. These responses may reflect physical impact or aversion or both. The foot shock activation transfers to predictive stimuli during learning in ventro-medial nucleus accumbens shell45. This result would refute a possible relation to physical impact, which is unchanged, but it might also reflect temporal surprise salience; it might even indicate transfer of an early-component sensory impact response in analogy to the known transfer of the subsequent value component. Nonetheless, these neurons differ in molecular and physiological properties and have striatal projection territories different from those of the typical, straightforward reward-processing dopamine neurons44,45. Foot shock omission fails to elicit depressions in these dopamine neurons45; this lack of bidirectional prediction error coding would make an involvement in reinforcement learning less direct. Furthermore, optogenetic excitation of dopamine axons in striatum tail elicits behavioral aversion44, indicating a truly aversive function (though without completely mimicking the brain’s mechanics of natural excitation). The physically less intense quinine is ineffective despite its behavioral aversiveness44, which argues for a contribution of physical impact and against general negative value coding.
Thus, if physical impact remains an option for explaining activations by aversive stimuli, we might be dealing with the opposite tails of two continuous probability distributions: one for physical impact and one for value. Then dopamine neurons with activations by aversive stimuli might lie at the high end of the physical impact distribution, and their weak reward coding would be at the low end of the value distribution. On the other hand, despite all the caveats, optogenetics may have uncovered groups of dopamine neurons that are truly activated by specific punishers and thus differ qualitatively from reward-processing dopamine neurons45, after 40 years of trying to nail them down. If so, they might be parts of an ancient system detecting fear (of air puff, intense sound, foot shock, and novelty) rather than disgust (quinine)44 and contrast with the abundant reward-coding dopamine neurons that are depressed by aversive stimuli and code outcome value monotonically from negative to positive39,44. Dopamine neurons in fruit flies show similar response diversity: about 130 neurons code reward and 12 neurons code punishment50, suggesting preservation across a huge evolutionary range. So, ten years from now, will we know whether the dopamine activations by aversive stimuli reflect physical impact or aversiveness or maybe both?
Even though the common assumption of one brain system equals one function may not hold for dopamine1, such multifunctionality seems perplexing and gives rise to the question “What is dopamine doing?”
The earliest behavioral studies of midbrain dopamine neurons and striatal dopamine concentrations in monkeys and rats report heterogeneous activations and depressions for a second or more with movements51–55. Dopamine changes are associated with task events such as large contralateral or ipsilateral arm reaching movements (16–44% and 15–17% of neurons, respectively), self-initiated arm movements (12%), reward delivery and mouth movements (9%), and full trial duration (5%). However, such changes are absent with more concise movements, such as well-controlled arm flexion-extension56, stereotyped reaching41, sluggish reaching elicited by offset of a stimulus57, and spontaneous and stimulus-driven eye movements57. The monitoring of large numbers of individual muscles in monkeys (Figure 4) shows that these heterogeneous dopamine changes are unrelated to specific movements or motor control but reflect the behavioral activation underlying large movements, derived from the activity of many muscles55,57,58 and of sensory receptors in muscle, joint, and skin associated with such movements, a global process that might also be called vigor or even motivation.
The advent of dopamine voltammetry, molecular identification, optogenetics, and optical recording allows us to further characterize these behavior-related changes, associate them with different neuronal populations and their projection territories, and distinguish them from reward prediction error responses. Recent studies describe dopamine changes when rodents move in open fields, small chambers, levers, nose poke ports, T-mazes, running wheels, and trackballs6,59–68, whereas specific motor processes engaging only few muscles are ineffective69. The dopamine changes are heterogeneous in terms of timing during test trials, behavioral variable being encoded, and midbrain location. Thus, early in each trial, activity in distinct dopamine neurons varies with different movement parameters like speed and acceleration, whereas at trial end more neurons code mouth movement or reward68. While some studies provide fine-grained statistical dissociation68, some of the effective behavioral variables, like reward expectation leading to faster movement and movement speed reflecting vigor and motivation, might be intercorrelated; indeed, a common variable underlying these behaviors might be arousal and general behavioral activation. The molecular, cellular, and input heterogeneity of dopamine neuron groups and the differential projection topography between midbrain and striatum71–73 would allow specific dopamine influences on particular postsynaptic targets. Correspondingly, optogenetic dopamine excitation elicits locomotion and biases choice depending on the midbrain region being stimulated, whereas inhibition elicits opposite effects61,64,65, suggesting an active behavioral role of the observed dopamine changes (even without knowing the animal’s “feeling” when receiving a dopamine shock without accompanying sensory or motor cortex activity). 
By contrast, some motivation-related changes in striatal dopamine concentration are not associated with dopamine impulse changes in the soma67 and may derive from local presynaptic influences that have long been recognized74,75. (As with other neurotransmitter systems, dopamine function depends on transmitter release and postsynaptic receptors in addition to the temporally precise impulse responses.)
The amazing spectrum and heterogeneity of dopamine relationships to behavioral activation contrast with the rather stereotyped reward prediction error response that varies across neurons in only a single scalar parameter36. The prediction error response stands out more; it is more phasic and has a higher instantaneous impulse rate and a shorter duration than the changes related to behavioral activation. These differences are particularly evident with the high temporal resolution of neurophysiological impulse responses. Nevertheless, the detection of prediction error responses requires explicit events that allow predictions to be identified and their value to be subtracted from that of the reward. Analyses using reinforcement models help to further identify dopamine prediction error responses in elaborate tasks64,76.
How might these seemingly separate modes of dopamine action relate to each other? Despite attempts to derive a common activational role77, it is currently unclear how the heterogeneous relationships to behavioral activation might emerge from prediction error coding. One may dissociate the behavioral activation from prediction error coding by their respective spatial and non-spatial specificity78 or explain the dopamine voltammetry signal during movement and reward expectation by prediction error coding79–81, or behavioral activation and reward prediction error might be coded in different dopamine groups. In rodents, movement relationships are more frequent in substantia nigra dopamine neurons and their striatum-projecting regions, whereas reward prediction error coding is abundant in ventral tegmental area neurons and their nucleus accumbens projection6,62,67,68. These differences are gradual and do not constitute the strong medio-lateral midbrain or the ventro-dorsal striatum dichotomy seen in regional lesion experiments. Similar graded, rather than strict, differences are seen in monkeys, whose dopamine neurons in substantia nigra signal reward less frequently (<60%) than in ventral tegmental area (>70–80%)41,82; in corresponding striatal projection territories, reward expectation affects 40 to 50% of caudate and anterior putamen neurons and more than 75% of nucleus accumbens neurons83.
Thus, the notion of one neuronal system having exactly one function may not be valid for dopamine neurons, however hard we try. Maybe such an evolutionarily ancient system, which exists already in fruit flies, has multiple functions that are difficult to capture in a single term. A common denominator for the role of phasic dopamine activity might be to get the animal what it needs to survive, like detecting reward and coding the action for obtaining it (the two key components of motivation), although that sounds awfully superficial given the intricate complexity of the system.
The investigation of dopamine function and the underlying networks is currently in full swing. The past several years have revealed many details that help us get a better understanding of dopamine function, and lots of mysticism has disappeared. We are not dealing with a system with clear-cut and well-parcellated functions, but we know that some of the dopamine functions are crucial for the animal’s survival. What we don’t know are at least two things.
How does the dopamine reward signal, as the strongest component of dopamine function, get us the best reward and thus help evolutionary fitness? An obvious approach is to study economic decision-making, which has well-developed concepts for maximizing utility. This approach assumes that decision makers identify, process, and deliberate about all available options and have clear preferences, which underlies the first Von Neumann–Morgenstern utility axiom (“completeness”). But there are many exceptions to rational decision-making, and many decisions are not based on identifiable options. We often just do what we do without actively considering the alternatives. What is the role of dopamine neurons in these processes?
Now that the investigation of dopamine function has revealed a number of important processes, what are the other “neuromodulatory” systems hiding? Can we get a handle on norepinephrine after its attentional functions have been so well described84? And what about serotonin? Would it have several, diverse functions85,86 but ultimately a coherent denominator? And what about acetylcholine? We have tons of work to do.
Of course, all of these processes may go wrong in brain disorders, which affect more than 20% of the population and present a major human challenge. For that reason, we should invest substantial portions of our wealth into all fields of neuroscience.
The author is indebted to Armin Lak (University College London), Joseph F. Cheers (University of Maryland School of Medicine), Stephan Lammel (University of California, Berkeley), Donita L. Robinson (University of North Carolina), and Alexander Gomez (University of North Carolina) for helpful comments.
Competing Interests: No competing interests were disclosed.