Model-based predictions for dopamine

https://doi.org/10.1016/j.conb.2017.10.006

Highlights

  • Dopamine reward prediction error signals reflect model-based information.

  • Model-based predictions rely on multiple features of expected outcomes.

  • Multiple dimensions of prediction include reward identity, delay, and variability.

  • Recent work establishes a role for dopamine in learning model-based associations.

  • We explore the computational implications of model-based learning with dopamine prediction errors.

Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.

Introduction

The striking correspondence between the phasic responses of midbrain dopamine neurons and the temporal-difference reward prediction error posited by reinforcement-learning theory is by now well established [1, 2, 3, 4, 5]. According to this theory, dopamine neurons broadcast a prediction error: the difference between the learned predictive value of the current state, signaled by cues or features of the environment, and the sum of the current reward and the value of the next state. Central to the normative grounding of temporal-difference reinforcement learning (TDRL) is the definition of ‘value’ as the expected sum of future (possibly discounted) rewards [6], from which the learning rule can be derived directly. The algorithm also provides a simple way to learn such values using prediction errors, an update thought to be implemented in the brain through dopamine-modulated plasticity in corticostriatal synapses [7, 8] (Figure 1, left). This theory provides a parsimonious account of a number of features of dopamine responses in a range of learning tasks [9, 10, 11, 12].
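
For concreteness, the sketch below implements the tabular temporal-difference update just described on a toy cue-delay-outcome sequence, where the prediction error delta = r + gamma * V(next state) - V(current state) plays the role ascribed to phasic dopamine. The task structure, reward values, and parameters are illustrative assumptions, not taken from the studies reviewed here.

```python
# Minimal illustrative sketch of tabular TD(0), not the authors' implementation.
# The prediction error delta stands in for the phasic dopamine signal, and
# values are nudged toward consistency with it on every transition.

GAMMA = 0.9   # discount factor (illustrative)
ALPHA = 0.1   # learning rate (illustrative)

# Toy episode structure: cue -> delay -> outcome -> end; reward on leaving 'outcome'
TRANSITIONS = {"cue": "delay", "delay": "outcome", "outcome": "end"}
REWARDS = {"cue": 0.0, "delay": 0.0, "outcome": 1.0}

V = {s: 0.0 for s in ("cue", "delay", "outcome", "end")}

for episode in range(500):
    state = "cue"
    while state != "end":
        next_state = TRANSITIONS[state]
        r = REWARDS[state]
        # Temporal-difference prediction error for this transition
        delta = r + GAMMA * V[next_state] - V[state]
        # Value update: in the theory, dopamine-gated corticostriatal plasticity
        V[state] += ALPHA * delta
        state = next_state

print(V)  # V['cue'] converges toward GAMMA**2 * 1.0 = 0.81
```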

Section snippets

Are model-free dopamine prediction errors a red herring?

A core tenet of TDRL is that it is ‘model-free’: learned state values are aggregate, scalar representations of total future expected reward, in some common currency [1, 13]. That is, the value of a state is a quantitative summary of future reward amount, irrespective of either the specific form of the expected reward (e.g., water, food, or a combination of the two) or the sequence of future states through which it will be obtained (e.g., whether water will be presented before or after food). Critically,
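
To illustrate the common-currency point in miniature, the following sketch collapses qualitatively different expected outcomes into a single cached scalar; the outcome names and utilities are hypothetical.

```python
# Illustrative sketch of a 'common currency' cached value: qualitatively
# different outcomes are collapsed into one scalar, so the identity of the
# expected reward cannot be recovered from the value alone.

UTILITY = {"water": 0.5, "food": 0.5, "juice": 1.0}  # hypothetical utilities

def cached_value(expected_outcomes):
    """Model-free value: summed common-currency utility of expected outcomes."""
    return sum(UTILITY[o] for o in expected_outcomes)

# Two states predicting very different outcomes carry identical cached values:
print(cached_value(["water", "food"]))  # 1.0
print(cached_value(["juice"]))          # 1.0
```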

Temporal representation and dopamine

One notable property of dopamine prediction errors is that they are temporally precise: if an expected reward is omitted, the phasic decrease in dopamine neuron activity appears just after the time the reward would have occurred [2]. It is this phenomenon that inspired the TDRL algorithm, which models such temporally precise predictions by postulating sequences of time-point states that are triggered by a stimulus (known as the ‘complete serial compound,’ CSC stimulus representation, or ‘tapped
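
The sketch below shows how such a representation supports temporally precise errors: the cue launches a sequence of one-hot time-point features, and after training, omitting the reward produces a negative TD error at exactly the expected reward time. The bin counts and parameters are illustrative, and this is a schematic of the standard CSC formulation rather than code from any cited study.

```python
# Schematic complete-serial-compound (tapped delay line) representation with
# linear TD(0); illustrative parameters, not from the reviewed work.
import numpy as np

N_T = 10          # number of post-cue time bins (illustrative)
REWARD_TIME = 6   # reward normally delivered 6 bins after cue onset
GAMMA, ALPHA = 0.98, 0.2

def phi(t):
    """One-hot CSC feature vector marking time t since cue onset."""
    x = np.zeros(N_T)
    if t < N_T:
        x[t] = 1.0
    return x

w = np.zeros(N_T)  # linear value weights: V(t) = w @ phi(t)

def run_trial(rewarded=True):
    """One trial of TD learning over the post-cue bins; returns per-bin TD errors."""
    deltas = []
    for t in range(N_T - 1):
        r = 1.0 if (rewarded and t == REWARD_TIME) else 0.0
        delta = r + GAMMA * (w @ phi(t + 1)) - (w @ phi(t))
        w[:] += ALPHA * delta * phi(t)
        deltas.append(delta)
    return deltas

for _ in range(300):                 # train with the reward delivered
    run_trial(rewarded=True)
omission = run_trial(rewarded=False)
print(omission[REWARD_TIME])         # large negative error at the expected reward time
```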

Not all dopaminergic predictions are learned through direct experience

Indeed, a central aspect of TDRL that makes it model free is that, in the algorithm, values for a state are learned (and cached) through direct experience with that state. Recent work suggests, however, that phasic dopamine may reflect values that have been learned indirectly. Of particular relevance is a sensory preconditioning experiment showing that reward predictions that are ascribed to a cue solely through its relationship to another neutral cue are reflected in dopamine neuron firing. Here,
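
The computational point can be caricatured as follows: a purely model-free cache has nothing to say about a cue that was never itself paired with reward, whereas chaining a learned stimulus-stimulus association with a later-acquired cue value does. The cue labels, numbers, and chaining procedure below are hypothetical illustrations, not the design or analysis of the cited experiment.

```python
# Schematic contrast between direct (cached) and indirectly inferred predictions.
transition_model = {"A": "B"}            # Phase 1: cue A is followed by neutral cue B
cached_value = {"A": 0.0, "B": 1.0}      # Phase 2: B alone is paired with reward

def model_free_prediction(cue):
    """Only values acquired through direct pairing with reward are available."""
    return cached_value.get(cue, 0.0)

def model_based_prediction(cue):
    """Chain learned stimulus-stimulus associations until a valued cue is reached."""
    value, current = cached_value.get(cue, 0.0), cue
    while value == 0.0 and current in transition_model:
        current = transition_model[current]
        value = cached_value.get(current, 0.0)
    return value

print(model_free_prediction("A"))   # 0.0 -- no direct cue-reward pairing
print(model_based_prediction("A"))  # 1.0 -- inferred through the A -> B link
```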

Multiple dimensions of prediction in dopamine responses

Another fundamental property of TDRL is that it learns aggregate, scalar predictions of the sum of future rewards predicated on occupying the current state: a ‘common currency’ value that sums over apples, oranges, sex and sleep. As alluded to above, and complicating the mapping between dopamine and TDRL even further, it appears that dopamine neurons respond to deviations from predictions in dimensions other than scalar value [49]. In particular, prediction errors have been recorded for an
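
One simple way to formalize predictions that carry more than scalar value is to make both the prediction and the error vector-valued over outcome features (e.g., reward identities), so that a switch between two equally valued rewards still produces a nonzero error. The sketch below is a hypothetical formulation for illustration, not a model taken from the reviewed papers.

```python
# Vector-valued (identity-sensitive) prediction error vs. the classic scalar one.
import numpy as np

FEATURES = ["banana", "grape"]        # illustrative outcome identities

def one_hot(outcome):
    """Feature vector for an observed outcome."""
    return np.array([1.0 if f == outcome else 0.0 for f in FEATURES])

prediction = np.array([1.0, 0.0])     # the cue has come to predict banana
observed = one_hot("grape")           # an equally valued reward with a new identity

identity_pe = observed - prediction              # vector-valued prediction error
value_pe = observed.sum() - prediction.sum()     # classic scalar value error

print(identity_pe)  # [-1.  1.] -- nonzero despite no change in scalar value
print(value_pe)     # 0.0
```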

Model-based learning with dopamine prediction errors

All told, current findings suggest that dopamine neurons have access to model-based representations of expected rewards that reflect learned properties beyond a scalar representation of value (Figure 1, right). However, the convergence of TDRL to a useful value representation stems from the alignment between the computational goal of the agent (to maximize total reward through value-guided action) and the single dimension along which reward predictions are represented (i.e., scalar value).

So what is the role of dopamine in learning?

One thing that these recent studies make clear is that a better understanding of the computational role of dopamine entails a broader consideration of what it means for a reinforcement learning algorithm to be ‘model-based’ [34]. Model-based prediction in RL has been most strongly identified with the use of models for forward planning, enabling values to be computed on the fly (as opposed to cached) in order to flexibly support goal-directed behavior [65]. But models may also be exploited to
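
The contrast between cached and on-the-fly values can be made concrete with a toy internal model: a model-free value is a stored number, while a model-based value is recomputed by unrolling learned transitions and rewards, and therefore updates immediately when the model is revised. The model, states, and numbers below are illustrative assumptions.

```python
# Schematic contrast between a cached value and a value computed by forward
# planning over a learned model; all quantities are hypothetical.
GAMMA = 0.9

# Learned internal model: state -> (reward on leaving state, next state or None)
MODEL = {
    "cue":     (0.0, "delay"),
    "delay":   (0.0, "outcome"),
    "outcome": (1.0, None),
}

# Model-free cache: values stored from past prediction errors
CACHED_V = {"cue": 0.81, "delay": 0.9, "outcome": 1.0}

def planned_value(state):
    """Model-based value: computed on the fly by unrolling the learned model."""
    reward, next_state = MODEL[state]
    if next_state is None:
        return reward
    return reward + GAMMA * planned_value(next_state)

print(CACHED_V["cue"], planned_value("cue"))  # both ~0.81 when the cache is accurate

# If the model is revised (say, the outcome is devalued), the planned value
# changes without any further direct experience, while the cache does not.
MODEL["outcome"] = (0.0, None)
print(CACHED_V["cue"], planned_value("cue"))  # 0.81 vs 0.0
```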

Conflict of interest statement

Nothing declared.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • of special interest

  •• of outstanding interest

Acknowledgements

This work was funded by grant R01DA042065 from the National Institute on Drug Abuse (AJL, YN), grant W911NF-14-1-0101 from the Army Research Office (YN, MJS), an NHMRC CJ Martin fellowship (MJS), and the Intramural Research Program at the National Institute on Drug Abuse (ZIA-DA000587) (MJS, GS). The opinions expressed in this article are the authors’ own and do not reflect the view of the NIH/DHHS.

References (68)

  • S.J. Gershman

    Dopamine, inference, and uncertainty

    Neural Comput

    (2017)
  • L. Deserno et al.

    Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making

    Proc Natl Acad Sci

    (2015)
  • E.S. Bromberg-Martin et al.

    Dopamine in motivational control: rewarding, aversive, and alerting

    Neuron

    (2010)
  • P.H. Rudebeck et al.

    The orbitofrontal oracle: cortical mechanisms for the prediction and evaluation of specific behavioral outcomes

    Neuron

    (2014)
  • K. Doya et al.

    Multiple model-based reinforcement learning

    Neural Comput

    (2002)
  • W. Menegas et al.

    Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice

    eLife

    (2017)
  • S. Threlfell et al.

    Striatal dopamine release is triggered by synchronized activity in cholinergic interneurons

    Neuron

    (2012)
  • S.J. Gershman et al.

    Time representation in reinforcement learning models of the basal ganglia

    Front Comput Neurosci

    (2014)
  • E.A. Ludvig et al.

    Stimulus representation and the timing of reward-prediction errors in models of the dopamine system

    Neural Comput

    (2008)
  • P.R. Montague et al.

    A framework for mesencephalic dopamine systems based on predictive Hebbian learning

    J Neurosci

    (1996)
  • W. Schultz et al.

    A neural substrate of prediction and reward

    Science

    (1997)
  • M.R. Roesch et al.

    Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards

    Nat Neurosci

    (2007)
  • N. Eshel et al.

    Arithmetic and local circuitry underlying dopamine prediction errors

    Nature

    (2015)
  • R.S. Sutton et al.

    Reinforcement Learning: An Introduction

    MIT Press

    (1998)
  • J.N. Reynolds et al.

    A cellular mechanism of reward-related learning

    Nature

    (2001)
  • S. Yagishita et al.

    A critical time window for dopamine actions on the structural plasticity of dendritic spines

    Science

    (2014)
  • J.R. Hollerman et al.

    Dopamine neurons report an error in the temporal prediction of reward during learning

    Nat Neurosci

    (1998)
  • G. Morris et al.

    Midbrain dopamine neurons encode decisions for future action

    Nat Neurosci

    (2006)
  • P.N. Tobler et al.

    Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm

    J Neurosci

    (2003)
  • W.-X. Pan et al.

    Tripartite mechanism of extinction suggested by dopamine neuron activity and temporal difference model

    J Neurosci

    (2008)
  • S. Kobayashi et al.

    Influence of reward delays on responses of dopamine neurons

    J Neurosci

    (2008)
  • A. Lak et al.

    Dopamine prediction error responses integrate subjective value from different reward dimensions

    Proc Natl Acad Sci

    (2014)
  • P. Dayan et al.

    Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation

    Cogn Affect Behav Neurosci

    (2014)
  • N.D. Daw et al.

    Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control

    Nat Neurosci

    (2005)