  • Brief Communication

Action initiation shapes mesolimbic dopamine encoding of future rewards

Abstract

It is widely held that dopamine signaling encodes predictions of future rewards and such predictions are regularly used to drive behavior, but the relationship between these two is poorly defined. We found in rats that nucleus accumbens dopamine following a reward-predicting cue was attenuated unless movement was correctly initiated. Our results indicate that dopamine release in this region is contingent on correct action initiation and not just reward prediction.

Figure 1: Go/No-Go task in experiment 1.
Figure 2: NAcc dopamine signals on Go and No-Go trials in experiment 1.
Figure 3: Experiment 2: behavior and dopamine signals.

Acknowledgements

We would like to thank M. Baudonnat and L. Tankelevitch for assistance with data collection, N. Hollon and N. Kolling for analysis advice, and S. Ng-Evans for technical support. This work was funded by a Wellcome Trust Research Career Development Fellowship (WT090051MA to M.E.W.), the Medical Research Council UK (awards MC_UU_12020/5 and MC_UU_12024/2 to P.J.M., MC_UU_12024/5 to R.B., and MC_UU_12024/1 to P.B.), and a studentship from the Economic and Social Research Council and St John's College Oxford (to L.L.G.).

Author information

Contributions

P.B. conceived the core study, and P.B., P.J.M. and M.E.W. developed this and, with E.C.J.S., planned the experiments. For experiments 1 and 2, E.C.J.S. performed surgeries, and collected and analyzed the data. For experiment 2, L.L.G. also performed surgery and collected the data. R.B. performed the simulations. M.E.W. supervised the study. M.E.W. and E.C.J.S. prepared the manuscript with input from the other authors.

Corresponding author

Correspondence to Mark E Walton.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Representation of recording sites.

(a) Schematic, along with an example photomicrograph, of the recording locations in the nucleus accumbens core in experiment 1 (n = 7 electrodes in 7 rats). (b) Schematic of the recording locations in the nucleus accumbens core in experiment 2 (n = 9 electrodes in 6 rats). The numbers next to each section indicate distance in mm anterior to bregma. Adapted from the atlas of Paxinos and Watson (2005).

Supplementary Figure 2 Behavioral performance and simulations of the actor-critic model in experiment 1.

(a) Average success rates, (b) holding times from cue onset to head exit (mean ± S.D.), and (c) reward delivery times after cue onset during experiment 1. For all box plots, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points. (d) Box plots of success rates derived from simulations. Note that the model also found the No-Go trials more difficult to perform correctly, as only a single sequence of actions resulted in delivery of the pellet, whereas all other sequences resulted in the light being turned on to signal an error. By contrast, on Go trials there were multiple potential sequences that were not immediately incorrect. (e) Average reward prediction errors (RPEs; mean ± S.E.M.) recorded in 0.5 s time steps for the different trial types for 7 simulated “rats”. Although the model was trained over a long period (paralleling the extended training received by the rats in experiment 1) to achieve discrepancies in success rates qualitatively similar to those of the real rats, there is nonetheless a positive RPE at cue presentation in all conditions. This occurred because (i) the simulated animal was unable to estimate time precisely from past events, and (ii) its accuracy was not at 100%. In particular, there was a large increase in the RPE at cue onset because, after reward delivery, the next trial could only begin once the ITI had ended, 5 s after reward delivery, so the simulated animal could not fully predict whether entering and staying in the nose-poke would initiate a trial. The real rats in experiment 1 in fact made nose-poke responses during the ITI (< 4.5 s since previous reward delivery) on ~20% of trials, which could indicate that they also did not estimate time precisely and hence could not fully predict whether staying in the nose-poke would initiate a trial.
The RPE at cue presentation was numerically smaller on No-Go than on Go trials because, just like the real rats, the simulated animals had lower accuracy on No-Go trials; thus, they estimated a lower value for the state associated with the No-Go cue. There was also an increase in the RPE at the time of reward delivery. This occurred because the same action had to be executed multiple times to result in pellet delivery, and the state of the simulated animals did not include any information about their past actions, so a simulated animal could not fully predict whether its action would result in pellet delivery in the next time step.
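
The link drawn above between lower accuracy and a smaller cue-evoked RPE can be illustrated with a minimal temporal-difference sketch. This is not the authors' actor-critic model: the function names, learning rate, reward size, the two accuracies and the assumption that the pre-cue state has a value of 0 are all hypothetical and chosen only for illustration.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Temporal-difference reward prediction error for one transition."""
    return reward + gamma * value_next - value_current


def learn_cue_value(p_success, reward=1.0, alpha=0.1, n_updates=500):
    """Learn the value of a cue state that is rewarded with probability p_success.

    The stochastic outcome is replaced by its expectation so the result
    is deterministic.
    """
    value = 0.0
    for _ in range(n_updates):
        expected_outcome = p_success * reward
        value += alpha * (expected_outcome - value)
    return value


# Hypothetical accuracies, chosen only for illustration (not the rats' data).
v_go = learn_cue_value(p_success=0.9)
v_nogo = learn_cue_value(p_success=0.6)

# RPE at cue onset, assuming the pre-cue state has a value of 0
# (the cue is entirely unpredicted) and no reward arrives at cue onset.
rpe_go = td_error(reward=0.0, value_next=v_go, value_current=0.0)
rpe_nogo = td_error(reward=0.0, value_next=v_nogo, value_current=0.0)
```

Under these assumptions the learned cue value converges to the success probability times the reward, so the less accurately performed trial type yields the smaller positive RPE at cue onset.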

Supplementary Figure 3 Dopamine signals in experiment 1 to reward delivery and collection.

Dopamine data were aligned to (a) reward delivery and (b) reward collection from the food magazine. Reward delivery occurred after the 2nd lever press on Go trials, and at cue offset on No-Go trials. The food magazine was situated on the opposite wall of the chamber to the nose-poke and levers. On average, rats took 1.85 s on Go trials and 2.04 s on No-Go trials to reach the food magazine after reward delivery.

Supplementary Figure 4 Dopamine signals in delayed Go trials.

Upper panels display the unsmoothed average dopamine signals from experiment 1 (mean ± S.E.M.) recorded during ‘valid’ Go trials (post-cue RT < 1.7 s; blue filled line) or ‘delayed’ Go trials (post-cue RT > 1.7 s; dotted cyan line) (NB: delayed Go trials were not included in any other analyses) aligned to either (a) cue onset or (b) time of head exit from nose-poke. Lower panels show average discriminability between the Valid and Delayed Go trials for each timepoint (shaded area = population of 1000 permuted sessions; line: *, p < 0.05 permutation tests, corrected for multiple comparisons). (c) Box plot of the nose-poke holding times (central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points) for the valid or delayed Go trials (*, p = 0.02, W7 = 0, Wilcoxon Signed Rank Test).
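
The timepoint-wise discriminability analysis described above can be sketched as follows. This is a minimal illustration for a single timepoint, not the authors' analysis code. It assumes that discriminability is the area under the ROC curve and that the null distribution is built by shuffling trial labels; the caption does not specify either. It also omits the correction for multiple comparisons, and all names and values are hypothetical.

```python
import random


def auroc(a, b):
    """Probability that a random value from a exceeds one from b (ties count half)."""
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))


def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Two-sided p-value for auROC != 0.5, obtained by shuffling trial labels."""
    rng = random.Random(seed)
    observed = abs(auroc(a, b) - 0.5)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(auroc(perm_a, perm_b) - 0.5) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical dopamine amplitudes at one timepoint for two trial types.
valid_go = [1.9, 2.3, 2.1, 2.8, 2.5, 2.2, 2.6, 2.4]
delayed_go = [1.1, 1.4, 0.9, 1.6, 1.2, 1.0, 1.5, 1.3]
p = permutation_p_value(valid_go, delayed_go)
```

Repeating this at every timepoint gives one discriminability value and one p-value per sample, which would then need to be corrected across timepoints.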

Supplementary Figure 5 Dopamine predicts Wrong Go selections prior to the error being signaled.

Average unsmoothed dopamine release from experiment 1 (mean ± S.E.M.) recorded during successful (filled line) and wrong (dotted line) Go trials (trials where the animal correctly initiated an action but selected the wrong lever) aligned to the time of (a, d) cue onset, (b) head exit from nose-poke or (c) the first lever press (when the error is signaled). Lower panels show average discriminability between the Correct and Wrong Go trials for each timepoint (shaded area = population of 1000 permuted sessions; line: *, p < 0.05 permutation tests, corrected for multiple comparisons). (d-e) Comparison of dopamine release on successful and wrong Go trials as a function of speed of action initiation. (d) Successful (blue lines) or Wrong (red lines) Go trials sorted by “fast” (< 1 s, filled line) or “slow” (> 1 s, dotted line) nose-poke exit latency. (e) Regression weights for trial accuracy and action initiation latency for Go trials (a.u., arbitrary units). On average, rats were both significantly slower to exit the nose poke (p = 0.03, W7 = 1, Wilcoxon Signed Rank Test) and to move from the nose poke to make a lever press on Wrong trials (p = 0.02, W7 = 0, Wilcoxon Signed Rank Test). Nonetheless, these differences could not fully explain the different patterns of dopamine release on the Correct and Wrong trials.
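
The p-values quoted above for W7 = 1 and W7 = 0 are consistent with an exact two-sided signed-rank test on seven paired differences. The sketch below enumerates all sign assignments to show where those numbers would come from. It assumes no zero differences and no tied ranks, and whether the authors used the exact method is itself an assumption. The function name is hypothetical.

```python
from itertools import product


def exact_signed_rank_p(n, w):
    """Exact two-sided p-value for a signed-rank statistic w = min(W+, W-).

    Assumes n non-zero paired differences with no tied ranks.
    """
    total = n * (n + 1) // 2
    hits = 0
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w_plus, total - w_plus) <= w:
            hits += 1
    return hits / 2 ** n


p_w0_n7 = exact_signed_rank_p(7, 0)  # 2 of 128 sign assignments
p_w1_n7 = exact_signed_rank_p(7, 1)  # 4 of 128 sign assignments
```

With seven animals, W = 0 gives 2/128 ≈ 0.016 and W = 1 gives 4/128 ≈ 0.031, which round to the reported 0.02 and 0.03.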

Supplementary Figure 6 Dopamine on failed No-Go trials decreases when the error is signaled.

Average unsmoothed dopamine signals from experiment 1 (mean ± S.E.M.) recorded during successful (filled line) and failed (dotted line) No-Go trials (trials where the animal exited the nose poke before the end of the No-Go hold period) aligned to time of (a) cue onset or (b) head exit from nose-poke (when the error is signaled). Lower panels show average discriminability between the Correct and Failed No-Go trials for each timepoint (shaded area = population of 1000 permuted sessions; line: *, p < 0.05 permutation tests, corrected for multiple comparisons). (c) Box plot of the average coefficients for the slopes of the linear fit of the dopamine signal in a 2.5 s window following cue onset for correct and failed No-Go trials (central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points). The decrease in dopamine following cue onset is significantly greater in failed No-Go trials than in successful No-Go trials (*, p = 0.02, W7 = 0, Wilcoxon Signed Rank Test).
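
The slope measure in panel (c) can be sketched as an ordinary least-squares fit over the 2.5 s after cue onset. This is a minimal illustration only. The sampling rate, the synthetic trace and the function name are hypothetical, and the authors' exact fitting procedure is not described here.

```python
def window_slope(times, signal, start=0.0, duration=2.5):
    """Least-squares slope of signal over the window [start, start + duration]."""
    pts = [(t, y) for t, y in zip(times, signal) if start <= t <= start + duration]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    return sxy / sxx


# Hypothetical trace sampled at 10 Hz; time is in seconds relative to cue onset.
times = [i / 10 for i in range(-10, 41)]
trace = [1.0 - 0.4 * t if t >= 0 else 1.0 for t in times]
slope = window_slope(times, trace)
```

A more negative slope corresponds to a steeper post-cue decline, so one slope per animal and trial type could then be compared with a paired test.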

Supplementary Figure 7 Dopamine release scales with success rates on Go trials in experiment 1.

(a) Average dopamine release (mean ± S.E.M.) recorded during successful Go Left and Right trials in experiment 1, arranged in each animal by performance (best / highest success rate = red line, worst / lowest success rate = blue line). Lower panels show average discriminability (black line) between the Best and Worst Go trials for each time point (grey lines = population of 1000 permuted sessions; black bars with *, p < 0.05 permutation tests, corrected for multiple comparisons). Note the mean head exit latency was the same between the two conditions (Best Go = 0.59 s, Worst Go = 0.59 s). (b) Patterns of dopamine release on Go trials with equivalent nose-poke exit times, divided into trials with fast (< 1 s) or slow (> 1 s) response times to make the 1st lever press (filled and dashed blue lines, respectively), were qualitatively very similar.

Supplementary Figure 8 Behavioral performance in experiment 2.

(a) Average success rates in experiment 2. Central mark is the median, box edges are 25th and 75th percentiles, whiskers extend to the most extreme data points not considered outliers (points 1.67 x interquartile range away from the 25th or 75th percentile), and outliers are plotted individually. The two outlier points in the Go Small condition come from the same animal. (b) Holding times from cue onset to head exit (mean ± S.D.) on Correct, Wrong Response (wrong lever pressed) or No Response (no lever pressed within 5 s of cue onset) Go trials, or Correct or Failed (premature exit from nose poke) No-Go trials. (*, both p = 0.03, W6 = 0, Wilcoxon Signed Rank Test).
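
The outlier rule stated in panel (a) can be sketched as below. This is a minimal illustration that reads the rule as "more than 1.67 × the interquartile range beyond the 25th or 75th percentile". The quartile method (`inclusive`), the function name and the data are assumptions, and other quartile definitions would shift the fences slightly.

```python
from statistics import quantiles


def flag_outliers(values, k=1.67):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical per-animal success rates (not the reported data).
success_rates = [0.80, 0.82, 0.85, 0.86, 0.88, 0.90, 0.45]
outliers = flag_outliers(success_rates)
```

Values returned by `flag_outliers` would be plotted individually, and the whiskers would extend to the most extreme remaining points.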

Supplementary Figure 9 Comparison between trial types in experiment 2.

Average discriminability between the different trial types in experiment 2 for each time point (grey lines = max and min from population of 1000 permuted sessions; line: *, p < 0.05 permutation tests, corrected for multiple comparisons). (a) Go Large v Go Small, (b) No-Go Large v Go Small, (c) No-Go Small v Go Small, (d) Go Large v No-Go Small, (e) No-Go Large v No-Go Small, (f) Go Large v No-Go Large.

Supplementary Figure 10 Dopamine release on No-Go trials in experiment 2 reflects correct action initiation.

(a, b) Average dopamine release (mean ± S.E.M.) on correct No-Go trials (green lines) compared to incorrect No-Go trials when the animals left the nose-poke prematurely (blue lines). Data are aligned to cue onset (a) or nose-poke exit (b). Large Reward trials are plotted with filled lines, Small Reward with dashed lines. Owing to the smaller proportion of Go trials in each session and increased success rate on Go Large trials, there were too few Incorrect Go trials to analyze in experiment 2 (< 2%). (c, d) Average No-Go dopamine release (mean ± S.E.M.) on a subset of correctly performed No-Go trials where the rats’ exit from the nose-poke was either “fast” (< 600 ms after cue offset, filled green line) or “slow” (> 1,600 ms after cue offset, dashed green line). Data are aligned to cue offset (which occurred 1 s before reward delivery in experiment 2) (c) or nose-poke exit (d). Therefore, the fast head exits occur > 400 ms before reward delivery and the slow ones > 600 ms after reward delivery.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Table 1 (PDF 1346 kb)

Supplementary Methods Checklist (PDF 465 kb)

About this article

Cite this article

Syed, E., Grima, L., Magill, P. et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat Neurosci 19, 34–36 (2016). https://doi.org/10.1038/nn.4187

  • DOI: https://doi.org/10.1038/nn.4187
