  • Letter

Temporal difference models describe higher-order learning in humans

Abstract

The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning (ref. 1). However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning (ref. 2), our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
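As a rough illustration of the temporal difference idea mentioned in the abstract, the following minimal Python sketch runs tabular TD(0) on a fixed chain of two cues followed by an aversive outcome. It is not the authors' model or task: the function name `td_learning`, the learning rate `alpha`, the discount `gamma`, the unit outcome magnitude, and the deterministic cue sequence are all assumptions chosen for illustration.

```python
def td_learning(n_trials=200, alpha=0.3, gamma=1.0, outcome=1.0):
    """Tabular TD(0) on an assumed fixed sequence: cue A -> cue B -> outcome.

    Returns the learned cue values and, for every trial, the prediction
    errors at three time points: cue A onset, cue B onset, outcome delivery.
    """
    # value[0]: cue A, value[1]: cue B, value[2]: terminal state (stays 0).
    value = [0.0, 0.0, 0.0]
    history = []
    for _ in range(n_trials):
        errors = []

        # Cue A appears unpredictably. The pre-cue baseline is assumed to
        # carry zero value and is not updated, so the error is gamma * V(A).
        errors.append(gamma * value[0] - 0.0)

        # Transition cue A -> cue B: no outcome is delivered at this step.
        delta = 0.0 + gamma * value[1] - value[0]
        value[0] += alpha * delta
        errors.append(delta)

        # Transition cue B -> terminal state: the outcome is delivered.
        delta = outcome + gamma * value[2] - value[1]
        value[1] += alpha * delta
        errors.append(delta)

        history.append(errors)
    return value, history


if __name__ == "__main__":
    values, history = td_learning()
    print("first trial errors:", history[0])
    print("last trial errors: ", [round(e, 3) for e in history[-1]])
    print("learned values:    ", [round(v, 3) for v in values])
```

Under these assumptions, the prediction error on the first trial occurs only at the outcome. With repeated trials it shrinks there and appears instead at the onset of the earliest cue, which is the qualitative signature of higher-order learning that temporal difference models predict.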

Figure 1: Experimental design and temporal difference model.
Figure 2: Temporal difference prediction error (statistical parametric maps).
Figure 3: Temporal difference prediction error (impulse responses).
Figure 4: Temporal difference value (statistical parametric maps and impulse response in the right anterior insula).

References

  1. Sutton, R. S. & Barto, A. G. in Learning and Computational Neuroscience: Foundations of Adaptive Networks (eds Gabriel, M. & Moore, J.) 497–537 (MIT, Cambridge, Massachusetts, 1990)

  2. Everitt, B. J. et al. Associative processes in addiction and reward. The role of amygdala–ventral striatal subsystems. Ann. NY Acad. Sci. 877, 412–438 (1999)

  3. LeDoux, J. Fear and the brain: where have we been, and where are we going? Biol. Psychiatry 44, 1229–1238 (1998)

  4. Buchel, C. & Dolan, R. J. Classical fear conditioning in functional neuroimaging. Curr. Opin. Neurobiol. 10, 219–223 (2000)

  5. Ploghaus, A. et al. Dissociating pain from its anticipation in the human brain. Science 284, 1979–1981 (1999)

  6. Ploghaus, A. et al. Learning about pain: the neural substrate of the prediction error for aversive events. Proc. Natl Acad. Sci. USA 97, 9281–9286 (2000)

  7. Dickinson, A. Contemporary Animal Learning Theory (Cambridge Univ. Press, Cambridge, UK, 1980)

  8. Sutton, R. S. & Barto, A. G. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 88, 135–170 (1981)

  9. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT, Cambridge, Massachusetts, 1998)

  10. Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996)

  11. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)

  12. Suri, R. E. & Schultz, W. Temporal difference model reproduces anticipatory neural activity. Neural Comput. 13, 841–862 (2001)

  13. O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H. & Dolan, R. J. Temporal difference models and reward-related learning in the human brain. Neuron 38, 329–337 (2003)

  14. Friston, K. J., Tononi, G., Reeke, G. N. Jr, Sporns, O. & Edelman, G. M. Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience 59, 229–243 (1994)

  15. McClure, S. M., Berns, G. S. & Montague, P. R. Temporal prediction errors in a passive learning task activate human striatum. Neuron 38, 339–346 (2003)

  16. Daw, N. D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616 (2002)

  17. Brandon, S. E., Vogel, E. H. & Wagner, A. R. Stimulus representation in SOP: I. Theoretical rationalization and some implications. Behav. Processes 62, 5–25 (2003)

  18. Barto, A. G., Sutton, R. S. & Anderson, C. W. Neuronlike elements that can solve difficult learning problems. IEEE Trans. Syst. Man Cybern. 13, 834–846 (1983)

  19. Barto, A. G., Sutton, R. S. & Watkins, C. J. C. H. in Learning and Computational Neuroscience: Foundations of Adaptive Networks (eds Gabriel, M. & Moore, J.) 539–602 (MIT, Cambridge, Massachusetts, 1990)

  20. Barto, A. G. in Models of Information Processing in the Basal Ganglia (eds Houk, J. C., Davis, J. L. & Beiser, D. G.) 215–232 (MIT, Cambridge, Massachusetts, 1995)

  21. Chudler, E. H. & Dong, W. K. The role of the basal ganglia in nociception and pain. Pain 60, 3–38 (1995)

  22. Solomon, R. L. & Corbit, J. D. An opponent-process theory of motivation. I. Temporal dynamics of affect. Psychol. Rev. 81, 119–145 (1974)

  23. Dickinson, A. & Dearing, M. F. in Mechanisms of Learning and Motivation (eds Dickinson, A. & Boakes, R. A.) 203–231 (Erlbaum, Hillsdale, New Jersey, 1979)

  24. Horvitz, J. C. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651–656 (2000)

  25. Azmitia, E. C. & Segal, M. An autoradiographic analysis of the differential ascending projections of the dorsal and median raphe nuclei in the rat. J. Comp. Neurol. 179, 641–667 (1978)

  26. Mirenowicz, J. & Schultz, W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379, 449–451 (1996)

  27. Horvitz, J. C. Dopamine gating of glutamatergic sensorimotor and incentive motivational input signals to the striatum. Behav. Brain Res. 137, 65–74 (2002)

  28. Ploghaus, A., Becerra, L., Borras, C. & Borsook, D. Neural circuitry underlying pain modulation: expectation, hypnosis, placebo. Trends Cogn. Sci. 7, 197–200 (2003)

  29. Deichmann, R., Gottfried, J. A., Hutton, C. & Turner, R. Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage 19, 430–441 (2003)

  30. Buchel, C., Dolan, R. J., Armony, J. L. & Friston, K. J. Amygdala–hippocampal involvement in human aversive trace conditioning revealed through event-related functional magnetic resonance imaging. J. Neurosci. 19, 10869–10876 (1999)

Acknowledgements

We thank P. Allen and E. Featherstone for technical help. This work was funded by Wellcome Trust program grants to R.S.F., K.J.F., M.K. and R.J.D. P.D. was funded by the Gatsby Charitable Foundation.

Author information

Corresponding author

Correspondence to Ben Seymour.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

About this article

Cite this article

Seymour, B., O'Doherty, J., Dayan, P. et al. Temporal difference models describe higher-order learning in humans. Nature 429, 664–667 (2004). https://doi.org/10.1038/nature02581

  • DOI: https://doi.org/10.1038/nature02581
