Temporal difference models describe higher-order learning in humans

Seymour, Ben; O'Doherty, John P.; Dayan, Peter; Koltzenburg, Martin; Jones, Anthony K.; Dolan, Raymond J.; Friston, Karl J.; Frackowiak, Richard S.

doi:10.1038/nature02581

Letter
Published: 10 June 2004

Nature volume 429, pages 664–667 (2004)

Abstract

The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning¹. However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning², our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
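As a rough illustration of the kind of signal a temporal difference model predicts for a chain of predictors, the sketch below runs a simple TD(0) update over a fixed sequence of two cues followed by pain. It is not the authors' model or task: the function name `td_chain`, the deterministic cue order, the learning rate `alpha`, the discount `gamma` and the coding of pain as a positive magnitude are all assumptions made for the example.

```python
def td_chain(n_trials=50, alpha=0.5, gamma=1.0, pain=1.0):
    """TD(0) over a fixed, hypothetical cue1 -> cue2 -> pain sequence.

    Returns the learned cue values and, for every trial, the prediction
    errors at cue1 onset, at cue2 onset and at outcome delivery.
    """
    # Learned aversive predictions; baseline and terminal states stay at 0.
    values = {"cue1": 0.0, "cue2": 0.0}
    history = []
    for _ in range(n_trials):
        # Cue 1 appears unpredicted, from a baseline state of value 0.
        delta_cue1 = gamma * values["cue1"] - 0.0
        # Cue 1 -> cue 2: no outcome yet, only a change in prediction.
        delta_cue2 = 0.0 + gamma * values["cue2"] - values["cue1"]
        # Cue 2 -> pain: the trial ends, so the successor value is 0.
        delta_pain = pain + gamma * 0.0 - values["cue2"]
        # Each error updates the prediction attached to the preceding state.
        values["cue1"] += alpha * delta_cue2
        values["cue2"] += alpha * delta_pain
        history.append((delta_cue1, delta_cue2, delta_pain))
    return values, history


if __name__ == "__main__":
    values, history = td_chain()
    for trial in (0, 1, 2, 49):
        print(trial + 1, [round(d, 3) for d in history[trial]])
```

Under these assumptions the prediction error starts at the outcome, appears at the second cue on the next trial, and finally settles on the first cue, the earliest reliable predictor, while the error at the outcome itself shrinks towards zero.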

**Figure 1: Experimental design and temporal difference model.**

**Figure 2: Temporal difference prediction error (statistical parametric maps).**

**Figure 3: Temporal difference prediction error (impulse responses).**

**Figure 4: Temporal difference value (statistical parametric maps and impulse response in the right anterior insula).**

References

1. Sutton, R. S. & Barto, A. G. in Learning and Computational Neuroscience: Foundations of Adaptive Networks (eds Gabriel, M. & Moore, J.) 497–537 (MIT, Cambridge, Massachusetts, 1990)
2. Everitt, B. J. et al. Associative processes in addiction and reward. The role of amygdala–ventral striatal subsystems. Ann. NY Acad. Sci. 877, 412–438 (1999)
3. LeDoux, J. Fear and the brain: where have we been, and where are we going? Biol. Psychiatry 44, 1229–1238 (1998)
4. Buchel, C. & Dolan, R. J. Classical fear conditioning in functional neuroimaging. Curr. Opin. Neurobiol. 10, 219–223 (2000)
5. Ploghaus, A. et al. Dissociating pain from its anticipation in the human brain. Science 284, 1979–1981 (1999)
6. Ploghaus, A. et al. Learning about pain: the neural substrate of the prediction error for aversive events. Proc. Natl Acad. Sci. USA 97, 9281–9286 (2000)
7. Dickinson, A. Contemporary Animal Learning Theory (Cambridge Univ. Press, Cambridge, UK, 1980)
8. Sutton, R. S. & Barto, A. G. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 88, 135–170 (1981)
9. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT, Cambridge, Massachusetts, 1998)
10. Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996)
11. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)
12. Suri, R. E. & Schultz, W. Temporal difference model reproduces anticipatory neural activity. Neural Comput. 13, 841–862 (2001)
13. O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H. & Dolan, R. J. Temporal difference models and reward-related learning in the human brain. Neuron 38, 329–337 (2003)
14. Friston, K. J., Tononi, G., Reeke, G. N. Jr, Sporns, O. & Edelman, G. M. Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience 59, 229–243 (1994)
15. McClure, S. M., Berns, G. S. & Montague, P. R. Temporal prediction errors in a passive learning task activate human striatum. Neuron 38, 339–346 (2003)
16. Daw, N. D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616 (2002)
17. Brandon, S. E., Vogel, E. H. & Wagner, A. R. Stimulus representation in SOP: I. Theoretical rationalization and some implications. Behav. Processes 62, 5–25 (2003)
18. Barto, A. G., Sutton, R. S. & Anderson, C. W. Neuronlike elements that can solve difficult learning problems. IEEE Trans. Syst. Man Cybern. 13, 834–846 (1983)
19. Barto, A. G., Sutton, R. S. & Watkins, C. J. C. H. in Learning and Computational Neuroscience: Foundations of Adaptive Networks (eds Gabriel, M. & Moore, J.) 539–602 (MIT, Cambridge, Massachusetts, 1990)
20. Barto, A. G. in Models of Information Processing in the Basal Ganglia (eds Houk, J. C., Davis, J. L. & Beiser, D. G.) 215–232 (MIT, Cambridge, Massachusetts, 1995)
21. Chudler, E. H. & Dong, W. K. The role of the basal ganglia in nociception and pain. Pain 60, 3–38 (1995)
22. Solomon, R. L. & Corbit, J. D. An opponent-process theory of motivation. I. Temporal dynamics of affect. Psychol. Rev. 81, 119–145 (1974)
23. Dickinson, A. & Dearing, M. F. in Mechanisms of Learning and Motivation (eds Dickinson, A. & Boakes, R. A.) 203–231 (Erlbaum, Hillsdale, New Jersey, 1979)
24. Horvitz, J. C. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651–656 (2000)
25. Azmitia, E. C. & Segal, M. An autoradiographic analysis of the differential ascending projections of the dorsal and median raphe nuclei in the rat. J. Comp. Neurol. 179, 641–667 (1978)
26. Mirenowicz, J. & Schultz, W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379, 449–451 (1996)
27. Horvitz, J. C. Dopamine gating of glutamatergic sensorimotor and incentive motivational input signals to the striatum. Behav. Brain Res. 137, 65–74 (2002)
28. Ploghaus, A., Becerra, L., Borras, C. & Borsook, D. Neural circuitry underlying pain modulation: expectation, hypnosis, placebo. Trends Cogn. Sci. 7, 197–200 (2003)
29. Deichmann, R., Gottfried, J. A., Hutton, C. & Turner, R. Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage 19, 430–441 (2003)
30. Buchel, C., Dolan, R. J., Armony, J. L. & Friston, K. J. Amygdala–hippocampal involvement in human aversive trace conditioning revealed through event-related functional magnetic resonance imaging. J. Neurosci. 19, 10869–10876 (1999)

Acknowledgements

We thank P. Allen and E. Featherstone for technical help. This work was funded by Wellcome Trust program grants to R.S.F., K.J.F., M.K. and R.J.D. P.D. was funded by the Gatsby Charitable Foundation.

Author information

Authors and Affiliations

Wellcome Department of Imaging Neuroscience, 12 Queen Square, WC1N 3BG, London, UK
Ben Seymour, John P. O'Doherty, Raymond J. Dolan, Karl J. Friston & Richard S. Frackowiak
Gatsby Computational Neuroscience Unit, Alexandra House, 17 Queen Square, WC1N 3AR, London, UK
Peter Dayan
Institute of Child Health, University College London, 30 Guilford St, WC1N 1EH, London, UK
Martin Koltzenburg
University of Manchester Rheumatic Diseases Centre, Hope Hospital, M6 8HD, Manchester, UK
Anthony K. Jones
Fondazione Santa Lucia, 00179, Rome, Italy
Richard S. Frackowiak

Corresponding author

Correspondence to Ben Seymour.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

About this article

Cite this article

Seymour, B., O'Doherty, J., Dayan, P. et al. Temporal difference models describe higher-order learning in humans. Nature 429, 664–667 (2004). https://doi.org/10.1038/nature02581

Received: 02 December 2003
Accepted: 19 April 2004
Issue Date: 10 June 2004
DOI: https://doi.org/10.1038/nature02581
