Reinforcement learning with Marr
The computational level: the goals of a decision-making system
At the computational level, the basic goal of an agent or decision-making system is to maximize reward and minimize punishment. Although one might debate whether this is the true goal of agents from an evolutionary perspective, different definitions of reward and punishment allow considerable flexibility. Indeed, work in recent years has elaborated on what constitutes a reward: in addition to the obvious food and shelter (and their associated conditioned reinforcers), there seem to be other
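The computational-level goal can be stated formally. In standard RL notation (an assumption here; the text does not spell it out), the agent seeks a policy π that maximizes the expected discounted return:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \qquad 0 \le \gamma < 1,
```

where r_t is the reward at time t and γ discounts delayed rewards; punishments are simply negative rewards under this formulation.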
The algorithmic level: multiple solutions to the decision-making problem
Given the computational goal of maximizing reward, how does a decision-making agent learn which states of the world predict reward, and which actions enable its attainment? RL provides multiple algorithmic solutions to the problem of credit assignment (i.e., correctly assigning credit, or laying blame, for an outcome to the preceding actions or states). Many of these algorithms proceed through the incremental update of state- and action-specific ‘values’, defined as the (discounted) sum of future
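One such incremental value-update algorithm is temporal-difference (TD) learning; the following is a minimal tabular sketch, with state names, rewards, and learning parameters chosen purely for illustration (they are not from the text):

```python
def td0(episodes, gamma=0.9, alpha=0.1):
    """Tabular TD(0): incrementally update state values toward the
    (discounted) sum of future rewards.

    episodes: list of trajectories, each a list of (state, reward)
    pairs, where reward is the reward received in that state.
    """
    V = {}  # state -> estimated value
    for episode in episodes:
        for i, (s, r) in enumerate(episode):
            s_next = episode[i + 1][0] if i + 1 < len(episode) else None
            v_next = V.get(s_next, 0.0) if s_next is not None else 0.0
            # prediction error: reward plus discounted next-state value,
            # minus the current estimate
            delta = r + gamma * v_next - V.get(s, 0.0)
            V[s] = V.get(s, 0.0) + alpha * delta
    return V

# A two-state chain in which a 'cue' reliably precedes a rewarded 'outcome':
episodes = [[("cue", 0.0), ("outcome", 1.0)]] * 200
V = td0(episodes)
# values converge to V(outcome) ~ 1.0 and V(cue) ~ gamma * 1.0 = 0.9,
# so credit for the reward propagates back to the predictive state
```

The key point for credit assignment is that the update at each state uses only the locally available prediction error, yet value nonetheless flows backward to earlier predictive states over repeated experience.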
The implementational level: dopamine-dependent learning in the basal ganglia
At the final level of the hierarchy, neuroscientists have had considerable success in mapping functions implied by RL algorithms to neurobiological substrates. Whereas some of the computational and algorithmic questions highlighted above revolved around scaling RL to environments with real-world action and state complexity, the problems at the implementational level arise from the sheer complexity of the neural system, as well as the limitations of different experimental methods.
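The best-known implementational mapping is between the TD prediction error and phasic dopamine firing. A minimal simulation (states, parameters, and the fixed pre-cue baseline are illustrative assumptions, not taken from the text) reproduces the classic result that the error signal migrates from reward delivery to the predictive cue over learning:

```python
# One cue reliably followed by one reward; the cue itself is unpredicted,
# so the pre-cue value baseline is fixed at 0 (a simplification).
gamma, alpha = 0.9, 0.2
V_cue = 0.0          # learned value of the cue
deltas = []          # (cue-time error, reward-time error) per trial
for trial in range(100):
    delta_cue = 0.0 + gamma * V_cue - 0.0    # error on cue onset
    delta_reward = 1.0 + 0.0 - V_cue         # error on reward delivery
    V_cue += alpha * delta_reward            # learn from reward-time error
    deltas.append((delta_cue, delta_reward))

first, last = deltas[0], deltas[-1]
# early in training: error at reward, none at cue
# late in training: error at cue, almost none at (fully predicted) reward
```

Under the dopamine-reward-prediction-error hypothesis, this shift mirrors the recorded shift of phasic dopamine responses from unexpected rewards to reward-predicting cues.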
Conclusion — inspiration across levels
Reinforcement learning is perhaps the poster child of Marr's levels of analysis: a computational problem that, expressed formally, leads to a host of algorithmic solutions that appear to be implemented in human and animal brains. However, as with many classification schemes, too much emphasis on delineating levels can distract from the holistic nature of scientific inquiry. As we have shown, the boundaries between the levels are not clear cut, and cross-disciplinary interaction among
Conflict of interests
Nothing declared.
Acknowledgements
We are grateful to Gecia Bravo-Hermsdorff, Mingbo Cai, Andra Geana, Nina Rouhani, Nico Schuck and Yeon Soon Shin for valuable comments on this manuscript. This work was funded by the Human Frontier Science Program Organization and by NIMH grant R01MH098861.