Reinforcement learning with Marr
The computational level: the goals of a decision-making system
At the computational level, the basic goal of an agent or decision-making system is to maximize reward and minimize punishment. Although one might debate whether this is the true goal of agents from an evolutionary perspective, different definitions of reward and punishment allow considerable flexibility. Indeed, work in recent years has elaborated on what constitutes a reward: in addition to the obvious food and shelter (and their associated conditioned reinforcers), there seem to be other
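The computational-level goal can be stated formally. In standard RL notation (an assumption here; the text does not spell it out), the agent seeks a policy π that maximizes the expected discounted return:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \qquad 0 \le \gamma < 1,
```

where r_t is the reward at time t and γ discounts delayed rewards; punishments are simply negative rewards under this formulation.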
The algorithmic level: multiple solutions to the decision-making problem
Given the computational goal of maximizing reward, how does a decision-making agent learn which states of the world predict reward, and which actions enable its attainment? RL provides multiple algorithmic solutions to the problem of credit assignment (i.e., correctly assigning credit, or laying blame, for an outcome to the preceding actions or states). Many of these algorithms proceed through the incremental update of state- and action-specific ‘values’, defined as the (discounted) sum of future
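One such incremental value-update algorithm is temporal-difference (TD) learning; the following is a minimal tabular sketch, with state names, rewards, and learning parameters chosen purely for illustration (they are not from the text):

```python
def td0(episodes, gamma=0.9, alpha=0.1):
    """Tabular TD(0): incrementally update state values toward the
    (discounted) sum of future rewards.

    episodes: list of trajectories, each a list of (state, reward)
    pairs, where reward is the reward received in that state.
    """
    V = {}  # state -> estimated value
    for episode in episodes:
        for i, (s, r) in enumerate(episode):
            s_next = episode[i + 1][0] if i + 1 < len(episode) else None
            v_next = V.get(s_next, 0.0) if s_next is not None else 0.0
            # prediction error: reward plus discounted next-state value,
            # minus the current estimate
            delta = r + gamma * v_next - V.get(s, 0.0)
            V[s] = V.get(s, 0.0) + alpha * delta
    return V

# A two-state chain in which a 'cue' reliably precedes a rewarded 'outcome':
episodes = [[("cue", 0.0), ("outcome", 1.0)]] * 200
V = td0(episodes)
# values converge to V(outcome) ~ 1.0 and V(cue) ~ gamma * 1.0 = 0.9,
# so credit for the reward propagates back to the predictive state
```

The key point for credit assignment is that the update at each state uses only the locally available prediction error, yet value nonetheless flows backward to earlier predictive states over repeated experience.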
The implementational level: dopamine-dependent learning in the basal ganglia
At the final level of the hierarchy, neuroscientists have had considerable success in mapping functions implied by RL algorithms to neurobiological substrates. Whereas some of the computational and algorithmic questions highlighted above revolved around scaling RL to environments with real-world action and state complexity, the problems at the implementational level arise from the sheer complexity of the neural system, as well as the limitations of different experimental methods.
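The best-known implementational mapping is between the TD prediction error and phasic dopamine firing. A minimal simulation (states, parameters, and the fixed pre-cue baseline are illustrative assumptions, not taken from the text) reproduces the classic result that the error signal migrates from reward delivery to the predictive cue over learning:

```python
# One cue reliably followed by one reward; the cue itself is unpredicted,
# so the pre-cue value baseline is fixed at 0 (a simplification).
gamma, alpha = 0.9, 0.2
V_cue = 0.0          # learned value of the cue
deltas = []          # (cue-time error, reward-time error) per trial
for trial in range(100):
    delta_cue = 0.0 + gamma * V_cue - 0.0    # error on cue onset
    delta_reward = 1.0 + 0.0 - V_cue         # error on reward delivery
    V_cue += alpha * delta_reward            # learn from reward-time error
    deltas.append((delta_cue, delta_reward))

first, last = deltas[0], deltas[-1]
# early in training: error at reward, none at cue
# late in training: error at cue, almost none at (fully predicted) reward
```

Under the dopamine-reward-prediction-error hypothesis, this shift mirrors the recorded shift of phasic dopamine responses from unexpected rewards to reward-predicting cues.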
Conclusion — inspiration across levels
Reinforcement learning is perhaps the poster child of Marr's levels of analysis: a computational problem that, expressed formally, leads to a host of algorithmic solutions that appear to be implemented in human and animal brains. However, as with many classification schemes, too much emphasis on delineating levels can distract from the holistic nature of scientific inquiry. As we have shown, the boundaries between the levels are not clear cut, and cross-disciplinary interaction among
Conflict of interests
Nothing declared.
Acknowledgements
We are grateful to Gecia Bravo-Hermsdorff, Mingbo Cai, Andra Geana, Nina Rouhani, Nico Schuck and Yeon Soon Shin for valuable comments on this manuscript. This work was funded by the Human Frontier Science Program Organization and by NIMH grant R01MH098861.