Neural computations associated with goal-directed choice
Introduction
Consider a canonical decision-making problem. Every day a hungry animal is placed at the bottom of a Y-maze and is allowed to run up the left or right arm to collect a reward. The left arm leads to a highly liked food, but carries a high cost since the animal is required to swim to reach it. The right arm leads to a less desirable food, but does not require swimming. The foods at the two arms change randomly each day. How does the animal decide which course to take?
A growing body of work has shown that this problem can be solved using two very different approaches [1, 2, 3, 4]. In one approach animals learn the value of each action through trial-and-error using reinforcement learning, and then take the action with the highest learned value [2, 4, 5, 6, 7, 8, 9]. This strategy requires little knowledge on the part of the subject and can account for multiple aspects of behavior in many domains, but is only able to pick the optimal action on average. In another approach, animals estimate the value associated with each action in every trial using knowledge about their costs and benefits. With sufficient knowledge this approach, often called ‘goal-directed’ or ‘model-based’ decision-making [7, 8, 10], can do much better since it is able to pick the optimal action in every trial [10].
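The model-free strategy described above can be sketched as a simple trial-and-error learning loop. The following Python snippet is illustrative only: the arm payoffs, learning rate, and exploration rate are hypothetical values chosen for the Y-maze example, not parameters estimated from any study.

```python
import random

def q_learning_choice(rewards, alpha=0.1, epsilon=0.1, n_trials=1000, seed=0):
    """Learn action values for a two-arm choice by trial and error.

    rewards: dict mapping action -> average payoff (hypothetical values).
    alpha: learning rate; epsilon: exploration probability.
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in rewards}
    for _ in range(n_trials):
        # epsilon-greedy: usually exploit the action with the highest learned value
        if rng.random() < epsilon:
            action = rng.choice(list(q))
        else:
            action = max(q, key=q.get)
        # noisy reward drawn around the arm's average payoff
        r = rewards[action] + rng.gauss(0, 0.5)
        # incremental update of the learned value toward the observed reward
        q[action] += alpha * (r - q[action])
    return q

q = q_learning_choice({"left": 1.0, "right": 0.4})
```

Because the learned values only track average payoffs, this strategy picks the best action on average; it cannot exploit trial-by-trial knowledge of costs and benefits the way a goal-directed system can.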
Over the past decade significant advances have been made in understanding how the brain makes goal-directed choices. We review important findings from the past few years, as well as some of the most pressing open questions. Owing to space limitations we do not attempt to be comprehensive.
Computational framework
Models from psychology and economics suggest that goal-directed choice requires the following computations. First, the brain computes stimulus values that measure the value of the outcomes generated by each action. Second, it computes action costs that measure the costs associated with each course of action. Third, it integrates them into action values given by

action value = stimulus value − action cost.
Finally, the action values are compared in order to make a choice.
We now describe what is known about each of these computations.
How are stimulus values encoded?
Several human fMRI studies have placed individuals in simple choice situations and have found that BOLD activity in the medial orbitofrontal cortex (mOFC) correlates with behavioral measures of stimulus values [11, 12, 13•]. These findings are consistent with monkey neurophysiology studies that have found stimulus value coding in OFC neurons during choice tasks [14, 15••, 16•, 17] (Figure 1). Note, however, that we must be cautious when comparing OFC findings across species owing to potential
How are stimulus values computed?
A popular theory states that stimulus values are learned through reinforcement learning and retrieved in OFC at the time of choice [2, 9, 29]. Although some evidence suggests that this process is at work in settings where animals repeatedly face a small number of stimuli [30], it cannot account for all observed behavior because humans are able to evaluate novel stimuli. We propose an alternative theory of stimulus value computation that takes advantage of the fact that most stimuli are complex
Stimulus valuation in complex decision situations
Two recent human fMRI studies provide clues about how the brain has adapted to solve more sophisticated choice problems, such as dietary decisions with long-term consequences, or complex social decisions. In order to make good choices in these domains, the brain needs to compute the value of attributes such as the impact of the choice on future health, or on others’ well-being. Hare et al. [33••] studied dietary choices that involve self-control. Subjects made choices between stimuli that
How are action costs encoded and computed?
Almost every choice we make has costs associated with it. These costs come in two types. First are the costs of the actions required to obtain the stimuli, such as effort. Second are aversive stimuli that are bundled with the desired outcome. For example, purchasing a book requires giving up money. The key distinction between them is whether the cost is tied to the action or to the outcome. The distinction is meaningful because, for example, one can decrease the effort costs associated with
How are action values encoded and computed?
The computational model described above predicts that there should be neurons encoding each of the action values, regardless of whether the action is taken or not. Samejima et al. [42] recorded from striatal neurons during a probabilistic binary choice task and found neurons encoding the action values. In a closely related study Lau and Glimcher [43] found that about 60% of phasically active neurons in the caudate encoded either the value of particular actions (early in the trial) or the value
How are action values compared to make a choice?
The final stage in making a decision involves the comparison of the action values in order to make a choice. A significant amount of behavioral evidence suggests that the mapping from action values to choices is stochastic and follows a soft-max (or logistic) functional form, and that there is a speed-accuracy tradeoff. Two of the most important open questions in the field have to do with how the stochastic choice process is implemented: What exactly is the algorithm used by the brain to
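The soft-max mapping from action values to choice probabilities can be written down directly. This is a minimal sketch of the standard logistic choice rule; the inverse-temperature parameter and the action values used below are illustrative assumptions.

```python
import math

def softmax_choice_probs(action_values, beta=3.0):
    """Soft-max (logistic) mapping from action values to choice probabilities.

    beta is an inverse-temperature parameter: higher beta makes choices
    track the value difference more deterministically; beta = 0 yields
    random choice regardless of value.
    """
    exps = [math.exp(beta * v) for v in action_values]
    total = sum(exps)
    return [e / total for e in exps]

# the higher-valued action is chosen more often, but not always
p_left, p_right = softmax_choice_probs([0.4, 0.3])
```

Note that this rule describes the stochasticity of choice but not its timing; capturing the speed-accuracy tradeoff requires sequential-sampling models in which evidence for each action accumulates over time until a threshold is reached.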
Conclusions
Throughout the review we have emphasized a multitude of important and pressing open questions. However, it is important not to lose sight of the progress that has been made. We now know that OFC neurons encode stimulus values in a wide variety of contexts and that values are sensitive to internal physiological and cognitive states. We know that stimulus value signals respond to variables such as delay and risk in ways that are consistent with theories from behavioral economics. We know that
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
Support of the NSF (AR3.SELFCNTRL-1-NSF.ARR1) and the Betty and Gordon Moore Foundation is gratefully acknowledged.
References (73)
Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol Behav (2005).
Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology (1998).
The misbehavior of value and the discipline of the will. Neural Netw (2006).
Architectonic subdivision of the human orbital and medial prefrontal cortex. J Comp Neurol (2003).
Psychology and neurobiology of simple decisions. Trends Neurosci (2004).
A comparison of sequential sampling models for two-choice reaction time. Psychol Rev (2004).
The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced choice tasks. Psychol Rev (2006).
The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Comput (2007).
Perceptual decisions between multiple directions of visual motion. J Neurosci (2008).
The basal ganglia: a vertebrate solution to the selection problem? Neuroscience (1999).