Research report
Reinforcement learning and decision making in monkeys during a competitive game
Introduction
Decision making refers to an evaluative process of selecting a particular action from a number of alternative choices in a given situation. As such, it occupies a central step in transforming incoming sensory inputs into specific motor intentions. Traditionally, the process of decision making has been studied from at least two different perspectives. On the one hand, economists have developed mathematical frameworks to characterize optimal decision-making rules [8], [71]. On the other hand, psychologists and behavioral ecologists have investigated whether people and animals conform to the predictions based on optimality principles introduced in such theoretical analyses [9], [21], [24], [25], [27]. More recently, an increasing number of imaging and single-neuron recording studies have uncovered hitherto unknown aspects of neural processes that are related to decision making [2], [19], [20], [56], [58], [61], [66]. However, the brain mechanisms responsible for seeking optimal decision-making strategies in a dynamic environment remain poorly understood.
An important step towards understanding the neural mechanisms of decision making is to examine how such processes are modified through experience. Relatively simple learning algorithms would be sufficient if there were only a small number of alternative actions and if the environment were stationary. In real life, however, the environment is almost always dynamic. In addition, for animals interacting with other animals in their environment, the problem is further complicated by the fact that the outcome of one's decision can be influenced by the decisions of others. The problem of finding an optimal decision-making strategy in a multi-agent environment can be analyzed mathematically using game theory [71]. A game is defined by a list of choices available to each player and a payoff function that assigns a reward (i.e., utility) to each player as a function of the choices of all players. A solution to a game is often provided by one or more Nash equilibria. A Nash equilibrium refers to a particular set of strategies for all players in which no player can increase their payoff by changing their strategy individually [40]. In the present study, we examined the choice behavior of monkeys in a simple zero-sum game, known as matching pennies, to gain insights into the decision-making process in primates. For this game, the Nash equilibrium is for each player to make both choices with equal probabilities. In addition, if this game is played repeatedly between two intelligent players, it is necessary for each player to make her successive choices independently from the choices of both players in previous trials. In game theory, this is referred to as a mixed strategy, which is defined as a probability distribution over a set of alternative choices. A goal of the present study was to determine how closely the decision-making strategy of monkeys follows the prediction of Nash equilibrium in a simple game analogous to matching pennies.
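The equilibrium property described above can be checked numerically. The following is a minimal sketch, not the analysis used in the study: the +1/-1 payoffs, the labels "L" and "R", the 70% bias, and all function names are illustrative assumptions. It shows that a player who mixes 50/50 cannot be exploited by any strategy of the other player, whereas a biased player can.

```python
from fractions import Fraction

# Payoff to the "matcher" for (matcher_choice, opponent_choice). The opponent
# receives the negative of this value because the game is zero-sum.
# The +1/-1 payoffs are illustrative assumptions, not values from the study.
MATCHER_PAYOFF = {
    ("L", "L"): 1, ("L", "R"): -1,
    ("R", "L"): -1, ("R", "R"): 1,
}


def expected_matcher_payoff(p_right_matcher, p_right_opponent):
    """Expected payoff to the matcher when both players randomize independently."""
    total = 0
    for a, pa in (("L", 1 - p_right_matcher), ("R", p_right_matcher)):
        for b, pb in (("L", 1 - p_right_opponent), ("R", p_right_opponent)):
            total += pa * pb * MATCHER_PAYOFF[(a, b)]
    return total


def best_pure_response_payoff(p_right_opponent):
    """Best payoff the matcher can obtain with a pure strategy against a fixed mix."""
    return max(
        expected_matcher_payoff(q, p_right_opponent)
        for q in (Fraction(0), Fraction(1))
    )


half = Fraction(1, 2)

# Against an opponent mixing 50/50, every matcher strategy earns the same payoff,
# so the matcher gains nothing by deviating.
payoffs_vs_equilibrium = [
    expected_matcher_payoff(Fraction(k, 10), half) for k in range(11)
]

# Against an opponent who chooses "R" on 70% of trials (a hypothetical bias),
# always choosing "R" gives the matcher a positive expected payoff.
exploit = best_pure_response_payoff(Fraction(7, 10))

print(payoffs_vs_equilibrium)
print(exploit)
```

With these payoffs the expected value to the matcher equals (2p - 1)(2q - 1), where p and q are the two players' probabilities of choosing "R", so it vanishes whenever either player mixes equally. Exact fractions are used so that the comparison with zero does not depend on floating-point rounding.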
Section snippets
Animal preparation and apparatus
Three male rhesus monkeys (Macaca mulatta, body weight = 7–12 kg) were used in this study. The animals were seated in a primate chair and faced a computer monitor located approximately 57 cm from their eyes. All visual stimuli were presented on the computer monitor. The animal's eye position was sampled at 250 Hz with either a scleral eye coil (DNI, DE) or a high-speed eye tracker (ET49, Thomas Recording, Germany).
Behavioral task
At the beginning of each trial, the animal was required to fixate a yellow square
Database
A total of 11,409, 155,758, and 112,669 trials were analyzed for algorithms 0, 1, and 2, respectively. The number of days each animal was tested for different algorithms is shown in Table 1.
Choice and reward probability
When playing against algorithm 0, all animals showed a significant bias to choose one of the two targets more frequently than the other (Fig. 2). The percentage of trials in which the animal selected the right-hand target was 70.0%, 90.2%, and 33.2% for monkeys C, E, and F, respectively. In all cases, the
Decision-making strategies of monkeys in a competitive game
The present study examined the statistical patterns of choice behavior in rhesus monkeys playing a matching pennies game against a computer opponent. The computer used three different algorithms, which exploited an increasing amount of information regarding the animal's past choice sequence and reward history. In algorithm 0, the computer adopted the mixed-strategy equilibrium, selecting the two targets randomly with equal probabilities regardless of the animal's behavior. In this
Acknowledgements
We thank Lindsay Carr, Rita Farrell, and Ted Twietmeyer for their technical assistance, John Swan-Stone for computer programming, and Bruno Averbeck for help with data analysis. This study was supported by the James S. McDonnell Foundation and the National Institutes of Health (R01-NS044270 and P30-EY001319).
References (74)
- et al.
Functional imaging of neural responses to expectancy and experience of monetary gains and losses
Neuron
(2001)
- et al.
Differential neural response to positive and negative feedback in planning and guessing tasks
Neuropsychologia
(1997)
- et al.
Neural computations that underlie decisions about sensory stimuli
Trends Neurosci.
(2001)
- et al.
Reward-dependent gain and bias of visual responses in primate superior colliculus
Neuron
(2003)
- et al.
Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque
Neuron
(1999)
Markov games as a framework for multi-agent reinforcement learning
- et al.
Saccade reward signals in posterior cingulate cortex
Neuron
(2003)
- et al.
Learning behavior in an experimental matching pennies game
Games Econ. Behav.
(1994)
Games with unique, mixed strategy equilibria: an experimental study
Games Econ. Behav.
(1995)
- et al.
Mixed strategies in strictly competitive games: a further test of the minimax hypothesis
Games Econ. Behav.
(1992)
Predicting how people play games: a simple dynamic model of choice
Games Econ. Behav.
Neural correlates of decision processes: neural and mental chronometry
Curr. Opin. Neurobiol.
Getting formal with dopamine and reward
Neuron
Neural coding of basic reward terms of animal learning theory, game theory, microeconomics, and behavioral ecology
Curr. Opin. Neurobiol.
Mixed strategy play and the minimax hypothesis
J. Econ. Theory
Random generation and the executive control of working memory
Q. J. Exp. Psychol.
Prefrontal cortex and decision making in a mixed-strategy game
Nat. Neurosci.
Does minimax work? An experimental study
Econ. J.
Testing the minimax hypothesis: a re-examination of O'Neill's game experiment
Econometrica
Subjective randomization in one- and two-person games
J. Behav. Decis. Mak.
Model Selection and Multimodel Inference. A Practical Information-Theoretic Approach
Stochastic Models for Learning
Behavioral Game Theory: Experiments in Strategic Interaction
A neural mechanism that randomises behavior
J. Conscious. Stud.
Boundedly rational Nash equilibrium: a probabilistic choice approach
Games Econ. Behav.
Testing mixed-strategy equilibria when players are heterogeneous: the case of penalty kicks in soccer
Am. Econ. Rev.
Log-Linear Models and Logistic Regression
Elements of Information Theory
Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study
J. Neurosci.
Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria
Am. Econ. Rev.
The evolution of brain lateralization: a game-theoretic analysis of population structure
Proc. R. Soc. Lond., B
The neurobiology of visual-saccadic decision making
Annu. Rev. Neurosci.
The Matching Law: Papers in Psychology and Economics
Performance monitoring by the anterior cingulate cortex during saccade countermanding
Science
Handbook of Experimental Economics
Economic Choice Theory: An Experimental Analysis of Animal Behavior
Responsiveness in two-person zero-sum games
Behav. Sci.