Reinforcement learning and decision making in monkeys during a competitive game

doi:10.1016/j.cogbrainres.2004.07.007

Cognitive Brain Research

Volume 22, Issue 1, December 2004, Pages 45-58

https://doi.org/10.1016/j.cogbrainres.2004.07.007

Abstract

Animals living in a dynamic environment must adjust their decision-making strategies through experience. To gain insights into the neural basis of such adaptive decision-making processes, we trained monkeys to play a competitive game against a computer in an oculomotor free-choice task. The animal selected one of two visual targets in each trial and was rewarded only when it selected the same target as the computer opponent. To determine how the animal's decision-making strategy can be affected by the opponent's strategy, the computer opponent was programmed with three different algorithms that exploited different aspects of the animal's choice and reward history. When the computer selected its targets randomly with equal probabilities, animals selected one of the targets more often, violating the prediction of probability matching, and their choices were systematically influenced by the choice history of the two players. When the computer exploited only the animal's choice history but not its reward history, the animal's choice became more independent of its own choice history but was still related to the choice history of the opponent. This bias was substantially reduced, but not completely eliminated, when the computer used the choice history of both players in making its predictions. These biases were consistent with the predictions of reinforcement learning, suggesting that the animals sought optimal decision-making strategies using reinforcement learning algorithms.

Introduction

Decision making refers to an evaluative process of selecting a particular action from a number of alternative choices in a given situation. As such, it occupies a central step in transforming incoming sensory inputs to specific motor intentions. Traditionally, the process of decision making has been studied from at least two different perspectives. On the one hand, economists have developed mathematical frameworks to characterize optimal decision-making rules [8], [71]. On the other hand, psychologists and behavioral ecologists have investigated whether people and animals conform to the predictions based on optimality principles introduced in such theoretical analyses [9], [21], [24], [25], [27]. More recently, an increasing number of imaging and single-neuron recording studies have uncovered hitherto unknown aspects of neural processes that are related to decision making [2], [19], [20], [56], [58], [61], [66]. However, the brain mechanisms responsible for seeking optimal decision-making strategies in a dynamic environment still remain poorly understood.

An important step towards understanding the neural mechanisms of decision making is to examine how such processes are modified through experience. Relatively simple learning algorithms would be sufficient if there were only a small number of alternative actions and if the environment were stationary. In real life, however, the environment is almost always dynamic. In addition, for animals interacting with other animals in their environment, the problem is further complicated by the fact that the outcome of one's decision can be influenced by the decisions of others. The problem of finding an optimal decision-making strategy in a multi-agent environment can be analyzed mathematically using game theory [71]. A game is defined by a list of choices available to each player and a payoff function that assigns a reward (i.e., utility) to each player as a function of the choices of all players. A solution to a game is often provided by one or more Nash equilibria. A Nash equilibrium refers to a particular set of strategies for all players in which no player can increase its payoff by changing its strategy individually [40]. In the present study, we examined the choice behavior of monkeys in a simple zero-sum game, known as matching pennies, to gain insights into the decision-making process in primates. For this game, the Nash equilibrium is for each player to make both choices with equal probabilities. In addition, if this game is played repeatedly between two intelligent players, it is necessary for each player to make her successive choices independently from the choices of both players in previous trials. In game theory, this is referred to as a mixed strategy, which is defined as a probability distribution over a set of alternative choices. A goal of the present study was to determine how closely the decision-making strategy of monkeys follows the prediction of Nash equilibrium in a simple game analogous to matching pennies.
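The equilibrium property described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the study: it assumes illustrative payoffs of +1 for a match and -1 for a mismatch from the matcher's point of view (the animal's role in the task), and the function names `matcher_expected_payoff` and `best_response` are hypothetical.

```python
# Matching pennies from the matcher's point of view.
# Assumption: payoffs of +1 (match) and -1 (mismatch) are illustrative values
# for a zero-sum game; they are not the reward magnitudes used in the study.


def matcher_expected_payoff(p: float, q: float) -> float:
    """Expected payoff to the matcher.

    p: probability that the matcher chooses target A
    q: probability that the opponent chooses target A
    """
    p_match = p * q + (1.0 - p) * (1.0 - q)
    return p_match * 1.0 + (1.0 - p_match) * -1.0


def best_response(q: float) -> float:
    """Matcher's payoff-maximizing probability of choosing A against a fixed q."""
    if q > 0.5:
        return 1.0
    if q < 0.5:
        return 0.0
    # At q = 0.5 every p yields the same payoff; 0.5 is returned by convention.
    return 0.5


if __name__ == "__main__":
    # Against an opponent at equilibrium (q = 0.5), the payoff does not depend on p.
    for p in (0.0, 0.3, 0.5, 0.9, 1.0):
        print(p, matcher_expected_payoff(p, 0.5))
    # Against a biased opponent (q = 0.7), a pure strategy is strictly better.
    print(matcher_expected_payoff(best_response(0.7), 0.7))
```

Under these assumed payoffs the expected payoff reduces to (2p - 1)(2q - 1), so it is zero whenever either player chooses with equal probabilities, and any deviation from 0.5 by one player can be exploited by the other.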

Section snippets

Animal preparation and apparatus

Three male rhesus monkeys (Macaca mulatta, body weight=7–12 kg) were used in this study. The animals were seated in a primate chair and faced a computer monitor located approximately 57 cm from their eyes. All visual stimuli were presented on the computer monitor. The animal's eye position was sampled at 250 Hz with either a scleral eye coil (DNI, DE) or a high-speed eye tracker (ET49, Thomas Recording, Germany).

Behavioral task

At the beginning of each trial, the animal was required to fixate a yellow square

Database

A total of 11,409, 155,758, and 112,669 trials were analyzed for algorithms 0, 1, and 2, respectively. The number of days each animal was tested for different algorithms is shown in Table 1.

Choice and reward probability

When playing against algorithm 0, all animals had a significant bias to choose one of the two targets more frequently than the other (Fig. 2). The percentage of trials in which the animal selected the right-hand target was 70.0%, 90.2%, and 33.2% for monkeys C, E, and F, respectively. In all cases, the

Decision-making strategies of monkeys in a competitive game

The present study examined the statistical patterns of choice behavior in rhesus monkeys playing a matching pennies game against a computer opponent. The computer used three different algorithms, which exploited an increasing amount of information regarding the animal's past choice sequence and reward history. In algorithm 0, the computer adopted the mixed-strategy equilibrium and selected the two targets randomly with equal probabilities, regardless of the animal's behavior. In this
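As a rough illustration of why a target bias costs nothing against a uniformly random opponent but becomes costly once the opponent tracks choice history, the sketch below simulates a player with a fixed probability of choosing the right-hand target. It is an assumption-laden toy, not the study's implementation: `uniform_opponent` mirrors the description of algorithm 0, whereas `frequency_exploiter` is a hypothetical, much simpler stand-in and is not the paper's algorithm 1 or 2. Targets are coded 0 (left) and 1 (right), and the player is rewarded on a match.

```python
import random


def uniform_opponent(counts, rng):
    """Pick either target with probability 0.5, ignoring the player's history."""
    return rng.choice((0, 1))


def frequency_exploiter(counts, rng):
    """Hypothetical exploiter (not the study's algorithm).

    Predicts the target the player has chosen more often so far and selects
    the other one, so that a match (and hence a reward) is avoided.
    """
    if counts[0] == counts[1]:
        return rng.choice((0, 1))
    predicted = 0 if counts[0] > counts[1] else 1
    return 1 - predicted


def reward_rate(p_right, opponent, n_trials=20000, seed=0):
    """Fraction of rewarded trials for a player choosing right with prob. p_right."""
    rng = random.Random(seed)
    counts = [0, 0]  # running totals of the player's left and right choices
    rewards = 0
    for _ in range(n_trials):
        computer = opponent(counts, rng)  # opponent sees only past choices
        player = 1 if rng.random() < p_right else 0
        rewards += int(player == computer)
        counts[player] += 1
    return rewards / n_trials


if __name__ == "__main__":
    print("biased vs uniform:   ", reward_rate(0.9, uniform_opponent))
    print("biased vs exploiter: ", reward_rate(0.9, frequency_exploiter))
    print("unbiased vs exploiter:", reward_rate(0.5, frequency_exploiter))
```

In this toy setting a player that chooses right on 90% of trials is still rewarded on about half of the trials by the uniform opponent, but on far fewer trials by the exploiter, while an unbiased and history-independent player is rewarded on about half of the trials in both cases.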

Acknowledgements

We thank Lindsay Carr, Rita Farrell, and Ted Twietmeyer for their technical assistance, John Swan-Stone for computer programming, and Bruno Averbeck for help with data analysis. This study was supported by the James S. McDonnell Foundation and the National Institutes of Health (R01-NS044270 and P30-EY001319).

References (74)

H.C. Breiter et al. Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron (2001)
R. Elliott et al. Differential neural response to positive and negative feedback in planning and guessing tasks. Neuropsychologia (1997)
J.I. Gold et al. Neural computations that underlie decisions about sensory stimuli. Trends Neurosci. (2001)
T. Ikeda et al. Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron (2003)
M.I. Leon et al. Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron (1999)
M.L. Littman. Markov games as a framework for multi-agent reinforcement learning.
A.N. McCoy et al. Saccade reward signals in posterior cingulate cortex. Neuron (2003)
D. Mookherjee et al. Learning behavior in an experimental matching pennies game. Games Econ. Behav. (1994)
J. Ochs. Games with unique, mixed strategy equilibria: an experimental study. Games Econ. Behav. (1995)
A. Rapoport et al. Mixed strategies in strictly competitive games: a further test of the minimax hypothesis. Games Econ. Behav. (1992)
R. Sarin et al. Predicting how people play games: a simple dynamic model of choice. Games Econ. Behav. (2001)
J.D. Schall. Neural correlates of decision processes: neural and mental chronometry. Curr. Opin. Neurobiol. (2003)
W. Schultz. Getting formal with dopamine and reward. Neuron (2002)
W. Schultz. Neural coding of basic reward terms of animal learning theory, game theory, microeconomics, and behavioral ecology. Curr. Opin. Neurobiol. (2004)
J.M. Shachat. Mixed strategy play and the minimax hypothesis. J. Econ. Theory (2002)
A. Baddeley et al. Random generation and the executive control of working memory. Q. J. Exp. Psychol. (1998)
D.J. Barraclough et al. Prefrontal cortex and decision making in a mixed-strategy game. Nat. Neurosci. (2004)
K. Binmore et al. Does minimax work? An experimental study. Econ. J. (2001)
J.N. Brown et al. Testing the minimax hypothesis: a re-examination of O'Neill's game experiment. Econometrica (1990)
D.V. Budescu et al. Subjective randomization in one- and two-person games. J. Behav. Decis. Mak. (1994)
K.P. Burnham et al. Model Selection and Multimodel Inference. A Practical Information-Theoretic Approach (2002)
R.R. Bush et al. Stochastic Models for Learning (1955)
C.F. Camerer. Behavioral Game Theory: Experiments in Strategic Interaction (2003)
R.H.S. Carpenter. A neural mechanism that randomises behavior. J. Conscious. Stud. (1999)
H.-C. Chen et al. Boundedly rational Nash equilibrium: a probabilistic choice approach. Games Econ. Behav. (1996)
P.-A. Chiappori et al. Testing mixed-strategy equilibria when players are heterogeneous: the case of penalty kicks in soccer. Am. Econ. Rev. (2002)
R. Christensen. Log-Linear Models and Logistic Regression (1997)
T.M. Cover et al. Elements of Information Theory (1991)
R. Elliott et al. Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J. Neurosci. (2003)
I. Erev et al. Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Am. Econ. Rev. (1998)
S. Ghirlanda et al. The evolution of brain lateralization: a game-theoretic analysis of population structure. Proc. R. Soc. Lond., B (2004)
P.W. Glimcher. The neurobiology of visual-saccadic decision making. Annu. Rev. Neurosci. (2003)
R.J. Herrnstein. The Matching Law: Papers in Psychology and Economics (1997)
S. Ito et al. Performance monitoring by the anterior cingulate cortex during saccade countermanding. Science (2003)
J.H. Kagel et al. Handbook of Experimental Economics (1995)
J.H. Kagel et al. Economic Choice Theory: An Experimental Analysis of Animal Behavior (1995)
J. Kahan et al. Responsiveness in two-person zero-sum games. Behav. Sci. (1973)


