Research Article: New Research, Cognition and Behavior

Neuronal Representation of a Working Memory-Based Decision Strategy in the Motor and Prefrontal Cortico-Basal Ganglia Loops

Tomohiko Yoshizawa, Makoto Ito and Kenji Doya
eNeuro 1 June 2023, 10 (6) ENEURO.0413-22.2023; https://doi.org/10.1523/ENEURO.0413-22.2023
Tomohiko Yoshizawa
1 Oral Physiology, Department of Oral Functional Science, Faculty of Dental Medicine and Graduate School of Dental Medicine, Hokkaido University, Sapporo 060-8586, Japan
2 Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0412, Japan

Makoto Ito
2 Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0412, Japan
3 LiNKX, Inc, Tokyo 105-0003, Japan

Kenji Doya
2 Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0412, Japan

Abstract

While animal and human decision strategies are typically explained by model-free and model-based reinforcement learning (RL), their choice sequences often follow simple procedures based on working memory (WM) of past actions and rewards. Here, we address how working memory-based choice strategies, such as win-stay-lose-switch (WSLS), are represented in the prefrontal and motor cortico-basal ganglia loops by simultaneous recording of neuronal activities in the dorsomedial striatum (DMS), the dorsolateral striatum (DLS), the medial prefrontal cortex (mPFC), and the primary motor cortex (M1). In order to compare neuronal representations when rats employ working memory-based strategies, we developed a new task paradigm, a continuous/intermittent choice task, consisting of choice and no-choice trials. While the continuous condition (CC) consisted of only choice trials, in the intermittent condition (IC), a no-choice trial was inserted after each choice trial to disrupt working memory of the previous choice and reward. Behaviors in CC showed high proportions of win-stay and lose-switch choices, which could be regarded as “a noisy WSLS strategy.” Poisson regression of neural spikes revealed encoding specifically in CC of the previous action and reward before action choice and prospective coding of WSLS action during action execution. A striking finding was that the DLS and M1 in the motor cortico-basal ganglia loop carry substantial WM information about previous choices, rewards, and their interactions, in addition to current action coding.

  • basal ganglia
  • motor cortex
  • prefrontal cortex
  • reinforcement learning
  • striatum
  • working memory

Significance Statement

Working memory (WM)-based decision strategies, such as win-stay-lose-switch (WSLS), are widely observed in humans and animals. To address neuronal bases of these strategies, we recorded neuronal activities of rat prefrontal and motor cortico-basal ganglia loops during continuous/intermittent choice tasks. The rat choice strategy was a noisy WSLS in the continuous choice condition, whereas non-WSLS was selected in the intermittent choice condition. In the continuous choice condition, the primary motor cortex (M1) and the dorsolateral striatum (DLS) in the motor loop more strongly conveyed information about previous choices, rewards, and their interactions than the medial prefrontal cortex (mPFC) and the dorsomedial striatum (DMS) in the prefrontal loop. These results demonstrate that the motor cortico-basal ganglia loop contributes to working memory-based decision strategies.

Introduction

Human and animal decision-making processes can be modeled by reinforcement learning (RL) theory, in which agents update the expected reward for each choice (Sutton and Barto, 1998). However, learning can be more dynamic and hypothesis driven. Under the assumption that only one of two choices is rewarding and the other is not, win-stay-lose-shift, or win-stay-lose-switch (WSLS), is an optimal strategy. WSLS can be implemented with a very high learning rate in model-free reinforcement learning (Barraclough et al., 2004; Ito and Doya, 2009; Ohta et al., 2021), with entropy-based metrics (Trepka et al., 2021), or using working memory (WM; Kesner and Churchwell, 2011; Nolen-Hoeksema et al., 2014). The possible contents of WM are previous actions, previous rewards, or prospective actions. Patients with psychiatric disorders or developmental disabilities frequently show abnormal patterns of WSLS (Shurman et al., 2005; Waltz and Gold, 2007; Waltz et al., 2007, 2011; Prentice et al., 2008; Schlagenhauf et al., 2014), which may reflect impairments of WM (Barch and Ceaser, 2012).

Previous studies tested how the availability of WM affected choice strategies by increasing the number of visual stimuli to remember (Collins and Frank, 2012; Collins et al., 2014), requiring an additional memory task in parallel (Otto et al., 2013a), or inducing acute stress (Otto et al., 2013b). These studies in humans, along with a rodent study exploiting long intertrial intervals (ITIs; Iigaya et al., 2018), suggested that choice strategies with intact WM were close to WSLS, whereas strategies under WM disruption became similar to behavior under standard RL.

While the basal ganglia play a major role in model-free reinforcement learning (Samejima et al., 2005; Ito and Doya, 2009, 2015a; Yoshizawa et al., 2018), the neural basis of WM-based decision-making is still unclear. Here, we developed a choice task for rats in which WM availability was manipulated by inserting a no-choice trial between choice trials. With this task, we addressed how working memory-based choice strategies, such as WSLS, are represented in the prefrontal and motor cortico-basal ganglia loops by simultaneously recording neuronal activity in the dorsomedial striatum (DMS) and the medial prefrontal cortex (mPFC), which form a corticostriatal loop related to goal-directed behaviors (Voorn et al., 2004; Yin et al., 2004, 2005a,b, 2006; Balleine et al., 2007; Balleine and O'Doherty, 2010), and in the dorsolateral striatum (DLS) and the primary motor cortex (M1), which form a corticostriatal loop related to motor actions. While previous studies suggested a major role of the PFC in working memory, the present results suggest a contribution of the motor loop to WM-based decision-making.

Materials and Methods

Subjects

Male Long–Evans rats (n = 6; 260–310 g body weight; 16–37 weeks old at the first recording session) were housed individually under a light/dark cycle (lights on at 7 A.M., off at 7 P.M.). Experiments were performed during the light phase. Food was provided after training and recording sessions such that body weights were maintained at no less than 90% of initial levels. Water was supplied ad libitum. The Okinawa Institute of Science and Technology Graduate University Animal Research Committee approved the study.

Apparatus

All training and recording procedures were conducted in a 40 × 40 × 50 cm experimental chamber placed in a sound-attenuating box (O’Hara & Co). The chamber was equipped with three nose-poke holes on one wall and a pellet dish on the opposite wall (Fig. 1A). Each nose-poke hole was equipped with an infrared (IR) sensor to detect nose entry, and the pellet dish was equipped with an infrared sensor to detect the presence of a sucrose pellet (25 mg) delivered by a pellet dispenser. The chamber top was open to allow connections between electrodes mounted on the rat’s head and an amplifier. House lights, two video cameras, two IR LED lights and a speaker were placed above the chamber. A computer program written with LabVIEW (National Instruments) was used to control the speaker and the dispenser and to monitor states of the IR sensors.

Figure 1.

Apparatus and behavioral task. A, The experimental chamber was equipped with three holes for nose poking and a pellet dish. C: center, L: left, R: right. B, Time sequence of the behavioral task, which consisted of choice and no-choice trials. C, The behavioral task comprised two conditions: choice trials were presented repeatedly in the continuous condition (CC), whereas choice and no-choice trials were presented alternately in the intermittent condition (IC).

Behavioral task

Animals were trained to perform choice trials and no-choice trials using nose-poke responses. In either trial type, each trial began with a tone presentation (start tone: 3000 Hz, 1000 ms). When the rat performed a nose poke in the center hole for 500–1000 ms, one of two cue tones (choice tone: white noise, 1000–1500 ms; no-choice tone: 900 Hz, 1000–1500 ms) was presented (Fig. 1B).

After onset of the choice tone (choice trials), the rat was required to perform a nose-poke in either the left or right hole within 2 s after exiting the center hole. If the rat exited the center hole before the offset of the choice tone, the choice tone was stopped. When the rat nose-poked either the left or right hole, either a reward tone (500 Hz, 1000 ms) or a no-reward tone (500 Hz, 250 ms) was presented probabilistically, depending on the selected action. The reward tone was followed by delivery of a sucrose pellet (25 mg) in the food dish. If the rat did not perform a nose-poke in either the left or right hole within 2 s, the trial was ended as an error trial after presentation of an error tone (9500 Hz, 1000 ms).

For the no-choice tone (no-choice trials), the rat was required to refrain from left or right nose pokes for 2 s after exiting the center hole; the trial was then completed correctly with presentation of the no-reward tone. In no-choice trials the rat could not obtain any pellets. If the rat incorrectly performed a left or right nose poke despite the no-choice tone, the trial ended as an error trial after presentation of the error tone, and the no-choice trial was repeated in the next trial.

We designed the continuous condition (CC), which consisted only of choice trials, and the intermittent condition (IC), in which a no-choice trial was inserted after every choice trial (Fig. 1C). A block is defined as a sequence of trials under the same reward probabilities, either (left, right) = (75%, 25%) or (25%, 75%). The first three blocks in each session were CC and the subsequent two blocks were IC. Reward probabilities of the first block were randomly selected from these two settings for each recording session and were switched for every subsequent block.

The first and third (CC) blocks were terminated when the choice frequency of the 75% reward side over the last 10 choice trials reached 80% (Fig. 2A). The second (CC) and the fourth and fifth (IC) blocks ended after 10 choice trials. In this setting, the first 20 choice trials in the second and third CC blocks, and in the fourth and fifth IC blocks, should be comparable: each sequence started from an 80% biased choice, with reward probabilities switched after 10 choice trials. This set of five blocks was repeated about six times in a 1-d recording session (see the sketch below).
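The block-advance rule can be summarized in a few lines of code. Below is a minimal Python sketch (the function name and data layout are hypothetical; the actual task was controlled by a LabVIEW program):

```python
def block_finished(block_index, choices, better_side, n_choice_trials):
    """Return True when the current block should end (sketch of the
    block-termination rule; block_index runs 1-5 within a block set)."""
    if block_index in (1, 3):
        # First and third (CC) blocks: end when the 75% side was chosen
        # in at least 80% of the last 10 choice trials.
        if n_choice_trials < 10:
            return False
        last10 = choices[-10:]
        return last10.count(better_side) / 10.0 >= 0.8
    # Second (CC) and fourth/fifth (IC) blocks: end after 10 choice trials.
    return n_choice_trials >= 10
```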

Figure 2.

Behavioral results. A, Diagram of the block alternation schedule. The order of left-right reward probabilities was randomized across sessions. CC-L and CC-R indicate that the left and right choices, respectively, were more rewarded in the CC block. IC-L and IC-R indicate that the left and right choices, respectively, were more rewarded in the IC block. P(L) is the probability of the rat choosing the left side. B, A representative example of rat performance. Blue vertical lines indicate individual choices in choice trials. Red vertical lines indicate no-choice trials. Long and short lines represent rewarded and nonrewarded trials, respectively. The green trace in the middle indicates the probability of a left choice in choice trials (average of the last 10 choice trials). C, Averaged learning curves in one sequence of CC (upper) and IC (lower) over all sessions. A sequence consisted of 20 trials, and after 10 trials, reward probabilities were reversed. The vertical axis indicates the frequency of the action associated with the higher reward probability in the first 10 trials. Filled and open circles indicate action frequencies that were and were not significantly different from 0.5, respectively (p < 0.05; Mann–Whitney U test). D, Distributions of the choice probability of the 75% reward side in one session of CC (upper) and IC (lower). The optimal action probability is the frequency of selecting the action associated with the larger reward probability in one session. Medians of both distributions are significantly different from 0.5 (p < 0.01 for CC and IC; Mann–Whitney U test). E, Effects of the interaction between past actions and outcomes on the subsequent action. The subsequent action was regressed on the action in the previous trial and the action-outcome interactions over the past nine trials. **p < 0.01, *p < 0.05, Wilcoxon signed-rank test. F, Win-stay lose-switch (WSLS) indices. The horizontal axis represents the win-stay index, the frequency with which rats selected the same action after a rewarded trial. The vertical axis represents the lose-switch index, the frequency with which rats switched the action after a nonrewarded trial. WSLS indices of CC sessions are plotted as green dots; indices of IC sessions are shown as pink dots. Vertical lines in histograms indicate the medians of win-stay or lose-switch probabilities in CC and IC. **p < 0.01, Mann–Whitney U test.

Surgery

After rats were trained to perform the CC and IC tasks, they were anesthetized with pentobarbital sodium (50 mg/kg, i.p.) and placed in a stereotaxic frame. The skull was exposed and holes were drilled over the recording sites. Four drivable electrode bundles were implanted and fixed in the DMS in the left hemisphere (1.0 mm posterior, 1.6 mm lateral from bregma, 3.7 mm ventral from the brain surface), the DLS in the right hemisphere (1.0 mm anterior, 3.5 mm lateral from bregma, 3.3 mm ventral from the brain surface), the mPFC in the left hemisphere (3.2 mm anterior, 0.7 mm lateral from bregma, 2.0 mm ventral from the brain surface), and M1 in the right hemisphere (1.0 mm anterior, 2.6 mm lateral from bregma, 0.4 mm ventral from the brain surface), using pink dental cement whose effects on the brain have been characterized previously (Yoshizawa and Funahashi, 2020).

An electrode bundle was composed of eight Formvar-insulated, 25-μm bare diameter nichrome wires (A-M Systems) and was inserted into a stainless-steel guide cannula (0.3 mm in outer diameter; Unique Medical). Tips of the microwires were cut with sharp surgical scissors so that ∼1.5 mm of each tip protruded from the cannula. Each tip was electroplated with gold to obtain an impedance of 100–200 kΩ at 1 kHz. Electrode bundles were advanced by 125 μm per recording day to acquire activity from new neurons.

Electrophysiological recordings

Recordings were made while rats performed the choice task. Neuronal signals were passed through a head amplifier at the head stage and then fed into the main amplifier through a shielded cable. Signals were passed through a bandpass filter (50–3000 Hz) to a data acquisition system (Power1401; CED), by which all waveforms that exceeded an amplitude threshold were time-stamped and saved at a sampling rate of 20 kHz. The threshold amplitude for each channel was adjusted so that action potential-like waveforms were not missed while minimizing noise. After a recording session, off-line spike sorting was performed with Spike2 (CED) using a template-matching algorithm and principal component analysis: recorded waveforms were classified into several groups based on their shapes, and a template waveform for each group was computed by averaging. Groups of waveforms whose templates appeared to be action potentials were accepted; others were discarded. Then, to test whether accepted waveforms were recorded from multiple neurons, principal component analysis was applied to the waveforms. Clusters in principal component space were detected by fitting a Gaussian mixture model, and each cluster was identified as signals from a single neuron. This procedure was applied to each 50-min data segment. If stable results were not obtained, data were discarded. Gathered spike data were then refined by omitting neurons that satisfied at least one of the three following conditions:

  1. The amplitude of waveforms was <7× the SD of background noise.

  2. The firing rate calculated from perievent time histograms (PETHs; from −4.0 to 4.0 s with 100-ms time bins, aligned to the onset of the cue tone, exit from the center hole, or entrance into the left or right hole) was <1.0 Hz for all time bins of all PETHs.

  3. The estimated recording site was considered outside the target.

Furthermore, considering the possibility that the same neuron was recorded from different electrodes in the same bundle, we calculated cross-correlation histograms with 1-ms time bins for all pairs of neurons that were recorded from different electrodes in the same bundle. If the frequency at 0 ms was 10× larger than the mean frequency (from –200 to 200 ms, except the time bin at 0 ms) and their PETHs had similar shapes, either one of the pair was removed from the database.
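As an illustration, here is a minimal Python sketch of this duplicate-unit screen (function name and data layout are hypothetical; the additional PETH-similarity check used in the actual pipeline is omitted):

```python
import numpy as np

def likely_duplicate(spikes_a, spikes_b, window=0.2, bin_s=0.001):
    """Apply the cross-correlogram criterion described above: flag the
    pair if the 0-ms bin exceeds 10x the mean of the -200 to 200 ms
    bins (excluding the 0-ms bin). Inputs are NumPy arrays of spike
    times in seconds from two electrodes in the same bundle."""
    # Bin edges placed so that bins are centered on integer-ms lags.
    edges = np.arange(-window - bin_s / 2, window + bin_s, bin_s)
    lags = []
    for t in spikes_a:
        d = spikes_b - t
        lags.extend(d[(d >= -window) & (d <= window)])
    counts, _ = np.histogram(lags, bins=edges)
    center = len(counts) // 2                 # bin containing 0 ms
    others = np.delete(counts, center)
    return counts[center] > 10.0 * others.mean()
```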

Histology

After all experiments were completed, rats were anesthetized as described in the surgery section, and a 10-μA positive current was passed for 30 s through one or two recording electrodes of each bundle to mark their final recording positions. Rats were perfused with 10% formalin containing 3% potassium hexacyanoferrate (II), and brains were carefully removed so that the microwires did not cause tissue damage. Sections were cut at 60 μm on a freezing microtome and stained with cresyl violet. Final positions of electrode bundles were confirmed using dots of Prussian blue. The position of each recorded neuron was estimated from the final position and the distance that the bundle of electrodes had been advanced. If the position was outside the DMS, DLS, mPFC, or M1, recorded data were discarded.

Logistic regression analysis for behavioral data

We performed logistic regression analysis to examine the influence of past actions and outcomes on the next choice, using the regression model

$$\mathrm{logit}(p(t)) = \beta_0 + \beta_1\, a(t-1) + \beta_2\, a(t-1) \times r(t-1) + \beta_3 \sum_{k=2}^{m} c^{k}\, a(t-k) \times r(t-k), \qquad p(t) = P(a(t) = 1),$$

where βi is the regression coefficient for each variable (regressor), a(t) ∈ {1: left, −1: right} is the selected action, and r(t) ∈ {1: rewarded, −1: nonrewarded} is the reward outcome. The parameter c (0 ≤ c ≤ 1) specifies the decay rate of past actions and rewards. For each setting of c, the regression coefficients βi were derived with the "fitglm" function of MATLAB, and the optimal c was selected between 0 and 1 with a line search. The optimal m was determined by comparing the adjusted R² with c set at its optimal value; the adjusted R² was maximal with m = 9.
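For concreteness, here is a minimal Python sketch of this fit for a fixed decay rate c, using statsmodels in place of MATLAB's fitglm (the function name and data coding are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

def fit_choice_model(a, r, c, m=9):
    """Fit logit(p(t)) = b0 + b1*a(t-1) + b2*a(t-1)*r(t-1)
    + b3 * sum_{k=2..m} c^k * a(t-k)*r(t-k).
    a, r -- arrays coded +1/-1 (left/right, rewarded/nonrewarded)."""
    T = len(a)
    rows, y = [], []
    for t in range(m, T):
        decayed = sum(c ** k * a[t - k] * r[t - k] for k in range(2, m + 1))
        rows.append([a[t - 1], a[t - 1] * r[t - 1], decayed])
        y.append(1 if a[t] == 1 else 0)       # predict P(a(t) = left)
    X = sm.add_constant(np.asarray(rows, dtype=float))
    return sm.GLM(np.asarray(y), X, family=sm.families.Binomial()).fit()

# Line search over the decay rate c, keeping the best-fitting model:
# best = max((fit_choice_model(a, r, c) for c in np.linspace(0.01, 0.99, 99)),
#            key=lambda res: res.llf)
```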

Poisson regression analysis for neuronal data

We performed Poisson regression analyses to examine what kinds of variables were encoded in neuronal spikes. The first Poisson regression analysis considered a Poisson model in which the number of spikes y in a given phase is sampled from a Poisson distribution with expected spike count μ(t) at trial t:

$$\mathrm{Poi}(y \mid \mu(t)) = \frac{e^{-\mu(t)}\,\mu(t)^{y}}{y!}.$$

μ(t) is represented by

$$\mu(t) = \exp\bigl(\beta_0 + \beta_b b(t) + \beta_a a(t) + \beta_r r(t) + \beta_c c(t) + \log d(t)\bigr), \tag{A}$$

where βi is the regression coefficient for each explanatory variable (regressor). b(t) = t is a monotonically increasing factor, inserted to capture task-event-independent monotonic increases or decreases in firing; a(t) ∈ {1: contralateral, −1: ipsilateral} is the selected action; r(t) ∈ {1: rewarded, 0: nonrewarded} is reward availability; c(t) ∈ {1: CC, 0: IC} is the task condition; and d(t) is the time duration of the phase. We also analyzed the differential reward responses of neurons following ipsilateral or contralateral action choices using a(t)×r(t) ∈ {1: contralateral rewarded, −1: ipsilateral rewarded, 0: nonrewarded} as a regressor, instead of a(t) and r(t) separately.

Optimal regression coefficients were determined by maximizing the objective function, the log likelihood over all trials,

$$L(\beta) = \sum_{t=1}^{T} \log \mathrm{Poi}\bigl(y(t) \mid \mu(t)\bigr),$$

where β represents the set of coefficients. For this calculation, the MATLAB Statistics and Machine Learning Toolbox function fitglm(X, y, 'Distribution', 'poisson') was used.

Next, we found the minimal regressors in (A) necessary to predict μ(t), using the Bayesian information criterion (BIC):

$$\mathrm{BIC} = -2\,L(\beta^{*}) + k \ln n,$$

where β* is the optimized β that maximizes the log likelihood L, k is the number of parameters (the number of β), and n is the number of trials in a session. The BIC can be regarded as a fitting measure that takes into account a penalty for the number of parameters in the model.

Better models have smaller BICs. Because the full regression model includes five regressors (including the constant term β0), we can consider 2^5 models covering all combinations. We calculated the BIC for all possible models and then selected the set of regressors with the smallest BIC. We then tested the statistical significance of each regression coefficient in the selected model using regular Poisson regression analysis; if p < 0.01, the corresponding variable was regarded as being coded in the firing rate. This variable selection was conducted independently for each neuron and for each time bin.
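A minimal Python sketch of this exhaustive BIC search (statsmodels in place of MATLAB fitglm; names are hypothetical, and the BIC is computed explicitly with the formula above rather than taken from the library):

```python
import itertools
import numpy as np
import statsmodels.api as sm

def best_poisson_model(y, regressors, duration):
    """y: spike counts per trial for one neuron and phase; regressors:
    dict mapping a name (e.g. 'b', 'a', 'r', 'c') to a per-trial array;
    duration: phase durations d(t), entering as the offset log d(t).
    Fits a Poisson GLM for every regressor subset (constant always
    included) and returns (BIC, subset, fit) with the smallest BIC."""
    names = list(regressors)
    offset = np.log(duration)
    best = (np.inf, None, None)
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            X = (np.column_stack([regressors[n] for n in subset])
                 if subset else np.ones((len(y), 1)))
            X = sm.add_constant(X)
            res = sm.GLM(y, X, family=sm.families.Poisson(),
                         offset=offset).fit()
            # BIC = -2 L(beta*) + k ln n, counting the constant term.
            bic = -2.0 * res.llf + np.log(len(y)) * (len(subset) + 1)
            if bic < best[0]:
                best = (bic, subset, res)
    return best
```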

Then, we used the following full model to find neurons coding the action, the reward, the action-reward interaction, or the prospective action under a WSLS strategy:

$$\mu(t) = \exp\bigl(\beta_0 + \beta_b b(t) + \beta_{L1} L1(t-1) + \beta_{L0} L0(t-1) + \beta_{R1} R1(t-1) + \beta_{R0} R0(t-1) + \log d(t)\bigr),$$

where L1(t) takes the value 1 if the rat selected left and was rewarded in trial t, and 0 otherwise. L0(t), R1(t), and R0(t) analogously take the value 1 only for L0, R1, or R0 trials, respectively.

Strictly speaking, the last variable R0(t) is redundant, because R0(t) can be represented as 1 − L1(t) − L0(t) − R1(t). Therefore, the full model has no unique maximum-likelihood solution. We thus considered the 2^6 − 1 combinations of coefficients excluding the full model and found the model with the smallest BIC.

According to the variables included in the best model and the signs of their coefficients, we classified neurons into five types: neurons coding the action of the previous choice trial, neurons coding the reward of the previous choice trial, neurons coding the action-reward interaction (A×R) of the previous choice trial, prospective-action-coding neurons, and noncoding neurons (Table 1).

Table 1

Classification of information-coding neurons

If the activity codes action but not reward, the coefficients for L1 and L0, or for R1 and R0, should be necessary and have the same signs. If the activity codes reward but not action, the coefficients for L1 and R1, or for L0 and R0, should be necessary and have the same signs. If the activity codes an interaction between action and reward (A×R), exactly one coefficient among L1, L0, R1, and R0 should be necessary.

If the activity codes a prospective action under a WSLS strategy (Pros Action), the coefficients for L1 and R0 (both predicting action L in the next trial), or for L0 and R1 (both predicting action R in the next trial), should be necessary. Because multiple testing inflates the false positive rate, thresholds indicating a significant proportion of these information-coding neurons were calculated so that the expected false positive ratio was 0.05 (binomial tests). To compare proportions of these information-coding neurons between CC and IC, data in CC and IC were analyzed separately with this Poisson regression analysis.
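The sign rules above translate directly into a classifier. A simplified Python sketch (reduced to the sign patterns just described; the full Table 1 decision table is more detailed):

```python
import numpy as np

def classify_neuron(coef):
    """coef maps the names 'L1', 'L0', 'R1', 'R0' to fitted
    coefficients; regressors dropped by the BIC search are absent."""
    def same_sign(x, y):
        return x in coef and y in coef and np.sign(coef[x]) == np.sign(coef[y])

    if same_sign('L1', 'L0') or same_sign('R1', 'R0'):
        return 'previous action'     # same response whatever the reward
    if same_sign('L1', 'R1') or same_sign('L0', 'R0'):
        return 'previous reward'     # same response whatever the action
    if same_sign('L1', 'R0') or same_sign('L0', 'R1'):
        return 'prospective action'  # both regressors predict the same
                                     # WSLS action on the next trial
    if len(coef) == 1:
        return 'A x R'               # one specific action-reward conjunction
    return 'noncoding'
```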

Mutual information

To elucidate when and how much information about the previous A×R was coded in neuronal activity, the mutual information between neural firing and the previous A×R was calculated using the method described previously (Ito and Doya, 2009, 2015a). For a T = 100 ms time window in each trial, the neuronal activity was defined as a random variable F, taking the number of spikes from 0 to fmax = T/2. X is a random variable taking x1, x2, x3, or x4, corresponding to the previous A×R: left rewarded, left nonrewarded, right rewarded, or right nonrewarded, respectively. The mutual information between F and X is defined by

$$I(F, X) = \sum_{f=0}^{f_{\max}} \sum_{i=1}^{4} p(f, x_i) \log \frac{p(f, x_i)}{p(f)\, p(x_i)}.$$

For each neuron, mutual information (bits) was estimated (for more detail, see Ito and Doya, 2009) for every 100-ms time bin of an event-aligned spike histogram (EASH), using all choice trials in the CC.
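A naive plug-in estimate of this quantity can be computed in a few lines; note that the actual analysis used the bias-handling procedure of Ito and Doya (2009). A Python sketch with hypothetical names:

```python
import numpy as np

def mutual_information_bits(spike_counts, prev_axr):
    """I(F; X) for one 100-ms bin. spike_counts: spikes per trial in the
    bin (F); prev_axr: previous A x R label per trial, coded 0-3 (X)."""
    spike_counts = np.asarray(spike_counts)
    prev_axr = np.asarray(prev_axr)
    info = 0.0
    for f in np.unique(spike_counts):
        p_f = np.mean(spike_counts == f)
        for x in np.unique(prev_axr):
            p_x = np.mean(prev_axr == x)
            p_fx = np.mean((spike_counts == f) & (prev_axr == x))
            if p_fx > 0.0:
                info += p_fx * np.log2(p_fx / (p_f * p_x))
    return info
```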

Experimental design and statistical analyses

The presented analyses include 48,554 behavioral and neural trials (after task learning was completed) recorded over a total of 78 sessions in six rats. The minimum and maximum numbers of trials per session were 436 and 1105, respectively.

We used appropriate statistical tests as applicable, i.e., paired or unpaired t tests, Mann–Whitney U tests, Wilcoxon signed-rank tests, binomial tests, and χ2 tests with or without Bonferroni's multiple comparison correction. Differences were considered statistically significant when p < 0.05. See Results for details.

Results

To investigate whether and how the availability of working memory (WM) influences the choice strategy of rats, we designed a choice task composed of choice trials and no-choice trials. In the continuous condition (CC), choice trials were repeated; in the intermittent condition (IC), a no-choice trial was inserted between each pair of choice trials (Fig. 1). This insertion not only prolonged the intervals between choice trials but also increased the variety of intervening behaviors, so WM was expected to be strongly disrupted. An experimental session consisted of three blocks in CC followed by a fourth and fifth block in IC (Fig. 2A,B). For the analyses, we used behavioral and neuronal data from 20 choice trials in the second and third blocks as a "CC sequence," and from 20 choice trials in the fourth and fifth blocks as an "IC sequence" (see Materials and Methods for details). In one experimental day, this sequence of five blocks was repeated four to six times (usually six; mean = 5.9).

Behavioral performance

We trained six Long–Evans rats to perform the choice task (Fig. 1; see Materials and Methods) and conducted 78 sessions consisting of 461 CC sequences and 461 IC sequences. In both conditions, rats learned to choose the 75% reward side based on their experience of choices and rewards (Fig. 2B). When the more rewarding side switched after ten choice trials, their choices also shifted to the opposite side.

We compared choice sequences in CC and IC in response to the change of reward probabilities (Fig. 2C). In CC, the choice probability switched to the other option immediately after the change of reward probabilities. In IC, the choice probability shifted gradually to the opposite side, reaching a level significantly different from chance an average of five trials after the block change.

The choice probability of the 75% reward side in each session was distributed between 0.5 and 0.75 with a median of 0.632 in CC, and between 0.4 and 0.65 with a median of 0.538 in IC (Fig. 2D). In both conditions, the median was significantly greater than the chance level (p = 1.6e−14 in CC and p = 1.1e−7 in IC, Wilcoxon signed-rank test), confirming that choice behavior in both CC and IC adapted to reward-probability changes.

To investigate choice strategies in more detail, we used logistic regression to examine the influence of past actions and outcomes on the subsequent choice (see Materials and Methods). The interaction of the action and the outcome of the last trial affected the subsequent action more strongly in CC than in IC (median of the coefficient β2 for a(t−1)×r(t−1): CC, 1.87; IC, 0.42; p = 3.3e-12, Wilcoxon signed-rank test; Fig. 2E). The effect of earlier action-reward interactions decayed more rapidly in CC (median of the decay constant c: CC, 0.23; IC, 0.62; p = 0.048), with no significant difference in the coefficient β3 (CC, 0.13; IC, 0.19; p = 0.49). These results indicate that rats recognized the side with the larger reward probability and changed their action selection in both CC and IC, while the insertion of no-choice trials in IC made learning slower.

Next, to test the hypothesis that the choice strategy is closer to WSLS in CC than in IC, we calculated WSLS indices, composed of the win-stay ratio P(stay | reward), the frequency with which rats chose the same action after a rewarded choice trial, and the lose-switch ratio P(switch | no-reward), the frequency with which rats switched the action after a nonrewarded trial. WSLS indices calculated for each session are plotted two-dimensionally, P(stay | reward) versus P(switch | no-reward) (Fig. 2F). WSLS indices of CC sessions were concentrated near the upper right corner, while those of IC sessions were widely distributed from the center to the lower right corner; the two distributions had little overlap. Win-stay P(stay | reward) and lose-switch P(switch | no-reward) ratios were significantly larger in CC than in IC (win-stay: 0.85 vs 0.64, p = 1.2e-16, Mann–Whitney U test; lose-switch: 0.82 vs 0.51, p = 2.6e-23). Since behaviors in CC showed high P(stay | reward) and P(switch | no-reward), they could be regarded as "a noisy WSLS strategy" (the strict WSLS is deterministic).
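The two indices are straightforward to compute from a session's choice and reward sequences. A minimal Python sketch (hypothetical names; for IC sessions the inputs would be the consecutive choice trials, with no-choice trials removed):

```python
import numpy as np

def wsls_indices(actions, rewards):
    """Return (P(stay | reward), P(switch | no-reward)) for one session.
    actions: chosen side per choice trial; rewards: 1 if rewarded, else 0."""
    actions = np.asarray(actions)
    rewards = np.asarray(rewards)
    stay = actions[1:] == actions[:-1]   # did trial t repeat trial t-1?
    win = rewards[:-1] == 1              # was trial t-1 rewarded?
    p_stay_win = stay[win].mean() if win.any() else np.nan
    p_switch_lose = (~stay)[~win].mean() if (~win).any() else np.nan
    return p_stay_win, p_switch_lose
```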

Neuronal activities of the prefrontal and motor cortico-basal ganglia loops

We recorded neuronal activity in DMS, DLS, mPFC, and M1 of rats performing the choice task. Each rat was implanted with four bundles of eight microwires. After all experiments were completed, locations of bundles were confirmed by Nissl staining (Fig. 3A,B). Stable recordings were made from 320 neurons in DMS, 210 neurons in DLS, 158 neurons in mPFC, and 247 neurons in M1 from six rats. We analyzed neural activities in CC and IC conditions in four phases in a trial (see Fig. 1B, 1: in the center hole before the choice tone; 2: after the choice tone before exiting the center hole; 3: during the approach to the left or right hole; 4: in the left/right hole with a reward/no reward tone).

Figure 3.

Representative activity patterns of neurons during choice trials. A, Nissl-stained coronal sections showing recording locations for the DMS (blue) and DLS (red). B, Tracks of accepted electrode bundles for all rats are indicated by rectangles. Neurons recorded from blue, red, cyan, and magenta rectangles were classified as DMS, DLS, mPFC, and M1 neurons, respectively. Each diagram represents a coronal section referenced to the bregma (Paxinos and Watson, 1998). C–F, Perievent time histograms (PETHs) of a representative DMS neuron (C), DLS neuron (D), mPFC neuron (E), and M1 neuron (F). PETHs were calculated based on timings of five task events (onset of center-hole poking, onset of the tone, offset of center-hole poking, onset of left or right hole-poking, offset of left-hole or right-hole poking), and the following four task phases were defined: phase 1, the period from the start of the center-hole poking to the onset of the cue tone; phase 2, the choice tone presentation period; phase 3, the action execution period between exiting the center hole and entry into the left or right hole; phase 4, the feedback period when a reward or no-reward tone was presented after left-hole or right-hole poking. PETHs of all choice trials (black), and of trials in CC (green) and trials in IC (cyan; upper panel). PETHs of CC and IC choice trials in which left was selected and rewarded (L1), left was selected and not rewarded (L0), right was selected and rewarded (R1), or right was selected and not rewarded (R0; lower panel). All PETHs (50-ms bins) were smoothed with a Gaussian kernel with a 150-ms SD. G–J, Normalized activity patterns of all recorded neurons from the DMS (G), DLS (H), mPFC (I), and M1 (J). An activity pattern for each neuron was normalized so that the maximum PETH was 1 and represented by pseudo-color (values from 0 to 1 are represented from blue to red). Indexes of neurons were sorted based on the time that the normalized PETH first surpassed 0.8.

Representative examples of spike perievent time histograms (PETHs) with intertrial time alignment (Ito and Doya, 2015b) are shown in Figure 3C–F. The DMS neuron (Fig. 3C) increased its activity when the rat exited the center hole and entered the left or right hole (phase 3), and showed its largest peak after exit from the left or right hole, when the rat anticipated obtaining a pellet (black line in upper panel). PETHs for CC (green line) and IC (magenta line) differed in phase 3 and after exit from the left/right hole, showing that the activity pattern was modulated by the task condition. PETHs also showed increased activity following right choices, especially rewarded ones (R1), showing that the activity pattern was modulated by the conjunction of choice and reward.

The DLS neuron in Figure 3D increased its activity when the rat was approaching the center hole (phase 1). It showed higher activity for left than right choices in phases 3 and 4, which was higher in IC, showing context-dependent action coding.

The mPFC neuron in Figure 3E showed higher activity for no-reward than reward experiences after exiting the left/right hole, showing outcome coding.

PETHs of the M1 neuron (Fig. 3F) showed higher activity when entering the left hole (phase 4), which was stronger in CC, showing context-dependent action coding.

To see the distribution of peak timings across all DMS neurons, PETHs for all DMS neurons were normalized and represented by color, with neuron indices sorted on the basis of peak activity timing (Fig. 3G). Activity peaks of DMS neurons were widely distributed over different phases of a trial. DLS, mPFC, and M1 neurons (Fig. 3H–J) also had their peak firing at different phases, with greater apparent concentrations when approaching the center hole and choice holes (phase 3) in the DMS and DLS, and during choice-tone presentation (phase 2) in the mPFC. These analyses show that neurons in the four areas code actions, outcomes, and task conditions at various times within a trial.
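The normalization and ordering used in Figure 3G–J can be sketched as follows (Python; function name hypothetical, assuming each PETH has a nonzero maximum, which the firing-rate screening in Materials and Methods guarantees):

```python
import numpy as np

def sort_by_activation_time(peths):
    """peths: array of shape (n_neurons, n_bins). Normalize each PETH to
    a maximum of 1, then order neurons by the first bin in which the
    normalized PETH surpasses 0.8, as in Figure 3G-J."""
    peths = np.asarray(peths, dtype=float)
    norm = peths / peths.max(axis=1, keepdims=True)
    first_cross = np.argmax(norm > 0.8, axis=1)  # index of first True
    order = np.argsort(first_cross)
    return norm[order], order
```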

Neuronal representation of experiences of the current choice trial

We next applied Poisson regression analyses of spike counts in each phase to quantify neuronal coding of actions, rewards, and task conditions. Note that action coding in phases 1 and 2 represents an action command or plan before the action is performed, and that the outcome regressor corresponds to reward prediction in phases 1–3 (Fig. 4A). Average durations of each phase did not differ significantly between CC and IC (Fig. 4B), which excludes the possibility that differences in neural activity were due simply to differences in motor behavior.

Figure 4.

Proportions of neurons coding actions, rewards, and conditions. A, Meanings of regressors in each task phase. B, Mean durations of each phase do not differ significantly between CC and IC. Results are means ± SEM; n.s.: p > 0.05 (paired t test). C, D, Proportions of neurons significantly correlated with the regressor for action (C) and reward (D) in each phase. Upward and downward bars indicate proportions of neurons with positive and negative regression coefficients, respectively. **p < 0.01, *p < 0.05, χ2 test between the DMS and DLS or mPFC and M1. ##p < 0.01, #p < 0.05, binomial test. E, Mean regression coefficients of lateralized reward. Mean ± SEM. **p < 0.01, *p < 0.05, paired t test compared with zero.

The proportion of neurons correlated with action was significantly larger in the motor loop than in the prefrontal loop in phases 2 and 4 (Fig. 4C; phase 2: DMS: 5.9%, DLS: 12%, p = 0.037; mPFC: 1.3%, M1: 13%, p = 9.5e-05; phase 4: DMS: 27%, DLS: 37%, p = 0.039; mPFC: 22%, M1: 38%, p = 0.0032, χ2 test with Bonferroni correction). The proportion was significantly larger in M1 than in the mPFC in phase 3 (mPFC: 15%, M1: 38%, p = 1.2e-06). A contralateral bias was clearly observed during action execution (phase 3) in M1, and also in DMS neurons (M1: p = 7.8e-04, DMS: p = 0.044, binomial test with Bonferroni correction). These results indicate that the motor loop is more strongly involved in action preparation, execution, and memory than the prefrontal loop.

The proportion of neurons correlated with the outcome variable was significantly larger in the DLS than in the DMS in phase 4 (Fig. 4D; DMS: 7.8%, DLS: 15%, p = 0.044). Neurons in the DLS and M1 were significantly more activated by reward than by no-reward experience (DLS: p = 0.018, M1: p = 0.0045). These results indicate that DLS neurons distinguish reward-associated sensory stimuli more strongly than DMS neurons.

For the reward-coding neurons, we further assessed whether responses were sensitive to the laterality of the action choice. We regressed the number of spikes in phase 4 using action × reward (contralateral rewarded: 1, ipsilateral rewarded: −1, nonrewarded: 0; see Materials and Methods). The DLS showed significantly stronger sensitivity to contralateral than to ipsilateral reward in both CC and IC (regression coefficient: CC: 1.44 ± 0.41, p = 0.0013; IC: 1.6 ± 0.44, p = 9.6e-4, paired t test compared with zero; Fig. 4E). Other regions did not show significantly different sensitivity to reward following contralateral and ipsilateral actions. Note that the order of left-right reward probabilities was randomized across sessions (Fig. 2A); thus, although motor-loop recordings were from the right hemisphere (Fig. 3B), the laterality of action-reward responses did not affect the observed features.

In all phases, the proportion of neurons correlated with the task condition (CC/IC) was <10% (data not shown), which suggests that neuronal activities did not directly encode task condition, but does not exclude the possibility that information coding of action or outcome in the previous choice trial was modulated by task conditions.

Neuronal representation of experiences in the previous choice trial

Next, to examine how information from choice trials transferred to the subsequent choice trial with and without WM interference, we applied a Poisson regression analysis to spike data in CC and IC separately (see Materials and Methods; Table 1). In this analysis, activity of each neuron in each phase was classified exclusively into five groups: neurons coding the action of the previous choice trial, neurons coding the reward of the previous choice trial, neurons coding the interaction between action and reward (A×R) of the previous choice trial, neurons coding the action according to WSLS strategy, and noncoding neurons.

In phase 1 (Fig. 5A), on entry into the center hole, coding of the previous reward was seen in DLS, mPFC, and M1, specifically in CC (DLS: CC; 9.0%, IC; 1.9%, p = 0.0052, mPFC: CC; 15%, IC; 1.3%, p = 2.7e-05, M1: CC; 8.5%, IC; 1.2%, p = 0.00066, χ2 test with Bonferroni correction). Coding of the interaction of the previous action and reward was seen in all four areas, with more DMS neurons in CC than in IC (CC; 16%, IC; 9.4%, p = 0.037).

Figure 5.

Neuronal representation of the WSLS strategy in the prefrontal and motor cortico-basal ganglia loops. A–D, Proportions of neurons coding the actions, rewards, and A×Rs of the previous choice trial, and coding the WSLS and current action, in phases 1 (A), 2 (B), 3 (C), and 4 (D). Filled and hatched bars indicate proportions in CC and IC, respectively. When rats employed the WSLS strategy, DLS neurons more strongly conveyed information about the previous choice trial during action preparation than DMS neurons (yellow arrows). Neuronal activities of all areas except the mPFC represented the WSLS action during action execution (green arrow). **p < 0.01, *p < 0.05, χ2 test between DMS and DLS or mPFC and M1 in each task condition. ##p < 0.01, #p < 0.05, χ2 test between CC and IC in each recording area. E, The time course of mutual information between the previous A×R and neuronal firing in 100-ms bins before and after the offset of center-hole poking in the CC. Mean ± SEM. F, The average mutual information between the previous A×R and neuronal firing during the 500 ms before and after the offset of center-hole poking in the CC. Mean ± SEM; **p < 0.01, *p < 0.05, unpaired t test.

In phase 2 (Fig. 5B), while the cue tone was presented, neurons in the DLS and M1 coded the previous action selectively in CC (DLS: CC; 5.2%, IC; 0.95%, p = 0.044, M1: CC; 5.3%, IC; 0.40%, p = 0.044). More neurons in DLS, mPFC, and M1 neurons coded the interaction of the previous action and reward in CC than in IC (DLS: CC; 22%, IC; 12%, p = 0.018, mPFC: CC; 10%, IC; 1.9%, p = 0.0084, M1: CC; 14%, IC; 5.3%, p = 0.0033).

In phase 3 (Fig. 5C), when rats moved to left/right hole, neurons coding the WSLS action predicted from the previous action and reward were seen in the DMS, DLS, and M1 selectively in CC (DMS: CC; 6.3%, IC; 0%, p = 2.2e-05, DLS: CC; 6.7%, IC; 0%, p = 0.00057, M1: CC; 6.9%, IC; 0%, p = 0.00011). Coding of the current action was seen in all four areas, with higher proportions in CC than in IC in the DMS and M1 (DMS: CC; 17.5%, IC; 8.4%, p = 0.0026, M1: CC; 21%, IC; 12%, p = 0.023).

In phase 4 (Fig. 5D), when rats entered the left/right hole and heard the reward/no-reward tone, while the proportion of M1 neurons coding the WSLS action (CC; 12%, IC; 0.40%, p = 3.0e-07) and the current actions (CC; 19%, IC; 9.7%, p = 0.013) remained higher in CC than in IC, the proportion of neurons coding the previous reward was larger in IC than in CC in the DLS and M1 (DLS: CC; 0.48%, IC; 4.3%, p = 0.042, M1: CC; 1.6%, IC; 7.7%, p = 0.0054).

Interestingly, throughout the four phases in CC, neurons coding working memory about the previous trial were more prevalent in the motor loop than in the prefrontal loop (phase 1, previous reward: DMS vs DLS; 3.1% vs 9.0%, p = 0.027; phase 2, previous action: DMS vs DLS; 0.63% vs 5.2%, p = 0.0063; mPFC vs M1; 0% vs 5.3%, p = 0.027; phase 2, previous A×R: DMS vs DLS; 11% vs 22%, p = 0.0019; phase 3, previous action: DMS vs DLS; 1.9% vs 14%, p = 2.2e-07; phase 3, previous A×R: mPFC vs M1; 8.9% vs 20%, p = 0.023; phase 4, previous action: DMS vs DLS; 0.63% vs 7.1%, p = 0.00025; phase 4, previous A×R: DMS vs DLS; 13% vs 27%, p = 0.00041, χ2 test with Bonferroni correction), although working memory is regarded as a major function of the prefrontal cortex. In each area, 35–48% of the neurons coding the previous A×R encoded it across phases (data not shown).

To elucidate when and how much information about the previous A×R was represented in the motor and prefrontal loops, we calculated the mutual information between the previous A×R and neuronal firing in 100-ms time bins around the offset of center-hole poking in the CC (Fig. 5E; Ito and Doya, 2009, 2015a). M1 neurons showed a sharp peak after exit from the center hole. Previous A×R information was more strongly encoded in the motor loop than in the prefrontal loop during the 500 ms before and after the offset of center-hole poking (before: DMS vs DLS; 0.020 ± 0.0012 vs 0.031 ± 0.0017 bits, p = 8.1e-08; mPFC vs M1; 0.016 ± 0.0010 vs 0.026 ± 0.0016 bits, p = 4.6e-06; after: DMS vs DLS; 0.033 ± 0.0021 vs 0.045 ± 0.0027 bits, p = 2.2e-04; mPFC vs M1; 0.021 ± 0.0020 vs 0.046 ± 0.0036 bits, p = 2.6e-07; mean ± SEM, unpaired t test; Fig. 5F). These time windows correspond to phases 2 and 3, respectively. The result of the mutual information analysis is consistent with that of the neuronal proportion analysis.

We further focused on neural activities just before and after the reward probabilities were changed. We performed an analysis using the WSLS-related variables as in Figure 5 for phase 1 of the first or last four trials in each block (Fig. 6). In the first four trials, the DLS more strongly represented the previous reward than the DMS (DMS vs DLS; 7% vs 16%, p = 0.0053, χ2 test with Bonferroni correction), similar to the result for all trials in Figure 5A. In the last four trials, the DLS more strongly encoded the previous action and previous A×R than the DMS (previous action: DMS vs DLS; 0.3% vs 3%, p = 0.042; previous A×R: DMS vs DLS; 12% vs 26%, p = 0.00029), which was not observed with all trials combined. These results confirm the role of the DLS in working memory in both early and late stages of learning, with the coding in the later stage related more to the next action.

Figure 6.

Neuronal representation of WSLS strategy before a change in reward contingency and after that change. Same as Figure 5A but only using behavioral and neuronal data during phase 1 of the first or last four trials in each block for analysis. **p < 0.01, *p < 0.05, χ2 test between DMS and DLS or mPFC and M1 in each task condition. ##p < 0.01, #p < 0.05, χ2 test between CC and IC in each recording area.

Discussion

Our main achievements and findings in this research are as follows.

  1. We developed a choice task that manipulated WM availability for rats and showed that disturbance of WM disrupted the WSLS choice strategy (Fig. 2).

  2. Poisson regression of neural spikes showed that the proportions of neurons coding the current action before and after action choice were larger in the DLS than DMS, and in M1 than the mPFC, and that neurons in DLS showed stronger reward response following contralateral action than ipsilateral action (Fig. 4).

  3. Before action choice, the proportion of neurons coding the previous reward was larger in CC in DLS, mPFC, and M1, and the proportion coding the previous action was larger in CC in DLS and M1. During action execution in CC, neuronal activities of DMS, DLS, and M1 represented prospective action by the WSLS strategy (Fig. 5).

  4. Throughout the trial, working memories of previous actions, rewards, and their interactions were more prevalent in the motor loop than in the prefrontal loop (in the DLS than the DMS, and in M1 than the mPFC; Fig. 5).

In the present study, we showed the effect of WM availability on the choice behavior of rats, which allowed us to analyze neuronal correlates of a WM-based choice strategy in the corticostriatal circuit (Fig. 2).

Recent studies in both humans and rodents showed that disruption of WM changed the strategy in sequential choice tasks (Collins and Frank, 2012; Worthy et al., 2012; Otto et al., 2013a, b; Collins et al., 2014; Economides et al., 2015; Iigaya et al., 2018). Worthy et al. (2012) used a two-choice task and disrupted WM by requiring an additional memory task in parallel. They showed that human choice behavior with intact WM was better fitted by a WSLS model than a RL model, while choice behavior with WM load was better fitted by a RL model than the WSLS model. Collins and Frank (2012) and Collins et al. (2014) used a choice task in which a subject selected one action among three options for a given visual image. The WM load was controlled by varying the number of visual stimuli. They analyzed the choice strategy using a hybrid model combining a RL model and a WM model and suggested that the choice behavior without WM load can be explained by the WM model (RL model with the learning rate = 1), while the choice behavior with WM load can be explained by the RL model with the lower learning rate. Iigaya et al. (2018) studied mouse performance in a nonstationary, reward-driven, decision-making task and assessed WM availability based on spontaneous variations in intertrial intervals (ITIs). Mice showed WSLS-like choices after short ITIs, but RL-like choices after long ITIs. Optogenetic stimulation of dorsal raphe serotonin neurons boosted the learning rate only in trials after long ITIs, suggesting that serotonin neurons modulate reinforcement learning rates, and that this influence is masked by WM-based decision mechanisms.

All previous studies examining WM effects on choice strategies, except Iigaya et al. (2018), were performed with human subjects, and there have been no reports comparing neuronal representations of a WM-based strategy. In this study, we recorded neurons in the DMS, DLS, mPFC, and M1 during a choice task with and without WM disturbance, and found that neurons in each area had a variety of activity patterns throughout a trial and that these patterns were modulated by selected actions, reward outcomes, and task conditions (Fig. 3). Information on the action command and the selected action was more strongly represented in the motor loop than in the prefrontal loop (Fig. 4). These properties are similar to our previous observations in DLS and DMS neurons (Ito and Doya, 2015a). The DLS had significantly stronger sensitivity to reward following contralateral than ipsilateral actions, whereas reward responses in the other regions did not show significantly different sensitivity to the laterality of actions (Fig. 4E). The neuronal activities in the two loops were recorded from different hemispheres; however, the order of the left-right reward setting was randomized across sessions, so the analysis results would not be affected by the laterality of recording sites.

What kind of information must be retained between trials for the WSLS strategy? There are at least two possibilities. One is to retain the direct experience of action and reward in the previous trial, such as L1, L0, R1, or R0 (keeping past experience). Another is to compute the next action using the WSLS rule soon after reward feedback (prospective action), that is, L for L1 or R0 and R for L0 or R1, and retain it until the next choice (keeping a future plan).

Indeed, before action execution in CC, coding of the previous action was seen in the DLS and M1 and coding of the previous reward was seen in the DLS, mPFC, and M1 (Fig. 5A,B). Combinatorial information of the previous action and reward was more dominant in CC in the DMS (phase 1) and the DLS, mPFC, and M1 (phase 2). A previous study in monkeys reported that prefrontal neurons modulated their activity according to the previous outcome and the conjunction of the previous choice and outcome (Barraclough et al., 2004).

Our results showed for the first time that the motor loop (DLS and M1) retains that information more strongly than the prefrontal loop (DMS and mPFC) when a WM-based strategy was observed.

Previous studies suggested that different types of uncertainty, such as differences in reward probabilities and the volatility of reward contingency, affect choice strategies and learning rates (Soltani and Izquierdo, 2019; Woo et al., 2022). While some studies reported higher learning rates with higher volatility (Behrens et al., 2007; Piray and Daw, 2021), others reported reduced learning rates (Donahue and Lee, 2015; Farashahi et al., 2019; Woo et al., 2022). In the present study, learning and forgetting were slower in the IC condition than in the CC condition (Fig. 2C). It is unlikely, however, that this resulted from a difference in volatility between the two conditions, because both CC and IC blocks included the same 10 choice trials.

Another possible factor that might affect the choice strategy and the involvement of the motor loop is the simple repeated choice sequence acquired in the CC, compared with the more complex choice/no-choice sequence acquired in the IC condition. However, the analysis of neuronal representation soon after the reward contingency change still showed higher previous-action coding in the motor loop (Fig. 6). This suggests that the motor loop contributed not only to simple repetitive actions but also to adaptive choice shifts using WM.

In the present study, rats were overtrained on the task before neuronal recording. Overtraining has been reported to induce a shift from goal-directed to habitual behavior (Smith and Graybiel, 2013), which is often associated with a shift in control from the DMS to the DLS (Yin et al., 2004; Ashby et al., 2010; Thorn et al., 2010; Graybiel and Grafton, 2015; Kupferschmidt et al., 2017). A DLS lesion study reported an impairment of the lose-shift response in an operant task (Skelin et al., 2014). Our study showed that the motor loop, including the DLS, more strongly conveyed information about previous choices, rewards, and their interactions than the prefrontal loop, including the DMS. It is possible that the WSLS strategy using the WM of action and reward was established as a habitual behavior in the motor loop.

From the viewpoint of information processing, keeping a future plan is more efficient in that it requires less memory capacity. Memory of a future action has been termed "prospective action" in previous studies (Kesner, 1989; Goto and Grace, 2008; Kesner and Churchwell, 2011). However, while these studies suggested that the prefrontal cortex is responsible for prospective action coding, in our study prospective WSLS coding appeared during action execution in all recorded areas except the mPFC. It is still possible that the prospective action was computed and retained in other prefrontal areas, such as the anterior cingulate cortex (ACC), located just above the mPFC. The ACC was considered part of the prefrontal cortex in previous studies (Kesner, 1989; Goto and Grace, 2008; Kesner and Churchwell, 2011) and was thought to be involved in working memory for the motor response (Kesner and Churchwell, 2011).

In conclusion, this experiment showed that the availability of WM affects choice strategies in rats and revealed WM-related neuronal activities in DMS, DLS, mPFC, and M1. A striking finding was that DLS and M1 in the motor cortico-basal ganglia loop carry substantial WM information about previous choices, rewards, and their interactions, in addition to action coding during action execution.

Acknowledgments

Acknowledgements: We thank the members of the Neural Computation Unit for helpful comments and discussion and Steven D. Aird for thorough editing and proofreading.

Footnotes

  • The authors declare no competing financial interests.

  • This work was supported by JSPS KAKENHI Grant Numbers JP16H06563 and JP23120007 (to K.D.) and JP19K16299 and JP22K15217 (to T.Y.) and the generous research support of Okinawa Institute of Science and Technology Graduate University for the Neural Computation Unit.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

  1. Ashby FG, Turner BO, Horvitz JC (2010) Cortical and basal ganglia contributions to habit learning and automaticity. Trends Cogn Sci 14:208–215. https://doi.org/10.1016/j.tics.2010.02.001 pmid:20207189
  2. Balleine BW, O’Doherty JP (2010) Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35:48–69. https://doi.org/10.1038/npp.2009.131 pmid:19776734
  3. Balleine BW, Delgado MR, Hikosaka O (2007) The role of the dorsal striatum in reward and decision-making. J Neurosci 27:8161–8165. https://doi.org/10.1523/JNEUROSCI.1554-07.2007 pmid:17670959
  4. Barch DM, Ceaser A (2012) Cognition in schizophrenia: core psychological and neural mechanisms. Trends Cogn Sci 16:27–34. https://doi.org/10.1016/j.tics.2011.11.015 pmid:22169777
  5. Barraclough DJ, Conroy ML, Lee D (2004) Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7:404–410. https://doi.org/10.1038/nn1209 pmid:15004564
  6. Behrens TE, Woolrich MW, Walton ME, Rushworth MF (2007) Learning the value of information in an uncertain world. Nat Neurosci 10:1214–1221. https://doi.org/10.1038/nn1954 pmid:17676057
  7. Collins AG, Frank MJ (2012) How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur J Neurosci 35:1024–1035. https://doi.org/10.1111/j.1460-9568.2011.07980.x pmid:22487033
  8. Collins AG, Brown JK, Gold JM, Waltz JA, Frank MJ (2014) Working memory contributions to reinforcement learning impairments in schizophrenia. J Neurosci 34:13747–13756. https://doi.org/10.1523/JNEUROSCI.0989-14.2014 pmid:25297101
  9. Donahue CH, Lee D (2015) Dynamic routing of task-relevant signals for decision making in dorsolateral prefrontal cortex. Nat Neurosci 18:295–301. https://doi.org/10.1038/nn.3918 pmid:25581364
  10. Economides M, Kurth-Nelson Z, Lübbert A, Guitart-Masip M, Dolan RJ (2015) Model-based reasoning in humans becomes automatic with training. PLoS Comput Biol 11:e1004463. https://doi.org/10.1371/journal.pcbi.1004463 pmid:26379239
  11. Farashahi S, Donahue CH, Hayden BY, Lee D, Soltani A (2019) Flexible combination of reward information across primates. Nat Hum Behav 3:1215–1224. https://doi.org/10.1038/s41562-019-0714-3 pmid:31501543
  12. Goto Y, Grace AA (2008) Dopamine modulation of hippocampal-prefrontal cortical interaction drives memory-guided behavior. Cereb Cortex 18:1407–1414. https://doi.org/10.1093/cercor/bhm172 pmid:17934187
  13. Graybiel AM, Grafton ST (2015) The striatum: where skills and habits meet. Cold Spring Harb Perspect Biol 7:a021691. https://doi.org/10.1101/cshperspect.a021691 pmid:26238359
  14. Iigaya K, Fonseca MS, Murakami M, Mainen ZF, Dayan P (2018) An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals. Nat Commun 9:2477. https://doi.org/10.1038/s41467-018-04840-2 pmid:29946069
  15. Ito M, Doya K (2009) Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J Neurosci 29:9861–9874. https://doi.org/10.1523/JNEUROSCI.6157-08.2009 pmid:19657038
  16. Ito M, Doya K (2015a) Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts of the striatum during fixed- and free-choice tasks. J Neurosci 35:3499–3514. https://doi.org/10.1523/JNEUROSCI.1962-14.2015 pmid:25716849
  17. Ito M, Doya K (2015b) Parallel representation of value-based and finite state-based strategies in the ventral and dorsal striatum. PLoS Comput Biol 11:e1004540. https://doi.org/10.1371/journal.pcbi.1004540 pmid:26529522
  18. Kesner RP (1989) Retrospective and prospective coding of information: role of the medial prefrontal cortex. Exp Brain Res 74:163–167. https://doi.org/10.1007/BF00248289 pmid:2924832
  19. Kesner RP, Churchwell JC (2011) An analysis of rat prefrontal cortex in mediating executive function. Neurobiol Learn Mem 96:417–431. https://doi.org/10.1016/j.nlm.2011.07.002 pmid:21855643
  20. Kupferschmidt DA, Juczewski K, Cui G, Johnson KA, Lovinger DM (2017) Parallel, but dissociable, processing in discrete corticostriatal inputs encodes skill learning. Neuron 96:476–489.e5. https://doi.org/10.1016/j.neuron.2017.09.040 pmid:29024667
  21. Nolen-Hoeksema S, Fredrickson BL, Loftus GR, Lutz C (2014) Atkinson and Hilgard’s introduction to psychology, Ed 16. Stamford: Cengage Learning.
  22. Ohta H, Satori K, Takarada Y, Arake M, Ishizuka T, Morimoto Y, Takahashi T (2021) The asymmetric learning rates of murine exploratory behavior in sparse reward environments. Neural Netw 143:218–229. https://doi.org/10.1016/j.neunet.2021.05.030 pmid:34157646
  23. Otto AR, Gershman SJ, Markman AB, Daw ND (2013a) The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 24:751–761. https://doi.org/10.1177/0956797612463080 pmid:23558545
  24. Otto AR, Raio CM, Chiang A, Phelps EA, Daw ND (2013b) Working-memory capacity protects model-based learning from stress. Proc Natl Acad Sci U S A 110:20941–20946. https://doi.org/10.1073/pnas.1312011110 pmid:24324166
  25. Paxinos G, Watson C (1998) The rat brain in stereotaxic coordinates, Ed 4. San Diego: Academic.
  26. Piray P, Daw ND (2021) A model for learning based on the joint estimation of stochasticity and volatility. Nat Commun 12:6587. https://doi.org/10.1038/s41467-021-26731-9 pmid:34782597
  27. Prentice KJ, Gold JM, Buchanan RW (2008) The Wisconsin Card Sorting impairment in schizophrenia is evident in the first four trials. Schizophr Res 106:81–87. https://doi.org/10.1016/j.schres.2007.07.015 pmid:17933496
  28. Samejima K, Ueda Y, Doya K, Kimura M (2005) Representation of action-specific reward values in the striatum. Science 310:1337–1340. https://doi.org/10.1126/science.1115270 pmid:16311337
  29. Schlagenhauf F, Huys QJM, Deserno L, Rapp MA, Beck A, Heinze HJ, Dolan R, Heinz A (2014) Striatal dysfunction during reversal learning in unmedicated schizophrenia patients. Neuroimage 89:171–180. https://doi.org/10.1016/j.neuroimage.2013.11.034 pmid:24291614
  30. Shurman B, Horan WP, Nuechterlein KH (2005) Schizophrenia patients demonstrate a distinctive pattern of decision-making impairment on the Iowa Gambling Task. Schizophr Res 72:215–224. https://doi.org/10.1016/j.schres.2004.03.020 pmid:15560966
  31. Skelin I, Hakstol R, VanOyen J, Mudiayi D, Molina LA, Holec V, Hong NS, Euston DR, McDonald RJ, Gruber AJ (2014) Lesions of dorsal striatum eliminate lose-switch responding but not mixed-response strategies in rats. Eur J Neurosci 39:1655–1663. https://doi.org/10.1111/ejn.12518 pmid:24602013
  32. Smith KS, Graybiel AM (2013) A dual operator view of habitual behavior reflecting cortical and striatal dynamics. Neuron 79:361–374. https://doi.org/10.1016/j.neuron.2013.05.038 pmid:23810540
  33. Soltani A, Izquierdo A (2019) Adaptive learning under expected and unexpected uncertainty. Nat Rev Neurosci 20:635–644. https://doi.org/10.1038/s41583-019-0180-y
  34. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Cambridge: MIT Press.
  35. Thorn CA, Atallah H, Howe M, Graybiel AM (2010) Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66:781–795. https://doi.org/10.1016/j.neuron.2010.04.036 pmid:20547134
  36. Trepka E, Spitmaan M, Bari BA, Costa VD, Cohen JY, Soltani A (2021) Entropy-based metrics for predicting choice behavior based on local response to reward. Nat Commun 12:6567. https://doi.org/10.1038/s41467-021-26784-w pmid:34772943
  37. Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM (2004) Putting a spin on the dorsal-ventral divide of the striatum. Trends Neurosci 27:468–474. https://doi.org/10.1016/j.tins.2004.06.006 pmid:15271494
  38. Waltz JA, Gold JM (2007) Probabilistic reversal learning impairments in schizophrenia: further evidence of orbitofrontal dysfunction. Schizophr Res 93:296–303. https://doi.org/10.1016/j.schres.2007.03.010 pmid:17482797
  39. Waltz JA, Frank MJ, Robinson BM, Gold JM (2007) Selective reinforcement learning deficits in schizophrenia support predictions from computational models of striatal-cortical dysfunction. Biol Psychiatry 62:756–764. https://doi.org/10.1016/j.biopsych.2006.09.042 pmid:17300757
  40. Waltz JA, Frank MJ, Wiecki TV, Gold JM (2011) Altered probabilistic learning and response biases in schizophrenia: behavioral evidence and neurocomputational modeling. Neuropsychology 25:86–97. https://doi.org/10.1037/a0020882 pmid:21090899
  41. Woo JH, Aguirre CG, Bari BA, Tsutsui KI, Grabenhorst F, Cohen JY, Schultz W, Izquierdo A, Soltani A (2023) Mechanisms of adjustments to different types of uncertainty in the reward environment across mice and monkeys. Cogn Affect Behav Neurosci, February 23, 2023. https://doi.org/10.3758/s13415-022-01059-z. Erratum in: Cogn Affect Behav Neurosci, March 29, 2023.
  42. Worthy DA, Otto AR, Maddox WT (2012) Working-memory load and temporal myopia in dynamic decision making. J Exp Psychol Learn Mem Cogn 38:1640–1658. https://doi.org/10.1037/a0028146 pmid:22545616
  43. Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19:181–189. https://doi.org/10.1111/j.1460-9568.2004.03095.x pmid:14750976
  44. Yin HH, Knowlton BJ, Balleine BW (2005a) Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur J Neurosci 22:505–512. https://doi.org/10.1111/j.1460-9568.2005.04219.x pmid:16045503
  45. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW (2005b) The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22:513–523. https://doi.org/10.1111/j.1460-9568.2005.04218.x pmid:16045504
  46. Yin HH, Knowlton BJ, Balleine BW (2006) Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav Brain Res 166:189–196. https://doi.org/10.1016/j.bbr.2005.07.012 pmid:16153716
  47. Yoshizawa T, Funahashi M (2020) Effects of methyl methacrylate on the excitability of the area postrema neurons in rats. J Oral Biosci 62:306–309. https://doi.org/10.1016/j.job.2020.09.003 pmid:32931900
  48. Yoshizawa T, Ito M, Doya K (2018) Reward-predictive neural activities in striatal striosome compartments. eNeuro 5:ENEURO.0367-17.2018. https://doi.org/10.1523/ENEURO.0367-17.2018

Synthesis

Reviewing Editor: Arvind Kumar, KTH Royal Institute of Technology

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Bernard Balleine, Alireza Soltani.

Here the authors have sought to understand the role of working memory in a reinforcement learning-based decision-making paradigm. To this end, they have recorded neural activity from the DMS, DLS, mPFC, and motor cortex. They show that working memory-related activity is observed in the motor cortex and DLS (the lateral cortico-basal ganglia loop). The two reviewers are generally positive about the work, but both have also noticed that some important data are not shown and some new analysis may be needed to substantiate the claims. In the following, I mention some of the key points made by the reviewers:

1. Authors should show the neural activity when the rewarded choice changes because that is where the biggest difference between the two loops may emerge. In particular, authors should show the neural activity in the different loops prior to a change in reward contingency and after that change.

2. The medial and lateral loops are recorded from different hemispheres. These loops may be different in their degree of sensitivity to lateralised motor activity. So authors should separately assess activity in the four structures when the rewarded choice is ipsilateral to the recording electrode and when it is contralateral to the recording electrode.

3. The text suggests that neurons were classified into the five different groups (previous action, previous reward, previous interaction, WSLS, and non-coding) separately in each phase, so a neuron could fall into a different category in each phase. If this is true, it is important to show how consistently neurons in each region encode each of these categories (i.e., the proportion of neurons in each region that are classified in the same category across phases).

Besides these points, the reviewers have also identified several places where more discussion or clarification is needed. Detailed comments from the reviewers are appended below this message. Please revise the manuscript accordingly and also provide a point-by-point reply to the comments.

We look forward to the revised version of the manuscript.

---

Reviewer #1

The authors attempt to contrast working memory and reinforcement learning processes in choice performance using a signalled two-choice nose-poke task with changing reward probabilities between the two choices. To test the role of working memory, the authors interpose a signalled no-choice trial between choice trials in an intermittent condition (IC) vs. a continuous condition (CC), which contains only choice trials. During performance of the task, the authors record from the medial prefrontal cortex and dorsomedial striatum (reflecting activity in a medial cortico-basal ganglia loop) and also the primary motor cortex and dorsolateral striatum (reflecting activity in a lateral cortico-basal ganglia loop).

They find, behaviorally, that performance shifts more rapidly with shifts in reward in the CC than the IC, and that CC performance is more strongly influenced by the previous trial while IC performance incorporates past trials. Neurally, they find that action command activity (i.e., activity prior to actual performance) is greater in CC than IC and in lateral-loop than medial-loop structures. They conclude that working memory activity in this task is to be found in M1 and the DLS rather than the mPFC and DMS.

[1] This is quite clever and interesting. It should be noted, however, that the numbers of sessions and trials per session are very large (48,554 behavioral and neural trials), which means that the rats in this situation were extensively trained on the task. Where the biggest difference in neural activity across the two loops should emerge, however, is at the point of transition when the rewarded choice changes. The difference between CC and IC conditions is clearest at that point (Figure 2B), and so it is unfortunate that the authors don’t present comparable data from that point in the neural recordings. I would like to see changes in neural activity in the different loops prior to a change in reward contingency and after that change. It may be that activity induced during the many, many trials that precede the change induces activity in M1 and the DLS, whereas more activity in the medial-loop structures might emerge immediately after the shift, perhaps reflecting the need to consider more carefully what prior choices have been made.

[2] The other important difference between the medial- and lateral-loop structures is that the former were recorded in the left hemisphere and the latter in the right hemisphere. There is clear evidence of lateralisation of signals in the right hemisphere but less in the left hemisphere, which suggests the loops may differ in their degree of sensitivity to lateralised motor activity. This in itself seems to offer an explanation for the differences in activity in the different loops. It is important, therefore, that the influence of laterality is made clear by separately assessing activity in these four structures when the rewarded choice is ipsilateral to the recording electrode and when it is contralateral to it.

[3] The lack of these two types of data - from choice-reward transitions and from the lateralised signals - makes it difficult to decide whether performance differences between CC and IC are due to the different demands on working memory or to the difference in the complexity of the stimulus-response sequences the animals learn as the choice-reward relationships stabilize after a shift. This makes it very difficult to differentiate the influence of direct experience from the use of some kind of rule. I think these issues need to be more fully addressed in the discussion. The finding that lateral-loop structures are more active in the CC condition may suggest that they are involved in working memory, but they may alternatively be involved in the initiation of a relatively simple response sequence compared with the more complex sequence acquired in the IC condition.

[4] The authors state that the IC condition “did not only prolong inter-choice-trial-intervals but also increase patterns of behaviors, therefore WM was expected to be strongly disturbed”. By how much did it increase the inter-choice interval, and what other behaviors were introduced? How often did the rats choose congruently and incongruently on the no-choice trials relative to the subsequently rewarded trial? The addition of a no-choice trial also increases the inter-choice interval, so there is a confound here: the effect could just reflect an increased ITI. Of course, it also means that the last action prior to reward will be both a no-choice trial and then the next choice trial, so WM will have contents of at least two trials, and it is possible that choices on no-choice trials sometimes favour and sometimes counter performance on the next trial. Are these averaged out, or is there a bias induced by spontaneous alternation (a spontaneous choice opposite in direction to the previously rewarded choice) on the no-choice trials?

[5] The results, and each section of the results, start with a page of text repeating details from the methods section immediately preceding it. Is this repetition necessary?

Reviewer #2

The study aims to investigate the role of working memory (WM) in reward-based decision making and learning. Past research has shown that a set of strategies known as win-stay/lose-switch (WSLS) requires intact WM of immediate choices and rewards, such that choices carried out with disrupted WM are more similar to standard RL with smaller learning rates. The authors have developed a task that can impair WM retention by increasing the duration of the maintenance period. Specifically, in one condition of the task, the authors interleave choice trials with no-choice trials (the intermittent choice (IC) condition), which allows them to analyze separately blocks where the animal has immediate access to the previous choice (continuous condition, CC) versus when it does not (IC condition). The main choice task is a form of probabilistic reversal learning task.

Behavioral results show that rats’ choices in CC blocks follow WSLS, whereas choices in the IC blocks do not. Their regression analysis also confirms a stronger effect of recent action and reward in the CC condition. In addition, they record from the goal-directed (DMS and mPFC) and motor (DLS and M1) components of the corticostriatal loop. The role of the DMS-mPFC loop has been well studied, but the results here present a somewhat unexpected role of the motor loop in WM. Specifically, the authors analyze different phases of the trials (initiation, cue, making a choice, and getting reward) in a separate model for each of the CC and IC conditions, and find that before choosing an action, mPFC neurons code for previous reward more in the CC than in the IC condition. However, during choice and action execution, all areas except the mPFC have higher proportions of neurons that are active in the CC condition. Overall, they also found that the action, reward, and their interaction for the previous trial were represented in a higher proportion of neurons in the motor loop than in the prefrontal loop.

Overall, the manuscript is well written, and the results are interesting. The main conclusion of the study is supported by the results. My comments/suggestions below are mostly clarifying.

Major comments:

(1) With regard to the surprising finding that WM is represented more in M1 and the DLS than in the mPFC and DMS, the discussion mentions two possibilities for implementing WSLS: 1) computing the next trial’s choice soon after the end of each trial and retaining that choice information between the two trials, or 2) retaining both choice and reward information from the previous trial and using them at the time of making a choice during the subsequent trial. The data support the latter, and this retention of past choice and outcome is more pronounced in the motor loop (although the mPFC does retain information about the previous reward in phase 1 of the CC condition; Figure 5). However, possible reasons for this higher involvement of the motor loop are not clearly discussed.

[As a side-note, the former possibility is mentioned only afterwards as a prospective action encoding and that “prospective WSLS coding appeared during action execution in all areas excluding mPFC”. If this is referring to the green arrow in Figure 5c, does that WSLS belong to the subsequent trial? Because if it is for the current trial, it is happening in phase 3, which is not really a prospective action. If it is WSLS for the next trial, mentioning it in the figure would be helpful.]

In any case, the difference between the two possible ways of employing WSLS still does not clearly address the interesting finding that WM is more pronounced in the motor loop, which warrants some more elaboration and/or discussion.

(2) The task is a type of probabilistic learning task that involves both expected and unexpected uncertainty (Soltani and Izquierdo, 2019). In addition to no-choice trials, the IC condition involves a reversal every 20 trials (instead of every 10 trials in the CC condition). This means that the IC condition is less volatile than the CC condition, and this could have a significant effect on WSLS because of the perception of uncertainty in the environment (e.g., see Woo, Aguirre, et al., bioRxiv 2022, https://doi.org/10.1101/2022.10.01.510477). It would be useful if the authors could interpret their behavioral and neural findings in terms of differences in uncertainty between the two conditions.

(3) It could also be informative (although not required) to compare only short ITI trials in CC (assuming trial initiation was self-paced) and IC conditions (as in Iigaya et al., 2018) to see whether there is still a difference in the involvement of the prefrontal loop.

(4) The task description (p.7, Lines 129-136) is a bit confusing. The 1st and 3rd CC blocks are terminated after 80% performance on the high-reward side, and the 2nd, 4th, and 5th blocks are terminated after 10 choice trials, so there is a total of 10 choice trials in the 2nd block (CC) and 10 choice trials in each of the 4th and 5th blocks (IC). It seems that the trials in the 2nd and 3rd blocks are compared altogether to all the trials in the 4th and 5th blocks, so the sentence “starting from 80% biased choice and switching reward probabilities after 10 choice trials” would apply to the cumulation of blocks 2-3 versus blocks 4-5. This is somewhat difficult to parse from the text; it might be helpful to add a panel to Figure 1 to explain it visually as well.

(5) Line 431: regarding the classification of neurons into 5 different groups (previous action, previous reward, previous interaction, WSLS, and non-coding) in each phase, the text conveys that each neuron could have been classified in a different category in each phase. If that is true, I am wondering how consistently neurons in each region encode each of these categories (i.e., the proportion of neurons in each region that are being classified in the same category across phases).

Minor comments:

-- Line 298: Reward probabilities were changed “in the beginning” of each block.

-- Figure 2B. Significantly “different”.

Keywords

  • basal ganglia
  • motor cortex
  • prefrontal cortex
  • reinforcement learning
  • striatum
  • working memory
