Abstract
Two theories regarding the role for dopamine neurons in learning include the concepts that their activity serves as a (1) mechanism that confers incentive salience onto rewards and associated cues and/or (2) contingency teaching signal reflecting reward prediction error. While both theories are provocative, the causal role for dopamine cell activity in either mechanism remains controversial. In this study mice that either fully or partially lacked NMDARs in dopamine neurons exclusively, as well as appropriate controls, were evaluated for reward-related learning; this experimental design allowed for a test of the premise that NMDA/glutamate receptor (NMDAR)-mediated mechanisms in dopamine neurons, including NMDA-dependent regulation of phasic discharge activity of these cells, modulate either the instrumental learning processes or the likelihood of pavlovian cues to become highly motivating incentive stimuli that directly attract behavior. Loss of NMDARs in dopamine neurons did not significantly affect baseline dopamine utilization in the striatum, novelty evoked locomotor behavior, or consumption of a freely available, palatable food solution. On the other hand, animals lacking NMDARs in dopamine cells exhibited a selective reduction in reinforced lever responses that emerged over the course of instrumental learning. Loss of receptor expression did not, however, influence the likelihood of an animal acquiring a pavlovian conditional response associated with attribution of incentive salience to reward-paired cues (sign tracking). These data support the view that reductions in NMDAR signaling in dopamine neurons affect instrumental reward-related learning but do not lend support to hypotheses that suggest that the behavioral significance of this signaling includes incentive salience attribution.
Significance Statement
Behavior is shaped to a dramatic degree by the occurrence of rewards, through both pavlovian and instrumental conditioning processes; these mechanisms give rise to both normal and abnormal behavior. It is crucial to understand the neural mechanisms that give rise to normal actions and how they lead to pathological behaviors, such as overeating and drug addictions. Though dopamine neurotransmission has often been implicated in reward-related learning, the specifics of this role remain poorly understood. The set of studies described in this manuscript reveals that NMDA/glutamate-mediated dopamine transmission contributes to the acquisition of instrumental reward-seeking actions, possibly highlighting these mechanisms as targets of interventions designed to alter the occurrence of reward-related actions, like drug seeking and drug taking.
Introduction
The electrical activity of dopamine neurons, and associated activity-dependent synaptic release of dopamine, is thought to be critical to reward-related learning and behavior (Wise and Rompre, 1989; Robbins and Everitt, 1992; Robinson and Berridge, 1993; Salamone, 1994; Schultz et al., 1997; Redgrave et al., 1999; Kelley, 2004). Because alterations in reward-related behaviors are found in a range of psychiatric conditions (Robinson and Berridge, 1993; Taylor and Jentsch, 2001; Neuringer, 2002; Everitt and Robbins, 2005; Martin-Soelch et al., 2007; Flagel et al., 2009; Groman et al., 2009; Shiflett and Balleine, 2011), and because many of these disorders are thought to involve dopaminergic dysfunction (Swerdlow and Koob, 1987; Billstedt, 2000; Robinson and Berridge, 2000; Everitt and Robbins, 2005; Nestler and Carlezon Jr, 2006; Iversen et al., 2008; Groman et al., 2009), understanding the mechanistic role for dopamine release in reward-driven learning remains an important research question.
A considerable body of evidence, derived mostly from electrophysiological recordings of midbrain neurons in nonhuman primates, implicates brief event-related, high-frequency discharge activity of dopaminergic neurons, and the associated phasic, nonlinear increases in the quantity of transmitter released (Grace and Bunney, 1984; Gonon, 1988; Bean and Roth, 1991), as a neural instantiation of the “prediction error” signal that figures in both classical and modern mathematical learning models (Rescorla and Wagner, 1972; Schultz et al., 1993, 1997; Sutton and Barto, 1998; Day et al., 2007). Phasic aspects of dopamine signaling may represent the difference between predicted and actually received rewards (Schultz, 2002), information used in these models to update expectancies of the organism as it learns the contingent relationships between stimuli that predict biologically significant outcomes, and the responses that produce them.
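To make the prediction error idea concrete, the following is a minimal sketch of a Rescorla–Wagner-style update for a single cue that is repeatedly paired with reward. It is an illustration only, not an analysis performed in this study; the function name, the learning rate `alpha`, and the reward sequence are hypothetical choices.

```python
def rescorla_wagner(rewards, alpha=0.2, v0=0.0):
    """Track a cue's predicted reward value across trials.

    rewards: reward magnitude received on each trial (hypothetical input).
    alpha:   learning rate (assumed value, chosen only for illustration).
    """
    v = v0
    values, errors = [], []
    for r in rewards:
        delta = r - v      # prediction error: received minus predicted reward
        v += alpha * delta # expectancy moves toward the outcome in proportion to the error
        values.append(v)
        errors.append(delta)
    return values, errors


# A cue paired with one unit of reward on every trial.
values, errors = rescorla_wagner([1.0] * 50)
```

In this sketch the error is largest on the first pairing and shrinks toward zero as the reward becomes fully predicted, which is the qualitative pattern attributed to phasic dopamine responses in the studies cited above.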
An alternate perspective regards dopaminergic transmission as the mechanism by which rewarding events and reward-predictive stimuli are imbued with incentive motivational properties, transforming them from merely pleasurable, or “liked,” to “wanted” attractors of motivated behavior and attention (Crow, 1976; Robinson and Berridge, 1993; Berridge and Robinson, 1998). A variety of lines of evidence support this conclusion: elevating dopamine release, in multiple contexts, can invigorate motivation to engage in a behavior, without affecting learning of the behavior itself (Robbins, 1978; Wyvell and Berridge, 2000, 2001; Salamone et al., 2001, 2005; Peciña et al., 2003; Cagniard et al., 2006; Yin et al., 2006). Altering dopamine can alter the magnitude of established responding immediately (Berridge, 2007), indicating that dopamine can impact reward-driven behavior without an experience of a prediction error as a precondition; indeed, aspects of reward learning are possible when dopamine is nearly absent altogether (Cannon and Palmiter, 2003; Hnasko et al., 2005), suggesting that dopamine might function to instruct motivational value, rather than associative contingencies.
The prediction error and the incentive salience perspectives are often both supported by the results of experimental manipulations of dopamine transmission. For example, optogenetic simulation of dopamine neuron burst firing acts as an unconditioned stimulus that reinforces instrumental and pavlovian behaviors (Tsai et al., 2009; Witten et al., 2011). While this establishes a causal role for phasic dopaminergic activity in reward-related learning, whether it conveys a prediction error signal that teaches contingencies or whether it instructs the incentive motivation to engage in these behaviors cannot readily be distinguished. However, studies of individual differences in the nature of behaviors expressed during autoshaping may offer a unique paradigm better suited for distinguishing these theories. Specifically, contingency learning via prediction error signals, expressed as a pavlovian approach to a reward-delivery location (goal tracking), can be differentiated from contingency learning that additionally involves incentive salience attribution to reward-predictive cues (sign tracking; Robinson and Flagel, 2009). Recent evidence suggests that the magnitude of cue-evoked, phasic dopamine release positively relates to incentive salience attribution (Flagel et al., 2011): sign-tracking rats exhibited greater conditional stimulus (CS)-elicited dopamine transients than goal trackers.
Because NMDA/glutamate receptors (NMDARs) localized within midbrain dopaminergic neurons regulate dopamine transmission, including through influences on burst firing activity (Suaud-Chagny et al., 1992), phasic dopamine release is attenuated in a mouse model lacking NMDAR in dopamine neurons (Zweifel et al., 2009; Luo et al., 2010). One application of this system, therefore, is to evaluate the effects of quantitative reductions of afferent input-generated phasic dopamine signaling on behavior. Here, we assessed instrumental learning (which involves both prediction error and incentive salience attribution) in mice lacking NMDAR in dopamine neurons, and then studied sign-tracking/goal-tracking behavior to test the idea that NMDA-dependent aspects of dopamine signaling are causally related to propensity for incentive salience attribution.
Materials and Methods
Mouse lines
B6.SJL-Slc6a3tm1.1(cre)Bkmn /J (stock #006660; http://jaxmice.jax.org/strain/006660.html; referred to here as DATcre+) mice, each heterozygous for a mutated dopamine transporter (DAT) gene expressing Cre recombinase, and B6.129S4-Grin1tm2Stl/J (stock #005246; http://jaxmice.jax.org/strain/005246.html; referred to here as NR1 flox/flox ) mice were purchased from The Jackson Laboratory. In DATcre+ mice, Cre recombinase cDNA was inserted into the 3' untranslated region of the DAT gene for bicistronic mRNA translation; Cre-mediated recombination is detectable in this line as early as E15 and is primarily restricted to the substantia nigra, ventral tegmental area, and retrorubral field within the midbrain (Bäckman et al., 2006). NR1 flox/flox mice have a loxP site between exons 11 and 12 and another loxP site, along with a neomycin resistance gene, at the 3' end of the Grin1 gene (Tonegawa et al., 1996). The NR1 gene is an obligatory component of the functional NMDAR (Forrest et al., 1994), which regulates NMDAR-mediated plasticity and also dopamine cell burst firing, the latter by facilitating temporal summation of excitatory inputs (Suaud-Chagny et al., 1992; Overton and Clark, 1997). Conditional deletion of NR1 expression blocks NMDAR activity (Tsien et al., 1996), reducing the magnitude of phasic dopamine release events to ∼30% of control levels (Zweifel et al., 2009; Parker et al., 2010).
Male DATcre+ mice were bred with female NR1 flox/flox mice; the DATcre+ males in the resulting F1 generation were further bred with a different set of female NR1 flox/flox mice to create DATcre–;NR1 flox/wt , DATcre–;NR1 flox/flox , DATcre+;NR1 flox/wt , and DATcre+;NR1 flox/flox mice (collectively referred to as DATcre;NR1 mice). Male DATcre+ mice were also separately crossed to female B6.129S4-GT(ROSA)26Sortm1Sor /J (stock #003474; http://jaxmice.jax.org/strain/003474.html; referred to as ROSA26-LacZ) reporter mice (Soriano, 1999), obtained from Dr. Alcino Silva’s laboratory at University of California, Los Angeles. DATcre, NR1, and ROSA26-LacZ zygosity was determined using conventional PCR methods.
Mice were between 60 and 120 d old when involved in this study. All subjects were socially housed in cages of two to four individuals with Sani-Chip cage bedding (PJ Murphy Forest Products) in a temperature- and humidity-controlled room on a 14/10 h light/dark cycle. Behavioral testing was conducted during the light cycle. Food was available ad libitum during locomotor behavior and free-reward consumption testing, but was restricted during other experiments, as detailed below. All animal procedures were performed according to the regulations of the university animal care committee for each author.
LacZ X-Gal staining
DATcre+ mice also expressing the ROSA26-LacZ gene were killed by isoflurane overdose, then transcardially perfused with freshly mixed, cold 4% paraformaldehyde. Brains were stored in paraformaldehyde for 1 d before being switched to a 30% sucrose/PBS solution. Slices of 40 µm width were cut on a cryostat and rinsed in PBS. The staining solution contained 85.33 mg potassium ferrocyanide, 64 mg potassium ferricyanide, 4 ml of 20 mM MgCl2, 36 ml PBS, 60 mg X-gal, and 800 µl dimethylformamide. The solution was allowed to react with brain slices at 37°C for 48 h; the slices were then rinsed, counterstained, and mounted on slides.
Quantification of monoamine utilization in the striatum
Thirty-five conscious DATcre;NR1 mice (males and females, DATcre–;NR1 flox/wt , n = 9; DATcre–;NR1 flox/flox , n = 8; DATcre+;NR1 flox/wt , n = 10; DATcre+;NR1 flox/flox , n = 8) were killed by rapid decapitation and tissue samples were collected from the ventral striatum. Samples were frozen for subsequent analyses of monoamines and their metabolites using HPLC. Tissue was homogenized in 0.1 M perchloric acid, centrifuged for 25 min, and the content of 200 μl of supernatant was quantified by reverse-phase column HPLC (BAS) at 0.7 V applied, using a 7% acetonitrile-based mobile phase. Protein content was quantified using the Lowry method (Lowry et al., 1951).
Locomotor activity in a novel context
The locomotor behavior of 165 DATcre;NR1 mice (males and females, DATcre–;NR1 flox/wt , n = 42; DATcre–;NR1 flox/flox , n = 42; DATcre+;NR1 flox/wt , n = 40; DATcre+;NR1 flox/flox , n = 41) was characterized by placing subjects in clean, standard acrylic animal cages that were novel to the mouse (24 × 40 cm), with a thin layer of bedding. Each cage was equipped with Opto M3 locomotor activity monitors (Columbus Instruments) fitted with 1” spaced x-axis infrared beam emitters. Locomotor behavior was monitored for 30 min (data collected in 5 min time bins). Locomotor data for 36 mice were lost because of equipment failure, leaving n = 36, n = 32, n = 31, and n = 31 for the four genotype groups, respectively.
Free consumption of a palatable food
Subsequently, the same sample of 165 mice used in the locomotor experiment underwent habituation to a two bottle, free-choice palatable food consumption procedure over the course of 2 d. In 2 h sessions of individual housing, mice had access to 2 Lixit tube-equipped water bottles, one filled with water and the other filled with a 10% v/v sweetened condensed milk solution (Kroger). Bottle positions (i.e., left side of the cage vs right side, order counterbalanced across genotypes) were switched on the second day of habituation. Testing began the following day, bottle positions were again switched, and data were collected for 2 d; a final switch, followed by 2 d of data collection, concluded the procedure. Data presented are averages of consumption levels on the second day of placement on each side.
Instrumental conditioning
An experimentally naive set of 112 mice (males only, DATcre–;NR1 flox/wt , n = 26; DATcre–;NR1 flox/flox , n = 27; DATcre+;NR1 flox/wt , n = 22; DATcre+;NR1 flox/flox , n = 26; reflects data exclusion from 11 mice due to technical failures with the operant chambers, e.g., pellet dispenser or lever failures) were introduced to limited access to chow in their home cages in order to achieve body weights ∼85% of free-feeding levels. Mice were exposed to 0.5 g of the reinforcer pellets (14 mg Dustless Precision Pellets, used in subsequent behavioral experiments; BioServ) in their home cages during the first day of food restriction. Body weight was maintained at this level throughout the experiment, and standard chow was provided in the home cage at least 1 h after daily testing. Mice were trained on sequential days in extra wide aluminum and polycarbonate Med Associates modular mouse-testing chambers, each stationed inside a sound-attenuating chamber and equipped with a white noise generator, house light (both always on during all experiments), and a tone generator. A horizontal array of five illuminable nose-poke apertures formed one side of the box, and on the other resided an illuminable pellet-delivery magazine with an entry-detection photocell. Chambers also contained two retractable ultrasensitive mouse levers (2 g force requirement for actuation; Med Associates); these were positioned one each on both sides of the food magazine.
Training began with 2 d of familiarization to delivery of food pellets to the magazine. Fifty pellets were delivered to the magazine on a fixed-time 30 s schedule, each followed by a 2 s illumination of the magazine. Ten daily 30 min sessions of instrumental training followed. Sessions began with the extension of both levers, and responses on the active lever (designated left vs right in a counterbalanced fashion across genotypes) resulted in a 50 ms tone pulse, which was accompanied by pellet delivery and a 2 s illumination of the magazine light upon completion of the ratio schedule. The first 10 pellets per session were delivered on a fixed-ratio 1 schedule; subsequently, pellets were delivered on a variable-ratio 2 schedule. Responses to the inactive lever were recorded but had no programmed consequence. A 0.5 s timeout followed each pellet delivery, during which responses could not elicit delivery of another reward, but did count toward completion of the next reinforcement schedule.
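The reinforcement contingencies described above can be sketched in a few lines of code. This is a hedged illustration rather than the program that controlled the chambers: the function name is hypothetical, the variable-ratio 2 requirement is assumed to be drawn uniformly from {1, 2, 3} (the source states only the mean ratio), and a ratio completed during the timeout is assumed to pay off on the first press made after the timeout ends.

```python
import random


def simulate_active_lever(press_times, rng, fr_pellets=10, timeout=0.5):
    """Return pellet-delivery times for a sequence of active-lever presses (s)."""

    def next_requirement(pellets_earned):
        if pellets_earned < fr_pellets:
            return 1                   # fixed-ratio 1 for the first 10 pellets
        return rng.choice([1, 2, 3])   # assumed VR2 distribution (mean = 2)

    deliveries = []
    presses_since_reward = 0
    required = next_requirement(0)
    last_delivery = None
    for t in sorted(press_times):
        presses_since_reward += 1      # presses during the timeout still count
        in_timeout = last_delivery is not None and (t - last_delivery) < timeout
        if presses_since_reward >= required and not in_timeout:
            deliveries.append(t)       # tone pulse, pellet, magazine light
            last_delivery = t
            presses_since_reward = 0
            required = next_requirement(len(deliveries))
    return deliveries


# One press per second for 100 s with a seeded generator.
pellet_times = simulate_active_lever([float(i) for i in range(1, 101)], random.Random(0))
```

Under these assumptions, the first 10 presses each earn a pellet, after which roughly every second press is reinforced.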
Sign tracking/goal tracking
Methods for sign-tracking/goal-tracking pavlovian learning were modeled after Flagel et al. (2011). In the instrumental conditioning studies (above), DATcre+;NR1 flox/wt mice were phenotypically similar to DATcre–;NR1 flox/wt and DATcre–;NR1 flox/flox control groups (Figures 1, 2), indicating that they could act as adequate controls; here, we treated them as such and compared their behavior with DATcre+;NR1 flox/flox animals. A set of 63 experimentally naive animals was used (males only, DATcre+;NR1 flox/wt , n = 32; DATcre+;NR1 flox/flox , n = 31). We also tested DATcre–;NR1 flox/wt animals (males, n = 31) to provide further empirical support for the validity of comparisons between DATcre+;NR1 flox/wt and DATcre+;NR1 flox/flox animals. The same schedule of caloric restriction described above was initiated prior to behavioral training. Animals first underwent 2 d of magazine training in which 30 food pellets were delivered to the magazine on a variable-time 60 s schedule. Fifteen daily sessions of sign-tracking/goal-tracking conditioning began the next day. These sessions consisted of 15 presentations on a variable-time 180 s schedule of a CS (“lever-CS”). Each lever-CS involved a 20 s extension of the lever to the right of the food magazine; two food pellets were delivered to the magazine coincident with lever-CS termination. Actuations of the lever-CS were recorded but had no programmed consequences.
On the day following the last conditioning session, all mice underwent a single test of conditioned reinforcement, wherein the two most lateral nose-poke apertures were illuminated. Responses to the active aperture (designated left vs right in a counterbalanced fashion across genotypes) resulted in a 5 s extension of the lever-CS, while responses to the inactive aperture were recorded but were without programmed effect. No food was delivered during this session. The session ended 60 min after the first active aperture response or after 90 min had elapsed, whichever occurred first.
Data analysis
Statistical tests, outlined in Table 1, were conducted using Stata 13 (StataCorp LP). In all omnibus tests, DATcre (+ vs –) and NR1 (flox/wt vs flox/flox) zygosity were entered as between-subjects factors. In sign-tracking/goal-tracking experiments, for comparisons between DATcre+;NR1 flox/wt and DATcre+;NR1 flox/flox mice, NR1 genotype was the singular between-subjects factor; for comparisons between DATcre–;NR1 flox/wt and DATcre+;NR1 flox/wt mice, DATcre genotype was the singular between-subjects factor.
Statistical tests used to analyze data
All datasets were inspected for conformity to assumptions of the general linear model. Where assumptions were met, data were analyzed by univariate or repeated-measures ANOVA, with t tests where appropriate. For locomotor and learning experiments, we found significant departures from assumptions of traditional repeated-measures ANOVA, including violations of sphericity and/or heterogeneous, correlated residuals. These were not entirely unexpected, especially in our learning experiments, because correlations between testing days change as behavior progressively changes. Because population-level analysis often does not accurately characterize individual learning curves (Lashley, 1942; Estes, 1956; Gallistel et al., 2004; Verbeke and Molenberghs, 2009), generalized linear mixed models were used as a means to address these assumption violations, leading to better fits of the data by allowing subjects to vary with respect to intercepts and slopes and accommodating non-normal data distributions and nonconstant error variances/covariances. Models were fitted via maximum likelihood with cluster robust SEs using mean-variance adaptive Gauss–Hermite quadrature. Random subject-specific intercepts and/or linear slopes across days and their covariance were included on the basis of significantly improved model fit (tested via likelihood ratio testing of nested models). Distribution and link functions were chosen on the basis of properties of the variable studied and normality of the model residuals. Continuous data were analyzed using Gaussian identity-link models (i.e., linear mixed models); heavily skewed continuous data were modeled as log-normal. Log-link negative binomial models were applied to overdispersed count data and binomial logit models were applied to probability data. Statistics presented are tests of fixed effects. Wald χ² tests of main effects and interactions were followed by contrasts of simple effects and, where appropriate, Bonferroni-adjusted tests of means.
Locomotor behavior measures (number of x-axis beam breaks) were analyzed across 5 min time bins; the bin was treated as a linear covariate. Free-food consumption (ml/kg consumed) was analyzed with day of measurement as a repeated measure. Because water consumption levels were negligible, these data were not analyzed. Dopamine utilization was analyzed as the ratio of metabolite DOPAC content to dopamine content.
In all learning experiments, training day was treated as a continuous covariate, initially as a quadratic effect (i.e., curvilinear regression); if no quadratic effect of day was detected, it was removed, leaving the linear effect. For instrumental learning, reinforcers earned across days were analyzed, as were active and inactive lever presses. For sign-tracking/goal-tracking data, we analyzed genotype effects on behavioral data acquired across successive sessions, mirroring the analysis in Flagel et al. (2011). Sign tracking was quantified by analyzing (1) the probability of lever contact (contacts were defined as full actuations of the lever-CS) during lever-CS presentation, (2) total number of lever contact responses, and (3) latency to contact the lever. Goal tracking was similarly measured as the (1) probability of making a head entry into the magazine during a lever-CS presentation, (2) total number of head entries during the lever-CS presentations, and (3) latency to enter the magazine upon lever-CS presentation. A “conditioning ratio” measure of discriminative responding was also formed from goal-tracking data, calculated by comparing magazine head entries during the CS to those made during a time period of equivalent duration immediately preceding the CS (the latter termed the pre-CS period).
We also calculated proposed conditional response “bias” measures described by Meyer et al. (2012), wherein phenotypic tendency toward sign tracking versus goal tracking is quantified by the following: (1) differences in response probabilities, Pr(lever contact) − Pr(magazine entry), (2) a discrimination index of responses, and (3) relative response latencies. These three indices ranged from +1 to −1, representative of bias toward sign tracking versus goal tracking, respectively. Their correlational structure was explored, and they were then averaged to form a conditional approach “summary bias score.” Summary bias scores were further averaged over three session blocks. Distributions of summary scores at the start and end of training were analyzed using nonparametric tests. To investigate whether any genotype effects on sign tracking were obscured by analysis of all subjects’ behavior simultaneously, we used the final summary bias score (from the last three sessions) to designate mice as either a sign tracker or goal tracker on the basis of whether their score was positive or negative, respectively. Genotype effects on designation distribution were analyzed with Fisher’s exact test. We then plotted sign-trackers’ behavior and goal-trackers’ behavior separately, visualizing learning rates within each genotype/conditional response type combination.
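For readers unfamiliar with these bias measures, the sketch below shows one way a summary bias score of this kind could be computed for a single session. The exact expressions for the response and latency indices are not reproduced in the text here, so the forms used are assumptions based on the commonly used definitions attributed to Meyer et al. (2012): a normalized difference in response counts, and a latency difference scaled by the 20 s lever-CS duration. All function and argument names are hypothetical, and trials without a response are assumed to be scored with a latency equal to the CS duration.

```python
def bias_indices(p_lever, p_mag, n_lever, n_mag, lat_lever, lat_mag, cs_duration=20.0):
    """Return three indices, each ranging from -1 (goal tracking) to +1 (sign tracking)."""
    # (1) Difference in response probabilities (stated in the text).
    prob_diff = p_lever - p_mag
    # (2) Assumed form of the response index: normalized difference in counts.
    total = n_lever + n_mag
    response_bias = (n_lever - n_mag) / total if total > 0 else 0.0
    # (3) Assumed form of the latency index: latency difference over CS duration.
    latency_index = (lat_mag - lat_lever) / cs_duration
    return prob_diff, response_bias, latency_index


def summary_bias_score(*args, **kwargs):
    """Average the three indices into one score."""
    indices = bias_indices(*args, **kwargs)
    return sum(indices) / len(indices)


def designate(score):
    """Positive scores are sign trackers; a score of exactly 0 is not addressed
    in the text and is grouped with goal trackers here by assumption."""
    return "sign tracker" if score > 0 else "goal tracker"


# Hypothetical session: lever contact on 60% of trials, magazine entry on 40%.
score = summary_bias_score(0.6, 0.4, 12, 8, 9.0, 11.0)
label = designate(score)
```

Averaging such scores over the final three sessions and taking the sign of the result would then yield the designation described above.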
Data from six subjects on day 8 were lost due to technical failure. These data points were treated as missing at random in mixed model analysis.
Measures of responding for conditioned reinforcement included number of lever-CSs earned and number of active and inactive aperture nose pokes. Because sign tracking has been associated with greater conditioned reinforcement in rats (Robinson and Flagel, 2009; Flagel et al., 2011; Lomanowska et al., 2011), we also compared number of lever-CSs earned by animals designated sign trackers with numbers earned by animals designated goal trackers to establish whether the same relationship exists in mice.
Figures are presented as mean ± SE line plots or as Tukey box-plots, the latter demonstrating spread about a group median with plus symbols (+) demarcating group means.
Results
Baseline characterization
In Figure 1A, Cre-mediated gene recombination can be seen prominently in the substantia nigra pars compacta and the ventral tegmental area of the DATcre+ mouse, consistent with its initial characterization (Bäckman et al., 2006). Quantification of monoamine utilization by HPLC indicated that neither the DATcre construct, nor the floxed NR1 gene or its excision in DATcre+ subjects, affected basal dopamine utilization within the ventral striatum (Figure 1B; DATcre: F(1,31) = 0.01, p = 0.932; NR1: F(1,31) = 1.51, p = 0.229; DATcre × NR1: F(1,31) = 0.14, p = 0.713, a in Table 1).
Initial characterization of the DATcre;NR1 mouse. A, Prominent Cre-mediated recombination is seen in the midbrain of DATcre+ mice crossed with ROSA26-LacZ mice; arrows indicate ventral tegmental and substantia nigra pars compacta nuclei. B, Ventral striatum dopamine turnover is indistinguishable among the four combinations of DATcre and NR1 genotypes. C, No genotype effects were found over successive 5 min bins of locomotor behavior, and D, levels of consumption of a 10% sweetened condensed milk solution were similar across all genotypes.
DATcre–;NR1 flox/wt, DATcre–;NR1 flox/flox, DATcre+;NR1 flox/wt, and DATcre+;NR1 flox/flox mice were initially characterized for total ambulatory activity in a novel environment. As depicted in Figure 1C, mice exhibited reduced locomotor behavior over time as they habituated to their surroundings; however, no main effects or interactions involving DATcre genotype or NR1 genotype were detected (DATcre: Wald χ² = 0.12, p = 0.731; NR1: Wald χ² = 1.88, p = 0.171; DATcre × NR1: Wald χ² = 0.51, p = 0.474; DATcre × NR1 × time bin: Wald χ² = 0.03, p = 0.863, b), indicating that locomotor behavior was unaffected by genetic manipulation of NMDAR in dopamine cells. All mice increased consumption of the sweetened condensed milk solution across successive days of access (Day: F(1,320) = 12.38, p = 0.0005), but as depicted in Figure 1D, no effects of genotype were detected (DATcre: F(1,320) = 0.06, p = 0.804; NR1: F(1,320) = 0.02, p = 0.902; DATcre × NR1: F(1,320) = 0.64, p = 0.423; DATcre × NR1 × day: F(1,320) = 0.001, p = 0.962, c).
Instrumental learning
Reinforcers earned during the instrumental conditioning sessions are depicted in Figure 2A. Here, the mixed model revealed significant DATcre × day (Wald χ² = 4.24, p = 0.039), NR1 × day (Wald χ² = 4.65, p = 0.031), and DATcre × NR1 × day interactions (Wald χ² = 7.38, p = 0.007, d). The NR1 × day interaction was significant within DATcre+ animals (within DATcre+, Wald χ² = 15.55, p = 0.001; within DATcre–, Wald χ² = 0.18, p = 0.673, e), and successive Bonferroni-corrected contrasts revealed that while behavior during the initial training sessions did not differ, DATcre+;NR1 flox/flox mice earned fewer reinforcers than DATcre+;NR1 flox/wt mice on days 3–6 (Day 3: Wald χ² = 9.01, p = 0.027; Day 4: Wald χ² = 13.69, p = 0.002; Day 5: Wald χ² = 16.89, p < 0.001; Day 6: Wald χ² = 9.49, p = 0.021, f). Similar findings were obtained when the omnibus interaction was explored via simple effects within NR1 genotypes (within NR1 flox/flox, DATcre × day: Wald χ² = 11.93, p = 0.0006; within NR1 flox/wt, DATcre × day: Wald χ² = 0.25, p = 0.620, g). DATcre+;NR1 flox/flox mice earned fewer reinforcers than DATcre–;NR1 flox/flox mice on day 5 (Wald χ² = 10.14, p = 0.014), and a similar trend was found on day 6 (Wald χ² = 7.41, p = 0.065, h). Importantly, no differences in instrumental behavior among DATcre–;NR1 flox/flox, DATcre–;NR1 flox/wt, and DATcre+;NR1 flox/wt mice were detected (genotype: Wald χ² = 4.07, p = 0.133; genotype × day: Wald χ² = 0.76, p = 0.683, i).
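As a worked illustration of how the Bonferroni-corrected day-wise contrasts relate to their test statistics, the sketch below converts a Wald statistic to an adjusted p value. Two assumptions are made that are not stated in the text: each contrast is treated as a chi-square test with 1 degree of freedom, and the correction is applied across the 10 training days. The function names are hypothetical, and this is not the Stata procedure used for the analysis.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail p value for a chi-square statistic with 1 df (assumed df)."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def bonferroni(p, n_comparisons):
    """Bonferroni adjustment, capped at 1."""
    return min(1.0, p * n_comparisons)


# Day-wise statistics for reinforcers earned (days 3-6), as reported above.
reported_wald = {3: 9.01, 4: 13.69, 5: 16.89, 6: 9.49}
n_days = 10  # assumed number of comparisons (one per training day)

adjusted_p = {
    day: bonferroni(chi2_sf_1df(stat), n_days)
    for day, stat in reported_wald.items()
}
```

Under these assumptions the adjusted values fall close to the reported p values for days 3–6, which suggests, but does not establish, that the contrasts were corrected across all 10 days.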
Loss of NMDA receptors in dopamine neurons impairs instrumental learning. A, DATcre+;NR1 flox/flox mice earn fewer reinforcers over 10 d of instrumental learning and make fewer active lever presses (B), but press the inactive lever at levels similar to the three other genotypes (C). *p < 0.05, **p < 0.01, ***p < 0.001 DATcre+;NR1 flox/flox versus DATcre+;NR1 flox/wt ; #p < 0.05 DATcre+;NR1 flox/flox versus DATcre–;NR1 flox/flox mice.
To provide evidence that this difference in instrumental responding reflected differences in associative behavior, similar analyses were performed on number of active (reinforced) lever (Figure 2B) and inactive (nonreinforced) lever (Figure 2C) presses. A DATcre × NR1 × day interaction for active lever presses (Wald χ² = 8.27, p = 0.004, j) was decomposed (within DATcre+, Wald χ² = 18.21, p = 0.00001; within DATcre–, Wald χ² = 0.19, p = 0.664, k) to reveal that fewer active lever presses were made by DATcre+;NR1 flox/flox mice relative to DATcre+;NR1 flox/wt mice, again on days 3–6 (Day 3: Wald χ² = 9.36, p = 0.022; Day 4: χ² = 14.88, p = 0.001; Day 5: χ² = 18.45, p = 0.0002; Day 6: χ² = 9.72, p = 0.018, l). Fewer active lever presses were also made by DATcre+;NR1 flox/flox mice relative to DATcre–;NR1 flox/flox mice on day 5 (Wald χ² = 11.55, p = 0.007), with near-significant differences on day 6 (Wald χ² = 7.72, p = 0.054, m). On the other hand, no interactions with genotypes were found for inactive lever pressing (DATcre × NR1: Wald χ² = 0.35, p = 0.065; DATcre × NR1 × day: Wald χ² = 0.60, p = 0.439, n), indicating that the impairment in instrumental behavior observed in animals lacking NMDA receptors in dopamine neurons was selective to the active lever.
Sign tracking/goal tracking
The acquisition of both sign-tracking and goal-tracking conditional responses is depicted in Figure 3, using the dependent measures described in Flagel et al. (2011). Because we present quantitative measures of both goal tracking and sign tracking from the same subjects (rather than segregating subjects as expressing one response or the other; see Figure 5), the slope of goal-tracking learning curves appears modest; discrimination ratios, however, indicate clear evidence of learning. Goal tracking tended to be expressed first (likely due to the fact that we conducted magazine training prior to pavlovian conditioning), as can occur in rats (Meyer et al., 2012). In a subset of animals, goal-tracking is then diminished as it undergoes response competition during the emergence of sign-tracking behaviors. Importantly, in this subpopulation, we detected both reliable and vigorous sign-tracking behavior.
Genetic deletion of NMDAR in dopamine neurons is without effect on sign-tracking or goal-tracking responses during pavlovian approach learning. Mice with two floxed NR1 alleles (knock-outs) engage in goal-tracking and sign-tracking behaviors at levels similar to heterozygote controls, as measured by probability of a single magazine entry (left) or lever contact (right) during lever-CS presentation (A) and number of magazine head entries (left) and lever contacts (right) during lever-CS presentation (B). For head entries (left), the ratio between responding during the CS and pre-CS (the latter an equivalent duration preceding period; see Materials and Methods), a measure of discriminative approach behavior, is plotted on the y-axis (right). Genotype also did not affect latency to enter the magazine (left) or contact the lever-CS (right) upon its extension (C).
Analyses comparing goal-tracking behavior (Figure 3A–C, left) of DATcre+;NR1 flox/flox mice and DATcre+;NR1 flox/wt mice revealed no effects of genotype on the probability of making a magazine head entry during the lever-CS (genotype: Wald χ² = 0.29, p = 0.592; genotype × day: Wald χ² = 1.22, p = 0.268, o); moreover, there were no effects of genotype on number of magazine entries during the lever-CS (genotype: Wald χ² = 0.74, p = 0.389; genotype × day: Wald χ² = 0.66, p = 0.418, p). We also analyzed the discrimination ratio between CS and pre-CS period responding, again finding no genotype effects (plotted on right-hand y-axis of Figure 3B; genotype: Wald χ² = 0.01, p = 0.913; genotype × day: Wald χ² = 0.76, p = 0.384, q). Though a significant day × genotype effect on latency to enter the magazine upon lever-CS onset was found (Wald χ² = 3.90, p = 0.048, r), post hoc tests did not reveal any significant differences between groups on any of the 15 d of training (all ps > 0.076 uncorrected for multiple comparisons; all ps = 1.000 Bonferroni corrected, s). Thus, with the exception of a marginal omnibus test suggesting deviations in response latency, these analyses do not support altered discriminated goal approach after loss of NMDA receptors in dopamine neurons.
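The Bonferroni-corrected values of 1.000 follow directly from the arithmetic of the correction. As a minimal sketch, assuming the standard Bonferroni adjustment (each p value multiplied by the number of comparisons and capped at 1) and using a hypothetical helper name, the smallest uncorrected bound reported above (0.076) across 15 daily comparisons already exceeds 1 before capping:

```python
def bonferroni(p_values):
    """Standard Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative input only: 15 daily comparisons, each set to the reported
# lower bound on the uncorrected p values (0.076). 0.076 * 15 = 1.14, so
# every corrected value is capped at 1.0.
corrected = bonferroni([0.076] * 15)
print(corrected[0])
```

Because every uncorrected p value exceeded 0.076, every corrected value is capped at 1, which is why a single corrected figure is reported for all 15 days.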
Surprisingly, analyses of the development of corresponding lever-CS approach/sign-tracking behaviors (Figure 3A–C, right) also did not reveal evidence of genotype effects: DATcre+;NR1 flox/flox mice and DATcre+;NR1 flox/wt mice approached and actuated the lever-CS with similar probabilities as conditioning sessions progressed (genotype: Wald χ² = 1.48, p = 0.223; genotype × day: Wald χ² = 0.001, p = 0.969, t), making similar numbers of lever contacts (genotype: Wald χ² = 1.49, p = 0.222; genotype × day: Wald χ² = 1.01, p = 0.314, u) and doing so with latencies that did not differ (genotype: Wald χ² = 0.35, p = 0.553; genotype × day: Wald χ² = 0.03, p = 0.874, v).
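For readers who wish to relate a Wald statistic to its p value, the conversion can be sketched as follows. This assumes each Wald statistic is referred to a chi-square distribution with 1 degree of freedom; the degrees of freedom are not stated in this section, so that is an assumption, and the function name is hypothetical. For 1 degree of freedom the upper-tail probability has the closed form erfc(√(W/2)), so only the standard library is needed.

```python
import math


def wald_p_value(wald_stat):
    """Upper-tail p value for a Wald statistic, ASSUMING chi-square with 1 df."""
    if wald_stat < 0:
        raise ValueError("Wald statistic must be non-negative")
    # Survival function of chi-square(1): P(X > w) = erfc(sqrt(w / 2))
    return math.erfc(math.sqrt(wald_stat / 2.0))


# The day x genotype latency effect reported above (Wald = 3.90):
print(round(wald_p_value(3.90), 3))  # 0.048
```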
We also compared NR1 flox/wt animals that were either DATcre+ or DATcre− to establish whether Cre-mediated deletion of a single NR1 allele was sufficient to alter goal-tracking or sign-tracking responses. Analyses of these two groups indicated that goal-tracking behaviors were not significantly different (probability of head entry during lever-CS, genotype: Wald χ² = 0.04, p = 0.839; genotype × day: Wald χ² = 0.49, p = 0.484, w; number of head entries, genotype: Wald χ² = 0.04, p = 0.850; genotype × day: Wald χ² = 0.25, p = 0.616, x; discrimination ratio, genotype: Wald χ² = 0.73, p = 0.393; genotype × day: Wald χ² = 0.04, p = 0.842, y; magazine entry latency, genotype: Wald χ² = 0.07, p = 0.784; genotype × day: Wald χ² = 0.25, p = 0.833, z). DATcre−;NR1 flox/wt and DATcre+;NR1 flox/wt mice differed in their probability of actuating the lever during a lever-CS (genotype: Wald χ² = 5.91, p = 0.015; genotype × day: Wald χ² = 3.77, p = 0.052, aa). Nevertheless, like DATcre+;NR1 flox/wt mice, when DATcre−;NR1 flox/wt animals were compared with DATcre+;NR1 flox/flox knock-out mice lacking both NR1 alleles in dopamine neurons, no differences were found (genotype: Wald χ² = 0.10, p = 0.755; genotype × day: Wald χ² = 1.83, p = 0.176, bb). DATcre−;NR1 flox/wt and DATcre+;NR1 flox/wt mice expressed other sign-tracking measures at similar rates (number of lever contacts, genotype: Wald χ² = 1.67, p = 0.196; genotype × day: Wald χ² = 0.26, p = 0.610, cc; lever contact latency, genotype: Wald χ² = 0.23, p = 0.629; genotype × day: Wald χ² = 0.08, p = 0.778, dd). Thus, the behavior of DATcre−;NR1 flox/wt and DATcre+;NR1 flox/wt mice was generally equivalent, supporting the use of DATcre+;NR1 flox/wt mice as controls for DATcre+;NR1 flox/flox knock-outs in the main comparisons described above.
Because sign-tracking responses tend to come at the expense of goal-tracking responses, and vice versa, we calculated relative response bias scores on the basis of probabilities, responses, and latencies to respond to the lever-CS versus the food magazine (see Materials and Methods). Individual scores, averaged across 3 d blocks, demonstrated significant pairwise correlations that increased in magnitude across training (Days 1–3: probability vs response, Spearman’s ρ = 0.367, p = 0.003; probability vs latency, Spearman’s ρ = 0.934, p < 0.001; latency vs response, Spearman’s ρ = 0.328, p = 0.009; Days 13–15: probability vs response, Spearman’s ρ = 0.720, p < 0.001; probability vs latency, Spearman’s ρ = 0.969, p < 0.001; latency vs response, Spearman’s ρ = 0.698, p < 0.001, ee). The three scores were then averaged to form a summary bias score, as described previously (Meyer et al., 2012). Plotted in Figure 4, summary bias scores above zero indicate a tendency to sign track rather than goal track, and negative values correspond to a bias toward goal tracking. At the start of training, summary bias scores were similar in both genotypes (Wilcoxon rank sum, Days 1–3: z = 0.007, p = 0.994), and no genotype differences were found by the conclusion of testing (Wilcoxon rank sum, Days 13–15, z = 1.650, p = 0.099, ff).
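The construction of a summary bias score can be sketched as below. The exact component formulas are given in Materials and Methods and are not reproduced in this section, so the forms used here are assumptions that follow the general shape of the index described by Meyer et al. (2012): a normalized difference in response counts, a difference in response probabilities, and a latency difference scaled by CS duration. All function and parameter names are hypothetical.

```python
from statistics import mean


def summary_bias(lever_contacts, magazine_entries,
                 p_lever, p_magazine,
                 lever_latency, magazine_latency, cs_duration):
    """Average of three relative bias scores, each ranging from -1 to +1.

    Positive values indicate a bias toward the lever-CS (sign tracking);
    negative values indicate a bias toward the magazine (goal tracking).
    Component formulas are ASSUMED, not taken from this paper's Methods.
    """
    total = lever_contacts + magazine_entries
    response_bias = (lever_contacts - magazine_entries) / total if total else 0.0
    probability_bias = p_lever - p_magazine
    # Shorter lever latency relative to magazine latency gives a positive score.
    latency_bias = (magazine_latency - lever_latency) / cs_duration
    return mean([response_bias, probability_bias, latency_bias])


def designate(final_block_scores):
    """Label a mouse from its summary scores over the final training block."""
    # A mean of exactly zero is labeled goal tracker here (arbitrary choice).
    return "sign tracker" if mean(final_block_scores) > 0 else "goal tracker"


# Made-up values for one session: responding is biased toward the lever.
print(summary_bias(30, 10, 0.8, 0.4, 2.0, 6.0, 10.0))
print(designate([0.2, 0.5, 0.1]))
```

Averaging three bounded scores keeps the summary on the same −1 to +1 scale, so the sign of the final-block average can serve directly as the sign-tracker versus goal-tracker criterion.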
Distributions of conditional approach summary bias. Summary bias scores, formed from relative probability, response, and latency data for sign-tracking and goal-tracking responses for individual mice (see Materials and Methods), are plotted in 3 training day bins. Positive values indicate a tendency to sign track and negative values indicate a tendency to goal track. Goal tracking is dominant early in training, but sign tracking emerges progressively across successive days; however, no significant differences in score distributions were found between genotypes. Closed box-plots, DATcre+;NR1 flox/wt (partial loss control, “flox/wt”); open box-plots, DATcre+;NR1 flox/flox (knock-out, “flox/flox”).
We designated mice as sign trackers or goal trackers according to whether their final 3 d average summary bias scores were positive or negative. A bias toward the sign-tracking conditional response, under this scheme of categorization, occurred in fewer mice than did goal tracking (n = 11 vs n = 52; n = 5 additional mice were found to make sign-tracking responses, but in magnitudes that did not exceed their goal-tracking behaviors). Relative rates of phenotype designations did not differ between DATcre+;NR1 flox/flox mice and DATcre+;NR1 flox/wt mice (7 of 31 and 4 of 32, respectively, designated sign trackers; Fisher’s exact test, p = 0.337, gg). Reasoning that differences in the rate of learning might be detected more sensitively by examining only each group's own conditional response, we assessed the acquisition of goal-tracking behaviors in goal trackers exclusively and the acquisition of sign-tracking behaviors in sign trackers exclusively, as shown in Figure 5A–C (left and right, respectively). Though goal-tracking behavior appears similar to that expressed by the sample as a whole, visual inspection of Figure 5 suggests that among sign trackers, DATcre+;NR1 flox/flox mice express a greater degree of sign-tracking behavior, the opposite of the hypothesized effect. However, because comparisons of summary bias score distributions did not reach traditional levels of statistical significance, no additional exploratory statistical evaluations of these data were conducted.
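The comparison of phenotype rates can be checked from the reported counts alone. The following is a minimal sketch, assuming a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one; the function name is hypothetical and this is not the authors' analysis code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k col-1 cases in row 1, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: flox/flox (7 sign trackers, 24 others); flox/wt (4 sign trackers, 28 others).
p = fisher_exact_two_sided(7, 24, 4, 28)
print(round(p, 3))  # approximately 0.337, in line with the reported value
```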
Behavior plotted according to conditional response designation. Animals with a positive summary bias score for days 13–15 were designated sign trackers; those with negative scores were designated goal trackers. As in Figure 3, probability of a single magazine entry (left) or lever contact (right) during lever-CS presentation (A); number of magazine head entries (left) and lever contacts (right) during lever-CS presentation, with the ratio between CS and pre-CS responding plotted for head entries on the right-hand y-axis (B); and latency to enter the magazine (left) or contact the lever-CS (right) upon its extension (C) are measured. Sign-tracking DATcre+;NR1 flox/flox mice appear to display a greater degree of sign-tracking behaviors than controls.
Finally, the ability of the lever-CS to support new learning via conditioned reinforcement, a phenomenon elevated in sign-tracking animals (Robinson and Flagel, 2009; Flagel et al., 2011; Lomanowska et al., 2011) and considered reflective of incentive motivational properties acquired by cues (Berridge, 2000; Flagel et al., 2009), was evaluated. Mice were allowed to make nose-poke responses to elicit brief presentations of the lever-CS during a single session that followed the last day of conditioning. As shown in Figure 6, the number of lever-CS presentations earned by DATcre+;NR1 flox/wt mice and DATcre+;NR1 flox/flox mice did not significantly differ (t(61) = 0.526, p = 0.601), nor did the number of active aperture (t(61) = 0.559, p = 0.551) or inactive aperture (t(61) = 0.553, p = 0.581, hh) responses. However, as occurs in rats, mice designated sign trackers exhibited higher levels of conditioned reinforcement than mice designated goal trackers, earning more lever-CS presentations (Wilcoxon rank sum, z = −2.608, p = 0.009, ii).
Test of conditioned reinforcement. Mice were allowed to earn brief presentations of the lever-CS by performing a novel instrumental response. No differences between genotypes were found in the number of lever-CSs earned or in responses to the active or inactive nose-poke apertures.
Discussion
Here, the effects of genetic excision of the NMDAR from dopamine neurons on associative reward-related learning were evaluated. Mice lacking NMDAR in dopamine neurons exhibited impaired instrumental learning but normal sign-tracking and goal-tracking responses in a pavlovian conditioning procedure. These results are presented against the backdrop of normal exploratory locomotion and palatable food consumption, eliminating these ancillary phenotypes as likely explanations for the observed learning effects.
NMDAR activity in dopamine neurons contributes to acquisition of an appetitive instrumental response
Loss of NMDAR in dopamine neurons resulted in slower acquisition of instrumental responding; this finding is in general agreement with results gathered earlier using a similar mouse model (Zweifel et al., 2009). Qualitative aspects of the particular pattern of results offer indications of the nature of the behavioral deficit observed: the absence of group differences during the first and final days of training suggests that genotype did not affect baseline lever-pressing rates per se, and the similar asymptotic rates of pressing at the end of training indicate that motivation to obtain the food reward may not be sensitive to genotype. The impairments in a spontaneously acquired instrumental response were observed only during intermediate stages of the learning process, suggesting that phenotypic differences in mice lacking NMDAR in dopamine neurons relate to altered learning capabilities.
This result suggests a causal role for NMDA-mediated neurotransmission in dopamine neurons in instrumental learning. One possibility is that the loss of NMDA receptors disables one mechanism that contributes to phasic, stimulus-related dopamine neuron firing (Suaud-Chagny et al., 1992; Zweifel et al., 2009; Parker et al., 2010). Additionally, loss of the NMDAR eliminates NMDAR-mediated synaptic plasticity within dopamine neurons (Engblom et al., 2008; Zweifel et al., 2008; Luo et al., 2010). Thus, while the behavioral effect observed may well relate to altered phasic release, it is possible that other mechanisms are at play, including loss of synaptic plasticity between glutamatergic inputs and dopamine neurons or other downstream molecular changes. Because we did not measure NMDAR expression or monitor dopamine activity in the context of behavior, it is difficult to disentangle these different interpretations, and further experiments are needed to parse these possibilities.
Our data are, however, consistent with a number of optogenetic studies wherein response-contingent optical activation of dopamine neurons either facilitated an appetitive instrumental response or was sufficient to support responding alone (Adamantidis et al., 2011; Witten et al., 2011; Kim et al., 2012). Moreover, after asymptotic learning about a CS that was predictive of periods of response-contingent reward availability (i.e., a discriminative stimulus), transient optical activation of dopamine neurons delivered concurrently with presentation of a compound CS prevented the normally observed blocking effect (Steinberg et al., 2013). Further, the behavioral impact of unexpected negative shifts in outcome value was also diminished by activation of dopamine neurons. These findings are consistent with phasic dopamine acting as a prediction error signal that causally drives learning. The optogenetic studies are convincing, and thus we argue that the current data are consistent with a hypothesized role for phasic dopamine activity in reward learning. That said, it remains unclear how light-evoked events interact with ongoing endogenous phasic events and tonic activity states, and whether they reproduce the postsynaptic effects of normal stimulus-elicited phasic events. Moreover, experimenter-prescribed stimulation timing is likely unable to precisely mimic ongoing changes in temporal relationships between the onset of phasic dopamine bursts and environmental events, for example, as a stimulus-outcome relationship is learned and phasic signals shift from the time of reward delivery to cue onset (Ljungberg et al., 1992; Day et al., 2007). Here, we demonstrate that instrumental learning is modulated by NMDA-mediated activity in dopamine neurons and by putative attenuation of phasic dopamine signals that were endogenously generated by afferent inputs to dopamine neurons in response to environmental stimuli.
Loss of NMDARs in dopamine neurons does not impact frequency or acquisition of a sign-tracker conditional response
Sign-tracking rats—those that approach and interact with a predictive cue (e.g., the extension of a lever-CS) during an autoshaping task, often at the expense of approaching the location of reward delivery (Williams and Williams, 1969)—are thought of as exhibiting a form of incentive salience attribution over and above the pavlovian contingency learning exhibited by goal-tracking rats. These differential conditional responses offer an opportunity to distinguish between the prediction error and incentive salience attribution perspectives of dopamine. Flagel et al. (2011) provided causal evidence that dopamine receptor activity is required for sign tracking, observing a deficit in acquisition of sign tracking, but not goal tracking, after treatment with a dopamine receptor antagonist. Importantly, sign-tracking animals also display more prominent CS-evoked dopamine as they learn their conditional response than do goal trackers (Flagel et al., 2011). Because goal trackers and sign trackers must both learn the contingency between the CS and the US to express their responses, it has been suggested that the phasic dopamine release patterns do not simply teach contingency learning. Phasic dopamine release is argued, in this case, instead to be necessary for a cue to acquire incentive properties, progressively increasing its motivational pull on behavior (and, correspondingly, progressively increasing sign-tracking conditional responses) as CS-US pairings continue (Flagel et al., 2011).
Because NMDAR loss in dopamine neurons results in attenuation of the magnitude of phasic dopamine release to ∼30% that of controls (Zweifel et al., 2009; Parker et al., 2010), this genetic model applied to the sign-tracker/goal-tracker paradigm offered an experimental design equipped to distinguish between the prediction error and incentive salience perspectives. If the relationship between the magnitude of CS-evoked dopamine and sign tracking is causal, we hypothesized that a putative reduction in the amplitude of NMDA-mediated dopamine release should reduce the frequency of sign-tracking behavior or the rate of its acquisition. We found no evidence to support this hypothesis: for all dependent measures, no differences in the form of conditional responses expressed by mice lacking NMDAR in dopamine neurons and control mice were detected.
In addition to failing to support the incentive salience perspective of dopamine activity in reward learning, our experiments also did not yield evidence of a contribution of NMDA-mediated dopamine activity and/or phasic release to pavlovian goal approach, nor have several other studies using a similar mouse genetics approach (Parker et al., 2010, 2011). Given that prediction error signals in the mesencephalon have been observed across a wide variety of task conditions and parameters, but have been most extensively characterized within the context of appetitive pavlovian conditioning (Ljungberg et al., 1992; Schultz et al., 1993, 1997; Waelti et al., 2001; Fiorillo et al., 2003), and because pharmacological studies have shown that pavlovian approach depends upon NMDAR activity in the ventral tegmental area (Stuber et al., 2008; Ranaldi et al., 2011), this is a surprising result.
One possibility is that the reported residual 30% phasic signal in the DATcre;NR1 mouse may be sufficient to support pavlovian approach learning. Given that reward preference and reward learning are possible even after massive dopamine depletions (Cannon and Palmiter, 2003; Robinson et al., 2005), this residual phasic activity may indeed provide a signal-to-noise ratio more than adequate for pavlovian delay conditioning, especially when associative contingencies are binary and deterministic (i.e., P(US|CS) = 1, P(US|∼CS) = 0). The magnitude of midbrain neuron burst responses encodes the relative value of predictive stimuli (Fiorillo et al., 2003); perhaps a behavioral impairment would be revealed in a scenario where 30% of the normal signal-to-noise in dopamine neurons provides insufficient dynamic range (e.g., discriminating between two stimuli with marginal differences in predictive value). Given, however, that sign-tracker rats are distinguished from goal-tracking rats by a quantitative difference in CS-evoked dopamine (Flagel et al., 2011), if this difference causally influenced the form of conditional response expressed, we would still expect a measurable difference in the degree of sign-tracking behavior in DATcre+;NR1 flox/flox mice that have a dramatic, albeit not full, diminution of phasic dopamine release. There was no indication of this in our data. Alternatively, it is possible that a loss of NMDAR-mediated synaptic plasticity or other NMDAR-dependent physiological mechanisms in mice lacking NMDAR in dopamine neurons obscured the observation of a behavioral difference in sign tracking. Additionally, it is possible that these behaviors may be supported by dopaminergic projections to the basolateral amygdala or prefrontal cortex, as these cells express very little DAT (Lammel et al., 2008); therefore, NR1 recombination may not have fully occurred. However, a study using PCR to detect recombination of NR1 in the SN/VTA in the same mouse model found successful recombination of NR1 in 34 of 36 such cells (Luo et al., 2010). Thus, Cre recombinase expression appears to be sufficient to drive excision of NR1 in the majority of dopaminergic neuronal populations, even those expressing very low levels of DAT.
The role of NMDAR in dopamine neurons in reward-related behaviors
What is clear from these experiments is that NMDAR-mediated activity in dopamine neurons is not required to adaptively respond in the pavlovian approach paradigm. Interestingly, in addition to the pavlovian approach, other phenotypes that were historically thought to require NMDAR in dopamine neurons, such as sensitization to psychostimulants (Kalivas and Alesdatter, 1993; Wolf et al., 1994, 1998; Vanderschuren and Kalivas, 2000), have also turned out to be unaffected in their absence (Zweifel et al., 2008; Luo et al., 2010; Beutler et al., 2011). Given that the degree of NMDAR-dependent plasticity in dopamine neurons—expressed as increased AMPA receptor expression or current—induced by drugs of abuse correlates with degree of behavioral sensitization observed (Ungless et al., 2001; Borgland et al., 2004) and that NMDAR-dependent plasticity is observed selectively during periods of active learning of pavlovian conditioning (Stuber et al., 2008), these results are especially unanticipated. However, several studies have implicated NMDAR in non-dopaminergic cell types or brain regions as responsible for these phenomena (Luo et al., 2010; Beutler et al., 2011; Parker et al., 2011).
Because they co-occur and share dopaminergic substrates, locomotor sensitization to psychostimulants has been linked with heightened or sensitized incentive salience attribution (Robinson and Berridge, 1993; Wyvell and Berridge, 2001; Tindell et al., 2005; Olausson et al., 2006; Ostlund et al., 2014), including sign-tracking behavior (Doremus-Fitzwater and Spear, 2011). Supporting this link, we observed that elimination of NMDA receptors on dopamine neurons did not affect pavlovian sign tracking, and previous studies using similar models have also found locomotor sensitization is not dependent upon NMDARs in dopamine neurons (Engblom et al., 2008; Zweifel et al., 2008; Luo et al., 2010; Beutler et al., 2011). Thus, while a host of neural mechanisms likely influence the development of a sign-tracking conditional response (Flagel et al., 2007, 2010; Lomanowska et al., 2011; Fitzpatrick et al., 2013; Perez-Sepulveda et al., 2013; Haight and Flagel, 2014), our data indicate that NMDAR activity in dopamine neurons—along with its contribution to phasic dopamine release—is not among these factors.
Previous work has demonstrated persistent elevations in synaptic AMPA/NMDA ratios in dopamine neurons of animals self-administering cocaine, while AMPA/NMDA ratios were only transiently elevated in animals responding for food (Chen et al., 2008). NMDAR dynamics are therefore susceptible to modulation by rewarding experiences and reward modality. Thus, our observation that the acquisition and performance of pavlovian conditional responses were not different in mice lacking NMDAR in dopamine neurons may depend on the specific experimental conditions used here. We note, however, that enhanced AMPA/NMDA ratios have been observed during pavlovian conditioning for food (Stuber et al., 2008), and in a similar mouse model of loss of NMDARs in dopamine neurons, cue-based learning was impaired (Zweifel et al., 2009). Conversely, the acquisition of a pavlovian conditioned place preference for cocaine is unaffected in knock-out mice (Engblom et al., 2008; Luo et al., 2010; but see Zweifel et al., 2008). These data do not support the simple idea that NMDAR are involved in pavlovian responses to drugs but not food.
Sign tracking in mice
Sign-tracking behavior comparable to that observed toward a lever-CS in rats has been difficult to reproduce in C57BL/6J mice: mice either show no lever-CS-directed behavior (Zweifel et al., 2009; Parker et al., 2011) or only demonstrate conditional locomotion in the vicinity of the lever-CS (Gore and Zweifel, 2013). Sign tracking in the form of full lever actuations has not been reported previously in mice.
Of interest was the considerable individual variation in whether a sign-tracking or goal-tracking conditional response emerged. Mice generally began with goal tracking (presumably because of previous magazine training), but 20–25% then developed overt sign-tracking conditional responses, some without any appreciable accompanying goal tracking (see summary bias score distribution, Figure 4), ultimately pressing the lever several hundred times per session; others continued goal tracking, and others performed both behaviors. Given that the mice studied here are, at least within a genotype group, isogenic, this variation in response type suggests considerable influence of (unmeasured) environmental or other nonheritable genetic factors, as has been observed in rats (Lomanowska et al., 2011).
While the sign-tracking conditional responses measured here appeared to be less common than in published data on rat behavior, and their onset is likely delayed relative to rats as well, video observation of testing chambers indicated that the sign-tracking phenotype is very much present in mice: those that engaged in this behavior did it consistently and vigorously, engaging in the same rapid biting, gnawing, and invigorated approach and contact with the lever-CS reported in rats (Zener, 1937; Jenkins and Moore, 1973; Boakes, 1977; Tomie, 1996; Flagel et al., 2010). Behavior was often observed to be intensely focused toward the lever-CS, expressed as stereotypic sniffing and various other interactions, which did not necessarily result in a lever actuation; consequently, it is likely that mice sign track more often than we or others have reported. In addition, as has been repeatedly demonstrated in the rat (Robinson and Flagel, 2009; Flagel et al., 2011; Lomanowska et al., 2011), responding for conditioned reinforcement was higher in mice that sign tracked than in mice that goal tracked, indicating that similar phenotypic covariations exist across both species.
Limitations
Although previous studies have done so (Engblom et al., 2008; Zweifel et al., 2008, 2009, 2011; Luo et al., 2010; Parker et al., 2010), we did not demonstrate recombination of NR1 in dopamine neurons or measure phasic dopamine release here; consequently, some caution must be taken in the interpretation of the present findings. Nevertheless, the finding of a DATcre × NR1 interaction for instrumental learning in the present study (i.e., both Cre recombinase and two floxed NR1 alleles were required to observe impairment) demonstrates that the model system functions as expected, at least in the context of instrumental reward learning.
Critically, as mentioned previously, the conditional inactivation of NR1 blocks NMDAR currents, which reduces phasic firing, but it also eliminates NMDAR-mediated synaptic plasticity (Engblom et al., 2008; Zweifel et al., 2008; Luo et al., 2010). This presents considerable difficulties with respect to interpreting our results strictly from the perspective of phasic dopamine release. Thus, although this study is not equipped to fully rule out a role for phasic dopamine release in conditional responses during pavlovian approach, we can conclude that these responses do not rely upon NMDAR-related plasticity in dopamine neurons or NMDAR-mediated components of phasic activity, irrespective of whether they take the form of goal tracking or sign tracking. More work is needed to fully ascribe the present results to differences in phasic dopaminergic neuron activity.
Because the transgenic mouse model used here is a constitutive knock-out, developmental alterations may have influenced the observed results. ROSA26-LacZ recombination is observable in the DATcre mouse used from E15 onward, and although no changes in DAT protein or D1 or D2 mRNA levels are observed (Bäckman et al., 2006), AMPA currents appear to be upregulated in a similar mouse line lacking NMDAR in dopamine neurons (Engblom et al., 2008; Zweifel et al., 2008). Little other work has been done regarding compensatory alterations in this mouse model; therefore, this remains an interpretational limitation.
Finally, our study lacked secondary confirmation of results to increase the confidence ascribed to the null results in the sign-tracking/goal-tracking experiment (e.g., a subthreshold dose of an NMDA antagonist to mimic the 30% loss of phasic release); future studies are needed to address this limitation.
Conclusions
Here, we utilized a mouse model of compromised NMDA-dependent dopamine activity to characterize multiple components of reward-driven associative learning. Complementing the temporal precision of the optogenetics approaches, this approach allowed us to study the behavioral impact of putatively dampened endogenously generated phasic dopamine signals and loss of NMDAR-related synaptic plasticity. Our data revealed a clear role of NMDA activity in dopamine neurons in the acquisition of instrumental learning. We then tested causally, for the first time, predictions about the role of phasic dopamine in reward learning made by incentive salience accounts that contrast with those made by prediction error accounts. Though dopamine voltammetry data indicated a relationship between elevated CS-evoked dopamine activity and sign-tracking behavior (Flagel et al., 2011), our results, though not without notable interpretational limitations, lend no support to the conclusion of a causal relationship: the expression of conditional responses, regardless of whether they took the form of goal tracking or sign tracking, was unaffected in a model of eliminated NMDAR activity and putatively diminished NMDA-dependent phasic dopamine release. Thus, conditional responses associated with incentive salience attribution may not be under direct influence of the magnitude of NMDAR-regulated stimulus-evoked phasic release.
Therefore, our results are not fully consistent with the incentive salience perspective of phasic dopamine. They also are not necessarily uniformly consistent with a prediction error account of dopamine because putatively diminished phasic dopamine release did not affect pavlovian approach learning in any measured outcome. Thus, our results may be more congruent with a multifaceted conceptualization of dopaminergic transmission, wherein the behavioral significance of phasic dopamine could shift adaptively between prediction error, incentive salience attribution, and other forms of behavioral invigoration and flexibility, or combinations thereof, depending on the particular configuration of biological demands, internal goal states, and motivators present in the environment. This view is consistent with traditional views of dopamine as a neuromodulator, interacting with and adjusting ongoing circuit activity in a manner that can give rise to a multiplicity of context-dependent behavioral phenomena.
Footnotes
↵1 The authors declare no competing financial interests.
↵2 A.S.J. and J.D.J. designed research; A.S.J., Z.T.P., and P.T. performed research; A.S.J. analyzed data; A.S.J., Z.T.P., and J.D.J. wrote the manuscript.
↵3 This work was funded by Public Health Service grants F31-DA27309, T32-DA024635, and PL1-NS062410 and National Institutes of Health.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.