Compromised NMDA/Glutamate Receptor Expression in Dopaminergic Neurons Impairs Instrumental Learning, But Not Pavlovian Goal Tracking or Sign Tracking

Behavior is shaped to a dramatic degree by the occurrence of rewards, through both pavlovian and instrumental conditioning processes; these mechanisms give rise to both normal and abnormal behavior. It is crucial to understand the neural mechanisms that give rise to normal actions and how they lead to pathological behaviors, such as overeating and drug addictions.

A considerable body of evidence, derived mostly from electrophysiological recordings of midbrain neurons in nonhuman primates, implicates brief event-related, high-frequency discharge activity of dopaminergic neurons, and the associated phasic, nonlinear increases in the quantity of transmitter released (Grace and Bunney, 1984; Gonon, 1988; Bean and Roth, 1991), as a neural instantiation of the "prediction error" signal that figures in both classical and modern mathematical learning models (Rescorla and Wagner, 1972; Schultz et al., 1993, 1997; Sutton and Barto, 1998; Day et al., 2007). Phasic aspects of dopamine signaling may represent the difference between predicted and actually received rewards (Schultz, 2002), information used in these models to update expectancies of the organism as it learns the contingent relationships between stimuli that predict biologically significant outcomes, and the responses that produce them.
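For reference, the prediction error in the Rescorla-Wagner model cited above is commonly written in the following textbook form. The symbols are not defined in this article and are introduced here only for illustration: $V_j$ is the associative strength of each stimulus present on a trial, $\lambda$ is the maximum associative strength supported by the outcome, and $\alpha_i$ and $\beta$ are learning-rate parameters for the stimulus and the outcome.

```latex
\delta = \lambda - \sum_{j} V_j ,
\qquad
\Delta V_i = \alpha_i \, \beta \, \delta
```

Under this formulation, learning is driven by the discrepancy $\delta$ between the received and the predicted outcome, and it stops once the outcome is fully predicted ($\delta = 0$).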
An alternate perspective regards dopaminergic transmission as the mechanism by which rewarding events and reward-predictive stimuli are imbued with incentive motivational properties, transforming them from merely pleasurable, or "liked," to "wanted" attractors of motivated behavior and attention (Crow, 1976; Robinson and Berridge, 1993; Berridge and Robinson, 1998). A variety of lines of evidence support this conclusion: elevating dopamine release, in multiple contexts, can invigorate motivation to engage in a behavior, without affecting learning of the behavior itself (Robbins, 1978; Wyvell and Berridge, 2000, 2001; Salamone et al., 2001, 2005; Peciña et al., 2003; Cagniard et al., 2006; Yin et al., 2006). Altering dopamine can alter the magnitude of established responding immediately (Berridge, 2007), indicating that dopamine can impact reward-driven behavior without an experience of a prediction error as a precondition; indeed, aspects of reward learning are possible when dopamine is nearly absent altogether (Cannon and Palmiter, 2003; Hnasko et al., 2005), suggesting that dopamine might function to instruct motivational value, rather than associative contingencies.
The prediction error and the incentive salience perspectives are often both supported by the results of experimental manipulations of dopamine transmission. For example, optogenetic simulation of dopamine neuron burst firing acts as an unconditioned stimulus that reinforces instrumental and pavlovian behaviors (Tsai et al., 2009). While this establishes a causal role for phasic dopaminergic activity in reward-related learning, whether it conveys a prediction error signal that teaches contingencies or whether it instructs the incentive motivation to engage in these behaviors cannot readily be distinguished. However, studies of individual differences in the nature of behaviors expressed during autoshaping may offer a unique paradigm better suited for distinguishing these theories. Specifically, contingency learning via prediction error signals, expressed as a pavlovian approach to a reward-delivery location (goal tracking), can be differentiated from contingency learning that additionally involves incentive salience attribution to reward-predictive cues (sign tracking; Robinson and Flagel, 2009). Recent evidence suggests that the magnitude of cue-evoked, phasic dopamine release positively relates to incentive salience attribution: sign-tracking rats exhibited greater conditional stimulus (CS)-elicited dopamine transients than goal trackers.
Because NMDA/glutamate receptors (NMDARs) localized within midbrain dopaminergic neurons regulate dopamine transmission, including through influences on burst firing activity (Suaud-Chagny et al., 1992), phasic dopamine release is attenuated in a mouse model lacking NMDAR in dopamine neurons (Zweifel et al., 2009; Luo et al., 2010). One application of this system, therefore, is to evaluate the effects of quantitative reductions of afferent input-generated phasic dopamine signaling on behavior. Here, we assessed instrumental learning (which involves both prediction error and incentive salience attribution) in mice lacking NMDAR in dopamine neurons, and then studied sign-tracking/goal-tracking behavior to test the idea that NMDA-dependent aspects of dopamine signaling are causally related to propensity for incentive salience attribution.

Mouse lines
B6.SJL-Slc6a3tm1.1(cre)Bkmn/J (stock #006660; http://jaxmice.jax.org/strain/006660.html; referred to here as DATcre+) mice, each heterozygous for a mutated dopamine transporter (DAT) gene expressing Cre recombinase, and B6.129S4-Grin1tm2Stl/J (stock #005246; http://jaxmice.jax.org/strain/005246.html; referred to here as NR1 flox/flox) mice were purchased from The Jackson Laboratory. In DATcre+ mice, Cre recombinase cDNA was inserted into the 3' untranslated region of the DAT gene for bicistronic mRNA translation; Cre-mediated recombination is detectable in this line as early as E15 and is primarily restricted to the substantia nigra, ventral tegmental area, and retrorubral field within the midbrain (Bäckman et al., 2006). NR1 flox/flox mice have a loxP site between exons 11 and 12 and another loxP site, along with a neomycin resistance gene, at the 3' end of the Grin1 gene. The NR1 subunit is an obligatory component of the functional NMDAR (Forrest et al., 1994), which regulates NMDAR-mediated plasticity and also dopamine cell burst firing, the latter by facilitating temporal summation of excitatory inputs (Suaud-Chagny et al., 1992; Overton and Clark, 1997). Conditional deletion of NR1 expression blocks NMDAR activity, reducing the magnitude of phasic dopamine release events to ~30% of control levels (Zweifel et al., 2009; Parker et al., 2010).
Mice were between 60 and 120 d old when involved in this study. All subjects were socially housed in cages of two to four individuals with Sani-Chip cage bedding (PJ Murphy Forest Products) in a temperature- and humidity-controlled room on a 14/10 h light/dark cycle. Behavioral testing was conducted during the light cycle. Food was available ad libitum during locomotor behavior and free-reward consumption testing, but was restricted during other experiments, as detailed below. All animal procedures were performed according to the regulations of the university animal care committee for each author.

LacZ X-Gal staining
DATcre+ mice also expressing the ROSA26-LacZ gene were killed by isoflurane overdose, then transcardially perfused with freshly mixed, cold 4% paraformaldehyde. Brains were stored in paraformaldehyde for 1 d before being switched to a 30% sucrose/PBS solution. Slices of 40 μm width were cut on a cryostat and rinsed in PBS. The staining solution contained 85.33 mg potassium ferrocyanide, 64 mg potassium ferricyanide, 4 ml of 20 mM MgCl2, 36 ml PBS, 60 mg X-gal, and 800 μl dimethylformamide. The solution was allowed to react with brain slices at 37°C for 48 h; the slices were then rinsed, counterstained, and mounted on slides.

Free consumption of a palatable food
Subsequently, the same sample of 165 mice used in the locomotor experiment underwent habituation to a two-bottle, free-choice palatable food consumption procedure over the course of 2 d. In 2 h sessions of individual housing, mice had access to two Lixit tube-equipped water bottles, one filled with water and the other filled with a 10% v/v sweetened condensed milk solution (Kroger). Bottle positions (i.e., left side of the cage vs right side, order counterbalanced across genotypes) were switched on the second day of habituation. Testing began the following day: bottle positions were again switched, and data were collected for 2 d; a final switch, followed by 2 d of data collection, concluded the procedure. Data presented are averages of consumption levels on the second day of placement on each side.

Instrumental conditioning
An experimentally naive set of 112 mice (males only, DATcre-;NR1 flox/wt, n = 26; DATcre-;NR1 flox/flox, n = 27; DATcre+;NR1 flox/wt, n = 22; DATcre+;NR1 flox/flox, n = 26; reflects data exclusion from 11 mice due to technical failures with the operant chambers, e.g., pellet dispenser or lever failures) were introduced to limited access to chow in their home cages in order to achieve body weights ~85% of free-feeding levels. Mice were exposed to 0.5 g of the reinforcer pellets (14 mg Dustless Precision Pellets, used in subsequent behavioral experiments; Bio-Serv) in their home cages during the first day of food restriction. Body weight was maintained at this level throughout the experiment, and standard chow was provided in the home cage at least 1 h after daily testing. Mice were trained on sequential days in extra-wide aluminum and polycarbonate Med Associates modular mouse-testing chambers, each stationed inside a sound-attenuating chamber and equipped with a white noise generator, house light (both always on during all experiments), and a tone generator. A horizontal array of five illuminable nose-poke apertures formed one side of the box, and on the other resided an illuminable pellet-delivery magazine with an entry-detection photocell. Chambers also contained two retractable ultrasensitive mouse levers (2 g force requirement for actuation; Med Associates), positioned one on each side of the food magazine.
Training began with 2 d of familiarization to delivery of food pellets to the magazine. Fifty pellets were delivered to the magazine on a fixed-time 30 s schedule, each followed by a 2 s illumination of the magazine. Ten daily 30 min sessions of instrumental training followed. Sessions began with the extension of both levers, and responses on the active lever (designated left vs right in a counterbalanced fashion across genotypes) resulted in a 50 ms tone pulse, which was accompanied by pellet delivery and a 2 s illumination of the magazine light upon completion of the ratio schedule. The first 10 pellets per session were delivered on a fixed-ratio 1 schedule; subsequently, pellets were delivered on a variable-ratio 2 schedule. Responses to the inactive lever were recorded but had no programmed consequence. A 0.5 s timeout followed each pellet delivery, during which responses could not elicit delivery of another reward, but did count toward completion of the next reinforcement schedule.
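The reinforcement contingencies described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the control program used in the study. The class name `LeverSchedule`, the uniform draw of the variable-ratio requirement from {1, 2, 3} (mean 2), and the rule that a ratio completed during the timeout is paid out on the first press after the timeout are all assumptions made for the example.

```python
import random


class LeverSchedule:
    """Sketch of the active-lever contingency: FR1 for the first 10 pellets,
    then VR2, with a 0.5 s timeout after each pellet delivery."""

    def __init__(self, fr_pellets=10, vr_choices=(1, 2, 3), timeout_s=0.5, rng=None):
        self.fr_pellets = fr_pellets
        # ASSUMPTION: VR2 requirement drawn uniformly from {1, 2, 3} (mean 2).
        self.vr_choices = vr_choices
        self.timeout_s = timeout_s
        self.rng = rng or random.Random(0)
        self.pellets = 0
        self.presses_toward_ratio = 0
        self.required = 1
        self.last_pellet_time = None

    def _next_requirement(self):
        # Fixed-ratio 1 until fr_pellets have been earned, variable ratio afterwards.
        if self.pellets < self.fr_pellets:
            return 1
        return self.rng.choice(self.vr_choices)

    def press(self, t):
        """Register an active-lever press at time t (seconds).
        Returns True if this press delivers a pellet."""
        # Presses always count toward the current ratio, even during the timeout.
        self.presses_toward_ratio += 1
        in_timeout = (
            self.last_pellet_time is not None
            and t - self.last_pellet_time < self.timeout_s
        )
        # ASSUMPTION: no pellet can be delivered during the timeout itself.
        if in_timeout or self.presses_toward_ratio < self.required:
            return False
        self.pellets += 1
        self.presses_toward_ratio = 0
        self.last_pellet_time = t
        self.required = self._next_requirement()
        return True
```

For example, a press at 0.0 s earns a pellet, a press at 0.2 s falls inside the timeout and earns nothing, and a press at 0.6 s earns the next pellet.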

Sign tracking/goal tracking
Methods for sign-tracking/goal-tracking pavlovian learning were modeled after Flagel et al. (2011). In the instrumental conditioning studies (above), DATcre+;NR1 flox/wt mice were phenotypically similar to DATcre-;NR1 flox/wt and DATcre-;NR1 flox/flox control groups (Figures 1, 2), indicating that they could act as adequate controls; here, we treated them as such and compared their behavior with DATcre+;NR1 flox/flox animals. A set of 63 experimentally naive animals was used (males only, DATcre+;NR1 flox/wt, n = 32; DATcre+;NR1 flox/flox, n = 31). We also tested DATcre-;NR1 flox/wt animals (males, n = 31) to provide further empirical support for the validity of comparisons between DATcre+;NR1 flox/wt and DATcre+;NR1 flox/flox animals. The same schedule of caloric restriction described above was initiated prior to behavioral training. Animals first underwent 2 d of magazine training in which 30 food pellets were delivered to the magazine on a variable-time 60 s schedule. Fifteen daily sessions of sign-tracking/goal-tracking conditioning began the next day. These sessions consisted of 15 presentations on a variable-time 180 s schedule of a CS ("lever-CS"). Each lever-CS involved a 20 s extension of the lever to the right of the food magazine; two food pellets were delivered to the magazine coincident with lever-CS termination. Actuations of the lever-CS were recorded but had no programmed consequences.
On the day following the last conditioning session, all mice underwent a single test of conditioned reinforcement, wherein the two most lateral nose-poke apertures were illuminated. Responses to the active aperture (designated left vs right in a counterbalanced fashion across genotypes) resulted in a 5 s extension of the lever-CS, while responses to the inactive aperture were recorded but were without programmed effect. No food was delivered during this session. The session ended 60 min after the first active aperture response or after 90 min had elapsed, whichever occurred first.
All datasets were inspected for conformity to assumptions of the general linear model. Where assumptions were met, data were analyzed by univariate or repeated-measures ANOVA, with t tests where appropriate. For locomotor and learning experiments, we found significant departures from assumptions of traditional repeated-measures ANOVA, including violations of sphericity and/or heterogeneous, correlated residuals. These were not entirely unexpected, especially in our learning experiments, because correlations between testing days change as behavior progressively changes. Because population-level analysis often does not accurately characterize individual learning curves (Lashley, 1942; Estes, 1956; Gallistel et al., 2004; Verbeke and Molenberghs, 2009), generalized linear mixed models were used as a means to address these assumption violations, leading to better fits of the data by allowing subjects to vary with respect to intercepts and slopes and accommodating non-normal data distributions and nonconstant error variances/covariances. Models were fitted via maximum likelihood with cluster robust SEs using mean-variance adaptive Gauss-Hermite quadrature. Random subject-specific intercepts and/or linear slopes across days and their covariance were included on the basis of significantly improved model fit (tested via likelihood ratio testing of nested models). Distribution and link functions were chosen on the basis of properties of the variable studied and normality of the model residuals. Continuous data were analyzed using Gaussian identity-link models (i.e., linear mixed models); heavily skewed continuous data were modeled as log-normal. Log-link negative binomial models were applied to overdispersed count data and binomial logit models were applied to probability data. Statistics presented are tests of fixed effects. Wald χ² tests of main effects and interactions were followed by contrasts of simple effects and, where appropriate, Bonferroni-adjusted tests of means.
Locomotor behavior measures (number of x-axis beam breaks) were analyzed across 5 min time bins; the bin was treated as a linear covariate. Free-food consumption (ml/kg consumed) was analyzed with day of measurement as a repeated measure. Because water consumption levels were negligible, these data were not analyzed. Dopamine utilization was analyzed as the ratio of metabolite DOPAC content to dopamine content.
Table notes: GLMM, generalized linear mixed model; RI, random intercept; S, random slope (of repeated measure); UCS, unstructured covariance matrix between random effects (covariance was fixed to zero in other GLMMs). Estimates of observed (post hoc) power are for experimentally relevant interaction effects. a Estimates for main effects and interactions in GLMMs with RI and/or S, and for normally distributed data with RI and S, are not readily calculable. This is the result of the complex, nonclosed-form nature of optimizations of GLMMs with multiple random effects, which renders estimation of power not directly derivable, nor estimation via brute force, highly repeated simulation readily feasible. b Simulation assumes normal distributions. c Simulations assume (fitted) Weibull distributions.

In all learning experiments, training day was treated as a continuous covariate, initially as a quadratic effect (i.e., curvilinear regression); if no quadratic effect of day was detected, it was removed, leaving the linear effect. For instrumental learning, reinforcers earned across days were analyzed, as were active and inactive lever presses. For sign-tracking/goal-tracking data, we analyzed genotype effects on behavioral data acquired across successive sessions, mirroring the analysis in Flagel et al. (2011). Sign tracking was quantified by analyzing (1) the probability of lever contact (contacts were defined as full actuations of the lever-CS) during lever-CS presentation, (2) total number of lever contact responses, and (3) latency to contact the lever. Goal tracking was similarly measured as the (1) probability of making a head entry into the magazine during a lever-CS presentation, (2) total number of head entries during the lever-CS presentations, and (3) latency to enter the magazine upon lever-CS presentation. A "conditioning ratio" measure of discriminative responding was also formed from goal-tracking data, calculated by comparing magazine head entries during the CS to those made during a time period of equivalent duration immediately preceding the CS (the latter termed the pre-CS period): $\text{conditioning ratio} = \dfrac{\text{CS magazine entries}}{\text{CS magazine entries} + \text{pre-CS magazine entries}}$.
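As a worked illustration of the conditioning ratio defined above, the following minimal Python sketch computes it from entry counts. The function name and the handling of sessions with no magazine entries at all (returning `None`) are assumptions for the example and are not specified in the text.

```python
def conditioning_ratio(cs_entries, pre_cs_entries):
    """Ratio of magazine entries made during the CS to entries made during
    the CS plus the equal-length pre-CS period.
    0.5 means no discrimination; values near 1 mean CS-selective approach."""
    total = cs_entries + pre_cs_entries
    if total == 0:
        # ASSUMPTION: the ratio is treated as undefined when no entries occurred.
        return None
    return cs_entries / total


# Example: 15 entries during the CS and 5 during the pre-CS period.
print(conditioning_ratio(15, 5))  # 0.75
```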
We also calculated proposed conditional response "bias" measures described by Meyer et al. (2012), wherein phenotypic tendency toward sign tracking versus goal tracking is quantified by the following: (1) differences in response probabilities, $\Pr(\text{lever contact}) - \Pr(\text{magazine entry})$; (2) a discrimination index of responses, $\dfrac{\text{lever contacts} - \text{magazine entries}}{\text{lever contacts} + \text{magazine entries}}$; and (3) relative response latencies, $\dfrac{\bar{x}_{\text{magazine entry latency}} - \bar{x}_{\text{lever contact latency}}}{\text{CS duration (20 s)}}$. These three indices ranged from +1 to −1, representative of bias toward sign tracking versus goal tracking, respectively. Their correlational structure was explored, and they were then averaged to form a conditional approach "summary bias score." Summary bias scores were further averaged over three-session blocks. Distributions of summary scores at the start and end of training were analyzed using nonparametric tests. To investigate whether any genotype effects on sign tracking were obscured by analysis of all subjects' behavior simultaneously, we used the final summary bias score (from the last three sessions) to designate mice as either a sign tracker or goal tracker on the basis of whether their score was positive or negative, respectively. Genotype effects on designation distribution were analyzed with Fisher's exact test. We then plotted sign-trackers' behavior and goal-trackers' behavior separately, visualizing learning rates within each genotype/conditional response type combination.
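The three bias indices and the summary bias score described above can be sketched as follows. This is a minimal illustration with hypothetical function and argument names. Setting the response index to zero when no responses of either type occurred, and assigning a score of exactly zero to the goal-tracker category, are assumptions, since the text specifies only positive versus negative scores.

```python
CS_DURATION_S = 20.0  # lever-CS duration stated in the text


def bias_scores(p_lever, p_magazine, lever_contacts, magazine_entries,
                mean_lever_latency, mean_magazine_latency):
    """Return (probability difference, response bias, latency index, summary).
    Each index lies between -1 (goal tracking) and +1 (sign tracking)."""
    prob_diff = p_lever - p_magazine
    total = lever_contacts + magazine_entries
    # ASSUMPTION: response bias is 0 when neither response occurred.
    response_bias = (lever_contacts - magazine_entries) / total if total else 0.0
    latency_index = (mean_magazine_latency - mean_lever_latency) / CS_DURATION_S
    summary = (prob_diff + response_bias + latency_index) / 3
    return prob_diff, response_bias, latency_index, summary


def designate(summary_score):
    """Positive scores -> sign tracker; negative scores -> goal tracker.
    ASSUMPTION: a score of exactly zero is grouped with goal trackers."""
    return "sign tracker" if summary_score > 0 else "goal tracker"


# Example: a mouse that mostly contacts the lever, and does so quickly.
scores = bias_scores(0.8, 0.2, 30, 10, 5.0, 15.0)
print(designate(scores[3]))  # sign tracker
```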
Data from six subjects on day 8 were lost due to technical failure. These data points were treated as missing at random in mixed model analysis.
Measures of responding for conditioned reinforcement included number of lever-CSs earned and number of active and inactive aperture nose pokes. Because sign tracking has been associated with greater conditioned reinforcement in rats (Flagel et al., 2011; Lomanowska et al., 2011), we also compared number of lever-CSs earned by animals designated sign trackers with numbers earned by animals designated goal trackers to establish whether the same relationship exists in mice.
Figures are presented as mean ± SE line plots or as Tukey box-plots, the latter demonstrating spread about a group median with plus symbols (+) demarcating group means.

Sign tracking/goal tracking
The acquisition of both sign-tracking and goal-tracking conditional responses is depicted in Figure 3, using the dependent measures described in Flagel et al. (2011). Because we present quantitative measures of both goal tracking and sign tracking from the same subjects (rather than segregating subjects as expressing one response or the other; see Figure 5), the slope of goal-tracking learning curves appears modest; discrimination ratios, however, indicate clear evidence of learning. Goal tracking tended to be expressed first (likely because we conducted magazine training prior to pavlovian conditioning), as can occur in rats (Meyer et al., 2012). In a subset of animals, goal tracking then diminished as it underwent response competition during the emergence of sign-tracking behaviors. Importantly, in this subpopulation, we detected both reliable and vigorous sign-tracking behavior.
Analyses comparing goal-tracking behavior (Figure 3A-C, left) of DATcre+;NR1 flox/flox mice and DATcre+;NR1 flox/wt mice revealed no effects of genotype on the probability of making a magazine head entry during the lever-CS (genotype: Wald χ²(1) = 0.29, p = 0.592; genotype × day: Wald χ²(1) = 1.22, p = 0.268, o); moreover, there were no effects of genotype on number of magazine entries during the lever-CS (genotype: Wald χ²(1) = 0.74, p = 0.389; genotype × day: Wald χ²(1) = 0.66, p = 0.418, p). We also analyzed the discrimination ratio between CS and pre-CS period responding, again finding no genotype effects (plotted on right-hand y-axis of Figure 3B; genotype: Wald χ²(1) = 0.01, p = 0.913; genotype × day: Wald χ²(1) = 0.76, p = 0.384, q). Though a significant day × genotype effect on latency to enter the magazine upon lever-CS onset was found (Wald χ²(1) = 3.90, p = 0.048, r), post hoc tests did not reveal any significant differences between groups on any of the 15 d of training (all ps > 0.076 uncorrected for multiple comparisons; all ps = 1.000 Bonferroni corrected, s). Thus, with the exception of a marginal omnibus test suggesting deviations in response latency, these analyses do not support altered discriminated goal approach after loss of NMDA receptors in dopamine neurons.
Because sign-tracking responses tend to come at the expense of goal-tracking responses, and vice versa, we calculated relative response bias scores on the basis of probabilities, responses, and latencies to respond to the lever-CS versus the food magazine (see Materials and Methods). Individual scores, averaged across 3 d blocks, demonstrated significant pairwise correlations that increased in magnitude across training (Days 1-3: probability vs response, Spearman's ρ = 0.367, p = 0.003; probability vs latency, Spearman's ρ = 0.934, p < 0.001; latency vs response, Spearman's ρ = 0.328, p = 0.009; Days 13-15: probability vs response, Spearman's ρ = 0.720, p < 0.001; probability vs latency, Spearman's ρ = 0.969, p < 0.001; latency vs response, Spearman's ρ = 0.698, p < 0.001, ee). The three scores were then averaged to form a summary bias score, as described previously (Meyer et al., 2012). Plotted in Figure 4, summary bias scores above zero indicate a tendency to sign track rather than goal track, and negative values correspond to a bias toward goal tracking. At the start of training, summary bias scores were similar in both genotypes (Wilcoxon rank sum, Days 1-3: z = 0.007, p = 0.994), and no genotype differences were found by the conclusion of testing (Wilcoxon rank sum, Days 13-15, z = 1.650, p = 0.099, ff).

Figure 3. Genetic deletion of NMDAR in dopamine neurons is without effect on sign-tracking or goal-tracking responses during pavlovian approach learning. Mice with two floxed NR1 alleles (knock-outs) engage in goal-tracking and sign-tracking behaviors at levels similar to heterozygote controls, as measured by probability of a single magazine entry (left) or lever contact (right) during lever-CS presentation (A) and number of magazine head entries (left) and lever contacts (right) during lever-CS presentation (B). For head entries (left), the ratio between responding during the CS and pre-CS (the latter an equivalent duration preceding period; see Materials and Methods), a measure of discriminative approach behavior, is plotted on the y-axis (right). Genotype also did not affect latency to enter the magazine (left) or contact the lever-CS (right) upon its extension (C).
We designated mice as sign trackers or goal trackers according to whether their final 3 d average summary bias scores were positive or negative. A bias toward the sign-tracking conditional response, under this scheme of categorization, occurred in fewer mice than did goal tracking (n = 11 vs n = 52; n = 5 additional mice were found to make sign-tracking responses, but in magnitudes that did not exceed their goal-tracking behaviors). Relative rates of phenotype designations did not differ between DATcre+;NR1 flox/flox mice and DATcre+;NR1 flox/wt mice (7 of 31 and 4 of 32, respectively, designated sign trackers; Fisher's exact test, p = 0.337, gg). Reasoning we might be able to more sensitively observe differences in the rate of learning by examining only their respective conditional response, we assessed the acquisition of goal-tracking behaviors in goal trackers exclusively and the acquisition of sign-tracking behaviors in sign trackers exclusively, as shown in Figure 5A-C (left and right, respectively). Though goal-tracking behavior appears similar to that expressed by the sample as a whole, visual inspection of Figure 5 suggests that among sign trackers, DATcre+;NR1 flox/flox mice express a greater degree of sign-tracking behavior, the opposite of the hypothesized effect. However, because comparisons of summary bias score distributions did not reach traditional levels of statistical significance, no additional exploratory statistical evaluations of these data were conducted. Finally, the ability of the lever-CS to support new learning via conditioned reinforcement, a phenomenon elevated in sign-tracking animals (Flagel et al., 2011; Lomanowska et al., 2011) and considered reflective of incentive motivational properties acquired by cues (Berridge, 2000), was evaluated. Mice were allowed to make nose-poke responses to elicit brief presentations of the lever-CS during a single session that followed the last day of conditioning.
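The comparison of designation rates above (7 of 31 versus 4 of 32 sign trackers) can be checked with a small, dependency-free sketch of a two-sided Fisher's exact test. The function below is written for illustration and is not the software used in the study. It assumes the common two-sided definition, in which the probabilities of all tables no more likely than the observed one are summed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Rows are groups; columns are outcome categories."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalized hypergeometric probability of k col-1 cases in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = [weight(k) for k in range(lo, hi + 1)]
    observed = weight(a)
    # Integer arithmetic keeps the "no more likely than observed" test exact.
    extreme = sum(w for w in weights if w <= observed)
    return extreme / sum(weights)


# Rows: DATcre+;NR1 flox/flox and DATcre+;NR1 flox/wt.
# Columns: sign trackers, goal trackers.
p = fisher_exact_two_sided(7, 24, 4, 28)
print(round(p, 3))  # 0.337
```

Under this definition the sketch yields a value that rounds to the reported p = 0.337.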
As shown in Figure 6, the number of lever-CS presentations earned by DATcre+;NR1 flox/wt mice and DATcre+;NR1 flox/flox mice did not significantly differ (t(61) = 0.526, p = 0.601), nor did the number of active aperture (t(61) = 0.559, p = 0.551) or inactive aperture (t(61) = 0.553, p = 0.581, hh) responses. However, as occurs in rats, mice designated sign trackers exhibited higher levels of conditioned reinforcement than mice designated goal trackers, earning more lever-CS presentations (Wilcoxon rank sum, z = −2.608, p = 0.009, ii).

Figure 5. Behavior plotted according to conditional response designation. Animals with a positive summary bias score for days 13-15 were designated sign trackers; those with negative scores were designated goal trackers. As in Figure 3, probability of a single magazine entry (left) or lever contact (right) during lever-CS presentation (A); number of magazine head entries (left) and lever contacts (right) during lever-CS presentation, with the ratio between CS and pre-CS responding plotted for head entries on the right-hand y-axis (B); and latency to enter the magazine (left) or contact the lever-CS (right) upon its extension (C) are measured. Sign-tracking DATcre+;NR1 flox/flox mice appear to display a greater degree of sign-tracking behaviors than controls.

Discussion
Here, the effects of genetic excision of the NMDAR from dopamine neurons on associative reward-related learning were evaluated. Mice lacking NMDAR in dopamine neurons exhibited impaired instrumental learning but normal sign-tracking and goal-tracking responses in a pavlovian conditioning procedure. These results are presented against the backdrop of normal exploratory locomotion and palatable food consumption, eliminating these ancillary phenotypes as likely explanations for the observed learning effects.

NMDAR activity in dopamine neurons contributes to acquisition of an appetitive instrumental response
Loss of NMDAR in dopamine neurons resulted in slower acquisition of instrumental responding; this finding is in general agreement with results gathered earlier using a similar mouse model (Zweifel et al., 2009). Qualitative aspects of the particular pattern of results offer indications of the nature of the behavioral deficit observed: the absence of group differences during the first and final days of training suggests that genotype did not affect baseline lever-pressing rates per se, and the similar asymptotic rates of pressing at the end of training indicate that motivation to obtain the food reward may not be sensitive to genotype. The impairments in a spontaneously acquired instrumental response were observed only during intermediate stages of the learning process, suggesting that phenotypic differences in mice lacking NMDAR in dopamine neurons relate to altered learning capabilities.
This result suggests a causal role for NMDA-mediated neurotransmission in dopamine neurons in instrumental learning. One possibility is that the loss of NMDA receptors disables one mechanism that contributes to phasic, stimulus-related dopamine neuron firing (Suaud-Chagny et al., 1992; Zweifel et al., 2009; Parker et al., 2010). Additionally, loss of the NMDAR eliminates NMDAR-mediated synaptic plasticity within dopamine neurons (Engblom et al., 2008; Zweifel et al., 2008; Luo et al., 2010). Thus, while the behavioral effect observed may well relate to altered phasic release, it is possible that other mechanisms are at play, including loss of synaptic plasticity between glutamatergic inputs and dopamine neurons or other downstream molecular changes. Because we did not measure NMDAR expression or monitor dopamine activity in the context of behavior, it is difficult to disentangle these different interpretations, and further experiments are needed to parse these possibilities.
Our data are, however, consistent with a number of optogenetic studies wherein response-contingent optical activation of dopamine neurons either facilitated an appetitive instrumental response or was sufficient to support responding alone (Adamantidis et al., 2011; Kim et al., 2012). Moreover, after asymptotic acquisition of the relationship between a CS that was predictive of periods of response-contingent reward availability (i.e., a discriminative stimulus), transient optical activation of dopamine neurons delivered concurrently with presentation of a compound CS prevented the normally observed blocking effect (Steinberg et al., 2013). Further, the behavioral impact of unexpected negative shifts in outcome value was also diminished by activation of dopamine neurons. These findings are consistent with phasic dopamine acting as a prediction error signal that causally drives learning. The optogenetic studies are convincing, and thus we argue that the current data are consistent with a hypothesized role for phasic dopamine activity in reward learning. That said, it remains unclear how light-evoked events interact with ongoing endogenous phasic events and tonic activity states, and whether they reproduce the postsynaptic effects of normal stimulus-elicited phasic events. Moreover, experimenter-prescribed stimulation timing is likely unable to precisely mimic ongoing changes in temporal relationships between the onset of phasic dopamine bursts and environmental events, for example, as a stimulus-outcome relationship is learned and phasic signals shift from the time of reward delivery to cue onset (Ljungberg et al., 1992; Day et al., 2007). Here, we demonstrate that instrumental learning is modulated by NMDA-mediated activity in dopamine neurons and by putative attenuation of phasic dopamine signals that were endogenously generated by afferent inputs to dopamine neurons in response to environmental stimuli.

Loss of NMDARs in dopamine neurons does not impact frequency or acquisition of a sign-tracker conditional response
Sign-tracking rats, those that approach and interact with a predictive cue (e.g., the extension of a lever-CS) during an autoshaping task, often at the expense of approaching the location of reward delivery (Williams and Williams, 1969), are thought of as exhibiting a form of incentive salience attribution over and above the pavlovian contingency learning exhibited by goal-tracking rats. These differential conditional responses offer an opportunity to distinguish between the prediction error and incentive salience attribution perspectives of dopamine. Flagel et al. (2011) provided causal evidence that dopamine receptor activity is required for sign tracking, observing a deficit in acquisition of sign tracking, but not goal tracking, after treatment with a dopamine receptor antagonist. Importantly, sign-tracking animals also display more prominent CS-evoked dopamine as they learn their conditional response than do goal trackers. Because goal trackers and sign trackers must both learn the contingency between the CS and the US to express their responses, it has been suggested that the phasic dopamine release patterns do not simply teach contingency learning. Phasic dopamine release is argued, in this case, instead to be necessary for a cue to acquire incentive properties, progressively increasing its motivational pull on behavior (and, correspondingly, progressively increasing sign-tracking conditional responses) as CS-US pairings continue.
Figure 6. Test of conditioned reinforcement. Mice were allowed to earn brief presentations of the lever-CS by performing a novel instrumental response. No differences between genotypes were found in the number of lever-CSs earned or in responses to the active or inactive nose-poke apertures.
Because NMDAR loss in dopamine neurons results in attenuation of the magnitude of phasic dopamine release to ~30% that of controls (Zweifel et al., 2009; Parker et al., 2010), this genetic model applied to the sign-tracker/goal-tracker paradigm offered an experimental design equipped to distinguish between the prediction error and incentive salience perspectives. If the relationship between the magnitude of CS-evoked dopamine and sign tracking is causal, we hypothesized that a putative reduction in the amplitude of NMDA-mediated dopamine release should reduce the frequency of sign-tracking behavior or the rate of its acquisition. We found no evidence to support this hypothesis: for all dependent measures, no differences were detected in the form of conditional responses expressed by mice lacking NMDAR in dopamine neurons and control mice.
In addition to failing to support the incentive salience perspective of dopamine activity in reward learning, our data also did not yield evidence of a contribution of NMDA-mediated dopamine activity and/or phasic release to pavlovian goal approach, nor have the data of several others using a similar mouse genetics approach (Parker et al., 2010). This is a surprising result, given that prediction error signals in the mesencephalon have been observed across a wide variety of task conditions and parameters, but have been most extensively characterized within the context of appetitive pavlovian conditioning (Ljungberg et al., 1992; Schultz et al., 1993, 1997; Waelti et al., 2001; Fiorillo et al., 2003), and that pharmacological strategies have shown pavlovian approach to be dependent upon NMDAR activity in the ventral tegmental area (Stuber et al., 2008; Ranaldi et al., 2011).
One possibility is that the reported residual 30% phasic signal in the DATcre;NR1 mouse may be sufficient to support pavlovian approach learning. Given that reward preference and reward learning are possible even after massive dopamine depletions (Cannon and Palmiter, 2003), this residual phasic activity may indeed provide more than adequate signal-to-noise for pavlovian delay conditioning, especially when associative contingencies are binary and deterministic (i.e., P(US|CS) = 1, P(US|~CS) = 0). The magnitude of midbrain neuron burst responses encodes the relative value of predictive stimuli (Fiorillo et al., 2003); perhaps a behavioral impairment would be revealed in a scenario where 30% of the normal signal-to-noise in dopamine neurons provides insufficient dynamic range (e.g., discriminating between two stimuli with marginal differences in predictive value). Given, however, that sign-tracker rats are distinguished from goal-tracking rats by a quantitative difference in CS-evoked dopamine, if this difference causally influenced the form of conditional response expressed, we would still expect a measurable difference in the degree of sign-tracking behavior in DATcre+;NR1 flox/flox mice that have a dramatic, albeit not full, diminution of phasic dopamine release. There was no indication of this in our data. Alternatively, it is possible that a loss of NMDAR-mediated synaptic plasticity or other NMDAR-dependent physiological mechanisms in mice lacking NMDAR in dopamine neurons obscured the observation of a behavioral difference in sign tracking. Additionally, it is possible that these behaviors are supported by dopaminergic projections to the basolateral amygdala or prefrontal cortex; because these cells express very little DAT (Lammel et al., 2008), NR1 recombination may not have fully occurred in them. However, a study using PCR to detect recombination of NR1 in the SN/VTA in the same mouse model found successful recombination of NR1 in 34 of 36 such cells (Luo et al., 2010). Thus, Cre recombinase expression appears to be sufficient to drive excision of NR1 in the majority of dopaminergic neuronal populations, even those expressing very low levels of DAT.
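The residual-signal argument above can be illustrated with a toy simulation. The sketch below is not a model used in this study: it assumes a simple Rescorla-Wagner update, and it assumes that the reduced phasic signal acts as a multiplicative gain of 0.3 on the prediction error. The function name, the learning rate, and the gain parameter are all hypothetical choices made for illustration.

```python
def rescorla_wagner(n_trials, alpha=0.2, gain=1.0, reward=1.0):
    """Return associative strength V after each CS-US pairing.

    Assumes a deterministic contingency, P(US|CS) = 1, so the US
    (magnitude `reward`) follows the CS on every trial.
    `gain` is a hypothetical scaling of the teaching signal
    (1.0 = control, 0.3 = assumed attenuated phasic signal).
    """
    v = 0.0
    history = []
    for _ in range(n_trials):
        delta = reward - v           # prediction error on this trial
        v += alpha * gain * delta    # update scaled by the assumed gain
        history.append(v)
    return history


control = rescorla_wagner(100, gain=1.0)
attenuated = rescorla_wagner(100, gain=0.3)

print(f"V after 10 pairings:  control={control[9]:.3f}, attenuated={attenuated[9]:.3f}")
print(f"V after 100 pairings: control={control[-1]:.3f}, attenuated={attenuated[-1]:.3f}")
```

Under these assumptions the attenuated learner acquires more slowly but approaches the same asymptote, which is one way a residual signal could suffice when contingencies are deterministic. The sketch says nothing about the loss of NMDAR-dependent plasticity or about the form of the conditional response.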

The role of NMDAR in dopamine neurons in reward-related behaviors
What is clear from these experiments is that NMDAR-mediated activity in dopamine neurons is not required to adaptively respond in the pavlovian approach paradigm. Interestingly, in addition to pavlovian approach, other phenotypes that were historically thought to require NMDAR in dopamine neurons, such as sensitization to psychostimulants (Kalivas and Alesdatter, 1993; Wolf et al., 1994, 1998; Vanderschuren and Kalivas, 2000), have also turned out to be unaffected in their absence (Zweifel et al., 2008; Luo et al., 2010; Beutler et al., 2011). Given that the degree of NMDAR-dependent plasticity in dopamine neurons (expressed as increased AMPA receptor expression or current) induced by drugs of abuse correlates with the degree of behavioral sensitization observed (Ungless et al., 2001; Borgland et al., 2004), and that NMDAR-dependent plasticity is observed selectively during periods of active learning of pavlovian conditioning (Stuber et al., 2008), these results are especially unanticipated. However, several studies have implicated NMDAR in non-dopaminergic cell types or brain regions as responsible for these phenomena (Luo et al., 2010; Beutler et al., 2011; Parker et al., 2011).
Because they co-occur and share dopaminergic substrates, locomotor sensitization to psychostimulants has been linked with heightened or sensitized incentive salience attribution (Robinson and Berridge, 1993; Wyvell and Berridge, 2001; Tindell et al., 2005; Olausson et al., 2006; Ostlund et al., 2014), including sign-tracking behavior (Doremus-Fitzwater and Spear, 2011). Supporting this link, we observed that elimination of NMDA receptors on dopamine neurons did not affect pavlovian sign tracking, and previous studies using similar models have also found that locomotor sensitization is not dependent upon NMDARs in dopamine neurons (Engblom et al., 2008; Zweifel et al., 2008; Luo et al., 2010; Beutler et al., 2011). Thus, while a host of neural mechanisms likely influence the development of a sign-tracking conditional response (Lomanowska et al., 2011; Fitzpatrick et al., 2013; Perez-Sepulveda et al., 2013; Haight and Flagel, 2014), our data indicate that NMDAR activity in dopamine neurons, along with its contribution to phasic dopamine release, is not among these factors.
Previous work has demonstrated persistent elevations in synaptic AMPA/NMDA ratios in dopamine neurons of animals self-administering cocaine, while AMPA/NMDA ratios were only transiently elevated in animals responding for food (Chen et al., 2008). NMDAR dynamics are therefore susceptible to modulation by rewarding experiences and reward modality. Thus, our observation that the acquisition and performance of pavlovian conditional responses were not different in mice lacking NMDAR in dopamine neurons may depend on the specific experimental conditions used here. We note, however, that enhanced AMPA/NMDA ratios have been observed during pavlovian conditioning for food (Stuber et al., 2008), and in a similar mouse model of loss of NMDARs in dopamine neurons, cue-based learning was impaired (Zweifel et al., 2009). Conversely, the acquisition of a pavlovian conditioned place preference for cocaine is unaffected in knock-out mice (Engblom et al., 2008; Luo et al., 2010; but see Zweifel et al., 2008). These data do not support the simple idea that NMDARs are involved in pavlovian responses to drugs but not food.

Sign tracking in mice
Sign-tracking behavior comparable to that observed toward a lever-CS in rats has been difficult to reproduce in C57BL/6J mice: mice either show no lever-CS-directed behavior (Zweifel et al., 2009; Parker et al., 2011) or only demonstrate conditional locomotion in the vicinity of the lever-CS (Gore and Zweifel, 2013). Sign tracking in the form of full lever actuations has not been reported previously.
Of interest was the considerable individual variation in whether a sign-tracking or goal-tracking conditional response emerged. Mice generally began with goal tracking (presumably because of previous magazine training), but 20-25% then developed overt sign-tracking conditional responses, some without any appreciable accompanying goal tracking (see summary bias score distribution, Figure 4), ultimately pressing the lever several hundred times per session; others continued goal tracking, and others performed both behaviors. Given that the mice studied here are, at least within a genotype group, isogenic, this variation in response type suggests considerable influence of (unmeasured) environmental or other nonheritable genetic factors, as has been observed in rats (Lomanowska et al., 2011).
While the sign-tracking conditional responses measured here appeared to be less common than in published data on rat behavior, and their onset is likely delayed relative to rats as well, video observation of testing chambers indicated that the sign-tracking phenotype is very much present in mice: those that engaged in this behavior did so consistently and vigorously, engaging in the same rapid biting, gnawing, and invigorated approach and contact with the lever-CS reported in rats (Zener, 1937; Jenkins and Moore, 1973; Boakes, 1977; Tomie, 1996; Flagel et al., 2010). Behavior was often observed to be intensely focused toward the lever-CS, expressed as stereotypic sniffing and various other interactions that did not necessarily result in a lever actuation; consequently, it is likely that mice sign track more often than we or others have reported. In addition, as has been repeatedly demonstrated in the rat (Flagel et al., 2011; Lomanowska et al., 2011), responding for conditioned reinforcement was higher in mice that sign tracked than in mice that goal tracked, indicating that similar phenotypic covariations exist across both species.

Limitations
Although previous studies have done so (Engblom et al., 2008; Zweifel et al., 2008, 2009; Luo et al., 2010; Parker et al., 2010), we did not demonstrate recombination of NR1 in dopamine neurons or measure phasic dopamine release here; consequently, some caution must be taken in the interpretation of the present findings. That said, the finding of a DATcre × NR1 interaction for instrumental learning in the present study (i.e., both Cre recombinase and two floxed NR1 alleles were required to observe impairment) demonstrates that the model system functions as expected, at least in the context of instrumental reward learning.
Critically, as mentioned previously, the conditional inactivation of NR1 blocks NMDAR currents, which reduces phasic firing, but it also eliminates NMDAR-mediated synaptic plasticity (Engblom et al., 2008; Zweifel et al., 2008; Luo et al., 2010). This presents considerable difficulties with respect to interpreting our results strictly from the perspective of phasic dopamine release. Thus, although this study is not equipped to fully rule out a role for phasic dopamine release in conditional responses during pavlovian approach, we can conclude that these responses do not rely upon NMDAR-related plasticity in dopamine neurons or NMDAR-mediated components of phasic activity, irrespective of whether they take the form of goal tracking or sign tracking. More work is needed to fully ascribe the present results to differences in phasic dopaminergic neuron activity.
Because the transgenic mouse model used here is a constitutive knock-out, developmental alterations may have influenced the observed results. ROSA26-LacZ recombination is observable in the DATcre mouse used from E15 onward, and although no changes in DAT protein or D1 or D2 mRNA levels are observed (Bäckman et al., 2006), AMPA currents appear to be upregulated in a similar mouse line lacking NMDAR in dopamine neurons (Engblom et al., 2008; Zweifel et al., 2008). Little other work has been done regarding compensatory alterations in this mouse model; therefore, this remains an interpretational limitation.
Finally, our study lacked secondary confirmation of results to increase the confidence ascribed to the null results in the sign-tracking/goal-tracking experiment (e.g., a subthreshold dose of an NMDA antagonist to mimic the 30% loss of phasic release); future studies are needed to address this limitation.

Conclusions
Here, we utilized a mouse model of compromised NMDA-dependent dopamine activity to characterize multiple components of reward-driven associative learning. Complementing the temporal precision of optogenetic approaches, this approach allowed us to study the behavioral impact of putatively dampened endogenously generated phasic dopamine signals and loss of NMDAR-related synaptic plasticity. Our data revealed a clear role of NMDA activity in dopamine neurons in the acquisition of instrumental learning. We then tested causally, for the first time, predictions about the role of phasic dopamine in reward learning made by incentive salience accounts that contrast with those made by prediction error accounts. Though dopamine voltammetry data indicated a relationship between elevated CS-evoked dopamine activity and sign-tracking behavior, our results, though not without notable interpretational limitations, lend no support to the conclusion of a causal relationship: the expression of conditional responses, regardless of whether they took the form of goal tracking or sign tracking, was unaffected in a model of eliminated NMDAR activity and putatively diminished NMDA-dependent phasic dopamine release. Thus, conditional responses associated with incentive salience attribution may not be under direct influence of the magnitude of NMDAR-regulated stimulus-evoked phasic release.
Therefore, our results are not fully consistent with the incentive salience perspective of phasic dopamine. They also are not necessarily uniformly consistent with a prediction error account of dopamine because putatively diminished phasic dopamine release did not affect pavlovian approach learning in any measured outcome. Thus, our results may be more congruent with a multifaceted conceptualization of dopaminergic transmission, wherein the behavioral significance of phasic dopamine could shift adaptively between prediction error, incentive salience attribution, and other forms of behavioral invigoration and flexibility, or combinations thereof, depending on the particular configuration of biological demands, internal goal states, and motivators present in the environment. This view is consistent with traditional views of dopamine as a neuromodulator, interacting with and adjusting ongoing circuit activity in a manner that can give rise to a multiplicity of context-dependent behavioral phenomena.