Dopamine D2 Receptors in Dopaminergic Neurons Modulate Performance in a Reversal Learning Task in Mice

Abstract
Neuroimaging studies in animal models and human subjects have each revealed that relatively low striatal dopamine D2-like receptor binding potential is associated with poor impulse control and with vulnerability for addiction-related behaviors. These studies cannot, however, disambiguate the roles for various pools of D2 receptors found in the striatum (e.g., those expressed on medium spiny striato-pallidal neurons vs on dopamine-releasing nerve terminals) in these behavioral outcomes. To clarify the role of the latter pool, namely, D2 autoreceptors, we studied mice carrying a conditional DRD2 gene, with or without Cre-recombinase expressed under the transcriptional control of the dopamine transporter gene locus (autoDrd2-KO, n = 19 and controls, n = 21). These mice were tested for locomotor response to cocaine, and spatial reversal learning was assessed in operant conditioning chambers. As predicted, compared to control mice, autoDrd2-KO animals demonstrated heightened sensitivity to the locomotor stimulating effect of cocaine (10 mg/kg, i.p.), confirming previous research using a similar genetic model. In the spatial reversal learning task, autoDrd2-KO mice were slower to reach a learning criterion and had difficulty sustaining a prolonged nose poke response, measurements conceptually related to impaired response inhibition. Rate of learning of the initial discrimination and latencies to collect rewards, to initiate trials and to produce a response were unaffected by genetic deletion of D2 autoreceptors, discarding possible motor and motivational factors. Together, these findings confirm the role of D2 autoreceptors in reversal learning and suggest a broader involvement in behavioral inhibition mechanisms.


Introduction
Substance use disorders are behavioral phenotypes that result from a confluence of both inherited liability factors and environmental influences that include, but are not limited to, the pharmacological effects of the drugs consumed. Indeed, experience with drugs or alcohol is not sufficient to produce a use disorder in humans, an outcome that appears to additionally depend on interindividual variability in vulnerability (Egervari et al., 2018). Identification of the bio-behavioral markers of this susceptibility could enable its measurement and aid in the design and empirical evaluation of targeted prevention strategies (Uhl, 2006).
In the search for molecular or behavioral indicators of risk for substance use disorders, dopamine D2/D3 receptors and impulsivity, as well as their mechanistic relationship, have received considerable attention. In laboratory rodents and nonhuman primates, as well as in human subjects, interindividual variation in impulsivity phenotypes has been positively correlated with alacrity to initiate drug or alcohol use, or with clinically impairing addictions (Belin et al., 2008; Quinn et al., 2011). Moreover, dopamine D2/D3 receptors within the striatum are negatively correlated with both outcomes (Lee et al., 2009; Groman et al., 2012). In other words, inherited and environmental factors that lead to relatively low forebrain D2/D3 receptors are associated with greater behavioral impulsivity (poorer impulse control) and elevated susceptibility for drug and alcohol use/use disorder.
No studies associating dopamine D2/D3 receptors to impulsivity and/or addiction liability have been able to dissect the specific subpopulation of receptors implicated. For those studies reporting a relationship between in vivo or ex vivo estimates of D2/D3 receptor binding potential or density in the striatum and impulsivity, it has been impossible to dissect the functional roles of presynaptic receptors expressed on the axon terminals of dopamine neurons from postsynaptic receptors expressed on striato-pallidal medium spiny neurons, the terminals of cortico-striatal glutamatergic axons or other neuronal populations, including striatal interneurons (Le Foll et al., 2009), although these distinct cellular subcompartments of D2/D3 receptors, in addition to having distinct functional effects on cellular physiology in the brain, may well contribute differently to impulse control phenotypes.
In animal research, the population of D2 receptors confined to dopamine neurons has itself been linked with drug self-administration behaviors. Selective genetic depletion of D2 receptors in dopaminergic neurons augments the acquisition of cocaine self-administration behavior (de Jong et al., 2015). Moreover, higher firing activity of dopaminergic neurons, an effect that could result from D2 autoreceptor subsensitivity, is associated with heightened self-administration behavior (Marinelli et al., 2003). Based on this evidence, and the aforementioned association between addiction phenotypes and behavioral impulsivity, we hypothesized that selective reductions in D2 autoreceptors would mechanistically alter the patterns of behavioral responding in tests thought to measure aspects of the inhibitory control over impulsive and/or compulsive behaviors. Specifically, we hypothesized that deletion of the Drd2 gene would cause mice to require a greater number of trials before reaching criterion in the reversal condition. We tested this hypothesis using mice carrying Cre-recombinase-dependent null alleles of the Drd2 gene (which encodes the dopamine D2 receptor protein), with and without Cre expression driven from the Slc6a3 (dopamine transporter encoding) locus, allowing for dopamine neuron-specific genetic deletion of D2 receptors.

Animals
All experimental procedures followed the National Institutes of Health Guide for the Care and Use of Laboratory Animals (National Research Council, 2011). All animal procedures were performed in accordance with the State University of New York at Binghamton and University of California Los Angeles animal care committees' policies and were approved by their respective institutional animal care and use committees.
A total of 40 male mice, aged four to six months at the start of testing, were group-housed in polycarbonate tubs with wood chip bedding; they were maintained in a humidity- and temperature-controlled vivarium (20-22°C) on a 12/12 h light/dark schedule. Animals had ad libitum access to food and water, except during operant testing, when they were food-restricted to maintain them at around 85% of their initial (prerestriction) body weight. No statistical methods were used to estimate the ideal sample sizes in advance, but the numbers of animals used in the study were comparable to those reported in previous publications using similar methods. Power calculations were performed retrospectively to ensure that the sample size was large enough with respect to the effect sizes found in the statistical analyses.
B6.129S4(FVB)-Drd2tm1.1Mrub/J mice (https://www.jax.org/strain/020631) homozygous for a Cre-dependent, conditional allele of the dopamine D2 receptor (Drd2) gene in which two LoxP sites flank exon 2 were initially crossed with B6.SJL-Slc6a3tm1.1(cre)Bkmn/J mice (https://www.jax.org/strain/006660) that were hemizygous for a variant of the dopamine transporter Slc6a3 gene directing the expression of Cre-recombinase (DATCre). A subset of the offspring from this cross were DATCre+ and carried one conditional Drd2 allele; this progeny was mated to a DATCre− mouse carrying two conditional Drd2 alleles to produce a generation of mice bearing either one or two conditional Drd2 alleles and/or the DATCre allele. Mice homozygous for the conditional Drd2 gene and hemizygous for the DATCre allele thus presented a conditional deletion of presynaptic D2 autoreceptors (autoDrd2-KO, n = 19). This breeding scheme has been used in the past to produce animals with a confirmed lack of D2 autoreceptors in midbrain dopamine neurons, in the absence of a broadly abnormal neurobehavioral phenotype (Bello et al., 2011). DATCre− littermates, also homozygous for the conditional Drd2 allele, were used as controls (control, n = 21). All animals (N = 40) underwent the operant reversal learning procedure, but only a subset (N = 28) was used in the locomotor assessment. Both experiments were conducted on male animals. Genotypes were confirmed by real-time polymerase chain reaction, using a commercial vendor (Transnetyx).

Psychomotor response to cocaine
Locomotor response to acute cocaine exposure was tested in a subset of autoDrd2-KO (n = 12) and control (n = 16) mice after completion of the operant procedure. Locomotion was assessed in clean rat home cages (20 × 40 cm) placed into a photocell apparatus equipped with 8 infrared detectors (Omnitech Electronics, Inc.). Sessions lasted for 75 min, divided into fifteen 5-min time bins. Mice were first accustomed to the chamber for 30 min, during which their baseline locomotion was evaluated. At 30 min, mice were injected with a single dose of cocaine (10 mg/kg, i.p.; dosages calculated as the free base weight; drug was provided by the National Institute on Drug Abuse). Cocaine-induced hyperlocomotion was then recorded for an additional 45 min.
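As an illustration of how this session layout translates into the summary measures analyzed later (distance traveled in the 30 min before vs the 30 min after injection), a minimal Python sketch follows. The function name and the example distances are hypothetical; only the bin width, session length, injection time and 30-min comparison windows are taken from the text.

```python
BIN_MIN = 5          # width of one time bin, in minutes
N_BINS = 15          # 75-min session split into 5-min bins
INJECTION_MIN = 30   # cocaine is injected 30 min into the session
WINDOW_MIN = 30      # length of the pre- and postexposure windows


def pre_post_totals(bin_distances):
    """Return (pre, post): summed distance in the 30 min before and after injection."""
    if len(bin_distances) != N_BINS:
        raise ValueError(f"expected {N_BINS} bins, got {len(bin_distances)}")
    first_post_bin = INJECTION_MIN // BIN_MIN   # index of the first bin after injection
    n_window_bins = WINDOW_MIN // BIN_MIN       # number of bins in each window
    pre = sum(bin_distances[first_post_bin - n_window_bins:first_post_bin])
    post = sum(bin_distances[first_post_bin:first_post_bin + n_window_bins])
    return pre, post


# Hypothetical distances (arbitrary units) for one mouse, one value per 5-min bin.
example = [10, 8, 6, 5, 5, 4, 30, 40, 35, 25, 20, 15, 10, 8, 6]
pre, post = pre_post_totals(example)
```

The last 15 min of the session are deliberately left out of the postexposure total so that both windows have the same length.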

Reversal learning
Operant conditioning took place in a set of 8 modular mouse operant conditioning chambers (model MED-NP5M-D1; Med Associates), each equipped with an aluminum wall fitted with a food tray, a pellet dispenser, and a house light; each chamber also contained a horizontal array of 5 nose-poke apertures on the opposite wall. All apertures and the food tray were fitted with infrared beam sensors and internal lights. Each chamber was enclosed in a dimly lit sound-attenuating cubicle, with white noise broadcast in the background.

Shaping of the operant response
On the first day of operant conditioning, animals were habituated to the operant chamber for 1 h. On the second day, all mice underwent magazine training. Pellets (14 mg Dustless Precision Pellets; Bio-Serv) were delivered after a 20-s intertrial interval (ITI); retrieval of the pellet was required before the next interval commenced. Magazine training ended after 45 min elapsed or when 50 pellets had been delivered, whichever came first. In the next phase (shaping 1), animals were trained to poke into the central aperture (hole 3 of 5) on the chamber wall opposite the food tray. Trials started with illumination of this central aperture. After the expected response, the food magazine lit up and a pellet was delivered. During the first trial, a nose poke into the central aperture led to immediate pellet delivery, while a requirement for a sustained nose poke of variable duration (0-400 ms) was imposed on subsequent trials. If an animal withdrew from the aperture before reaching the holding requirement, failure was signaled by a 1-s time-out period during which the house light was turned off; if the mouse managed to stay in the aperture for the required period, a food pellet was delivered into the illuminated food tray. Each trial was followed by a 20-s ITI. Sessions ended after 1 h or after 100 pellets were earned, whichever came first. When a mouse managed to obtain 40 pellets in a single session, it was moved to a similar procedure that differed only in that the nose poke duration requirements were extended to 600 ms (shaping 2). After reaching the 40-pellet criterion again, the response hold requirements were set to vary between 400 and 600 ms (shaping 3).

Acquisition
After initial training of the variable duration nose poke response to the central aperture, the acquisition of a simple spatial discrimination commenced. In these sessions, each trial was again initiated by a sustained, variable duration nose poke in aperture 3 (200-600 ms). Once this observing response was completed, the flanking apertures (holes 2 and 4) were illuminated for up to 30 s. For each mouse, either the right or the left hole was selected as the correct one in a pseudorandom fashion (counterbalanced across genotypes); subsequent responses (this time a simple, nonsustained nose poke) into that hole were rewarded with a food pellet, while responses in the other aperture led to a 3-s time-out period (signaled with darkness). A failure to respond within the 30-s period of illumination was counted as an omission. Each trial was followed by a 3-s ITI. Daily sessions lasted for 60 min or until the acquisition criterion was met. This acquisition criterion consisted of performing 16 correct responses in a moving window of 20 trials.
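The moving-window criterion can be made concrete with a short Python sketch. This is an illustration under stated assumptions, not the control code used in the study: the function name is hypothetical, the criterion is allowed to be met before 20 trials have been completed, and the treatment of omissions (counted as incorrect trials or excluded from the window) is left to the caller because the text does not specify it.

```python
from collections import deque


def trials_to_criterion(outcomes, window=20, required=16):
    """Return the 1-based trial number at which the criterion is first met, or None.

    `outcomes` is an iterable of booleans (True = correct response).
    Assumption: the criterion may be met before `window` trials have elapsed.
    """
    recent = deque(maxlen=window)   # keeps only the most recent `window` outcomes
    for trial_number, correct in enumerate(outcomes, start=1):
        recent.append(bool(correct))
        if sum(recent) >= required:
            return trial_number
    return None


# Hypothetical session: 10 errors followed by a run of correct responses.
session = [False] * 10 + [True] * 20
first_met = trials_to_criterion(session)
```

The same function applies to the reversal phase, which uses the same 16-out-of-20 rule.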

Reversal
Reversal learning was similar to the acquisition phase, except mice had to respond in the opposite aperture to the one previously reinforced. Reversal was achieved once an animal performed 16 correct responses out of 20 trials. Sessions lasted 60 min, or until the criterion was met.

Statistics
The numbers of trials to reach the learning criterion were compared using a one-tailed Student's t test. All other measures were compared using mixed-model ANOVAs (the Greenhouse-Geisser correction was used when data did not meet the sphericity criterion). All statistical analyses were performed with Statistica 13.0 or SPSS 23.0. Results were considered significant at p < 0.05. A summary of all the statistical analyses can be found in Table 1.
To demonstrate that nonsignificant effects could not simply result from underpowered analyses and to substantiate the reality of significant effects, we performed additional analyses, calculating effect size, post hoc power (or observed power) and ideal sample size for each behavioral index and each tested factor. We also provided an interpretation of effect sizes (Cohen, 1988). These analyses were performed with G*Power 3 (Faul et al., 2007) and are summarized in Table 2.
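For readers who want to relate a reported F statistic to an effect size, the sketch below uses the standard conversion from F and its degrees of freedom to partial eta squared and Cohen's f, with Cohen's (1988) conventional benchmarks for f (0.10, 0.25, 0.40). It is an assumption that this matches the effect-size specification chosen in G*Power for the analyses in Table 2; the function names and the "negligible" label are illustrative, and power itself is not computed here.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_f(f_value, df_effect, df_error):
    """Cohen's f derived from partial eta squared."""
    eta2 = partial_eta_squared(f_value, df_effect, df_error)
    return math.sqrt(eta2 / (1.0 - eta2))


def interpret_f(f):
    """Label an effect size using Cohen's (1988) benchmarks for f."""
    if f >= 0.40:
        return "large"
    if f >= 0.25:
        return "medium"
    if f >= 0.10:
        return "small"
    return "negligible"   # label below the 'small' benchmark is our own choice


# Example with an F statistic of the form reported in the Results: F(1,38) = 8.578.
f_effect = cohens_f(8.578, 1, 38)
label = interpret_f(f_effect)
```

Note that for repeated-measures designs G*Power offers more than one way to specify the effect size, so values obtained this way need not coincide with those in the table.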

Psychomotor response to cocaine
The 75-min locomotor activity session was divided into 15 time bins, and the average distance traveled was analyzed using a repeated-measures ANOVA, with time as the within-subject factor and genotype as the between-subjects factor. The ANOVA yielded a significant main effect of genotype (F(1,26) = 12.049; p = 0.002ᵃ), a significant main effect of time (F(14,364) = 30.891; p < 0.001ᵃ) and a significant genotype × time interaction (F(14,364) = 7.584; p < 0.001ᵃ). Using Bonferroni-corrected post hoc tests, we first sought to characterize the locomotion-stimulating effect of cocaine in control and autoDrd2-KO animals separately. Both groups appeared to experience the stimulating effect of cocaine: in the control group, the traveled distance increased significantly between the 30-, 40-, and 45-min time bins (all p < 0.005), while in the autoDrd2-KO group the increase extended through the 50- and 55-min time bins (all p < 0.001). We then compared experimental and control groups independently for each time bin. The post hoc tests revealed a significant difference between genotypes at 40 (p < 0.001), 45 (p < 0.001), and 50 min (p = 0.01; 10, 15, and 20 min after cocaine injection, respectively; Fig. 1A). To provide a more synthetic view and highlight the difference in cocaine activation across genotypes, we also compared the total distance traveled during the 30 min before cocaine injection (pre-exposure) with that during the 30 min following cocaine injection (postexposure) using a mixed-model ANOVA with cocaine (pre- vs postexposure) as the within-subject factor and genotype as the between-subjects factor. The analysis revealed a significant main effect of genotype (F(1,26) = 11.693; p = 0.002ᵇ) and of cocaine (F(1,26) = 40.545; p < 0.001ᵇ), as well as a significant cocaine × genotype interaction (F(1,26) = 9.836; p = 0.004ᵇ).
Furthermore, a Bonferroni-corrected post hoc comparison showed that: (1) baseline pre-exposure locomotion was comparable between genotypes (p = 0.144), (2) both autoDrd2-KO and control mice showed a significant increase in locomotor activity after cocaine injection (p < 0.001 and p = 0.02, respectively), and (3) autoDrd2-KO mice displayed a significantly exacerbated locomotor response to cocaine (p = 0.001; Fig. 1B).
In each case, autoDrd2-KO mice exhibited a significantly greater locomotor response to cocaine than littermate controls, as expected (Bello et al., 2011; Koulchitsky et al., 2016).

Shaping of the operant response
Because dopamine neurons are a key component in reward-based learning, there is a possibility that manipulations altering dopaminergic transmission also interfered with reward-based learning abilities. To explore this possibility, we compared the number of sessions to reach the learning criterion in each training stage (magazine training; shaping 1, 2, and 3) with a mixed-model ANOVA, with stage as the within-subject factor and genotype as the between-subjects factor, in a subset of autoDrd2-KO (n = 13) and control mice (n = 14). The ANOVA showed that there was a significant main effect of training stage (F(3,69) = 18.680; p < 0.001ᶜ), but there was no main effect of genotype (F(1,23) = 1.308; p = 0.265ᶜ) nor any genotype × stage interaction (F(3,69) = 0.181; p = 0.812ᶜ) regarding the number of sessions needed to learn the response. A Bonferroni-corrected post hoc test indicated that animals took longer in the initial stage, during which they had to first learn the operant response (shaping 1), than in either subsequent stage (all p < 0.01; Fig. 2).

Reversal learning
The reversal learning test produces a large set of behavioral measures, each collected at two training stages: acquisition and reversal. However, the a priori hypothesis tested in these studies was that reversal learning would be impaired in autoDrd2-KO (n = 19), compared with wild-type (n = 21), mice. This hypothesis was evaluated using a one-tailed t test, revealing a statistical trend for autoDrd2-KO mice to require more trials than wild-type controls to reach criterion performance during reversal learning (t(39) = 1.696, p = 0.051; Fig. 3A).
The larger dataset was next evaluated using repeated-measures ANOVAs, with genotype as the between-subjects factor and testing phase as the within-subject factor. Fig. 3A shows the pattern of effects when trials to reach criterion in the initial acquisition and subsequent reversal phases were evaluated in both genotype groups. These analyses revealed a main effect of testing phase, [...]; Fig. 3B). Finally, analyses of the average number of omissions per trial did not reveal any main effect of genotype (F(1,38) = 1.64; p = 0.208ᶠ) or any genotype × testing phase interaction (F(1,38) = 2.238; p = 0.143ᶠ); the main effect of testing phase was not significant, although there was a trend to omit more trials across testing phases (F(1,38) = 3.37; p = 0.074ᶠ; Fig. 3C).
Analyses next focused on the average trial initiation times and pellet retrieval latencies, to evaluate possible genotype effects on motor or motivational vigor. AutoDrd2-KO mice did not differ from wild-type control mice on any of these measures. Trial initiation times were systematically affected by testing phase (F(1,38) = 9.5; p = 0.003ᵍ), but not by genotype (F(1,38) = 0.5; p = 0.49ᵍ); no genotype × testing phase interaction was detected (F(1,38) = 0.2; p = 0.63ᵍ; Fig. 4A). Similarly, reward retrieval latencies were affected by testing phase (F(1,38) = 5.4; p = 0.026ʰ) but not genotype (F(1,38) = 0.4; p = 0.510ʰ), nor was there a genotype × testing phase interaction (F(1,38) = 1.9; p = 0.172ʰ; Fig. 4B).
Table 2. Post hoc power and effect size are reported for each behavioral measure, as well as a comparison between ideal and actual sample sizes.
Using a 5-choice serial reaction time task (5CSRTT), previous work showed that rats exhibiting elevated levels of anticipatory responding (nose pokes made before a target stimulus was delivered) exhibit a lower availability of D2-like receptors in the ventral striatum. In the current task, mice must make a sustained response at the central hole to trigger trial onset, and difficulty with doing so may reflect a response inhibition or waiting deficit. The average number of times per trial that mice withdrew their snout early from the central hole was evaluated as a function of genotype and the programmed hold duration (which varied across trials). ANOVA revealed a main effect of genotype (F(1,38) = 8.578; p = 0.006ⁱ), a main effect of hold requirement (F(2,76) = 77.362; p < 0.001ⁱ) and, after applying the Greenhouse-Geisser correction, a significant interaction between genotype and hold requirement (F(2,76) = 7.34; p = 0.008ⁱ). A Bonferroni-corrected post hoc test revealed a significant difference between autoDrd2-KO and control mice in the ability to successfully wait until the hold requirement was completed when that duration was set at 600 ms (p = 0.001), but not when the hold duration was set at either 200 ms (p > 0.999) or 400 ms (p > 0.999; Fig. 5).
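The dependent measure in this analysis, the average number of early withdrawals per trial at each programmed hold duration, can be tabulated as in the following Python sketch. The per-trial record format (hold duration in ms, count of early withdrawals) and the example values are hypothetical and are not taken from the study's data files.

```python
from collections import defaultdict


def mean_early_withdrawals(trials):
    """Average early withdrawals per trial, grouped by programmed hold duration.

    `trials` is an iterable of (hold_ms, n_early_withdrawals) pairs, one per trial.
    Returns a dict mapping hold duration (ms) to the mean count, sorted by duration.
    """
    totals = defaultdict(lambda: [0, 0])   # hold_ms -> [withdrawal sum, trial count]
    for hold_ms, n_withdrawals in trials:
        totals[hold_ms][0] += n_withdrawals
        totals[hold_ms][1] += 1
    return {hold: s / n for hold, (s, n) in sorted(totals.items())}


# Hypothetical trials for one mouse.
example_trials = [(200, 0), (200, 1), (400, 1), (600, 3), (600, 1)]
per_hold = mean_early_withdrawals(example_trials)
```

One such summary per animal would then provide the values entered into a genotype × hold requirement ANOVA of the kind described above.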

Discussion
Dopamine D2 receptors have been implicated in the pathophysiology of various psychiatric disorders, including impulse control conditions and substance use disorders, but very few studies have examined the specific contributions of pre- vs postsynaptic D2 receptors in relevant behavioral endophenotypes. Based on the observation of a strong relationship between individual-level variation in striatal dopamine D2-like receptor complement and reversal learning (Groman et al., 2011, 2016; Izquierdo and Jentsch, 2012), we tested whether the fraction localized to presynaptic dopaminergic neuronal terminals contributed to this effect. We found that autoDrd2-KO mice lacking D2 autoreceptors tend to display relatively poorer reversal learning capacities, although this effect is made ambiguous by the unchanged number of incorrect responses. More significantly, they also exhibited difficulty with completing a sustained nose-poke response that required the ability to wait to obtain reward. Both outcomes suggest that presynaptic D2 receptors have a role in influencing mechanisms such as behavioral or response inhibition.
Figure 2. Shaping of the operant response. Genotype had no effect on the total number of sessions needed to learn the operant response (p < 0.265). However, when both groups were considered together, the first stage of shaping (shaping 1) took significantly longer than any other stage (all p < 0.01). Bars show group means ± SEM.
Figure 3. Trials to criterion, errors to criterion and omission rates. Although there was no effect of genotype on the number of trials needed to reach criterion, testing the specific hypothesis of a slower performance in autoDrd2-KO during the reversal phase with a t test returned a probability closely approaching significance level (p = 0.051; A). There was no statistically significant difference between the groups in the total numbers of errors committed (B) or in the number of omissions per trial (C). Each dot represents a single data point. Bars show group means ± SEM.
To compare the results obtained with this genetic model with past research, we also evaluated the locomotor responses to an acute dose of cocaine in autoDrd2-KO and control littermate animals. AutoDrd2-KO mice showed potentiated cocaine-induced hyperlocomotion. Previous studies that used the same construct found similar results, linking excess dopamine release caused by the lack of D2 autoreceptor-mediated feedback inhibition to greater locomotor responses to this stimulant drug of abuse (Bello et al., 2011; Holroyd et al., 2015).

Relationships between dopamine and different forms of behavioral control
Reversal learning tests, including the variant used here, measure multiple psychological/behavioral processes, including (but not limited to) sensorimotor abilities, reinforcement learning, incentive motivation and inhibitory control over a pre-potent response. Given that this test has been shown to be sensitive to D2 receptor function (Izquierdo and Jentsch, 2012; Jentsch et al., 2014), we hypothesized that deletion of presynaptic D2 receptors would impede the ability to learn the new reinforcement contingencies when reversal occurred. We identified a pronounced trend in autoDrd2-KO animals to require more trials before reaching criterion after reversal of learning contingencies, although this effect did not reach a conventional significance level. Additional analyses of the results showed genotype effects on another response inhibition-related index, namely the ability to perform a sustained-duration observing response to initiate trials. In the current studies, autoDrd2-KO mice displayed a clear difficulty with maintaining this response for longer, but not shorter, durations. The absence of motor hyperactivity at baseline in the locomotor test, as well as the lack of differences in the response times associated with trial initiation or pellet retrieval, suggests that the difficulty maintaining the sustained-duration observing response is not simply a hyperactivity phenotype.
Whereas difficulty in learning a reversed contingency is considered "action impulsivity," inability to perform a sustained response is more closely linked to premature responding, which operationally reflects "waiting impulsivity" (Voon et al., 2014). Although both share partially common neural circuits, evidence suggests that D2 receptors exert opposite effects on action and waiting impulsivity, as the D2/D3 agonist quinpirole impairs reversal learning but reduces premature responding (Boulougouris et al., 2009; Fernando et al., 2012). The complex role of D2 receptors in impulsivity might explain why deletion of the D2 autoreceptors only moderately affected reversal learning while more severely impairing execution of long sustained responses in the present study.
Reversal learning has been repeatedly used as a measure of behavioral flexibility in rodents, monkeys and human subjects, possibly reflecting the capacity for inhibitory control. However, pharmacological studies have sometimes produced mixed results regarding the relationship of this phenotype to dopaminergic systems. Manipulation of dopamine release with psychostimulant drugs, such as cocaine or D-amphetamine, leads to inconsistent results within the reversal learning paradigm. In rats, reversal learning is compromised by acute administration of D-amphetamine (Idris et al., 2005), while not being affected by methylphenidate (Seu and Jentsch, 2009; Cheng and Li, 2013). Acute administration of cocaine impairs reversal learning in monkeys (Jentsch et al., 2002). In humans, small doses of cocaine improve reversal performance (Spronk et al., 2016), while methylphenidate is reported to produce opposite effects on reversal learning depending on working memory load and trait impulsiveness (Clatworthy et al., 2009; van der Schaaf et al., 2013). The effects of drugs interacting with dopamine on behavioral control are not always consistent across studies, though incongruences might result from the nonspecific action of cocaine, amphetamine and methylphenidate, as all of them also interact with other monoaminergic neurotransmitters, mainly serotonin and norepinephrine (Segal and Kuczenski, 1997; Andrews and Lucki, 2001).
Nevertheless, inconsistencies are also found in studies of genetically engineered mice lacking the dopamine transporter (resulting in a prolonged presence of dopamine in the synapse); this genetic model has been associated with either improvement (Milienne-Petiot et al., 2017) or impairment (Del'Guidice et al., 2014) in reversal learning, although in both cases the change was rather modest. One explanation for the ambiguous effects of dopaminergic manipulations might reside in the dual nature of reversal learning, which requires subjects not only to inhibit a pre-potent response but also to concurrently learn a new association. It is also important to mention that although many studies have aimed to assess reversal learning, they do so using tasks that vary greatly in their procedural aspects (training regimen, schedule of reinforcement, type/number of discriminanda, etc.). This might make the comparison between these tasks and our own results uncertain and lead to some of the reported inconsistencies.
The role of D2 receptors in behavioral control has been better characterized by the use of specific pharmacological agents. The D2/D3 agonist quinpirole impaired reversal learning in rats, but did not affect acquisition or retention of the operant response (Boulougouris et al., 2009). Similarly, microinfusions of quinpirole in the nucleus accumbens impaired reversal learning performance (but not set-shifting) in rats (Haluk and Floresco, 2009). Human subjects who received bromocriptine, a D2 agonist, also showed decreased performance in a probabilistic reversal learning task (Mehta et al., 2001), though the D3-preferring agonist pramipexole did not alter perseverative responses in healthy volunteers in a similar test (Ersche et al., 2011). Seemingly paradoxically, blockade of D2 receptors is also associated with poor reversal learning abilities. The selective D2 antagonist eticlopride, when infused into the orbitofrontal cortex, disrupted performance of a reaction-time task after reversal of reinforcement contingencies (Calaminus and Hauber, 2008). In nonhuman primates (vervet monkeys), the D2/D3 receptor antagonist raclopride decreased reversal performance, without affecting discrimination learning per se (Lee et al., 2007). Finally, D2 receptor blockade with sulpiride also hindered reversal learning in healthy humans (Janssen et al., 2015).
Qualitatively similar impairments in reversal learning associated with either activation or blockade of D2 subtype receptors might be explained by the role dopamine plays in inhibitory control, taking into account the substantial range of individual variation in the magnitude of transmission. Recently, an elegant neuroimaging study showed that subjects with higher dopamine synthesis capacity exhibited better learning in response to rewards in a deterministic reversal learning task but also exhibited impaired reversal performance in response to a D2 agonist, while subjects with lower dopamine synthesis capacity learned more from punishment and showed improved performance in response to a D2 agonist. One interpretation of these results is that there exists a curvilinear relationship between dopaminergic transmission and reversal learning; in such a model, the role for high-affinity D2-type receptors, located both pre- and postsynaptically, in affecting reversal learning varies in a manner influenced by subjects' trait dopamine synthesis and release, as well as by the task/testing conditions that could independently influence dopaminergic activity.
Our task also allowed for the measurement of the ability to perform a sustained response, which may, in a simpler way, reflect the capacity for response or behavioral inhibition. It is somewhat similar in principle to the ability to wait in either a differential reinforcement of low-rate responding (DRL) schedule or in the 5CSRTT. In the DRL test, animals are reinforced for producing an operant response only if it follows a period of time of preset duration during which no response was made; it is thus regarded as a measure of the ability to wait/defer a reward-eliciting response (Ferster and Skinner, 1957; Stoffel and Cunningham, 2008). Treatments with psychostimulant drugs elicit an impulsive pattern of responding in the DRL (Wenger and Wright, 1990; Lobarinas and Falk, 1999; Liao and Cheng, 2005; Cheng and Liao, 2007). Similarly, the number of responses that anticipate target delivery in the 5CSRTT is also thought to measure a similar waiting construct (Robbins, 2002; Sanchez-Roige et al., 2012). As in the DRL schedule, cocaine and D-amphetamine also increase impulsive responding in the 5CSRTT in mice (Loos et al., 2010) or rats (Grottick and Higgins, 2002; Van Gaalen et al., 2006; Pattij et al., 2007; Paterson et al., 2011; Baarendse and Vanderschuren, 2012). The ability to wait may exhibit a more consistent pattern of change (relative to reversal learning) following dopaminergic manipulations, suggesting it may have utility in mechanistic studies examining the consequences of genetic and/or pharmacological manipulations of the type used here.

Specific roles for pre- and postsynaptic D2 receptors in behavioral control
As noted above, lower availability, levels and/or functionality of striatal dopaminergic D2 receptors have been consistently linked to impulsive behavioral phenotypes in laboratory rodents and human or nonhuman primates (Kruzich and Grandy, 2004; Kruzich et al., 2006; Boulougouris et al., 2009; De Steno and Schmauss, 2009; Buckholtz et al., 2010; Groman et al., 2011; Laughlin et al., 2011; Morita et al., 2016). D2 receptors are located within multiple cellular compartments within the striatum, including postsynaptic medium spiny neurons that project to the pallidum, as well as on dopamine-releasing terminals where they regulate dopamine synthesis and release (Jentsch and Roth, 2000). Past studies linking impulsivity with D2-like receptors, measured in tissue homogenates or with positron emission tomography, were unable to differentiate these two populations because the ligands used in pharmacological studies target both pre- and postsynaptic receptors and have affinity for both the D2 and D3 receptor subtypes. Our studies suggest that the D2 subtype located on dopamine neurons contributes, at least in part, to the relationship between impulse control phenotypes and the dopamine D2-like receptor complements measured in previous studies.
In addition to the D2 family of dopamine receptors, D1 receptors have been characterized as exerting a complementary role in transmitting dopaminergic neuronal signals. Inhibition of nonrelevant motor patterns relies on a fine balance between tonic and phasic dopamine release. In one theory, tonic dopamine acts on high-affinity postsynaptic D2 receptors, while phasic bursts of dopamine energize behavioral outputs through activation of lower-affinity D1 receptors (Baik, 2013; Volkow and Morales, 2015; Soares-Cunha et al., 2016). Because deletion of D2 autoreceptors is predicted to disinhibit phasic dopamine release, it is likely that both postsynaptic D1 and D2 receptors are anomalously activated in autoDrd2-KO mice, but that the resulting cellular effects produce dissociable modulation of behavioral control. Direct silencing of D1- versus D2-expressing medium spiny neurons produces different effects on reinforcement learning and inhibitory control, with only modulation of D2-expressing striato-pallidal neurons selectively impairing reversal learning (Yawata et al., 2012). These findings are consistent with the notion that dopamine D2 receptors, and their actions on indirect pathway output neurons of the striatum, may more specifically relate to the selection of a single appropriate response (and consequently, to the inhibition of others; Keeler et al., 2014).

Conclusions
The present study extends the substantial evidence linking impulse control phenotypes to brain dopamine D2-like receptors. These studies suggest that the relationship between low dopamine D2 receptor complement and impaired behavioral flexibility in a reversal learning task is mediated, at least in part, by the fraction of D2 receptors localized to dopaminergic nerve terminals, a finding consistent with the link between low midbrain D2 receptor binding potential and impulsivity in humans (Buckholtz et al., 2010). By contrast, a recent positron emission tomography study in rats suggested that relatively higher D3-specific binding in the midbrain dopaminergic nuclei is also associated with behavioral inflexibility (Groman et al., 2016). These studies therefore highlight dissociable contributions of specific subtypes of dopamine D2-like receptors and of the cellular compartments in which they are expressed. Only through systematic analyses of the role for D2 versus D3 receptors in pre- and postsynaptic circuits can a thorough characterization of their effects be provided. Given the importance of impulse control, and its modulation by D2 receptors, to drug addiction, the mechanistic details of these modulatory effects are important to the design and implementation of rational strategies for enhancing inhibitory self-control over the impulsive (and perhaps compulsive) aspects of clinically impairing substance use.