Lateral Orbitofrontal Cortex and Basolateral Amygdala Regulate Sensitivity to Delayed Punishment during Decision-Making

Abstract In real-world decision-making scenarios, negative consequences do not always occur immediately after a choice. This delay between action and outcome drives the underestimation, or “delay discounting,” of punishment. While the neural substrates underlying sensitivity to immediate punishment have been well-studied, there has been minimal investigation of delayed consequences. Here, we assessed the role of lateral orbitofrontal cortex (LOFC) and basolateral amygdala (BLA), two regions implicated in cost/benefit decision-making, in sensitivity to delayed versus immediate punishment. The delayed punishment decision-making task (DPDT) was used to measure delay discounting of punishment in rodents. During DPDT, rats choose between a small, single-pellet reward and a large, three-pellet reward accompanied by a mild foot shock. As the task progresses, the shock is preceded by a delay that systematically increases or decreases throughout the session. We observed that rats avoid choices associated with immediate punishment, then shift preference toward these options when punishment is delayed. LOFC inactivation did not influence choice of rewards with immediate punishment, but decreased choice of delayed punishment. We also observed that BLA inactivation reduced choice of delayed punishment for ascending but not descending delays. Inactivation of either brain region produced comparable effects on decision-making in males and females, but there were sex differences observed in omissions and latency to make a choice. In summary, both LOFC and BLA contribute to the delay discounting of punishment and may serve as promising therapeutic targets to improve sensitivity to delayed punishment during decision-making.


Introduction
Many psychiatric diseases are characterized by insensitivity to detrimental outcomes (Bechara, 2005; Hartley and Phelps, 2012; Jean-Richard-dit-Bressel et al., 2019; Orsini and Simon, 2020). One factor that drives this insensitivity is the presence of a delay that often precedes occurrence of these outcomes (Murphy et al., 2001; Bechara et al., 2002; Field et al., 2019). For example, individuals with substance misuse problems seek out drugs to receive immediate positive reinforcement, but often underestimate impending withdrawal symptoms or financial/legal concerns that occur later in time (Bickel et al., 2014a; Dalley and Ersche, 2019). This can be attributed to "delay discounting," wherein the motivational value of delayed outcomes is underestimated compared with immediate outcomes.
While delay discounting of rewards has been well-studied (Floresco et al., 2008; Kable and Glimcher, 2010; Mar et al., 2011; Bickel et al., 2014b; Burton et al., 2014; Frost and McNaughton, 2017), there is minimal research investigating the neurobiological mechanisms underlying sensitivity to immediate versus delayed punishment. Furthermore, preclinical research on punished reward-seeking has primarily focused on consequences that occur immediately after an action (Pollard and Howard, 1979; Jonkman et al., 2012; Orsini et al., 2015a; Park and Moghaddam, 2017; Jean-Richard-Dit-Bressel et al., 2018; Rodríguez et al., 2018; Halladay et al., 2020). To address this discrepancy, we developed the rat delayed punishment decision-making task (DPDT), which offers choice between a small reward and a large reward followed by a mild foot shock. The shock initially occurs immediately after choice, but is preceded by a systematically escalating delay as the task progresses (Liley et al., 2019). Rats initially avoid the punished option, then shift preference toward the punished option as delays increase, thereby demonstrating discounting of the punishment's negative motivational value as a function of delay. Moreover, despite comparable decision-making with immediate punishment, males discount delayed punishment to a greater degree than females. Additionally, delay discounting of punishment is not correlated with discounting of rewards, suggesting that these processes may be governed by divergent neurobiological mechanisms (Liley et al., 2019).
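For readers unfamiliar with the formalism, delay discounting is often summarized with a hyperbolic value function. The sketch below is illustrative only; the present study does not fit such a model, and both the shock value and the discount rate k are hypothetical parameters:

```python
def discounted_value(magnitude, delay, k=0.2):
    """Hyperbolic discounting: subjective value shrinks with delay.

    magnitude: outcome value (negative for punishment)
    delay: seconds between choice and outcome
    k: discount rate (hypothetical free parameter, fit per subject)
    """
    return magnitude / (1.0 + k * delay)

# Net value of a 3-pellet reward paired with a shock (value assumed -4.0)
# delivered immediately versus after a 16-s delay:
immediate_net = 3.0 + discounted_value(-4.0, 0)
delayed_net = 3.0 + discounted_value(-4.0, 16)
# The delayed-punishment option carries the higher net value, so a
# discounting agent shifts choice toward it as the delay grows.
```

Under this framing, the aversive value of the shock is discounted toward zero as its delay lengthens, which is the pattern DPDT is designed to measure behaviorally.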
The lateral orbitofrontal cortex (LOFC) and basolateral amygdala (BLA), two brain regions with dense reciprocal connections (Price, 2007), are likely candidates for driving discounting of delayed punishment during reward-seeking. OFC is a prefrontal cortical region that receives input from all major sensory systems in addition to influences from limbic regions (McDonald, 1991; Carmichael and Price, 1995), making it an optimal site for integrating perceptual and emotional information to guide decision-making. LOFC is involved with a number of cognitive processes that are important for cost/benefit decision-making (Padoa-Schioppa and Assad, 2006; Karimi et al., 2019), including reward/punishment integration (Morrison et al., 2011; Orsini et al., 2015b; Jean-Richard-Dit-Bressel and McNally, 2016). Additionally, LOFC is involved with sensitivity to delayed rewards (Mobini et al., 2002; Roesch et al., 2006; Zeeb et al., 2010); despite the lack of correlation between delayed reward and punishment discounting, it is reasonable to speculate that brain regions that encode delays in rewarding contexts also contribute to delay processing with punishment, even if that processing differs by rewarding/aversive valence. BLA regulates punishment-induced suppression of reward seeking and is involved with attribution of salience to aversive cues (Jean-Richard-dit-Bressel and McNally, 2015; Piantadosi et al., 2017; Hernandez et al., 2019). BLA is also involved with goal-directed behavior and encodes outcome-specific values to guide actions (Wassum and Izquierdo, 2015). Additionally, it associates cues with rewards during events, making this region imperative for using past experiences to flexibly guide decisions toward optimal outcomes (Schoenbaum et al., 2000). Thus, both regions likely contribute to integrating delays with punishment to guide decision-making.
Here, we separately assessed the involvement of LOFC and BLA in sensitivity to delayed punishment during decision-making using pharmacological inactivation of each region before DPDT testing. We also compared the effects of LOFC and BLA inactivation between male and female rats. Finally, to test whether effects of inactivation were influenced by task design or impaired behavioral flexibility, LOFC and BLA were individually inactivated before a modified version of DPDT with descending punishment delays (REVDPDT).

Subjects
A total of 73 Long-Evans rats obtained from Envigo, aged 70 d on arrival, were used for these experiments (total LOFC: n = 38, female: 18, male: 20; total BLA: n = 35, female: 15, male: 20). Rats were restricted to 85% of free-feeding weight one week before behavioral training to encourage motivation and pursuit of rewards during task performance. Free-feeding weight targets were adjusted throughout the experiment in accordance with Envigo growth charts to account for growth. All rats were individually housed and maintained on a 12/12 h reverse light/dark cycle. All methods were approved by the University of Memphis Institutional Animal Care and Use Committee.

Surgery
Before behavioral assessments, rats underwent bilateral cannulation surgery targeting either LOFC (+3.0 mm AP, ±3.2 mm ML, −5.5 mm DV from skull surface; adapted from Roesch et al., 2006; Paxinos and Watson, 2013) or BLA (−3.0 mm AP, ±5.0 mm ML, −8.7 mm DV from skull surface; adapted from Orsini et al., 2015b; Paxinos and Watson, 2013) for infusions. Rats were anesthetized in an isoflurane gas induction chamber, then placed into a stereotaxic apparatus (Kopf) while resting on a heating pad adjusted to 40°C. Isoflurane was provided throughout surgery via a nose cone. Cannulae were held in place by a dental cement headcap anchored by three bone screws. Upon completion of surgery, rats were given 1 ml of sterile saline subcutaneously, along with a solution of acetaminophen and water in a dish to moisten food during recovery. Rats were closely monitored for signs of infection or distress over the following week, with cage bedding changed daily for the first 3 d.

Behavior apparatus
Testing was conducted in standard rat behavioral test chambers (Med Associates) housed within sound-attenuating cubicles. Each chamber was equipped with a recessed food pellet delivery trough fitted with a photobeam to detect head entries and a 1.12-W lamp to illuminate the food trough. Food pellets were delivered into the food trough, located 2 cm above the floor and centered in the side wall. Two retractable levers were located to the left and right of the food trough, 11 cm above the floor. A 1.12-W house light was mounted on the opposing side wall of the chamber. Beneath the house light was a circular nose poke port equipped with a light and photobeam to detect entry. The floor of the test chamber was composed of steel rods connected to a shock generator that delivered scrambled foot shocks. Locomotor activity was assessed throughout each session with infrared activity monitors located on either side of the chamber just above the floor. Test chambers were interfaced with a computer running MED-PC software, which controlled all external cues and behavioral events.

Shaping procedures
Food restriction and behavioral training began after one week of recovery. Before acquisition of DPDT, rats underwent a series of shaping procedures. Rats were first taught to associate the food trough with food pellets during magazine training. In separate sessions, rats then trained to press a single lever (left or right, counterbalanced across groups) to receive one pellet of food. After performing 50 reinforced lever presses within 30 min, rats trained to press the opposite lever under the same criterion. Following this were shaping trials in which both left and right levers were retracted, and rats were required to nose poke into the food trough during a period of illumination from both the house and food trough lights. Nose poking evoked the extension of a single lever (either left or right in pseudorandom order). Each subsequent lever press was reinforced with a single pellet, along with extinguishing of house and trough lights and retraction of the lever. After achieving a minimum of 30 presses of each lever in a 60-min time span, rats progressed to magnitude discrimination training.
The 30-min reward magnitude sessions used two levers, with counterbalanced presses producing either one or three pellets. As in the previous stage of training, each trial began with illumination of the house and trough lights, after which a nose poke into the trough led to extension of one or both levers. A press on one lever produced a single pellet while the other produced three, followed by lever retraction, termination of all cues, and progression to a 10 ± 2-s intertrial interval (ITI). There were five blocks of 18 trials in each session, with the first eight trials providing only a single lever (forced choice) and the next 10 providing both levers (free choice). Once rats achieved >75% preference for the large reward during free choice trials, they began either DPDT or REVDPDT training.

DPDT
During DPDT, rats chose between a small reward and a larger reward associated with punishment preceded by varying delays. DPDT methodology was comparable to the magnitude discrimination described above, with choice between small and large food pellet reinforcers. However, in this task, the large option was associated with a mild, 1-s foot shock. Shock initially occurred immediately after a choice, then occurred progressively later in time as the session advanced, or vice versa (Fig. 1). Lever identity (small or large reward) was fixed within each session and remained consistent across the entirety of training and testing.
Trials began with illumination of the house light and food trough, after which a nose poke into the trough caused one or both levers to extend simultaneously. A press on one lever dispensed a single pellet, while the other dispensed three pellets with a 1-s mild foot shock. After all outcomes were delivered, the house light extinguished, and the next trial proceeded after an ITI of 10 ± 2 s. Sessions were divided into six blocks, each containing two forced choice and 10 free choice trials, for a total of 72 trials. The first two trials of each block were "forced choice" trials in which only a single lever was available to establish the reward/punishment parameters within that block. The following 10 trials were "free choice" trials in which both levers extended, allowing rats to choose a preferred lever. During the first block, shock occurred immediately after the lever press. In each subsequent block, the delay preceding shock increased (4, 8, 12, then 16 s), followed by a final block with No Shock/Delay (Fig. 1). Notably, on trials in which the unpunished lever was chosen, the ITI increased by a period equivalent to the delay preceding shock in that block (4, 8, 12, or 16 s) to maintain consistency of trial length regardless of choice.
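The session structure above can be sketched as follows (a simplified schematic, not the MED-PC implementation; delay values and trial counts come from the text, and the ±2-s ITI jitter is omitted):

```python
# Schematic of a DPDT session: six blocks, each with 2 forced-choice then
# 10 free-choice trials. The shock delay ascends across blocks; choosing
# the safe lever pads the ITI by the block's delay so that total trial
# length is the same regardless of choice.

BASE_ITI = 10  # s (jittered +/-2 s in the actual task; omitted here)
DELAYS = [0, 4, 8, 12, 16, None]  # None = final No Shock/Delay block

def trial_iti(block_delay, chose_punished):
    """Return the ITI (s) for a free-choice trial in a given block."""
    if block_delay is None or chose_punished:
        return BASE_ITI
    # Safe-lever choice: extend the ITI by the block's shock delay to
    # equate trial durations across choices.
    return BASE_ITI + block_delay

session = [(delay, ["forced"] * 2 + ["free"] * 10) for delay in DELAYS]
total_trials = sum(len(trials) for _, trials in session)  # 72
```

For REVDPDT, the same schematic applies with the `DELAYS` list reversed.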
Shock intensity began at 0.05 mA, then increased by 0.02, 0.03, 0.05, or 0.1 mA (based on each rat's sensitivity) in subsequent sessions if rats completed >85% of trials. This incremental increase in shock intensity limited omissions and allowed rats to acquire task parameters. Upon reaching the final shock intensity, subjects trained until they achieved stability, defined as no more than a 10% overall shift in daily choice behavior for 2-3 d. To minimize individual differences in performance, shock intensity was titrated for each individual rat until (1) mean choice of punishment across the entire session was between floor (0%) and ceiling (100%), and (2) either a positive slope (DPDT) or negative slope (REVDPDT) was observed for percent choice of the punished lever. Under these two criteria, all subjects produced a robust discounting curve with sufficient parametric space for treatment to increase or decrease choice of delayed punishment.
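The titration and stability rules can be summarized as a simple decision procedure (a sketch under assumptions: `completed_fraction` and `daily_choice_pcts` stand in for measured session data, and the per-rat step size is supplied by the experimenter as described above):

```python
def next_shock_intensity(current_mA, completed_fraction, step_mA):
    """Raise shock intensity only if the rat completed >85% of trials.

    current_mA: intensity used in the last session (starts at 0.05 mA)
    completed_fraction: proportion of trials completed (0-1)
    step_mA: per-rat increment (0.02, 0.03, 0.05, or 0.1 mA)
    """
    if completed_fraction > 0.85:
        return round(current_mA + step_mA, 3)
    return current_mA  # hold intensity; rat omitted too many trials

def is_stable(daily_choice_pcts):
    """Stability: <=10% shift in overall choice across 2-3 daily sessions."""
    return (len(daily_choice_pcts) >= 2
            and max(daily_choice_pcts) - min(daily_choice_pcts) <= 10)
```

Gating increments on completion rate keeps omissions low while the rat learns the contingencies, and the stability check determines when infusion testing can begin.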
Trials were recorded as "omissions" in two possible scenarios: (1) the rat failed to nose poke into the lit port to begin the trial during the allotted 10 s, or (2) the rat nose poked to make the levers extend but did not choose a lever in the allotted 10 s. Omissions could also occur during forced choice trials, but data included in results are restricted to omissions of free choice trials. If a subject omitted a full block of trials, that was tabulated as 10 omissions, and the missing data point was extrapolated using the linear slope of the other data points. We chose a linear slope based on patterns of decision-making observed by Liley et al. (2019). Notably, there was only one session each with missing data for LOFC REVDPDT and BLA DPDT, and no missing points for LOFC DPDT and BLA REVDPDT, so the amount of extrapolated data was minimal. If two full blocks of trials were omitted in a single session, that session was repeated after an additional baseline session to avoid overuse of extrapolated data. This occurred for one rat in LOFC DPDT, four rats in LOFC REVDPDT, one rat in BLA DPDT, and zero rats in BLA REVDPDT.
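The linear extrapolation of a fully omitted block amounts to an ordinary least-squares fit over the observed blocks (a minimal stdlib sketch; the published analysis used custom MATLAB scripts, so this is illustrative rather than the authors' code):

```python
def extrapolate_missing_block(blocks, choice_pcts, missing_block):
    """Estimate percent choice for an omitted block from a linear fit.

    blocks: block indices with observed data, e.g. [0, 1, 2, 4, 5]
    choice_pcts: observed percent choice of the punished lever per block
    missing_block: index of the fully omitted block, e.g. 3
    """
    n = len(blocks)
    mean_x = sum(blocks) / n
    mean_y = sum(choice_pcts) / n
    # Ordinary least-squares slope and intercept over observed blocks.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(blocks, choice_pcts))
    sxx = sum((x - mean_x) ** 2 for x in blocks)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope * missing_block + intercept
```

For example, if blocks 0, 1, 2, 4, and 5 show a roughly linear rise in punished-lever choice, the fit interpolates block 3 on that line.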

DPDT with descending delays (REVDPDT)
To confirm that effects of inactivation were not driven by task design or aberrant behavioral flexibility, a subset of subjects was trained in a reversed task (REVDPDT). This task was identical to DPDT, except the delays were presented in descending order, beginning with No Shock/ Delay, 16, 12, 8, 4, then 0 s preceding foot shock (Fig. 1). Criteria for stability and shaping procedures were comparable to those used for DPDT.

LOFC and BLA inactivation
After rats reached stable performance in DPDT or REVDPDT, they underwent habituation sessions to acclimate to infusion procedure handling. They then received bilateral microinfusions to inactivate either LOFC or BLA. A drug cocktail of the GABA receptor agonists baclofen (Reis and Duarte, 2006) and muscimol (Chandra et al., 2010) dissolved in sterile saline (concentration: 250 ng/µl; 0.5 µl infusion volume over 1 min; Piantadosi et al., 2017; Orsini et al., 2018) was administered into each hemisphere via an automated infusion pump and two 50-µl Hamilton syringes. Behavioral testing commenced after a 15-min absorption period. After a day of baseline testing with no treatment, subjects were given bilateral sterile saline microinfusions (0.5 µl infused at 0.5 µl/min). Drug/saline order was counterbalanced across subjects.

Histology
Rats were euthanized with Euthasol and perfused with saline and 10% formalin solution. Brains were extracted, stored in 10% formalin solution, sliced at 60-100 µm using a cryostat, and mounted onto slides. Cannula placements and infusion localization were confirmed via light microscopy (Paxinos and Watson, 2013).

Experimental design and statistical analysis
Custom-made MATLAB scripts were used to compile behavioral data, and all statistical analyses were conducted using IBM SPSS Statistics 24. If Mauchly's test of sphericity was violated, Greenhouse-Geisser values and degrees of freedom were used accordingly. If a rat failed to make any choices during a block of the task, the slope of that subject's curve was used to extrapolate that missing data point. If two or more blocks of behavioral data were missing, that rat was removed from analysis because of excessive omissions.
Following task acquisition, stable decision-making for DPDT and REVDPDT was measured using a day × block repeated measures ANOVA, quantified as lack of effect of day and a significant effect of block. Effects of microinfusions on behavior were analyzed via sex × infusion (drug vs saline) × block ANOVA. Latency to lever press during testing was evaluated using a mixed sex × safe versus punished lever ANOVA.

Figure 1. a, Delayed punishment decision-making task (DPDT and REVDPDT). Rats chose between two levers, one delivering a one-pellet reward and the other delivering a three-pellet reward accompanied by a delayed foot shock (delay sequence: 0, 4, 8, 12, 16 s, No Shock/Delay for DPDT; No Shock/Delay, 16, 12, 8, 4, 0 s for REVDPDT). b, A six-day microinfusion schedule was used for both brain regions, with inactivation and saline order (days 3 and 5) counterbalanced across subjects.
A two-way ANOVA was performed to assess the impact of sex and task (DPDT vs REVDPDT) on titrated foot shock intensity (0.05-0.55 mA). There was no significant difference between males and females, although there was a trend toward males having higher terminal shock levels (F (1,53) = 2.757, p = 0.10; male DPDT: 0.28 mA; male REVDPDT: 0.30 mA; female DPDT: 0.24 mA; female REVDPDT: 0.27 mA).
Effects of LOFC inactivation on DPDT
LOFC inactivation reduced overall choice of the punished reward (F (1,10) = 5.888, p = 0.036). Critically, there was also an inactivation × block interaction (F (5,50) = 3.261, p = 0.013; Fig. 3a), such that LOFC inactivation only reduced large reward choice when punishment occurred after long delays, but not when punishment occurred immediately or after a short delay. Further investigation using two-tailed paired samples t tests (see Table 1) revealed no effects of inactivation in the first three blocks, a near-significant difference between drug and saline for the 12-s delayed shock (p = 0.074), and a significant difference for the 16-s delayed shock (p = 0.005). Finally, there was no effect of LOFC inactivation during the final, unpunished block (p = 0.16), suggesting that LOFC inactivation did not cause gross motivational deficits or an inability to discriminate reward magnitude.
Next, we assessed the effects of LOFC inactivation on omitted trials during DPDT. There was a main effect of inactivation (F (1,10) = 10.494, p = 0.009; Fig. 4a) such that omissions were greater following inactivation compared with saline infusions. There was also an effect of block (F (1.706,17.061) = 5.734, p = 0.015), with subjects omitting more trials early in the session, wherein punishment had shorter delays. There was no inactivation × block interaction (F (1.590,15.901) = 2.220, p = 0.148). There was a trend toward a main effect of sex in which females displayed more omissions throughout the task than males (F (1,10) = 4.395, p = 0.062; Fig. 4b,c). There was also an inactivation × sex interaction (F (1,10) = 9.655, p = 0.011), with females showing a greater increase in omitted trials after LOFC inactivation than males.
We next investigated the effects of LOFC inactivation on latency to choose a lever. There was no significant difference in latency to choose safe versus punished levers (F (1,11) = 0.073, p = 0.792; Fig. 4d), nor was there an effect of LOFC inactivation (F (1,11) = 0.003, p = 0.960). However, there was a trend toward an inactivation × lever type interaction (F (1,11) = 4.435, p = 0.059), such that inactivation lengthened the time required for subjects to choose the punished but not safe lever. There was also an effect of sex (F (1,11) = 8.871, p = 0.013; Fig. 4e,f), with females taking longer than males to make a choice, but no sex × lever type interaction (F (1,11) = 2.319, p = 0.156).
Effects of LOFC inactivation on REVDPDT
There were no effects of inactivation on choice of the punished option (F (1,12) = 1.920, p = 0.191), nor an inactivation × block interaction (F (5,60) = 1.128, p = 0.355). However, because LOFC inactivation exerted its most substantial effects during the 16-s delayed punishment block in the standard DPDT (Fig. 5a), we probed for an effect of inactivation on this block during REVDPDT. We observed that LOFC inactivation did indeed reduce choice of the punished reward in this block (t (13) = −2.816, p = 0.015), with no differences observed in other blocks (Table 1). This revealed that, as in the original task, LOFC inactivation reduced choice of the punished reward when punishment occurred after a long (16-s) delay, but not when punishment followed choice after shorter (0- to 12-s) delays.
Effects of BLA inactivation on DPDT
There was no overall main effect of inactivation on choice (F (1,12) = 1.800, p = 0.205). However, there was an inactivation × block interaction (F (5,60) = 3.102, p = 0.015) such that BLA inactivation reduced choice of the punished reward when punishment was delayed (Fig. 7a) but did not affect choice when punishment was immediate. Paired samples t tests revealed that this significant reduction in choice of the punished reward occurred only in the 16-s delay condition (t (13) = −2.787, p = 0.015; Table 2). Interestingly, BLA inactivation also reduced choice of the large reward in the final, unpunished block (t (13) = −3.006, p = 0.010).

Effects of BLA inactivation on REVDPDT
BLA was inactivated before REVDPDT in 6 female and 9 male rats (Fig. 2b). As expected, there was a main effect of block (F (5,65) = 12.065, p < 0.001; Fig. 9a) such that choice of the punished reward declined as punishment delays decreased across the session. There was no effect of inactivation (F (1,13) = 0.444, p = 0.517) or inactivation × block interaction (F (5,65) = 0.427, p = 0.828; Fig. 9a) on choice of the punished reward. There were also no effects of inactivation for any block (p > 0.05; Table 2), suggesting that, unlike standard DPDT, BLA inactivation had no effects on REVDPDT with descending punishment delays.

Discussion
While discounting of delayed rewards has been well-studied, little is known about the neural substrates underlying delayed punishment discounting. Here we replicated previous findings that rats undervalue punishment preceded by a delay, reflected as increased choice of rewards with delayed compared with immediate punishment. This increased choice of delayed punishment was comparable between ascending and descending punishment delay schedules. LOFC inactivation reduced choice of rewards with delayed punishment under both ascending and descending delay schedules, although the effect was confined to the longest (16-s) punishment delay in the descending condition. BLA inactivation also reduced choice of rewards with delayed punishment under ascending, but not descending, delays.
LOFC regulates sensitivity to delayed punishment
LOFC inactivation reduced choice of rewards with longer delayed (but not immediate) punishments, suggesting that LOFC contributes to underestimation of delayed punishment during reward seeking. This is comparable to OFC driving discounting of delayed rewards (Mobini et al., 2002; Rudebeck et al., 2006), although effects of OFC manipulation vary based on task design and individual differences in impulsivity (Winstanley, 2004; Zeeb et al., 2010). Notably, a population of neurons in OFC signals reduction in value of delayed rewards (Roesch et al., 2006); it is possible that OFC activity signals discounting of impending punishment in similar fashion. However, based on the lack of correlation between delay discounting of reward and punishment (Liley et al., 2019), it is feasible that OFC encodes delayed outcomes differently based on motivational valence.
One explanation for reduced choice of delayed punishment after LOFC inactivation is impaired ability to adapt to changes in delay. This inability to update task contingencies would likely manifest as a "flattened" discounting curve. However, this is unlikely based on effects of LOFC inactivation during REVDPDT, in which punishment delays decreased throughout the session. As with standard DPDT, LOFC inactivation reduced choice of delayed punishment but not immediate or briefly delayed punishment, resulting in a "steeper" curve. This verifies that LOFC inactivation does not impair behavioral flexibility in this context. Notably, LOFC inactivation in REVDPDT evoked more selective effects than during the standard task, only reducing choice of punishment during the longest (16-s) delay. Performing individual comparisons typically requires the presence of an interaction; however, because the 16-s delay produced the greatest effect in standard DPDT, there was strong rationale to selectively probe this data point in REVDPDT. Nonetheless, the effects of LOFC inactivation on REVDPDT are not as substantial as on DPDT. Future replications using longer delays may increase the sensitivity of this task to LOFC inactivation and other experimental manipulations.
It is also possible that reduced choice of delayed punishment following LOFC inactivation was caused not by reduced delayed punishment discounting, but by increased overall sensitivity to punishment. However, this is unlikely because LOFC inactivation did not influence choice when punishment was immediate or occurred after a short delay (0-8 s). Another possible explanation for reduced large reward choice is that LOFC inactivation impaired magnitude discrimination, as LOFC has been shown to signal reward value (Van Duuren et al., 2008; Simon et al., 2015; Ballesta and Padoa-Schioppa, 2019). This is also unlikely because both DPDT and REVDPDT include a punishment-free one versus three pellet block, during which LOFC inactivation did not influence reward choice.
It is possible that rats performing DPDT are not discounting delayed punishment but are instead unaware of impending punishment because of reduced temporal contiguity between action and outcome, leading to increased choice of options with delayed punishment. OFC has a well-established role in outcome representation (Ursu and Carter, 2005;Mainen and Kepecs, 2009;Panayi et al., 2021); therefore, if choice of delayed punishment was driven by reduced punishment expectancy, inactivation of LOFC would further disrupt expectancy and increase choice of delayed punishment. However, LOFC inactivation here had the opposite effect, reducing choice of rewards with delayed punishment. Therefore, the most plausible explanation for the reduced choice of delayed punishment is reduced punishment delay discounting.
While decision-making in DPDT involves delay discounting of punishment, it is important to consider that this is superimposed over reward-based decision-making. Optimal choice in DPDT requires two cognitive processes: merging punishment with expected delay to produce a negative motivational value, and integration of this aversive information with the value of appetitive outcomes (one vs three pellets). Thus, it is difficult to disentangle which of these factors is affected by LOFC inactivation. OFC is theorized to encode a dynamic cognitive map of task space that integrates all available action-outcome contingencies to guide decision-making (Wilson et al., 2014;Schuck et al., 2016;Cai, 2021). It is possible that the ability to incorporate delays preceding punishment into this "map" is dependent on LOFC. More granular research of the behavioral components of this task along with assessment of functional neuronal activity is necessary to fully delineate how LOFC drives decision-making in this context.
A previous study determined that males select rewards accompanied by delayed punishment more than females when shock intensity was comparable for all subjects (Liley et al., 2019). Baseline sex differences in decision-making were not observed here, as shock levels were titrated to avoid ceiling or floor effects for each subject (Orsini and Simon, 2020). Surprisingly, evaluation of final shock intensities per group for LOFC and BLA did not reveal a sex difference, although there was a trend toward males requiring a higher intensity shock than females. It is important to note that there were 17 more males than females overall, which likely contributed to this lack of main effect of sex. This imbalance was caused by several females failing to complete acquisition of the task and proceed to the testing phase, remaining at 0% choice of the large reward even at extremely low shock intensity. Had these subjects been included, it is likely that we would have replicated the sex difference observed by Liley et al. (2019). Regardless, the results of the inactivation experiments suggest that both LOFC and BLA regulate choice in DPDT similarly in males and females. Because of difficulty with task acquisition, the sample size of females was smaller than males; however, the lack of sex differences in sensitivity to inactivation permitted merging sexes for each experiment.
Female rats omitted more trials than males, consistent with previous data showing that estradiol drives avoidance during punishment-based decision-making (Orsini et al., 2021). Inactivation also increased latency for females to make a choice compared with males. Finally, females required almost three times as many sessions as males to acquire DPDT. This is likely attributable to the first exposure to immediate shock driving avoidance of all options (including the safe choice) in females. This subsequently increased the time required for females to be exposed to all task parameters, attenuating the overall rate of task acquisition. In contrast, when the task began with delayed punishment (REVDPDT), females acquired the task as quickly as males. Therefore, beginning training with the option of immediate punishment may cause females to generalize punishment to both levers and completely disengage from the task early in training.

BLA inactivation selectively reduces choice of delayed punishment
Like LOFC, BLA inactivation did not reduce choice of rewards with immediate punishment, but decreased selection of longer delayed punishments (16 s). This is somewhat surprising, as BLA has been shown to regulate choice of options associated with immediate punishment: BLA lesions increased choice of punished rewards during the risky decision-making task (Orsini et al., 2015b), and BLA inactivation reduced suppression of reward-seeking responses directed at a foot shock-associated lever (Jean-Richard-dit-Bressel and McNally, 2015). This finding also indicates that BLA inactivation only reduces punishment choice in situations with dynamic punishment delays. The reduced delay discounting of punishment after BLA inactivation is consistent with optical BLA inactivation reducing delay discounting of reward (Hernandez et al., 2019), although BLA lesions increase reward discounting (Winstanley, 2004).
One possible explanation for the lack of effects of BLA inactivation on sensitivity to immediate punishment is the location of infusions within BLA. Jean-Richard-dit-Bressel and McNally (2015) found a functional distinction between anterior and posterior BLA, with only the posterior/caudal subregion involved in detecting the aversive value of punishment. Our placements were mostly centralized within BLA; nonetheless, it is possible that these were insufficient to fully suppress posterior BLA function. When we separated subjects based on cannula location, we found no statistical difference between subjects with anterior versus posterior placements (data not shown). However, more than half of the subjects were removed from this analysis because of centralized placements; thus, a more targeted experiment is required to compare anterior versus posterior BLA function in this context. Regardless, the stereotaxic coordinates used here were sufficient to alter choice of delayed punishment in DPDT, suggesting that BLA regulation of sensitivity to delayed punishment may be independent of the anterior/posterior axis.
Interestingly, BLA inactivation reduced choice of delayed punishment during ascending punishment delays (DPDT), but not descending delays (REVDPDT). It is possible that decision-making beginning with the option of immediate punishment creates a "high stress" scenario wherein BLA is recruited to influence choice. Conversely, REVDPDT begins with no possibility of punishment, which may reduce BLA involvement in decision-making. LOFC inactivation also exerted more influence on choice during DPDT than REVDPDT, suggesting that decision-making transitioning from delayed to immediate punishment may be less sensitive to neural manipulation than decision-making with ascending punishment delays. Notably, there was a qualitative difference in saline responding between the LOFC and BLA groups in REVDPDT (Figs. 5, 9), suggesting that treatment in the latter experiment may have shifted baseline choice toward the punished reinforcer. However, decision-making after saline was highly comparable to baseline decision-making before any infusions (p = 0.781; data not shown), suggesting that this difference was independent of BLA manipulation.
While BLA inactivation reduced choice of the punished reward at the longest delay length, we also observed that this avoidance persisted after punishment was removed in the final block of DPDT. It is possible that BLA provides information about reward safety that is no longer available after inactivation, leading to avoidance of the large reward regardless of shock presence. This is supported by evidence that BLA contributes to the processing of safety signals in previously punished scenarios (Ng et al., 2018). The perseverative avoidance of the large reward could also be explained by BLA inactivation impairing the ability to adapt to task changes. However, BLA inactivation during REVDPDT had no impact on choice, suggesting that BLA is not critical for flexibility within a session.
There were no sex differences in decision-making following BLA inactivation, but there were sex differences in other behavioral measures. Omissions were greater in females only during DPDT, suggesting that females are prone to omitting trials when sessions begin with immediate shock. Following BLA inactivation, there was a trend toward increased omissions in later trials as the session progressed, in contrast to the increased omissions during immediate shock observed after LOFC inactivation. It is possible that LOFC but not BLA inactivation increased the salience of immediate punishment (as shown by Orsini et al., 2015b), reflected as task disengagement rather than changes in % choice. Notably, these effects were driven primarily by females, and the BLA group contained more females than the LOFC group (5 vs 3). Additional experiments focused on female subjects will enable deeper investigation of this phenomenon. Finally, as in the LOFC experiments, BLA females had longer response latencies than males in both tasks.
The OFC and BLA are densely interconnected (Price, 2007), and both contribute to decision-making informed by reward and punishment (Jean-Richard-Dit-Bressel and McNally, 2016; Orsini et al., 2017). Furthermore, connections between LOFC and BLA are critically involved in encoding and retrieving the incentive value of cues and actions to guide future decision-making (Groman et al., 2019; Malvaez et al., 2019; Sias et al., 2021). We observed that LOFC and BLA appear to subserve similar roles in delay discounting of punishment, as inactivation of either region reduced choice of delayed punishment. It is possible that BLA and LOFC work in concert to regulate delayed punishment discounting. However, based on the selective effects of BLA inactivation on DPDT but not REVDPDT, BLA may be engaged only in situations beginning with the threat of immediate punishment. Future experiments will investigate the specific role of this circuit in sensitivity to delayed versus immediate punishment.
In conclusion, insensitivity to delayed punishment is a critical aspect of psychiatric illnesses, in which future consequences are often undervalued in favor of immediate rewards. To our knowledge, this is the first assessment of the neurobiological mechanisms underlying this critical phenotype. These data indicate that LOFC and BLA circuitry may serve as a promising therapeutic target to improve sensitivity to delayed punishment during decision-making.