eNeuro

Research Article: Theory/New Concepts, Novel Tools and Methods

A Multilevel Computational Characterization of Endophenotypes in Addiction

Vincenzo G. Fiore, Dimitri Ognibene, Bryon Adinoff and Xiaosi Gu
eNeuro 3 July 2018, 5 (4) ENEURO.0151-18.2018; DOI: https://doi.org/10.1523/ENEURO.0151-18.2018
Vincenzo G. Fiore
¹ School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX 75080
Dimitri Ognibene
² Department of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, United Kingdom
³ Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona 08018, Spain
Bryon Adinoff
⁴ University of Texas Southwestern Medical Center, Dallas, TX 75390
⁵ VA North Texas Health Care System, Dallas, TX 75216
Xiaosi Gu
¹ School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX 75080
⁶ Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029

Abstract

Addiction is characterized by profound intersubject (phenotypic) variability in the expression of addictive symptomatology and propensity to relapse following treatment. However, laboratory investigations have primarily focused on common neural substrates in addiction and have not yet been able to identify mechanisms that can account for the multifaceted phenotypic behaviors reported in the literature. To fill this knowledge gap theoretically, here we simulated phenotypic variations in addiction symptomatology and responses to putative treatments, using both a neural model, based on cortico-striatal circuit dynamics, and an algorithmic model of reinforcement learning (RL). These simulations rely on the widely accepted assumption that both the ventral, model-based, goal-directed system and the dorsal, model-free, habitual system are vulnerable to extra-physiologic dopamine reinforcements triggered by addictive rewards. We found that endophenotypic differences in the balance between the two circuit or control systems resulted in an inverted-U shape in optimal choice behavior. Specifically, greater imbalance led to a higher likelihood of developing addiction and more severe drug-taking behaviors. Furthermore, endophenotypes with opposite asymmetrical biases among cortico-striatal circuits expressed similar addiction behaviors, but responded differently to simulated treatments, suggesting that personalized treatment development could rely on endophenotypic rather than phenotypic differentiations. We propose that our simulated results, confirmed across neural and algorithmic levels of analysis, point to a fundamental and, to date, neglected quantitative method to characterize clinical heterogeneity in addiction.

  • addiction
  • neural model
  • phenotyping
  • reinforcement learning

Significance Statement

Addiction is known to encompass heterogeneity in its development, maintenance, and treatment response. While previous work has mostly focused on the common mechanisms underlying vulnerabilities in addiction at a group level, the neurocomputational causes of such intersubject variability in addiction are not well understood. To fill this knowledge gap, we combine a neural and a reinforcement learning (RL) model to reveal that the balance between neural circuits or computational control modalities characterizes the presence of behavioral phenotypes in addiction. The presence of converging effects, validated across neural and algorithmic levels of analysis, points to a quantitative method to characterize clinical heterogeneity and could help the future development of precision treatments.

Introduction

Addiction is known to encompass a wide range of individual behavioral differences (i.e., phenotypes) in development, maintenance and severity of symptoms, and treatment response (Everitt and Robbins, 2016). Previous investigations into the mechanisms underlying this heterogeneity of behaviors have identified two fundamental neurocomputational alterations correlated with vulnerability in the development and severity of addictive behaviors (Garrison and Potenza, 2014; Jupp and Dalley, 2014; Belin et al., 2016). These neural and computational intersubject differentiations (i.e., endophenotypes) include (1) a dysregulation of D2 receptors in the striatum (Morgan et al., 2002; Nader and Czoty, 2005; Dalley et al., 2007; Flagel et al., 2014) and (2) an alteration of learning rates within a reinforcement-learning framework (Gutkin et al., 2006; Piray et al., 2010). However, these endophenotypic differences are found across a wide spectrum of dissociable phenotypes, so that the same neural or computational mechanism is used to account for separable behavioral traits. For instance, different forms of striatal D2 dysregulation are found in individuals differing in their impulsivity (Dalley et al., 2007; Volkow et al., 2007), social dominance (Morgan et al., 2002; Gould et al., 2014), motor reactivity or preference for novelty (Flagel et al., 2010, 2014), or sensitivity to rewards (Belcher et al., 2014). Each of these behavioral traits is separately correlated with the development of addiction, but they do not necessarily coexist in the same individuals (cf. novelty seeking and impulsivity: Ersche et al., 2010; Molander et al., 2011; Belin and Deroche-Gamonet, 2012). This mismatch between the few known endophenotypic differences and a wide variety of multifaceted, dissociable behavioral phenotypes suggests that as-yet unknown neural and computational mechanisms are responsible, alone or in interaction, for the reported behavioral differentiations.
Finally, investigations into intersubject variability often emphasize the initial stage of addiction development (but see Belin et al., 2008; Economidou et al., 2009; Pelloux et al., 2015). Yet individual differences also exist in treatment response, resulting in diverse relapse patterns among individuals showing similar severity of symptoms. These differences have so far not been addressed by previous neural or computational models.

Here, we propose a theoretical investigation into the interaction between ventral and dorsal cortico-striatal circuits and the associated behavioral control modalities. Several studies have emphasized that addiction is associated with alterations of ventral and dorsal cortico-striatal circuits, and of motivations and habits (Volkow and Morales, 2015; Everitt and Robbins, 2016; Koob and Volkow, 2016). However, the role played by the interaction between the two neural circuits, or between the two behavioral control modalities, in generating intersubject variability in addiction has so far been neglected. To investigate this interaction, we use two models to simulate neural dynamics and algorithmic (or normative) choice selections in a multiple-choice task involving drug and non-drug rewards. We then test these models under different conditions of circuit or control modality dominance (i.e., simulated endophenotypes). Consistent with previous models, we assume that addictive substances hijack the healthy reward prediction error signal (Schultz et al., 1997) by triggering extra-physiologic dopamine bursts (Nestler and Aghajanian, 1997; Koob and Volkow, 2016). These dopamine bursts signal the presence of an aberrant unexpected reward, leading to the repetition of drug-related actions and escalation of consumption (Redish et al., 2008; Dayan, 2009). In our neural model, this process of reinforcement learning (RL; Sutton and Barto, 1998) is mediated by extra-physiologic changes in cortico-striatal connectivity weights (Hyman et al., 2006; Haber, 2008; Koob and Volkow, 2016). These changes in turn aberrantly affect the circuit gain and the stability of both ventral and dorsal cortico-striatal circuits, disrupting their respective roles in encoding and selecting goal-directed behaviors (Balleine, 2005; Balleine and O'Doherty, 2010; Gruber and McDonald, 2012) and habitual responses (Yin et al., 2004; Balleine and O'Doherty, 2010).
A similar effect is assumed for our algorithmic model, where overvaluation of drugs and the related RL affect the two control modalities, termed model-based and model-free, which approximate ventral/goal-oriented and dorsal/habitual implementations (Dolan and Dayan, 2013; Voon et al., 2017). As a result, and consistent with previous formulations of RL models of addiction (Redish et al., 2008; Piray et al., 2010; Gillan et al., 2016), both the planned evaluation of known action-outcome contingencies, represented in an internal model of the world, and the reactive immediate motor responses are biased toward drug-related selections.

Based on these assumptions, our models show that phenotypic differentiation in addiction development and treatment response can emerge as a function of the interaction between ventral and dorsal circuits, or between model-based and model-free control modalities. Our simulated results offer a proof of concept that this interaction is a candidate independent neural and computational mechanism underlying addiction vulnerability, putatively characterizing three different endophenotypes that differ in the likelihood of developing addiction, severity of symptoms, and treatment response. We suggest that this neurocomputational mechanism could interact with both the previously described dysregulation of striatal D2 receptors (Dalley et al., 2007; Flagel et al., 2014) and altered learning rates (Gutkin et al., 2006; Piray et al., 2010) to generate the variety of dissociable behavioral traits reported in the literature as associated with addiction vulnerabilities.

Materials and Methods

In brief, we present two complementary models simulating endophenotypic differences and their effects on addiction development and treatment response. In the models, intersubject differences are expressed in terms of either neural circuit dominance (i.e., ventral or dorsal circuit) or control modality dominance (i.e., model-based or model-free) in determining behavioral selections. The resulting phenotypes are tested in environments granting free access to a simulated substance of addiction, as is usually implemented in laboratory studies. In particular, we compare our simulated phenotypic variability with the results of a recent study investigating individual differences in rats self-administering the stimulants cocaine or a designer drug, a dopamine- and a mixed dopamine-norepinephrine reuptake inhibitor, respectively (Gannon et al., 2017). We selected this study because it highlights how different drugs, dosages, and tasks result in different ranges of phenotypic differentiation. For instance, in an initial 10-d acquisition phase, compulsive behavior developed in up to 75% of rats self-administering cocaine and in 87.5% of those exposed to the designer drug. Furthermore, under a fixed-ratio schedule (FR = 5), self-administration varied significantly among subjects: a subset of the rat population, termed high responders, self-administered cocaine up to 60% more often than a different subset, termed low responders, depending on dosage (cf. Gannon et al., 2017, their Fig. 3). Importantly, the task setup chosen for both of our proposed models involves the selection of a drug reward over explicit non-drug-related alternatives; in contrast, the chosen empirical study uses a time-out responding paradigm, in which the only explicit non-drug-related behavior (a lever press) is not rewarded.
As in most studies simulating addiction (Redish, 2004), we believe that presenting our simulated agents with a richer set of options (i.e., more than one) does not invalidate the parallel between simulated and real data. We consider the simulated competing options as a proxy for the many conflicting stimuli and associated behaviors that animals have access to, even in the limited environment of a standard operant conditioning chamber. Thus, our focus is on perturbing the balance between the dorsal/model-free and the ventral/model-based systems, to compare our simulated behavioral differentiations in the escalation and compulsive selection of drug-related actions with the data reported in the chosen laboratory study.

The two models comprise a neural mass model that has been validated and described in the context of choice behavior and dopaminergic modulation (Fiore et al., 2016, 2018; Hauser et al., 2016) and a normative or algorithmic model based on standard RL schemes (Sutton and Barto, 1998). In the neural model, addiction and treatment response are modeled through DA-dependent associative plasticity in both ventral and dorsal circuits. In the RL model, aberrant learning is modeled using a duplex of model-based and model-free schemes that compete for control over action selection. The model-based scheme entails learning a model of the environment (in the form of probability transition matrices among states) that is used to compute value functions under the Bellman optimality principle (Bellman, 1966). The model-free scheme instead uses prediction error-based learning to directly acquire the value of state-action pairs. Both neural and RL models are tested under four successive stages or phases: (1) before exposure to the simulated drug (termed pre-drug); (2) learning of addictive behavior (termed addiction); (3) simulated ideal therapeutic interventions (termed treatment) that partially revert the learning of the previous phase; and finally, (4) reinstated access to the simulated drug following each treatment (termed relapse). The simulated treatments are designed to emphasize endophenotypic differentiation in treatment response and relapse; therefore, each treatment predominantly affects only one control system, targeting either the goal-oriented/model-based or the habitual/model-free system. The former treatment is assumed to modify only the internal model of the environment and the related selection of action-outcome contingencies performed in the ventral circuit.
The latter treatment represents a condition in which the agent's model of the world remains mainly unaltered, but the acquired drug-related stimulus-response associations are disrupted, thus preventing the agent from exhibiting habitual responses (cf. Doll et al., 2009).

The unique aspect of this complementary modeling approach is that converging results from neural and algorithmic models can validate each other, as process and implementation theories (i.e., synaptic and dynamical mechanisms) complement the normative principles formalized in the RL model.

Neural field model

Basic model architecture and parameterization

In cortico-striatal circuits, the signal processed in the cortex is conveyed toward its respective area of the striatum, processed in the basal ganglia, and finally relayed via the thalamus to the same cortical area where it originated (Haber, 2003; Draganski et al., 2008; Jahanshahi et al., 2015). Thus, despite diverging in terms of the information processed (e.g., sensorimotor signals or rewards and outcomes), these circuits are characterized by similar computational dynamics (Obeso et al., 2014). Temporal responses in recurrent neural networks co-occur with state transitions or input transformations that are often described in terms of energy landscapes (Fig. 1A–C). If multiple inputs or initial states generate transitions toward the same final state, that final state is termed an attractor state (Amit, 1989). In recurrent networks such as cortico-striatal circuits, learning processes modulate the circuit gain, thereby affecting the strength of the attractor states and the overall stability of the system (Fiore et al., 2015, 2016; Hauser et al., 2016).
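The attractor idea can be illustrated with a toy example that is not part of the published model: gradient flow on an assumed one-dimensional energy E(x) = x⁴/4 − x²/2 − tilt·x. With no tilt there are two basins (at x = −1 and x = +1), so the three arbitrary inputs are sorted into two categories, as in Fig. 1B; a strong tilt (the hypothetical `tilt` parameter) leaves a single deep basin that captures every input, loosely mimicking the parasitic attractor of Fig. 1C.

```python
def energy_gradient(x: float, tilt: float = 0.0) -> float:
    """dE/dx for an illustrative energy E(x) = x**4/4 - x**2/2 - tilt * x."""
    return x**3 - x - tilt


def settle(x0: float, tilt: float = 0.0, dt: float = 0.05, steps: int = 2000) -> float:
    """Follow the downhill flow dx/dt = -dE/dx from the input x0 (explicit Euler steps)."""
    x = x0
    for _ in range(steps):
        x -= dt * energy_gradient(x, tilt)
    return x


inputs = (-0.7, 0.2, 1.6)  # three arbitrary inputs, as in Fig. 1A-C

# Untilted double well: two basins, so inputs split into two categories (cf. Fig. 1B).
print([round(settle(x0), 3) for x0 in inputs])  # [-1.0, 1.0, 1.0]

# Strongly tilted landscape: one deep basin captures every input (cf. Fig. 1C).
print([round(settle(x0, tilt=1.0), 3) for x0 in inputs])  # [1.325, 1.325, 1.325]
```

In this picture, learning corresponds to reshaping E; the tilt is only a stand-in for that reshaping, not a model of cortico-striatal plasticity.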

Figure 1.

Illustrative representation of energy landscapes and neural architecture of the model. A–C, These representations of energy landscapes are meant to illustrate differences in the temporal responses provided by neural systems. Depending on the energy landscape, three arbitrary inputs (magenta dots) are transformed into different stable states (gray dots). Learning processes increase or decrease the strength of the connections among nodes in a network, thereby altering its energy landscape and reshaping temporal responses toward existing attractors. Attractors are defined as low-energy states (bottom of the basins) at the end point of the temporal responses to multiple starting inputs. A, The landscape is characterized by multiple shallow attractors: these allow slow temporal responses, transforming multiple inputs into multiple weakly stable states. Noise and changes in the incoming input easily determine new responses toward different attractors. B, In this second illustrative configuration, steep and vast attractors characterize the energy landscape, allowing quick state transitions toward two equilibrium points. This new configuration is able to resist noise and minor changes in the incoming input and, at the same time, allows a differentiation of inputs into two broad categories. C, Finally, the third energy landscape illustrates the presence of a parasitic attractor, exemplifying the condition of addiction: all inputs now fall to the bottom of a single steep basin. Under this condition, noise and changes in the incoming input determine temporal responses that keep falling into the same attractor, therefore preventing the system from executing different behaviors. D, Neural architecture used to simulate neural dynamics and behavior for the mean-field neural model.
The activity in the dorsal cortico-striatal circuit is responsible for the motor output of the system (left circuit), while activity in the ventral cortico-striatal circuit is responsible for goal selection (right circuit). The two systems bias each other via corticocortical connectivity, and learning processes affect the weights of the connections between the two cortical outputs and the striatum in their corresponding circuits. The components in the architecture are labeled as follows: cortex (Cx), thalamus (Th), globus pallidus pars externa and interna (GPe and GPi), substantia nigra pars reticulata (SNr), subthalamic nucleus (STN), and striatum (Str), divided into two areas enriched by either D1 or D2 dopamine receptors.

We simulate the temporal responses in cortico-striatal circuits in a neural model (for an illustrative representation of the neural architecture, see Fig. 1D). This neural model simulates mean-field activity (Deco et al., 2008) within multiple channels of both dorsal and ventral cortico-striatal loops. A continuous-time differential equation simulates changes over time in the average action potential of a pool of neurons (Eq. 1), and a positive transfer function (Eq. 2) converts this action potential into the final activation of the pool. Finally, the plasticity of the connections between cortex and striatum is characterized by DA-dependent Hebbian learning, corrected with a constant threshold (th), as defined in Equation 3. The resulting rule strengthens the connections between active nodes in the cortex and active nodes in the striatum, and weakens the connections between nodes showing opposite activation status.

[Equations (1)–(3)]
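The exact forms of Equations 1–3 are not recoverable from this text, so the following is only a minimal sketch under assumed, commonly used forms: a leaky integrator τ·du/dt = −u + input for Eq. 1, a rectified tanh as the positive transfer function for Eq. 2, and a dopamine-gated, threshold-corrected covariance rule Δw = η·d·(a_cx − th)·(a_str − th) for Eq. 3. The time constant, step size, learning rate `eta`, and threshold value are hypothetical.

```python
import math


def step_potential(u: float, net_input: float, tau: float = 0.01, dt: float = 0.001) -> float:
    """One explicit Euler step of an assumed leaky integrator: tau * du/dt = -u + net_input."""
    return u + (dt / tau) * (-u + net_input)


def activation(u: float) -> float:
    """Assumed positive transfer function: a rectified tanh, so activity lies in [0, 1)."""
    return max(0.0, math.tanh(u))


def hebbian_update(w: float, a_cortex: float, a_striatum: float, dopamine: float,
                   eta: float = 0.1, th: float = 0.5) -> float:
    """Assumed DA-gated, threshold-corrected Hebbian rule for one cortico-striatal weight.

    Pairs that are both above `th` are strengthened and pairs on opposite sides of `th`
    are weakened; with no dopamine (dopamine == 0) the weight is unchanged. Note that this
    covariance-style form also strengthens pairs that are both below `th`, which may
    differ from the published Eq. 3.
    """
    return w + eta * dopamine * (a_cortex - th) * (a_striatum - th)


# A constant input drives the pool potential toward that input; activity saturates below 1.
u = 0.0
for _ in range(200):
    u = step_potential(u, net_input=1.0)
print(round(activation(u), 4))  # 0.7616
```

The multiplicative `dopamine` factor is what lets extra-physiologic dopamine bursts drive weight changes under this assumed rule; the published rule may gate learning differently.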

The input reaching each node in the neural network is modulated by two coefficients, λ and ϵ. These regulate the ratio between the signal affected by dopamine release d and the signal computed independently of dopamine release. For most units, the two coefficients are set to fixed default values; the exception is the simulated striatal units, where they are set to simulate the differential effect dopamine has depending on the prevalent receptor type (λ > 1 and λ < 0 for D1 and D2 receptors, respectively). Because of the different effects the two receptor types have on the activity of the simulated neurons, drug-induced, dopamine-dependent Hebbian learning significantly affects D1-enriched units in the striatum, while having negligible effects on D2-enriched units (Gerfen and Surmeier, 2011; Volkow and Morales, 2015).
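One way to read the role of λ and ϵ is sketched below, assuming (this form is not given in the text) that the input is the sum of a dopamine-independent part ϵ·net and a dopamine-dependent part λ·d·net. The coefficient values are hypothetical and chosen only to respect the signs stated above (λ > 1 for D1, λ < 0 for D2).

```python
def modulated_input(net: float, dopamine: float, lam: float, eps: float) -> float:
    """Assumed form of the input: a dopamine-independent part (eps * net) plus a
    dopamine-dependent part (lam * dopamine * net)."""
    return (eps + lam * dopamine) * net


# Hypothetical coefficients; only their signs follow the text.
D1 = {"lam": 1.5, "eps": 1.0}   # lam > 1: dopamine amplifies the input
D2 = {"lam": -0.5, "eps": 1.0}  # lam < 0: dopamine attenuates the input

for name, coeffs in (("D1", D1), ("D2", D2)):
    print(name, [modulated_input(1.0, d, **coeffs) for d in (0.0, 0.5, 1.0)])
# D1 [1.0, 1.75, 2.5]
# D2 [1.0, 0.75, 0.5]
```

The point of the sketch is the sign pattern: under this assumed form, a dopamine burst raises the drive to D1-enriched units and lowers it for D2-enriched units.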

Simulating different addiction phenotypes and treatment effects

Agents controlled by the neural model are immersed in a simplified environment and can select among three arbitrary actions or inactivity (cf. a nonstationary three-armed bandit). Actions are selected in the circuit simulating dorsal cortico-striatal activity, and a selection is considered complete if the neural activity of any unit in the external layer of the simulated cortex (Fig. 1D) is maintained for at least 2 s. Ventral and dorsal circuits interact bidirectionally via corticocortical connectivity: activity in the simulated ventral circuit biases action selection in the dorsal circuit, and the selection of actions in the dorsal circuit biases activity in the ventral circuit. To test our hypothesis about the effect these reciprocal biases have on choice behavior, we assumed that corticocortical weights do not vary over time and tested eleven combinations of these weights, with weight pairs set to [0.02–0.2], [0.03–0.17], [0.03–0.15], [0.05–0.15], [0.07–0.13], or [0.1–0.1] (and their mirror images). This spectrum of weights describes the strength of the biases between the two major circuits, thereby characterizing either a balanced condition or a dominance of one of the two circuits. We report the behavioral responses of these putative endophenotypes and test each of them with thirty noise seeds and random inputs under four stages, to allow within-phenotype comparisons. The first stage, “pre-drug,” assesses behavior before any drug or reward is introduced, while the three available inputs randomly change their value to produce a nonstationary order of preferences. In the second stage, termed “addiction,” one action is associated with the administration of a simulated addictive substance, triggering phasic DA responses and associated Hebbian learning in the cortico-striatal connections of both ventral and dorsal circuits.
For the third stage, termed “treatment,” we simulate the effects of deprivation coupled with one of two hypothetical treatments targeting either the dorsal or the ventral cortico-striatal circuit. The treatments are simulated by reverting the learning process in the targeted circuit, representing an intervention that would block or extinguish either the habitual drug-related response (dorsal; an ideal behavioral treatment) or the drug-related emotional and value association (ventral; an ideal cognitive treatment). The dorsal treatment restores the pre-drug configuration in the dorsal circuit and keeps the configuration reached at the end of the addiction stage in the ventral circuit; the ventral treatment does the opposite. Finally, during the fourth stage, termed “relapse,” we reintroduce access to the simulated addictive substance, inducing relapse. For this stage, relapse time is defined as the time required to reinstate the configuration of cortico-striatal weights found at the end of the addiction stage.
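The protocol above can be sketched as bookkeeping code. The sketch assumes that each circuit's cortico-striatal weights can be stored as a flat list, that "reinstating" the addiction configuration means every weight lies within a hypothetical tolerance `tol` of its end-of-addiction value, and that the order of each listed pair (which element is ventral-to-dorsal) is left unspecified. Five asymmetric pairs plus their mirror images, plus the balanced pair, give the eleven endophenotypes.

```python
from typing import Dict, List, Optional, Tuple

# Corticocortical weight pairs listed in the text; which element maps to which
# direction (ventral-to-dorsal vs dorsal-to-ventral) is not specified here.
LISTED_PAIRS: List[Tuple[float, float]] = [
    (0.02, 0.2), (0.03, 0.17), (0.03, 0.15), (0.05, 0.15), (0.07, 0.13), (0.1, 0.1),
]


def endophenotypes(pairs: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Each asymmetric pair plus its mirror image; a balanced pair appears once."""
    combos = []
    for a, b in pairs:
        combos.append((a, b))
        if a != b:
            combos.append((b, a))
    return combos


Weights = Dict[str, List[float]]  # circuit name ("dorsal"/"ventral") -> cortico-striatal weights


def apply_treatment(pre_drug: Weights, addicted: Weights, target: str) -> Weights:
    """Idealized treatment: the targeted circuit returns to its pre-drug weights, while
    the other circuit keeps the configuration reached at the end of the addiction stage."""
    treated = {name: list(w) for name, w in addicted.items()}
    treated[target] = list(pre_drug[target])
    return treated


def relapse_time(trajectory: List[Weights], addicted: Weights, tol: float = 0.01) -> Optional[int]:
    """First time step at which every weight is within `tol` (hypothetical tolerance)
    of the end-of-addiction configuration; None if that configuration is never reached."""
    for t, weights in enumerate(trajectory):
        if all(
            abs(x - y) <= tol
            for name, target_w in addicted.items()
            for x, y in zip(weights[name], target_w)
        ):
            return t
    return None


print(len(endophenotypes(LISTED_PAIRS)))  # 11
```

Running each of the eleven endophenotypes with thirty noise seeds across the four stages would then be a loop over `endophenotypes(LISTED_PAIRS)` and the seeds, with the circuit simulation itself left to a model such as the one sketched above.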

RL model

Basic model architecture and parameterization

In this model, we assume that the behavior of the agent relies on a hybrid model (Daw et al., 2011) that learns and computes the value of choices (actions, $a_t$) under each condition (state, $s_t$). Value is defined as a quantity that combines short- and long-term expected rewards and negative outcomes when a specific strategy of action (policy, π) is followed. It is formally defined as:

$$Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}, a_{t+k}) \;\middle|\; s_t = s,\ a_t = a\right] \qquad (4)$$

In Equation 4, $r(s,a)$ denotes the instantaneous reward received when action $a$ is performed in state $s$. γ is a discount factor between 0 and 1 that defines the trade-off between immediate and long-term rewards. The value of a state given the policy is defined as $V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[Q^{\pi}(s,a)\right]$. For each environment, there is an optimal policy $\pi^{*}$, which maximizes the value $V^{\pi}(s)$ for every state (Sutton and Barto, 1998).
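The role of γ in Equation 4 can be seen with a finite reward sequence. The sketch below computes the discounted sum for two hypothetical sequences, a small immediate reward and a larger delayed one; a low γ favors the immediate reward and a high γ favors the delayed one. It truncates the infinite sum and ignores the expectation, so it illustrates only the quantity inside Eq. 4.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Finite-horizon sum of gamma**k * r_k, the quantity averaged in Eq. 4."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


immediate = [1.0]          # hypothetical: a small reward now
delayed = [0.0, 0.0, 5.0]  # hypothetical: a larger reward two steps later

for gamma in (0.3, 0.5, 0.9):
    print(gamma, discounted_return(immediate, gamma), discounted_return(delayed, gamma))
```

For γ = 0.5 the delayed sequence is worth 0.25 × 5 = 1.25, already more than the immediate reward of 1.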

The environment can be completely characterized through the state transition distributions $T(s' \mid s,a)$ and the expected rewards $R(s,a)$. These two functions together represent a model of the environment. Model-based behaviors compute $Q^{MB}(s,a)$ and the policy from these functions, at each state, following the Bellman equation (Daw and Dayan, 2014):

$$Q^{MB}(s,a) = R(s,a) + \gamma \sum_{s'} T(s' \mid s,a)\, \max_{a'} Q^{MB}(s',a') \qquad (5)$$
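When the model (T and R) is fully known, repeatedly applying Equation 5 to every state-action pair is standard value iteration. The sketch below shows this unbounded case on a hypothetical two-state model, as a reference point for the bounded variant described next; the toy model and the number of sweeps are assumptions.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
Transitions = Dict[Tuple[State, Action], Dict[State, float]]  # T(s' | s, a)
Rewards = Dict[Tuple[State, Action], float]                   # R(s, a)


def bellman_backup(Q, T: Transitions, R: Rewards, s, a, gamma: float) -> float:
    """Right-hand side of Eq. 5 for one state-action pair."""
    future = sum(p * max(Q[s_next].values()) for s_next, p in T[(s, a)].items())
    return R[(s, a)] + gamma * future


def value_iteration(actions: Dict[State, List[Action]], T: Transitions, R: Rewards,
                    gamma: float, sweeps: int = 200):
    """Apply Eq. 5 to every state-action pair of a fully known model, `sweeps` times."""
    Q = {s: {a: 0.0 for a in acts} for s, acts in actions.items()}
    for _ in range(sweeps):
        for s, acts in actions.items():
            for a in acts:
                Q[s][a] = bellman_backup(Q, T, R, s, a, gamma)
    return Q


# Toy two-state model (hypothetical): "go" earns 1 and moves to state 1, which leads back.
actions = {0: ["stay", "go"], 1: ["back"]}
T = {(0, "stay"): {0: 1.0}, (0, "go"): {1: 1.0}, (1, "back"): {0: 1.0}}
R = {(0, "stay"): 0.0, (0, "go"): 1.0, (1, "back"): 0.0}
Q = value_iteration(actions, T, R, gamma=0.9)
print({s: {a: round(q, 3) for a, q in qs.items()} for s, qs in Q.items()})
# {0: {'stay': 4.737, 'go': 5.263}, 1: {'back': 4.737}}
```

Here the optimal value of state 0 solves V = 1 + 0.81·V, that is, V = 1/0.19 ≈ 5.263.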

The model-based component learns the transition distributions and the expected rewards during the interaction with the environment. Thus, unlike other hybrid models (Daw et al., 2005; Keramati et al., 2011; Pezzulo et al., 2013), the quality of its Q value estimates at any given moment depends on the experience the agent has acquired up to that point. To compute value estimates ($Q^{MB}$), this bounded (Gershman et al., 2015) component applies the Bellman equation (Eq. 5) a limited number of times at each step, to states sampled stochastically following a heuristic for efficient state-update selection. The algorithm is an early-interrupted variation of the Prioritized Sweeping algorithm (Moore and Atkeson, 1993) with stochastic state-update selection. Crucially, our model-based component does not accumulate the variations of Q values over time, and restarts the computation after each step (desJardins et al., 1999). This choice is meant to instantiate a plausible bounded rationality, which can account for the cognitive costs, and ensuing limits, of integrating old and new information about the environment while updating and extending a complex plan to navigate it. This implementation is suitable for a bounded rational model-based component that shows controlled stochasticity of deliberation performance in nontrivial environments. This choice also allows us to test the effects of the hypothesized endophenotypic differentiation in an environment that is more complex than both the one chosen for the neural model and those described in the literature on RL models of addiction. In particular, we consider drug consumption to be associated with complex aftereffects that make it difficult to predict the overall result of pursuing the related course of action.
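A minimal sketch of such a bounded planner follows, not the authors' implementation. It assumes that the model is learned from transition counts and reward sums, that priorities are Bellman-residual magnitudes propagated to predecessor states (as in prioritized sweeping), that states are drawn with probability proportional to priority plus a small floor, and that `n_updates` (a hypothetical name) is the per-step backup budget. As described above, the Q table restarts from zero at every call to `plan`.

```python
import random
from collections import defaultdict


class BoundedModelBased:
    """Bounded model-based component: learned model plus a few stochastic backups per step."""

    def __init__(self, n_states, n_actions, gamma=0.9, n_updates=20, floor=1e-3, seed=0):
        self.n_states, self.n_actions = n_states, n_actions
        self.gamma, self.n_updates, self.floor = gamma, n_updates, floor
        self.rng = random.Random(seed)
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                  # (s, a) -> summed reward
        self.visits = defaultdict(int)                        # (s, a) -> visit count
        self.preds = defaultdict(set)                         # s' -> {(s, a) that led to s'}

    def observe(self, s, a, r, s_next):
        """Update the learned transition and reward model with one experienced step."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1
        self.preds[s_next].add((s, a))

    def _backup(self, Q, s, a):
        """Eq. 5 with the empirical model; never-tried pairs keep a value of 0."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0
        future = sum(c / n * max(Q[s2]) for s2, c in self.counts[(s, a)].items())
        return self.reward_sum[(s, a)] / n + self.gamma * future

    def plan(self):
        """Return a fresh Q^MB table built from a limited, stochastic set of backups."""
        Q = [[0.0] * self.n_actions for _ in range(self.n_states)]  # restart every step
        known = sorted({s for (s, _a) in self.visits})
        if not known:
            return Q
        # Initial priority: Bellman residual of each known state under the all-zero table.
        prio = {s: max(abs(self._backup(Q, s, a)) for a in range(self.n_actions)) for s in known}
        for _ in range(self.n_updates):
            s = self.rng.choices(known, weights=[prio[k] + self.floor for k in known])[0]
            new_row = [self._backup(Q, s, a) for a in range(self.n_actions)]
            change = max(abs(x - y) for x, y in zip(new_row, Q[s]))
            Q[s], prio[s] = new_row, 0.0
            for sp, ap in self.preds[s]:  # prioritized sweeping: raise predecessors' priority
                prio[sp] += change * self.counts[(sp, ap)][s] / self.visits[(sp, ap)]
        return Q


# Toy chain (hypothetical): 0 -> 1 -> 2, with a reward of 1 for leaving state 1.
agent = BoundedModelBased(n_states=3, n_actions=1, n_updates=200)
for s, r, s_next in ((0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 2)):
    agent.observe(s, 0, r, s_next)
print(agent.plan())
```

With a large budget the reward reaches state 0 (value 0.9); with `n_updates=1` it cannot, which is the kind of truncated look-ahead that, together with a high immediate reward and an incomplete model, the text holds responsible for model-based drug seeking.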

In comparison with other hybrid models such as Dyna and Dyna2 (Sutton, 1990; Silver et al., 2016), the proposed architecture does not share Q values between model-based and model-free components, nor does it require the two processes to share the same state representations. The two components represent their Q values separately and integrate them at a later stage. This decoupling is assumed to result in a more biologically plausible agent (Daw and Dayan, 2014), and it is required to simulate two separate treatments, which is essential to establish a comparison with the behavior simulated with the neural model. In contrast with previous work using a hybrid Dyna-like architecture and the prioritized sweeping algorithm, where the sharing of Q values explained the appearance of model-based drug-oriented behavior (Simon and Daw, 2012), in our simulations model-based addiction emerges even though the model-free and model-based components are independent. Thus, addiction behavior results from the joint effect of a high reward (i.e., the drug), a limited number of stochastically selected policy updates, and limited knowledge of the environment.

The model-free component has been implemented using the Q-learning algorithm in tabular form (Watkins and Dayan, 1992). Q-learning updates its state-action value estimates as follows:

$$\delta_t = r(s_t,a_t) + \gamma \max_{a'} Q^{MF}(s_{t+1},a') - Q^{MF}(s_t,a_t) \qquad (6)$$

$$Q^{MF}(s_t,a_t) \leftarrow Q^{MF}(s_t,a_t) + \alpha\,\delta_t \qquad (7)$$

where α is a learning factor between 0 and 1. Our hybrid model computes choice values by balancing model-free (MF in the equations) and model-based (MB in the equations) components according to a parameter β. Six values (1, 0.8, 0.6, 0.4, 0.2, 0) are used for this parameter to simulate different endophenotypes, on a spectrum between purely model-based (β = 1) and purely model-free (β = 0) RL.
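The tabular Q-learning step (Eqs. 6 and 7) fits in a few lines. The sketch assumes Q is stored as a list of per-state lists of action values; the two-state table and the repeated reward of 10 are hypothetical, chosen to show how a large, reliable reward pulls the cached value toward its target.

```python
from typing import List


def q_learning_update(Q: List[List[float]], s: int, a: int, r: float, s_next: int,
                      alpha: float, gamma: float) -> float:
    """Tabular Q-learning step; updates Q in place and returns the prediction error."""
    delta = r + gamma * max(Q[s_next]) - Q[s][a]  # reward prediction error (Eq. 6)
    Q[s][a] += alpha * delta                      # move the estimate toward the target (Eq. 7)
    return delta


# Hypothetical 2-state, 2-action table: taking action 1 in state 0 repeatedly pays 10.
Q = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(5):
    q_learning_update(Q, s=0, a=1, r=10.0, s_next=1, alpha=0.5, gamma=0.9)
print(Q[0])  # [0.0, 9.6875]
```

With α = 0.5 the estimate halves its distance to the target (10) at each step: 5, 7.5, 8.75, 9.375, 9.6875.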

To allow exploration, the action to execute is selected randomly 10% of the time. This exploration factor is kept constant to support adaptation to a changing environment (Singh et al., 2000) and to simulate the continuous update of knowledge necessary to cope with ecological environments. The remaining 90% of the time, actions are chosen by maximizing $Q^{MX}(s,a)$, a strategy known as ε-greedy (ε = 0.1). These values are produced by combining the values computed by the model-based and model-free components:

$$Q^{MX}(s,a) = \beta\, Q^{MB}(s,a) + (1-\beta)\, Q^{MF}(s,a) \qquad (8)$$
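Action selection can be sketched as a linear mix of the two value tables followed by ε-greedy choice. The linear form is assumed to match Eq. 8 (β = 1 purely model-based, β = 0 purely model-free), random tie-breaking is an added assumption, and the action values are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def mixed_values(q_mb: Sequence[float], q_mf: Sequence[float], beta: float) -> List[float]:
    """Assumed linear form of Eq. 8: beta = 1 is purely model-based, beta = 0 purely model-free."""
    return [beta * mb + (1.0 - beta) * mf for mb, mf in zip(q_mb, q_mf)]


def epsilon_greedy(q_values: Sequence[float], epsilon: float = 0.1,
                   rng: Optional[random.Random] = None) -> int:
    """With probability epsilon pick any action uniformly; otherwise pick a maximizer
    (ties broken at random, an assumption)."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    return rng.choice([i for i, q in enumerate(q_values) if q == best])


# Hypothetical values for three actions in one state.
q_mb = [1.0, 0.2, 0.0]  # the planner prefers action 0
q_mf = [0.1, 0.0, 1.0]  # cached habits prefer action 2
for beta in (1.0, 0.6, 0.4, 0.0):
    q_mx = mixed_values(q_mb, q_mf, beta)
    print(beta, max(range(3), key=q_mx.__getitem__))
```

Here the greedy action switches from the planner's choice to the habitual one between β = 0.6 and β = 0.4. With ε = 0.1 and three actions, the greedy action is still picked with probability 0.9 + 0.1/3 ≈ 0.93, because random exploration can also land on it.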

The choice of a fixed balance between model-based and model-free components requires minimal assumptions about their interaction and has been used in recent RL architectures (Silver et al., 2016).

Simulating different addiction phenotypes and treatment effects

Compared with the simulations of the neural model, a more complex environment is used for the RL model, to highlight how our endophenotypic differentiations can also affect the likelihood of developing addiction. This environment comprises a total of 20 states divided into four types (Fig. 2): (1) healthy rewards (i.e., normal rewards that are not directly associated with drugs); (2) neutral states (no reward or negative outcome); (3) drug-related states, which give a high reward but are followed by multiple (4) drug aftereffect states, characterized by small negative outcomes. As in the neural model investigations, the agent deals with environmental variations meant to simulate four phases of addiction: an initial pre-drug phase (f1); addiction (i.e., the drug becomes accessible for the first time; f2); treatment (f3); and relapse (i.e., second drug exposure; f4). In the initial pre-drug phase ($d_{\text{init}} = 50$ steps), the agent does not receive any reward or negative outcome by entering the drug-related and aftereffects area, but a moderate reward ($R_g = 1$) is assigned for accessing the healthy reward state. In the addiction and post-treatment addiction phases ($d_{\text{tpy}} = 1000$ steps), the agent can also receive a high reward after accessing a drug-related state ($R_d = 10$). The drug state always leads to a series of randomized state transitions among the aftereffect states ($R_a = -1.2$), which simulate generic negative consequences associated with addiction. The agent can occasionally leave this aftereffect area of the environment (Fig. 2) to reach a neutral state, at the price of a further negative outcome ($R_a = -4$). In the treatment phase ($d_{\text{tpy}} = 1000$ steps), the drug-related state results in a negative outcome ($R_{dt} = -1$; Tables 1, 2, column f3), thus increasing the chances that the agent stops pursuing this state.
To allow comparison with the results of the neural model, we simulate model-based and model-free treatments by decreasing the learning rate of the untreated control modality: $\alpha_C^{tpy} = 0.01\,\alpha$. During the relapse phase, after the drug is reintroduced into the environment, we measure the simulated time the agents require to reach at least 95% of the drug-related action preference recorded during the addiction phase. This threshold is used to measure, per endophenotype, the percentage of agents relapsing and the time required to complete the relapse.
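The phase schedule, the treatment manipulation, and the relapse criterion described above can be sketched as follows. Phase durations and drug-state rewards are taken from the text (the four durations sum to the 3050 steps per simulation reported for Fig. 5); the function names are hypothetical, and drug-related preference is assumed to be one scalar per step (for instance, the probability of choosing the drug action), which may not match how the MATLAB code records it.

```python
from typing import List, Optional, Sequence, Tuple

# Phase durations in steps (f1-f4); together they give 3050 steps per simulation.
PHASES: List[Tuple[str, int]] = [
    ("f1_pre_drug", 50),
    ("f2_addiction", 1000),
    ("f3_treatment", 1000),
    ("f4_relapse", 1000),
]

# Reward for entering the drug state in each phase (R_d = 10, R_dt = -1).
DRUG_REWARD = {"f1_pre_drug": 0.0, "f2_addiction": 10.0,
               "f3_treatment": -1.0, "f4_relapse": 10.0}


def phase_at(step: int) -> str:
    """Return the name of the phase containing a given simulation step."""
    start = 0
    for name, duration in PHASES:
        if step < start + duration:
            return name
        start += duration
    raise ValueError("step beyond the end of the simulation")


def treatment_learning_rates(alpha: float, treated: str) -> Tuple[float, float]:
    """Return (alpha_mb, alpha_mf) during f3: the untreated modality learns at 0.01 * alpha."""
    slowed = 0.01 * alpha
    if treated == "model_free":
        return slowed, alpha
    if treated == "model_based":
        return alpha, slowed
    raise ValueError("treated must be 'model_free' or 'model_based'")


def relapse_time(pref_f4: Sequence[float], pref_f2: float,
                 fraction: float = 0.95) -> Optional[int]:
    """Steps into f4 until drug preference reaches 95% of its f2 level; None if it never does."""
    target = fraction * pref_f2
    for t, pref in enumerate(pref_f4):
        if pref >= target:
            return t
    return None
```

An agent for which `relapse_time` returns `None` would count as not relapsing, so the percentage of relapsing agents per endophenotype is the fraction of non-`None` results.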

Figure 2.

Illustrative representation of the environment used for the RL model of addiction. The states are arranged linearly: at one end is a healthy reward state (1); at the opposite end, a drug state (8) followed by twelve aftereffects states (9–22). Healthy reward and drug states are separated by six neutral states (2–7). The agent can move between nearby neutral states. From the two borders of the central segment of neutral states, the agent can enter the healthy reward state (from state 2), securing a moderate reward ($R_g = 1$), or the drug state (from state 7), receiving an initial high reward ($R_d = 10$, during the phase of addiction) followed by a series of sparse but temporally extended negative outcomes, which characterize the aftereffects states. The presence of negative outcomes makes entering the drug and aftereffects area suboptimal during all experimental phases (see optimal policy in Table 3). From both the goal state and the drug/aftereffects segment, the agent is then returned to the middle of the neutral segment. In this representation, we explicitly portray the transitions related to states 1 (healthy reward), 4 (neutral), and 15 and 20 (drug aftereffects) for illustrative purposes. Line width represents the related transition probability. Line and text color represent the action class ($a_s$, $a_g$, $a_w$, $a_d$). Neutral states are navigable with actions $a_{s2}$–$a_{s7}$, which are deterministic for adjacent states but have a high chance of failing for distant states. From the neutral states, the agent can reach (1) the healthy reward, by executing action $a_g$ in state 2, and (2) the drug state (8) and aftereffects area (states 9–22), by executing action $a_d$ in state 7. From the healthy reward state, the agent can issue $a_g$ again, receiving a reward of 1 and returning to the center of the neutral area, state 4. By entering the drug area, the agent receives a reward of 10.
Action outcomes in the drug/aftereffects area are probabilistic: the agent can reach a nearby state in the area or leave the area and reach the center of the neutral segment. Leaving the drug/aftereffects area costs -4, whereas every other transition inside the area costs -1.2. For a full description of transitions and their probability distributions in the environment, see Tables 1, 2, 4, and 5.
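The layout in Fig. 2 can be sketched as a small transition function for the addiction phase. State numbering follows the caption above. The exit and failure probabilities are hypothetical placeholders for the values in Tables 1 and 5, and the within-area random walk (±1 steps, with the drug state assumed to lead to state 9) is a simplification introduced for illustration, not the published transition structure.

```python
import random
from typing import Optional, Tuple

HEALTHY = 1
NEUTRAL = range(2, 8)   # neutral states 2-7
DRUG = 8
AFTER = range(9, 23)    # aftereffects states 9-22, numbered as in the Fig. 2 caption
CENTER = 4              # middle of the neutral segment

R_G, R_D, R_A, R_EXIT = 1.0, 10.0, -1.2, -4.0  # addiction-phase outcomes from the text

# Hypothetical probabilities: placeholders for the values in Tables 1 and 5.
P_EXIT = 0.1           # chance of leaving the aftereffects area on a given step
P_FAIL_DISTANT = 0.9   # chance that a move to a non-adjacent neutral state fails


def step(state: int, action: str, target: Optional[int],
         rng: random.Random) -> Tuple[int, float]:
    """One addiction-phase transition.

    action is "s" (move to neutral state `target`), "g" (healthy) or "d" (drug).
    Inside the drug/aftereffects area the action is ignored (an assumption).
    """
    if state == HEALTHY:
        # Issuing a_g again collects the healthy reward and returns to the center.
        return (CENTER, R_G) if action == "g" else (state, 0.0)
    if state in NEUTRAL:
        if action == "g" and state == 2:
            return HEALTHY, 0.0
        if action == "d" and state == 7:
            return DRUG, R_D
        if action == "s" and target in NEUTRAL:
            adjacent = abs(target - state) <= 1
            if adjacent or rng.random() >= P_FAIL_DISTANT:
                return target, 0.0
        return state, 0.0  # failed or unavailable action: stay put
    if state == DRUG:
        return AFTER.start, R_A  # assumed: the drug state always leads into state 9
    if state in AFTER:
        if rng.random() < P_EXIT:
            return CENTER, R_EXIT  # costly exit back to the neutral center
        neighbor = state + rng.choice((-1, 1))
        return min(max(neighbor, AFTER.start), AFTER.stop - 1), R_A
    raise ValueError(f"unknown state {state}")
```

The small per-step cost inside the area combined with the larger exit cost is what makes the drug branch suboptimal overall even though its entry reward is ten times the healthy one.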

Table 1.

Environment transition probabilities across endophenotypes controlled by the RL model

Table 2.

Environment rewards across endophenotypes controlled by the RL model

Table 3.

Optimal policy across endophenotypes controlled by the RL model (2nd drug phase)

Table 4.

Agent model parameters across endophenotypes controlled by the RL model

Table 5.

Environment parameters across endophenotypes controlled by the RL model

Code accessibility

All models rely on custom code developed in MATLAB (optimized for R2014b) that has been run successfully on multiple operating systems (macOS, Linux, and Windows) on different computers and local servers. The code is available from the ModelDB repository (http://modeldb.yale.edu/239540). The downloadable archive consists of two folders (one for the neural model and one for the RL model), which include the entire source code required to replicate the data reported in our Results section. Code available as Extended Data Code File 1.

Extended Data Code File 1.

To access the source code of both models, visit the ModelDB website (https://senselab.med.yale.edu/modeldb/enterCode.cshtml?model=239540) and download the archive. The structure of the source code is documented in the commented main files “separate_test.m” and “RunExperimentLearning96.m,” located in the folders “neural_model” and “RL_model,” respectively.

Results

Simulations from the neural field model

During all stages, the three stimuli change randomly every few seconds, putatively representing a dynamic fluctuation of the values associated with perceived cues in a nonstationary environment. To behave optimally, the agents must adapt rapidly to these changes, transiently triggering the motor response associated with the most valuable cue. During the pre-drug stage, dorsal and ventral circuits perform unbiased selections, collaborating to generate a near-optimal sequence of motor selections. All eleven endophenotypes show near-uniform distributions of action selections, consistent with the random distribution of input configurations (Fig. 3A). This control stage allows the simulated network to generate transient temporal responses that couple multiple initial states with multiple stable states, in a transient winner-take-all or winnerless competition (Rabinovich et al., 2006; Afraimovich et al., 2008).

Figure 3.

Distribution of action selections across endophenotypes controlled by the neural model. Histograms show how the distribution of simulated action selections changes depending on the endophenotype (11 variations in corticocortical connectivity weights). Thirty random seeds/inputs are used per endophenotype, tested under two stages: pre-drug (A) and addiction (B). The three colors represent the occurrence of selections of three arbitrary actions. In the pre-drug stage, no reward is provided, and action selections are triggered by random fluctuations in the values of competing sensory inputs. The simulations show that the agents adapt to the changes in sensory stimuli and therefore exhibit a near-uniform distribution of action selections. Conversely, in the addiction stage, the action represented in blue is associated with administration of the simulated drug, triggering DA-dependent Hebbian learning in cortico-striatal connectivity and, consequently, overselection. Under addiction, the differences among endophenotypes clearly emerge in the selection frequency of the action leading to drug consumption. Asymmetric control (endophenotypes 1–3 and 9–11) leads to stronger overselection than balanced control (endophenotypes 4–7), despite identical learning processes and reward encoding.

During the simulated addiction stage, one of the actions is associated with drug administration (Fig. 3B, values represented in blue). Substance use triggers phasic dopamine bursts, leading to Hebbian learning in the cortico-striatal connections of both dorsal and ventral circuits (Eq. 3). In recurrent networks, circuit gain increases as a direct function of the weights of reentrant synapses (Amit, 1989). A dopamine response triggered by healthy unexpected rewards would create a bias toward the selection of the reinforced motor response to a perceived cue (Cohen and Frank, 2009; Grahn et al., 2009; Baldassarre et al., 2013). However, drug consumption triggers extra-physiologic dopamine-dependent learning, which in our model results in aberrantly high circuit gain, compromising the ability of all affected circuits to discriminate among different inputs and to produce temporal transitions toward multiple stable states (cf. Fiore et al., 2014). The cortico-striatal circuits become overstable and resistant to perturbations caused by a change of input or by noise, as they are dominated by parasitic attractors (Hoffman and McGlashan, 2001; Fig. 1C). In the ventral cortico-striatal circuit, a parasitic attractor sets and maintains the selection of drug-related goals or outcomes, biasing the action-outcome assessments required for planning. In the dorsal circuit, the same process determines overstable selections of the reinforced motor behavior, generating reactive responses and habits. Importantly, the learning process simulated in our neural model leads to the generation of parasitic attractors in both circuits across all endophenotypes, as all agents eventually reach a fixed threshold in cortico-striatal neural plasticity. Although all endophenotypes develop a form of compulsive drug-seeking behavior, we observe significant differences in motor response patterns as a function of the balance between ventral and dorsal circuits.
Specifically, the endophenotypes characterized by unbalanced dorsal or ventral control (i.e., Fig. 3B, endophenotypes 1–3 and 9–11) express distributions of motor selections that are significantly more compromised by drug-related aberrant rewards than those of balanced endophenotypes (i.e., Fig. 3B, endophenotypes 5–7). Because the learning processes, and the associated attractor formation in both ventral and dorsal circuits, are identical across endophenotypes, all phenotypic differences can be ascribed unambiguously to the only remaining independent variable, which controls corticocortical connectivity and therefore the strength of the biases between circuits. In unbalanced agents, actions leading to drug consumption are selected more frequently than in balanced endophenotypes, with increases ranging from +3% to +45%. This range places all phenotypes within the limits of individual differentiation described in the study chosen for behavioral comparison (Gannon et al., 2017).
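The link between recurrent weights, circuit gain, and overstability described above can be illustrated with a deliberately minimal toy, not the neural field model itself: two mutually inhibiting rate units whose self-excitation `w_self` stands in for the reentrant cortico-striatal weights. All parameters (inputs 0.6/0.4, inhibition 1.0, Euler step 0.1) are hypothetical. With weak recurrence, the winner follows the input when the input preference reverses; with strong recurrence, the first winner persists, a simple analogue of a parasitic attractor.

```python
from typing import List, Tuple


def clip01(x: float) -> float:
    """Piecewise-linear transfer function bounded in [0, 1]."""
    return min(max(x, 0.0), 1.0)


def simulate(w_self: float, inhibition: float = 1.0, dt: float = 0.1,
             steps_per_phase: int = 500) -> Tuple[List[float], List[float]]:
    """Two competing units; phase 1 favors unit 0, phase 2 favors unit 1.

    Returns the firing rates at the end of each phase (Euler integration).
    """
    r = [0.0, 0.0]
    ends = []
    for inputs in ((0.6, 0.4), (0.4, 0.6)):
        for _ in range(steps_per_phase):
            drive0 = inputs[0] + w_self * r[0] - inhibition * r[1]
            drive1 = inputs[1] + w_self * r[1] - inhibition * r[0]
            r = [r[0] + dt * (-r[0] + clip01(drive0)),
                 r[1] + dt * (-r[1] + clip01(drive1))]
        ends.append(list(r))
    return ends[0], ends[1]


def winner(rates: List[float]) -> int:
    """Index of the more active unit."""
    return 0 if rates[0] > rates[1] else 1


if __name__ == "__main__":
    for w in (0.2, 0.9):
        end1, end2 = simulate(w)
        print(f"w_self={w}: winner after phase 1 = {winner(end1)}, after phase 2 = {winner(end2)}")
```

In this toy, raising `w_self` plays the role of drug-driven Hebbian potentiation: the same input reversal that switches the low-gain circuit fails to move the high-gain one.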

Next, we investigate how the simulated endophenotypes behave during the stages of treatment and relapse. First, we measure the frequency of drug-related action selections during the stages of addiction and treatment (Fig. 4A,B). Both ventral (goal-oriented) and dorsal (habitual) treatments effectively reduce the number of actions associated with drug consumption relative to baseline addiction. However, the dorsal treatment is more effective for dorsal-dominated endophenotypes, and the ventral treatment is more effective for ventral-dominated endophenotypes. These endophenotype-specific treatment effects are further confirmed by our analysis of individual differences in the relapse stage (Fig. 4C,D): dorsal treatments are more effective in prolonging time to relapse for dorsal-dominated endophenotypes, whereas ventral treatments are more successful in delaying relapse for ventral-dominated endophenotypes. This analysis shows that simulated treatments focusing on either the dorsal circuit (and therefore habitual responses) or the ventral circuit (and therefore motivational responses) can have substantially different effects, depending on the balance between dorsal and ventral circuits. Importantly, these differences emerge only after the treatment is applied: a pre-treatment comparison between the compulsive behaviors expressed by the opposite unbalanced endophenotypes (i.e., ventral-dominant or dorsal-dominant) does not show any significant difference in choice selections (Fig. 3B, endophenotypes 1–3 and 9–11).

Figure 4.

Severity of addiction and relapse time across endophenotypes controlled by the neural model. Shaded error lines report mean and standard error for 30 simulated agents across endophenotypes (11 variations in corticocortical connectivity weights). A, B, Selections of actions leading to substance consumption, as a percentage of the overall number of action selections. In the first case (A), we compare the values recorded during the addiction stage with those recorded during the stage of dorsal treatment jointly with abstinence (i.e., drug-related actions do not trigger self-administration of a drug, and the treatment targets the dorsal circuit). In the second case (B), the comparison involves addiction and ventral treatment (treatment targeting the ventral circuit, during abstinence). C, D, We compare the simulated time required by the 11 endophenotypes to reach an arbitrary threshold of cortico-striatal connectivity during the stage of addiction and during the stage of relapse after either dorsal (C) or ventral (D) treatment. Within the time of a simulation run, all simulated agents reached the addiction threshold. The two treatments are simulated by restoring either the dorsal/motor (A, C) or the ventral/outcome circuit (B, D) to the configuration characterizing the pre-drug stage. The percentages of action selections show that the dorsal treatment is more effective in endophenotypes characterized by high dorsal dominance (A), whereas the ventral treatment only has an effect in endophenotypes characterized by high ventral dominance (B). Similarly, dorsal and ventral treatments result in long relapse times in endophenotypes characterized by high dorsal and high ventral dominance, respectively; *, significant difference: p < .05.
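The two measures in Fig. 4C,D, treatment as a reset of one circuit and time to reach a connectivity threshold, can be sketched with a generic three-factor (dopamine-gated Hebbian) rule. This is not the paper's Eq. 3, which is not reproduced here; the learning rate, weight bound, single-synapse simplification, and helper names are hypothetical.

```python
from typing import Dict, Optional


def three_factor_update(w: float, pre: float, post: float, dopamine: float,
                        eta: float, w_max: float) -> float:
    """Dopamine-gated Hebbian update, dw = eta * DA * pre * post, bounded by w_max."""
    return min(w + eta * dopamine * pre * post, w_max)


def time_to_threshold(w0: float, threshold: float, eta: float = 0.05,
                      dopamine: float = 1.0, pre: float = 1.0, post: float = 1.0,
                      w_max: float = 10.0, max_steps: int = 100_000) -> Optional[int]:
    """Number of updates until w reaches the threshold; None if it never does."""
    w = w0
    for t in range(max_steps):
        if w >= threshold:
            return t
        w = three_factor_update(w, pre, post, dopamine, eta, w_max)
    return None


def treat(weights: Dict[str, float], pre_drug: Dict[str, float], circuit: str) -> Dict[str, float]:
    """Restore one circuit ('dorsal' or 'ventral') to its pre-drug weight."""
    treated = dict(weights)
    treated[circuit] = pre_drug[circuit]
    return treated
```

In a single isolated synapse, relapse after a reset takes exactly as long as the first acquisition; the endophenotype differences in Fig. 4C,D instead arise from the interaction between circuits, which this sketch deliberately leaves out.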

Simulations from the RL model

By simulating explicit negative outcomes associated with drug consumption, the RL model allows us to measure the likelihood that each agent develops addiction as a function of its endophenotype. In our analysis, addiction is defined as a behavior in which drug selections are more frequent than selections of the healthy alternative reward during the addiction phase. Across endophenotypes, the mean percentage of addicted agents (over 300 runs) was 43.05%, which is consistent with the percentage of rats developing compulsive self-administration of cocaine reported in the reference study (∼40% over a period of 5 d; cf. Gannon et al., 2017). Importantly, when endophenotype differentiation is considered, the percentage varies significantly: 60.3% for β = 0, 40.3% for β = 0.2, 30.1% for β = 0.4, 36.7% for β = 0.6, 39.3% for β = 0.8, and 51.6% for β = 1 (Fig. 5A,B). This phenotypic differentiation is consistent with well-established data from animal models. For instance, rat strains selectively bred for either high or low voluntary running differ in their likelihood of developing addiction when given free access to cocaine (∼35% and ∼60% of each strain, respectively, develop addiction over a period of 5 d; cf. Smethells et al., 2016). Free access to substances of abuse does not necessarily lead to compulsive behaviors (Piazza et al., 1989; Belin et al., 2011), as addiction varies as a function of factors such as extent of exposure, amount of drug delivered, and associated negative effects (Pelloux et al., 2007; Jonkman et al., 2012). Our simulations suggest that endophenotypes with lower chances of addiction are characterized by balanced control modalities. Note that an optimal agent, knowing the environment structure and able to compute the long-term effects of the drug, would never select drug states (Table 3).
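The addiction criterion and the across-endophenotype average can be checked in a few lines. The per-endophenotype percentages are those reported above; the helper names are hypothetical. Averaging the six reported values reproduces the 43.05% figure, and the minimum falls at β = 0.4, consistent with lower vulnerability under balanced control.

```python
from typing import Dict, Sequence, Tuple


def is_addicted(drug_choices: int, healthy_choices: int) -> bool:
    """Criterion from the text: drug choices outnumber healthy-reward choices in f2."""
    return drug_choices > healthy_choices


def percent_addicted(choice_counts: Sequence[Tuple[int, int]]) -> float:
    """Percentage of agents whose (drug, healthy) choice counts meet the criterion."""
    flags = [is_addicted(drug, healthy) for drug, healthy in choice_counts]
    return 100.0 * sum(flags) / len(flags)


# Percentages reported in the text for each endophenotype (beta -> % addicted).
REPORTED: Dict[float, float] = {0.0: 60.3, 0.2: 40.3, 0.4: 30.1,
                                0.6: 36.7, 0.8: 39.3, 1.0: 51.6}

mean_across = sum(REPORTED.values()) / len(REPORTED)  # about 43.05
most_resilient = min(REPORTED, key=REPORTED.get)      # beta = 0.4
```

Averaging percentages equals the pooled percentage only when every endophenotype contributes the same number of runs, which holds for the equal-sized simulation batches described for Fig. 5.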

Figure 5.

Likelihood of developing addiction and relapse time across endophenotypes controlled by the RL model. Shaded error lines report mean and standard error for ∼100 simulated agents across six endophenotypes (differential balance between model-based and model-free control modalities, β = [0, 0.2, 0.4, 0.6, 0.8, 1]). A, B, Percentage of agents developing addiction (i.e., drug-related choices are more frequent than healthy reward-related choices), per endophenotype, during the addiction and treatment phases. In the first case (A), the comparison involves data recorded during the phase of addiction and during the phase of model-free treatment. In the second case (B), the comparison involves the phases of addiction and model-based treatment. C, D, Simulated time required by the six endophenotypes to reach 95% of the drug-state action preference recorded during the phase of addiction (f2). In the first case (C), the comparison involves the phases of addiction and relapse after model-free treatment, whereas in the second case (D), it involves the phases of addiction and relapse after model-based treatment. In terms of action selection ratio, the simulated results show that both treatments have a significant effect only on phenotypes characterized by a strong imbalance of control (A, B). In terms of relapse, the results show that the model-free treatment is on average more successful than the model-based one, as five endophenotypes show no significant difference between the phases of addiction and post-treatment addiction (i.e., the time required to relapse is not significantly different from the time required to develop addiction the first time). Each endophenotype, or parameter selection, was simulated 100 times across the four phases (3050 steps per simulation).
Results depend on the statistics of the environment, but they were qualitatively similar across similar environments; *, significant difference: p < .05.

Finally, the simulations suggest that the hypothetical treatment targeting model-free control is the most effective, reducing the likelihood of pursuing drug-related behaviors for all endophenotypes (Fig. 5A). In contrast, the model-based treatment appears to be less effective for all endophenotypes, with the exception of the purely model-based one (β = 1; Fig. 5B). In the relapse phase, our data confirm that the simulated treatments differ significantly in their effectiveness across the proposed endophenotypes, and they again suggest that the treatment targeting model-free control is the most successful in prolonging relapse time (Fig. 5C,D). Relapse time after model-free treatment is mostly similar to the time required to develop addiction behavior before any treatment (Fig. 5C). At the opposite end of the control spectrum, the model-based treatment shows a positive effect only for the purely model-based endophenotype (β = 1). All remaining endophenotypes show relapse times significantly shorter than those recorded for the first development of addiction (Fig. 5D).

Discussion

Individual differences in stress and anxiety responses (Dilleen et al., 2012; Jimenez and Grant, 2017), social dominance (Morgan et al., 2002; Covington and Miczek, 2005), aggressive temperament (McClintick and Grant, 2016), preference for saccharin (Carroll et al., 2002), sensation or novelty seeking (Suto et al., 2001; Nadal et al., 2002; Belin et al., 2011; Flagel et al., 2014), impulsivity (Perry and Carroll, 2008; Verdejo-García et al., 2008; Dalley et al., 2011), and sensitivity to rewards (Belcher et al., 2014) have all been found, in both animal models and clinical studies in humans, to be associated with addiction vulnerabilities, in particular with the likelihood of developing and maintaining addiction or of resisting treatment (Piazza et al., 1989; Belin et al., 2016; Everitt and Robbins, 2016). However, investigations into the mechanisms underlying this phenotypic differentiation in addiction have so far revealed few neural or computational candidates, and these are associated with diverse and dissociable behavioral traits. An important example is the endophenotypic differentiation reported in the expression and reactivity of striatal D2 dopaminergic receptors, which is negatively correlated with the traits of impulsivity (Dalley et al., 2007), social dominance (Morgan et al., 2002), and sensitivity to rewards (Belcher et al., 2014), and nonlinearly correlated with novelty preference (Flagel et al., 2014). The overlap of this endophenotypic trait across multiple, noncoexisting phenotypes associated with addiction vulnerabilities suggests that other neural or computational mechanisms remain to be identified to account for the reported variety of behavioral traits.

Here, we have presented a neural field model, augmented by an RL model, to expand on existing neuropsychological and computational accounts of addiction. Our models provide a theoretical investigation of the interaction among cortico-striatal circuits or behavioral control modalities, and of the effects this interaction has on addiction development and treatment response. As in classic models (Redish, 2004, 2008; Dayan, 2009), we have assumed that overevaluation of a drug leads to aberrant dopamine release and associated overlearning in multiple DA targets (Volkow and Morales, 2015; Koob and Volkow, 2016). In the neural field model, this mechanism results in the dysregulation of the circuit gain and associated dynamics of both ventral and dorsal cortico-striatal circuits (Fiore et al., 2014; Hauser et al., 2016). In the integrated model-based and model-free RL model, sequential choice behavior is confounded by the presence of a high immediate reward (drug state). This leads the agent to misrepresent the negative outcomes following drug consumption whenever their distribution across states and time is complex enough to exceed the agent's ability to represent the environment correctly (Doll and Daw, 2016; Sadacca et al., 2016). Both models jointly indicate that the balance between neural circuits or behavioral control modalities is a candidate neurocomputational mechanism characterizing endophenotypes in addiction. The neural and RL models converge in suggesting that individuals characterized by balanced behavioral control between reward seeking or planning (ventral circuit/model-based) and reactive or habitual responses (dorsal circuit/model-free) would have a reduced chance of developing addiction, and less severe symptoms if they did develop it.
We propose that this neurocomputational mechanism may interact with other known endophenotypic differentiations, such as alterations of D2 receptors in the striatum (Morgan et al., 2002; Nader and Czoty, 2005; Dalley et al., 2007; Volkow et al., 2007; Belcher et al., 2014; Flagel et al., 2014) or differences in learning rates (Gutkin et al., 2006; Piray et al., 2010), to generate the multifaceted behavioral traits that have been reported in the literature to be associated with addiction vulnerabilities.

In our neural model, ventral and dorsal circuits are mostly in phase in their selections during the pre-drug stage, exhibiting synchronous transient stability of neural activity and enhancing the overall ability of the system to adapt to changing stimuli (i.e., the two circuits adapt to input changes at a similar pace and synchronize their selections). During the addiction stage, the two circuits are mostly pulled toward the parasitic attractor state associated with drug consumption, and they only occasionally select the competing non-drug stimuli. If only one of the two systems performs a selection outside the attractor, the difference in selection generates dissonance, or interference. In neural endophenotypes characterized by unbalanced control, this dissonance is resolved by one circuit taking the lead, so that both systems eventually converge on the selection of the dominant circuit. These dynamics leave limited opportunities to generate non-drug-related responses to external stimuli, as such responses can only be generated by the dominant circuit. Conversely, in balanced control endophenotypes, if either circuit ignores the drug stimulus and selects a competing option, the resulting dissonance can trigger a state transition that pulls the system out of the parasitic attractor states associated with substance use. The endophenotypes in our simulations vary only in the parameters regulating the balance between circuits, as the dopamine-driven learning processes established between cortex and striatum (Eq. 3) do not vary across endophenotypes, resulting in identical habit formation and drug-related biases in the outcome representations.
Thus, our proposed phenotypic differentiation does not interfere with the usual roles ascribed to the ventral and dorsal circuits, namely their respective involvement in the initial reward-seeking phase of addiction (Belin and Everitt, 2008; Willuhn et al., 2012) and in the subsequent consolidation of stimulus–response (habitual) associations (Everitt and Robbins, 2013, 2016). However, our simulated dynamics show that, once addiction has developed, systemic overstability can be reduced or further enhanced, depending on the corticocortical biases between cortico-striatal circuits. In turn, this modulation of system stability can foster or further impair input discrimination and motor response versatility, affecting addiction symptomatology. As a result, our neural model shows phenotypic variability that emerges only after the reward simulating the drug is presented and addiction develops, as a gradient of overselection of drug-related actions.

With the RL model, we investigate whether the balance between model-based and model-free modalities also increases the robustness of the system against the selection of drug states in a more complex environment and in the presence of explicit negative outcomes. As in the neural model, a system with balanced control modalities introduces more diversity in action selection during exploration, reducing (yet not eliminating) the chances of developing maladaptive reactive responses. This increased diversity and overall reliability are likely to result from greater redundancy and diversification of the system: while both components may fail, the causes of their failures are not necessarily correlated. The model-based system can fail because of its sensitivity to the availability of cognitive resources, but it is more efficient in encoding the agent's previous experience. The model-free component, on the other hand, is affected by limited exploration but is reliable in its selections, which do not depend on the availability of cognitive resources. Consistent with the neural model, behavioral differences among endophenotypes follow an inverted-U profile of resilience, in which unbalanced control systems are the most vulnerable to developing addiction.

The phenomenon of relapse is more elusive, and the two models do not fully converge on this aspect. To investigate it, we have adapted the complexity of real-world treatments to the capabilities of our simulated agents and environments, in which we can easily manipulate or extinguish consolidated memory but cannot engage all the other aspects commonly involved in addiction treatment, such as cognitive or emotional functions or the development of new behavioral strategies to compete with drug-related habits. Therefore, we implemented two compartmentalized treatments, which we consider ideal reference models, each targeting only a single decision system or circuit. These putatively represent treatments capable of affecting only drug-related emotional/value or habitual/motor associations. In the neural model, balanced dorsal and ventral endophenotypes respond well to both types of simulated treatment. For the unbalanced endophenotypes, however, only the appropriate treatment, targeting the dominant neural circuit, is effective. The simulations in the RL model do not show the same symmetric effects for the two treatments: the model-free treatment is effective for most endophenotypes, whereas the model-based treatment is mostly unsuccessful, with short relapse times across all endophenotypes except the purely model-based one. The latter result is possibly due to the learning process of the model-based component, which is affected by conflicting information, as drug use is associated with both positive and negative outcomes that the agent experiences when entering the drug state in different phases.

It is worth noting that habitual and goal-oriented behaviors have neural representations in the dorsal and ventral cortico-striatal circuits, respectively, but they do not fully overlap with model-based and model-free control modalities in RL (Dolan and Dayan, 2013). Nonetheless, the neural and RL models independently simulate choices among competing options in addiction. Thus, we have been able to test our hypothesis of endophenotypic differentiation at two of Marr's three complementary levels of analysis: the neural implementation and the algorithmic level (Marr and Poggio, 1976). This multilevel modeling approach has often been used in computational psychiatry (Maia and Frank, 2011; Montague et al., 2012; Adams et al., 2016; Hauser et al., 2016; Huys et al., 2016) to highlight model convergence and to associate specific neural structures and dynamics with mathematical formalizations of optimal and suboptimal behavior in RL. The convergence of the neural and RL models on important predictions also provides more confidence in the reliability of the identified computational mechanisms underlying addiction and the associated characterization of endophenotypes. Specifically, both models indicate that individuals with unbalanced cortico-striatal activity or control modality are at higher risk of developing addiction and of relapsing after any treatment. Thus, independent of phenotype-specific treatments, our results suggest that individuals with these traits would require a prolonged or more intense treatment than balanced endophenotypes. Finally, when considering phenomena that diverge between the two models (e.g., the response across endophenotypes to our simulated treatments), our findings still demonstrate that important endophenotypic features might remain undetected in pre-treatment observable behavior.
The models showed that opposite unbalanced agents displayed similar addictive behaviors and vulnerabilities but diverged in treatment response, potentially informing the development of precision interventions. Further studies will be required to provide empirical validation of our models. For example, computational analysis of fMRI data can be used to test effective connectivity among cortico-striatal circuits (Friston et al., 2003), in conjunction with cognitive tasks targeting the model-based and model-free control systems.

Acknowledgments

Acknowledgements: We thank Prof. Karl Friston for his comments and kind suggestions in shaping this manuscript.

Footnotes

  • The authors declare no competing financial interests.

  • This work is supported by the Dallas Foundation and a startup grant from University of Texas at Dallas.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

  1. Adams RA, Huys QJ, Roiser JP (2016) Computational psychiatry: towards a mathematically informed understanding of mental illness. J Neurol Neurosurg Psychiatry 87:53–63. doi:10.1136/jnnp-2015-310737 pmid:26157034
  2. Afraimovich V, Tristan I, Huerta R, Rabinovich MI (2008) Winnerless competition principle and prediction of the transient dynamics in a Lotka-Volterra model. Chaos 18:043103. doi:10.1063/1.2991108
  3. Amit DJ (1989) Modeling brain function: the world of attractor neural networks. Cambridge; New York: Cambridge University Press.
  4. Baldassarre G, Mannella F, Fiore VG, Redgrave P, Gurney K, Mirolli M (2013) Intrinsically motivated action-outcome learning and goal-based action recall: a system-level bio-constrained computational model. Neural Netw 41:168–187. doi:10.1016/j.neunet.2012.09.015 pmid:23098753
  5. Balleine BW (2005) Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol Behav 86:717–730. doi:10.1016/j.physbeh.2005.08.061 pmid:16257019
  6. Balleine BW, O'Doherty JP (2010) Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35:48–69. doi:10.1038/npp.2009.131
  7. Belcher AM, Volkow ND, Moeller FG, Ferré S (2014) Personality traits and vulnerability or resilience to substance use disorders. Trends Cogn Sci 18:211–217. doi:10.1016/j.tics.2014.01.010 pmid:24612993
  8. Belin D, Everitt BJ (2008) Cocaine seeking habits depend upon dopamine-dependent serial connectivity linking the ventral with the dorsal striatum. Neuron 57:432–441. doi:10.1016/j.neuron.2007.12.019 pmid:18255035
  9. Belin D, Deroche-Gamonet V (2012) Responses to novelty and vulnerability to cocaine addiction: contribution of a multi-symptomatic animal model. Cold Spring Harb Perspect Med 2. doi:10.1101/cshperspect.a011940
  10. Belin D, Mar AC, Dalley JW, Robbins TW, Everitt BJ (2008) High impulsivity predicts the switch to compulsive cocaine-taking. Science 320:1352–1355. doi:10.1126/science.1158136 pmid:18535246
  11. Belin D, Berson N, Balado E, Piazza PV, Deroche-Gamonet V (2011) High-novelty-preference rats are predisposed to compulsive cocaine self-administration. Neuropsychopharmacology 36:569–579. doi:10.1038/npp.2010.188 pmid:20980989
  12. Belin D, Belin-Rauscent A, Everitt BJ, Dalley JW (2016) In search of predictive endophenotypes in addiction: insights from preclinical research. Genes Brain Behav 15:74–88. doi:10.1111/gbb.12265 pmid:26482647
  13. Bellman R (1966) Dynamic programming. Science 153:34–37. doi:10.1126/science.153.3731.34 pmid:17730601
  14. Carroll ME, Morgan AD, Lynch WJ, Campbell UC, Dess NK (2002) Intravenous cocaine and heroin self-administration in rats selectively bred for differential saccharin intake: phenotype and sex differences. Psychopharmacology (Berl) 161:304–313. doi:10.1007/s00213-002-1030-5 pmid:12021834
  15. Cohen MX, Frank MJ (2009) Neurocomputational models of basal ganglia function in learning, memory and choice. Behav Brain Res 199:141–156. doi:10.1016/j.bbr.2008.09.029 pmid:18950662
  16. Covington HE 3rd, Miczek KA (2005) Intense cocaine self-administration after episodic social defeat stress, but not after aggressive behavior: dissociation from corticosterone activation. Psychopharmacology (Berl) 183:331–340. doi:10.1007/s00213-005-0190-5
  17. Dalley JW, Fryer TD, Brichard L, Robinson ES, Theobald DE, Lääne K, Peña Y, Murphy ER, Shah Y, Probst K, Abakumova I, Aigbirhio FI, Richards HK, Hong Y, Baron JC, Everitt BJ, Robbins TW (2007) Nucleus accumbens D2/3 receptors predict trait impulsivity and cocaine reinforcement. Science 315:1267–1270. doi:10.1126/science.1137073 pmid:17332411
  18. Dalley JW, Everitt BJ, Robbins TW (2011) Impulsivity, compulsivity, and top-down cognitive control. Neuron 69:680–694. doi:10.1016/j.neuron.2011.01.020 pmid:21338879
  19. Daw ND, Dayan P (2014) The algorithmic anatomy of model-based evaluation. Philos Trans R Soc Lond B Biol Sci 369:20130478. doi:10.1098/rstb.2013.0478
  20. Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8:1704–1711. doi:10.1038/nn1560 pmid:16286932
  21. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on humans' choices and striatal prediction errors. Neuron 69:1204–1215. doi:10.1016/j.neuron.2011.02.027 pmid:21435563
  22. Dayan P (2009) Dopamine, reinforcement learning, and addiction. Pharmacopsychiatry 42:S56–S65. doi:10.1055/s-0028-1124107
  23. Deco G, Jirsa VK, Robinson PA, Breakspear M, Friston K (2008) The dynamic brain: from spiking neurons to neural masses and cortical fields. PLoS Comput Biol 4:e1000092. doi:10.1371/journal.pcbi.1000092 pmid:18769680
  24. desJardins ME, Durfee EH, Ortiz CL Jr, Wolverton MJ (1999) A survey of research in distributed, continual planning. AI Mag 20:13–22.
  25. Dilleen R, Pelloux Y, Mar AC, Molander A, Robbins TW, Everitt BJ, Dalley JW, Belin D (2012) High anxiety is a predisposing endophenotype for loss of control over cocaine, but not heroin, self-administration in rats. Psychopharmacology (Berl) 222:89–97. doi:10.1007/s00213-011-2626-4 pmid:22245944
  26. Dolan RJ, Dayan P (2013) Goals and habits in the brain. Neuron 80:312–325. doi:10.1016/j.neuron.2013.09.007 pmid:24139036
  27. Doll BB, Daw ND (2016) The expanding role of dopamine. Elife 5. doi:10.7554/eLife.15963
  28. Doll BB, Jacobs WJ, Sanfey AG, Frank MJ (2009) Instructional control of reinforcement learning: a behavioral and neurocomputational investigation. Brain Res 1299:74–94. doi:10.1016/j.brainres.2009.07.007 pmid:19595993
  29. Draganski B, Kherif F, Klöppel S, Cook PA, Alexander DC, Parker GJ, Deichmann R, Ashburner J, Frackowiak RS (2008) Evidence for segregated and integrative connectivity patterns in the human basal ganglia. J Neurosci 28:7143–7152. doi:10.1523/JNEUROSCI.1486-08.2008 pmid:18614684
  30. Economidou D, Pelloux Y, Robbins TW, Dalley JW, Everitt BJ (2009) High impulsivity predicts relapse to cocaine-seeking after punishment-induced abstinence. Biol Psychiatry 65:851–856. doi:10.1016/j.biopsych.2008.12.008 pmid:19181308
  31. Ersche KD, Turton AJ, Pradhan S, Bullmore ET, Robbins TW (2010) Drug addiction endophenotypes: impulsive versus sensation-seeking personality traits. Biol Psychiatry 68:770–773. doi:10.1016/j.biopsych.2010.06.015 pmid:20678754
  32. Everitt BJ, Robbins TW (2013) From the ventral to the dorsal striatum: devolving views of their roles in drug addiction. Neurosci Biobehav Rev 37:1946–1954. doi:10.1016/j.neubiorev.2013.02.010 pmid:23438892
  33. Everitt BJ, Robbins TW (2016) Drug addiction: updating actions to habits to compulsions ten years on. Annu Rev Psychol 67:23–50. doi:10.1146/annurev-psych-122414-033457 pmid:26253543
  34. Fiore VG, Dolan RJ, Strausfeld NJ, Hirth F (2015) Evolutionarily conserved mechanisms for the selection and maintenance of behavioural activity. Philos Trans R Soc Lond B Biol Sci 370:20150053. doi:10.1098/rstb.2015.0053
  35. Fiore VG, Sperati V, Mannella F, Mirolli M, Gurney K, Friston K, Dolan RJ, Baldassarre G (2014) Keep focussing: striatal dopamine multiple functions resolved in a single mechanism tested in a simulated humanoid robot. Front Psychol 5:124. doi:10.3389/fpsyg.2014.00124
  36. Fiore VG, Rigoli F, Stenner MP, Zaehle T, Hirth F, Heinze HJ, Dolan RJ (2016) Changing pattern in the basal ganglia: motor switching under reduced dopaminergic drive. Sci Rep 6:23327. doi:10.1038/srep23327 pmid:27004463
  37. Fiore VG, Nolte T, Rigoli F, Smittenaar P, Gu X, Dolan RJ (2018) Value encoding in the globus pallidus: fMRI reveals an interaction effect between reward and dopamine drive. Neuroimage 173:249–257. doi:10.1016/j.neuroimage.2018.02.048 pmid:29481966
  38. Flagel SB, Robinson TE, Clark JJ, Clinton SM, Watson SJ, Seeman P, Phillips PE, Akil H (2010) An animal model of genetic vulnerability to behavioral disinhibition and responsiveness to reward-related cues: implications for addiction. Neuropsychopharmacology 35:388–400. doi:10.1038/npp.2009.142 pmid:19794408
  39. Flagel SB, Waselus M, Clinton SM, Watson SJ, Akil H (2014) Antecedents and consequences of drug abuse in rats selectively bred for high and low response to novelty. Neuropharmacology 76 Pt B:425–436. doi:10.1016/j.neuropharm.2013.04.033 pmid:23639434
  40. Friston KJ, Harrison L, Penny W (2003) Dynamic causal modelling. Neuroimage 19:1273–1302. doi:10.1016/S1053-8119(03)00202-7
  41. Gannon BM, Galindo KI, Rice KC, Collins GT (2017) Individual differences in the relative reinforcing effects of 3,4-methylenedioxypyrovalerone under fixed and progressive ratio schedules of reinforcement in rats. J Pharmacol Exp Ther 361:181–189. doi:10.1124/jpet.116.239376 pmid:28179474
  42. Garrison KA, Potenza MN (2014) Neuroimaging and biomarkers in addiction treatment. Curr Psychiatry Rep 16:513. doi:10.1007/s11920-014-0513-5 pmid:25308385
  43. Gerfen CR, Surmeier DJ (2011) Modulation of striatal projection systems by dopamine. Annu Rev Neurosci 34:441–466. doi:10.1146/annurev-neuro-061010-113641 pmid:21469956
  44. Gershman SJ, Horvitz EJ, Tenenbaum JB (2015) Computational rationality: a converging paradigm for intelligence in brains, minds, and machines. Science 349:273–278. doi:10.1126/science.aac6076 pmid:26185246
  45. Gillan CM, Kosinski M, Whelan R, Phelps EA, Daw ND (2016) Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. Elife 5. doi:10.7554/eLife.11305
  46. Gould RW, Duke AN, Nader MA (2014) PET studies in nonhuman primate models of cocaine abuse: translational research related to vulnerability and neuroadaptations. Neuropharmacology 84:138–151. doi:10.1016/j.neuropharm.2013.02.004 pmid:23458573
  47. Grahn JA, Parkinson JA, Owen AM (2009) The role of the basal ganglia in learning and memory: neuropsychological studies. Behav Brain Res 199:53–60. doi:10.1016/j.bbr.2008.11.020 pmid:19059285
  48. Gruber AJ, McDonald RJ (2012) Context, emotion, and the strategic pursuit of goals: interactions among multiple brain systems controlling motivated behavior. Front Behav Neurosci 6:50. doi:10.3389/fnbeh.2012.00050 pmid:22876225
  49. Gutkin BS, Dehaene S, Changeux JP (2006) A neurocomputational hypothesis for nicotine addiction. Proc Natl Acad Sci USA 103:1106–1111. doi:10.1073/pnas.0510220103 pmid:16415156
  50. Haber S (2008) Parallel and integrative processing through the basal ganglia reward circuit: lessons from addiction. Biol Psychiatry 64:173–174. doi:10.1016/j.biopsych.2008.05.033 pmid:18617023
  51. Haber SN (2003) The primate basal ganglia: parallel and integrative networks. J Chem Neuroanat 26:317–330. pmid:14729134
  52. Hauser TU, Fiore VG, Moutoussis M, Dolan RJ (2016) Computational psychiatry of ADHD: neural gain impairments across Marrian levels of analysis. Trends Neurosci 39:63–73. doi:10.1016/j.tins.2015.12.009 pmid:26787097
  53. Hoffman RE, McGlashan TH (2001) Neural network models of schizophrenia. Neuroscientist 7:441–454. doi:10.1177/107385840100700513 pmid:11597103
  54. Huys QJ, Maia TV, Frank MJ (2016) Computational psychiatry as a bridge from neuroscience to clinical applications. Nat Neurosci 19:404–413. doi:10.1038/nn.4238 pmid:26906507
  55. Hyman SE, Malenka RC, Nestler EJ (2006) Neural mechanisms of addiction: the role of reward-related learning and memory. Annu Rev Neurosci 29:565–598. doi:10.1146/annurev.neuro.29.051605.113009 pmid:16776597
  56. Jahanshahi M, Obeso I, Rothwell JC, Obeso JA (2015) A fronto-striato-subthalamic-pallidal network for goal-directed and habitual inhibition. Nat Rev Neurosci 16:719–732. doi:10.1038/nrn4038 pmid:26530468
  57. Jimenez VA, Grant KA (2017) Studies using macaque monkeys to address excessive alcohol drinking and stress interactions. Neuropharmacology 122:127–135. doi:10.1016/j.neuropharm.2017.03.027 pmid:28347838
  58. Jonkman S, Pelloux Y, Everitt BJ (2012) Drug intake is sufficient, but conditioning is not necessary for the emergence of compulsive cocaine seeking after extended self-administration. Neuropsychopharmacology 37:1612–1619. doi:10.1038/npp.2012.6
  59. Jupp B, Dalley JW (2014) Behavioral endophenotypes of drug addiction: etiological insights from neuroimaging studies. Neuropharmacology 76 Pt B:487–497. doi:10.1016/j.neuropharm.2013.05.041 pmid:23756169
  60. Keramati M, Dezfouli A, Piray P (2011) Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput Biol 7:e1002055. doi:10.1371/journal.pcbi.1002055 pmid:21637741
  61. Koob GF, Volkow ND (2016) Neurobiology of addiction: a neurocircuitry analysis. Lancet Psychiatry 3:760–773. doi:10.1016/S2215-0366(16)00104-8 pmid:27475769
  62. Maia TV, Frank MJ (2011) From reinforcement learning models to psychiatric and neurological disorders. Nat Neurosci 14:154–162. doi:10.1038/nn.2723 pmid:21270784
  63. Marr D, Poggio T (1976) From understanding computation to understanding neural circuitry. Cambridge: Massachusetts Institute of Technology, Artificial Intelligence Laboratory.
  64. McClintick MN, Grant KA (2016) Aggressive temperament predicts ethanol self-administration in late adolescent male and female rhesus macaques. Psychopharmacology (Berl) 233:3965–3976. doi:10.1007/s00213-016-4427-2 pmid:27627910
  65. Molander AC, Mar A, Norbury A, Steventon S, Moreno M, Caprioli D, Theobald DE, Belin D, Everitt BJ, Robbins TW, Dalley JW (2011) High impulsivity predicting vulnerability to cocaine addiction in rats: some relationship with novelty preference but not novelty reactivity, anxiety or stress. Psychopharmacology (Berl) 215:721–731. doi:10.1007/s00213-011-2167-x pmid:21274702
  66. Montague PR, Dolan RJ, Friston KJ, Dayan P (2012) Computational psychiatry. Trends Cogn Sci 16:72–80. doi:10.1016/j.tics.2011.11.018 pmid:22177032
  67. Moore A, Atkeson CG (1993) Prioritized sweeping: reinforcement learning with less data and less time. Mach Learn 13:103–130. doi:10.1007/BF00993104
  68. Morgan D, Grant KA, Gage HD, Mach RH, Kaplan JR, Prioleau O, Nader SH, Buchheimer N, Ehrenkaufer RL, Nader MA (2002) Social dominance in monkeys: dopamine D2 receptors and cocaine self-administration. Nat Neurosci 5:169–174. doi:10.1038/nn798 pmid:11802171
  69. Nadal R, Armario A, Janak PH (2002) Positive relationship between activity in a novel environment and operant ethanol self-administration in rats. Psychopharmacology (Berl) 162:333–338. doi:10.1007/s00213-002-1091-5
  70. Nader MA, Czoty PW (2005) PET imaging of dopamine D2 receptors in monkey models of cocaine abuse: genetic predisposition versus environmental modulation. Am J Psychiatry 162:1473–1482. doi:10.1176/appi.ajp.162.8.1473 pmid:16055768
  71. Nestler EJ, Aghajanian GK (1997) Molecular and cellular basis of addiction. Science 278:58–63. pmid:9311927
  72. Obeso JA, Rodriguez-Oroz MC, Stamelou M, Bhatia KP, Burn DJ (2014) The expanding universe of disorders of the basal ganglia. Lancet 384:523–531. doi:10.1016/S0140-6736(13)62418-6 pmid:24954674
  73. Pelloux Y, Everitt BJ, Dickinson A (2007) Compulsive drug seeking by rats under punishment: effects of drug taking history. Psychopharmacology (Berl) 194:127–137. doi:10.1007/s00213-007-0805-0 pmid:17514480
  74. Pelloux Y, Murray JE, Everitt BJ (2015) Differential vulnerability to the punishment of cocaine related behaviours: effects of locus of punishment, cocaine taking history and alternative reinforcer availability. Psychopharmacology (Berl) 232:125–134. doi:10.1007/s00213-014-3648-5 pmid:24952093
  75. Perry JL, Carroll ME (2008) The role of impulsive behavior in drug abuse. Psychopharmacology (Berl) 200:1–26. doi:10.1007/s00213-008-1173-0 pmid:18600315
  76. Pezzulo G, Rigoli F, Chersi F (2013) The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front Psychol 4:92. doi:10.3389/fpsyg.2013.00092 pmid:23459512
  77. Piazza PV, Deminière JM, Le Moal M, Simon H (1989) Factors that predict individual vulnerability to amphetamine self-administration. Science 245:1511–1513. pmid:2781295
  78. Piray P, Keramati MM, Dezfouli A, Lucas C, Mokri A (2010) Individual differences in nucleus accumbens dopamine receptors predict development of addiction-like behavior: a computational approach. Neural Comput 22:2334–2368. doi:10.1162/NECO_a_00009 pmid:20569176
  79. Rabinovich MI, Varona P, Selverston AI, Abarbanel HDI (2006) Dynamical principles in neuroscience. Rev Mod Phys 78. doi:10.1103/RevModPhys.78.1213
  80. Redish AD (2004) Addiction as a computational process gone awry. Science 306:1944–1947. doi:10.1126/science.1102384 pmid:15591205
  81. Redish AD, Jensen S, Johnson A (2008) A unified framework for addiction: vulnerabilities in the decision process. Behav Brain Sci 31:415–437; discussion 437–487. doi:10.1017/S0140525X0800472X pmid:18662461
  82. Sadacca BF, Jones JL, Schoenbaum G (2016) Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework. Elife 5. doi:10.7554/eLife.13665
  83. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599. pmid:9054347
  84. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489. doi:10.1038/nature16961 pmid:26819042
  85. Simon DA, Daw ND (2012) Dual-system learning models and drugs of abuse. In: Computational neuroscience of drug addiction, Ed 1 (Gutkin B, Ahmed SH, eds), pp 145–161. New York: Springer.
  86. Singh S, Jaakkola T, Littman ML, Szepesvári C (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Mach Learn 38:287–308. doi:10.1023/A:1007678930559
  87. Smethells JR, Zlebnik NE, Miller DK, Will MJ, Booth F, Carroll ME (2016) Cocaine self-administration and reinstatement in female rats selectively bred for high and low voluntary running. Drug Alcohol Depend 167:163–168. doi:10.1016/j.drugalcdep.2016.08.020 pmid:27567437
  88. Suto N, Austin JD, Vezina P (2001) Locomotor response to novelty predicts a rat's propensity to self-administer nicotine. Psychopharmacology (Berl) 158:175–180. doi:10.1007/s002130100867
  89. Sutton RS (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the seventh international conference on machine learning, pp 216–224. Austin, TX: Morgan Kaufmann Publishers Inc.
  90. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Cambridge, MA: MIT Press.
  91. Verdejo-García A, Lawrence AJ, Clark L (2008) Impulsivity as a vulnerability marker for substance-use disorders: review of findings from high-risk research, problem gamblers and genetic association studies. Neurosci Biobehav Rev 32:777–810. doi:10.1016/j.neubiorev.2007.11.003 pmid:18295884
  92. Volkow ND, Morales M (2015) The brain on drugs: from reward to addiction. Cell 162:712–725. doi:10.1016/j.cell.2015.07.046 pmid:26276628
  93. Volkow ND, Fowler JS, Wang GJ, Swanson JM, Telang F (2007) Dopamine in drug abuse and addiction: results of imaging studies and treatment implications. Arch Neurol 64:1575–1579. doi:10.1001/archneur.64.11.1575 pmid:17998440
  94. Voon V, Reiter A, Sebold M, Groman S (2017) Model-based control in dimensional psychiatry. Biol Psychiatry 82:391–400. doi:10.1016/j.biopsych.2017.04.006 pmid:28599832
  95. Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8:279–292. doi:10.1007/BF00992698
  96. Willuhn I, Burgeno LM, Everitt BJ, Phillips PE (2012) Hierarchical recruitment of phasic dopamine signaling in the striatum during the progression of cocaine use. Proc Natl Acad Sci USA 109:20703–20708. doi:10.1073/pnas.1213460109 pmid:23184975
  97. Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19:181–189. doi:10.1111/j.1460-9568.2004.03095.x

Synthesis

Reviewing Editor: Gustavo Deco, Universitat Pompeu Fabra

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Marc Exton-McGuinness, Vincent Van Waes.

Rev 1:

The authors describe a computational model for the propensity of different striatal-based endophenotypes to drug addiction and simulate the effect of ventral/dorsal based treatments on relapse. The simulations are well conducted using established computational frameworks. The rationale is strong, as there is significant heterogeneity in addiction; moreover, endophenotypic differences may predict optimal treatment routes. My main criticism is the foundation of the dorsal/ventral (model-free/based) endophenotype model is not communicated as clearly as it could be in the introduction and methods; however, my points are all relatively minor and the study would be of interest to a broad readership, suitable for publication with minor modification. A point by point review is provided below.

1) Line 39: “two fundamental alterations...” It would be useful to explicitly state at some point in the introduction the hypothesis of dorsal/ventral balance in addiction, as this forms the basis of the model construction.

2) Line 115-117: “neural mass model that has been validated”. It would be helpful to the reader to explicitly cite here to point the reader in the direction of the validation study(s).

3) Line 136: “construct validation...” I'm unsure they truly provide for validation of each other, as this is only true if they predict the same outcome - which they don't necessarily?

4) Line 195: “these putative endophenotypes...” it is not immediately clear at this point in the manuscript what endophenotypes are being tested (relates to point 1).

5) Line 204: “simulated reverting learning process in either dorsal of ventral...” I think it would be useful to link this to existing addiction treatment programs? What current behavioural treatments/counselling methods would these simulated treatments map onto?

6) Also, it might be useful to have a ‘no treatment’ control simulation to compare time to relapse following an abstinence period equivalent to the treatment period, if this can be captured by the model?

7) Line 402: It would be useful here to critically appraise the reference study (Gannon et al. 2017) definition of compulsive self-administration (i.e. time-out responding) contrasted with the present study (selection of drug over alternatives), as these are not the same.

8) Line 525: There is an opportunity here for some interesting discussion related to the hypothesis that addiction is driven by a maladaptive habit (Everitt and Robbins, 2005). Perhaps the authors could comment here on whether they believe it is most useful to conceptualise addiction as a ‘model-free’ habit for the purposes of treatment, given their finding that the model-free treatment was most effective generally, vs the incentive-salience hypothesis (Robinson and Berridge 2008)?

Rev 2:

This manuscript entitled “A Multilevel Computational Characterization of Endophenotypes in Addiction” is very well written and fascinating. The authors used two models (a neural model based on cortico-striatal circuit dynamics, and an algorithmic model of reinforcement learning) in an attempt to better understand the origin of interindividual differences in the vulnerability to developing an addiction. This is an elegant, novel and complementary approach to address this important theoretical question. Overall, this paper is very good, although quite difficult to read for a neophyte in computational neuroscience. It is important to note that the authors also have a good knowledge of the neurobiological mechanisms underlying addiction-related behavior. For these reasons, I think the manuscript deserves to be published in eNeuro.

Author Response

Dear editor and reviewers,

Thank you for considering our manuscript and for highlighting the merits of our study. We have now revised our manuscript after carefully considering and addressing all your previous concerns. We hope you will find this revised version appropriate for publication in eNeuro. Below, we provide a point-by-point reply to the reviewers’ comments. Please note our inline responses and excerpts from the revised manuscript.

Reviewer #1:

The authors describe a computational model for the propensity of different striatal-based endophenotypes to drug addiction and simulate the effect of ventral/dorsal based treatments on relapse. The simulations are well conducted using established computational frameworks. The rationale is strong, as there is significant heterogeneity in addiction; moreover, endophenotypic differences may predict optimal treatment routes. My main criticism is the foundation of the dorsal/ventral (model-free/based) endophenotype model is not communicated as clearly as it could be in the introduction and methods; however, my points are all relatively minor and the study would be of interest to a broad readership, suitable for publication with minor modification. A point by point review is provided below.

1) Line 39: “two fundamental alterations...” It would be useful to explicitly state at some point in the introduction the hypothesis of dorsal/ventral balance in addiction, as this forms the basis of the model construction.

Response 1: Thank you for your encouraging and constructive comments. We fully agree that further clarifying the theoretical foundation of ventral/dorsal circuits could help the reader better understand our study. We have addressed this concern by modifying the text in the introduction as follows (line 62):

“Here we propose a theoretical investigation into the interaction between ventral and dorsal cortico-striatal circuits and the associated behavioral control modalities. Several studies have emphasized that addiction is associated with alterations of ventral and dorsal cortico-striatal circuits, and of motivations and habits (Volkow and Morales, 2015; Everitt and Robbins, 2016; Koob and Volkow, 2016). However, the role played by the interaction between the two neural circuits or between the two behavioral control modalities in generating intersubject variability in addiction has so far been neglected. To investigate this interaction, we use two models to simulate neural dynamics and algorithmic (or normative) choice selections in a multiple-choice task involving drug and non-drug rewards. Then we test these models under different conditions of circuit or control modality dominance (i.e. simulated endophenotypes).”

At the beginning of the Methods section we have further amended the text, as follows (line 104):

“In brief, we present two complementary models simulating endophenotypic differences and their effects on addiction development and treatment response. In the models, intersubject differences are expressed in terms of either neural circuit dominance (i.e. ventral or dorsal circuit) or control modality dominance (i.e. model-based or model-free) in determining behavioral selections.”
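One common way to formalize control-modality dominance, offered here only as an illustration, is the weighted mixture of model-based and model-free action values that Daw et al. (2011) fitted to human choices. The sketch below is not the authors' RL model: the weight `w`, the inverse temperature `beta`, the three options and all action values are hypothetical. Setting `w` near 1 yields a "model-based dominant" agent and `w` near 0 a "model-free dominant" one.

```python
import math


def softmax(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_values(q_mb, q_mf, w):
    """Weighted mixture of model-based (q_mb) and model-free (q_mf) action values.

    w = 1 gives a purely model-based agent, w = 0 a purely model-free one;
    intermediate w values stand in for hypothetical degrees of dominance.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


# Hypothetical action values, in the order [drug, food, other].
# The cached model-free values still favour the drug, whereas the
# model-based evaluation (e.g. after the drug outcome is devalued) does not.
q_mf = [1.0, 0.4, 0.2]
q_mb = [0.1, 0.6, 0.2]

for label, w in [("MB-dominant", 0.8), ("balanced", 0.5), ("MF-dominant", 0.2)]:
    p = softmax(hybrid_values(q_mb, q_mf, w), beta=5.0)
    print(f"{label:12s} w={w:.1f}  P(drug)={p[0]:.2f}")
```

With these made-up values, lowering `w` shifts choice toward the option the model-free cache still favours. The mixture weight is one convenient knob for dominance, not necessarily the mechanism used in the paper.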

2) Line 115-117: “neural mass model that has been validated”. It would be helpful to the reader to explicitly cite here to point the reader in the direction of the validation study(s).

Response 2: We have now modified the main text to include references providing validation for both the neural model and the RL model (line 134):

“The two models comprise a neural mass model that has been validated and described in the context of choice behavior and dopaminergic modulation (Fiore et al., 2016; Hauser et al., 2016; Fiore et al., 2018) and a normative or algorithmic model based upon standard RL schemes (Sutton and Barto, 1998).”
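The model-free side of such standard RL schemes can be sketched with a Q-learning-style update (Watkins and Dayan, 1992). In this minimal sketch the task is a hypothetical single-state choice among one drug and two non-drug options with fixed rewards. Because each trial ends after one choice, there is no successor state to bootstrap from, so the Q-learning target reduces to the obtained reward and the update becomes a simple prediction-error (delta) rule. The option names, reward magnitudes, learning rate `alpha` and exploration rate `epsilon` are all assumptions chosen for illustration, not parameters of the authors' model.

```python
import random


def learn_option_values(rewards, n_trials=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Cached (model-free) values learned with a one-step Q-learning update.

    rewards: dict mapping a hypothetical option name to a deterministic reward.
    Returns a dict of learned action values.
    """
    rng = random.Random(seed)
    options = list(rewards)
    q = {a: 0.0 for a in options}
    for _ in range(n_trials):
        # epsilon-greedy choice: usually exploit the best cached value, sometimes explore
        if rng.random() < epsilon:
            a = rng.choice(options)
        else:
            a = max(options, key=q.get)
        # reward prediction error: obtained reward minus the cached value
        delta = rewards[a] - q[a]
        q[a] += alpha * delta
    return q


hypothetical_rewards = {"drug": 1.0, "food": 0.5, "social": 0.3}
values = learn_option_values(hypothetical_rewards)
for option, value in values.items():
    print(f"{option:7s} learned value = {value:.3f}")
```

Because the values are cached and change only when an option is sampled, this learner cannot revise an option's value without re-experiencing it, which is the usual contrast drawn with model-based evaluation.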

3) Line 136: ‘construct validation...’ I'm unsure they truly provide for validation of each other, as this is only true if they predict the same outcome - which they don't necessarily?

Response 3: We fully agree that there are important differences in some of the predictions formulated by the two models (e.g. concerning treatment response), as well as significant convergence on other important predictions (e.g. phenotypic robustness in the face of addictive rewards for “balanced” endophenotypes). We have now rephrased this sentence to emphasize that the validation is limited to the converging predictions (line 157):

“The unique aspect of this complementary modeling approach is that converging results from neural and algorithmic models can validate each other, as process and implementation theories (i.e., synaptic and dynamical mechanisms) complement the normative principles formalized in the RL model.”

4) Line 195: ‘these putative endophenotypes...’ it is not immediately clear at this point in the manuscript what endophenotypes are being tested (relates to point 1).

Response 4: As proposed in the above response to comment #1, we have modified both the Introduction and the Methods section to provide more explicit details about endophenotypic differentiation. We report these changes here for your convenience. In the introduction (line 63):

“Here we propose a theoretical investigation into the interaction between ventral and dorsal cortico-striatal circuits and the associated behavioral control modalities. Several studies have emphasized that addiction is associated with alterations of ventral and dorsal cortico-striatal circuits, and of motivations and habits (Volkow and Morales, 2015; Everitt and Robbins, 2016; Koob and Volkow, 2016). However, the role played by the interaction between the two neural circuits or between the two behavioral control modalities in generating intersubject variability in addiction has so far been neglected. To investigate this interaction, we use two models to simulate neural dynamics and algorithmic (or normative) choice selections in a multiple-choice task involving drug and non-drug rewards. Then we test these models under different conditions of circuit or control modality dominance (i.e. simulated endophenotypes).”

At the beginning of the Methods section (line 104):

“In brief, we present two complementary models simulating endophenotypic differences and their effects on addiction development and treatment response. In the models, intersubject differences are expressed in terms of either neural circuit dominance (i.e. ventral or dorsal circuit) or control modality dominance (i.e. model-based or model-free) in determining behavioral selections.”
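To make the notion of control-modality dominance concrete, the Python sketch below assumes a weighted-mixture formulation of the kind commonly used in hybrid model-based/model-free reinforcement-learning models: a hypothetical dominance weight `w_mb` blends model-based and model-free action values before a softmax choice rule. The function names (`mix_values`, `softmax_policy`), the parameters `w_mb` and `beta`, and the action values are illustrative assumptions, not the manuscript's implementation.

```python
import math

def mix_values(q_mb, q_mf, w_mb):
    """Blend model-based and model-free action values (hypothetical parameterization).

    w_mb = 1.0 -> purely model-based agent; w_mb = 0.0 -> purely model-free agent;
    intermediate values stand in for a more 'balanced' agent.
    """
    if not 0.0 <= w_mb <= 1.0:
        raise ValueError("w_mb must lie in [0, 1]")
    return [w_mb * mb + (1.0 - w_mb) * mf for mb, mf in zip(q_mb, q_mf)]

def softmax_policy(q_values, beta):
    """Convert action values into choice probabilities (beta = inverse temperature)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up action values for three options: [drug, food, explore]
q_mb = [0.2, 0.6, 0.3]
q_mf = [0.9, 0.4, 0.1]

for label, w in [("model-free dominant", 0.1), ("balanced", 0.5), ("model-based dominant", 0.9)]:
    probs = softmax_policy(mix_values(q_mb, q_mf, w), beta=5.0)
    print(label, [round(p, 3) for p in probs])
```

With these made-up values, the model-free-dominant agent favours the drug option and the model-based-dominant agent favours the food option, so the dominance weight alone changes which option wins.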

5) Line 204: “simulated reverting learning process in either dorsal of ventral...” I think it would be useful to link this to existing addiction treatment programs. Which current behavioural treatments or counselling methods would these simulated treatments map onto?

Response 5: Thank you for this suggestion. We fully agree that it would be very informative to both clinician and non-clinician readers if we could establish a clear parallel between addiction treatments currently carried out in clinics and our simulated ones. Such a comparison would also directly support the goal of computational psychiatry (i.e., bridging the gap between basic research and clinical work). Our “stylized therapeutic interventions” were conceived with a twofold purpose: 1) to report the neural dynamics and information processing expressed by the different endophenotypes under significantly different conditions, and 2) to represent realistic behavioural and cognitive treatments as closely as possible. In practice, the complexity of real treatments had to be scaled and adapted to the capabilities of our simulated agents and environments. Our simulations make it easy to manipulate or extinguish consolidated memory, but they cannot engage (at least not in this initial investigation) the other cognitive or emotional functions that are commonly involved in addiction treatment. Furthermore, the environment does not allow the formation of newly learned behaviors that avoid substance use. As a result, we implemented two compartmentalised treatments, which we consider ideal or abstract reference models that artificially target only a single decision system or circuit, to emphasize the different results across the simulated endophenotypes. Despite these limitations, we believe a parallel between our two simulated treatments and actual behavioural and cognitive treatments can provide important insight into the plausible processes and neural substrates underlying differences in addiction vulnerability and treatment response.

The text has been modified in the Methods as follows (line 226):

“The treatments are simulated by reverting the learning process in either the dorsal or the ventral cortico-striatal circuit, respectively representing an intervention that would block or extinguish either the habitual drug-related response (an ideal behavioral treatment) or the drug-related emotional and value association (an ideal cognitive treatment).”
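As a rough illustration of a compartmentalized treatment, the sketch below assumes that each circuit holds its own drug-related association strength and models treatment as extinction (repeated delta-rule updates toward zero reward) applied to one circuit only, leaving the other untouched. The dictionary layout, the function names, `alpha`, and `n_trials` are hypothetical; the manuscript's neural and RL models implement the reversal of learning in their own terms.

```python
import copy

def extinguish(values, action, alpha, n_trials):
    """Return a copy of `values` in which `action` has undergone extinction.

    Each trial applies a delta-rule update toward a reward of 0,
    so the drug-related association is 'unlearned' step by step.
    """
    v = dict(values)
    for _ in range(n_trials):
        v[action] += alpha * (0.0 - v[action])
    return v

def simulate_treatment(circuits, target, action="drug", alpha=0.2, n_trials=10):
    """Apply extinction to a single circuit ('dorsal' or 'ventral'); others are copied unchanged."""
    if target not in circuits:
        raise KeyError(f"unknown circuit: {target}")
    treated = copy.deepcopy(circuits)
    treated[target] = extinguish(circuits[target], action, alpha, n_trials)
    return treated

# Hypothetical association strengths at the end of the addiction phase
circuits = {
    "ventral": {"drug": 1.0, "food": 0.5},  # emotional/value associations
    "dorsal": {"drug": 1.0, "food": 0.5},   # habitual/motor associations
}

after_dorsal = simulate_treatment(circuits, target="dorsal")
print(after_dorsal["dorsal"]["drug"], after_dorsal["ventral"]["drug"])
```

Because only the targeted circuit is extinguished, the untreated circuit keeps its drug association intact, which is what makes such a treatment compartmentalized.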

In the Discussion we now report (line 546):

“The phenomenon of relapse is more elusive and the two models do not fully converge on this aspect. To investigate this phenomenon, we adapted the complexity of real-world treatments to the capabilities of our simulated agents and environments, in which we can easily manipulate or extinguish consolidated memory but cannot engage the other aspects commonly involved in addiction treatment, such as cognitive or emotional functions or the development of new behavioral strategies that compete with drug-related habits. Therefore, we implemented two compartmentalized treatments, which we consider ideal reference models that target only a single decision system or circuit. These putatively represent treatments capable of affecting only drug-related emotional/value associations or only habitual/motor associations.”

6) Also, it might be useful to have a ‘no treatment’ control simulation to compare time to relapse following an abstinence period equivalent to the treatment period, if this can be captured by the model?

Response 6: Thank you for this helpful suggestion. A “no treatment” control simulation may imply two scenarios. In the first scenario, agents are not affected by any manipulation and are simply tested as they are at the end of the addiction phase, in a new run under the no-drug condition. In this case, however, the time to relapse would be zero across all endophenotypes because, in the neural model, we consider agents to have relapsed when they reach the neural configuration recorded during the addiction phase. The second scenario entails memory decay: it assumes that simulated addicted individuals who are kept away from both the stimuli eliciting drug-related behaviour and the substance of abuse itself, without any other treatment, would partially “forget” drug-related motivations and actions, owing to a weakening of drug-related motor and goal-oriented connectivity. We tested this second scenario under different levels of decay, or “forgetting rate”. As exemplified in the results we report here, we found that the time to simulated relapse was a fraction of the time required to develop addiction the first time; stronger decay resulted in longer times to relapse, in a stable proportion across endophenotypes. Given these results, we have not amended the manuscript to describe this test, as we deemed it uninformative about a real scenario of abstinence without treatment.
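The decay-only “no treatment” scenario can be illustrated with a toy delta-rule sketch, under assumptions that are ours rather than the manuscript's: a single drug-related weight is acquired until it crosses a hypothetical “addicted” threshold, decays passively at a fixed forgetting rate during abstinence, and is then reacquired on re-exposure. `ALPHA`, `THRESHOLD`, the 50-step abstinence period, and the function names are all illustrative, and the threshold only stands in for the “addiction-phase configuration” criterion used in the neural model.

```python
def trials_to_threshold(w, alpha, threshold, max_trials=10_000):
    """Count learning trials (delta rule toward 1.0) until weight w reaches threshold."""
    n = 0
    while w < threshold and n < max_trials:
        w += alpha * (1.0 - w)
        n += 1
    return n, w

def abstain(w, forgetting_rate, n_steps):
    """Passive exponential decay of a drug-related weight during abstinence without treatment."""
    return w * (1.0 - forgetting_rate) ** n_steps

ALPHA, THRESHOLD = 0.05, 0.9  # hypothetical learning rate and 'addicted' threshold

# Initial acquisition from a naive weight of 0
t_acquire, w_addicted = trials_to_threshold(0.0, ALPHA, THRESHOLD)

for rate in (0.001, 0.01, 0.05):
    w_after = abstain(w_addicted, rate, n_steps=50)
    t_relapse, _ = trials_to_threshold(w_after, ALPHA, THRESHOLD)
    print(f"forgetting rate {rate}: relapse after {t_relapse} trials "
          f"(initial acquisition: {t_acquire})")
```

In this toy setting, reacquisition starts from a weight above zero, so it takes fewer trials than initial acquisition, and faster forgetting lengthens the time to relapse; this mirrors the qualitative pattern described above but is not a reproduction of the authors' simulations.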

7) Line 402: It would be useful here to critically appraise the definition of compulsive self-administration used in the reference study (Gannon et al. 2017), i.e. time-out responding, in contrast with the present study (selection of drug over alternatives), as these are not the same.

Response 7: We appreciate your concern regarding this matter. There are indeed important differences between the task used for our two models and the standard operant conditioning chambers used in the chosen reference study by Gannon et al. (2017). This is not an uncommon choice: although “time-out” procedures are very common in animal models of addiction, artificial agents mostly rely on multiple-option simulated setups (see, e.g., Redish, 2004, and subsequent work). A richer environment provides a better approximation of both real-world and experimental conditions, as animals are always confronted with different types of stimuli and rewards, even in the simplified environment of an operant chamber (e.g., a rodent might be driven by the desire to explore the chamber). We have modified the text at the beginning of the Methods section as follows, to better describe our reasoning on this matter (line 120):

“Importantly, the task setup chosen for both of our proposed models involves the selection of a drug reward over explicit non-drug related alternatives; in contrast, the chosen empirical study utilizes a time-out responding paradigm, where the only explicit non-drug related behavior (a lever-press) is not rewarded. As in most studies simulating addiction (e.g., Redish, 2004), we believe that presenting our simulated agents with a richer set of options (i.e., more than one) does not invalidate a parallel between simulated and real data. We consider the simulated competing options as a proxy for the many conflicting stimuli and associated behaviors that animals have access to, even in the limited environment of a standard operant conditioning chamber. Thus, our focus is on perturbing the balance between the dorsal/model-free and the ventral/model-based systems, to compare our simulated behavioral differentiations in the escalation and compulsive selection of drug-related actions with the data reported in the chosen laboratory study.”

8) Line 525: There is an opportunity here for some interesting discussion related to the hypothesis that addiction is driven by a maladaptive habit (Everitt and Robbins, 2005). Perhaps the authors could comment here on whether, given their finding that the model-free treatment was generally the most effective, they believe it is most useful to conceptualise addiction as a ‘model-free’ habit for the purposes of treatment, vs. the incentive-salience hypothesis (Robinson and Berridge 2008)?

Response 8: Thank you for suggesting this possible interpretation of our data. It is indeed interesting that the model-free (MF) treatment in the RL model is the most effective across endophenotypes, including the purely model-based one. However, the two models do not converge on this prediction, as the neural model shows a significant differentiation in predicted outcomes for the two treatments, depending on circuit dominance. Indeed, the dorsal treatment in the neural model, which represents an intervention on the neural substrate mainly responsible for MF control, fails to have a significant effect on phenotypes unbalanced in favour of the ventral circuit, which instead respond well to the ventral treatment (loosely equivalent to a model-based, MB, treatment). This divergence is due to the different constructs characterising the two models, as well as to the different key assumptions underlying the respective idealised treatments. We have amended the end of the Discussion section to clarify the merits and weaknesses underlying this difference, as follows (line 575):

“The convergence of neural and RL models on important predictions also provides more confidence in the reliability of the identified computational mechanisms underlying addiction and the associated characterization of endophenotypes. Specifically, both models indicate that individuals with unbalanced cortico-striatal activity or control modality are at higher risk of developing addiction and of relapsing after any treatment. Thus, independent of phenotype-specific treatments, our results suggest that individuals with these traits would require a prolonged or more intense treatment than balanced endophenotypes. Finally, when considering phenomena on which the two models diverge (e.g., response across endophenotypes to our simulated treatments), our findings still demonstrate that important endophenotypic features might remain undetected in pre-treatment observable behavior. The models showed that agents with opposite imbalances displayed similar addictive behaviors and vulnerabilities but diverged in treatment response, potentially informing the development of precision interventions.”

Reviewer #2:

This manuscript entitled “A Multilevel Computational Characterization of Endophenotypes in Addiction” is very well written and fascinating. The authors used two models (a neural model based on cortico-striatal circuit dynamics, and an algorithmic model of reinforcement learning) in an attempt to better understand the origin of interindividual differences in the vulnerability to develop an addiction. This is an elegant, novel and complementary approach to address this important theoretical question. Overall, this paper is very good, although quite difficult to read for a neophyte in computational neuroscience. It is important to note that the authors also have a good knowledge of the neurobiological mechanisms underlying addiction-related behavior. For these reasons, I think the manuscript deserves to be published in eNeuro.

Response 1: We thank the reviewer for this very positive and encouraging review and for the appreciation expressed for our efforts in carrying out this study.

A Multilevel Computational Characterization of Endophenotypes in Addiction
Vincenzo G. Fiore, Dimitri Ognibene, Bryon Adinoff, Xiaosi Gu
eNeuro 3 July 2018, 5 (4) ENEURO.0151-18.2018; DOI: 10.1523/ENEURO.0151-18.2018

Keywords

  • addiction
  • Neural Model
  • phenotyping
  • reinforcement learning


Copyright © 2023 by the Society for Neuroscience.
eNeuro eISSN: 2373-2822
