Research Article: Theory/New Concepts, Cognition and Behavior

A Dynamic Connectome Supports the Emergence of Stable Computational Function of Neural Circuits through Reward-Based Learning

David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh and Wolfgang Maass
eNeuro 2 April 2018, 5 (2) ENEURO.0301-17.2018; DOI: https://doi.org/10.1523/ENEURO.0301-17.2018
David Kappel
1 Institute for Theoretical Computer Science, Graz University of Technology, 8010 Graz, Austria
2 Bernstein Center for Computational Neuroscience, III Physikalisches Institut–Biophysik, Georg-August Universität, 37077 Göttingen, Germany
3 Chair for Highly-Parallel VLSI-Systems and Neuromorphic Circuits, Technische Universität Dresden, 01069 Dresden, Germany
Robert Legenstein
1 Institute for Theoretical Computer Science, Graz University of Technology, 8010 Graz, Austria
Stefan Habenschuss
1 Institute for Theoretical Computer Science, Graz University of Technology, 8010 Graz, Austria
Michael Hsieh
1 Institute for Theoretical Computer Science, Graz University of Technology, 8010 Graz, Austria
Wolfgang Maass
1 Institute for Theoretical Computer Science, Graz University of Technology, 8010 Graz, Austria

Figures & Data

Figures

Figure 1.
Illustration of the theoretical framework. A, A neural network scaffold of excitatory (blue triangles) and inhibitory (purple circles) neurons. Potential synaptic connections (dashed blue arrows) of only two excitatory neurons are shown to keep the figure uncluttered. Synaptic connections (black) from and to inhibitory neurons are assumed to be fixed for simplicity. B, A reward landscape for two parameters θ1 and θ2 with several local optima. Z-amplitude and color indicate the expected reward for the given parameters (X-Y plane). C, Example prior that prefers small values for θ1 and θ2. D, The posterior distribution that results as the product of the prior from C and the expected discounted reward of B. E, Illustration of the dynamic forces (plasticity rule, Eq. 5) that act on the parameter vector in each sampling step (black) while sampling from the posterior distribution. The deterministic term (red), which consists of the first two terms (prior and reward expectation) in Equation 5, is directed toward the next local maximum of the posterior. The stochastic term (green) of Equation 5 has a random direction. F, A single trajectory of policy sampling from the posterior distribution of D under Equation 5, starting at the black dot. The parameter vector fluctuates between different solutions and moves primarily along the task-irrelevant dimension θ2.
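The sampling dynamics sketched in panels E and F can be illustrated with a minimal simulation. The sketch below is a toy example, not the paper's implementation: it takes Euler–Maruyama steps whose deterministic term follows the gradient of a log posterior (a Gaussian prior times a single reward bump; both landscapes are invented stand-ins for B–D) and whose stochastic term is scaled Gaussian noise. The constants `beta` and `T` are illustrative, and the integration time step is absorbed into `beta`.

```python
import numpy as np

def log_posterior(theta):
    """log p*(theta) = log prior + log expected reward (toy landscape).

    A zero-mean Gaussian prior (prefers small parameters, as in Fig. 1C)
    plus a single reward bump at theta = (1, 0), mimicking Fig. 1D.
    """
    log_prior = -0.5 * np.sum(theta**2)
    log_reward = -2.0 * np.sum((theta - np.array([1.0, 0.0]))**2)
    return log_prior + log_reward

def grad_log_posterior(theta, eps=1e-5):
    """Numerical gradient of the log posterior (the deterministic drift term)."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (log_posterior(theta + d) - log_posterior(theta - d)) / (2 * eps)
    return g

def policy_sampling(theta0, beta=0.01, T=0.5, steps=5000, rng=None):
    """Euler-Maruyama steps of the sampling dynamics:
    dtheta = beta * grad log p*(theta) dt + sqrt(2 T beta) dW, with dt = 1."""
    rng = rng or np.random.default_rng(0)
    theta = np.array(theta0, dtype=float)
    traj = [theta.copy()]
    for _ in range(steps):
        drift = beta * grad_log_posterior(theta)      # deterministic term (red in E)
        noise = np.sqrt(2 * T * beta) * rng.standard_normal(theta.shape)  # stochastic term (green)
        theta = theta + drift + noise
        traj.append(theta.copy())
    return np.array(traj)

traj = policy_sampling([-2.0, 2.0])
# After burn-in the samples should concentrate near the posterior mode (0.8, 0)
# while still fluctuating, analogous to the trajectory in F.
print(traj[2500:].mean(axis=0))
```

For this toy landscape the posterior mode can be checked by hand: the drift vanishes where -θ - 4(θ - (1, 0)) = 0, i.e., at θ = (0.8, 0).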

Figure 2.
Reward-based routing of input patterns. A, Illustration of the network scaffold. A population of 20 model MSNs (blue) receives input from 200 excitatory input neurons (green) that model cortical neurons. Potential synaptic connections between these two populations were subject to reward-based synaptic sampling. In addition, fixed lateral connections provided recurrent inhibitory input to the MSNs. The MSNs were divided into two groups, each projecting exclusively to one of two target areas T1 and T2. Reward was delivered whenever the network managed to route an input pattern Pi primarily to the group of MSNs that projected to target area Ti. B, Illustration of the model for spine dynamics. Five potential synaptic connections at different states are shown. Synaptic spines are represented by circular volumes with sizes proportional to the synaptic efficacy wi for functional connections, assuming a linear correlation between spine-head volume and synaptic efficacy (Matsuzaki et al., 2001). C, Dynamics of weights wi in log scale for 10 potential synaptic connections i when the activity-dependent term in Equation 5 is set to zero. As in experimental data (Holtmaat et al., 2006, their Fig. 2I), the dynamics are in this case consistent with an Ornstein–Uhlenbeck process on the logarithmic scale. Weight values are plotted relative to the initial value at time 0. D, E, Dynamics of a model synapse when a reward-modulated STDP pairing protocol as in Yagishita et al. (2014) was applied. D, Reward delivery after repeated firing of the presynaptic neuron before the postsynaptic neuron resulted in a strong weight increase (left). This effect was reduced without reward (right) and prevented completely if no presynaptic stimulus was applied. Values in D, E represent the percentage of weight change relative to the pairing onset time (dashed line; means ± SEM over 50 synapses). Compare with Yagishita et al. (2014), their Figure 1F,G. E, Resulting changes in synaptic weights in our model as a function of the delay of reward delivery. The gray shaded rectangle indicates the time window of STDP pairing application. Reward delays denote the time between pairing and reward onset. Compare with Yagishita et al. (2014), their Figure 1O. F, The average reward achieved by the network increased quickly during learning according to Equation 5 (mean over five independent trial runs; shaded area indicates SEM). G, Synaptic parameters kept changing throughout the experiment in F. The magnitude of the change of the synaptic parameter vector θ is shown (mean ± SEM as in F; Euclidean norm, normalized to the maximum value). The parameter change peaks at the onset of learning but remains high (larger than 80% of the maximum value) even when stable performance has been reached. H, Spiking activity of the network during learning. Activities of 20 randomly selected input neurons and all MSNs are shown. Three salient input neurons (belonging to pools S1 or S2 in I) are highlighted. Most neurons have learned to fire at a higher rate for the input pattern Pj that corresponds to the target area Tj to which they project. Bottom, Reward delivered to the network. I, Dynamics of network rewiring throughout learning. Snapshots of network configurations at the times t indicated below the plots are shown. Gray lines indicate active connections between neurons; connections that were not present at the preceding snapshot are highlighted in green. All output neurons and two subsets of input neurons that fire strongly in pattern P1 or P2 are shown (pools S1 and S2, 20 neurons each). Numbers denote total counts of functional connections between pools. The connectivity was initially dense, then rapidly restructured and became sparser. Rewiring continued throughout learning. J, Analysis of random exploration in task-irrelevant dimensions of the parameter space. Projection of the parameter vector θ onto the two dPCA components that best explain the variance of the average reward. dpc1 explains >99.9% of the reward variance (dpc2 and higher dimensions <0.1%). A single trajectory of the high-dimensional synaptic parameter vector over 24 h of learning, projected onto dpc1 and dpc2, is shown. The amplitude on the y-axis denotes the estimated average reward (as a fraction of the maximum achievable reward). After converging to a region of high reward (movement mainly along dpc1), the network continues to explore task-irrelevant dimensions (movement mainly along dpc2).
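The spontaneous dynamics shown in panel C (activity-dependent term set to zero) can be reproduced qualitatively by simulating an Ornstein–Uhlenbeck process on the log weights. A minimal sketch, with an illustrative time constant and noise amplitude rather than the paper's fitted values:

```python
import numpy as np

def ou_log_weights(n_syn=10, steps=10000, dt=0.01, tau=1.0, sigma=0.3, rng=None):
    """Simulate d(log w) = -(log w - mu)/tau dt + sigma dW for n_syn synapses.

    With the activity-dependent term removed, each log weight performs a
    mean-reverting random walk around the equilibrium mu, as in Fig. 2C.
    """
    rng = rng or np.random.default_rng(1)
    mu = 0.0                          # equilibrium log-weight
    x = np.full(n_syn, mu)            # start all synapses at the equilibrium
    trace = np.empty((steps, n_syn))
    for t in range(steps):
        x += -(x - mu) / tau * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_syn)
        trace[t] = x
    return trace

trace = ou_log_weights()
# The stationary standard deviation of this OU process is sigma * sqrt(tau / 2),
# about 0.21 for the illustrative values above.
print(trace[-2000:].std())
```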

Figure 3.
Reward-based self-configuration and compensation capability of a recurrent neural network. A, Network scaffold and task schematic. Symbol convention as in Figure 1A. A recurrent network scaffold of excitatory and inhibitory neurons (large blue circle); a subset of excitatory neurons received input from afferent excitatory neurons (indicated by green shading). From the remaining excitatory neurons, two pools D and U were randomly selected to control lever movement (blue shaded areas). Bottom inset, Stereotypical movement that had to be generated to receive a reward. B, Spiking activity of the network at learning onset and after 22 h of learning. Activities of random subsets of neurons from all populations are shown (hidden: excitatory neurons of the recurrent network that are in neither pool D nor U). Bottom, Lever position inferred from the neural activity in pools D and U. Rewards are indicated by red bars. Gray shaded areas indicate cue presentation. C, Task performance quantified by the average time from cue presentation onset to movement completion. The network was able to solve this task in <1 s on average after ∼8 h of learning. A task change was introduced at time 24 h (asterisk; the functions of D and U were switched), which was quickly compensated by the network. A simplified version of the learning rule, in which the reintroduction of nonfunctional potential connections was approximated using exponentially distributed waiting times (green), yielded similar results (see also E). If the connectome was kept fixed after the task change at 24 h, performance was significantly worse (black). D, Trial-averaged network activity (top) and lever movements (bottom). Activity traces are aligned to movement onsets (arrows); the y-axes of the trial-averaged activity plots are sorted by the time of highest firing rate within the movement at various times during learning: the first and second plots are sorted by the activity at t = 0 h, the third and fourth by that at t = 22 h, and the fifth by the activity at t = 46 h. Network activity is clearly restructured through learning, with particularly stereotypical assemblies for sharp upward movements. Bottom, Average lever movement (black) and 10 individual movements (gray). E, Turnover of synaptic connections for the experiment shown in D; the y-axis is clipped at 3,000. The turnover rate during the first 2 h was around 12,000 synapses (∼25%) and then decreased rapidly. Another increase in spine turnover rate can be observed after the task change at time 24 h. F, Effect of forgetting due to parameter diffusion over 14 simulated days. Application of reward was stopped after 24 h, when the network had learned to reliably solve the task. Parameters subsequently continued to evolve according to the SDE (Eq. 5). Onset of forgetting can be observed after day 6. A simple consolidation mechanism triggered after 4 days reliably prevents forgetting. G, Histograms of time intervals between disappearance and reappearance of synapses (waiting times) for the exact (top) and approximate (bottom) learning rule. H, Relative fractions of potential synaptic connections that were stably nonfunctional, transiently decaying, transiently emerging, or stably functional during the relearning phase for the experiment shown in D. I, PCA of a random subset of the parameters θi. The plot suggests continuing dynamics in task-irrelevant dimensions after the learning goal has been reached (indicated by red color). When the functions of the neuron pools U and D were switched after 24 h, the synaptic parameters migrated to a new region. All plots show means over five independent runs (error bars: SEM).
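The simplified learning rule mentioned in panel C models the reappearance of retracted potential connections with exponentially distributed waiting times, instead of simulating their full parameter dynamics while nonfunctional. A minimal sketch of that approximation, with a placeholder rate (the paper fits this distribution to the exact rule, cf. panel G):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_reappearance_times(n_connections, rate_per_hour=0.05):
    """Draw i.i.d. exponential waiting times (in hours) until a retracted
    potential connection becomes functional again.

    rate_per_hour is a hypothetical value, not the fitted one; the mean
    waiting time is 1 / rate_per_hour.
    """
    return rng.exponential(scale=1.0 / rate_per_hour, size=n_connections)

waits = sample_reappearance_times(10000)
print(waits.mean())  # should be close to 1 / 0.05 = 20 hours
```

In a simulation loop, each nonfunctional connection would simply be scheduled to reappear after its drawn waiting time, which is far cheaper than integrating its parameter SDE below the functional threshold.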

Figure 4.
Impact of the prior distribution and reward amplitude on the synaptic dynamics. Task performance and total number of active synaptic connections throughout learning are shown for different prior distributions and distributions of initial synaptic parameters. Synaptic parameters were initially drawn from a Gaussian distribution with mean μinit and σ = 0.5. A, Task performance and number of active synapses for the parameter set used in Figure 3. B, The same for Gaussian prior distributions with different parameters. C, In addition, a Laplace prior with different parameters was tested. The prior distribution and the initial synaptic parameters had a marked effect on the task performance and overall network connectivity. D, Impact of the reward amplitude on the synaptic dynamics. Task performance is measured for different values of the factor cr that scales the amplitude of the reward signal. Dashed lines denote the task switch as in Figure 3.
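The qualitative difference between the Gaussian and Laplace priors compared here lies in the drift each contributes to the parameter dynamics, i.e., the gradient of its log density: a pull toward the mean that grows linearly with distance for the Gaussian, versus a constant-magnitude pull for the Laplace prior, which therefore favors sparser connectivity. A small sketch (the Gaussian parameters match Table 1; the Laplace scale b is illustrative):

```python
import numpy as np

def gaussian_prior_drift(theta, mu=0.0, sigma=2.0):
    """d/dtheta log N(theta; mu, sigma^2) = -(theta - mu) / sigma^2.

    Linear restoring force: weak near the mean, strong far from it.
    """
    return -(theta - mu) / sigma**2

def laplace_prior_drift(theta, mu=0.0, b=1.0):
    """d/dtheta log Laplace(theta; mu, b) = -sign(theta - mu) / b.

    Constant-magnitude pull toward the mean, independent of distance,
    which drives small parameters to zero (sparsity).
    """
    return -np.sign(theta - mu) / b

theta = np.array([-4.0, -0.5, 0.5, 4.0])
print(gaussian_prior_drift(theta))  # pull grows with |theta|: [ 1.  0.125 -0.125 -1. ]
print(laplace_prior_drift(theta))   # constant-magnitude pull: [ 1.  1. -1. -1. ]
```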

Figure 5.
Contribution of spontaneous and neural activity-dependent processes to synaptic dynamics. A, B, Evolution of synaptic weights wi plotted against time for a pair of CI synapses in A and non-CI synapses in B, for temperature T = 0.5. C, Pearson's correlation coefficient computed between synaptic weights of CI and non-CI synapses of a network with T = 0.5 after 48 h of learning as in Figure 3C,D. CI synapses were only weakly correlated, but significantly more strongly than non-CI synapses. D, Impact of T on the correlation of CI synapses (x-axis) and learning performance (y-axis). Each dot represents averaged data for one particular temperature value, indicated by the color. Values for T were 1.0, 0.75, 0.5, 0.35, 0.2, 0.15, 0.1, 0.01, 0.001, and 0.0; these values are proportional to the small vertical bars above the color bar. The performance (measured as movement completion time) was measured after 48 h for the learning experiment as in Figure 3C,D, where the task was changed after 24 h. Good performance was achieved for a range of temperature values between 0.01 and 0.5. Too low (<0.01) or too high (>0.5) values impaired learning. Means ± SEM over five independent trials are shown. E, Synaptic weights of 100 pairs of CI synapses that emerged from a run with T = 0.5. Pearson's correlation is 0.239, comparable to the experimental data in Dvorkin and Ziv (2016), their Figure 8A–D. F, Estimated contributions of activity-history-dependent (green), spontaneous synapse-autonomous (blue), and neuron-wide (gray) processes to the synaptic dynamics for a run with T = 0.15. The resulting fractions are very similar to those in the experimental data; see Dvorkin and Ziv (2016), their Figure 8E. G, Evolution of learning performance and total number of active synaptic connections for different temperatures as in D. Compensation for the task perturbation was significantly faster at higher temperatures; temperatures larger than 0.5 prevented compensation. The overall number of synapses decreased for temperatures T < 0.1 and increased for T ≥ 0.1.
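The relation between temperature and the correlation of CI synapses in panel D can be illustrated with a toy simulation: each pair of synapses shares the same activity-dependent drive but receives independent noise scaled by √(2T), so lower temperatures yield higher pairwise correlation. All dynamics and scales below are invented stand-ins, not the paper's model:

```python
import numpy as np

def simulate_ci_pairs(T, n_pairs=200, steps=2000, dt=0.01, rng=None):
    """Simulate pairs of weights (w1, w2) that share a common input drive
    (the "CI" condition) but have independent noise of amplitude sqrt(2T).
    Returns the Pearson correlation between the paired weights."""
    rng = rng or np.random.default_rng(3)
    drive = rng.standard_normal(n_pairs)           # shared drive per pair
    w1 = np.zeros(n_pairs)
    w2 = np.zeros(n_pairs)
    for _ in range(steps):
        drift = drive - 0.5 * np.array([w1, w2])   # shared drive, weak decay
        w1 += drift[0] * dt + np.sqrt(2 * T * dt) * rng.standard_normal(n_pairs)
        w2 += drift[1] * dt + np.sqrt(2 * T * dt) * rng.standard_normal(n_pairs)
    return np.corrcoef(w1, w2)[0, 1]

# Lower temperature -> weights dominated by the shared drive -> higher correlation.
print(simulate_ci_pairs(T=0.01), simulate_ci_pairs(T=1.0))
```

For this linear toy model the stationary correlation can be worked out in closed form as s/(s + 2T), where s is the variance of the shared stationary component, so the monotone decrease with T is expected.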

Figure 6.
Drifts of neural codes while performance remained constant. Trial-averaged network activity as in Figure 3D, evaluated at three different times selected from a time window in which network performance was stable (Fig. 3C). Each column shows the same trial-averaged activity plot, subject to a different sorting. Each row corresponds to one sorting criterion, based on one evaluation time.

Figure 7.
2D projections of the PCA in Figure 3I. The 3D projection as in Figure 3I, top right, and the corresponding 2D projections are shown.

Tables

    Table 1.

Parameters of the synapse model (Equations 15, 16, and 18)

Symbol | Value | Description
T      | 0.1   | Temperature
τe     | 1 s   | Time constant of eligibility trace
τg     | 50 s  | Time constant of gradient estimator
τa     | 50 s  | Time constant to estimate the average reward
α      | 0.02  | Offset to reward signals
β      | 10⁻⁵  | Learning rate
μ      | 0     | Mean of prior
σ      | 2     | STD of prior
    • Parameter values were found by fitting the experimental data of Yagishita et al. (2014). If not stated otherwise, these values were used in all experiments.
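For convenience, the fitted values of Table 1 can be collected in a plain Python dictionary for reuse in simulations; the key names are our choice, not taken from the paper's source code:

```python
# Parameters of the synapse model from Table 1 (fitted to the data of
# Yagishita et al., 2014). Key names are ours; values are from the table.
SYNAPSE_PARAMS = {
    "T": 0.1,       # temperature
    "tau_e": 1.0,   # time constant of eligibility trace (s)
    "tau_g": 50.0,  # time constant of gradient estimator (s)
    "tau_a": 50.0,  # time constant to estimate the average reward (s)
    "alpha": 0.02,  # offset to reward signals
    "beta": 1e-5,   # learning rate
    "mu": 0.0,      # mean of prior
    "sigma": 2.0,   # STD of prior
}

print(sorted(SYNAPSE_PARAMS))
```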

Extended Data


Supplementary Source Code (ZIP file).


Keywords

  • reward-modulated STDP
  • spine dynamics
  • stochastic synaptic plasticity
  • synapse-autonomous processes
  • synaptic rewiring
  • task-irrelevant dimensions in motor control


Copyright © 2023 by the Society for Neuroscience.
eNeuro eISSN: 2373-2822
