A spiking neural network model of an actor-critic learning agent

Neural Comput. 2009 Feb;21(2):301-39. doi: 10.1162/neco.2008.08-07-593.

Abstract

The ability to adapt behavior to maximize reward as a result of interactions with the environment is crucial for the survival of any higher organism. In the framework of reinforcement learning, temporal-difference learning algorithms provide an effective strategy for such goal-directed adaptation, but it is unclear to what extent these algorithms are compatible with neural computation. In this article, we present a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal. The network is capable of solving a nontrivial gridworld task with sparse rewards. We derive a quantitative mapping of plasticity parameters and synaptic weights to the corresponding variables in the standard algorithmic formulation and demonstrate that the network learns at a speed comparable to that of its discrete-time counterpart and attains the same equilibrium performance.
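For context on the algorithmic formulation that the spiking network is mapped onto, the following is a minimal Python sketch of the discrete-time counterpart: tabular actor-critic temporal-difference learning on a small gridworld with a single sparsely rewarded goal state. The grid size, learning rates, discount factor, and softmax action selection are illustrative assumptions, not parameters taken from the paper.

```python
# Minimal sketch of tabular actor-critic TD learning on a gridworld.
# All parameter values below are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)

SIZE = 5                      # 5x5 gridworld (assumed size)
GOAL = (4, 4)                 # single rewarded goal state (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

alpha_v, alpha_p = 0.1, 0.1   # critic / actor learning rates (assumed)
gamma = 0.9                   # discount factor (assumed)

V = np.zeros((SIZE, SIZE))                  # critic: state values
P = np.zeros((SIZE, SIZE, len(ACTIONS)))    # actor: action preferences

def step(state, a):
    """Move in the grid, clipping at the walls; reward only at the goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0)

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        # Actor: softmax action selection over action preferences.
        prefs = P[state]
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        a = rng.choice(len(ACTIONS), p=probs)

        nxt, reward = step(state, a)

        # TD error: the single, globally broadcast learning signal.
        delta = reward + gamma * V[nxt] - V[state]

        # Critic and actor updates are both driven by the same TD error.
        V[state] += alpha_v * delta
        P[state][a] += alpha_p * delta

        state = nxt
```

In this sketch, the TD error delta plays the role of the global signal described in the abstract: each state-value and action-preference update is local to the visited state, while delta is shared by critic and actor alike, mirroring the combination of local plasticity rules with a global reward signal.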

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Action Potentials / physiology*
  • Algorithms
  • Animals
  • Computer Simulation
  • Humans
  • Learning / physiology*
  • Models, Neurological*
  • Neural Networks, Computer*
  • Neurons / physiology*
  • Reinforcement, Psychology
  • Time Factors