Cognition

Volume 110, Issue 3, March 2009, Pages 380-394

Flexible shaping: How learning in small steps helps

https://doi.org/10.1016/j.cognition.2008.11.014

Abstract

Humans and animals can perform much more complex tasks than they can acquire using pure trial and error learning. This gap is filled by teaching. One important method of instruction is shaping, in which a teacher decomposes a complete task into sub-components, thereby providing an easier path to learning. Despite its importance, shaping has not been substantially studied in the context of computational modeling of cognitive learning. Here we study the shaping of a hierarchical working memory task using an abstract neural network model as the target learner. Shaping significantly boosts the speed of acquisition of the task compared with conventional training, to a degree that increases with the temporal complexity of the task. Further, it leads to internal representations that are more robust to task manipulations such as reversals. We use the model to investigate some of the elements of successful shaping.

Introduction

Humans and animals acquire an extensive and flexible repertoire of complex behaviors through learning. However, many tasks are too complex to be amenable to simple trial and error learning, and therefore external guidance or teaching is critical. Three central forms of teaching are:

  • abstract or verbal communication of symbolic rules governing the underlying tasks;

  • repeated demonstration of the required action sequences. This turns unsupervised problems into supervised ones, and in some cases permits imitation;

  • shaping, i.e., provision of a simpler path to learning by task simplification and refinement.

Here, we provide a computational treatment of shaping in the context of a cognitive task.

The term shaping was coined by Skinner (1938), who described it as a “method of successive approximations”. In shaping, a sequence of intermediate, simpler tasks is taught in order to aid acquisition of the original, complex task. Skinner was perhaps motivated by the possibility of exploiting animals’ innate repertoires of responses (Breland & Breland, 1961; Peterson, 2004), but the term is also used more widely. Divide-and-conquer is inherent in shaping, and appears to help with facets of complexity such as branching, hierarchies, and large variations in the timescales over which actions have their effects. Shaping also provides behavioral “units” that can be deployed across a range of tasks, and it is almost ubiquitous in animal behavioral experiments.

Two main aspects of shaping have been considered in theoretical learning frameworks. First, Elman (1993) realized a concept described by Newport (1988, 1990) in terms of “Less is More”, in the context of learning grammars in simple recurrent networks. The idea was to use an initial phase of training with only simpler incarnations of the rules of the grammar. Although the issue is not uncontroversial (see Rohde & Plaut, 1999), Elman (1993) argued that this simplification may arise intrinsically, through a process of self-shaping. This would happen on the basis of a working memory that, consistent with evidence for differential rates of maturation of parts of the prefrontal cortex (Brown et al., 2005; Sowell et al., 1999), initially has a very low capacity, but expands over development. The second aspect of shaping that has been studied is associated with robot learning or reinforcement learning (Dorigo & Colombetti, 1998; Savage, 1998, 2001; Saksida et al., 1997; Singh, 1992), typically in the context of navigation.

By contrast with these suggestions, we consider shaping for the adult learning of the sort of complex cognitive tasks that are popular for elucidating the prefrontal neural architecture of cognition (Badre et al., 2005; Gilbert et al., 2005; Koechlin et al., 2003; Shallice & Burgess, 1991). The main emphasis in the computational modeling of these tasks has so far been on developing elaborate architectural mechanisms (Frank et al., 2001; O’Reilly & Frank, 2005) to overcome the complexity of learning. However, shaping is extensively used in training human and animal subjects precisely in order to simplify complex learning; here, we seek to model it and understand aspects of its power.

Hazy et al. (2007) and O’Reilly and Frank (2005) suggested one of the most powerful and effective architectures in their prefrontal, basal ganglia, working memory (PBWM) model. This employs a gated working memory (adapted from the long short-term memory (LSTM) architecture of Hochreiter & Schmidhuber, 1997, and Gers et al., 2000) in an elaborate overall structure. They illustrated their model using an abstract, hierarchical version of the continuous performance working-memory task (CPT) called the 12-AX task, which they invented for the purpose. The complexity of this task arises from its hierarchical organization, which involves what amounts to subroutines.
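To make the structure of the task concrete, here is a minimal sketch in Python that generates 12-AX trials and computes the correct responses. It follows the contingencies as described by O’Reilly and Frank (2005): the most recent digit (1 or 2) sets the outer context, and the rewarded inner pair is A-X in context 1 and B-Y in context 2. The trial length and sampling probabilities below are illustrative assumptions, not the published parameters.

```python
import random

def correct_responses(seq):
    """Correct response ('R' or 'L') for each symbol of a 12-AX sequence."""
    out, context, prev = [], None, None
    for s in seq:
        if s in "12":                      # a digit switches the outer context
            context = s
        target = (context == "1" and prev == "A" and s == "X") or \
                 (context == "2" and prev == "B" and s == "Y")
        out.append("R" if target else "L")
        prev = s
    return out

def generate_trial(n_pairs=4, p_target=0.5, rng=random):
    """One trial: a context digit followed by n_pairs inner letter pairs.
    3, C and Z act as distractors (assumed sampling scheme)."""
    seq = [rng.choice("12")]
    for _ in range(n_pairs):
        if rng.random() < p_target:        # target-consistent inner pair
            seq += ["A", "X"] if seq[0] == "1" else ["B", "Y"]
        else:                              # random, possibly distracting pair
            seq += [rng.choice("ABC"), rng.choice("XYZ")]
    return seq

trial = generate_trial()
print(list(zip(trial, correct_responses(trial))))
```

The hierarchical difficulty is visible in `correct_responses`: the correct output on an X or Y depends both on the immediately preceding letter and on a context digit that may lie arbitrarily far back in the sequence.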

Here, we build an unelaborated LSTM model (which Hazy et al., 2007, and O’Reilly and Frank, 2005, used as a comparison point for the learning performance of PBWM) and study the additional role that shaping might play in generating complex behavior in tasks such as the 12-AX. We consider a straightforward shaping path for this task, highlight the importance of the allocation of resources in shaping, and assess the improvements in training times that come from the external guidance, as a function of parametric task complexity. Finally, we look at the effects of shaping on the flexibility of the network in dealing with variations in the stimulus statistics (while keeping the rules constant), and with a shift in the task rules themselves.
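For reference, the following is a minimal sketch of one LSTM memory block with forget gates, in the style of Gers et al. (2000) and Hochreiter and Schmidhuber (1997). The single-block layout, weight shapes, and initialization are illustrative assumptions; the network studied here combines several such blocks (with two cells each) and an output layer, trained by gradient descent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMBlock:
    """One memory block with input, forget, and output gates
    (Gers et al., 2000). Forward pass only; training code omitted."""

    def __init__(self, n_in, n_cells, seed=0):
        rng = np.random.default_rng(seed)
        z = n_in + n_cells                             # input plus recurrent output
        self.Wi = rng.normal(0.0, 0.1, (n_cells, z))   # input gate
        self.Wf = rng.normal(0.0, 0.1, (n_cells, z))   # forget gate
        self.Wo = rng.normal(0.0, 0.1, (n_cells, z))   # output gate
        self.Wc = rng.normal(0.0, 0.1, (n_cells, z))   # cell input
        self.c = np.zeros(n_cells)                     # protected cell state
        self.h = np.zeros(n_cells)                     # block output

    def step(self, x):
        z = np.concatenate([x, self.h])
        i = sigmoid(self.Wi @ z)           # how much new input to admit
        f = sigmoid(self.Wf @ z)           # how much old state to keep
        o = sigmoid(self.Wo @ z)           # how much state to expose
        self.c = f * self.c + i * np.tanh(self.Wc @ z)
        self.h = o * np.tanh(self.c)
        return self.h
```

The gates are what let the network hold the outer-loop context (the last digit seen) across the inner-loop letter pairs without overwriting it.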

Section snippets

General methods

In this section, we describe the 12-AX task, the unelaborated LSTM network used to solve it, the particular shaping path that decomposes the task into its elements, and the learning methodology. One of the most important questions in shaping is how to increase the capacity or power of the network as new elements of a task are presented. In order to focus cleanly on the effects of shaping, the main results in Section 3 depend on manually allocating new LSTM components at each additional step of…
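Although the snippet above is cut off, the protocol it describes can be sketched schematically: train on a sequence of sub-tasks, and manually allocate a fresh memory block as each new task element is introduced. The stage decomposition and the names below (StubNetwork, add_block, train_epoch) are hypothetical placeholders for illustration, not the authors' code or their exact shaping path.

```python
# One plausible decomposition of 12-AX into shaping stages; the paper's
# actual shaping path may differ in detail.
STAGES = [
    "inner loop: respond R to A-X pairs",
    "inner loop: respond R to B-Y pairs",
    "outer loop: full 12-AX with digit-gated switching",
]

class StubNetwork:
    """Stand-in for the LSTM learner; real gradient training would go here."""
    def __init__(self):
        self.blocks = []

    def add_block(self):
        self.blocks.append("new LSTM memory block")   # manual allocation

    def train_epoch(self, stage):
        return 1.0         # stub: report perfect accuracy so the loop advances

def shape(network, criterion=0.95, max_epochs=1000):
    for stage in STAGES:
        network.add_block()                # fresh resources for the new element
        for _ in range(max_epochs):
            if network.train_epoch(stage) >= criterion:
                break                      # stage mastered; move to the next
    return network

net = shape(StubNetwork())
print(len(net.blocks), "blocks allocated")   # one per shaping stage
```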

Results

As a baseline for performance, we train a standard (naive) LSTM network directly on the full 12-AX task. On average it acquires the task in 186 epochs (standard error: 8.5 epochs). This is rather faster than the roughly 350 epochs that O’Reilly and Frank (2005) suggested for an unembellished LSTM network. Since the number of memory blocks (4), cells per block (2) and learning rate (0.1) are similar to those they used, this presumably results from the altered learning criterion and network…

Automatic allocation

In all the simulations so far, we have employed a deus ex machina, in which shaping has been associated with the manual allocation of resources. Indeed, Section 3.1 showed that shaped networks perform substantially worse than unshaped networks without allocation. We did this to focus on the potential benefits of shaping in computational modeling, rather than particular implementation details. However, if there were no way of realizing these benefits without such an external intervention…
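The snippet is again cut off, but one natural way to remove the external intervention is to let the learner trigger allocation itself. Below is a sketch of one simple heuristic, namely allocating a new memory block whenever training performance plateaus; it assumes the hypothetical add_block/train_epoch interface of the shaping sketch above, and it is not necessarily the scheme the paper actually tests.

```python
def train_with_auto_allocation(network, task, patience=20, tol=1e-3,
                               max_epochs=2000):
    """Allocate a new LSTM block whenever accuracy stops improving.

    A plateau-triggered heuristic, sketched as an assumption; `network`
    follows the hypothetical add_block/train_epoch interface above.
    """
    best, stale = 0.0, 0
    for _ in range(max_epochs):
        accuracy = network.train_epoch(task)
        if accuracy > best + tol:
            best, stale = accuracy, 0      # still improving: keep training
        else:
            stale += 1
        if stale >= patience:              # learning has plateaued:
            network.add_block()            # give the learner more capacity
            stale = 0
    return network

# Usage, with the stub learner from the shaping sketch:
# train_with_auto_allocation(StubNetwork(), "full 12-AX task")
```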

General discussion

Although shaping is widespread as a method of training subjects to perform complex behavioral tasks, its effects in computational models have not been extensively investigated. We studied shaping in a moderately complex hierarchical working memory task, showing that it offers significant benefits for learning in the face of medium- to long-term temporal demands. Speed is not the only benefit of shaping: we showed that it also leads to a solution of the task that generalizes better over…

Acknowledgements

This work was supported by the Gatsby Charitable Foundation. We are grateful to Paul Burgess, Tony Dickinson, John Duncan, Sam Gilbert, Randy O’Reilly, Jeremy Reynolds and Tim Shallice for many helpful comments and for sharing data and ideas prior to publication.

References

  • Bakker, B., & Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal discovery and subpolicy...
  • Barto, A. G., et al. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems: Theory and Applications.
  • Breland, K., et al. (1961). The misbehavior of organisms. American Psychologist.
  • Brown, S. L., et al. (2005). Encoding a temporally structured stimulus with a temporally structured neural representation. Nature Neuroscience.
  • Dahlin, E., et al. (2008). Transfer of learning after updating training mediated by the striatum. Science.
  • Dayan, P. (2006). Images, frames, and connectionist hierarchies. Neural Computation.
  • Dayan, P., et al. (2006). Phasic norepinephrine: A neural interrupt signal for unexpected events. Network: Computation in Neural Systems.
  • Dorigo, M., et al. (1998). Robot shaping: An experiment in behavior engineering.
  • Duncan, J. Attention, intelligence, and the frontal lobes.
  • Frank, M. J., Loughry, B., & O’Reilly, R. C. (2001). Interactions between frontal cortex and basal ganglia in working...
  • Gers, F. A., et al. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation.
  • Gilbert, S. J., et al. (2005). Involvement of rostral prefrontal cortex in selection between stimulus-oriented and stimulus-independent thought. European Journal of Neuroscience.
  • Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review.
  • Haruno, M., et al. (2001). MOSAIC model for sensorimotor learning and control. Neural Computation.
  • Hazy, T. E., et al. (2007). Towards an executive without a homunculus: Computational models of the prefrontal cortex/basal ganglia system. Philosophical Transactions of the Royal Society B: Biological Sciences.