Commentary
Can conditioned reinforcers and Variable-Ratio Schedules make food- and fluid control redundant? A comment on the NC3Rs Working Group's report

https://doi.org/10.1016/j.jneumeth.2011.03.021

Abstract

This commentary challenges the conclusion of the NC3Rs Working Group's recent special report that food- or fluid control is sometimes necessary to conduct modern neuroscientific investigations in macaque monkeys (Prescott et al., 2010). Given the potential suffering of animals subjected to food- or fluid control, the decision to subject an animal to such practices should not be taken lightly. That decision hinges on the extent to which the animal is willing to be involved in the task. The authors conducted a scientific literature search and offer expert opinion, but fail to mention two techniques that may greatly influence the animals' motivation to participate in the task and thus reduce the need for food- or fluid control, namely (1) the use of conditioned reinforcers in addition to primary reinforcers; and (2) the use of Variable-Ratio Schedules rather than continuous reinforcement. An ethical and humane approach to animal experimentation demands that all options be explored and thoroughly investigated before resorting to methods that potentially compromise the animals' welfare.

Highlights

  • Cross-discipline fertilization may improve animals' welfare.
  • Dietary control may be redundant – alternatives suggested.
  • Conditioned reinforcers improve animals' motivation to perform a task.
  • Variable-Ratio Schedules involve surprises and enhance motivation.
  • Aversive stimuli inhibit responses and should be avoided.

Introduction

A group of experts recently published a special report entitled "Refinement of the use of food and fluid control as motivational tools for macaques used in behavioural neuroscience research: report of a Working Group of the NC3Rs." In the report, they conclude that "many, but not all, cognitive and behavioural tests performed with macaques require the use of food- and fluid control in order to obtain the large number of trials for modern neuroscientific investigations" (Prescott et al., 2010: 8.2.1, p. 184). They also give a number of very valuable, useful and constructive recommendations on how best to Refine procedures involving food- or fluid control.

Food- and fluid control is challenging and ethically questionable and, if used at all, should be a last resort, turned to only after all other options have been tested and proven inadequate. I felt the need to comment publicly on the conclusions of the Working Group, since I believe that they have omitted some very important aspects of animal learning and motivation from their analyses. These aspects may indeed, if used knowledgeably and carefully, greatly reduce – perhaps eliminate – the need for food- and/or fluid control. I thus challenge the Working Group's claim that the use of food- and fluid control is sometimes required. My argument has two main components.

Firstly, the use of a conditioned reinforcer is not mentioned anywhere in the article. A conditioned reinforcer is a reinforcer that earns its reinforcing properties through association with a primary reinforcer; a primary reinforcer is, for instance, food or water (Young, 2002, Mills, 2010). In their article, the Working Group consistently discusses only primary reinforcers in the set-up of the experiments. My suggestion, in order to refine any experimental set-up, is to use conditioned reinforcers in addition to primary reinforcers. A conditioned reinforcer is established through classical, or Pavlovian, conditioning. The trainer produces a salient stimulus that is not used in any other circumstances, such as a clicking sound (using a clicker), a whistle, or a visual stimulus. Immediately after being exposed to the stimulus, the animal receives something that he or she wants, and will be willing to work for, such as food or fluid – a primary reinforcer. What trainers using conditioned reinforcers have found is that as training progresses and the animal learns to expect primary reinforcers to follow conditioned (or secondary) reinforcers, the animal starts responding to the conditioned reinforcer itself (Phillips et al., 2003, Young, 2002) – indeed, once conditioned, animals will work to have the conditioned reinforcer presented even in the absence of the primary reinforcer, for a while (Hyde, 1976). However, I caution users to let the conditioned reinforcer maintain its association with the primary reinforcers, or else the conditioning effect will extinguish with time (Hyde, 1976, Mills, 2010).

Brain research explains the mechanisms behind the effects of conditioned reinforcers: they go through the amygdala, bypassing the cortex, to sites of memory and emotion (Cardinal et al., 2002). Conditioned reinforcers lead to a behavioural invigoration effect that is associated with dopamine transmission (Ikemoto and Panksepp, 1999). Indeed, cocaine-using rats experience a cascade of dopamine when a prolonged session of lever-pressing finally produces a dose of cocaine. Importantly, it is not the cocaine delivery per se that causes the dopamine cascade; it is the sound the machinery makes at the start of the delivery: the conditioned, secondary reinforcer (Phillips et al., 2003). Furthermore, conditioned reinforcers activate the core emotional feeling called the SEEKING system, a general-purpose limbic emotional action system designed to actively engage the world, particularly its life-sustaining resources (Panksepp, 1998, Panksepp, 2005). Among the effects of an activated SEEKING system are excitement and an urge to explore the environment; animals display an energized and riveted attention to tasks – it is euphorigenic in humans and probably in animals as well (Panksepp, 2005). Using an intentional secondary reinforcer (and not just the screen turning blank in between trials) should thus be a powerful tool to increase motivation in animals performing operant tasks. The risk, if any, is that the animal becomes too motivated and over-excited – which only further reduces the need for food- or fluid control.
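To make the establishment procedure concrete, here is a minimal sketch of the Pavlovian "charging" step, assuming a hypothetical rig interface: play_click() and dispense_treat() are stand-ins I have invented for whatever sound generator and feeder a given set-up provides, and the pairing count and delays are illustrative, not values taken from the literature.

```python
import time

# Hypothetical rig interface (assumptions, not a real API):
def play_click():
    """Emit the salient, dedicated stimulus (e.g., a clicker sound)."""
    ...

def dispense_treat():
    """Deliver the primary reinforcer (a preferred food or fluid item)."""
    ...

def charge_secondary_reinforcer(n_pairings=30, click_to_treat_s=0.2, gap_s=5.0):
    """Establish 'click' as a conditioned (secondary) reinforcer by
    repeated Pavlovian pairing: the click always PRECEDES the treat
    by a short, consistent interval, and is used in no other context."""
    for _ in range(n_pairings):
        play_click()
        time.sleep(click_to_treat_s)  # short, consistent click-to-treat delay
        dispense_treat()
        time.sleep(gap_s)             # keep the pairings discrete events
```

The essential design choices are order and contiguity: the stimulus must reliably precede the primary reinforcer, and the pairing must be refreshed throughout training so that the association does not extinguish (Hyde, 1976).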

In addition, the cue – the stimulus telling the animal that reinforcement is available if the animal performs a specific action (Mills, 2010) – also assumes the properties of a conditioned reinforcer during training. The cue is sometimes referred to as a tertiary reinforcer because it is learned by association with a secondary reinforcer (Mills, 2010). Again, the conditioned reinforcer becomes reinforcing per se, and the animal is willing, to a certain extent, to work just to hear – or see – that cue again, even in the absence of primary reinforcers (the "conditioned reinforcement" paradigm reviewed by Seymour and Dolan, 2008). Indeed, the mere presentation of conditioned reinforcers without any primary reward can facilitate operant conditioning (Ikemoto and Panksepp, 1999). Intriguingly, the SEEKING system is especially highly tuned to stimuli that predict rewards (such as cues or other conditioned reinforcers) rather than to rewards themselves. In addition, the SEEKING system mediates the instrumental behaviour required to obtain food. In rats, sniffing responses are at the very heart of the SEEKING system and are present whenever a rat is searching, investigating or expecting positive reward; they can be evoked by electrical brain stimulation within the lateral hypothalamus. In a study on rats working for rewarding brain stimulation, these sniffing responses were seen in the second preceding the required operant response – lever pressing. The SEEKING system is thus mostly activated during the appetitive (searching) phase of foraging rather than the consummatory (eating) phase. As the animal encounters the food object and shifts into the consummatory mode, the appetitive SEEKING urge ceases temporarily (Panksepp, 1998). I suggest shifting the animal into SEEKING mode as often as possible, in order to encourage the operant responses needed for the task – conditioned reinforcers (both tertiary and secondary) can be used to this end (Panksepp, 1998). This may reduce the dependence on food or drink as the main motivator for compliance.

Secondly, I advocate the use of careful Variable-Ratio Schedules to maintain a high number of correct responses without leading to satiation. The only type of schedule mentioned in the NC3Rs' report is a Fixed-Ratio Schedule (which, as the authors note, is when a reward is delivered only after a given number of responses). However, there is a substantial difference in how well animals respond to Variable-Ratio Schedules as opposed to Fixed-Ratio Schedules (Chance, 1998, Chance, 2006, Schoenfeld et al., 1956). In the former, the animal is given reinforcement after a variable number of correct repetitions of the response; in the latter, after a fixed number. For example, FR3 means that the animal receives reinforcement after exactly 3 correct responses. The animal quickly learns the contingency, and common effects of Fixed-Ratio Schedules are Fixed-Ratio Pauses in the response pattern: the animal hesitates to start responding, but once a response sequence has started it will continue until achieving reinforcement, after which it will pause again. The overall response rate also declines under Fixed-Ratio Schedules. In contrast, Variable-Ratio Schedules yield a high and constant rate of responses, with little or no risk of post-reinforcement pauses (Chance, 2006, Mills, 2010, Schoenfeld et al., 1956, Tanno and Sakagami, 2008). The animal learns over many training sessions that she can expect reinforcement after a variable number of correct repetitions of the response. For example, VR3 means that the animal receives reinforcement after on average 3 correct responses. In a study on dolphins changing from a Fixed-Ratio Schedule to a Variable Schedule, the author noted that the animals' attention span improved (Komanski, 1996). Given the choice, animals prefer Variable-Ratio Schedules over Fixed-Ratio Schedules even when the mean response requirement of the Variable-Ratio Schedule is considerably higher than the Fixed-Ratio response requirement (see e.g. Sherman and Thomas, 1968). The downside of Variable-Ratio Schedules is that they take time to establish in comparison to continuous reinforcement.
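For readers who want the two contingencies side by side, the sketch below contrasts them in code. The Variable-Ratio Schedule is realized here as a random-ratio schedule, in which every correct response is reinforced with probability 1/mean; this is one common way to implement a VR schedule in software, and it makes explicit the correspondence between VR2 and a reward probability of P = 0.5 used later in this commentary. This is my own illustration, not code from the NC3Rs report.

```python
import random

class FixedRatio:
    """FR(n): reinforce after exactly n correct responses (e.g., FR3)."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def correct_response(self):
        """Call once per correct response; True means deliver reinforcement."""
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR(mean), realized as a random-ratio schedule: each correct
    response is reinforced with probability 1/mean, so reinforcement
    follows a variable run of responses averaging `mean` (e.g., VR3).
    Note that VR2 corresponds to a reward probability of P = 0.5."""
    def __init__(self, mean):
        self.p = 1.0 / mean

    def correct_response(self):
        return random.random() < self.p
```

Under FixedRatio the animal can count on the contingency (hence the post-reinforcement pauses), whereas under VariableRatio every correct response might pay off, which is what sustains the high, steady response rate.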

The Working Group expresses the concern that the animals "might perform less well if they are required to work without receiving rewards". I challenge this notion in the words of Prof. Robert Sapolsky, from a lecture on the effects of dopamine: "take a monkey, and there's nothing more addictive than the notion that there's a reward lurking out there – and it's a maybe" (Sapolsky, 2009).

Indeed, the dopamine surge correlating with reward probability is at its maximum when the probability of reward P is 0.5 and decreases at higher and lower probabilities (Fiorillo et al., 2003) – surprising outcomes are more reinforcing than predictable outcomes (Lee et al., 2006). In other words, completely predictable outcomes (such as receiving a reward after every single correct response) are not as motivating as the NC3Rs Working Group assumes. Pryor (2009) discusses the notion that it is easiest to keep an animal in the SEEKING state, and at high levels of excitement, when small reinforcers occur at frequent but irregular intervals – comparable to a low-mean Variable-Ratio or Variable-Interval Schedule.
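One way to make the special status of P = 0.5 intuitive: if reward on a correct trial is a Bernoulli outcome delivered with probability P, a simple proxy for outcome uncertainty is its variance, which peaks exactly at P = 0.5. This textbook identity is offered only as intuition; Fiorillo et al. (2003) measured the sustained dopamine response directly rather than deriving it from variance.

```latex
\operatorname{Var}(R) = P(1-P), \qquad
\frac{d}{dP}\bigl[P(1-P)\bigr] = 1 - 2P = 0
\;\Longrightarrow\; P = \tfrac{1}{2}, \qquad
\operatorname{Var}_{\max} = \tfrac{1}{4}.
```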

The Working Group advocates "exploring measures that might increase the rate with which the animals gain reward, given that this will mitigate somewhat the effect of controlled access to food/fluid" (8.2.7). In contrast, I suggest not using food/fluid control at all, but rather keeping the animal motivated and working for preferred treats/tidbits/juice dispensed unpredictably on a Variable-Ratio Schedule – one that encourages the animal to work while at the same time allowing enough repetitions to be performed before the animal is satiated and stops working (but see below). In Section 4.1 (type of reward), the authors recommend that researchers consult the recent literature on the particular species and seek advice before choosing rewards. The same reasoning can be applied to Section 4.2 regarding the rate of reward: seek advice from experienced behaviour analysts or animal trainers on how best to implement reinforcement schedules for the planned experiment (see Chance, 1998, Chance, 2006). One might also vary the amount of reward; jackpots can be used with some intriguing effects on motivation (Crespi, 1942 – but see also Kuroda, 2009, Muir, 2010). Another avenue to explore is the use of reinforcement variability (Wunderlich, 1961); see below.
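As a concrete, purely illustrative example of varying both the kind and the amount of reward: one could draw each reward from a preference-tested menu and occasionally deliver a jackpot. The menu items, jackpot probability and jackpot size below are my assumptions, not recommendations from the cited work, and dispense() is a hypothetical stand-in for the rig's feeder.

```python
import random

# Illustrative menu -- in practice, derive it from preference tests with
# the individual animal, and re-test periodically as preferences change.
REWARD_MENU = ["raisin", "grape piece", "juice squirt"]

def draw_reward(jackpot_prob=0.02, jackpot_size=5):
    """Reinforcement variability (Wunderlich, 1961) plus occasional
    jackpots (Crespi, 1942; but see Kuroda, 2009 and Muir, 2010):
    vary WHAT is delivered and, rarely, HOW MUCH of it."""
    item = random.choice(REWARD_MENU)
    amount = jackpot_size if random.random() < jackpot_prob else 1
    return item, amount

def dispense(item, amount):
    """Hypothetical stand-in for the actual feeder hardware."""
    ...
```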

In the article, the authors discuss the use of a conditioned punisher to signal incorrect responses. In other words, when an animal in a food- or fluid control protocol performs incorrectly, this is signalled to the animal by a sensory cue (a beep) and a brief time-out. I suggest testing whether the conditioned punisher (the beep, or similar) can be omitted altogether. The absence of the conditioned reinforcer ("click") will clearly signal to the animal that it made a wrong response, and the experience of making a wrong choice will be less aversive. The pause after incorrect behaviour in marine mammal training is usually no longer than 2–3 s during public shows and is commonly referred to as an LRS (Least Reinforcing Scenario) (Davis and Todd, 2004). As far as I am aware, there is no formal study supporting the use of the LRS over conditioned punishers; however, it is considered Best Practice in the marine mammal training community, and practitioners vouch for its efficacy in reducing undesirable behaviour, placing the LRS on the reinforcing side (rather than the neutral point) of the punishment–reinforcement continuum (Scarpuzzi et al., 1999). In contrast, time-out from reinforcement is aversive rather than neutral and has functional properties similar to those of other aversive stimuli such as shock (Richardson and Baron, 2008). Perhaps such aversive events switch the learner from the SEEKING mode to the amygdala's path of avoidance and fear, as suggested by Pryor (2009), in which case they will probably impair the animal's continued performance. Indeed, aversive stimuli may induce passivity and inhibit appetitive responses (Dickinson and Pearce, 1977).

Food- and fluid control is all about increasing the animals' motivation to work and perform hundreds – or thousands – of repetitions of responses within one experimental session. I would argue that the NC3Rs authors' focus on food/fluid control as the main avenue to increased motivation is unfortunate. They do mention the need to look at treat palatability and individual preferences as a way to improve motivation, as well as social and visual stimuli, but they fail to look outside the box and mention important techniques developed in other branches of human–animal interaction or neurology that may contribute to improving motivation without resorting to techniques potentially harmful to the animals. One type of incentive for motivation may be conditioned reinforcers: "cue-triggered wanting", based on Pavlovian associations and resulting in motivational magnets that pull appropriate behavioural responses, as described by Berridge and Robinson (2003). Indeed, conditioned cues paired with food when individuals were hungry can motivate sated animals to eat even beyond satiety (Seymour and Dolan, 2008). Another way to improve motivation might be to increase dopamine surges and the animals' willingness to work by introducing reward probabilities in the neighbourhood of P = 0.5 (Fiorillo et al., 2003) instead of P = 1, i.e. "each time the monkey successfully performs the task" (Prescott et al., 2010).

In conclusion, I suggest the following procedure to Refine the motivational tools for macaques used in behavioural neuroscience research. Depending on the types of research questions addressed, it may not be feasible to adopt all of the ideas listed below, but each should help reduce the need for dietary control measures. I encourage practitioners to increase their knowledge base by seeking additional information on all aspects of the training outlined, in rudimentary form, below.

  • (1)

    If at all possible – choose subjects carefully. A simple test (exposing an animal to a novel food item and measuring how long it hesitates before taking the treat) will reveal temperament, which correlates with trainability (Coleman et al., 2005). A wise selection of animals may greatly benefit the success of the project as well as the animals' welfare.

  • (2)

    Try reducing or eliminating the need for food- or fluid control. Keep the animal on an ad-lib diet, or let the daily training ration be part of the main diet. If the following techniques are properly carried out, the need to maintain animals at a target weight below their ad-lib weight should be minimal.

  • (3)

    Ordinary pellets or water will probably not suffice to motivate an animal that is not food- or fluid controlled. Choose primary reinforcers carefully – with preference tests – as this will influence motivation and the outcome of training. Note that preferences may change over time (Clay et al., 2009, Martin et al., 2010). Varying rewards is another avenue to explore. Wilson et al. (2005) designed an automated food delivery system that allows researchers to reinforce monkeys with a variety of foods within a single experimental session, which is difficult to achieve with commercially available feeders. With reward variability, the secondary reinforcer becomes more resistant to extinction (Wunderlich, 1961), perhaps because surprising outcomes are more reinforcing, producing a greater dopamine response, than well-predicted outcomes (Fiorillo et al., 2003, Lee et al., 2006).

  • (4)

    Establish a conditioned reinforcer – a secondary reinforcer – through Pavlovian conditioning: teach the animal that "click" (or some other salient stimulus) means a treat is imminent. Then, as the animal starts learning the task, use shaping (as described by Prescott et al., 2010) to teach the animal which operant responses will be reinforced. Keep using the secondary reinforcer as well as the primary reinforcer. During shaping, timing is of paramount importance; the secondary reinforcer should mark correct behaviours the instant they are performed. If the sound of the automatic food dispenser is used as a secondary reinforcer, make sure that it is indeed timely. Ignore mistakes; these will diminish as the animal learns the task and is reinforced for correct responses.

  • (5)

    Once the animal knows the task, carefully establish (by shaping, gradually stretching the ratio over several training sessions) Variable-Ratio Schedules (VRSs) to yield a high and constant rate of responses without satiating the animal too soon. Keep the VRS mean as low as possible while still allowing the number of repetitions needed for the experiment (VR2, for example, rather than VR7). Note that by establishing a probability of reward of P = 0.5 (corresponding to VR2) the dopamine surge is at its maximum (Fiorillo et al., 2003), but the animal will become sated sooner. When shaping persistence using a VRS, there is a risk of Ratio Strain if the ratio is stretched too rapidly or too far – the tendency to perform will break down. Also take into consideration the size of the average ratio and the lowest ratio, as these influence the risk of post-reinforcement pauses (Chance, 1998, Chance, 2006); pigeons showed the highest preference for VRSs whose lowest ratio was at or near 1 (Field et al., 1996). Mark all correct responses reinforced under the VRS with the secondary reinforcer ("click") and continue giving the primary reinforcer after each click. For correct behaviours not reinforced by the VRS, use the cue as another conditioned reinforcer instead of the click (see below). Do not establish a VRS on a behaviour that is not well shaped, as the animal will have trouble distinguishing correct responses that go unreinforced under the VRS from incorrect responses that go unreinforced – the shaping process risks being stymied. The animal should be fluent in the task before a VRS is used.

  • (6)

    Try to make full use of the cue as a tertiary reinforcer. It is reinforcing for the animal to be given the stimulus announcing that reinforcement is potentially available should the right response be given. By properly managing the contiguity of this event, motivation may be increased through systematic use of the cue as a conditioned, tertiary reinforcer (Mills, 2010). That is, in cases where the VRS does not deliver reinforcement after a correct behaviour, present the next cue at the moment the secondary reinforcer would have been delivered (replacing the "click"). In cases where the VRS does deliver reinforcement, present the next cue after an individualized inter-trial interval (some animals will not give responses while chewing; others get frustrated if the interval is too long).

  • (7)

    If ignoring mistakes is not an option, use an LRS (Least Reinforcing Scenario) to address incorrect responses (Davis and Todd, 2004) once the animal has learned the task. When the animal makes a mistake, pause for a few seconds (no "clicking" or cueing!) at the same point where reinforcement would have been provided after a correct response. After the LRS, give the cue for the next trial (a tertiary reinforcer) after the individualized inter-trial interval. Points (4)–(7) are drawn together in the code sketch following this list.
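As promised above, here is a minimal sketch drawing points (4)–(7) together into a single trial loop. The rig functions (present_cue, get_response, play_click, dispense_treat) are hypothetical stand-ins for the actual apparatus, and the trial count, VR mean and all timings are illustrative assumptions, not prescriptions.

```python
import random
import time

# Hypothetical rig interface (assumptions, not a real API):
def present_cue(): ...      # tertiary reinforcer: "reinforcement is available"
def get_response(): ...     # True for a correct operant response
def play_click(): ...       # secondary reinforcer, established beforehand
def dispense_treat(): ...   # primary reinforcer, chosen by preference test

def run_session(n_trials=500, vr_mean=2, iti_s=3.0, lrs_s=2.5):
    """One session implementing points (4)-(7): a Variable-Ratio Schedule,
    click + treat on reinforced correct responses, the next cue as a
    tertiary reinforcer on non-reinforced correct responses, and an LRS
    (brief neutral pause) after mistakes."""
    p_reward = 1.0 / vr_mean                # VR2 corresponds to P = 0.5
    for _ in range(n_trials):
        present_cue()
        if get_response():                  # correct response
            if random.random() < p_reward:  # the VRS pays off this time
                play_click()                # mark the behaviour instantly
                dispense_treat()            # keep click paired with treat
                time.sleep(iti_s)           # individualized inter-trial interval
            # else: no click and no pause -- the next cue, presented
            # immediately at the top of the loop, is itself the tertiary
            # reinforcer (point 6)
        else:                               # incorrect response: LRS (point 7)
            time.sleep(lrs_s)               # 2-3 s neutral pause, no click, no cue
            time.sleep(iti_s)               # then the usual interval before next cue
```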

Taken together, I hope that these measures will motivate the animals to participate in neuroscientific investigations – even those requiring thousands of repetitions of the required responses – without the need for dietary control. With this approach, the frequent use of conditioned reinforcers ("clicks" and cues) should keep the animal in a SEEKING state, and the VRSs, especially at a reward probability approaching P = 0.5 (VR2), should keep the dopamine surge at its maximum.

The strategies outlined above are, as far as I know, largely untested – or at least unreported – in neuroscientific investigations in macaque monkeys; they are, however, commonly used in marine mammal training in zoos and aquaria and in many dog-training communities. There is thus a paucity of literature comparing response rates of monkeys undergoing food- or fluid control with those in similar studies that instead use conditioned reinforcers or Variable-Ratio Schedules. Some researchers do go to great lengths to minimize the use of dietary control techniques (see e.g. Fairhall et al., 2006). I therefore encourage researchers to evaluate the suggested techniques and share the resulting data so that others may benefit from their findings. From an ethical perspective, food- or water control of animals working in behavioural neuroscience research is questionable because of the difficulties involved and the risk of compromising the animals' health and welfare. In this commentary, I have outlined a number of strategies that I believe will help improve scientific protocols and reduce the need for such measures. In all matters concerning the Refinement of procedures for animals that must experience food- or water control, I wholeheartedly concur with the recommendations listed by the NC3Rs Working Group in their recent publication.

Acknowledgements

I thank Eva-Marie Wergård for important input. Also, comments from two anonymous referees were very valuable.

References (33)

  • Crespi, L.P., 1942. Quantitative variation of incentive and performance in the white rat. Am. J. Psychol.
  • Davis, C., Todd, M., 2004. Training and behavioral terms glossary. International Marine Animal Trainers Association (IMATA).
  • Dickinson, A., Pearce, J.M., 1977. Inhibitory interactions between appetitive and aversive stimuli. Psychol. Bull.
  • Field, D.P., et al., 1996. Preference between variable-ratio and fixed-ratio schedules: local and extended relations. J. Exp. Anal. Behav.
  • Fiorillo, C.D., et al., 2003. Discrete coding of reward probability and uncertainty by dopamine neurons. Science.
  • Komanski, P., 1996. Changing from a fixed-ratio schedule to a variable schedule. Mar. Mammals: Public Display Res.