Working memory (WM) is a capacity-limited system that serves as the mind’s workspace, and the size of one’s WM is thus thought to be a key determinant of an individual’s ability to carry out a wide variety of cognitive tasks (Engle, Tuholski, Laughlin, & Conway, 1999; Kane et al., 2004). While WM capacity has long been assumed to have a strict limit (Cowan, 2001; Miller, 1956), mounting evidence suggests that WM capacity can be expanded through targeted training (Klingberg et al., 2005; Verhaeghen, Cerella, & Basak, 2004; Westerberg et al., 2007). The idea that training can effectively expand this central workspace of the mind has generated considerable interest and has fueled speculation that the cognitive benefits of WM training may be far-reaching (Jaeggi, Buschkuehl, Jonides, & Perrig, 2008). Indeed, a rapidly growing number of studies demonstrate that training-related increases in WM capacity can yield improvements in a range of important cognitive skills (e.g., Chein & Morrison, 2010), as well as improved cognitive function in clinical populations with known WM deficiencies (e.g., Klingberg et al., 2005).

Studies of WM training fit within a body of work showing that specially designed mental exercises (i.e., cognitive training paradigms) can be used to enhance cognitive performance. Other related approaches to cognitive enhancement through training include attention (Tang & Posner, 2009), speed of processing (Dux et al., 2009), neuro-feedback (Keizer, Verment, & Hommel, 2010), dual-task (Bherer, Kramer, & Peterson, 2008; Bherer et al., 2005) and perceptual (Mahncke et al., 2006) training. While these efforts suggest that there may be many different paths to cognitive enhancement, a recent large-scale study powerfully illustrates that not every type of cognitive training will lead to generalized improvement (Owen et al., 2010).

In the current review, we focus on training paradigms that directly aim to expand WM capacity (for examples of related reviews, see Dahlin, Backman, Neely, & Nyberg, 2009; Green & Bavelier, 2008; Perrig, Hollenstein, & Oelhafen, 2009). The central question of this review is: Does WM training work? That is, does the empirical literature support the view that WM training could be a panacea for cognitive enhancement, or are there reasons to believe that the magnitude and scope of WM training benefits should be more narrowly construed? We begin by examining studies that illustrate alternative approaches to WM training and by presenting the evidence favoring the conclusion that WM training can be an efficacious cognitive enhancement tool. We follow with a critical consideration of several issues that might temper enthusiasm regarding these putative successes in WM training.

Theoretical justification for training WM

Working memory can be defined as a flexible, capacity-limited mental workspace used to store and process information in the service of ongoing cognition. Although WM was originally considered a dedicated temporary memory store (Baddeley & Hitch, 1974), a growing body of findings points to the involvement of both short-term and long-term memory (LTM) mechanisms in the performance of many WM tasks (Baddeley, 2000; Unsworth & Engle, 2007). There is empirical support both for multistore models, in which WM is viewed as a temporary workspace separate from LTM (Baddeley & Hitch, 1974), and for unitary store models, in which all information resides within a single memory system (Nairne, 2002); some relevant findings are reviewed by Cowan (1995), Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, and Usher (2005), and Nee, Berman, Moore, and Jonides (2008). Further debate surrounds the very notion of a capacity limitation in WM, with varying views on the specific limits to WM capacity (the number of “slots” available in WM) and on whether mechanisms of interference, rather than capacity limits, might explain performance limitations (see, e.g., Cowan, 2001, and associated commentaries).

Regardless of the theoretical perspective from which it is viewed, WM has been extensively characterized as a construct vital to higher cognition. Consistent with this characterization, there is a rich psychometric literature demonstrating that WM capacity is a strong predictor of individual differences in fluid intelligence and executive functioning (Engle et al., 1999), and further predicts a very wide range of cognitive abilities, including reading comprehension (Daneman & Carpenter, 1980), language acquisition (Baddeley, 2003), non-verbal problem solving (Logie, Gilhooly, & Wynn, 1994), and a number of domain-specific reasoning skills (Kane et al., 2004).

The relationship between WM and higher cognition can be understood by considering the components of the WM system, which many accounts divide into domain-specific and domain-general (or executive) factors. Hypothesized domain-specific aspects of WM include strategies that are tied to the maintenance and management of particular types of information. Perhaps the most widely discussed of these domain-specific strategies is articulatory rehearsal, which involves the use of inner speech mechanisms to maintain representations of linguistic or verbally coded items (Baddeley & Hitch, 1974). In contrast, hypothesized domain-general aspects of WM include processes that are not associated with a particular type of information or sensory modality, but that nonetheless aid in the encoding, maintenance, and retrieval of information from WM. Putative domain-general processes of WM include mechanisms that control attention, gate the flow of information into and out of WM buffers, reduce interference from irrelevant sources of information, and govern the engagement of domain-specific strategies. Both domain-specific and domain-general factors are involved in the link between WM and higher cognition (Jarrold & Towse, 2006; Kane et al., 2004). However, executive attention processes, more than domain-specific factors, seem to drive the predictive validity of WM for many higher cognitive skills (Cowan et al., 2005; Lépine, Barrouillet, & Camos, 2005). Thus, while WM training exercises might affect various WM processes, the greatest generalization from training might be expected when the training protocol targets domain-general mechanisms.

Approaches to training WM

In our attempt to review the literature on WM training, we sought to identify studies using directed instruction, training, and/or task practice to impact the capacity or efficiency of WM task performance. Certainly, one’s definition of WM could dramatically influence the selection criteria for such a review. For instance, a study demonstrating improved WM task performance following training activities that impact LTM retrieval mechanisms would be embraced within some theories of WM, but not others. As another example, some theoretical accounts assume that WM is engaged even in task contexts where there is no need to manipulate or operate upon stored information (Unsworth & Engle, 2007), while other theories relegate such tasks to the domain of short-term memory (STM), but not working memory. To allow for a comprehensive review, we therefore employed broad selection criteria that included all studies in which training focused on participants’ memory for recently presented items, and which examined the impact on a measure of WM capacity or the efficiency of WM processes.

The approaches to WM training found in the extant literature can be readily classified according to their focus on domain-specific or domain-general components of the WM system. Specifically, one class of training studies involves strategy training, intended to promote the use of supplemental domain-specific strategies that might allow trainees to remember increasing amounts of information of a particular type (e.g., McNamara & Scott, 2001). In contrast, core training studies involve repetition of demanding WM tasks designed to target domain-general WM mechanisms (e.g., Klingberg, Forssberg, & Westerberg, 2002).

Strategy training

Strategy training paradigms involve teaching effective approaches to encoding, maintenance, and/or retrieval from WM. The primary aim of most strategy training studies is to increase performance in tasks requiring retention of information over a delay. In strategy training studies, experimenters introduce participants to particular task strategies and then provide practice sessions encouraging the strategy of interest. Some strategy training programs aim to increase reliance on, and facility with, articulatory rehearsal (Comblain, 1994; Conners, Rosenquist, Arnett, Moore, & Hume, 2008; Turley-Ames & Whitfield, 2003), while other programs aim to train elaborative encoding strategies (Carretti, Borella, & De Beni, 2007; Cavallini, Pagnin, & Vecchi, 2003; McNamara & Scott, 2001).

The rationale for rehearsal training stems from findings in the developmental literature showing that increased use of rehearsal over the course of childhood corresponds with increases in memory recall (Flavell, Beach, & Chinsky, 1966). Studies of WM in childhood suggest that children’s restricted WM performance can be partially explained by a production deficiency (Corsini, Pick, & Flavell, 1968; Flavell et al., 1966): a failure to spontaneously rehearse, which limits recall. This is distinct from a pure mediation deficiency, in which recall remains limited even when rehearsal is used. Such findings led to early investigations of rehearsal training, which successfully demonstrated that both children and adults could improve WM task performance through use and practice of an articulatory rehearsal strategy (Ford, Pelham, & Ross, 1984; Ornstein & Naus, 1983). Rehearsal training procedures may promote improved WM performance by shifting trainees away from less effective strategies (e.g., retrospective retrieval), or by increasing the quality or efficiency of covert rehearsal mechanisms that support maintenance in WM.

Examples of strategy training that target elaborative encoding include practice with grouping items into chunks (St Clair-Thompson, Stevens, Hunt, & Bolder, 2010), devising a mental story with items (McNamara & Scott, 2001), and using imagery to make items more salient (Carretti et al., 2007). Training programs of this type stem in part from research on the mnemonic techniques thought to be used by skilled memorizers. Specifically, skilled memorizers strategically encode task-relevant information and create salient relationships between to-be-remembered information and information (e.g., semantic knowledge) already held in LTM (Ericsson & Chase, 1982). From some theoretical perspectives, such training procedures might not properly be characterized as forms of “WM” training, since they may succeed by circumventing the limitations of WM (e.g., through chunking or LTM encoding) rather than by directly affecting the capacity or efficiency of WM mechanisms per se. However, within some unitary models of WM the distinction is not especially meaningful (Nairne, 2002). Moreover, in some recent studies these strategy training techniques are applied directly toward the goal of increasing participants’ WM performance (Carretti et al., 2007; Cavallini et al., 2003; Comblain, 1994; Conners et al., 2008; Loomes, Rasmussen, Pei, Manji, & Andrew, 2008; McNamara & Scott, 2001; Turley-Ames & Whitfield, 2003). Therefore, we concluded that studies employing strategy training deserved consideration in the present review.

Both rehearsal and elaborative encoding training may be usefully applied in everyday contexts that require lists or groups of information to be retained; e.g., to help a student track the steps needed to complete a math problem, or to allow a shopper to navigate a supermarket while thinking of the remaining items needed for a recipe. However, despite utility in these everyday contexts, reliance on specific mnemonic strategies has classically been shown to enhance performance only in the trained task and a small set of closely related tasks that involve the same type of material. For example, in a classic case study conducted by Ericsson and Chase (1982), a participant and avid runner was able to reach a digit span of 80 after honing a strategy of chunking (or grouping) the numbers into running times. However, his exceptional memory for numbers relied on such a specific memorization scheme that his training exclusively impacted tasks involving similar numeric stimuli. Accordingly, the primary expectation of studies using strategy training is that they should yield increased performance only with tasks involving materials that are amenable to the trained strategy (near transfer), and should not generalize to more disparate task contexts (far transfer).
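To make the chunking idea concrete, here is a toy sketch. It assumes that a digit string is recoded into three-digit groups and that a group is read as a running time (minutes:seconds) only when its last two digits form a valid seconds value. The function names and the recoding rule are hypothetical simplifications for illustration; this is not the coding scheme reported by Ericsson and Chase (1982).

```python
from typing import List, Optional


def chunk_digits(digits: str, size: int = 3) -> List[str]:
    """Split a digit string into consecutive groups of `size` digits.

    The final group may be shorter if len(digits) is not a multiple of size.
    """
    if size < 1:
        raise ValueError("size must be positive")
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def as_running_time(chunk: str) -> Optional[str]:
    """Hypothetical recoding rule: read a 3-digit chunk 'mss' as the time m:ss.

    Returns None when the chunk cannot be read as a time (wrong length,
    non-digits, or seconds >= 60), i.e. when the strategy does not apply.
    """
    if len(chunk) != 3 or not chunk.isdigit():
        return None
    minutes, seconds = chunk[0], chunk[1:]
    if int(seconds) >= 60:
        return None
    return f"{minutes}:{seconds}"


if __name__ == "__main__":
    sequence = "349172058"
    chunks = chunk_digits(sequence)                # ['349', '172', '058']
    times = [as_running_time(c) for c in chunks]   # ['3:49', None, '0:58']
    print(f"{len(sequence)} digits -> {len(chunks)} chunks: {times}")
```

Nine digits become three units to hold, two of which map onto familiar times. Because the rule only works on numeric material, the sketch also hints at why such a scheme offers little help outside the trained stimulus type.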

Studies of strategy training strongly support the claim that the amount of information remembered on measures of WM can be increased by teaching strategies such as rehearsing out loud (Turley-Ames & Whitfield, 2003), telling a story with stimuli (McNamara & Scott, 2001), or using imagery to make stimuli salient (Carretti et al., 2007). As might be expected, there is some evidence (though limited) that the benefits of this training are not task specific, but rather, can transfer to multiple WM tasks involving the same types of stimuli (Carretti et al., 2007). There is also some evidence that training can increase memory for untrained stimulus types; for example, participants trained to rehearse a series of images of concrete nouns showed improvements in digit and letter memory (Comblain, 1994).

However, to date, very few studies have assessed transfer of the benefits of strategy training to measures that go beyond assessment of memory for recently presented items. One of the few demonstrations of transfer from strategy training showed that practice with elaborative encoding can yield gains on self-report measures of everyday memory (Cavallini et al., 2003). Another study of strategy training in school children gave participants practice on a variety of strategies (rehearsal, grouping, visual imagery, storytelling) and found improvements in WM as well as mental arithmetic and the ability to follow task instructions, but no improvements in standardized tasks of reading or mathematics (St Clair-Thompson et al., 2010). Although these instances of generalization are limited, strategy training may yield broader benefits than suggested by classical studies in which the benefits of practice were found to be restricted to a particular task and information type (Ericsson & Chase, 1982).

While strategy training may have some utility in healthy, young adult populations, recent applications of this approach frequently involve training in populations where limited WM capacity may be a particular concern (including those with specific clinical diagnoses, and aging adults). For instance, strategy training has been successfully used in children with Down syndrome (Comblain, 1994; Conners et al., 2008) and fetal alcohol spectrum disorder (Loomes et al., 2008) to compensate for specific WM deficits. Strategy training has also been reported to stem the decline of, and perhaps improve, WM in older adult populations (Carretti et al., 2007; Cavallini et al., 2003). Self-report measures indicating improved everyday memory in trained older adults (Cavallini et al., 2003) suggest that there is indeed some practical utility in this training approach. Still, since generalization from strategy training has rarely been tested and is not theoretically predicted, the principal value of this type of training may lie in the enhancement of skills, such as dialing a phone number, that directly rely on WM and are conducive to a trainable strategy.

Core training

Core training studies typically involve repetition of demanding WM tasks that are designed to target domain-general WM mechanisms. To this end, core training paradigms are commonly designed to: 1) limit the use of domain-specific strategies, 2) minimize automatization, 3) include tasks/stimuli that span multiple modalities, 4) require maintenance in the face of interference, 5) enforce rapid WM encoding and retrieval demands, 6) adapt to participants’ varying levels of proficiency, and 7) demand high cognitive workloads or high-intensity cognitive engagement (though different studies place variable emphasis on these factors). Tasks utilized in core training programs also commonly involve sequential processing and frequent memory updating. Although there has not been a systematic investigation of the specific components of training that are necessary for achieving improved WM function or generalization, there is some evidence suggesting that non-sequential, non-adaptive, and unimodal training paradigms would not be effective (Olson & Jiang, 2004). Sample tasks used in core training programs are shown in Fig. 1.
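Point 6 in the list above, adapting to each participant’s proficiency, is often realized as a staircase that raises or lowers the memory load between blocks. The sketch below assumes a simple block-wise rule with made-up thresholds (raise the load after at least 90% accuracy, lower it after less than 70%); the names `next_level` and `run_session` are hypothetical, and programs such as Cogmed or the dual N-back use their own criteria, which are not reproduced here.

```python
from typing import List, Sequence


def next_level(level: int, accuracy: float,
               up: float = 0.9, down: float = 0.7, min_level: int = 1) -> int:
    """Return the load level (e.g., N in an N-back task or list length in a
    span task) for the next block, given the proportion correct on the
    current block. The thresholds `up` and `down` are illustrative assumptions."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a proportion between 0 and 1")
    if accuracy >= up:
        return level + 1                   # performing well: increase the load
    if accuracy < down:
        return max(min_level, level - 1)   # struggling: decrease, but not below the floor
    return level                           # otherwise hold the load constant


def run_session(start_level: int, block_accuracies: Sequence[float]) -> List[int]:
    """Trace the load level across a session of training blocks."""
    levels = [start_level]
    for acc in block_accuracies:
        levels.append(next_level(levels[-1], acc))
    return levels


if __name__ == "__main__":
    # Hypothetical block accuracies for one participant.
    print(run_session(2, [0.95, 0.92, 0.60, 0.80, 0.50, 0.40]))
```

With these example accuracies the load climbs from 2 to 4 and then falls back to the floor of 1. The design keeps the task near the edge of each trainee’s current ability, which is one way to keep cognitive workload high (point 7) as proficiency changes.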

Fig. 1

Example training protocols used in core WM training studies. a) Schematic drawing of the identity-judgment N-back task used in Verhaeghen et al. (2004). Participants were asked to determine whether each sequentially presented digit was the same as the digit shown N items back. b) Schematic drawing of the verbal condition of the complex WM span task used in Chein and Morrison (2010). Participants were asked to remember the storage items while making intermittent processing judgments; items were reported at the end of the trial. c) Schematic drawing of the letter updating task described in Dahlin et al. (2009). Participants were shown lists of items of unknown length and asked to report the last four items. d) Schematic drawing of the 2-back condition of the dual N-back task used in Jaeggi et al. (2008). Participants completed auditory and visual versions of the N-back task concurrently.
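The target rule behind the identity-judgment N-back task in panel a can be sketched in a few lines. This assumes a single stream of stimuli and that responses are recorded as the set of positions at which the participant indicated a match. Counting hits and false alarms is one common way to score such a task, offered here as an assumption rather than as the scoring used by Verhaeghen et al. (2004); the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def nback_targets(stimuli: Sequence, n: int) -> List[bool]:
    """True at position i when stimuli[i] matches the item shown n positions earlier."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def score_nback(stimuli: Sequence, n: int, responses: Set[int]) -> Dict[str, int]:
    """Count targets, hits (match responses on targets), and false alarms
    (match responses on non-targets). `responses` holds the 0-based positions
    at which a 'same' response was given (an assumed response format)."""
    targets = nback_targets(stimuli, n)
    hits = sum(1 for i, is_target in enumerate(targets) if is_target and i in responses)
    false_alarms = sum(1 for i, is_target in enumerate(targets)
                       if not is_target and i in responses)
    return {"targets": sum(targets), "hits": hits, "false_alarms": false_alarms}


if __name__ == "__main__":
    digits = [3, 7, 3, 5, 3, 5, 1]
    print(nback_targets(digits, 2))            # targets at positions 2, 4, and 5
    print(score_nback(digits, 2, {2, 4, 6}))   # 2 hits, 1 false alarm
```

For the dual N-back task in panel d, the same target rule would be applied to the auditory and visual streams separately, with a judgment made for each stream on every trial.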

Some core training programs take a “kitchen-sink” approach, in which a compilation of several tasks with widely varying stimulus types is used to impact multiple components of the WM system. One example of this approach, Cogmed (e.g., Holmes, Gathercole, & Dunning, 2009; Klingberg et al., 2005), comprises a large battery of WM tasks including backward digit span, location memory, tracking of moving visual objects, and several other tasks. Another multifaceted training program, COGITO, includes various WM, perceptual speed, and episodic memory tasks (Schmiedek, Lovden, & Lindenberger, 2010). An advantage of these types of programs is that the diversity of exercises increases the chance that one of, or some combination of, the training tasks will produce desired training-related gains. In a best case scenario, the tasks could contribute to cognitive enhancement in an additive fashion, and thus yield large transfer effects with maximal efficiency. However, a drawback of this approach (at least from the standpoint of science) is that the use of a multifaceted package, with a variety of tasks, stimuli, and engaged processes, creates difficulty in determining which components of training underlie subsequent cognitive improvements, and in determining which specific mechanisms of WM are affected.

In an effort to isolate particular tasks and specific mechanisms that might underlie a WM training effect, others have favored a more stripped-down approach. For example, in a study by Verhaeghen et al. (2004), participants were trained using a single task: a variant of the n-back WM task (Fig. 1a). An obvious benefit of this single-task approach is that the observed training effects can be attributed to that one task. However, because the task is complex, the specific WM mechanism(s) responsible for the cognitive gains observed in that study cannot be fully determined. Rather than focusing on a single WM task, others have built training paradigms around a single component of the WM system. For instance, Dahlin and colleagues (Dahlin, Neely et al., 2008; Dahlin, Nyberg et al., 2008) have implemented training protocols that use multiple tasks with a common emphasis on the “updating” mechanism of WM (Fig. 1c).
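The letter updating task in Fig. 1c asks participants to report only the last four items from a list of unpredictable length, so the remembered set must be revised with every new item. A fixed-size buffer that discards its oldest element captures that requirement. The sketch below is illustrative: the function names are hypothetical, and scoring by exact serial position is an assumption, not necessarily the procedure used by Dahlin and colleagues.

```python
from collections import deque
from typing import Iterable, List, Sequence


def last_k(items: Iterable[str], k: int = 4) -> List[str]:
    """Return the k most recent items; older items fall out of the buffer."""
    buffer = deque(maxlen=k)  # appending beyond maxlen drops the oldest item
    for item in items:
        buffer.append(item)
    return list(buffer)


def score_updating(presented: Iterable[str], reported: Sequence[str], k: int = 4) -> int:
    """Number of reported items that match the correct item in the same serial position."""
    correct = last_k(presented, k)
    return sum(1 for want, got in zip(correct, reported) if want == got)


if __name__ == "__main__":
    letters = list("BKFRTMA")                             # list length unknown to the participant
    print(last_k(letters))                                # ['R', 'T', 'M', 'A']
    print(score_updating(letters, ["R", "T", "A", "M"]))  # 2
```

Because the list can end at any point, the participant cannot simply wait and encode the final items; the current set of four has to be kept up to date throughout, which is the updating demand these protocols emphasize.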

Unsurprisingly, core training studies ubiquitously report that trained participants exhibit significantly improved performance on the trained WM task(s). In the majority of these studies, increased performance is also demonstrated on untrained measures of temporary memory (i.e., untrained tasks that test memory for recently presented items, sometimes with novel stimuli), though the particular assessment tasks are highly varied across studies. For instance, in one study using a multifaceted training program (COGITO), participants improved on untrained WM measures including “Animal Span” and the 3-back task (Schmiedek et al., 2010). In some studies, training-related gains are also found in tasks intended to index specific component processes of WM, such as updating (Dahlin, Neely, et al., 2008; Dahlin, Nyberg, et al., 2008) or interference buffering (Persson & Reuter-Lorenz, 2008). Similar results, which are summarized under the column “Temporary Memory” in Table 1, suggest that like strategy training, core training produces clearly demonstrable improvements in tasks that directly involve the retention and retrieval of temporarily stored information.

Table 1 Summary of core training studies

Core training seeks to produce increased WM capacity by focusing on the strengthening of domain-general WM processes. If these processes are indeed strengthened, then this approach should yield improvements not only on tasks similar to those used in training (near transfer), but also on more disparate cognitive measures (far transfer). Accordingly, one might expect that core training will increase performance in a wide range of other cognitive tasks that are reliant on WM capacity. As there are strong links between domain-general components of WM and cognitive control, fluid intelligence, and reading comprehension, these are among the abilities that should, in theory, benefit from core training. Table 1 provides a synopsis of the methods and findings from studies examining the efficacy of core WM training. Studies included in this table were conducted across multiple training sessions and assessed transfer from training on at least one transfer measure.

Many instances of positive transfer have been demonstrated in association with the Cogmed training battery. In an early core training study conducted by Klingberg et al. (2002), a training protocol that combined multiple WM tasks (part of the later Cogmed battery) was found to produce training benefits that extended to individual measures of cognitive control (Stroop) and general fluid intelligence (Ravens) in a group of healthy, young adults. These significant transfer findings were replicated in small samples of young adult participants in two later studies (Olesen, Westerberg, & Klingberg, 2004; Westerberg & Klingberg, 2007). Using the same training paradigm, Klingberg et al. (2002) found similarly improved cognitive control and general fluid intelligence among a small cohort of children diagnosed with ADHD, and a concomitant reduction of ADHD symptom severity (based on parental reports). In a follow-up study conducted with a larger cohort, and using a more comprehensive training battery (Cogmed), children with ADHD were once again found to have reduced symptoms as well as improved cognitive control and general fluid intelligence performance following training (Klingberg et al., 2005). A recent independent assessment of the Cogmed program provides only partial replication of these results, with children with ADHD demonstrating training-related gains in measures of WM, but not general fluid intelligence (Holmes et al., 2010). In another application of Cogmed in children with low WM capacity, Holmes et al. (2009) found improvements in participants’ ability to follow classroom instructions. Additionally, 6 months after training, the same participants demonstrated improved math skills, as measured by the mathematical reasoning subtest of the Wechsler Objective Number Dimensions. In the study with the youngest participants to date, preschool children engaging in Cogmed training were found to exhibit increased cognitive control task performance, but not improvements in measures of inhibition, problem solving, or response speed (Thorell, Lindqvist, Nutley, Bohlin, & Klingberg, 2009).

Positive transfer findings have also been reported in studies using other core WM training protocols. Jaeggi et al. (2008) developed an adaptive, continuous WM task involving the simultaneous tracking of an auditory-verbal and a visuo-spatial sequence (Fig. 1d), and found a dose-dependent increase in participants’ performance on a measure of general fluid intelligence (BOMAT). Using a training paradigm based upon the “complex WM span” tasks (Fig. 1b) that are focal in the psychometric literature, we found significant transfer among trained participants to measures of both cognitive control and reading comprehension, but no improvements in general fluid intelligence or reasoning (Chein & Morrison, 2010). Most recently, Schmiedek et al. (2010) reported that participants who trained using the COGITO program improved significantly on untrained assessments of WM, episodic memory, and fluid intelligence and reasoning. An important aspect of this latter study is that it assessed transfer not only on individual performance measures but also on latent constructs (based on aggregation of multiple assessment tasks). Significant transfer from WM training to latent measures of WM, episodic memory, and fluid intelligence indicates that the benefits of training likely did not derive from unintended, task-specific relationships between the trained and transfer tasks.

Arguably, a goal of cognitive training is to improve the ease and success of cognitive performance in daily life, not just performance in the lab. When training is implemented in individuals with a specific mental disorder, the training goals might also include alleviation of the particular symptoms of the disorder. Accordingly, the utility of core WM training in specialized populations is increasingly being gauged by assessing its generalization outside of the laboratory. Unlike strategy training, where generalization is largely limited to direct tests of memory for recently presented items, the value of core training in clinical populations has been demonstrated through far transfer to laboratory, everyday-memory, and quality-of-life measures. As mentioned earlier, there is evidence that core training can reduce the symptoms of ADHD (Klingberg et al., 2002, 2005). Additional quality-of-life improvements have been found following core WM training protocols in patients with multiple sclerosis (Vogt et al., 2009) and stroke (Westerberg et al., 2007), as well as in patients with schizophrenia (Wykes, Reeder, Corner, Williams, & Everitt, 1999).

Despite popular acceptance of the notion that regular cognitive activity yields better cognition into later life, only a handful of studies have empirically tested the value of WM training in healthy, older populations. Studies focusing on WM training in older adults have succeeded in demonstrating improved performance in the trained task and sometimes closely related memory measures (Buschkuehl et al., 2008; Li et al., 2008). However, these studies provide surprisingly limited evidence that training produces benefits beyond the trained tasks (Table 1). Others have accordingly concluded that training-related “transfer effects are small, or non-existent, in old age” (Dahlin, Backman, Neely, & Nyberg, 2009, p. 405). Schmiedek et al. (2010) directly compared training gains in young adult and older adult populations. Consistent with the trends apparent in the literature, they found substantially greater transfer among the younger cohort, a finding that they explain in relation to declining cognitive plasticity across the lifespan.

We note, however, that prior studies conducted in healthy, older adults have examined transfer in a fairly restricted set of measures (see Table 1) that may not be ecologically relevant in older populations. Additionally, the specific training paradigms that have produced the broadest transfer in younger populations (Chein & Morrison, 2010; Jaeggi et al., 2008; Klingberg et al., 2005) have not been used in most studies conducted in older adults. To address these limitations, we conducted a study in which older adults (aged 60+) completed complex WM span training (as used in Chein & Morrison, 2010), and found significant transfer of improvements to ecological measures of verbal learning (the California Verbal Learning Test) and everyday attention (Test of Everyday Attention), and an increase in participants’ self-reported “everyday attention” ratings (Richmond, Morrison, Chein, & Olson, 2011). We anticipate that other studies using the appropriate combination of training and assessment tasks could succeed in demonstrating cognitive enhancement, or the amelioration of cognitive losses, in old age.

While the findings on transfer from core WM training suggest that benefits may extend to important cognitive skills, such training would be of limited value if improved performance did not persist after the conclusion of the training period. Although only a handful of studies conducted to date have assessed cognitive ability after training was discontinued, the results again point in a positive direction. In studies of children, training-related gains have been shown to persist 3 months (Klingberg et al., 2002) and 6 months (Holmes et al., 2010) after training ceased. Moreover, one study in children found that a skill (math) that did not show improvement immediately after training had improved 6 months after the training period ended (Holmes et al., 2009). In young adults, Dahlin, Nyberg et al. (2008) found that near transfer following updating training remained stable 18 months after training. In older adults, near transfer was stable after 3 months (Li et al., 2008) but not after 12 months (Buschkuehl et al., 2008). Perhaps an important gap in the literature is that no studies have examined whether training gains can be sustained through a schedule of “maintenance” training involving less frequent or less intensive training episodes.

Limitations in the WM training literature

The training-related benefits described above have generated substantial interest in the promise of WM training as a tool for broad cognitive enhancement, and this enthusiasm extends well beyond academia. However, there are a number of issues that cloud interpretation of the current training literature, and that must be considered before we too readily or enthusiastically endorse the utility of WM training.

Alternative interpretations of training gains

Effort/expectancy effects

An issue of great concern is that observed test score improvements may arise from influences on participants’ expectations or level of investment, rather than from changes in the targeted cognitive processes. One form of expectancy bias resembles the placebo effect observed in clinical drug studies: the mere belief that training should have a positive influence on cognition may produce a measurable improvement in post-training performance. Participants may also be affected by the demand characteristics of the training study; in anticipation of the goals of the experiment, they may put forth greater effort during the post-training assessment. Finally, apparent training-related improvements may reflect differences in participants’ level of cognitive investment during the training period. Because participants in the experimental group often engage in more mentally taxing activities, they may work harder during post-training assessments to justify their earlier efforts.

Even seemingly small differences between control and training groups may yield measurable differences in effort, expectancy, and investment, but these confounds are most problematic in studies that use no control group (Holmes et al., 2010; Mezzacappa & Buckner, 2010) or only a no-contact control group: a cohort of participants who complete the pre- and post-training assessments but have no contact with the lab in the interval between assessments. Comparison to a no-contact control group is a prevalent practice among studies reporting positive far transfer (Chein & Morrison, 2010; Jaeggi et al., 2008; Olesen et al., 2004; Schmiedek et al., 2010; Vogt et al., 2009). This approach allows experimenters to rule out simple test-retest improvements, but it remains potentially vulnerable to confounding due to expectancy effects. An alternative approach is to use a “control training” group, which matches the treatment group on time and effort invested but is not expected to benefit from training (groups receiving control training are sometimes referred to as “active control” groups). For instance, in Persson and Reuter-Lorenz (2008), both trained and control subjects practiced a common set of memory tasks, but difficulty and level of interference were higher in the experimental group’s training. Similarly, control groups completing a non-adaptive form of training (Holmes et al., 2009; Klingberg et al., 2005) or receiving a smaller dose of training (e.g., one-third as many training trials as the experimental group; Klingberg et al., 2002) have been used as comparison groups in assessments of Cogmed variants. One recent study conducted in young children found no differences between the performance gains of a no-contact control group and those of a control group that completed a non-adaptive version of training, suggesting that the former approach may be adequate (Thorell et al., 2009).

We note, however, that regardless of the control procedures used, no study conducted to date has simultaneously controlled motivation, commitment, and difficulty, nor has any study attempted to demonstrate explicitly (for instance, through subject self-report) that control subjects experienced a comparable degree of motivation or commitment, or held similar expectancies about the benefits of training.

Another way to address expectancy effects is to examine the selectivity or systematicity of transfer. For example, Chein and Morrison (2010) reported selective transfer from training to measures of cognitive control and reading comprehension, but not to measures of reasoning or fluid intelligence. If post-test gains were attributable to differences in effort or expectancy between trained and control subjects, more uniform transfer across measures might have been observed. Chein and Morrison (2010) further observed that the magnitude of training-related increases in WM task performance predicted the magnitude of transfer to reading comprehension. A similar relationship between training and transfer gains was reported by Schmiedek et al. (2010), who found that change on the trained WM task was highly correlated with change in a latent variable of near-transfer WM measures. Such systematic relationships have been argued to indicate a direct impact of training on transfer task performance, and they offer some protection against expectancy confounds.

Shared components of the training and assessment tasks

WM training studies often test generalization from training with only a single task, yet interpret observed improvements on that task as reflecting gains in some broadly defined cognitive ability. For instance, we claimed that WM training can improve reading comprehension (Chein & Morrison, 2010), but this finding was demonstrated on only a single measure of reading skill (Nelson Denny Reading Comprehension Assessment). Similarly, Jaeggi et al. (2008) claimed to have demonstrated an impact of WM training on fluid intelligence, when in fact they demonstrated a significant impact of training on only a single task that has been used to assess fluid intelligence (Bochumer Matrizen-Test, BOMAT). As Moody (2009) points out in a critique of Jaeggi et al. (2008), the interpretation of such findings is problematic because generalization may result from idiosyncratic relationships between the trained and assessment tasks, rather than from enhancement of the underlying ability the assessment task is thought to measure (e.g., reading comprehension, fluid intelligence). In his example, Moody argues that the BOMAT measure of intelligence used by Jaeggi and colleagues relies directly on the ability to store information in spatial WM, which is precisely the skill practiced during training. On this account, generalization to the BOMAT simply reflects practice with spatial storage, not improved fluid intelligence per se. We suggest that these concerns could be minimized by using multiple overlapping assessments to index a latent cognitive ability. In the psychometric literature, it is common to administer several measures that putatively assess an underlying psychological construct and to extract their shared variance as an index of individual differences in that construct. To date, only one WM training study has demonstrated far transfer using this latent variable approach (Schmiedek et al., 2010). In that study, training was shown to transfer not merely to a single assessment task, but to a latent variable derived from multiple indices of fluid intelligence (e.g., multiple subtests of the Berlin Intelligence Structure Test and Raven’s Advanced Progressive Matrices). Future demonstrations of transfer to latent construct variables derived from composites of several tasks will strengthen the case that training affects underlying cognitive abilities rather than task-specific factors.
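The idea of indexing a construct with several tasks rather than one can be sketched, very roughly, with a unit-weighted composite: standardize each measure across subjects and average each subject's z-scores. This is a simplification assumed for illustration only. A true latent-variable analysis estimates the measures' shared variance, whereas this composite weights every task equally and does not separate shared from task-specific variance. The task names and scores below are hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one measure across subjects (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*measures):
    """Unit-weighted composite: each subject's mean z-score across measures."""
    standardized = [zscores(measure) for measure in measures]
    return [mean(subject) for subject in zip(*standardized)]


# Hypothetical scores of four subjects on three fluid-intelligence tasks.
matrices = [10, 14, 12, 18]
series = [22, 25, 21, 30]
analogies = [7, 9, 8, 12]

gf = composite(matrices, series, analogies)
print([round(score, 2) for score in gf])
```

Because no single task dominates the composite, a gain driven by one task's idiosyncratic overlap with the training task is diluted, which is the protective property the text attributes to multi-measure constructs.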

Lack of consistency in experimental methods and findings

Another concern about the current corpus of WM training studies is that there is almost no standardization and little convergence of findings. To begin, each research group has its own favored approach to training (see Fig. 1 for examples), and there are very few cross-laboratory replications of any given WM training protocol. Moreover, to date not a single comparative efficacy study, directly comparing alternative training approaches, has been published. Thus, we are unable to determine whether a given approach to training provides a more effective tool for cognitive enhancement, or whether different training regimes might be used to target particular cognitive skills differentially.

One issue of particular concern is the variety of comparison groups used in WM training studies (see above for specifics). Unsurprisingly, reported training benefits can be drastically affected by the characteristics of the control group. Comparison to a no-contact control group may inflate estimates of training gains. Meanwhile, studies using only a tightly matched control group may yield small effect sizes that are difficult to interpret: a small training effect may indicate either a weak training benefit or unanticipated cognitive enhancement associated with the control protocol.
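The dependence of effect size on the comparison group can be made concrete with Cohen's d computed on gain scores. In this sketch, with hypothetical numbers, the same trained group yields a much larger standardized effect against a no-contact control (assumed to gain only from retesting) than against an active control whose gains are assumed to also include some effort or expectancy benefit. The group labels and scores are illustrative assumptions, not data from any reviewed study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Hypothetical pre-to-post gain scores (illustrative numbers only).
trained = [5, 7, 6, 8, 4, 6]
no_contact = [1, 2, 0, 3, 1, 2]    # assumed: retest practice only
active_ctrl = [3, 5, 4, 5, 2, 4]   # assumed: retest practice plus effort/expectancy

d_no_contact = cohens_d(trained, no_contact)
d_active = cohens_d(trained, active_ctrl)
print(f"d vs no-contact: {d_no_contact:.2f}; d vs active control: {d_active:.2f}")
```

Neither value is "the" training effect: the first may be inflated by expectancy, while the second may be deflated if the control protocol itself confers some benefit.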

There is also surprising variability in the particular training and transfer skills that are assessed. This variability is apparent both in the range of cognitive skills examined across studies and in the fact that some reported transfer findings have not been replicated (e.g., reading comprehension; Chein & Morrison, 2010). Moreover, cross-study inferences are difficult because the particular instruments used to assess a given skill also vary widely from study to study. For example, in the studies we have reviewed, more than 30 different measures were used to determine the impact of WM training on “temporary memory”, with great inconsistency in the choice of stimuli, timing parameters, and other task features.

A glance at Table 1 shows that cognitive control and general fluid intelligence are the two most frequently tested transfer measures. Yet rather than supporting a consensus regarding the benefits of training in these areas, a comparison of results across training studies leads only to uncertain conclusions. Of the several studies examining transfer to the Stroop task, a measure of cognitive control, some show successful transfer (Chein & Morrison, 2010; Klingberg et al., 2002, 2005; Olesen et al., 2004), whereas others show failed transfer (Dahlin, Nyberg et al., 2008; Thorell et al., 2009; Westerberg et al., 2007; Wykes et al., 1999). Furthermore, even among studies measuring Stroop performance, both the task administration and the key outcome variable (accuracy or RT) vary. In the case of general fluid intelligence, some WM training studies again report training-related improvement (Jaeggi et al., 2008; Klingberg et al., 2002, 2005; Schmiedek et al., 2010), while others report nonsignificant gains (Chein & Morrison, 2010; Dahlin, Nyberg et al., 2008; Holmes et al., 2009, 2010; Thorell et al., 2009).

Unsurprisingly, these incongruous findings have prompted a debate in the literature: some authors urge cautious interpretation of positive transfer results (Conway & Getz, 2010; Moody, 2009), while others are more optimistic about the implications of successful transfer (Klingberg, 2010; Perrig et al., 2009). This debate highlights a major obstacle to drawing conclusions from the current body of WM training studies: the lack of methodological consistency across studies makes it difficult to make sweeping claims about the efficacy of training. Are conflicting findings due to the differential efficacy of alternative training paradigms, or to differences in experimental procedures?

Even beyond obvious differences in the specific training and assessment tasks used by separate studies, a number of design issues may further confound the interpretation and comparison of training results. Areas of divergence in experimental procedures include the timeline of training and assessments (e.g., length of training sessions, overall duration of the training period, number of assessment sessions), conditions of assessment (e.g., comfort and location of assessment, level of encouragement from lab staff), setting of training (e.g., laboratory, school, home), and the particular control groups used. These variables may have a profound impact on training outcomes. To illustrate the problem, consider how the number of measures administered at assessment may influence the results. Because training studies are both time consuming and costly, there is an incentive to collect data from a large number of assessment tasks. However, studies that include many assessment tasks are more susceptible to confounds due to subject boredom and fatigue (i.e., the quality of the assessment data may diminish over the course of the assessment session). There may also be unanticipated benefits of administering multiple tasks together, wherein the assessment measures themselves yield cognitive gains (Salthouse & Tucker-Drob, 2008). This “test taking” effect may dampen apparent training effect sizes, because testing itself may confer some cognitive benefit even on control participants. Finally, the use of multiple assessment measures brings with it the need to correct for multiple comparisons, which reduces the statistical power to detect any given training gain.
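The cost of correcting for multiple comparisons can be sketched with a normal approximation to the power of a two-sided, two-group comparison. A Bonferroni correction (testing each of k measures at alpha / k) is assumed here as one common, conservative choice; the effect size d = 0.5 and the group size of 30 are hypothetical values chosen for illustration. The approximation ignores the negligible opposite tail and the heavier tails of the t distribution at small sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha):
    """Normal-approximation power of a two-sided, two-sample test (far tail ignored)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # The standardized mean difference is scaled by its standard error, sqrt(2 / n).
    return z.cdf(d * sqrt(n_per_group / 2) - z_crit)


d, n = 0.5, 30  # hypothetical effect size and per-group sample size
for k in (1, 5, 10, 20):
    # Bonferroni: each of k assessment measures is tested at alpha / k.
    print(f"{k:>2} measures: power = {approx_power(d, n, 0.05 / k):.2f}")
```

Holding effect size and sample size fixed, each added assessment measure lowers the per-test alpha and therefore the chance of detecting a genuine training gain on any one of them.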

Neural correlates of WM training

Recent research into the neural underpinnings of WM training has attempted to clarify the brain mechanisms that are influenced by training, and the findings have been lauded as supporting the positive behavioral results (Klingberg, 2010). To date, just a handful of neuroimaging studies have attempted to characterize the neural changes associated with core WM training (Brehmer et al., 2009; Dahlin, Neely et al., 2008; Hempel et al., 2004; McNab et al., 2009; Olesen et al., 2004; Takeuchi et al., 2010; Westerberg & Klingberg, 2007), but their results are generally consistent in showing that training produces activity changes within a network of brain regions previously implicated in domain-general aspects of WM (e.g., dorsolateral prefrontal cortex, posterior parietal cortex, basal ganglia; Wager & Smith, 2003). Few studies have examined the neural correlates of transfer from the trained task to untrained measures. One hypothesis is that transfer should occur specifically when the trained and transfer tasks recruit overlapping cortical regions (Jonides, 2004; Olesen et al., 2004). This hypothesis gains support from findings that transfer from WM training to improved Stroop performance is accompanied by increased activation in the dorsolateral prefrontal cortex, a region implicated in both WM and Stroop task performance (Jonides, 2004; Olesen et al., 2004). Similarly, an investigation of updating training (Dahlin, Neely et al., 2008) found overlapping striatal (basal ganglia) activation between a trained WM updating task and an instance of successful transfer (3-back), whereas no such striatal overlap was found between the trained task and an instance of failed transfer (Stroop).

Because individual differences in WM capacity are correlated with the structural integrity of white matter pathways connecting domain-general regions within the fronto-parietal network (Klingberg, 2006), one might further expect WM training to affect the connectivity of this system. In accord with this expectation, a recent study using fractional anisotropy (a diffusion tensor imaging index of white matter integrity) to reveal training-dependent changes in brain connectivity reported that WM training increased structural connectivity in white matter pathways within the parietal cortex (Takeuchi et al., 2010).

Efforts to discern the neurochemical underpinnings of cognitive training may further inform our understanding of the affected brain mechanisms. Cortical dopamine release is thought to serve an important gating function in WM (Gruber, Dayan, Gutkin, & Solla, 2006). One recent study found that WM training altered cortical dopamine D1 receptor binding potential in prefrontal and parietal areas (McNab et al., 2009). Relatedly, variations in a dopamine transporter gene (DAT1) have been found to predict the size of the WM training benefits obtained by individual subjects (Brehmer et al., 2009).

Together, the neuroimaging results are consistent with the coarse-grained claim that core WM training targets executive WM mechanisms, and that increased engagement of these mechanisms supports transfer. However, several caveats must be considered in interpreting the neuroimaging findings. To begin, the experimental designs used in these studies suffer from the same limitations discussed above (lack of construct measures, no-contact control groups, etc.). Moreover, the implicated fronto-parietal network is engaged by many difficult cognitive tasks, and thus task difficulty may confound the relationship between WM and fronto-parietal engagement (Barch et al., 1997). We should therefore be cautious in attributing these results specifically to WM processes, as opposed to more generic influences of cognitive engagement or arousal. Finally, increased regional activity might, as most training studies reporting such increases suggest, reflect improved function (e.g., through cortical recruitment or increased reliance on strategic processes), but it might also simply signal more effortful processing (Kelly & Garavan, 2005). Conversely, decreased activity, interpreted as improved neural efficiency, would have been an equally plausible outcome (Chein & Schneider, 2005; Landau, Schumacher, Garavan, Druzgal, & D'Esposito, 2004).

Reflections and future directions

The studies reviewed above constitute an important initial step toward understanding the malleability of WM capacity and the design of effective WM training programs. Having considered the training literature, we can now return to the primary question of this paper: Does WM training work? Specifically, does WM training yield generalized cognitive enhancement? In the case of core training, our answer is a tentative yes. Studies of core training show improvements in a variety of cognitive domains (e.g., cognitive control, reading comprehension); these improvements persist even when tightly matched controls are used, and they are consistent with neuroimaging studies demonstrating activation changes in regions associated with domain-general cognitive performance. Core WM training thus represents a promising approach to achieving broad cognitive enhancement, though clearly, important checks on the validity and interpretation of extant training findings are needed before we can answer definitively that, yes, WM training works.

As discussed above, our confidence in the interpretation of current WM training studies is diminished by the great variability that exists across studies. Such variation is perhaps useful in displaying the breadth of possible approaches to training and the extent of training benefits. Such variability is also to be expected given the relative nascence of the WM training field, and the desire to explore the space of possible training benefits for different cognitive skills and quality-of-life indicators. However, without standardization and comparative studies, it is impossible to determine whether contradictory results are due to lab-specific or paradigm-specific differences. Overall, the methodological inconsistencies do not invalidate the results of individual studies. Instead, they necessitate caution as one attempts to infer the implications and boundaries of apparent training benefits.

Agreement in future studies on two fronts could have a particular impact on our ability to integrate and synthesize the findings. The first is an increased level of standardization in the choice of pre- and post-training assessments. At present, there is an emerging consensus favoring the inclusion of cognitive control (e.g., Stroop) and general fluid intelligence assessments, two skills that are evaluated in several training studies. Interestingly, each construct has yielded instances of both successful and failed transfer. We suggest that these two constructs may therefore serve as useful benchmarks for comparing different training protocols, and we encourage their inclusion (preferably using normative instruments) in future studies to support such comparisons.

As in the case of the assessment tasks, some parity in the types of control groups would benefit the field. To rule out confounding variables more strongly, the field may be motivated to move toward the use of active control groups (control training) whose experience is closely matched to that of the training group. Recent studies suggest using non-adaptive training variants, less intense forms of training, or a placebo training group as active controls. However, the specific characteristics of a matched control group would likely vary according to the particular training paradigm under investigation, and would thus add to the problem of cross-study variability. Accordingly, we suggest that the field would be aided by the additional inclusion of a no-contact control group or by the use of a consensus control training paradigm (though no specific candidate control protocol has yet emerged).

Future studies will also need to clarify more explicitly the specific mechanisms that give rise to training gains. Subjects’ post-training performance may improve through many possible routes: e.g., more efficient encoding of individual task stimuli; increased familiarity with the stimulus pool; acquisition of a stimulus-specific chunking, maintenance, or elaborative encoding strategy; acquisition of a more effective task strategy; increased overall speed of processing; improved control over attention; increased ability to suppress sources of distraction; better ability to coordinate task demands; improved general test-taking skills; changes in mood, self-esteem, or confidence; etc. Successful transfer from training may similarly depend on many alternative mechanisms, and different training paradigms may act on distinct mechanisms. Knowledge of the distinct effects of specific WM training procedures could afford opportunities to combine training programs and thereby alter cognition at multiple levels. Studies aimed at determining the specific “levels of action” of a given training protocol could thus greatly advance the state of the field.

Another remaining question is whether the magnitude of training benefits varies with the characteristics of the individual. Qualities of an individual, such as initial performance on a set of cognitive measures, age, or level of education, may be important predictors of training gains, and training approaches might vary in their appropriateness for different individuals. For instance, some types of training could benefit low-span individuals but not high-span individuals, or vice versa (see Turley-Ames & Whitfield, 2003). Further probing of this issue would speak to the usefulness of training in clinical and aging populations, as well as to the potential for matching particular training programs to particular individuals in order to optimize gains.

In the nascent field of WM training, a number of other questions also remain to be answered: How much training is necessary to produce a given level of improvement? Is a “maintenance” schedule after training useful for sustaining training benefits? What are the limits on the scope and magnitude of transfer from training? Can different WM training paradigms be combined to have a greater impact on cognitive performance? We hope that answers to these questions, among others, will emerge as the field grows.