When a speech signal is degraded, requiring perceptual effort for successful word recognition, recall for the speech content suffers. This is so whether perceptual effort is engendered by acoustic masking of speech stimuli for listeners with normal hearing acuity (Farley, Neath, Allbritton, & Surprenant, 2007; Murphy, Craik, Li, & Schneider, 2000; Surprenant, 1999) or by a mild-to-moderate hearing loss (McCoy et al., 2005; Rabbitt, 1990). Of importance, this negative effect on recall occurs even when the level of acoustic masking or the degree of hearing loss still allows for successful, albeit effortful, word recognition. This is an intriguing and much replicated phenomenon, but one whose source is not yet fully understood.

In the above-cited studies, masking has been applied to entire word lists or to sets of paired associates, thus obscuring the mechanisms that might underlie this effect. In a closer examination of the effect, Piquado, Cousins, Wingfield, and Miller (2010) demonstrated that masking just a single word in a spoken word list reduces the probability of recall, not only for the masked word itself, but also for an unmasked word prior to it. These data, shown in Fig. 1, demonstrate that, relative to the same word positions in control lists, the probability of recall was significantly reduced for both the masked word (M) and the prior word (–1), but not for words that followed the masked word. Importantly, this effect of masking occurred even though the level of masking allowed for correct, albeit effortful, identification of the masked word (see Footnote 1).

Fig. 1 The lower curve shows the probability of recall of an acoustically masked word (M), the two words preceding the masked word (–1, –2), and the two words following the masked word (+1, +2). The upper curve shows the probability of recall for words in analogous positions in a control list in which no words were masked. The lists comprised seven items, presented at a rate of 1,500 ms per word, and the position of the masked word varied from list to list. Error bars represent one standard error. The data are from “Effects of Degraded Sensory Input on Memory for Speech: Behavioral Data and a Test of Biologically Constrained Computational Models,” by T. Piquado, K. A. Q. Cousins, A. Wingfield, and P. Miller, 2010, Brain Research, 1365, pp. 48–65. Copyright 2010 by Elsevier. Adapted with permission.

It was argued that the effect of a masked word on prior-word recall resulted from a disrupted output pattern during recall. That is, for lists in which all of the words are presented clearly, it is typically found that participants tend to make contiguous transitions during recall. Specifically, recall of any word in a list, n, is most likely followed by recall of the next word in the presented sequence, n + 1 (Howard & Kahana, 1999; Kahana, 1996). An analysis of the Piquado et al. data indicated that the occurrence of a masked word resulted in fewer such n + 1 transitions between early list items, suggesting that this may have underlain the diminished recall for these items.

In a general sense, these results appear consistent with a “resource” account of the negative effects of perceptual effort on recall. This is the postulate that perceptual identification of a stimulus and higher-order processing draw on the same pool of limited resources (Murphy et al., 2000; Rabbitt, 1968, 1990; Schneider & Pichora-Fuller, 2000; Surprenant, 1999, 2007). As a result, a degraded stimulus would require the dedication of more resources for item recognition, such that higher-order processing—in this case, encoding the stimulus in memory—would suffer (McCoy et al., 2005; Pichora-Fuller, Schneider, & Daneman, 1995). This would account for the poorer recall of a word prior to a masked word, so long as the prior word were still in the process of being encoded during the masked word’s presentation. Although this notion of perceptual or cognitive resources has descriptive appeal (Kahneman, 1973), a testable model of the mechanisms underlying both the general and order-specific recall effects of noise masking has until now been lacking.

To provide such an account, we developed the linking-by-active-maintenance model (LAMM). This model is a development of the “linking buffer model” proposed in Piquado et al. (2010; see also Miller & Wingfield, 2010). LAMM is based on dual-mechanism models of free recall (Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, & Usher, 2005; Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981), as well as on temporally proximate linking models (Howard & Kahana, 2002; Lehman & Malmberg, 2013). LAMM builds upon current process models of free recall, however, by accounting for the effects of degraded input on both item recall and on recall orders.

In the next section, we describe LAMM, outlining its mechanisms of encoding and recall. We give special attention to how recall is instantiated within the model and to the role of a limited-capacity memory buffer. We tested the model by examining the predictions that it makes regarding the effects of presentation rate on both recall probability and the output patterns during recall. We then discuss how these predictions are fundamentally different from those made by models lacking a limited-capacity buffer, even if they were modified to incorporate the effects of stimulus clarity. We follow the model presentation with a description of a behavioral experiment to test these predictions. Our goal was to present a model that makes specific behavioral predictions, is consistent with biological principles, and can account for dynamic changes in recall under different stimulus protocols.

LAMM: A dual-mechanism account for effects of perceptual effort on recall

An adequate model for the effects of a degraded but still perceptible stimulus on recall must account for three key findings: (1) reduced recall of a degraded, acoustically masked word, even though it is audible; (2) reduced recall probabilities for the one or two words prior to the masked word, even though these words were not masked (a prior-word effect); and (3) reduced n + 1 transition probabilities in orders of recall in the presence of masking. We suggest that models of episodic recall possessing a limited-capacity, short-term memory buffer with an associative-linking mechanism (Davelaar et al., 2005; Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981) can account for such findings, but only insofar as such models incorporate the assumption that the masking of a word increases the likelihood of buffer disruption. That is, extant models of episodic recall assume that a list is presented with equal clarity across items. LAMM differs from other dual-store models in its focus on the case in which not all words are of equal clarity, a situation not uncommon in everyday listening with intermittent or modulated background noise (Cooke, 2006; Gustafsson & Arlinger, 1994).

Computationally, we have implemented LAMM as a Markov chain—a stochastic model whose transitions depend on the state of the system—with two stages. During the presentation and encoding stage, the state of the system is defined by the list of words presented and the state of a capacity-limited memory buffer. During the recall stage, the system state is defined by the encoded links, the buffer occupancy, and those words already output. The model outputs simulated recall sequences, which we analyze in the same way as the behavioral data.

Hebbian plasticity strengthens synaptic connections between coactive neurons, preferentially in the direction of the activation sequence (Bi & Poo, 1998; Hebb, 1949; Pfister & Gerstner, 2006). Such strengthening of connections between neurons (Erickson, Maramara, & Lisman, 2010) can enable successive recall of items represented by the activity of those neurons (Miller & Wingfield, 2010). Within our model, such biological mechanisms produce associative linking, by two processes.

First, direct linking occurs by the successive activation of neurons representing words as they are perceived. Considering perception as “winner takes all” (Pressnitzer & Hupé, 2006; Renart, Moreno-Bote, Wang, & Parga, 2007), we assume that the perception of each new item causes decay of the activity of neurons representing the prior item. Such successive activation leads to a directional “forward” link between successive items, promoting recall in serial order (Miller & Wingfield, 2010), a finding routinely observed in free recall (Howard & Kahana, 1999; Kahana, 1996).

Second, indirect linking can occur via a two-step process involving a limited-capacity memory buffer. The buffer operates similarly to previous dual-mechanism models of recall (e.g., Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981) and can be characterized as persistent neural activity (Fuster, 1973; Goldman-Rakic, Funahashi, & Bruce, 1990) or periodic reactivations, as was suggested by theories of rehearsal or “articulatory loops” (Baddeley & Hitch, 1974; Lisman & Idiart, 1995). Changes in buffer activity with each new word presentation can cause loss of maintained items, and thus, capacity is limited. In LAMM, linking occurs from neurons representing a word to buffer neurons, and from buffer neurons to neurons representing another word. This buffer-mediated process can produce stronger associative links than can direct linking alone, because the period of coactivation is longer—between item-representing neurons and buffer neurons, as well as between different buffer neurons (Miller & Wingfield, 2010). These links that have been established during encoding shape the retrieval of items during recall.

Predictions from the model

LAMM proposes that the observed transitions in recall are the manifestations of associations created between items during listening, and that the utilization of these associations aids recall and influences output order, a principle outlined in Raaijmakers and Shiffrin’s search-of-associative-memory model (SAM; Atkinson & Shiffrin, 1968; Raaijmakers & Shiffrin, 1981). Although these effects depend on the temporal relationships of presented words, unlike unitary models of episodic recall such as the temporal-context model (TCM; Howard & Kahana, 2002) and the “scale-independent memory, perception, and learning” model (SIMPLE; Brown, Neath, & Chater, 2007), in LAMM these associations are not based on a representation of the list in a temporal or temporal-context space, but are mediated by associations between the items themselves. Like dual-store models (Craik, 1968; Davelaar et al., 2005; Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981; Usher, Davelaar, Haarmann, & Goshen-Gottstein, 2008), LAMM possesses two biophysically based mechanisms for maintaining information—through a strengthening of associations, and through active maintenance in a buffer.

LAMM reproduces the common finding that slower presentation rates yield an overall improvement in recall performance (Craik & Rabinowitz, 1985; Murdock, 1962; Murdock & Okada, 1970; Vitulli & McNeil, 1990), for reasons in common with several other buffer models (Davelaar et al., 2005; Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981): The longer that two items are coactive in the store, the greater the strengthening of the associative connections between them. Such behavior fits the current understanding of activity-dependent or spike-dependent plasticity: The longer any two groups of neurons are coactive, the more spikes their neurons exchange with one another, and the more spike-pairing events are available to trigger synaptic strengthening between the groups.

In LAMM, however, presentation of a masked word disrupts the buffer store activity, reducing this coactivity and the resultant store-dependent associative strengthening. Because associations are developed only between words already heard, only prior associations are affected by such disruption. The reduction of association strengthening during list presentation leads to reduced n + 1 transitions among prior words during recall, and therefore poorer memory for them. Moreover, since masking impacts only buffer-mediated indirect linking in LAMM—direct linking remains intact—it is precisely the process that improves recall with slower presentation rate that is degraded by masking. Therefore, the detrimental effect on recall for words prior to the masked word is attenuated with a faster presentation rate. Thus, unlike shared-resource models (Schneider & Pichora-Fuller, 2000), LAMM does not predict a more detrimental effect of masking on recall of the prior word when the presentation rate is faster; if anything, the effect should be smaller.

Method

Participants

The participants were 43 native speakers of English aged 18–30 years. All met a criterion of age-normal hearing based on pure-tone thresholds averaged across 1000, 2000, and 4000 Hz (Hall & Mueller, 1997) and speech reception thresholds measured by the Central Institute for the Deaf W-1 list of spondee words (Auditec, St. Louis, MO). Six of the 43 participants were unaffected by the level of masking used in this experiment, in that they showed no recall decrement for masked words relative to words in the same positions in the control lists, in which no words were masked. These six participants were not included in our data analyses (see Footnote 2).

Stimuli and behavioral procedures

Participants heard 60 lists of seven words each selected randomly from the Toronto Word Pool (Friendly, Franklin, Hoffman, & Rubin, 1982). In this immediate free-recall task, a list length of seven words was used because medium-length lists permit the most variability in recall strategy (Ward, Tan, & Grenfell-Essam, 2010). Participants were instructed to listen to each list carefully, and once it ended, to recall as many of the seven words as possible in the order that they came to mind (any order was permitted). Each list was prepared under two conditions: a masked condition and a control condition. Word lists in both conditions were presented at a uniform level of 40 dB HL. Stimuli were presented binaurally over earphones in a sound-isolated testing room.

Masked condition

In the masked condition, one of the seven words was masked by simultaneously playing 20-talker babble at 38 dB (Leigh-Paffenroth & Murnane, 2011). This represents a signal-to-noise ratio (SNR) of +2 dB, a level allowing for generally successful word recognition, albeit with perceptual effort. The position of the masked word was varied randomly across lists, as the 3rd, 4th, or 5th word in the seven-item list. To reduce the acoustic contrast between the masked word and neighboring words, we smoothed the transition by ramping the babble intensity up over the 200 ms preceding the masked word and down over the 200 ms following it. To further reduce a potential isolation, or von Restorff, effect (Fabiani & Donchin, 1995; von Restorff, 1933), all of the other words in the list were played with a low level of babble at 32 dB. This represents an SNR of +8 dB, a level that allows for easy audibility (Jerger & Hayes, 1977).

Control condition

Lists in the control condition had the same construction as in the masked lists, except that no single word was masked, with all seven words in the lists being played with the same low-level babble (+8 SNR) as the nonmasked words in the masked lists. Stimuli were presented in a mixed-list design, such that participants did not know in advance whether a list would be a control list or a list with a masked word, and if the latter, which of the three possible positions would be masked. Each participant heard all 60 word lists: half in the control condition and half in the masked condition.

Presentation rates

Each participant heard half of the lists at a rate of one word per second, and half at a slower rate of one word per 2 s, with 15 lists at each rate being heard in the masked condition, and 15 in the control condition. The fast- and slow-presentation-rate lists were blocked, with half of all participants hearing the fast lists first and half receiving the slow lists first. Lists were counterbalanced across participants such that, by the end of the experiment, each list was heard an equal number of times in the masked and control conditions, as well as both rate conditions.

The acoustic properties of the control and masked conditions are shown in Fig. 2. The top black waveform illustrates the seven words in the slow-rate control condition, with continuous low-level background babble being represented in lighter gray (+8 SNR). Below this is the waveform for the slow-rate masked condition, illustrated for a word list with the masked word in the fourth position in the seven-item list. It can be seen that the level of background masking, shown in light gray, is increased to +2 SNR during presentation of the masked word. The two lower waveforms illustrate the control and masked conditions for the fast presentation rate.

Fig. 2 The top waveform represents each of the words in the seven-item list (black vertical departures from the center baseline) relative to the continuous low-level background babble, represented in lighter gray, for a slow-presentation-rate list. Below this is the slow-rate waveform for the masked condition, with the masked word shown in the fourth position in the seven-item list. The words and background babble are again shown in black and light gray, respectively. The lower two waveforms show the same conditions for the fast-rate presentations.

Audibility check

To ensure that both the masked and control words were audible, upon completion of the experiment, all previously presented masked words and an equal number of control words were presented again, with instructions to immediately repeat each word. Recognition accuracy was 98 % for the control words (SD = 2.3) and 87 % for the masked words (SD = 6.7) [t(37) = 8.91, p < .001]. Errors in the masked condition sometimes reflected a misidentification (e.g., “folly” reported as “volley”). To reduce the effect of such recognition errors on scoring, recall in the main experiment was scored as correct if the participant recalled a word as they had reported it in the audibility check.

Methods for modeling and simulations

In our modeling procedures, we labeled words with the integers 1 to N (N = 7 in the present study) according to presentation order, setting \( \tilde{w} = 3, 4,\ \mathrm{or}\ 5 \) as the degraded word in the masked condition. Buffer occupancy is recorded as a binary vector \( B^{(t)} \), with \( B_i^t \in \{0, 1\} \), where t corresponds to the tally of words presented thus far, and i denotes the word. A “link matrix,” \( J_{ij} \), encodes association, with the strength of the link from word position j to word position i being given by

$$ J_{i\leftarrow j} = \sum_{n=1}^{N} \left\{ \mu\, \delta_{j,n-1}\, \delta_{i,n} + \nu\, \delta_{j,n}\, \delta_{i,n-1} + B_i^n B_j^n \left( \gamma\, t_{pres} + \eta\, \delta_{i,n} \right) \right\}, $$

where the summation over n sums the contributions to the \( i \leftarrow j \) link from the state after each word presentation, and the Kronecker delta \( \delta_{i,j} \) equals 1 if i = j, and 0 otherwise. The variables μ and ν are the forward and reverse direct link strengths, respectively (\( \mu \gg \nu \)); γ is the linking rate, determining the strengths of the bidirectional associations between items coactive in the buffer; and η is a buffer-mediated link from active buffer items to the newly entered item. Dependence on presentation rate is through buffer-mediated linking, via the term \( t_{pres} \), which is the time between word presentations, our principal parameter of interest.
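The accumulation of link strengths can be illustrated with a minimal Python sketch. It assumes that the buffer term applies only when both words i and j occupy the buffer after a given presentation, and it skips self-links (i = j), which is our simplification. The function name `link_matrix`, the zero-based indexing, and all parameter values in the usage note are hypothetical and are not taken from the fitted model.

```python
import numpy as np

def link_matrix(B, t_pres, mu, nu, gamma, eta):
    """Accumulate link strengths J[i, j] for the link from word j to word i.

    B is an (N, N) binary array: B[n, i] == 1 if word i is in the buffer
    after presentation of word n (zero-based here; the text is one-based).
    """
    N = B.shape[0]
    J = np.zeros((N, N))
    for n in range(N):
        if n > 0:
            J[n, n - 1] += mu  # forward direct link: previous word -> new word
            J[n - 1, n] += nu  # reverse direct link: new word -> previous word
        for i in range(N):
            for j in range(N):
                # Buffer-mediated linking needs both words coactive in the buffer.
                # Skipping i == j is an assumption of this sketch.
                if i != j and B[n, i] and B[n, j]:
                    J[i, j] += gamma * t_pres
                    if i == n:
                        J[i, j] += eta  # extra link onto the newly entered word
    return J
```

As a usage example, with a three-word list in which words 0 and 1 share the buffer after the second presentation, `link_matrix(np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]]), t_pres=2.0, mu=1.0, nu=0.1, gamma=0.5, eta=0.2)` gives a stronger forward link from word 0 to word 1 than from word 1 to word 2, because only the first pair was coactive in the buffer.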

Limited capacity of the buffer

We set a “soft” bound to buffer capacity by increasing the probability, P, that an item could be lost from the buffer as more items were maintained (see Footnote 3) via a power law, \( P = \chi k^{\lambda} \), where k is the buffer occupancy. The parameter χ directly impacts buffer size, whereas the parameter λ determines how strongly the likelihood of each word’s loss from the buffer depends on the current occupancy (λ = 0 means independent of occupancy). The values of χ (0.001) and λ (3.84) obtained from fitting to recall data produced mean maximum buffer occupancies of ~3.5 words.
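To see how steeply this power law rises with occupancy, the loss probability can be evaluated directly at the fitted values of χ and λ. The short Python sketch below does this. Clipping the result to 1 is our assumption, since the power law exceeds 1 at high occupancies, and the function name `loss_probability` is hypothetical.

```python
def loss_probability(k, chi=0.001, lam=3.84):
    """Soft capacity bound P = chi * k**lam, clipped to 1 (clipping is an assumption)."""
    return min(1.0, chi * k ** lam)

# Print the loss probability for each possible occupancy of a seven-word list.
for k in range(1, 8):
    print(k, round(loss_probability(k), 3))
```

With these values the loss probability is about 0.07 at an occupancy of three words and about 0.2 at four words, which fits a buffer that rarely holds many more than three or four items.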

Implementation of masking in LAMM

Three additional parameters determine how noise masking disrupts buffer activity. First, masked words may fail to enter the buffer, with probability ζ. Second, upon presentation of a masked word, the buffer disruption probability increases by an amount, ξ. Given that masking could disrupt attention, we also included a parameter, ρ, that determines the geometric decay of the effect of masking on later words and represents how rapidly buffer activity recovers following the masked word.

For both masked and nonmasked words, resistance to disruption increases while a word is being maintained in the buffer (buffer potentiation, denoted \( b^{(t)} \)). This follows the dual-store framework (Atkinson & Shiffrin, 1968; Davelaar et al., 2005; Raaijmakers & Shiffrin, 1981; Rundus, 1971; Usher et al., 2008), in which extra rehearsal or extended coactivity enhances associations with long-term memory. Buffer potentiation is proportional to the interword presentation time, \( t_{pres} \), and resistance to disruption, κ. The potentiation of word i after the presentation of the nth word is thus:

$$ b_i^n = \sum_{m=1}^{n} B_i^m\, \kappa\, t_{pres}. $$

Computationally, stochastic retention (\( \overline{F}=1 \)) or loss (\( \overline{F}=0 \)) of a word from the buffer is realized via the comparison of each probability of loss, p, to a pseudorandom number on the unit interval, such that

$$ \overline{F}\left[p\right] = \left\{ \begin{array}{ll} 1, & p < \mathrm{random}\ iid \in \left(0,1\right) \\ 0, & p \ge \mathrm{random}\ iid \in \left(0,1\right) \end{array} \right. . $$

The buffer’s state depends iteratively on its previous state, so that following presentation of the nth word, the presence (\( B_i^n = 1 \)) or absence (\( B_i^n = 0 \)) of the ith word is given by

$$ {B}_i^n=\left({B}_i^{n-1}+{\delta}_{i,n}\ \overline{F}\left[\zeta \kern0.1em {\delta}_{i\tilde{w}}\right]\right)\overline{F}\left[\chi\ {\left({\varSigma}_j{B}_j^{n-1}\right)}^{\lambda }-{b}_i^{n-1}+\xi\ \left({\delta}_{n,\tilde{w}}+\rho \kern0.1em {\delta}_{n-1,\tilde{w}}+{\rho}^2{\delta}_{n-2,\tilde{w}}\right)\right]. $$
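The buffer update can be read as a loop over presentations, as in the Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: each retention test uses an independent uniform draw, the buffer is empty before the first word, and arrays are one-based with row and column 0 left as padding. The names `retain` and `encode_list` and all parameter values are hypothetical.

```python
import numpy as np

def retain(p, rng):
    """Stochastic retention: 1 if p < u (kept), 0 if p >= u (lost), u ~ U(0, 1)."""
    return int(p < rng.random())

def encode_list(N, t_pres, chi, lam, kappa, masked=None,
                zeta=0.0, xi=0.0, rho=0.0, seed=0):
    """Return buffer states B[n, i] and potentiation b[n, i] (one-based indices)."""
    rng = np.random.default_rng(seed)
    B = np.zeros((N + 1, N + 1), dtype=int)  # row 0: empty buffer before the list
    b = np.zeros((N + 1, N + 1))
    for n in range(1, N + 1):
        occupancy = int(B[n - 1].sum())
        # Masking raises the loss probability by xi, decaying geometrically
        # (factor rho) over the two presentations after the masked word.
        lag = None if masked is None else n - masked
        mask_term = xi * rho ** lag if lag is not None and 0 <= lag <= 2 else 0.0
        for i in range(1, n + 1):
            if i == n:
                # The new word enters unless it is masked and fails (probability zeta).
                entry = retain(zeta if i == masked else 0.0, rng)
            else:
                entry = 0
            p_loss = chi * occupancy ** lam - b[n - 1, i] + mask_term
            B[n, i] = (B[n - 1, i] + entry) * retain(p_loss, rng)
        # Potentiation grows by kappa * t_pres for every word currently held.
        b[n] = b[n - 1] + B[n] * kappa * t_pres
    return B, b
```

For example, `encode_list(N=7, t_pres=1.0, chi=0.001, lam=3.84, kappa=0.05, masked=4, zeta=0.2, xi=0.3, rho=0.5)` simulates one masked list at the fast rate. Only χ and λ are values reported in the text, and the remaining arguments are placeholders.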

Recall in LAMM

In the model, recall is initiated stochastically according to the behaviorally obtained probabilities of which items are recalled first (probability of first recall; PFR). Model recall then proceeds according to either the encoded item–item associations or the current buffer activity. Two final parameters, α and β, govern the trade-off between recall via item–item associations, which we attribute to synaptic connections strengthened by buffer activity, versus recall via direct output of the final buffer activity. The probability of recalling word i directly after having recalled word j is given by

$$ P\left(\ {r}_s=i\ \left|\ {r}_{s-1}\right.=j\right)={\left({J}_{i\leftarrow j}+\beta\ {B}_i^s\right)}^{\alpha }/{\displaystyle \sum_{m\notin {\left\{{r}_k\right\}}_{k=1}^{s-1}}}{\left({J}_{m\leftarrow j}+\beta\ {B}_m^s\right)}^{\alpha }, $$

which is the link strength \( J_{i\leftarrow j} \) plus a bias (β) if word i is currently in the buffer, normalized by the sum of all remaining link strengths from j to nonrecalled items. α adds a nonlinearity to the conversion of link strengths to probabilities, enhancing or dampening the contrast between competing links. For high α, links are followed according to the rank order of their strengths, whereas with α = 0, recall is independent of link strength.
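The role of α and β can be made concrete with a short Python sketch of this transition rule. The function name `transition_probs`, the zero-based indexing, and the numbers in the usage note are hypothetical. The sketch also assumes that at least one nonrecalled word has a positive weight, and it raises an error otherwise.

```python
import numpy as np

def transition_probs(J, B_s, j, recalled, alpha, beta):
    """Probability of recalling each nonrecalled word m next, given last recall j.

    J[m, j] is the link from word j to word m; B_s[m] is 1 if word m is
    currently in the buffer. Indices are zero-based in this sketch.
    """
    candidates = [m for m in range(J.shape[0]) if m not in recalled]
    weights = np.array([(J[m, j] + beta * B_s[m]) ** alpha for m in candidates])
    total = weights.sum()
    if total <= 0:
        raise ValueError("no candidate has a positive weight")
    return dict(zip(candidates, weights / total))
```

For instance, with links of strength 3 and 1 from the last-recalled word to two remaining words, and the weaker-linked word still in the buffer with β = 1, the weights are 3 and 2. Setting α = 1 yields probabilities of 0.6 and 0.4, α = 2 sharpens the contrast to 9/13 and 4/13, and α = 0 makes the two words equally likely.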

The buffer can be disrupted during recall in order to produce output interference (Roediger & Schmidt, 1980):

$$ {B}_i^{N+k}={B}_i^{N+k-1}\overline{F}\left[\ \varphi\ {\left({\varSigma}_j{B}_j^{N+k-1}\right)}^{\lambda }-{b}_i^N\right], $$

where \( B_i^{N+k} \) is the state of the buffer after the kth recall. Recall terminates if the remaining links are weak and no words remain in the buffer, according to

$$ P\left({r}_s=\varnothing \left|{r}_{s-1}\right.=j\right)={\displaystyle \prod_{m\notin {\left\{{r}_k\right\}}_{k=1}^{s-1}}}{\left[1-{J}_{m\leftarrow j}\right]}^{+}. $$

Simulation of the recall stage may stop before all words are recalled, with the number of words recalled thus being \( N_r \le N \).
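The termination rule can be sketched in Python as a product over the nonrecalled words, with the bracket \( [x]^{+} \) read as max(x, 0). The sketch follows the termination formula as written, in which buffer contents do not appear explicitly. The function name `stop_probability` and the example values are hypothetical.

```python
import numpy as np

def stop_probability(J, j, recalled):
    """Probability that recall ends after word j: product of [1 - J[m, j]]^+.

    The product runs over words m not yet recalled; [x]^+ means max(x, 0).
    Indices are zero-based in this sketch.
    """
    p = 1.0
    for m in range(J.shape[0]):
        if m not in recalled:
            p *= max(0.0, 1.0 - J[m, j])
    return p
```

With two words remaining and links of 0.5 and 0.2 from the last-recalled word, the stopping probability is 0.5 × 0.8 = 0.4. Any single remaining link of strength 1 or more drives it to zero, so recall always continues in that case.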

For each condition and parameter set used, we simulated 300,000 trials of the model, producing minimal error bars for the model data. A summary of the model parameters and their values is given in Table 1.

Table 1 Model parameters

Optimization of the model

We scored the performance of the parameter sets using a weighted sum of the squared, scaled residuals between the model and experimental recall measures. Measures were optimized for recall accuracy of the masked word and the words immediately preceding and following it, the lag-CRP curves, and the output transition probabilities relative to a masked word. Recall accuracy (Fig. 3 in the Results) was most heavily weighted, such that the main focus of this investigation—the masking-induced deficits in recall—was the dominant constraint on the model fitting.
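A scoring function of this general form might look like the following Python sketch. The measure names, scale factors, and weights are placeholders, because the text does not report the actual scaling or weighting values, and the function name `fit_score` is hypothetical.

```python
import numpy as np

def fit_score(model, data, scale, weights):
    """Weighted sum of squared, scaled residuals across recall measures.

    model and data map a measure name to an array of values; scale and
    weights map the same names to a scale factor and a weight.
    """
    score = 0.0
    for name in data:
        resid = (np.asarray(model[name]) - np.asarray(data[name])) / scale[name]
        score += weights[name] * float(np.sum(resid ** 2))
    return score
```

Giving the recall-accuracy measure a larger weight than the lag-CRP and transition measures would make the masking-induced recall deficits the dominant constraint on the fit, as described above.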

Fig. 3 Effects of masking a stimulus word on item recall. (A) Probability of recalling an acoustically masked word (M) and the two words preceding it (–2, –1) and following it (+1, +2), for the slow presentation rate. The upper curve shows recall for analogous positions in a control list in which none of the words was masked. (B) The same data for the fast presentation rate. (C and D) Simulation results for the slow and fast presentation rates, respectively. Error bars represent one standard error; error bars that do not appear were too small to plot.

A pattern-search global optimization (Hooke & Jeeves, 1961) was performed using the Global Optimization Toolbox of the MATLAB computational suite (MathWorks, Natick, MA). This algorithm was chosen for its ability to find global minima of high-dimensional stochastic objective functions, such as our simulated recall measures. The algorithm explores multiple local minima to find a global solution via an iterative procedure. To improve the robustness of the process, in alternate iterations we employed the (N + 1)-basis-vector generalized pattern search and the 2N-basis mesh adaptive direct search algorithms.

Testing presentation rate in LAMM

To test whether LAMM accounts for the effects of presentation rate, we fitted LAMM to the 1-s (faster) presentation rate data, and then validated it against the 2-s (slower) data set. Given that the only explicit dependence on time in LAMM is through the variable \( t_{pres} \), which sets the time, in seconds, between word presentations (1 or 2 s), these 2-s model validations against the behavioral data had no free parameters: The time variable \( t_{pres} \) was simply doubled.

Since LAMM uses the experimentally observed PFR to initiate model recall, it is necessary to avoid the possible confound that differences in PFR between conditions might account for any observed differences in overall recall accuracy or output ordering. Therefore, a single PFR was used for all simulated conditions: namely, the average PFR across conditions. Given such a fixed PFR, we could test whether changes within LAMM alone could explain the key behavioral consequences of altered presentation rate and masking.

Results

Effects of a masked word on item recall

Behavioral results

Table 2 shows the mean recall probabilities for each serial position in the control lists and for lists in which the words in Positions 3, 4, or 5 were masked. It can be seen that some nonsystematic variability occurred across the three masked positions, but with the strongest effect appearing when Position 3 was masked. Figure 3 shows these recall probabilities averaged over the three word-masking positions and collapsed across positions relative to the masked word (M), along with these same relative positions in the control lists. Figures 3A and B show these data for the slower (one word every 2 s) and faster (one word per second) presentation rates, respectively.

Table 2 Probabilities of item recall

As would be expected from the recall literature, for control lists, in which all words were clearly audible, a comparison of Figs. 3A and B shows better recall for the slower than for the faster presentation rate (Murdock, 1962), with an average of 4.76 (SD = 0.64) words being recalled in slower-presented control lists, and 4.33 words (SD = 0.63) in faster ones (p < .001). A 5 (position: –2, –1, M, +1, +2) × 2 (condition: masked, control) × 2 (presentation rate: fast, slow) mixed-effects analysis of variance (ANOVA) showed this significant main effect of rate [F(1, 36) = 26.48, MSE = 0.39, p < .001], interacting significantly with masking condition [Rate × Condition: F(1, 36) = 4.69, MSE = 0.06, p < .05], just as LAMM predicts. A significant Rate × Position interaction [F(4, 144) = 5.38, MSE = 0.07, p < .001] confirmed the apparently greater effect of presentation rate on earlier than on later positions in the list (Bhatarah, Ward, Smith, & Hayes, 2009; Grenfell-Essam et al., 2013). Consistent with prior work (Piquado et al., 2010), the masking of a single word in a list resulted in an overall reduction in recall accuracy relative to the nonmasked control list, as can be seen by the main effect of condition [F(1, 36) = 31.61, MSE = 0.27, p < .001]. We also observed a significant Condition × Position interaction [F(4, 144) = 10.20, MSE = 0.09, p < .001], due to the greater reduction in recall for the masked word relative to the others. Finally, a main effect of position [F(4, 144) = 26.67, MSE = 0.83, p < .001] reflected an expected serial-position effect, with greater recall for the earliest (primacy) and latest (recency) positions.

Of primary interest to this investigation was the interaction between masking and list position. When analyzing individual positions, not only was the masked word itself less likely to be recalled than a similarly placed word in a control list at both presentation rates [slow rate, t(36) = 5.17, p < .001; fast rate, t(36) = 5.65, p < .001], but, at the slow presentation rate, the same was true for the word prior to the masked word [t(36) = 4.60, p < .001]. By contrast, the effect on the prior word at the fast presentation rate failed to reach significance [t(36) = 0.94, p = .38]. We found no significant differences in the recall probabilities between masked and control lists at other positions for either rate.

The prediction of LAMM tested specifically in this study was a greater impact of masking when presentation rate was slow. Indeed, the effect of masking on prior-word recall apparent in Fig. 3 is significantly greater with a slow rate of presentation than with a fast rate. An additional 5 (position: –2, –1, M, +1, +2) × 2 (presentation rate: fast, slow) ANOVA on the individual participants’ recall differences between control and masked trials (the masking deficits) showed the effect of presentation rate on masking [F(1, 36) = 4.69, MSE = 0.12, p < .05]. Again, the masked position showed the greatest deficit, with a significant main effect of position [F(4, 144) = 10.20, MSE = 0.19, p < .001]. Although the difference in masking deficits between presentation rates could not be attributed to any one word position (Rate × Position interaction was not significant, p = .85), it is revealing to note that the mean recall deficits of the masked word (M) were very similar across the two presentation rates (0.129 for the slow rate, and 0.109 for the fast rate), whereas the masking deficit for the prior word (M – 1) was over three times as great with slow presentation (0.082 vs. 0.026).

Model simulations

Figures 3C and D show the model simulations for the slow and fast presentation rates, respectively. The simulations in the control condition for both presentation rates capture the overall shapes of the serial position curves, showing both primacy and recency effects. The effects of masking produced by LAMM are also in agreement with the data. Moreover, when comparing the 2-s condition with the 1-s condition, the model reproduces the observed lower overall recall in control lists when the presentation rate was faster. Finally, LAMM reproduces a critical component of the behavioral findings: that the recall deficit produced by masking was more pronounced with a slower presentation rate.

Lag-conditional response probability (lag-CRP)

“Lag” refers to the relative list positions of successive recall outputs, which can be analyzed to provide an important measure of output ordering. Analyses of recall order under free-recall instructions show a general tendency to make output transitions among nearby list items. This conditional response probability as a function of lag (lag-CRP; Howard & Kahana, 1999; Kahana, 1996) represents the probability that an item from serial position n + lag will be recalled immediately following an item from serial position n. That is, lag-CRPs give the probabilities of making transitions during recall from any one relative word position to another. Prior studies have shown the lag-CRP to have a forward bias, illustrating the effect of positional or temporal contiguity on retrieval transitions, with the “+1” lag being the most common (Howard & Kahana, 1999; Howard, Kahana, & Wingfield, 2006; Kahana, 1996).

Figure 4 shows the lag-CRP curves for the participants’ behavioral data and the LAMM simulations for recall outputs. Lag-CRPs are calculated by considering all recalls of position n, and then computing the probability of transitioning to n + lag, as mentioned above, providing that the latter position has not already been recalled. The set of all possible such transitions is then collapsed relative to the originating word’s position n (denoted “0” on the abscissa).
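The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function name `lag_crp`, the input format (one list of 1-based serial positions per trial, in output order), and the pooling of transitions over trials rather than averaging of per-participant curves are assumptions, and repeated recalls are simply skipped.

```python
from collections import defaultdict


def lag_crp(recall_sequences, list_length):
    """Return {lag: conditional response probability}, pooled over trials.

    Each recall sequence lists 1-based serial positions in output order.
    Intrusions are assumed to have been removed beforehand.
    """
    actual = defaultdict(int)    # transitions actually made, by lag
    possible = defaultdict(int)  # transitions that were still available, by lag
    for recalls in recall_sequences:
        already_recalled = set()
        for current, following in zip(recalls, recalls[1:]):
            already_recalled.add(current)
            if following in already_recalled:
                continue  # a repetition is not counted as a transition
            # Every position not yet recalled was an available target.
            for candidate in range(1, list_length + 1):
                if candidate not in already_recalled:
                    possible[candidate - current] += 1
            actual[following - current] += 1
    return {lag: actual[lag] / possible[lag] for lag in sorted(possible)}


example = lag_crp([[1, 2, 3], [2, 3, 1]], list_length=3)
```

For the two three-item trials in `example`, every available +1 transition was taken, so the value at lag +1 is 1.0. The only transition made at another lag is the final 3-to-1 step of the second trial, at lag –2.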

Fig. 4
figure 4

(A and B) Lag-CRP functions of behavioral data for the masked and control conditions for the slower and faster presentation rates, respectively. (C and D) Simulation results for the slower and faster presentation rates, respectively. Error bars represent one standard error; error bars that do not appear were too small to plot

The upper and lower left panels of Fig. 4 show the CRP curves for the masked and control lists for the slow (Fig. 4A) and fast (Fig. 4B) presentation rates. For both presentation rates and both listening conditions, by far the single most likely transition during recall is the n + 1 transition, in that participants are more than 50 % likely to follow recall of a given word (“0”) with the following word (denoted as “1” on the abscissa).

According to LAMM, the superior overall recall for slower than for faster presentation rates results from the slower rate allowing more time to strengthen associations between items within the buffer. One would thus expect to see an increased n + 1 transition probability for control lists presented at a slow rate as compared to a faster presentation rate. Figures 4A and B show just that. First, for the control lists, the probability of recalling the word in relative position “1” is significantly higher for the slow than for the fast presentation rate [t(36) = 2.27, p < .05]. Moreover, Figs. 4A and B show that at the slower presentation rate, fewer n + 1 transitions occurred when a list contained a masked item than in control lists [t(36) = 3.36, p < .01], whereas no difference in these conditional response probabilities was apparent at the faster rate [t(36) = 0.60, n.s.]. The simulations in Figs. 4C and D also show a decrease in the n + 1 transition probability for masked lists at the slow presentation rate, which, in accordance with the behavioral trend, was stronger than the decrease seen in the masked-list condition with the faster presentation rate.

Individual n + 1 transitional probabilities

In the above analysis, the dominant n to n + 1 transition was significantly affected by masking. To examine n + 1 transitions more closely, Figs. 5A and B show the effect on individual n + 1 transition probabilities of masking and presentation rate, plotted against position relative to the masked word (M). The changes to transition probabilities mirror the decreased recall that we saw in Fig. 4, with larger deficits in transition probabilities due to masking in the slow presentation rate (Fig. 5A) than in the fast presentation rate (Fig. 5B). A 4 (transition: –2 to –1, –1 to M, M to 1, or 1 to 2) × 2 (condition) × 2 (presentation rate) ANOVA conducted on the transition probabilities revealed main effects of transition [F(3, 108) = 22.33, MSE = 0.92, p < .001] and condition [F(1, 36) = 8.82, MSE = 0.23, p < .01], and significant Rate × Condition [F(1, 36) = 4.33, MSE = 0.17, p < .05] and Transition × Condition [F(3, 108) = 3.78, MSE = 0.10, p < .05] interactions. However, no significant Transition × Rate × Condition interaction emerged (p = .97).

Fig. 5
figure 5

Proportions of n + 1 transition probabilities for transitions prior to the masked word (M) and transitions following it, for the masked and control conditions at the slower (A) and faster (B) presentation rates. (C and D) Simulations for the slow and fast presentation rates, respectively. Error bars represent one standard error; error bars that do not appear were too small to plot

It can be seen in Fig. 5A that transitions following the masked word also appear to be reduced. We thus considered the two prior transitions (–2 to –1 and –1 to M) and the following transitions (M to +1 and +1 to +2) separately. The prior transitions showed significant effects of rate [F(1, 36) = 5.97, MSE = 0.14, p < .05], condition [F(1, 36) = 23.26, p < .001], and transition position [F(1, 36) = 12.48, MSE = 0.28, p < .001], as well as a Rate × Condition interaction [F(1, 36) = 5.22, MSE = 0.10, p < .05]. The following transitions showed only a main effect of transition position (p < .001); neither the effect of rate (p = .43) nor that of masking (p = .82) was significant. Corrected comparisons across all combinations of rate and condition, for the prior and the following transitions, confirmed that only the prior transitions in the control lists at the slow rate were significantly different from those in the other trials (at p < .05). The transitions following the masked word (M to +1 and +1 to +2) were not significantly different from the control list transitions for either rate.

Thus, only for the slower rate of presentation were the probabilities of n + 1 transitions significantly reduced by masking, and these were restricted to the positions prior to the masked word: from the word two positions prior to the word immediately preceding the masked word [–2 to –1; t(36) = 4.20, corrected p < .001], and from the word immediately preceding the masked word to the masked word itself [–1 to M; t(36) = 3.88, corrected p < .01], relative to these transitions in control lists (following transitions: p = .42 and .74). At the faster rate of presentation, these effects were attenuated, and none of the contrasts with the control lists reached significance (all corrected p values > .30).

The pattern of interactions between rate and condition revealed by the ANOVA is consistent with LAMM’s prediction that masking causes greater reductions in recall relative to control lists when the presentation rate is slower, and that this should be a consequence of weaker associative links between words as a result of buffer disruption. The significant reductions in the two n + 1 transitions preceding the masked word, when presentation was slow, coincide with the observed reductions in recall for both the masked (M) and preceding (M – 1) positions. Although the masking deficits in the M – 1 position were not significantly different between the fast- and slow-rate lists [t(36) = 1.58, p = .12], the masking deficit in the dominant transition to this word (–2 to –1 transition) was significantly attenuated by the faster presentation rate [t(36) = 2.07, p < .05].

The LAMM simulations are shown in Figs. 5C and D, where they replicate the n + 1 transition pattern seen in the behavioral data: As compared to control lists, the transition probabilities between words prior to a masked word are reduced by a greater amount when the presentation rate is slow rather than fast.

Probability of first recall (PFR)

Figure 6 plots the probability of first recall (PFR), that is, the probability that a participant would begin recall with a word from each of the five list positions defined relative to the masked word (–2, –1, M, +1, +2), or with a word in the same relative position in a control list. Figures 6A and B show the PFRs for the slow and fast presentation rates, respectively, and demonstrate that the PFRs did not differ between the two rates for control lists, in which none of the words were masked.
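A PFR curve of this kind can be tallied as follows. The sketch is illustrative only: the function name, the input format, and the choice to divide by the total number of trials are assumptions rather than details reported here. Under that choice, a trial with no recall, or with a first recall outside the five positions, counts against every position. For control lists, `mask_positions` would hold the serial position treated as M in each list.

```python
def pfr_relative_to_mask(recall_sequences, mask_positions, offsets=(-2, -1, 0, 1, 2)):
    """Return {offset: probability that recall began at position M + offset}.

    recall_sequences: one list of 1-based serial positions per trial, in output order.
    mask_positions: the 1-based serial position of M on each trial.
    """
    counts = {offset: 0 for offset in offsets}
    for recalls, mask in zip(recall_sequences, mask_positions):
        if recalls:  # skip trials on which nothing was recalled
            offset = recalls[0] - mask
            if offset in counts:
                counts[offset] += 1
    n_trials = len(recall_sequences)
    return {offset: counts[offset] / n_trials for offset in offsets}


example_pfr = pfr_relative_to_mask(
    recall_sequences=[[3, 4], [4, 5, 1], [7, 6], [2]],
    mask_positions=[4, 4, 5, 3],
)
```

In `example_pfr`, two of the four trials begin with the word just before M, so the value at offset –1 is 0.5.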

Fig. 6
figure 6

Probabilities that participants would begin their recall with the masked word (M), with the word one (–1) or two (–2) words prior to the masked word, or with the word one (+1) or two (+2) words following the masked word. Data are also shown for the probabilities of first recall of items in the same relative positions in the control lists. Panel A shows these data for the slow presentation rate, and Panel B shows these data for the fast presentation rate. Error bars represent one standard error

As can be seen in the upper panel, when participants heard the word lists at the slower presentation rate, the presence of a masked word had a tendency to reduce the probability that recall would be initiated with the word prior to the masked word (–1) when compared to that position in a control list [t(36) = 2.64, p = .06, Bonferroni corrected]. The data for the faster presentation rate show an opposite, but similarly nonsignificant, trend (corrected p > .10).

The contribution of recall initiation (Fig. 6) to the masked-recall deficit in Fig. 3 was found to be negligible: Incorporating such differences in PFR into LAMM, in place of the masking mechanisms of LAMM described earlier, was not sufficient to produce any of the observed recall differences caused by masking. This absence of a significant impact of the observed variation in PFR on word recall, combined with the significant changes in transition probabilities described in the previous section, supports a role for interitem links in producing the recall effects of masking, independent of differences in first recall.

Discussion

Over four decades have passed since Rabbitt (1968) reported an intriguing finding when he tested recall for a set of spoken eight-digit lists. The eight items were presented in two groups of four, separated by a 2-s pause, and listeners were instructed to recall either the first or the second group of four digits, signaled after list presentation. In the conditions of interest, Rabbitt (1968) either presented both groups in the clear, or he acoustically masked either the first or the second set of four digits. Of special importance, he took care to ensure that the level of masking allowed for the masked digits to be correctly identified, albeit with some effort. He found that recall of the first half of the list, even when it was presented clearly and without masking, was poorer when the second half of the list was heard in noise than when the second half of the list was heard in quiet. Rabbitt (1968) suggested that the increased effort required to identify the second, noise-masked digit set may have deprived the listeners of the processing resources that would ordinarily have been available to effectively encode the first digit set in memory.

The studies following Rabbitt’s (1968) seminal experiment have shown that noise masking verbal materials produces a recall deficit relative to similar materials heard in the clear (Farley et al., 2007; Murphy et al., 2000; Surprenant, 1999). As we noted, in these studies all items in the masked sets, whether word lists or paired associates, were uniformly masked. We subsequently found that masking just a single word in a spoken word list was sufficient to have a detrimental effect on recall, not only for the masked word, but also for the word prior to it (Piquado et al., 2010). This is the same asymmetric effect that had been found by Rabbitt (1968), in that masking affected recall of what was heard prior to the masked stimuli, but not of what followed them. This finding was confirmed in the present study: Masking affected recall of the masked word and of the word prior to it, even though the level of masking still allowed for successful, although effortful, recognition of the masked word.

Of critical interest in the present study is the finding that the detrimental, asymmetrical effect of masking is reduced when the presentation rate of the word list is faster. This finding is in stark contrast to expectations based on notions of shared resources or an information-degradation hypothesis (Murphy et al., 2000; Schneider & Pichora-Fuller, 2000; but see also Tulving, 1969). A resource-sharing hypothesis posits that processing the sensory input and encoding what has been heard in memory compete for the same pool of limited resources (Kahneman, 1973). Thus, an impoverished input, such as an acoustically masked word, would increase the resource drain required for recognition of the masked word. This increased resource drain for front-end perceptual operations would, in turn, impair memory encoding (e.g., McCoy et al., 2005). On the other hand, a slow presentation rate would allow for a greater fraction of perceptual processing to be achieved, or to be fully completed, before the masked word arrived. This would lead to a prediction that the negative effect of masked words on prior words would be greater for a faster rate of presentation than for a slower rate, a prediction counter to our obtained findings. Furthermore, neither a limited-resource model nor an information-degradation hypothesis makes any prediction regarding the finding that the presence of a perceptually difficult masked word would disrupt n + 1 transitions in recall output order.

Within LAMM, the buffer-induced association strength between two words is proportional to the time that the two words spend coactive in the buffer, a principle similar to that of the TODAM power-set model (Murdock, 1995, 2005). In the absence of any other change in the dynamics of the system, doubling the time per word therefore doubles the buffer-induced association strengths. As we saw, after optimizing the model to the 1-s data, the model simulations produced good fits to the 2-s data with no additional parameter fitting.

The commonly encountered higher overall recall with a slower presentation rate (Murdock, 1962) follows from a greater time of coactivity in a buffer store for all words, and thus greater strengthening of associations, as demonstrated by the increased n + 1 transition probabilities. The greater negative impact on the word prior to the masked word with a slower presentation rate also follows from LAMM, because the encoding mechanism that increases recall for slower presentation rates is disrupted by masking. That is, the impact of masking is greater in conditions in which strengthening of associations by active maintenance is greater. It is worth noting that this fundamental consequence of altered presentation rate arose in a model designed to explain the effects of stimulus degradation on the recall of lists, with no thought to presentation rate (Piquado et al., 2010).
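The proportionality argument in the two preceding paragraphs can be illustrated with a toy calculation. The Python sketch below is not LAMM. The first-in-first-out buffer, its capacity of three items, the unit gain, and the treatment of masking as emptying the buffer when the masked word arrives are all simplifying assumptions introduced here. It shows only that, if association strength grows linearly with coactive time, doubling the time per word doubles every pairwise strength and also doubles the strength lost to a buffer disruption.

```python
from itertools import combinations


def coactivity_strengths(n_words, seconds_per_word, capacity=3, gain=1.0, masked_word=None):
    """Accumulate pairwise association strength as gain * time spent coactive.

    All mechanics here (FIFO displacement, fixed capacity, buffer emptied
    when the masked word arrives) are illustrative assumptions.
    """
    buffer = []
    strengths = {}
    for word in range(1, n_words + 1):
        if word == masked_word:
            buffer.clear()      # assumed disruption of persistent activity
        buffer.append(word)
        if len(buffer) > capacity:
            buffer.pop(0)       # the oldest item is displaced
        # Every pair held in the buffer during this word gains strength.
        for pair in combinations(buffer, 2):
            strengths[pair] = strengths.get(pair, 0.0) + gain * seconds_per_word
    return strengths


def total_strength(strengths):
    """Sum of all pairwise association strengths."""
    return sum(strengths.values())


slow_control = coactivity_strengths(7, seconds_per_word=2.0)
fast_control = coactivity_strengths(7, seconds_per_word=1.0)
slow_masked = coactivity_strengths(7, seconds_per_word=2.0, masked_word=4)
fast_masked = coactivity_strengths(7, seconds_per_word=1.0, masked_word=4)

deficit_slow = total_strength(slow_control) - total_strength(slow_masked)
deficit_fast = total_strength(fast_control) - total_strength(fast_masked)
```

Under these assumptions the total strength lost to masking is twice as large at 2 s per word as at 1 s per word. The sketch makes no attempt to reproduce the serial-position curves or the asymmetry of the behavioral effect.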

Although a number of models, such as TCM, give a good account of the dynamics of free recall under ordinary listening conditions (Howard & Kahana, 1999; Howard et al., 2006; Sederberg, Howard, & Kahana, 2008), TCM and other episodic memory models without an active store (e.g., SIMPLE; Brown et al., 2007), even if modified to allow stimulus degradation to impact prior-word recall, would also predict a greater, not a lesser, impact of masking with faster relative to slower presentation rates, an expectation contrary to our present finding. Although dual-store models (Davelaar et al., 2005; Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981) do not consider a case in which stimulus items are unequal in perceptual clarity, we suggest that such models form the best basis for such an extension.

Biological basis for LAMM

Although LAMM is fundamentally a computational model, it was developed to remain within the constraints of biological plausibility, in terms of two neurobiological mechanisms for memory maintenance: synaptic plasticity and persistent activity. The two mechanisms are not independent, and indeed, following Hebb’s (1949) prescient hypothesis, the reverberating activity that persists after stimulus offset (Lorente de Nó, 1933) can produce the correlated spiking necessary for synaptic changes. Here, we incorporated such persistent mnemonic activity in a multi-item buffer and found that a good fit to the behavioral data appeared with the postulate that such persistent activity is disrupted by the perceptual challenge induced by a degraded stimulus.

LAMM was derived from our ongoing cellular-level simulations, a large-scale extension of those in Miller and Wingfield (2010), which comprise two distinct, but connected, neural circuits. One circuit maintains an active neural representation of previously heard items, allowing them to be associated with later items. Their ongoing activity is represented in LAMM by their presence in the memory buffer. Anatomically, this circuit is likely to reside in the prefrontal cortex, on the basis both of human studies (Nyberg, 1998; Wais, Kim, & Gazzaley, 2012) and of findings on mnemonic activity in primates (Funahashi, Bruce, & Goldman-Rakic, 1989; Goldman-Rakic et al., 1990). Alternatively, such buffering could also arise from repeated sequences of neural activation within the hippocampus (Fortin, Agster, & Eichenbaum, 2002; Jensen, Idiart, & Lisman, 1996; Lisman & Idiart, 1995; Winder, Reggia, Weems, & Bunting, 2009) or an interaction between the two (Jensen & Lisman, 2005).

The second circuit is the word-recognition circuit, which is winner-takes-all in nature (Wang, 2002), so that only one item is recognized/perceived at a time. Reactivation of word-representing cells in this circuit effects the recall of a word. Such reactivation can arise either from previously active cells in the same circuit, or via transient associations in an active-store buffer circuit. The word-recognition circuit is most likely distributed across the core left-hemisphere language areas, which include the inferior frontal gyrus and the posterior portion of the superior temporal gyrus (cf. Gagnepain, Henson, & Davis, 2012; Simos et al., 2009; Weems & Reggia, 2006). These two circuits produce the separate, buffer-mediated contribution and the direct contribution to word associations within LAMM (see also the arguments in J. R. Anderson, Qin, Sohn, Stenger, & Carter, 2003; Davelaar et al., 2005; Davelaar, Haarmann, Goshen-Gottstein, & Usher, 2006; Sohn et al., 2005).

Final remarks

Our focus in this study was specifically on the effects of degraded auditory stimuli, an issue of practical significance, given reports of an increase in the incidence of hearing impairment among university-aged young adults (Shargorodsky, Curhan, Curhan, & Eavey, 2010). For this experiment, we selected a single noise level relative to speech, on the basis of a pilot study. As noted, six potential participants failed to show a negative effect for the masked word itself at the masking level that we employed; that is, their recall for the masked words was no less accurate than their recall of control words in the same position (other positional effects, including that for the prior word, were not considered in this criterion). Because a masking effect was needed in order to examine how the effects of masking on recall for nonmasked words changed with rate, these participants were excluded from the participant pool prior to undertaking any data analysis. It is likely that adjusting the noise level on an individual basis would have been more effective at causing a masking effect for all listeners. The positive and negative factors thought to affect the ability to separate speech from background noise remain an active research area. Potential factors include individual differences in auditory processing not revealed by traditional pure-tone audiometry (Abel, Krever, & Alberti, 1990; Lorenzi, Gatehouse, & Lever, 1999; Marrone, Mason, & Kidd, 2008; Singh, Pichora-Fuller, & Schneider, 2008), individual differences in working memory capacity (Heinrich, Schneider, & Craik, 2008), and enhanced experience in isolating one sound among many, as has been suggested for musicians (Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011).

Despite such individual differences in the susceptibility to masking effects, the finding that listening to masked speech leads to reduced memory is a robust one (Farley et al., 2007; Murphy et al., 2000; Rabbitt, 1968; Surprenant, 1999). Our behavioral results show that disrupted output during free recall significantly contributes to this reduction in memory, as was shown by the reduced n + 1 transitions. This finding that masking a word disrupts the pattern of transitions in recall may be of some generality; serial-recall studies using visual stimuli have shown that many perceptual factors, including word fragmentation (Serra & Nairne, 1993), perceptual interference (Mulligan, 1999, 2000), and irrelevant speech (Beaman & Jones, 1998; Neath, 2000), can disrupt the order information for items in a list. These findings suggest that, even when perception is successful under impoverished input conditions, it comes at a cost of diminished relational or order information for items.

Disrupted relational information, however, does not always lead to poor memory. Some studies have shown that, if perceptual interference makes the stimulus stand out (e.g., the “von Restorff effect”; Rundus, 1971; von Restorff, 1933), memory for the item is improved, despite poor relational memory (Mulligan, 1999; Serra & Nairne, 1993). Because we took care to prevent a stand-out effect of the masked word, by using a low level of background noise in both control lists and lists containing a masked word, we showed that this reduced relational information leads to poorer memory for the masked and earlier items. The combination of these findings suggests that, whereas order information and item-specific information are distinct, they work together to aid retrieval of items during recall.

In this study, we tested the theory that the effects of masking are due to buffer disruption; when the role of the buffer is minimal, as with a fast rate of presentation, factors that would normally disrupt the buffer should no longer affect performance. In support, our behavioral results showed attenuated recall and transition effects when the presentation rate was fast. Though we were specifically interested in the effects of effortful listening, we assume that many sources potentially contribute to buffer disruption, including increased attentional demands (N. D. Anderson, Craik, & Naveh-Benjamin, 1998), articulatory suppression (Larsen & Baddeley, 2003), and exceeding buffer-capacity limits (Hanley & Bakopoulou, 2003). If so, such factors could lead to diminished memory effects, both for present and for prior information, similar to those we have demonstrated in this article.