Reactivation-Dependent Amnesia for Contextual Fear Memories: Evidence for Publication Bias

Abstract
Research on memory reconsolidation has been booming in the last two decades, with numerous high-impact publications reporting promising amnestic interventions in rodents and humans. However, our own recently published failed replication attempts of reactivation-dependent amnesia for fear memories in rats suggest that such amnestic effects are not always readily found and that they depend on subtle and possibly uncontrollable parameters. The discrepancy between our observations and published studies in rodents suggests that the literature in this field might be biased. The aim of the current study was to gauge the presence of publication bias in a well-delineated part of the reconsolidation literature. To this end, we performed a systematic review of the literature on reactivation-dependent amnesia for contextual fear memories in rodents, followed by a statistical assessment of publication bias in this sample. In addition, relevant researchers were contacted for unpublished results, which were included in the current analyses. The obtained results support the presence of publication bias, suggesting that the literature provides an overly optimistic overall estimate of the size and reproducibility of amnestic effects. Reactivation-dependent amnesia for contextual fear memories in rodents is thus less robust than what is projected by the literature. The moderate success of clinical studies may be in line with this conclusion, rather than reflecting translational issues. For the field to evolve, replication and non-biased publication of obtained results are essential. A set of tools that can create opportunities to increase transparency, reproducibility and credibility of research findings is provided.


Introduction
Amnesia for previously acquired memories can be obtained by applying certain treatments shortly before or after memory reactivation. After the first published observation of reactivation-dependent amnesia, which was obtained by giving rats an electroconvulsive shock after a brief, unreinforced re-exposure to a conditioned tone (Misanin et al., 1968), this procedure was conceptually replicated using a wide variety of experimental protocols and treatments in different species (Reichelt and Lee, 2013; Beckers and Kindt, 2017). Overall, research on reactivation-dependent amnesia, commonly referred to as "reconsolidation blockade," has accelerated during the last two decades, with high-impact publications reporting promising amnestic interventions in rodents and humans (Nader et al., 2000; Kindt et al., 2009).
Meanwhile, studies revealed that memory reactivation does not occur each time a memory is retrieved but depends on the conditions under which the memory was acquired and retrieved (e.g., memory strength, age, type, or the amount of novelty introduced during the reactivation session; Tronson and Taylor, 2007). Apart from those controlled studies that examined limiting factors on memory destabilization, there have been only a few papers reporting failures to obtain reactivation-dependent amnesia under standard conditions. Published failures mainly involved pharmacologically induced amnesia in human participants (Bos et al., 2014; Thome et al., 2016; Schroyens et al., 2017) or the retrieval-extinction effect (Soeter and Kindt, 2011; Luyten and Beckers, 2017; Chalkia et al., 2020a), whereas the literature on pharmacologically induced amnesia for fear memories in rodents shows robust and consistent amnestic effects, with hardly any failures to replicate.
In our lab, we aimed to further investigate opportunities for the clinical application of reactivation-dependent amnesia for fear memories. To this end, we set out to replicate published studies in which systemic drug injection after unreinforced re-exposure to a conditioned stimulus (CS) in rats resulted in amnesia for contextual or cued fear memories (Schroyens et al., 2019a,b; Luyten et al., 2020).
In contrast to what is reported in the literature, our extensive series of conceptual and exact replication attempts, performed by several experimenters and in different laboratories, did not provide clear evidence for reactivation-dependent amnesia. In fact, we could only reproduce the amnestic effect when the experimenter from the original study (now a member of our research group) conducted the study in the original lab, with animals purchased from the local supplier [see Table 1 for an overview of our exact replication attempts using contextual fear conditioning and midazolam (MDZ)]. Given that we tested a wide range of behavioral parameters and (sometimes exactly) adhered to the standard protocols that have been typically used in the literature, it is unlikely that our negative results can be explained by previously established limiting factors on memory destabilization.
Overall, it can be concluded that the experimental evidence obtained in our replication attempts is not in line with the general representation of amnesia by postreactivation systemic drug injection in the literature. Our five failed exact replication attempts using contextual fear and MDZ (Table 1) suggest that the outcome of the procedure depends on delicate, unknown, and possibly uncontrollable parameters. Therefore, it seems unlikely that the high rate of large amnestic effects that is portrayed by the current literature is a reliable representation of actual observations. Based on the discrepancy between our results and those from published studies, we suspect that (1) amnestic effects are less easily replicated than what is currently suggested by the literature and thus (2) the large effect sizes that are reported in the literature are merely a subset of the range of effect sizes that have effectively been observed. We hypothesize that these issues arise from the omission of negative findings in the published literature (i.e., reporting and publication bias).
The main aim of the current paper was to assess whether indeed the literature on pharmacologically induced reactivation-dependent amnesia for contextual fear memories in rodents shows evidence of publication bias. The first part of this project, which we completed before preregistration, consisted of an exploratory assessment of publication bias in the sample of published studies that used postreactivation systemic injection of MDZ. Given that our ultimate aim was to investigate whether publication bias applies to the field in a broader sense, rather than for just one (systemically injected) drug, we performed a preregistered systematic review of the literature on pharmacologically induced reactivation-dependent amnesia for contextual fear memories in rodents. Publication bias in this larger sample was assessed statistically, and relevant researchers were contacted to enquire about and request unpublished datasets. The obtained results contribute to a clearer view on the robustness of reactivation-dependent amnesia for contextual fear memories in rodents.

Materials and Methods
Relevant datasets, R scripts and overview tables of all included studies can be found on the Open Science Framework (OSF) at https://osf.io/apu9t/ (DOI 10.17605/OSF.IO/APU9T).

Systematic literature review
We performed a literature search through the online database of PubMed using the Boolean search terms '(context OR contextual) AND (fear OR aversive OR threat) AND (memory OR learning) AND (reconsolid* OR reactivat* OR destabili*)' to look for relevant published papers concerning drug-induced reactivation-dependent amnesia for contextual fear memories in rodents. After obtaining in-principle acceptance for the present study, the systematic review was registered at PROSPERO, in which we further specified that "pharmacological manipulations" do not entail genetic manipulations and, given that we investigate "reactivation-dependent amnesia," we only consider treatments that were aimed at inducing amnesia (these criteria were implied, but not explicitly mentioned, in the Stage 1 Registered Report).

Inclusion criteria
Experiments were included when meeting all of the following criteria (related to each element of the PICO framework):
(1) Population. Rats or mice of either sex were used.
(2) Intervention. Contextual fear conditioning [i.e., one or multiple unsignaled shock(s) administered in the training context] was performed and, afterward, a pharmacological manipulation was applied once before or after a brief unreinforced re-exposure to the training context that is commonly referred to as "contextual fear memory reactivation." Experiments were included regardless of the mode of drug administration.
(3) Control group. A negative control group was included, in which subjects received a memory reactivation session combined with vehicle administration, or in which the drug of interest was administered without receiving a memory reactivation session. If multiple negative control groups were used, the most commonly used control was considered, which appeared to be the vehicle control. Experiments that did not include such a control group, but, for example, only a positive control condition (i.e., in which the treatment of interest is compared with a "gold standard" treatment) were not included in the meta-analyses because of a lack of appropriate control.
(4) Outcomes. A behavioral measure of fear or anxiety (e.g., freezing) was included during drug-free testing for long-term memory retention (at least one day after reactivation). If multiple tests were performed, only the results of the first drug-free long-term retention test were included.
(5) Studies for which we were unable to calculate the effect size from reported graphs or statistics are addressed in the paper but not included in the meta-analyses.
Table 1. In these experiments, the methodology of the original studies was followed as closely as possible. All studies involved contextual fear conditioning, followed one day later by brief (i.e., 2, 3, or 5 min) unreinforced re-exposure to the conditioned context and systemic injection of MDZ (1.5 or 3 mg/kg) or saline (SAL), and retention testing one day later. Exact replication of Espejo et al. (2016) and Ortiz et al. (2015). The table indicates whether the experimenter and the lab in which the replication study was performed were the same as in the published, original study. The results show that the amnestic effect could not be replicated when the study was performed in a different lab or by another researcher, despite adherence to the experimental protocol of the original studies. These findings illustrate that the success of treatment may depend on subtle between-study differences, and the underlying causes of these failures to replicate remain unknown. Obtained power for our sample sizes, shown in the last column, was calculated using the smallest effect size (Hedges' g) that was observed in the original studies (α = 0.05; d = 1.71 in Alfei et al., 2015; 1.31 in Stern et al., 2012; 2.91 in Espejo et al., 2016; see Table 2). A complete description of our replication attempts involving contextual fear can be found in Schroyens et al. (2019a,b).
We excluded from the meta-analysis those "boundary" conditions in which amnesia is not expected to occur based on theoretical considerations and prior empirical observations concerning reactivation-dependent amnesia. As mentioned in the introduction, it is established that reactivation-dependent amnesia occurs only under certain theoretically grounded circumstances. For example, it has been found that the success of obtaining reactivation-dependent amnesia depends on memory-related characteristics (such as its age or strength), the use of (stressful) interventions before learning, the conditions under which memory is retrieved (e.g., properties and duration of the reactivation session), the timing of drug application (e.g., not too long before/after the reactivation session), and the time of retention testing (e.g., amnesia is not expected to be observed immediately after the intervention). Importantly, we did not aim to investigate the presence of null findings obtained under those boundary conditions. Rather, we wanted to assess whether negative results have been obtained (and possibly suppressed) in situations where amnesia was expected to occur (i.e., under standard conditions). However, given that these limiting conditions are not absolute (i.e., they can be overcome and seem to interact with each other), it is impossible to predefine a comprehensive set of exact conditions in which amnesia is (not) expected to occur. Therefore, the experimental parameters of all experiments that fulfill the criteria stated above (see 'Inclusion criteria') were summarized in an overview table and reviewed independently by two other researchers to select relevant studies to be included in the meta-analyses. Both researchers have experience in the topic, were blinded for study outcome, and judged inclusion based on the guiding principles listed below. Given the widely accepted boundary conditions for fear memory destabilization, memories should be recent (<7 d) at the time of reactivation, and the reactivation session should take less than two times the duration of the training session. For studies that explicitly aimed to investigate conditions that were expected to impede reactivation-dependent amnesia (such as, for example, stress manipulations before learning), only the "positive control" condition (in which the effect was expected to occur) is included, whereas the conditions under investigation are excluded (regardless of their outcome, given that the selecting researchers were blinded). Negative control conditions that are commonly used in the investigation of amnesia, such as delayed treatment application (>1 h after termination of the reactivation session) or short-term memory tests, were excluded from the meta-analysis.
Nevertheless, as mentioned earlier, any conditions that met the first four inclusion criteria listed in the previous paragraph were included in an overview table (https://osf.io/x2pkq/). This way, a thorough overview of all adopted experimental parameters and boundary conditions is provided. Articles were selected based on the abstract, and the methods section was screened as well if the abstract provided no information on the inclusion of a contextual fear memory procedure and/or insufficient information regarding drug application. Afterwards, the full text of the selected articles was screened to further assess eligibility (see https://osf.io/qebtd/ for a detailed overview of the review process). The summary table providing detailed information on experimental parameters for all included studies can be found at https://osf.io/sjwbd/. Based on this information, two blinded researchers further selected conditions for inclusion in the meta-analysis. To restrict the number of papers to be included in the meta-analysis and the number of researchers to be contacted (see 'Acquiring unpublished data' below), we limited the scope of the meta-analysis to the drugs most commonly used to induce reactivation-dependent amnesia for contextual fear memories. Therefore, only studies that met all above-mentioned inclusion criteria (see 'Inclusion criteria') and used drugs that appeared in five or more research articles [i.e., anisomycin (ANI), MDZ, MK-801, and propranolol (PROP)] were included in the meta-analyses.

Calculation of Hedges' g
Means and SDs were estimated from reported descriptive and test statistics or from reported graphs using WebPlotDigitizer (Rohatgi, 2019). If only the overall sample size of a study was provided and the group sizes could not be derived, we assumed that subjects were equally divided among the groups. Hedges' g (with correction for small-sample bias) and corresponding SE were calculated based on our estimates from means, SDs, and group sizes using the metafor package in R. The Stage 1 version of the Registered Report mentioned "Cohen's d" instead of "Hedges' g." However, we decided to use Hedges' g (which is the default output of the adopted escalc function of the metafor package when calculating standardized mean differences), because this measure corrects for small-sample bias. In any case, we did compare both measures of effect sizes for all included studies and found highly similar estimates.
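The calculation described above can be illustrated with a minimal Python sketch. This is not the authors' R script (they used the escalc function of metafor); the function name `hedges_g`, the sign convention (control minus drug, so that less freezing in the drug group gives a positive value), and the large-sample variance approximation are assumptions made for illustration only.

```python
import math


def hedges_g(mean_ctrl, sd_ctrl, n_ctrl, mean_drug, sd_drug, n_drug):
    """Return (g, se) for a two-group comparison.

    Assumed sign convention: control minus drug, so lower freezing in the
    drug group (amnesia) yields a positive effect size.
    """
    df = n_ctrl + n_drug - 2
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n_ctrl - 1) * sd_ctrl ** 2 + (n_drug - 1) * sd_drug ** 2) / df
    )
    d = (mean_ctrl - mean_drug) / pooled_sd  # uncorrected standardized difference
    # Exact small-sample correction factor J(df), computed on the log scale.
    log_j = math.lgamma(df / 2) - 0.5 * math.log(df / 2) - math.lgamma((df - 1) / 2)
    g = math.exp(log_j) * d
    # Assumed large-sample approximation of the sampling variance of g.
    var_g = 1 / n_ctrl + 1 / n_drug + g ** 2 / (2 * (n_ctrl + n_drug))
    return g, math.sqrt(var_g)


# Hypothetical freezing percentages: vehicle 60 (SD 20), drug 40 (SD 20), 8 rats each.
g, se = hedges_g(60.0, 20.0, 8, 40.0, 20.0, 8)
print(round(g, 3), round(se, 3))
```

With small groups the correction factor is noticeably below 1, which is why the corrected value is smaller than the uncorrected standardized difference in this example.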

Meta-analysis
We used the metafor package in R to fit meta-analytic random-effects models, using restricted maximum likelihood (Viechtbauer, 2010). Measures of between-study variation include τ², I², and Cochran's Q test. Research group and amnestic drug were included as moderators in case of significant between-study heterogeneity. Importantly, rather than estimating the size of the amnestic effect or investigating moderators, our goal was to assess whether the overall sample of published studies is subject to publication bias.
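To make the heterogeneity measures concrete, the following Python sketch fits a random-effects model without moderators. It deliberately swaps in the closed-form DerSimonian-Laird estimator of τ² instead of the restricted maximum likelihood estimator used in the paper, and it computes I² as (Q − df)/Q, so its output will generally differ somewhat from metafor's. The function name `random_effects_dl` and the example values are hypothetical.

```python
import math


def random_effects_dl(effects, ses):
    """Random-effects summary using the DerSimonian-Laird estimator.

    Note: this is NOT the REML fit reported in the paper; it is a simpler
    closed-form stand-in used here for illustration only.
    """
    k = len(effects)
    w = [1 / se ** 2 for se in ses]          # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)            # between-study variance, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i2,
        "pooled": pooled,
        "se_pooled": math.sqrt(1 / sum(w_re)),
    }


# Hypothetical effect sizes (Hedges' g) and their standard errors.
result = random_effects_dl([1.8, 0.4, 2.5, 0.9], [0.55, 0.40, 0.70, 0.45])
print({name: round(value, 3) for name, value in result.items()})
```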

Publication bias
The funnel plot, in which the effect estimate for each study (here, the standardized mean difference) is plotted against a measure of precision of that study (here, the SE of the standardized mean difference as suggested by Sterne and Egger, 2001), is a primary visual tool to assess publication and other biases (Peters et al., 2008). Observed effect sizes such as standardized mean differences are unbiased estimates of the population effect size regardless of the sample size, but the effect sizes obtained by studies with relatively low precision are in general more variable than those from studies with higher precision. As a result, in the absence of bias, those low-precision studies (i.e., lying at the bottom of the plot) are expected to scatter more widely compared with high-precision studies (lying at the top of the plot), resulting in a symmetrical funnel shape of the dots in the plot. However, if small studies with non-significant results remain unreported or unpublished, we can expect a gap (located at the bottom left in case of a positive true effect size) and the funnel shape can thus become asymmetrical. Egger's linear regression approach was used to assess such plot asymmetry. We used a weighted regression of the effect estimates on their SEs, including a multiplicative dispersion parameter.
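A weighted regression of this kind can be sketched as follows. The Python code is an illustration under stated assumptions, not the authors' analysis: it regresses effect sizes on their SEs with inverse-variance weights, estimates a multiplicative dispersion factor from the weighted residuals, and returns the slope with its t statistic on k − 2 degrees of freedom. It omits the moderators (research group, drug) that the paper includes, and the function name `egger_test` is hypothetical.

```python
import math


def egger_test(effects, ses):
    """Weighted least-squares regression of effect size on SE (no moderators).

    Weights are 1/SE^2; the residual mean square acts as a multiplicative
    dispersion parameter that scales the coefficient variances.
    """
    k = len(effects)
    w = [1 / se ** 2 for se in ses]
    s_w = sum(w)
    s_wx = sum(wi * x for wi, x in zip(w, ses))
    s_wy = sum(wi * y for wi, y in zip(w, effects))
    s_wxx = sum(wi * x * x for wi, x in zip(w, ses))
    s_wxy = sum(wi * x * y for wi, x, y in zip(w, ses, effects))
    denom = s_w * s_wxx - s_wx ** 2
    slope = (s_w * s_wxy - s_wx * s_wy) / denom
    intercept = (s_wy - slope * s_wx) / s_w
    df = k - 2
    # Multiplicative dispersion: weighted residual sum of squares over df.
    dispersion = sum(
        wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, ses, effects)
    ) / df
    se_slope = math.sqrt(dispersion * s_w / denom)
    return {"slope": slope, "intercept": intercept, "t": slope / se_slope, "df": df}


# Hypothetical data in which less precise studies report larger effects.
ses = [0.2, 0.3, 0.4, 0.5, 0.6]
effects = [0.51, 0.64, 0.82, 0.93, 1.10]
print(egger_test(effects, ses))
```

A clearly positive slope means that less precise studies report larger effects, which is the pattern expected when small non-significant studies are missing. Converting the t statistic to a p value requires the t distribution, which the Python standard library does not provide, so that step is left out here.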
Although funnel plots and Egger's regression are standard tools for the assessment of publication bias, it should be noted that publication bias is not the only possible cause of funnel plot asymmetry or a relationship between study precision and effect size. For example, between-study heterogeneity in itself may lead to funnel plot asymmetry because of an accidental correlation between precision and effect size or because of a confounding effect of study characteristics. Such heterogeneity can pose a challenge for funnel plot interpretation. Consider, for example, the research group in which the experiments were performed: certain environmental or methodological differences between research groups may lead to differences in both observed effect sizes and precision of studies (researchers that obtain large effects might evolve to using smaller samples in their future studies). In order to account for such heterogeneity in observed effect sizes, research group was included as a moderator. The meta-analytic model without moderators was used for the creation of the funnel plots (which allowed for plotting of the raw effect sizes rather than their residual values), whereas the model with moderators was used for Egger's linear regression (allowing us to statistically test for funnel plot asymmetry after accounting for the influence of the moderator(s) included in the meta-analytic model). For the sake of completeness, results of all regression models (with and without moderators) can be found at https://osf.io/zshwx/.
Apart from publication bias and genuine between-study heterogeneity, other sources of reporting bias (e.g., selective outcome or analysis reporting), suboptimal design and/or analyses used in smaller studies, and artefactual sampling variance may also lead to asymmetric funnel plots (Sterne et al., 2011). One way to discriminate publication bias as a source of asymmetry in funnel plots from other factors is by using contour-enhanced funnel plots (Peters et al., 2008). Contour-enhanced funnel plots, in which levels of statistical significance are displayed (i.e., <0.01, <0.05, and <0.1), were therefore used to visualize whether publication bias is a likely factor contributing to funnel plot asymmetry (Peters et al., 2008).
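The logic behind the significance contours can be sketched in a few lines of Python. The sketch assumes a two-sided test based on the normal approximation z = g/SE; whether the published figures used exactly this convention is an assumption, and the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def two_sided_p(g, se):
    """Two-sided p value for effect size g with standard error se (normal approximation)."""
    z = abs(g / se)
    return 2 * (1 - _STD_NORMAL.cdf(z))


def contour_half_width(se, alpha):
    """Distance from zero at which an effect with this SE reaches two-sided p = alpha."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) * se


def significance_band(g, se):
    """Name the contour region of the funnel plot in which a study falls."""
    p = two_sided_p(g, se)
    if p < 0.01:
        return "p < 0.01"
    if p < 0.05:
        return "0.01 <= p < 0.05"
    if p < 0.1:
        return "0.05 <= p < 0.1"
    return "p >= 0.1"


# Hypothetical studies: (Hedges' g, SE).
for g, se in [(1.0, 0.2), (0.45, 0.2), (0.36, 0.2), (0.1, 0.5)]:
    print(g, se, significance_band(g, se))
```

Because the contour boundaries scale linearly with the SE, the non-significant region widens toward the bottom of the plot. An empty non-significant region among small studies is what points to suppression of non-significant results rather than to other causes of asymmetry.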

Acquiring unpublished data
All corresponding authors from the selected articles and other relevant researchers were contacted via E-mail to enquire about and request unpublished datasets. In addition, announcements were spread using StudySwap (Chartier et al., 2018), conference mailing lists, and social media (Twitter, ResearchGate, etc.). Obtained unpublished datasets that met the inclusion criteria stated above (see 'Inclusion criteria') were included in funnel plots to get an indication of the precision, obtained effect sizes, and statistical significance of these unpublished results. In addition, Egger's regression was repeated using the total sample that includes published as well as unpublished datasets.

Pilot data
Before preregistration of the current study, we completed some exploratory analyses. In the course of reporting some of our replication efforts (Schroyens et al., 2019a), we performed a thorough literature search for studies that, like in our experiments, had used contextual fear conditioning and postreactivation systemic injection of MDZ to induce amnesia for a previously acquired fear memory in adult rats. We found 15 published papers (until April 2019; see Table 2) and conducted a random-effects meta-analysis using this sample (adhering to the inclusion criteria and statistical analyses outlined in the methods section above). The same analyses were also performed on datasets from our own replication studies (Schroyens et al., 2019a,b), in which highly similar procedures and parameters were used.
Each of the 15 published papers contained at least one study in which an amnestic effect was found. Some papers included conditions that aimed to test limiting factors on reactivation-dependent amnesia, i.e., (stressful) interventions before learning (Zhang and Cranney, 2008; Bustos et al., 2010; Ortiz et al., 2015; Espejo et al., 2016, 2017), the use of remote fear memories (Bustos et al., 2009), reactivation durations that yield inadequate levels of prediction error (Alfei et al., 2015), or drug injection outside the reconsolidation time window (Bustos et al., 2006; Stern et al., 2012). Based on the inclusion criteria described in the methods section, those conditions in which amnestic effects were not hypothesized to occur were not included in the present meta-analysis, given that we aimed to study the occurrence of reactivation-dependent amnesia under optimal standard conditions. Included studies in which no amnestic effects were found used either relatively brief (i.e., 1 or 1.5 min) or long (i.e., 10 min) reactivation sessions. A complete overview of experimental parameters adopted in each of the studies can be found on our OSF page at https://osf.io/sjwbd/. If multiple intervention groups were compared with the same control group, the intervention groups were combined into a single group as recommended by Higgins and Green (2011).
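Combining two intervention groups into one can be done from summary statistics alone. The Python sketch below uses the standard formula for pooling two groups' sample size, mean, and SD, which reproduces what would be obtained from the merged raw data. The function name `combine_groups` and the example values are hypothetical, and the authors' own implementation may differ in detail.

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two groups' summary statistics into those of the combined group.

    The combined SD accounts for both the within-group variances and the
    difference between the two group means.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (mean1 - mean2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


# Hypothetical example: two drug doses compared with the same vehicle group.
n, mean, sd = combine_groups(8, 35.0, 18.0, 8, 42.0, 21.0)
print(n, round(mean, 2), round(sd, 2))
```

For more than two intervention groups, the function can be applied repeatedly, merging one additional group at a time.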
Published studies from other research groups versus our own replication attempts using MDZ: a first indication of publication bias
An extensive literature search for studies using contextual fear conditioning and systemic MDZ injection after memory reactivation revealed 15 papers, containing a total of 33 comparisons (postreactivation MDZ vs SAL) that fulfilled the standard conditions for memory destabilization (Table 2). Visual inspection of the funnel plot including these experiments suggests asymmetry (Fig. 1, left panel). The random-effects meta-analysis on this sample (k = 33, total N = 549) showed considerable between-study heterogeneity [Q(32) = 164.10; p < 0.001; τ² = 1.76 (SE = 0.55) [0.98; 3.45]; I² = 81.84% [71.45; 89.83]], implying differences between studies beyond those to be expected by chance. Given such heterogeneity, and as preregistered, research group was included in the model as a moderator. We found that the effect sizes plotted in Figure 1, [...] suggesting that reported effect sizes differ significantly between research groups, but these between-group differences cannot fully explain all of the observed heterogeneity between studies. When statistically assessing the relationship between the effect estimates and their SEs (i.e., the relation represented in the funnel plots), we used the meta-analytic model with the moderator to test for funnel plot asymmetry after accounting for the influence of research group. Doing so, Egger's test provided statistical evidence for funnel plot asymmetry (t(27) = 5.02; p < 0.001), which can be an indication of publication bias.
A similar random-effects meta-analysis on the replication studies from our group (k = 27, N = 324) showed different results (Fig. 1, right panel). No significant between-study heterogeneity was observed [Q(26) = 34.13; p = 0.132; τ² = 0.08 (SE = 0.12) [0.00; 0.60]; I² = 19.34% [0.00; 62.95]]. Visual inspection of this plot shows a different pattern than the one obtained for previously published studies, as our studies seem to be scattered more symmetrically. Nevertheless, Egger's regression test indicated a significant negative relationship between the effect estimates and their SEs (t(25) = -2.46; p = 0.021), possibly because of the use of a small sample size of 4 rats/group in a few of our studies (i.e., those represented on the bottom left of the graph) providing inaccurate effect estimates.
The funnel plot and Egger's test thus clearly reveal asymmetry in the published studies, which might indicate publication bias. One way to discriminate publication bias as a source of asymmetry in funnel plots from other factors is by using contour-enhanced funnel plots (Peters et al., 2008). The contour-enhanced funnel plot indicates that nearly all published studies report significant results (i.e., studies plotted to the right of the white area are statistically significant in a one-tailed test; Fig. 2, left). The fact that studies seem to be missing in the white area of the plot suggests that suppression of non-significant results is likely a factor contributing to funnel plot asymmetry. Again, this plot is in stark contrast to the one displaying our replication studies, in which most studies yielded non-significant results (Fig. 2, right).
Published studies from other research groups versus our replication attempts using MDZ: the subtle nature of reactivation-dependent amnesia
Comparing both funnel plots (Fig. 1) not only suggests publication bias, but also illustrates that the effect sizes obtained in our studies were never as large as in published research. As mentioned earlier, we hardly found statistical evidence for the presence of (large) amnestic effects, while published studies show quite the opposite pattern; they suggest that negative results are rarely obtained in this field (Fig. 2). This highlights that, even in our exact replication attempts, there were inherent differences between our own and published studies that determined the success of the intervention. In line with this observation, the meta-analysis on the sample of published studies showed that the research group in which experiments were performed significantly affected the obtained effect size (see previous paragraph). Such dependence highlights the subtle nature of reactivation-dependent amnesia and raises the question whether other research groups also conducted unsuccessful attempts to obtain amnestic effects (but never published those findings). The results of our preregistered analyses (see results section below) address this question in depth and provide more insight into the overall robustness of postreactivation amnesia for contextual fear memories in rodents.
Figure 1. Funnel plots including published studies (left panel) and our own replication studies (right panel) in which MDZ was used as amnestic agent. Each point represents an observed effect size (Hedges' g) plotted against its SE. Visual inspection of the plot on the right panel shows that our replication studies are symmetrically scattered around the effect estimate of 0.04, indicating that the estimated effect size is close to zero and suggesting that no trend in one particular direction was observed across studies. In contrast, the plot of published studies (left panel) clearly shows asymmetry, and the reported effect sizes seem to depend strongly on the research group in which the studies were performed (represented by the different symbols in the left plot). Egger's test confirmed plot asymmetry (p < 0.0001), even when considering the moderating influence of research group. One should be careful to attach value to the estimated effect size shown in the left funnel plot, given the evidence for publication bias and because the nesting of studies within research groups is not accounted for. The funnel plots were based on the meta-analytic models without moderators. Symbols represent the research group in which each study was performed (left panel) or the lab space that was used (right panel). Note that three of our exact replication studies (right panel) were performed in the same lab space as some of the original, published studies (left panel).

Funnel plots and Egger's regression suggest publication bias
The preregistered systematic PubMed search identified 304 articles, 89 of which met our inclusion criteria. The wide range of drugs that has been used with the purpose of inducing reactivation-dependent amnesia for contextual fear memories in rodents is shown in Table 3 (systemic administration) and Table 4 (intracranial administration). The scope of the meta-analysis was narrowed down to reactivation-dependent amnesia induction under standard conditions and with commonly-used amnestic drugs (i.e., those that appeared in five or more of the identified research articles, which were found to be ANI, MDZ, PROP, and MK-801). "Standard" conditions were defined based on theoretical considerations in line with the fear memory reconsolidation account (see Inclusion criteria). The final sample that was included in the preregistered metaanalysis consisted of 52 research articles, containing a total of 77 experiments and 95 drug-vehicle comparisons. It should be noted that one of those papers, i.e., Espejo et al. (2016), was not identified via the systematic PubMed search, but given that the study was already included in the pilot study, we also included it in the present analyses. In addition, we did not include Schroyens et al. (2009a,b; those were included in separate analyses; see Figs. 1, 2, right panels) as we aimed to review the literature that originated from outside our own research group. A detailed overview of experimental parameters used in the studies that were included in the meta-analyses can be found at https:// osf.io/sjwbd/.
The random-effects meta-analysis on this sample (k = 95, N = 1896) showed heterogeneity in effect estimates between studies [i.e., variation in effect estimates beyond chance; Q(94) = 334.08; p , 0.001; t 2 = 0. . I 2 , the percentage of the variability in effect estimates that is because of heterogeneity rather than chance, decreased from 75% (considerable) to 65% (substantial) after inclusion of the moderators (Deeks et al., 2019). The funnel plot that includes all 95 drug-vehicle comparisons suggests asymmetry (Fig. 3), which was confirmed statistically by Egger's test (t (60) = 5.04, p , 0.001 for the model including drug and research group as moderators). As mentioned before, one way to distinguish publication bias from other sources of asymmetry is by adding contours of statistical significance to the funnel plot. Such a contour-enhanced funnel including all published drug-vehicle comparisons (Fig. 4) illustrates that studies are missing in the area of statistical non-significance, adding credibility to publication bias being a source of asymmetry. In addition, effect sizes are most densely plotted in the gray area at the border of statistical significance, which might suggest a biased distribution of effect sizes (Simonsohn et al., 2014). Overall, the results based on all selected published studies (using ANI, MDZ, PROP, or MK-801) are in line with those from our pilot study (which only included MDZ), as evidence for publication bias was observed in both sets of analyses.
The majority of the included published drug-vehicle comparisons (i.e., 82%) were reported as statistically significant, and over 90% of the published papers concluded that amnesia could be obtained under at least some of the applied standard conditions (i.e., conditions in which the amnestic effect is expected to occur based on theoretical considerations; see Inclusion criteria). Of the 12 published papers reporting non-significant amnestic effects under standard conditions, eight did find amnesia in some of the applied standard conditions (i.e., when changing the duration of the reactivation session, when administering ANI instead of PROP, or when infusing the amnestic drug into a different brain area), which leaves a total of four papers (including six comparisons) that found no amnestic effect under standard conditions whatsoever. Most of them did, however, obtain amnesia when using multiple injections (albeit temporarily; Lattal and Abel, 2004), when using a knock-out mouse model (Yamada et al., 2009), or when postreactivation MK-801 injection was preceded by prereactivation injection of the cannabinoid CB1 receptor agonist arachidonyl-2-chloroethylamide (ACEA; Lee and Flavell, 2014). Only one of the included papers reported an overall failure to induce amnesia (using PROP; Careaga et al., 2015).

Funnel plots and Egger's regression suggest publication bias when excluding MDZ studies
Below, we report the results of additional analyses that were not part of the preregistered analysis plan, but that allow for a clearer interpretation of the current findings. Visual inspection of the funnel plot including all studies (Figs. 3, 4) suggests that MDZ studies (plotted in black) strongly contribute to the asymmetrical funnel shape or, in other words, to the observed correlation between the effect sizes and their SEs that is suggestive of publication bias. Therefore, we repeated the analyses in an exploratory fashion while excluding the MDZ studies, to assess whether the same conclusions would still hold when solely looking at the three other amnestic drugs. In addition, we repeated the analyses for each drug separately. The plot with ANI, PROP, and MK-801 studies (i.e., excluding MDZ; Fig. 5) still showed an asymmetrical funnel shape, which was confirmed statistically by Egger's regression, even when taking into account the moderating influence of drug and research group (t(32) = 2.41, p = 0.022), albeit to a lesser extent than when MDZ was included. When inspecting the results for each drug individually [see "5. Funnel plots per Drug (exploratory analyses)" at https://osf.io/zshwx/], asymmetrical funnel shapes were observed for MDZ, ANI and PROP, but not for MK-801. In addition, asymmetry was no longer observed for ANI (t(12) = 0.95, p = 0.362) or PROP (t(10) = 1.04, p = 0.321) when including research group as a moderator, implying evidence for overall asymmetry for ANI and PROP, but no evidence for asymmetry within research groups.
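The logic of Egger's regression referred to above can be sketched as follows. This is a minimal illustration of the classical form of the test (regressing the standardized effect on precision and testing whether the intercept differs from zero), not the moderator-adjusted model used for the reported analyses. The function name and input values are hypothetical, and only the t statistic and its degrees of freedom are returned; a p value would have to be obtained from a t distribution.

```python
import math


def egger_regression(effects, ses):
    """Classical Egger test: regress effect/SE on 1/SE by ordinary least squares.

    Returns (intercept, t statistic of the intercept, degrees of freedom).
    An intercept far from zero indicates funnel plot asymmetry.
    Assumption: no moderators are included in this sketch.
    """
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1.0 / s for s in ses]                  # precisions
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = n - 2
    residual_var = sum(
        (yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)
    ) / df
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))
    # Guard against a perfect fit, where the standard error is zero.
    t_stat = intercept / se_intercept if se_intercept > 0 else math.copysign(math.inf, intercept)
    return intercept, t_stat, df


# Hypothetical data in which less precise studies report larger effects.
ses = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
effects = [0.5, 0.6, 1.0, 1.1, 1.5, 1.6]
intercept, t_stat, df = egger_regression(effects, ses)
print(f"intercept = {intercept:.2f}, t({df}) = {t_stat:.2f}")
```

In the absence of small-study effects, the standardized effect should scale with precision alone, so the regression line passes through the origin; a positive intercept reflects larger effects in less precise studies.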

Unpublished data contain proportionally more failures to replicate than published data
We contacted all corresponding authors of the research articles that were included in the meta-analysis and sent an email to the Pavlovian Society mailing list to enquire about and request unpublished datasets. In addition, a request for unpublished data was posted on StudySwap (https://osf.io/98dr6/wiki/home/) and ResearchGate (https://bit.ly/34xllde). Figure 6 provides an overview of the received responses.
Most researchers did not reply to our emails or replied that they did not have any unpublished data available (note that those two "replies" together comprised 60% of the responses). Some researchers provided information about unpublished studies that did not meet all our inclusion criteria. For example, three researchers replied that they had unpublished reconsolidation data from studies using cued fear memories. There were also three cases in which contextual fear conditioning was used, but the reactivation session was too long for inclusion (i.e., longer than two times the duration of the training session). For two of these, the outcome was also shared: one in which amnestic effects of PROP and MDZ were found and another in which no effect of PROP was found. In another series of studies, an excluded amnestic agent, i.e., cycloheximide, was used. A dose of cycloheximide that was found to affect retention when injected after conditioning did not induce amnesia when given after a memory reactivation session, despite varying the parameters of training (0.5- or 0.7-mA shocks) and reactivation (3 or 5 min) in a series of four studies described in an undergraduate student's report (Zacouteguy Boos et al., 2013). Researchers from three different research groups reported having an (extensive) series of unpublished studies meeting all our inclusion criteria but did not wish to share the data for inclusion in the current analyses. Finally, three researchers (from three different research groups) offered to share their data but did not manage to access and/or send those data in time. We did receive unpublished data that could be included in the current meta-analyses from seven researchers from five different research groups (a total of 12 drug-vehicle comparisons). Importantly, the amount of unpublished data that we could include in the current manuscript is less than half of all the unpublished data whose existence was disclosed to us by the contacted researchers.
Overall, it appears that statistically non-significant results from reconsolidation studies in rodents are less likely to be published and, in some cases, researchers were unable or reluctant to share such "negative" data for the current paper.
Table note: Additional details for each study, including PubMed ID, strain, duration of the reactivation session (ranging from 30 s to 10 min), time of drug administration, and time between training and reactivation session (ranging from 1 to 36 d), are available at https://osf.io/x2pkq/. Ara-C = 1-β-D-arabinofuranosylcytosine triphosphate; CBD = cannabidiol; DA = dopamine; DDTC = diethyldithiocarbamate; GRP = gastrin releasing peptide; NE = norepinephrine; PEPA = 4-[2-(phenylsulfonylamino)ethylthio]-2,6-difluorophenoxyacetamide; s.c. = subcutaneous; THC = Δ9-tetrahydrocannabinol; + = at least one study reported a statistically significant amnestic effect; * = amnestic effect was found to be transient; − = at least one study reported a non-significant effect; +/− = at least one study observed that the amnestic effect occurred under some conditions: 1 depending on training parameters (e.g., shock intensity); 2 depending on memory age; 3 depending on reactivation duration; 4 depending on drug dose.

The obtained unpublished studies that met all our inclusion criteria are plotted in combination with the published data (Fig. 7) and alone (Fig. 8). A total of 12 drug-vehicle comparisons were included, in which either MK-801 (six studies), PROP (three studies), or MDZ (three studies) was administered before or after a contextual fear memory reactivation session. One (MK-801) study contained two intervention groups that were compared with the same control group, so the intervention groups were combined into a single group as recommended by Higgins and Green (2011). A detailed overview of the adopted parameters of those studies can be found at https://osf.io/gfwrj/. The funnel plot including all studies (Fig. 7) still shows asymmetry after inclusion of the obtained unpublished data (t(70) = 5.63, p < 0.001; with drug and research group as moderators).
This was not unexpected given that we probably did not track down all existing unpublished data and because a large part of the unpublished data that we did uncover was eventually not shared by the authors for inclusion in the current paper. Importantly, studies that previously remained unpublished show smaller and mostly statistically non-significant effect sizes compared with those reported in the literature (Fig. 7). Although the limited amount of unpublished data does not allow for robust conclusions, the symmetrical funnel shape that is observed when plotting unpublished datasets only (Fig. 8)
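The step of combining two intervention groups that share one control group, mentioned above with reference to Higgins and Green (2011), can be illustrated with the Cochrane Handbook formula for merging two groups into one. This is a minimal sketch under the assumption that this standard formula was applied; the exact computation is not shown in the text, and the function name and example numbers are hypothetical.

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two groups into one, returning (n, mean, sd) of the pooled group.

    The combined SD accounts for both the within-group variability and the
    difference between the two group means, so it equals the SD that would be
    obtained from the raw data of all subjects together.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (mean1 - mean2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


# Hypothetical freezing percentages for two drug groups sharing one control group.
n, mean, sd = combine_groups(8, 42.0, 15.0, 8, 50.0, 18.0)
print(f"combined group: n = {n}, mean = {mean:.1f}, SD = {sd:.1f}")
```

Merging the groups in this way avoids counting the shared control group twice, which would otherwise give that study too much weight in the meta-analysis.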

Discussion
Our own extensive experience with drug-induced, reactivation-dependent amnesia for contextual fear memories in rats (Schroyens et al., 2019a,b) suggested that amnestic effects are not easily found, even when performing well-powered, exact replication attempts of published "positive" studies and trying out a wide variety of experimental parameters in several different laboratories. Of note, we also failed to obtain amnestic effects using behavioral or pharmacological interventions for cued fear memories in rats (Luyten and Beckers, 2017; Luyten et al., 2020) or healthy human participants (Schroyens et al., 2017; Chalkia et al., 2019, 2020a). Those observations, although corroborated by personal communication with experts in the field, were in stark contrast with the published literature, which contains a plethora of significant (mostly large) amnestic effects and hardly any negative results. This discrepancy inspired us to formally investigate publication bias. We performed a systematic PubMed search and selected studies that aimed to induce reactivation-dependent amnesia for contextual fear memories in rodents under standard conditions with a commonly-used amnestic drug (i.e., ANI, MDZ, PROP, or MK-801; see above, Inclusion criteria). The majority of the 95 included published drug-vehicle comparisons (i.e., 80%) were reported as statistically significant, and funnel plots and Egger's linear regression provided evidence for publication bias in this sample. Only one of the included papers reported an overall failure to induce amnesia (Careaga et al., 2015). In contrast, the data that we received from previously unpublished studies mostly consisted of "negative" findings, as around 80% did not find a statistically significant amnestic effect. This discrepancy between published and unpublished results further supports the presence of publication bias. It should be mentioned that part of the unpublished experiments that we were informed of could not be included in the current study due to the inability or reluctance of some researchers to share relevant information about their unpublished findings. In any case, the current results suggest that the literature on reactivation-dependent amnesia for contextual fear memories in rodents is biased.

Table note: Additional details for each study, including PubMed ID, strain, duration of the reactivation session (ranging from 1 to 10 min), time of drug administration, and time between training and reactivation session (ranging from 1 to 36 d), are available at https://osf.io/x2pkq/. ACC = anterior cingulate cortex; ALLN = N-Acetyl-Leu-Leu-norleucinal; BLA = basolateral amygdala; CeA = central amygdala; dHipp = dorsal hippocampus; D-AP5 = D-2-amino-5-phosphonovaleric acid; DRB = 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole; Ent = entorhinal cortex; IL = infralimbic cortex; i.c.v. = intracerebroventricular; MC = motor cortex; mPFC = medial prefrontal cortex; PL = prelimbic cortex; RSC = retrosplenial cortex; + = at least one study reported a statistically significant amnestic effect; * = amnestic effect was found to be transient; − = at least one study reported a non-significant effect; +/− = at least one study observed that the amnestic effect occurred under some conditions (superscripts see Table 3).

Figure caption: Contour-enhanced funnel plot including published studies suggests publication bias. The white area and the region on its left side contain studies with statistically non-significant amnestic effects based on one-tailed tests (drug < control; p ≥ 0.05). The plot suggests that non-significant studies are missing in the literature (i.e., publication bias). Remarkably, effect sizes are most densely plotted at the border of statistical significance, which might also imply biased effect sizes.
Possible sources of publication bias can be found at different stages, such as author submission, peer review or editorial decisions (in which the journal's policy may play a role; Song et al., 2009). Authors' decisions not to submit their negative results for publication can result from (1) the fact that such results are considered unimportant, (2) fear of debunking one's own previously-published results, theories or conclusions, or (3) the expectation of rejection by (prestigious) journals. Importantly, in the presence of publication bias, the published studies as a whole do not provide solid evidence concerning the reliability of reactivation-dependent amnesia. Selective publication of research findings depending on their statistical outcome results in the literature painting an overly optimistic picture, with misleading overall estimates of the size and replicability of amnestic effects. This false image, in turn, may result in researchers investing time and resources in an effect that seemed to be robust but may turn out to be non-replicable or, at least, difficult to replicate.
Based on the evidence for publication bias provided here and the results of our empirical studies in which no evidence for reactivation-dependent amnesia was obtained (Schroyens et al., 2019a,b), we do not claim that such a phenomenon for contextual fear memories in rodents does not exist, nor do we intend to doubt the veracity of the published studies included here; but we do conclude that drug-induced reactivation-dependent amnesia for contextual fear memories in rodents is far less robust than what is projected by the existing literature. In light of other empirical studies from our and other labs that reported failures to replicate, the same may apply to cued fear memories in rodents (Luyten and Beckers, 2017; Luyten et al., 2020) and healthy humans (Bos et al., 2014; Thome et al., 2016; Schroyens et al., 2017; Chalkia et al., 2019, 2020a). We want to point out that the intuitive reasoning of an effect being truly existent based on it being reported many times can be problematic, as it has been suggested that such counting ignores reporting bias, selection bias and questionable research practices (Hedges and Olkin, 1985; Vadillo et al., 2016).

Figure caption: Contour-enhanced funnel plot still suggests publication bias when MDZ studies are excluded. The white area and the area on its left side contain studies with statistically non-significant amnestic effects based on one-tailed tests (drug < control; p ≥ 0.05). The asymmetrical funnel shape observed here was statistically confirmed by Egger's regression (p < 0.001; or with Research Group and Drug as moderators: p = 0.022; exploratory analysis) and is suggestive of biased study outcomes because of selection of significant results for publication.
Likewise, non-significant results should be interpreted as the absence of evidence for, rather than the evidence of absence of, a treatment effect (Taleb, 2007), and the observation of a statistically non-significant result should not be equated with the underlying theory being wrong (Meehl, 1990).
It is worth noting that publication bias is by no means likely to be unique to the reconsolidation field; it probably hinders accurate estimation of effect sizes for many other (behavioral) phenomena as well. In this paper, we focused on a delineated part of the reconsolidation literature to systematically investigate publication bias, allowing us to illustrate the existence and pervasiveness of publication bias in this particular research domain. The obtained results provide us with a clearer view of the potential translational value of reactivation-dependent amnesia for fear memories. We strongly believe that other research areas may also benefit from systematic investigations that (1) (dis)confirm the existence of publication bias and, if applicable, (2) shed light on its extent.
It should be noted that publication bias is only part of the story. Our own failures to exactly replicate prior "positive" studies already suggested that study outcome could depend on the lab in which the study was performed, or at least, that the outcome depends on subtle and unknown factors that differ between labs. In line with our experiences, the current meta-analysis suggested that the size of the amnestic effect depends on the research group in which the experiment was performed. Nevertheless, also within research group and amnestic drug, statistically significant between-study heterogeneity was observed, suggesting that observed effect sizes show differences beyond those to be expected by chance. Such heterogeneity indicates that the size of the amnestic effect, even under standard conditions, is expected to vary significantly. In combination with the current evidence for publication bias and the range of identified null findings, this implies that the outcome of postreactivation amnestic treatments is unpredictable. In addition, dominant theories (e.g., reconsolidation, state dependency) in their current form are unable to pinpoint which factors exactly influence the occurrence and size of reactivation-dependent amnestic effects. The need to define moderators of amnestic effects that has often been mentioned in reply to replication failures might be interesting for further development or refinement of theories on reactivation-dependent amnesia provided that data-driven moderators are also empirically tested.
The lack of robustness of reactivation-dependent amnesia, in combination with the strict (but vague) conditions that are required for memory destabilization and the absence of clear explanations for some of the observed null effects, casts doubt on the potential of the proposed clinical application of postreactivation interventions for the treatment of phobias or posttraumatic stress disorder (PTSD). Indeed, studies in (sub)clinical samples have not been entirely convincing (Brunet et al., 2011; Wood et al., 2015; Kindt and van Emmerik, 2016; Elsey et al., 2020; for an overview, see Beckers and Kindt, 2017). Importantly, the mixed results obtained in clinical studies might not reflect issues with translation from basic to clinical science but may simply reflect the lack of robustness of results obtained in basic research and illustrate the lack of insight into the optimal and boundary conditions for reactivation-dependent memory interference.

Figure caption: [...], studies that remained unpublished (12 drug-vehicle comparisons) showed smaller, and mostly non-significant, amnestic effects. This discrepancy between published and unpublished results is in line with the presence of publication bias that was suggested by the funnel plots. The majority of unpublished "negative" studies of which the existence was revealed could not be included in the current study because of author preferences.
For the field to evolve, replication and non-biased publication of obtained results are essential. The classical publication system clearly favors the publication of novel or "positive" results, but there is a set of valuable new tools that create opportunities to increase transparency, reproducibility and credibility of research findings. For example, documentation of hypotheses, research design, and/or planned analyses on a public repository before commencing data collection, referred to as "preregistration," ensures a clear distinction between hypotheses and/or analysis plans that were formulated before versus after observing the results and can be made publicly accessible upon publication of the paper (Nosek et al., 2018). The OSF is an online platform that can be used for such preregistration and for the sharing of data, analysis scripts, etc. (http://osf.io). Making datasets and analysis scripts publicly available provides the opportunity to be transparent and enhance the credibility of one's obtained results and conclusions (Klein et al., 2018). Nevertheless, while valuable, those tools mostly provide a means to an end, as agreement between registered and performed analyses must be verified and the analytic reproducibility of published results needs to be checked. One valuable publication format in this regard is the Verification Report, in which authors reanalyze the original study data of an empirical article using the reported analyses to verify whether the same conclusions can be drawn as those reported in the original article (Chambers, 2020; see Chalkia et al., 2020b for an example from the reconsolidation field). Finally, the use of Registered Reports, in which in-principle acceptance for publication is granted before data collection for a study commences, assures inclusion of study results in the published record on the basis of the quality of the methods, regardless of a study's outcome (Hardwicke and Ioannidis, 2018).
This format removes the pressure to come up with statistically significant findings for publication and prevents publication bias (https://cos.io/rr provides helpful guidelines for the submission of a Registered Report and an extensive list of participating journals). Researchers can thus take the opportunity of using those tools to increase transparency and reproducibility, both of which are essential for the reconsolidation field (and empirical science in general) to move forward.