The Contribution of Environmental Enrichment to Phenotypic Variation in Mice and Rats

Abstract The reproducibility and translation of neuroscience research are assumed to be undermined by introducing environmental complexity and heterogeneity. Rearing laboratory animals with minimal (if any) environmental stimulation is thought to control for biological variability but may not adequately test the robustness of our animal models. Standard laboratory housing is associated with reduced demonstrations of species-typical behaviors and changes in neurophysiology that may impact the translation of research results. Modest increases in environmental enrichment (EE) mitigate insults used to induce animal models of disease, directly calling into question the translatability of our work. This may in part underlie the disconnect between preclinical and clinical research findings. Enhancing environmental stimulation for our model organisms promotes ethologically natural behaviors but may simultaneously increase phenotypic trait variability. To test this assumption, we conducted a systematic review and evaluated coefficients of variation (CVs) between EE and standard housed mice and rats. Given findings of suboptimal reporting of animal laboratory housing conditions, we also developed a methodological reporting table for enrichment use in neuroscience research. Our data show that animals housed in EE were not more variable than those in standard housing. Therefore, environmental heterogeneity introduced into the laboratory, in the form of enrichment, does not compromise data integrity. Overall, human life is complicated, and by embracing such nuanced complexity into our laboratories, we may paradoxically improve on the rigor and reproducibility of our research.


Introduction
Contributions to phenotypic variation are thought to derive not only from genotype but from multiple environmental factors that range from feeding and microbiology, to variables as seemingly simple as housing condition. In experimental research, scientists attempt to control factors presumed to have an impact on biological variation and consequently the reproducibility of their data. One way to control for phenotypic variability in the laboratory is to standardize animal caging systems and limit environmental complexity. Environmental enrichment (EE) is one form of complexity that includes physical, sensory, cognitive, and/or social stimulation which provides an enhanced living experience to laboratory animals, relative to standard housing conditions. The use of EE has become prominent in neuroscience, because of substantial evidence that EE influences structural and functional changes in the brain, in addition to engendering enduring effects on behavior (Nithianantharajah and Hannan, 2006; Kempermann, 2019). Provisioning supplementary resources to animals not only maintains their welfare but promotes more naturalistic species-typical behavioral repertoires (Bloomsmith et al., 2018). Moreover, this enhanced rearing condition has been used to study the mitigative potential of the environment in a variety of animal disease models (Nithianantharajah and Hannan, 2006).
Regardless of the purpose of its use, there are questions about potential within-experiment and between-experiment variability that may accompany the addition of environmental complexity to animal laboratory cages (Toth et al., 2011; Bayne and Würbel, 2014; Toth, 2015; Kempermann, 2019; Grimm, 2018; Sparling et al., 2020). It is thought that the diverse phenotypes promoted by EE may lead to data variation within a study. Moreover, the variety in enrichment protocols used may create data variability between studies and laboratories, compromising data reproducibility. Together, these concerns foster arguments to maintain barren cages as the "gold" standard housing condition (Bayne and Würbel, 2014; Voelkl et al., 2020). Importantly, similar justifications (of increased variation) have been used to support the exclusion of females from research, because of hormonal fluctuations across the reproductive cycle. However, scientific evidence has since shown this perspective to be incorrect (Becker et al., 2016; Beery, 2018).
Given the shifting attention of the scientific community to the topic of rigor and reproducibility (Toth, 2015; Voelkl et al., 2020), this is the perfect time to reconsider our assumptions about variation because of environmental complexity. Standardization of the environment intuitively falls in line with the scientific method. Parsing out contributors of extraneous variation [phenotype (P) = gene × environmental interactions (G × E)] is thought to increase statistical power and reproducibility between experiments. On the other hand, such standardization leads to homogeneity in a population and may undermine the robustness of the potential treatment being studied (Kentner et al., 2018; for excellent recent review, see Voelkl et al., 2020), a crucial concern given the disconnect between preclinical and clinical research outcomes (Berk, 2012; Hyman, 2012; Munos, 2013).
Still, to control for potential variability, efforts to standardize the environment continue. These efforts have been complicated by varying definitions of what is enriching to animals of each species, strain, and sex (Simpson and Kelly, 2011; Toth et al., 2011; Toth, 2015), even for standard laboratory housing where only minimal EE is recommended or required. Moreover, a lack of reporting on what types of enrichment protocols are used (e.g., shelters, nesting materials, cage mates, music, food/treats; Toth, 2015) makes this task even more difficult. Overall, the differential implementation of EE in experimental design has provoked discussion over the inconsistent definitions and reporting methodology of enrichment use in the neuroscience literature, and whether standardization and minimization of laboratory caging is necessary to prevent further extraneous biological variation (Bayne and Würbel, 2014; Toth, 2015).
Outside of theoretical debates, data on whether EE contributes to the replication crisis, by increasing phenotypic variability and undermining research findings, are mixed (Walsh and Cummins, 1979; Wolfer et al., 2004; Würbel, 2007; Toth et al., 2011; Toth, 2015) and concerns about its use persist (Grimm, 2018). Recently, there has been a call to action suggesting that the question of biological variation and its impact on rigor and reproducibility be extended to the diversification of environmental conditions or "controlled heterogenization" (Voelkl et al., 2020). For example, diversification may be implemented by using different sexes, animal strains, ages, and even housing conditions (e.g., EE) within a study. One way to address the question of variability because of the implementation of EE is to use the methods of others who have conducted large scale evaluations comparing male and female animals (Becker et al., 2016) and inbred versus outbred strains of mice (Tuttle et al., 2018). Indeed, it has been noted that the EE literature has typically focused on mean (x̄) differences between groups, rather than evaluating whether EE increases variability specifically (Kempermann, 2019). The small subset of studies that have examined variation directly (Wolfer et al., 2004; Würbel, 2007; André et al., 2018) have so far focused on mice and on a limited number of strains within the confines of their own experiments. To our knowledge, there has been no systematic literature-wide evaluation of multiple traits comparing EE to standard housed groups across species.

Materials and Methods
To evaluate whether EE housed rats or mice display increased phenotypic variability in neuroscience research, we conducted a systematic review and compared the coefficient of variation (CV), a measure of trait-specific variability, extracted from data where EE animals of either sex were directly compared with a standard (control) housed condition on the same trait. First, to determine the general scientific interest in EE protocols, the proportion of articles published each year, using the search term "environmental enrichment" was identified in PubMed (Sperr, 2016).

Search strategy
Both PubMed and EMBASE were searched for the period from January 1, 2013 to September 5, 2018, the date when these searches were initiated. The period evaluated is comparable to other important systematic reviews that assessed phenotypic variability (Becker et al., 2016). We used the search terms (1) EE AND (2) electrophysiology OR (3) brain OR (4) behavior OR (5) "nervous system physiological phenomena," which yielded 3650 articles (Fig. 1).

Study selection
After duplicates were removed, evaluators independently identified studies eligible for inclusion in a 2-step process. First, we conducted an abstract and title search. If insufficient details were provided in the titles and abstracts, then the study was selected for full text review. Eligibility was based on (1) article relevance to the subject matter of interest (EE), (2) studies using any animal species including humans, (3) observational and experimental studies, and (4) English-written articles only. Exclusion criteria consisted of reviews, meta-analyses, case studies, conference abstracts, protocols, editorials, comments, and non-English articles. Overall, the articles included in this systematic review were primarily from the fields of neuroscience and animal welfare (see Fig. 2; Extended Data Figs. 2-1, 2-2).

Data extraction
Of the 963 articles identified as using EE in any species, a subset of 681 articles were identified as using mice or rats and were further evaluated on their use of several methodological variables including sex, types of enrichment devices employed, in addition to social structure of the EE group and composition of the control conditions used (e.g., running wheel, isolated, social/group housing). Phenotypic variability was also evaluated on the rat and mouse studies identified as using traditional EE caging systems (Fig. 2A). For these analyses, 281 studies were evaluated based on meeting the inclusion criteria of providing means and standard deviations (or standard errors) that could be extracted from the article, and sample sizes for at least one EE and one control group (Fig. 1). We also identified whether EE and control groups were naive or "treated/manipulated" (e.g., drug treated, knockout models, surgery etc.). Studies with parental exposure to EE were excluded to control for potential confounds of parental care (Connors et al., 2015), as were studies where it was unclear whether control animals were singly or socially housed. To avoid oversampling (Tuttle et al., 2018), we limited data collection to the first three reported measures where data and error bars were clearly legible. Each measure was categorized similarly to how others had done previously (Becker et al., 2016; Tuttle et al., 2018) by using behavior/CNS, behavior/other, anatomy, immune system function, organ function, molecules, and electrophysiology as traits. Generally, the behavior/CNS category included measures where animals demonstrated some type of learning, discrimination, or what could be considered more complex sequences of behaviors. Examples from our dataset include time spent with a novel object or social conspecific, sniffing duration, and duration of social contact (e.g., discrimination and preference tests).
Number of lever responses, conditioned place preference scores, latency to locate a platform in the Morris water maze, % fear generalization, % freezing time, % sucrose preference, number of reference memory errors etc. were also included in this category. In contrast, the behavior/other category represented measures such as time spent in the center of the open field, frequency of crossings into the open center or periphery, and distance traveled in the open field. Anatomy included measures like the length or volume of brain regions (e.g., dendritic length, corpus callosum thickness). Immune system function as a category included measures such as flow cytometric analysis of CD40 on peritoneal macrophages, and tumor volume or weight. We also placed plasma cytokine levels into this category. The organ function category included heart rate, changes in arterial blood PO2, PCO2, and pH, as well as fasting blood glucose levels. Molecules included any other measures of protein or mRNA, for example. These latter measures were primarily localized to the brain in our dataset.
In total, there were 1130 direct comparisons of CVs between EE and control animals included here (618 naive pair comparisons and 512 manipulated/treated pair comparisons; Fig. 1). The number of articles included, and direct comparisons made, in our analyses surpassed other excellent systematic reviews evaluating phenotypic variability (Becker et al., 2016; Tuttle et al., 2018). Therefore, we have an adequate sample size to draw appropriate conclusions. Data were extracted from graphs provided on digital PDF articles (using https://rhig.physics.yale.edu/~ullrich/software/xyscan/), or directly from tables (Tuttle et al., 2018). Graphical data extractions were performed by two trained researchers. Inter-rater reliability was assessed, and Pearson r correlation was determined to range from 0.912 to 0.997.
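The extraction steps described above can be illustrated with a minimal sketch. It assumes that, where only a standard error was reported, the standard deviation is recovered as SEM × √n, and that inter-rater reliability is the Pearson correlation between the values two raters read from the same graphs. The function names and the example numbers are hypothetical and are not taken from the dataset.

```python
import math


def sem_to_sd(sem: float, n: int) -> float:
    """Recover a standard deviation from a standard error of the mean.

    Assumes SEM = SD / sqrt(n), where n is the group sample size.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return sem * math.sqrt(n)


def pearson_r(xs, ys) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values read off the same five bars by two raters.
rater_a = [12.1, 8.4, 15.0, 9.7, 11.2]
rater_b = [12.3, 8.1, 14.8, 9.9, 11.0]
print(f"inter-rater r = {pearson_r(rater_a, rater_b):.3f}")

# Hypothetical group reported as SEM = 1.5 with n = 16 animals.
print(f"SD = {sem_to_sd(1.5, 16)}")
```

Converting every error term to a standard deviation before computing CVs keeps studies that report SEM comparable with those that report SD.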

Statistical analyses
CVs were calculated as standard deviation divided by the mean and compared using paired t tests (for individual trait evaluations), or ANOVA (for multiple trait evaluations). Pairwise comparisons were done using Tukey's multiple comparisons test (Howell, 2001; Becker et al., 2016). The partial η² is also reported as an index of effect size for the ANOVAs (the range of values being 0.02 = small effect, 0.13 = moderate effect, 0.26 = large effect; Miles and Shevlin, 2001). To determine whether the distribution of variation differed by environmental complexity, we calculated EE to control ratios of CV = [(CV_EE)/(CV_EE + CV_control)]. CV ratios for each trait were tested as a function of housing complexity against the theoretical mean 0.5 by t test (Becker et al., 2016; Beery, 2018). Data were considered significant if p < 0.05.
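As an illustration of the calculations described above, the following sketch computes CVs, the EE to control CV ratio, and the t statistics for the one-sample test against 0.5 and for the paired comparison. A ratio of 0.5 corresponds to equal CVs in the two housing conditions. The function names and the summary values are hypothetical; the sketch returns only the t statistic and degrees of freedom, since obtaining a p value would additionally require a t distribution (for example from a statistics library such as SciPy).

```python
import math
import statistics


def cv(mean: float, sd: float) -> float:
    """Coefficient of variation: standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return sd / mean


def cv_ratio(cv_ee: float, cv_control: float) -> float:
    """EE to control ratio: CV_EE / (CV_EE + CV_control)."""
    return cv_ee / (cv_ee + cv_control)


def one_sample_t(values, mu: float):
    """Return (t, df) for a one-sample t test of `values` against `mu`."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two values
    if sd == 0:
        raise ValueError("t is undefined when all values are identical")
    t = (statistics.mean(values) - mu) / (sd / math.sqrt(n))
    return t, n - 1


def paired_t(a, b):
    """Return (t, df) for a paired t test, via the pairwise differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    return one_sample_t([x - y for x, y in zip(a, b)], 0.0)


# Hypothetical (mean, SD) summaries for matched EE and control groups.
ee = [(10.0, 2.0), (5.0, 1.5), (8.0, 2.4), (20.0, 3.0)]
control = [(10.0, 2.5), (5.0, 1.0), (8.0, 2.0), (20.0, 4.0)]

cv_ee = [cv(m, s) for m, s in ee]
cv_ctrl = [cv(m, s) for m, s in control]
ratios = [cv_ratio(a, b) for a, b in zip(cv_ee, cv_ctrl)]

t_ratio, df_ratio = one_sample_t(ratios, 0.5)
t_pair, df_pair = paired_t(cv_ee, cv_ctrl)
print(f"CV ratio vs 0.5: t({df_ratio}) = {t_ratio:.3f}")
print(f"paired CVs:      t({df_pair}) = {t_pair:.3f}")
```

Because the ratio is bounded between 0 and 1 for positive CVs, it places each EE and control pair on a common scale regardless of the units of the underlying trait.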

Results
Using the term "environmental enrichment," we identified the proportion of articles indexed in PubMed each year from 1998 to 2019 (Sperr, 2016). One report has previously evaluated the number of articles published from 1960 to 2009 (Simpson and Kelly, 2011). In that work, it was demonstrated that an increased interest in EE emerged between 1990-1999 and 2000-2009. Here, we provide a replication and extension of those data from 1998 until 2019. Our search, including both review and empirical research articles, highlights a continuation of the increasing interest in this topic, relative to the number of total articles published (Fig. 2B).
The results of our analyses demonstrate patterns of experimental biases, specifically a heavy reliance on the use of rats and mice over other laboratory species (Fig. 2C), and the continued exclusion of females in EE research (Fig. 2D; Simpson and Kelly, 2011). Our findings also show a range in the definition of EE used across laboratories in that the frequency of enrichment types, timing, and the social structures implemented varied widely (Fig. 3A-F). Toys (including plastic or wooden), bones/chews, house hideaways, and tubes/pipes and tunnels, in addition to a larger cage space and social conspecifics, were more frequently used in the enrichment housing conditions. Supplementary bedding/nesting materials and ramps/ladders or perches were less commonly used, as were swings, ropes and chains (Fig. 3A).
One issue that arose was a significant lack of reporting on several variables. This prompted us to develop a reporting table for describing key aspects of enrichment use in research (Table 1), following suit with other initiatives to improve on animal model reporting (Kentner et al., 2019). As part of this table, we suggest authors report whether they are providing EE animals with manufactured/artificial enrichment devices or more natural stimuli as there are differences in animal phenotypes depending on these devices (Hess et al., 2008; Lambert et al., 2015).
Using paired t tests, we found no differences between EE and standard housed mice or rats on CVs across traits (p > 0.05), regardless of control housing type (e.g., running wheel, isolated, social/group housing) or whether animals were naive or manipulated/treated (e.g., drug treated, knockout models, surgery). Therefore, we collapsed and analyzed both species together. When species were combined, the treated/manipulated social/group housed controls (0.65 ± 0.073) were more variable than their manipulated/treated EE counterparts (0.59 ± 0.050; t(46) = 2.211, p = 0.032) on the "behavior other" trait only. Isolated control animals (0.24 ± 0.079) had higher CVs than treated/manipulated EE animals on the anatomy trait (0.019 ± 0.072; t(4) = 4.720, p = 0.009). However, for the anatomy trait the comparison between these two groups was not sufficiently powered (n = 5 comparisons based on three articles). In general, we did not find EE to increase trait variability compared with any control housing type in either naive or manipulated/treated animals (p > 0.05).
Although too few studies included female animals to allow adequately powered comparisons on many traits (Fig. 2D), we conducted some preliminary sex difference analyses. Our subanalyses revealed that naive male EE rats (0.60 ± 0.10) had higher CVs than their naive social/group housed controls (0.39 ± 0.18; t(32) = −2.266, p = 0.030, based on 18 articles) on the "behavior other" trait, but were not more variable on any other trait (p > 0.05). There were no further differences in variability between EE and control animals across any combination of sex, strain, control type, or naive versus treated/manipulated animals.

Discussion
Our findings should resonate well with neuroscientists who would like to increase complexity in laboratory caging systems, promoting more naturalistic species-typical behaviors and brain functioning, but who have been concerned about compromising data integrity and their control over environmental conditions. This should be especially salient given that lack of enrichment in laboratory cages leads to suppression of behavioral repertoires, increased stereotypies, and a reduction of general activity level, even during an animal's active phase (Hurst et al., 1997). Indeed, deprivation in the environment is known to impact the structure and functioning of the brain, affecting cognition and behavior (Lahvis, 2017; McLaughlin et al., 2017). This underscores the view that our current standard laboratory housing condition is not a true control condition. Cage enrichment is recommended in the Guide for the Care and Use of Laboratory Animals (National Research Council, 2011), and for standard housed rodents typically takes the form of sanitizable polyvinyl chloride (PVC) tubes, a chew bone, or a piece of nesting material. If the animal is lucky, it may receive a combination of two or three of these enrichment devices. To be frank, the composition of this housing condition needs a major renovation. Seldom do these cage enrichment objects change across the course of the study; novelty and increased stimulation are luxuries afforded to animals reared in classic EE (see Fig. 2A). This latter housing condition is rarely used as a standard in the laboratory; when employed, EE is usually for the purpose of exploring mechanisms underlying neural plasticity, or to mitigate some type of toxic insult (Nithianantharajah and Hannan, 2006). The availability of resources is a major restriction to increasing stimulation in the animal laboratory.
It will require a change in the mindsets of institutions, scientists, and funding bodies to make this housing condition, or an adapted version, the new "gold standard." Some solutions to address cost, physical space, as well as personnel constraints to implementing higher levels of enrichment have been discussed elsewhere (Kentner et al., 2018) and are outlined below. Still, the direction of funds to establish more complex housing conditions for laboratory animals should be part of the movement to improve scientific rigor and reproducibility.
Another important hurdle to the implementation of EE is concern about phenotypic variability because of increased heterogeneity. While we identified some increased variability in naive male EE rats on measures such as distance traveled and open field, most studies evaluated used some type of experimental treatment/manipulation which did not affect phenotypic variability on any trait. Moreover, others have reported no differences in variability on these types of measures, when associated

Table 1: Environmental enrichment reporting guidelines checklist. The recommended use of this reporting form is to fill it out and include it as supplemental material for each of your laboratory's environmental enrichment research publications. This document can also be used as a guide for including details of cage enrichment for studies using only standard laboratory housing. If there are difficulties using/adapting this form, please contact one of the corresponding authors to request a copy.
From a purely scientific perspective, EE can mitigate the effects of several experimental treatments and animal models of disease (Nithianantharajah and Hannan, 2006) and is often interpreted as a beneficial intervention (Sparling et al., 2020). However, this calls into question the external validity of these apparent context-specific effects (Bernard, 2019; Manouze et al., 2019) and the robustness of our animal models; a clear example of fallacious reasoning (Bernard, 2020). Indeed, incorporating more environmental heterogeneity into neuroscience research, and testing our findings against such complexity, should increase the robustness of our experimental designs and the fidelity of biomedical treatments (Kentner et al., 2018; Voelkl et al., 2020), without compromising the underlying stability of data. Our study supports this idea given that traditional EE caging systems are dynamic environments where devices are being replaced or are changing location as animals interact with and move them. Moreover, social experiences are varied for each animal. Specifically, experiences both between and within EE cages are unique, yet complex housing does not make animals any more variable compared with standard laboratory housed rats or mice. Importantly, the increased use of EE and improved robustness of experimental design should be less costly in the long run. This contrasts with a continued reliance on standard laboratory housing, which is clearly not a true control condition and appears to impede the translation of research results.
Going forward, it will be necessary to identify appropriate enrichment types for the species, sex, and age of the model organism of interest, in addition to the animal model/paradigm being used, and to accurately report their use (Simpson and Kelly, 2011; Toth, 2015; Kentner et al., 2018). Importantly, there are proposed methodologies for how to implement and account for such environmental variation (Voelkl et al., 2020). Overall, human life is complicated, and by embracing such nuanced complexity into our laboratories, we may paradoxically improve on the rigor and reproducibility of our research.

Data availability
All data are available on request.

Code availability
There is no code associated with this work.