Combining Animal Welfare With Experimental Rigor to Improve Reproducibility in Behavioral Neuroscience

Loss, Cássio Morais; Melleu, Fernando Falkenburger; Domingues, Karolina; Lino-de-Oliveira, Cilene; Viola, Giordano Gubert

doi:10.3389/fnbeh.2021.763428

OPINION article

Front. Behav. Neurosci., 30 November 2021
Sec. Pathological Conditions
Volume 15 - 2021 | https://doi.org/10.3389/fnbeh.2021.763428

Cássio Morais Loss¹,²*†

Fernando Falkenburger Melleu³†

Karolina Domingues⁴†

Cilene Lino-de-Oliveira⁵†

Giordano Gubert Viola⁶†

¹Molecular and Behavioral Neuroscience Laboratory, Departamento de Farmacologia, Universidade Federal de São Paulo, São Paulo, Brazil
²National Institute for Translational Medicine (INCT-TM), National Council for Scientific and Technological Development (CNPq/CAPES/FAPESP), Ribeirão Preto, Brazil
³Departamento de Anatomia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
⁴Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
⁵Departamento de Ciências Fisiológicas do Centro de Ciências Biológicas, Universidade Federal de Santa Catarina, Florianópolis, Brazil
⁶Independent Researcher, Mossoró, Brazil

Introduction

Reproducibility is an essential characteristic of any experimental science because it provides reliability to experimentally obtained findings (for details, see Glossary). The currently available empirical estimates suggest that less than half (ranging from 49% down to 11%) of scientific results are reproducible (Prinz et al., 2011; Begley and Ellis, 2012; Freedman et al., 2015, 2017). While it can be argued that the accuracy of these estimates needs confirmation, we (as a scientific community) have to recognize that poor reproducibility is a major problem in the life sciences.

The perception of an ongoing “reproducibility crisis” has led to the establishment of crowdsourced initiatives around the world that address reproducibility issues in the sciences, including behavioral neuroscience (Open Science, 2015; Freedman et al., 2017; Reproducibility Project and Cancer Biology, 2017; Amaral et al., 2019). Among the explanations for poor quality in published research are the prevalent culture of “reporting positive results” (publication bias) and the high incidence of diverse types of experimental bias, such as lack of transparency and poor description of methods, lack of predefined inclusion and exclusion criteria (resulting in unlimited flexibility for deciding which experiments will be reported), and insufficient knowledge of the scientific method and statistical tools when designing and analyzing experiments (Ioannidis, 2005; Cumming, 2008; Sena et al., 2010; Freedman et al., 2017; Vsevolozhskaya et al., 2017; Ramos-Hryb et al., 2018; Catillon, 2019; Neves and Amaral, 2020; Neves et al., 2020). Discussions of the causes and consequences of poor research practices and poor reproducibility, and of the actions needed to overcome them, are many (Altman, 1994; Macleod et al., 2014; Strech et al., 2020) and are beyond the scope of this text. Here, we focus on the aspects relevant to the field of behavioral neuroscience, in which poor research performance not only affects the economic and translational aspects of science but also raises ethical issues, because the field necessarily involves living subjects, mostly laboratory animals (Prinz et al., 2011; Begley and Ellis, 2012; Festing, 2014; Freedman et al., 2015; Voelkl and Wurbel, 2021).

In our opinion, combining principles of animal welfare with experimental rigor may improve the quality of studies in behavioral neuroscience. Hence, we will briefly discuss how adherence to legislation, guidelines, and ethical principles in animal research may guide more rigorous behavioral studies. Thereafter, we condense discussions on how (1) a better understanding of the conceptualization, validation, and limitations of animal models; (2) the use of suitable statistical methods for study design and data analysis; and (3) the use of environmental enrichment in research facilities to favor the welfare of animals may improve the quality of studies in behavioral neuroscience (some practical tips are given in Table 1) and, hopefully, reproducibility in the field.

Table 1. Practical tips combining animal welfare and experimental rigor to improve reproducibility in behavioral neuroscience*.

Advantages of the Adherence to the Regulations to the Quality of Behavioral Neuroscience

Behavioral studies in laboratory animals are performed worldwide under specific guidelines that reconcile the needs of science, scientists, and animal welfare (Smith et al., 2018). Regulations establish obligations and responsibilities for the institutional actors involved in animal experimentation, from students to deans (please consult your own institution about the regulations that apply to a project). Here, we claim that, besides being ethical, adherence to the regulations is advantageous to the quality of behavioral studies. Why? Because regulations in animal research consider, among other things, the 3Rs principle (replace, reduce, and refine), a useful framework for preparing good-quality experiments that take animal welfare into account, as discussed by previous authors (e.g., Franco and Olsson, 2014; Bayne et al., 2015; Aske and Waugh, 2017; Strech and Dirnagl, 2019) and in the following sections. “Replace” prompts scientists to consider, in the first place, alternatives to behavioral studies in laboratory animals for reaching a given aim. Once a behavioral study in laboratory animals is considered necessary, “reduce” may guide designs that use well-established rules for rigorous experimentation to extract the maximum information from a study with the minimum number of subjects. The principle “refine” helps scientists devise better strategies for guaranteeing animal welfare according to species-, sex-, and age-specific needs. There is evidence that “happy animals make better science” (Poole, 1997; Grimm, 2018). Besides, poor welfare in laboratory settings affects laboratory animals in unpredictable and often deleterious ways, compromising behavioral outcomes in experiments (e.g., Emmer et al., 2018) and unnecessarily increasing the number of experimental animals. Therefore, personnel handling animals (experimenters, technicians, and caregivers) may improve research quality by contributing to the efforts to minimize the risk of animal suffering during procedures.
There are many free resources for training staff in the 3Rs principle made available by international organizations, such as NC3Rs¹ or Animal Research Tomorrow,² which could be easily implemented in behavioral studies.

Suitable Animal Models and Behavioral Tests Should Improve Studies in Behavioral Neuroscience

The selection of an adequate animal model is a pivotal step in behavioral studies. Physical models (Godfrey-Smith, 2009) are central tools in neuroscientific research. Neuroscientists commonly employ in vivo animal models, which aim to simulate physiological, genetic, or anatomical features observed in humans (as is the case with studies of disease) or to replicate natural situations under controlled laboratory conditions (van der Staay, 2006; Maximino and van der Staay, 2019). By definition, a model is a construct of a real physical component or property observed in nature. Therefore, a model is always imperfect and does not capture the full complexity of the real system that is being modeled (Garner et al., 2017). Much has been discussed about the validity and translational potential of animal models (Nestler and Hyman, 2010). Here, our aim is to consider how the misuse of animal models may affect the reproducibility and reliability of neurobiological research results. Firstly, there appears to be confusion about the definitions of animal models and behavioral tests (Willner, 1986), which ultimately causes the misinterpretation of results. Animal models deliberately prompt changes in biological variables (such as behavior), while behavioral tests are paradigms to which animal models are subjected in order to have their behavior assessed. By this definition, a behavioral bioassay (an intact animal plus an apparatus) is not a model in a strict sense (van der Staay, 2006; Maximino and van der Staay, 2019), although it is useful for studying normal animal behavior (e.g., exploration of a maze and immobility in the forced swim test) and its underlying mechanisms (Maximino and van der Staay, 2019; de Kloet and Molendijk, 2021).
Secondly, it is important to be aware of the conditions under which a test was validated, because modifying some of them (e.g., light intensity or animal species/strain) may yield results different from those observed when the test was standardized (Griebel et al., 1993; Holmes et al., 2000; Garcia et al., 2005). For example, a dichotomous behavioral outcome (mobility or immobility) is often recorded for mice in the tail suspension test. However, some mice (e.g., the C57BL/6 strain) also present climbing behavior, which may be mistaken for immobility (Mayorga and Lucki, 2001; Can et al., 2012). Thirdly, we have to avoid extrapolating from simple behavioral measures (the variables that we actually measure in a task) to complex, multidimensional, abstract behaviors (e.g., anxiety, memory, locomotor, and exploratory activities). For example, measuring only the distance traveled (or the number of crossings) in an open field arena is not sufficient to fully capture the complexity of locomotor behavior (Paulus et al., 1999; Loss et al., 2014, 2015). Therefore, it alone does not provide enough information to draw conclusions about locomotor activity, a multidimensional behavior that encompasses not only how much an animal moves (distance traveled and locomoting time) but also how it moves (average speed and number of stops made, among others) (Eilam et al., 2003; Loss et al., 2014, 2015). This extrapolation becomes even greater when we think about exploratory activity, which encompasses locomotor activity and other behaviors (such as time and frequency of rearing) (Loss et al., 2014, 2015). Similarly, Rubinstein et al. (1997) observed that mutant mice lacking D4 dopamine receptors moved less in the open field arena but outperformed their wild-type littermates in the rotarod test, which highlights that we cannot conclude much about motor function by measuring only the distance traveled (even if the amount of movement registered is similar between the groups).
Finally, it is imperative to know whether the animal model we intend to test meets the assumptions of the behavioral paradigm in which it will be tested (or of our study hypothesis). For example, animals with compromised mobility (e.g., models of spinal cord injury) will not provide meaningful results in tests that rely on preserved motor function (e.g., forced swim test, elevated plus maze). Similarly, subjecting a pigeon to the Morris water maze may lead one to conclude that pigeons have poor spatial memory. But pigeons do not swim in the first place, which makes that experimental proposal not just inappropriate but absurd. Hence, knowledge about the biology of laboratory animals seems fundamental to the selection of a suitable approach for an intended behavioral study.
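The difference between a single measure and a multidimensional behavior can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions, not a validated analysis pipeline: the function name `locomotor_measures`, the speed cutoff, and the two tracks are hypothetical, and a stop is assumed to be any transition from moving to not moving. It describes two animals that travel the same distance in an open field but differ in locomoting time, average speed, and number of stops.

```python
import math


def locomotor_measures(track, dt, speed_threshold):
    """Summarize an open-field track given as a list of (x, y) positions.

    track: positions sampled every `dt` seconds (e.g., in cm).
    speed_threshold: speed below which the animal counts as stopped
        (hypothetical cutoff; it would need validation for each setup).
    """
    # Displacement between consecutive samples
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    # True for every sampling interval in which the animal is moving
    moving = [step / dt >= speed_threshold for step in steps]

    distance = sum(steps)
    locomoting_time = sum(moving) * dt
    moving_distance = sum(s for s, m in zip(steps, moving) if m)
    # Average speed is computed only over the time spent moving
    mean_speed = moving_distance / locomoting_time if locomoting_time else 0.0
    # Assumption: a stop is a transition from moving to not moving
    stops = sum(1 for prev, cur in zip(moving, moving[1:]) if prev and not cur)

    return {
        "distance": distance,
        "locomoting_time": locomoting_time,
        "mean_speed": mean_speed,
        "stops": stops,
    }


# Hypothetical tracks (x, y) in cm, sampled once per second
steady_track = [(x, 0) for x in range(9)]                 # slow, continuous
bursty_track = [(x, 0) for x in (0, 2, 4, 4, 4, 6, 8, 8, 8)]  # fast bouts, pauses

steady_summary = locomotor_measures(steady_track, dt=1.0, speed_threshold=0.5)
bursty_summary = locomotor_measures(bursty_track, dt=1.0, speed_threshold=0.5)

print(steady_summary)
print(bursty_summary)
```

In this toy case, a comparison based on distance alone would report no difference between the two animals, while the other three measures separate them clearly.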

Rigorous Design of Studies and Analysis of Data Should Improve the Quality of Behavioral Neuroscience

Limited knowledge of the scientific method and statistics is among the reasons for the high levels of experimental bias and irreproducibility (Ioannidis, 2005; Lazic, 2018; Lazic et al., 2018), leading some to suggest that we are actually facing an “epistemological crisis” (Park, 2020). Several guidelines for experimental design, analysis, and reporting are available (see Festing and Altman, 2002; Lazic, 2016; Percie du Sert et al., 2020), describing rigorous methods that should be adopted to avoid bias and achieve high-quality data production. However, it seems that some of the most basic good practices described in these guidelines have been neglected or ignored (Goodman, 2008; Festing, 2014; Hair et al., 2019). Some frequent sources of bias are pseudoreplication (Freeberg and Lucas, 2009; Lazic, 2010; Lazic et al., 2020; Eisner, 2021; Zimmerman et al., 2021) and violations of the rules for experimental design, such as calculating the sample size a priori, allocating samples to groups without bias (randomization), assessing outcomes blind, reporting results completely, and choosing the method for data analysis beforehand (Macleod et al., 2015). The lack of a rigorous plan results in the massive production of underpowered exploratory studies (Maxwell, 2004; Button et al., 2013; Lazic, 2018), with the aggravating factor that they are often misinterpreted as confirmatory ones (Wagenmakers et al., 2012; Nosek et al., 2018). It is not unusual to find discussions about the so-called “statistical trend” in studies in which both biological effect sizes and sample sizes are assumed post hoc. In addition, the extensive practice of exclusively using linear models (such as Student’s t-test or ANOVA) to analyze the data, assuming that all variables present a Gaussian distribution, contributes to the misinterpretation of results (Lazic, 2015; Eisner, 2021).
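To illustrate what calculating the sample size a priori involves, the Python sketch below uses the common normal approximation for a two-sided comparison of two group means, n = 2((z₁₋α/₂ + z_power)/d)² animals per group, where d is the standardized effect size expected before the experiment. The function name `n_per_group` and the default values are our assumptions for illustration. The approximation slightly underestimates the numbers required by a t-test in small samples, so it should be read as a first estimate and not as a replacement for dedicated power analysis.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size chosen BEFORE the experiment.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Round up, because a fraction of an animal cannot be allocated
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Halving the expected effect size roughly quadruples the required sample
for d in (1.0, 0.5):
    print(f"d = {d}: {n_per_group(d)} animals per group")
```

The sketch makes one point explicit: the sample size follows from an effect size, a significance level, and a power that are all fixed in advance, and none of them can be chosen meaningfully after the data have been seen.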
Currently, there are alternative methods that we strongly suggest the whole neuroscientific community incorporate into research projects. For example, Generalized Linear Models, Generalized Linear Mixed Models, and Generalized Estimating Equations (GLM, GLMM, and GEE, respectively) fit distinct types of distribution (the Gaussian distribution being only one of them) and correct for confounding factors (Shkedy et al., 2005a,b; Lazic and Essioux, 2013; Lazic, 2015, 2018; Bono et al., 2021; Eisner, 2021; Zimmerman et al., 2021). Adopting randomized block experimental designs (which are more powerful, have higher external validity, and are less subject to bias than the completely randomized designs typically used in behavioral research) is also necessary for controlling the variability related to confounding factors and producing more reproducible results (Festing, 2014). The use of multivariate statistical tools (instead of the widely used univariate approach) is an alternative for achieving more accurate outcomes from experiments with big data, especially in behavioral studies (Sanguansat, 2012; Loss et al., 2014, 2015; Quadros et al., 2016). Among the advantages of these alternative approaches is the increased accuracy in parameter estimation (thus avoiding impossible predictions), which reduces the probability of making Type I errors (due to invalid estimation of p-values, for example) and Type II errors (due to lack of statistical power). Rigorous design of studies and analysis of data should help to extract the maximum information from a study with an adequately calculated number of subjects and prevent the waste of scientific efforts in behavioral neuroscience.
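A randomized block design of the kind mentioned above can be sketched in a few lines. In the Python example below, which is a minimal illustration and not a prescribed protocol, the blocks are litters, every treatment appears equally often within each block, and the order within a block is shuffled with a seeded random number generator so that the allocation can be documented and reproduced. The function name `randomized_block_allocation`, the animal identifiers, and the treatment labels are hypothetical.

```python
import random


def randomized_block_allocation(blocks, treatments, seed):
    """Allocate animals to treatments at random within each block.

    blocks: dict mapping a block label (e.g., litter or cage) to animal IDs.
    treatments: list of treatment labels.
    seed: fixed seed so the allocation can be reported and reproduced.
    """
    rng = random.Random(seed)
    allocation = {}
    for block, animals in blocks.items():
        if len(animals) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be split evenly")
        # Each treatment appears equally often inside the block
        labels = list(treatments) * (len(animals) // len(treatments))
        rng.shuffle(labels)
        for animal, label in zip(animals, labels):
            allocation[animal] = label
    return allocation


# Hypothetical example: three litters of three mice, three treatments
blocks = {
    "litter_1": ["m01", "m02", "m03"],
    "litter_2": ["m04", "m05", "m06"],
    "litter_3": ["m07", "m08", "m09"],
}
treatments = ["vehicle", "low_dose", "high_dose"]

allocation = randomized_block_allocation(blocks, treatments, seed=2021)
for animal, treatment in allocation.items():
    print(animal, treatment)
```

Because each litter contributes one animal to every treatment, differences between litters cannot be mistaken for treatment effects, and the block label can later enter the statistical model as a factor.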
In addition, rigorous and systematic reporting of methods (with enough detail to allow replication) and of results (with a complete description of effect sizes and their confidence intervals rather than uninformative p-values) is also necessary to increase transparency and, consequently, the quality of the studies (Halsey et al., 2015; Halsey, 2019; Percie du Sert et al., 2020).
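As an illustration of reporting an effect size together with its confidence interval, the Python sketch below computes the difference between two group means and a percentile bootstrap interval for it. This is one possible approach among several, chosen here because it needs only the standard library. The function name `mean_difference_ci`, the number of resamples, and the data are hypothetical; no values from real experiments are used.

```python
import random
from statistics import mean


def mean_difference_ci(control, treated, n_boot=5000, level=0.95, seed=0):
    """Mean difference (treated - control) with a percentile bootstrap CI."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    boot = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size
        c = rng.choices(control, k=len(control))
        t = rng.choices(treated, k=len(treated))
        boot.append(mean(t) - mean(c))
    boot.sort()
    tail = (1 - level) / 2
    lower = boot[round(tail * n_boot)]
    upper = boot[round((1 - tail) * n_boot) - 1]
    return observed, lower, upper


# Hypothetical scores (arbitrary units) for two groups of six animals
control = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
treated = [13.4, 12.8, 14.1, 12.2, 13.9, 13.0]

observed, lower, upper = mean_difference_ci(control, treated)
print(f"mean difference = {observed:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reported this way, a result tells the reader how large the effect is and how precisely it was estimated, which a p-value alone does not convey.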

Environmental Enrichment in Research Facilities May Favor Translational Neuroscience

As mentioned, “happy animals make better science” (Poole, 1997; Grimm, 2018). It is acknowledged worldwide that environmental stimulation is necessary to improve the quality of life and welfare of captive animals, such as research animals. It has been more than a decade since Directive 2010/63/EU was established (EC, 2010). However, this and other directives are far from being effectively complied with by the entire scientific community. A common untested argument for raising research animals in impoverished standard conditions is that data variability among laboratories, or even within them, would increase if the animals were raised in enriched non-standard conditions (Voelkl et al., 2020). This claim has been criticized over the past two decades and suggested to be a fallacy (Wolfer et al., 2004; Kentner et al., 2021; Voelkl et al., 2021). For example, Wolfer et al. (2004) and Bailoo et al. (2018) observed that data variability did not increase when animals were raised in enriched environments compared with standard laboratory environments. Furthermore, Richter et al. (2011) found that rearing animals in enriched environments decreased the variation between experiments, that is, the contribution of the strain-by-laboratory interaction to data variability. In other words, heterogenized housing designs appear to have improved data reproducibility. Therefore, it was claimed (and we agree) that we should embrace environmental variability (instead of static environmental standardization) because environmental heterogeneity better represents the wide variation (richness and complexity) of mental and physical stimulation in both human and non-human animals (Nithianantharajah and Hannan, 2006; Richter, 2017). In fact, drug development and discovery may be affected by the culture of raising animals in impoverished (extremely artificial) environments.
There are studies showing that some drugs present biological effects when tested in animals raised in impoverished environments but not in animals raised in enriched environments (which are more similar to real-life conditions) (Akkerman et al., 2014; Possamai et al., 2015). Furthermore, we cannot disregard that more pronounced effects could be found if drugs were tested in animals raised in enriched rather than impoverished environments (Gurwitz, 2001). While one can argue that there are not enough studies supporting this assertion, the low quality of life of captive animals, the low reproducibility of studies, and the poor translational rate of preclinical research reinforce the necessity of a paradigm shift related to the welfare of animals (Akkerman et al., 2014; Voelkl et al., 2020). This debate should not be restricted to rodents and should include birds (Melleu et al., 2016; Campbell et al., 2018), reptiles (Burghardt et al., 1996), fish (Turschwell and White, 2016; Fong et al., 2019; Masud et al., 2020), and even invertebrates (Ayub et al., 2011; Mallory et al., 2016; Bertapelle et al., 2017; Wang et al., 2018; Guisnet et al., 2021).
We offer two practical examples (or recommendations) of improvements that we (the neuroscientific community) could make: (1) when using animal models, we should implement environmental enrichment as the standard in animal facilities (especially for those animal models that attempt to simulate central nervous system disorders), as raising animals in impoverished environments provides suboptimal sensory, cognitive, and motor stimulation, making them too reactive to any kind of intervention (i.e., “noise amplifiers”) (Nithianantharajah and Hannan, 2006); (2) when proposing alternative organisms to study behavior (e.g., zebrafish), we should learn from past and present mistakes (mostly in rodents), keeping in mind the ethological and natural needs of the species (Branchi and Ricceri, 2004; Lee et al., 2019; Stevens et al., 2021). Importantly, when making these improvements we should carefully respect species-specific characteristics. For example, rats and mice share some characteristics, such as nocturnal habits (which means that both species need places to hide during the light period, to provide a sense of security) (Loss et al., 2015). However, they also have some distinct characteristics, such as the need for running (which is higher in mice) (Meijer and Robbers, 2014). This means that providing running wheels is necessary for mice, while for rats (which run less but are more social than mice) (Kondrakiewicz et al., 2019) the space dedicated to some of the running wheels could be better used by increasing the number of individuals in the home cage (taking care not to compromise the population density). On the other hand, zebrafish need aquatic plants and several substrates in their environment, such as mud, gravel, or sand, to express their own eco-ethological range of behaviors (Engeszer et al., 2007; Spence et al., 2008; Arunachalam et al., 2013; Parichy, 2015; Stevens et al., 2021).
The substrates might provide zebrafish with some camouflage against predators, which may contribute to feelings of security and improved welfare (Schroeder et al., 2014). Taking all of this together, in our opinion, the scientific community must think over the long-term costs (both economic and ethical) of keeping the culture of raising animals in impoverished environments, a condition that potentially disrupts the translation of behavioral neuroscience results into applicable benefits (Akkerman et al., 2014).

Future Directions

As previously stated, the “reproducibility crisis” is not an issue limited to the field of behavioral neuroscience, and several crowdsourced initiatives have been established around the world to address reproducibility (Open Science, 2015; Freedman et al., 2017; Reproducibility Project and Cancer Biology, 2017; Amaral et al., 2019). An essential first step in confronting this issue is to recognize that there is a crisis and that it is a major problem. Secondly, scientific communities have been developing and disseminating guidelines for good experimental practices for their own use (more information can be found at http://www.consort-statement.org/ and https://www.equator-network.org/). In addition, encouraging the preregistration of projects and experimental protocols (a practice that is essential for carrying out confirmatory studies) (Wagenmakers et al., 2012; Nosek et al., 2018) and embracing open research practices (open data sharing) (Ferguson et al., 2014; Steckler et al., 2015; Gilmore et al., 2017) are also ways to improve reproducibility. Interestingly, it seems that just encouraging good research practices is not enough to ensure compliance with the proposed guidelines (Baker et al., 2014; Hair et al., 2019). This suggests that research funding agencies, as well as peer reviewers and journal editors, need to participate by demanding adherence to these directives (Kilkenny et al., 2009; Baker et al., 2014; Han et al., 2017; Hair et al., 2019).

In conclusion, paraphrasing Lazic et al. (2018), “There are few ways to conduct an experiment well, but many ways to conduct it poorly.” In our opinion, we, as a scientific community, have to be concerned about the rigor of the experiments we are conducting and the quality of the studies we are producing. Publishing non-reproducible results (or reproducible noise) can have ethical, economic, and technological consequences that lead to scientific discredit. Furthermore, poor reproducibility delays discovery and development and hinders the progress of scientific knowledge. Broad adherence to, and advanced training in, the principles of animal welfare and good experimental practices may elevate the standards of behavioral neuroscience. Finally, perhaps we, as the scientific community, should strive to refine our current animal models and focus our efforts on the development of new, more robust, ethologically relevant models that could potentially improve both the description of our reality and the translational potential of our basic research.

Author Contributions

CML was responsible for the conceptualization of the opinion article. All authors were responsible for writing and revising the manuscript and read and approved the final manuscript.

Funding

This work was supported by grants from the Alexander von Humboldt Foundation (Germany) to CLi. CML was the recipient of a Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) research fellowship through the Instituto Nacional de Ciência e Tecnologia Translacional em Medicina (INCT-TM), Brazil. FM was supported by Post-doctoral fellowship grant #2018/25857-5, São Paulo Research Foundation (FAPESP), Brazil. KD was supported by Fellow BIPD/FCT Proj2020/i3S/26040705/2021, Fundação para a Ciência e Tecnologia, Portugal. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We are grateful to the Alexander von Humboldt Foundation (Germany) and the Brazilian funding agencies for the financial support and fellowships granted. We are also grateful to Ann Colette Ferry (in memoriam) for providing language assistance.

Footnotes

References

Akkerman, S., Prickaerts, J., Bruder, A. K., Wolfs, K. H., De Vry, J., Vanmierlo, T., et al. (2014). PDE5 inhibition improves object memory in standard housed rats but not in rats housed in an enriched environment: implications for memory models? PLoS One 9:e111692. doi: 10.1371/journal.pone.0111692

Altman, D. G. (1994). The scandal of poor medical research. BMJ 308, 283–284. doi: 10.1136/bmj.308.6924.283

Amaral, O. B., Neves, K., Wasilewska-Sampaio, A. P., and Carneiro, C. F. (2019). The Brazilian reproducibility initiative. Elife 8:e41602.
