Research Article: Open Source Tools and Methods, Novel Tools and Methods

DeepEthoProfile—Rapid Behavior Recognition in Long-Term Recorded Home-Cage Mice

Andrei Istudor, Alexej Schatz and York Winter
eNeuro 15 July 2025, 12 (7) ENEURO.0369-24.2025; https://doi.org/10.1523/ENEURO.0369-24.2025
Andrei Istudor
Humboldt-Universität zu Berlin, Berlin 10099, Germany
Alexej Schatz
Humboldt-Universität zu Berlin, Berlin 10099, Germany
York Winter
Humboldt-Universität zu Berlin, Berlin 10099, Germany

Abstract

Animal behavior is crucial for understanding both normal brain function and dysfunction. To facilitate behavior analysis of mice within their home environments, we developed DeepEthoProfile, an open-source software package powered by a deep convolutional neural network for efficient behavior classification. DeepEthoProfile requires no spatial cues for either training or processing and is designed to perform reliably under real laboratory conditions, tolerating variations in lighting and cage bedding. For data collection, we introduce EthoProfiler, a mobile cage rack system capable of simultaneously recording up to 10 singly housed mice. We used 36 h of manually annotated video data sampled in 5 min clips from a 48 h video database of 10 mice. This published dataset provides a reference that can facilitate further research. DeepEthoProfile achieved an overall classification accuracy of over 83%, comparable with human-level accuracy. The model also performed on par with other state-of-the-art solutions on another published dataset (Jhuang et al., 2010). Designed for long-term experiments, DeepEthoProfile is highly efficient, annotating nearly 2,000 frames per second, and can be customized for various research needs.

Significance Statement

DeepEthoProfile addresses a long-standing need for robust, automated analysis of rodent activity within undisturbed home cages. Powered by a deep convolutional neural network, this open-source system achieves near-human-level accuracy—over 83% on a newly published 36 h annotated dataset—and compares favorably with other state-of-the-art solutions on an established benchmark. Its tolerance to varying lighting, bedding arrangements, and mouse morphology makes it particularly suited to real-world laboratory conditions. By annotating nearly 2,000 frames per second, DeepEthoProfile accelerates high-throughput behavioral phenotyping and enables long-term investigations of day/night cycles, strain differences, and aging-related frailty. The use of a Docker-based pipeline eases adoption and maintenance with a minimum of requirements.

Introduction

Mice and rats are central to biological and medical research. Long-term classification of behavioral activity is crucial for understanding mammalian brain function and investigating its dysfunction. In standard experimental setups, automated behavior evaluation is relatively straightforward when focusing on basic metrics such as general activity, body movement, and spatial usage (Contet et al., 2001; Tanaka et al., 2012; Lorbach et al., 2019; Ebbesen and Froemke, 2022). However, it becomes more challenging when it involves identifying specific behavioral actions in mice within their home cages, because mice lack distinct body features and can rapidly elongate or contract. Furthermore, standard animal welfare measures introduce complexities into home-cage-based video acquisition and data interpretation. The development of deep neural networks has been significantly advanced by improvements in consumer hardware, large databases, and deep learning libraries, leading to major progress in human action recognition (Karpathy et al., 2014; Tran et al., 2018). Although methods for human action recognition are not directly transferable to rodent studies, they have opened new avenues in animal research. Techniques such as deep convolutional neural networks (CNNs), long short-term memory (LSTM) networks (Le and Murari, 2019), and optical flow features (Bohnslav et al., 2021) are now common for video annotation in both human and animal domains.

Behavior analysis requirements generally fall into two categories. On the one hand, researchers may need detailed kinematic data of body and limb movements of mice in specific experimental conditions. In this domain, DeepLabCut (Mathis et al., 2018) and SLEAP (Pereira et al., 2022) enable precise pose estimation across diverse settings. On the other hand, long-term observation in standard environments centers on daily behavioral sequences under unprovoked, natural conditions. The chaining of basic behaviors into a day-to-day sequence is influenced by genetics, environment, and disease. Identifying and understanding these behavioral elements and their timing offers insights into both healthy and disturbed patterns. In neurodegenerative disease research, such monitoring is critical for early symptom detection, often impossible by other methods, and for assessing treatment efficacy (Adamah-Biassi et al., 2013; Bains et al., 2018; Colomb and Winter, 2021). Moreover, automatic classification aligns with the 3R principles (replace, reduce, refine), potentially minimizing the number of animals used while improving data quality.

Quantifying daily routines in standard home cages presents unique obstacles. Mice frequently rearrange the bedding, altering the environment and complicating automatic detection. A commercial software suite (Adamah-Biassi et al., 2013) offers home-cage behavior analysis but is proprietary and expensive. Over a decade ago, an open-source solution from Tomaso Poggio’s lab at M.I.T. (Jhuang et al., 2010) was developed but is no longer maintained. The database from Jhuang et al. (2010) is still being used as a reference by various projects (Nwokedi et al., 2023; Jiang et al., 2017, 2019; Nguyen et al., 2019). Work from van Dam et al. (2020) relied on a multifiber network to classify overhead recordings of singly housed rats, while an approach integrating optical flow with a deep CNN showed promise for rodent and fly data (Bohnslav et al., 2021). For a more comprehensive review of recent approaches, see Segalin et al. (2021).

We developed a hardware and software solution for extended recording and recognition of basic behaviors in singly housed mice in standard home cages. Our acquisition system, EthoProfiler, can record up to 10 cages at once. We used it to record 10 mice of two strains (with different fur colors) for over 2 d each. Mice were individually housed with ad libitum access to food and water and a standard amount of bedding. We manually annotated single frames of 36 h of video from these data and then used those annotations to train and validate the DeepEthoProfile classification model. Our software excels in speed, accuracy, ease of use, and open-source availability, demonstrating strong performance on both our dataset and the previously published one (Jhuang et al., 2010).

Materials and Methods

We have developed an experimental system comprising a mobile, compact hardware platform, acquisition software, and processing software. DeepEthoProfile performs per-frame video annotations via a PyTorch-based deep convolutional neural network (CNN). The implementation processes short sequences of image frames as multichannel images. Aside from the input layer, this approach closely resembles standard image classification neural networks (Krizhevsky et al., 2017).

DeepEthoProfile runs in a Docker container environment, eliminating the need for special software installations. Processing is fast and parallelizable. We ran six instances in parallel on a computer with an Intel i7-6700 processor and an NVIDIA GTX 1080 graphics card (8 GB RAM), achieving over 1,900 classified frames per second. Consequently, a 24 h video recorded at 25 fps can be annotated in under 20 min on 2016-era hardware.
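The annotation time quoted above follows directly from the frame rate and the throughput. The short check below is only a back-of-the-envelope sketch; the function name `annotation_minutes` is ours and is not part of the DeepEthoProfile code.

```python
# Back-of-the-envelope check of the annotation time quoted in the text.
FPS_RECORDING = 25      # recording frame rate (from the text)
THROUGHPUT_FPS = 1900   # classified frames per second (from the text)


def annotation_minutes(video_hours, fps=FPS_RECORDING, throughput=THROUGHPUT_FPS):
    """Minutes needed to annotate a video of the given length."""
    frames = video_hours * 3600 * fps
    return frames / throughput / 60


frames_per_day = 24 * 3600 * FPS_RECORDING
minutes = annotation_minutes(24)
print(frames_per_day, round(minutes, 1))
```

A 24 h video holds 2,160,000 frames, which at 1,900 frames per second takes just under 19 min.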

Acquisition system

For this study, we created a mobile and compact system to record long-term behavioral video of 10 singly housed mice. These 10 cages and their respective cameras are installed in a bottomless mobile rack (MetroMax i 5-Shelf), arranged in five rows of two cages each (Fig. 1A). We used conventional wire-lid home cages (Tecniplast Model 1145T, 369 × 165 × 132 mm, floor area 435 cm²). Their distinct long-and-narrow shape encourages left–right movement in front of the side-view camera (Fig. 1B). To differentiate feeding from drinking, we added a custom barrier in the food tray that confines pellet access to the left side, so pellet feeding is always on the left while drinking remains on the right.

Figure 1.

The EthoProfiler 10-cage data acquisition system. A, Frontal view of the setup, showing video cameras at the front, two cages per shelf level, and a backlighting panel at the rear, all mounted on a MetroMax shelf. B, Close-up of one shelf, with two adjacent Tecniplast 1145T cages housing mice of different strains (C57BL/6 on the left, SWISS on the right). Each cage contains the following: (a) a wire lid with ad libitum access to food, (b) a water bottle, and (c) paper bedding. Each camera (d) faces one cage, while the backlighting panel (e) provides illumination. Infrared background lighting in the processed videos was more uniform than in the daylight images shown here (Fig. 2). C–E, 3D schematics of one shelf: (C) overview, (D) front view (similar to panel B), and (E) side view.

Whereas some video methods advise minimal or no bedding for easier mouse detection, a practice that can cause stress and alter natural behavior, we used 200 ml of ALPHA-dri paper bedding (Tecnilab-BMI) and found our software to be robust despite bedding rearrangements. For illumination, we used backlighting panels across the camera’s field of view to reduce reflections. White plastic walls on each side of every cage prevent visual contact between neighboring cages and enhance uniform lighting. Though backlighting limits visible surface detail, it keeps the system compact. Direct front lighting would provide more detail but introduce reflections and complicate the design.

Given that mice are crepuscular or nocturnal and require a day/night cycle, our illumination panel contained two independent light sources of different wavelengths. Infrared light was always on for video acquisition, while additional white light at ∼60 lux was switched on during the light phase (but remained invisible on video due to an IR filter). We used a Watec extreme low-light camera (WAT-902H2 Ultimate CCIR) at a small aperture for increased depth of field and with a short exposure time to freeze motion, along with an infrared filter (Heliopan 5850) on the lens (Computar H0514MP2) that blocked visible light below 850 nm. The analog signals (25 fps at 704 × 576 pixels) were digitized and simultaneously encoded using a dedicated Picolo U16 H.264 board (Euresys).

The recording output consists of MKV files, each up to 6 h in length, containing an H.264-encoded video stream with a per-frame timestamp. The acquisition software, written in C# and using DirectShow, is provided in our GitHub repository under “Capture.”

Behavior database

We studied 10 animals: five black C57BL/6NHsd (C57BL/6) females (Envigo) and five white RjOrl:SWISS (SWISS) females (Janvier), all 13 months old at recording. Data were collected for over 2 d. For manual annotation, each mouse had three or four 6 h recordings chosen to be evenly spaced across the 48 h period. In each selected video, 5 min intervals at the top of every half hour were annotated, yielding 60 annotated minutes per video and 180–240 min per mouse. We used the eight basic behavior categories described in Jhuang et al. (2010): eat, drink, groom, micromovement, rear, hang, walk, and rest (Fig. 2). Two trained biologists annotated half of the dataset each, labeling every frame with one behavior.

Figure 2.

The eight basic behavior categories as defined in Jhuang et al. (2010). Representative frames from the annotated database are shown. Monochromatic images were captured using an infrared (IR) background light and a camera filter blocking wavelengths below 850 nm.

The dataset comprises 3,240,000 annotated frames, with their distribution shown in Figure 4. Because the mouse body has few distinct visual features under these illumination conditions, many frames appear highly similar. This limited distinctiveness stems from the mouse's anatomy, coat color, and the specific lighting setup. In addition, the mouse’s movement speed can vary considerably, and its overall size and shape can change drastically. During long-term recordings, behavior bouts may last from a few hundred milliseconds to several minutes, and mice frequently redistribute bedding on the cage floor. Despite these challenges, the large size and broad coverage of our database helped mitigate potential limitations. We anticipate that it will serve as a valuable reference for future methodological innovations.
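The sampling scheme and the frame count can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes each annotated clip starts at minute 0 or 30 of the video's own timeline; the text does not say whether "top of every half hour" refers to video time or wall-clock time, and all names below are illustrative.

```python
# Cross-check of the annotation sampling scheme (illustrative names only).
FPS = 25                 # recording frame rate (from the text)
VIDEO_MINUTES = 6 * 60   # one selected recording lasts 6 h
CLIP_MINUTES = 5         # length of each annotated interval
INTERVAL_MINUTES = 30    # one interval every half hour


def clip_start_minutes(video_minutes=VIDEO_MINUTES, interval=INTERVAL_MINUTES):
    """Start minute of every annotated clip (assumed to be video time)."""
    return list(range(0, video_minutes, interval))


starts = clip_start_minutes()
annotated_minutes_per_video = len(starts) * CLIP_MINUTES
total_frames = 36 * 3600 * FPS   # 36 h of annotated video in total
print(len(starts), annotated_minutes_per_video, total_frames)
```

Twelve clips of 5 min give the stated 60 annotated minutes per video, and 36 h at 25 fps give the stated 3,240,000 frames.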

To evaluate the consistency of the initial annotations, a third biologist reviewed two randomly selected sets of previously annotated clips. Overall, 35% of the original clips were reviewed; below we refer to them as reviewed set 1 (20% of the original data) and reviewed set 2 (15% of the original data). Labels were changed where necessary, and the resulting confusion matrix (Fig. 5C) showed an overall agreement of labeled frames of 86.4% between the initial and the revised annotations and a macroaccuracy of 87.8%.
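The two agreement figures measure different things. The sketch below assumes that "macroaccuracy" means the unweighted mean of the per-class agreement rates (per-row recall of the confusion matrix), which is the usual reading but is not defined explicitly in the text. The small matrix is a made-up example, not data from this study.

```python
# Frame-level agreement vs. macroaccuracy from a confusion matrix.
# Rows: original labels, columns: reviewed labels. Toy numbers only.

def overall_accuracy(cm):
    """Fraction of all frames on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_accuracy(cm):
    """Unweighted mean of per-class agreement (assumed definition)."""
    rates = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(rates) / len(rates)


toy = [
    [90, 10, 0],
    [5, 40, 5],
    [0, 2, 8],
]
print(overall_accuracy(toy), macro_accuracy(toy))
```

Frame-level agreement is dominated by frequent behaviors, whereas the macro value weights every behavior equally, so a rare behavior such as drinking counts as much as resting.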

Additionally, in reviewed set 2 we introduced a new behavior category, “None,” for frames in which no behavior could be determined. These were primarily frames where the mouse was facing away from the camera but was not resting. This new category was applied to nearly 4% of frames in reviewed set 2.

Reviewed sets 1 and 2, which together covered about one-third (35%) of the initial data, were added to our database for training and testing.

Annotations were done using a Python script, “AnnotationViewer,” included in our GitHub repository. It allows frame-by-frame playback, navigation, and annotation, saving results as CSV files.

Processing software

DeepEthoProfile is implemented in Python using PyTorch. Running the software requires Python 3, Qt5, Docker, and CUDA. Processing occurs inside a Docker container, removing the need for specialized library installations. The application is compatible with modern Linux systems featuring NVIDIA GPUs that support CUDA 10.1 or higher.

The graphical user interface (GUI) is designed to be simple. Users can load videos one by one or add multiple files from a folder into a queue. A background process starts a separate Docker container for each file. If GPU memory allows, multiple containers run in parallel. The GUI shows a progress bar and completion percentage. All steps are automated and require no extra user input. The output is a CSV file with frame-by-frame annotations (including timestamps) that is saved next to the processed video file.

Overview

DeepEthoProfile uses a deep CNN to classify each video frame into a specific behavior. The design is inspired by standard image classification methods (Krizhevsky et al., 2017) and is similar to other 2D convolution approaches for video classification (Karpathy et al., 2014; Tran et al., 2018). The input is a stack of frames encoding temporal information in separate channels. A unique feature is the first convolution layer, which includes multiple asymmetrical filters. These filters act on the input, and their outputs are stacked for further processing.

Input data format

Because recording illumination was monochromatic, we obtained single-channel visual images. The acquisition software converts these data to color before encoding and storing. During preprocessing, each frame is converted back to grayscale, resized to 256 × 256, and contrast enhanced. The pixel intensities are normalized to the [0,1] range. Eleven consecutive frames are then stacked into a single 11-channel image (stacked image) that serves as the input for the CNN. This approach captures motion cues, offsetting the lack of distinct spatial features. The method from Jhuang et al. (2010) used nine frames, equivalent to a 300 ms sequence at a recording rate of 30 frames per second (fps). We found that 11 frames gave a 4–5% accuracy gain, while 13 frames or more offered no further benefit or even reduced macroaccuracy. Thus, 11 frames (∼440 ms at 25 fps) was optimal, consistent with Wiltschko et al. (2015). Different fps values may require adjusting the stack size.
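The relation between stack size, frame rate, and temporal window is simple arithmetic. The helper `odd_stack_size` below is a hypothetical convenience of ours, not part of DeepEthoProfile: it picks an odd number of frames (so that a central frame exists) that approximates a target window at another frame rate. Whether a window of about 440 ms remains optimal at other frame rates would have to be verified empirically.

```python
# Temporal window covered by a frame stack, and a hypothetical helper for
# choosing a stack size at a different frame rate.

def window_ms(n_frames, fps):
    """Duration in milliseconds spanned by n_frames at the given rate."""
    return 1000.0 * n_frames / fps


def odd_stack_size(target_ms, fps):
    """Odd frame count whose window approximates target_ms (assumption)."""
    n = round(target_ms * fps / 1000.0)
    if n % 2 == 0:
        n += 1  # keep an odd count so the stack has a central frame
    return max(n, 1)


print(window_ms(11, 25))         # this study: 11 frames at 25 fps
print(window_ms(9, 30))          # Jhuang et al. (2010): 9 frames at 30 fps
print(odd_stack_size(440, 30))   # suggested stack size at 30 fps
```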

First layer

We applied batch normalization (BN) to the input before forwarding it to the four distinct rectangular kernels in the first layer of the CNN. These filters of varying shape capture both small and large movements and take into account relative positions within the environment, removing the need for explicit cage or feeder location data. Each convolution is followed by a rectified linear unit (ReLU), and the outputs are stacked in the same coordinate space.

CNN description

Subsequent layers follow a standard pattern (Krizhevsky et al., 2017), with three additional convolution layers, each followed by a ReLU, and two fully connected layers. ReLU and BN boost training speed and help prevent overfitting.

The architecture of the CNN is summarized as follows: BN-(C(32, (11,1), 2), C(32, (1,11), 2), C(32, (7,3), 2), C(32, (3,7), 2))-P-BN-C(256, 5, 1)-P-BN-C(384, 3, 1)-P-C(512, 3, 1)-C(256, 3, 1)-P-FC(2048)-FC(2048) (Fig. 3). C(d, f, s) denotes a convolutional layer with d filters of size f × f (or of the rectangular size given when f is a pair), applied to the input with stride s. P denotes a max-pooling layer with a window size of 3 and a stride of 2. BN represents a batch normalization layer. FC(n) represents a fully connected layer with n nodes. Each convolutional and FC layer was followed by a ReLU. A one-dimensional dropout of 0.5 was applied before each fully connected layer, reducing coadaptation of neurons (Hinton et al., 2012). There is a final FC layer with the number of nodes corresponding to the number of classified behaviors, in our case eight.
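To make the notation concrete, the following sketch traces feature-map sizes through the layers listed above for a 256 × 256 input with 11 channels. The padding is not stated in the text; we assume padding of half the kernel size (rounded down) for every convolution and none for pooling, which is one choice that lets the four parallel first-layer branches produce equally sized outputs. The resulting sizes are therefore illustrative and may differ from the published model.

```python
# Shape trace through the layer sequence, under assumed padding.

def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=3, stride=2):
    """Output length of max pooling (window 3, stride 2, no padding)."""
    return (size - kernel) // stride + 1


def trace_shapes(side=256, in_channels=11, n_classes=8):
    shapes = [("input", in_channels, side, side)]
    # First layer: four parallel rectangular kernels, 32 filters each, stride 2.
    first_kernels = [(11, 1), (1, 11), (7, 3), (3, 7)]
    sizes = {
        (conv_out(side, kh, 2, kh // 2), conv_out(side, kw, 2, kw // 2))
        for kh, kw in first_kernels
    }
    if len(sizes) != 1:
        raise ValueError("parallel branches must share one output size")
    h, w = sizes.pop()
    c = 32 * len(first_kernels)  # branch outputs are stacked along channels
    shapes.append(("first layer, 4 branches stacked", c, h, w))
    # Remaining layers; BN layers are omitted because they keep the shape.
    plan = [("P", None), ("C", (256, 5)), ("P", None), ("C", (384, 3)),
            ("P", None), ("C", (512, 3)), ("C", (256, 3)), ("P", None)]
    for kind, spec in plan:
        if kind == "P":
            h, w = pool_out(h), pool_out(w)
            shapes.append(("max-pool", c, h, w))
        else:
            c, k = spec
            h, w = conv_out(h, k, 1, k // 2), conv_out(w, k, 1, k // 2)
            shapes.append((f"conv {k}x{k}", c, h, w))
    fc_sizes = [c * h * w, 2048, 2048, n_classes]
    return shapes, fc_sizes


shapes, fc_sizes = trace_shapes()
for name, c, h, w in shapes:
    print(f"{name:34s} {c:4d} x {h:3d} x {w:3d}")
print("fully connected sizes:", fc_sizes)
```

Under these assumptions, the stacked first layer yields 128 channels, and the last pooling layer feeds a flattened vector into the first FC(2048) layer.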

Figure 3.

The DeepEthoProfile CNN model architecture. Convolutional layers are shown in red, normalization layers in green, pooling layers in blue, and fully connected layers in yellow. The gray boxes represent the 11 input frames.

Training

We trained the network on the 5 min clips, each frame labeled with one of the eight behaviors. For an 11-frame stack, the target label was the majority annotation within those frames. If a tie arose, the central frame’s label was used. This eliminated brief and potentially invalid behaviors, minimizing the impact of noise in manual annotations while preserving the accuracy of the trained model.
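The labeling rule for a stack can be sketched in a few lines. This is our reading of the rule as stated (majority vote, with the central frame deciding ties), not code taken from the repository; the function name is illustrative.

```python
from collections import Counter


def stack_label(frame_labels):
    """Target label for one stack: majority vote, central frame on a tie."""
    if len(frame_labels) % 2 == 0:
        raise ValueError("stack must have an odd number of frames")
    counts = Counter(frame_labels)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    if len(winners) == 1:
        return winners[0]
    # Tie: fall back to the label of the central frame, as the text states.
    return frame_labels[len(frame_labels) // 2]


clear = ["walk"] * 6 + ["rest"] * 5
tied = ["walk"] * 5 + ["rest"] * 5 + ["eat"]
print(stack_label(clear), stack_label(tied))
```

In the tied example the sixth frame (index 5) is the central one, so the stack receives its label.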

Because the dataset was imbalanced (Fig. 4), we implemented a dynamic balancing approach with random sampling that favored underrepresented behaviors. Every minibatch contained an example from each behavior and exactly one from the least prevalent behavior, in this case “drink.” The sampling probability was calculated to ensure that there were at least twice as many occurrences of every other behavior in each training epoch. We used a minibatch size of 16 and attempted to pick each of the stacked images from a different training clip. Because some behavior bouts are long and repetitive, this reduced the chance of placing near-identical stacked images in the same minibatch, which could otherwise promote overfitting. Training used cross-entropy loss with stochastic gradient descent (SGD), an initial learning rate of 0.03, and an exponential decay factor of 0.99. We also applied standard augmentations (random shifts and horizontal flips) to reduce overfitting.
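One way to picture the minibatch composition is the sketch below. It reproduces only the constraints stated in the text (batch size 16, every behavior present, exactly one "drink" example). Filling the remaining slots uniformly from the other seven behaviors is our assumption; the actual sampling probabilities, and the preference for drawing from different clips, are not modeled here.

```python
import random

BEHAVIORS = ["eat", "drink", "groom", "micromovement",
             "rear", "hang", "walk", "rest"]
RAREST = "drink"   # least prevalent behavior (from the text)
BATCH_SIZE = 16    # minibatch size (from the text)


def sample_batch_labels(rng):
    """Behavior labels for one minibatch under the stated constraints."""
    labels = list(BEHAVIORS)  # one guaranteed example per behavior
    others = [b for b in BEHAVIORS if b != RAREST]
    # Assumption: remaining slots are drawn uniformly from non-rarest behaviors.
    labels += rng.choices(others, k=BATCH_SIZE - len(labels))
    rng.shuffle(labels)
    return labels


rng = random.Random(0)
batches = [sample_batch_labels(rng) for _ in range(100)]
print(batches[0])
```

With eight free slots spread over seven behaviors, each non-rarest behavior appears on average 1 + 8/7 ≈ 2.1 times per minibatch, which is in line with the stated factor of at least two relative to "drink."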

Figure 4.

Distribution of annotated frames for each behavior category. The dataset comprises 3,240,000 frames, equivalent to 36 h of video material.

When available, the data of the reviewed sets was used instead of the initial annotations. For the data from the reviewed set 2 with the “None” behavior, frames of that new category were excluded from the training process. The database does not contain enough samples of “None” to classify it. Thus, reviewed set 2 was not used for model validation.

Processing

At inference, one classification per 11-frame stack was applied to all 11 frames. This enforced a minimum 440 ms behavior duration, reducing noise and improving speed. Though transitions may shift by up to five frames (∼200 ms), this is negligible in day-scale analyses.
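A sketch of this inference scheme is shown below, assuming consecutive non-overlapping stacks. How the software labels trailing frames that do not fill a complete stack is not stated; here they simply inherit the last prediction, which is our assumption, and the function name is illustrative.

```python
def expand_stack_predictions(stack_labels, n_frames, stack_size=11):
    """Spread one prediction per stack over all frames of that stack."""
    frame_labels = []
    for label in stack_labels:
        frame_labels.extend([label] * stack_size)
    # Assumption: leftover frames at the end reuse the last prediction.
    if stack_labels and len(frame_labels) < n_frames:
        frame_labels.extend([stack_labels[-1]] * (n_frames - len(frame_labels)))
    return frame_labels[:n_frames]


per_frame = expand_stack_predictions(["walk", "rest"], n_frames=25)
stacks_per_day = (24 * 3600 * 25) // 11   # forward passes for a 24 h video
print(per_frame[:12], stacks_per_day)
```

Classifying one stack per 11 frames means a 24 h recording needs fewer than 200,000 forward passes instead of more than two million.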

Code accessibility

The source code for DeepEthoProfile is available in the GitHub repository at https://github.com/WinterLab-Berlin/DeepEthoProfile under the GPL-3.0 license. The software runs on a PC with Ubuntu 24.04, provided that an NVIDIA graphics card and the corresponding CUDA drivers are installed. Additional information and software requirements are found on the GitHub page. The code is available as Extended Data. The model used by the processing software is downloaded by the start script from https://doi.org/10.5281/zenodo.14827053.

The acquisition software runs on Windows 10 and can be found at https://github.com/WinterLab-Berlin/DeepEthoProfile/tree/main/Capture.

The database with all the annotations is published under the MIT license and can be found at https://doi.org/10.5281/zenodo.14782614.


Results

Validation

We performed leave-one-out cross-validation, training with the data from nine mice, and testing on the data from the one that was left out. Repeating this for all mice, we trained for 40 epochs in each case. Summing and normalizing results gave an overall frame-level accuracy of 83.6% and a macroaccuracy of 82.9% (Fig. 5A).
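The validation protocol can be outlined as follows. The mouse identifiers and the tiny matrices are placeholders for illustration, and the training and evaluation steps themselves are omitted.

```python
def leave_one_out(ids):
    """Yield (training ids, held-out id) for every animal."""
    for held_out in ids:
        train = [m for m in ids if m != held_out]
        yield train, held_out


def sum_matrices(matrices):
    """Element-wise sum of equally sized square confusion matrices."""
    n = len(matrices[0])
    total = [[0] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += m[i][j]
    return total


mice = [f"mouse_{k:02d}" for k in range(1, 11)]   # hypothetical ids
folds = list(leave_one_out(mice))
summed = sum_matrices([[[1, 0], [0, 1]], [[2, 1], [0, 3]]])
print(len(folds), summed)
```

Each of the 10 folds trains on nine mice and tests on the remaining one; the per-fold confusion matrices are then summed before the accuracies are computed.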

Figure 5.

Confusion matrices for animal behavior detection rates using data acquired with EthoProfiler. Rows represent the manually annotated behaviors. A, The sum of confusion matrices from 10 leave-one-out tests. In each test, the model was trained on data from nine mice and tested on the remaining mouse. B, Results of training on 90% of the data and testing on the remaining 10%, with all reviewed annotations included in the training set. C, Results from the partial review of the initial human annotation data, where columns represent the reviewed behavior annotations.

The difficulty in accurately classifying frames as micromovement largely stems from its definition as “small movements of the animal's head or limbs” (Jhuang et al., 2010), which is open to misinterpretation. This category exhibited low accuracy in both our human review of annotations (Fig. 5C) and the review performed in Jhuang et al. (2010) (Fig. 7C). Moreover, when we examined frames labeled as “None” during our database review, over 91% had originally been annotated as either groom or micromovement. More than 8% of the previously labeled micromovement frames and 9% of the grooming frames aligned with the “None” classification. These findings highlight a limitation of our side-view recordings, which was not fully appreciated during the initial annotation phase. With limited visual information, the annotator had to infer the animal’s behavior in ambiguous frames, often splitting the same cues between two categories and preventing the model from learning effectively.

Such ambiguous frames significantly contributed to the difficulty in distinguishing micromovement from grooming.

Misclassification between micromovement and walk arose especially when the mouse moved very slowly, for very short periods of time, or changed body length. Similar ambiguities appeared in the revised confusion matrix (Fig. 5C).

When comparing strains, the classification on C57BL/6 mice had lower accuracy than on SWISS mice, an outcome mirrored in the annotation review (Table 1).

Table 1.

Comparison of the accuracy of the model across mouse strains with leave-one-out test results

Despite these issues, the cross-validation performance remained close to human annotation quality.

DeepEthoProfile model

We trained the final model on 90% of the clips and tested on the remaining 10%. All revised annotations were included in the training set for best results. The frames corresponding to the “None” category were ignored to avoid inconsistencies. The confusion matrix had a frame-level accuracy of 90.6% (Fig. 5B). This model is used by the DeepEthoProfile software. We used it to process the 2 d of video data from which our annotated database was previously selected. Results (Fig. 6) illustrate behavioral differences between strains and their adaptation to the day/night cycle.

Figure 6.

Mean behavior durations monitored over 48 h. A, Data from SWISS mice (n = 5). B, Data from C57BL/6 mice (n = 5). The data were acquired using the EthoProfiler setup and processed with DeepEthoProfile. The shaded bar at the top of each figure indicates the dark phase. Data from the first hour after introducing the mice to their new cages were excluded.

Comparison

To assess efficiency, we tested the model on the database published in Jhuang et al. (2010), a well-established benchmark used by several state-of-the-art methods (Nwokedi et al., 2023; Jiang et al., 2017, 2019; Nguyen et al., 2019). This dataset has two annotation sets: a “full database” of 12 videos (every frame labeled) and a “clipped database” of short, unambiguous segments.

We followed the same protocols used in prior studies. On the full database, we applied leave-one-out (training on 11 videos, testing on the remaining one) for 50 epochs each, obtaining 74.2% accuracy over frames and 78.7% macroaccuracy (Fig. 7A). A separate human annotation review in Jhuang et al. (2010) yielded 73% (frames) and 79% (macro) agreement, indicating that our results were close to human-level performance. On the clipped database, we split data 50/50 for training and testing, repeating this process five times (30 epochs each). Combined, we reached 97.6% overall accuracy and 96.8% macroaccuracy (Fig. 7B). Table 2 compares our results to other published methods.

Figure 7.

Confusion matrices for animal behavior detection rates using the dataset from Jhuang et al. (2010). Rows represent the manually annotated behaviors. A, The sum of confusion matrices from 12 leave-one-out tests on the “full database.” In each test, the model was trained on data from 11 videos and tested on the remaining video. B, The sum of confusion matrices from five tests on the “clipped database.” In each test, the model was trained on 50% of the videos (randomly selected) and tested on the remaining 50%. C, Results from the partial review of the initial human annotation, as originally published in Jhuang et al. (2010).

Table 2.

Representation of previously published classification models tested on the data from Jhuang et al. (2010) alongside results obtained with DeepEthoProfile

DeepEthoProfile performs on par with or better than most current techniques, especially on drink, though it struggles with micromovement in the full database, where annotations are often inconsistent. Notably, the agreement on micromovement behavior between human reviewers was just 64% in Jhuang et al. (2010).

Discussion

We present a new open-source method to classify the behavior of singly housed mice in standard home cages over extended periods. Our system is fast, robust to bedding changes, and well suited for long-term data collection. We also provide the dataset used to train and validate the model. To our knowledge, this dataset is the first freely available long-term video recording with frame-by-frame mouse behavior annotations. Our classification method was validated on the data from Jhuang et al. (2010), and it performed comparably with other leading systems. However, those videos were recorded in conditions that differ from standard husbandry: the cages contain minimal bedding, illumination is uniform from the top, and the videos are relatively short. Often the animals are in the exploratory mode typical of a newly introduced cage. We have explicitly ignored the first hours of our more than 2 d recording to avoid that behavior.

Automated observation of animals in their usual habitat supports the 3R principles by reducing the negative impact on animals, refining animal welfare, and enhancing research efficiency. This noninvasive approach captures normal behavior spectra without causing stress or disturbance, potentially lowering the number of animals needed for invasive procedures. DeepEthoProfile can be adapted for use on existing video databases to extract additional data with a high degree of consistency.

In typical biology laboratory settings, users found our system intuitive and easy to use, with classification performance comparable to human annotators. We look forward to further developing DeepEthoProfile to meet additional requirements and to adapting it to diverse experimental environments.

Footnotes

  • This work was supported by the BMBF program “Alternative methods for animal experiments” FKZ 031A418 and FKZ 161L0228, as well as by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through SFB 1315, project-ID 327654276, and EXC 257 NeuroCure, project-ID 441 39052203. We thank Prisila Charles, Rebecca Saager, and Cedric Rauch for their dedicated efforts in working with the animals and carrying out the manual annotations; Rupert Overall and Janine Musolf for their thorough evaluation of a pre-release version of our software, which involved data from multiple sets of 20 mice across two EthoProfiler racks operated concurrently; Peter Spende, Francesco Bagorda, and Katja Frei for their significant contributions to the development of the acquisition hardware; and Clemens Winter and Rupert Overall for proofreading and providing feedback on the initial version of the manuscript. The article processing charge was funded by the Open Access Publication Fund of Humboldt-Universität zu Berlin.

  • PhenoSys GmbH, which manufactures and distributes the 10-cage EthoProfiler data acquisition system, counts Y.W. among its equity owners.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

  1.
    1. Adamah-Biassi EB,
    2. Stepien I,
    3. Hudson RL,
    4. Dubocovich ML
    (2013) Automated video analysis system reveals distinct diurnal behaviors in C57BL/6 and C3H/HeN mice. Behav Brain Res 243:306–312. https://doi.org/10.1016/j.bbr.2013.01.003 pmid:23337734
  2.
    1. Bains RS,
    2. Wells S,
    3. Sillito RR,
    4. Armstrong JD,
    5. Cater HL,
    6. Banks G,
    7. Nolan PM
    (2018) Assessing mouse behaviour throughout the light/dark cycle using automated in-cage analysis tools. J Neurosci Methods 300:37–47. https://doi.org/10.1016/j.jneumeth.2017.04.014 pmid:28456660
  3.
    1. Bohnslav JP, et al.
    (2021) DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels. Elife 10:e63377. https://doi.org/10.7554/eLife.63377 pmid:34473051
  4.
    1. Colomb J,
    2. Winter Y
    (2021) Creating detailed metadata for an R shiny analysis of rodent behavior sequence data detected along one light-dark cycle. Front Neurosci 15:742652. https://doi.org/10.3389/fnins.2021.742652 pmid:34899155
  5.
    1. Contet C,
    2. Rawlins JN,
    3. Deacon RM
    (2001) A comparison of 129S2/SvHsd and C57BL/6JOlaHsd mice on a test battery assessing sensorimotor, affective and cognitive behaviours: implications for the study of genetically modified mice. Behav Brain Res 124:33–46. https://doi.org/10.1016/s0166-4328(01)00231-5
  6.
    1. Ebbesen CL,
    2. Froemke RC
    (2022) Automatic mapping of multiplexed social receptive fields by deep learning and GPU-accelerated 3D videography. Nat Commun 13:593. https://doi.org/10.1038/s41467-022-28153-7 pmid:35105858
  7.
    1. Hinton GE,
    2. Srivastava N,
    3. Krizhevsky A,
    4. Sutskever I,
    5. Salakhutdinov RR
    (2012) Improving neural networks by preventing co-adaptation of feature detectors (arXiv:1207.0580).
  8.
    1. Jhuang H,
    2. Garrote E,
    3. Yu X,
    4. Khilnani V,
    5. Poggio T,
    6. Steele AD,
    7. Serre T
    (2010) Automated home-cage behavioural phenotyping of mice. Nat Commun 1:68. https://doi.org/10.1038/ncomms1064
  9.
    1. Jiang Z,
    2. Crookes D,
    3. Green BD,
    4. Zhang S,
    5. Zhou H
    (2017) Behavior recognition in mouse videos using contextual features encoded by spatial-temporal stacked Fisher vectors. 259–269.
  10.
    1. Jiang Z,
    2. Crookes D,
    3. Green BD,
    4. Zhao Y,
    5. Ma H,
    6. Li L,
    7. Zhang S,
    8. Tao D,
    9. Zhou H
    (2019) Context-aware mouse behaviour recognition using hidden Markov models. IEEE Trans Image Process 28:1133–1148. https://doi.org/10.1109/TIP.2018.2875335
  11.
    1. Karpathy A,
    2. Toderici G,
    3. Shetty S,
    4. Leung T,
    5. Sukthankar R,
    6. Fei-Fei L
    (2014) Large-scale video classification with convolutional neural networks. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, pp. 1725–1732.
  12.
    1. Krizhevsky A,
    2. Sutskever I,
    3. Hinton GE
    (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90. https://doi.org/10.1145/3065386
  13.
    1. Le VA,
    2. Murari K
    (2019) Recurrent 3D convolutional network for rodent behavior recognition. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1174–1178.
  14.
    1. Lorbach M,
    2. Poppe R,
    3. Veltkamp R
    (2019) Interactive rodent behavior annotation in video using active learning. Multimed Tools Appl 78:1–20. https://doi.org/10.1007/s11042-019-7169-4
  15.
    1. Mathis A,
    2. Mamidanna P,
    3. Cury KM,
    4. Abe T,
    5. Murthy VN,
    6. Mathis MW,
    7. Bethge M
    (2018) DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci 21:1281–1289. https://doi.org/10.1038/s41593-018-0209-y
  16.
    1. Nguyen N,
    2. Phan D,
    3. Lumbanraja F,
    4. Faisal M,
    5. Abapihi B,
    6. Purnama B,
    7. Delimayanti M,
    8. Mahmudah K,
    9. Kubo M,
    10. Satou K
    (2019) Applying deep learning models to mouse behavior recognition. J Biomed Sci Eng 12:183–196. https://doi.org/10.4236/jbise.2019.122012
  17.
    1. Nwokedi EI,
    2. Bains RS,
    3. Bidaut L,
    4. Ye X,
    5. Wells S,
    6. Brown JM
    (2023) Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage. Sensors 23:9532. https://doi.org/10.3390/s23239532 pmid:38067907
  18.
    1. Pereira TD, et al.
    (2022) SLEAP: a deep learning system for multi-animal pose tracking. Nat Methods 19:486–495. https://doi.org/10.1038/s41592-022-01426-1 pmid:35379947
  19.
    1. Segalin C,
    2. Williams J,
    3. Karigo T,
    4. Hui M,
    5. Zelikowsky M,
    6. Sun JJ,
    7. Perona P,
    8. Anderson DJ,
    9. Kennedy A
    (2021) The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice. Elife 10:e63720. https://doi.org/10.7554/eLife.63720 pmid:34846301
  20.
    1. Tanaka S,
    2. Young JW,
    3. Halberstadt AL,
    4. Masten VL,
    5. Geyer MA
    (2012) Four factors underlying mouse behavior in an open field. Behav Brain Res 233:55–61. https://doi.org/10.1016/j.bbr.2012.04.045 pmid:22569582
  21.
    1. Tran D,
    2. Wang H,
    3. Torresani L,
    4. Ray J,
    5. LeCun Y,
    6. Paluri M
    (2018) A closer look at spatiotemporal convolutions for action recognition. Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society, pp. 6450–6459.
  22.
    1. van Dam EA,
    2. Noldus LPJJ,
    3. van Gerven MAJ
    (2020) Deep learning improves automated rodent behavior recognition within a specific experimental setup. J Neurosci Methods 332:108536. https://doi.org/10.1016/j.jneumeth.2019.108536
    1. Wang L,
    2. Qiao Y,
    3. Tang X
    (2015) Action recognition with trajectory-pooled deep-convolutional descriptors.
    1. Wang L,
    2. Xiong Y,
    3. Wang Z,
    4. Qiao Y,
    5. Lin D,
    6. Tang X,
    7. Van Gool L
    (2017) Temporal Segment Networks for Action Recognition in Videos.
  23.
    1. Wiltschko AB,
    2. Johnson MJ,
    3. Iurilli G,
    4. Peterson RE,
    5. Katon JM,
    6. Pashkovski SL,
    7. Abraira VE,
    8. Adams RP,
    9. Datta SR
    (2015) Mapping sub-second structure in mouse behavior. Neuron 88:1121–1135. https://doi.org/10.1016/j.neuron.2015.11.031 pmid:26687221

Synthesis

Reviewing Editor: Nathalie Ginovart, University of Geneva

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: NONE.

The authors have constructively revised their manuscript, and this revised version is significantly improved over the initial submission. However, the first reviewer highlighted a few minor issues that need to be addressed before final acceptance.

Moreover, before final acceptance, authors must also comply with eNeuro's policy regarding studies using custom code. The code must be available upon acceptance and publication of the manuscript. As per eNeuro's Instructions to authors, code files should be deposited in a suitable repository such as GitHub, ModelDB, BioModels, CellML, or Visiome. The "Code Accessibility" section should therefore indicate how the code can be accessed and provide a repository link, including any accession numbers or restrictions, as well as the type of computer and operating system used to run the code.

Please find the reviewer's comments below:

Reviewer 1:

The manuscript is a marked improvement on the original, with a much better structure and more methodological details. A comparison with previous methods on the Jhuang et al. dataset is provided as well. I have some minor comments:

- Lines 181 - 188 need proper mathematical formatting, and the architecture should probably be provided as a table and/or diagram.

- Typo on Line 120 "aolder" -> "an older"?

Copyright © 2026 by the Society for Neuroscience.
eNeuro eISSN: 2373-2822
