Research Article: Methods/New Tools, Novel Tools and Methods

SaLSa: A Combinatory Approach of Semi-Automatic Labeling and Long Short-Term Memory to Classify Behavioral Syllables

Shuzo Sakata
eNeuro 21 November 2023, 10 (12) ENEURO.0201-23.2023; https://doi.org/10.1523/ENEURO.0201-23.2023
Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, United Kingdom

Abstract

Accurately and quantitatively describing mouse behavior is an important research area. Although advances in machine learning have made it possible to track mouse behavior accurately, reliable classification of behavioral sequences or syllables remains a challenge. In this study, we present a novel machine learning approach, called SaLSa (a combination of semi-automatic labeling and long short-term memory-based classification), to classify behavioral syllables of mice exploring an open field. This approach consists of two major steps. First, after multiple body parts are tracked, spatial and temporal features of their egocentric coordinates are extracted. A fully automated unsupervised process identifies candidates for behavioral syllables, which are then labeled manually using a graphical user interface (GUI). Second, a long short-term memory (LSTM) classifier is trained with the labeled data. The classification performance was over 97%, equivalent to that of a state-of-the-art model, and some syllables, such as grooming, were classified more reliably. We applied this approach to examine how hyperactivity in a mouse model of Alzheimer’s disease develops with age. When the proportion of each behavioral syllable was compared between genotypes and sexes, we found that the characteristic hyperlocomotion of female Alzheimer’s disease mice emerges between four and eight months of age. In contrast, an age-related reduction in rearing is common regardless of genotype and sex. Overall, SaLSa enables detailed characterization of mouse behavior.

  • aging
  • Alzheimer’s disease
  • classification
  • computational ethology
  • deep learning
  • long short-term memory

Significance Statement

Describing complex animal behavior is a challenge. Here, we developed an open-source, combinatory approach to behavioral syllable classification, called SaLSa (a combination of semi-automatic labeling and long short-term memory-based classification). To classify behavioral syllables, this approach combines multiple machine learning methods to label video frames semi-automatically and to train a deep learning model. To demonstrate SaLSa’s versatility, we monitored the exploratory behavior of an Alzheimer’s disease mouse model and delineated its complex behaviors. We found that female Alzheimer’s mice become hyperactive with age, in the sense that locomotion, but not other active behaviors, appears more frequently than in controls and even in male Alzheimer’s mice. SaLSa offers a toolkit to analyze complex behaviors.

Introduction

In modern neuroscience, a goal is to understand the relationship between behavior and neural ensembles. However, accurately and quantitatively describing complex behavior remains challenging. In the past, mouse behaviors have often been assessed using simple, subjective criteria in a series of behavioral tests. However, advances in machine learning have changed the field (Mathis et al., 2020; Pereira et al., 2020; Luxem et al., 2023).

To describe mouse behaviors, several steps are required. First, behaviors are video-monitored. The state-of-the-art is to use a depth camera (Wiltschko et al., 2020) or multiple cameras (Dunn et al., 2021; Schneider et al., 2022). While some experiments require video recording from the bottom to monitor limb movement (Pereira et al., 2019; Bohnslav et al., 2021; Luxem et al., 2022), most experiments still use videos taken from the top. Second, the movement of the mouse’s body parts is tracked frame-by-frame to estimate animal posture. Deep learning-based algorithms have been widely adopted for this purpose (Mathis et al., 2018; Pereira et al., 2019). The final step is to identify and classify distinct behavioral sequences or syllables. While a variety of approaches have been developed over the past decade (Kabra et al., 2013; Hsu and Yttri, 2021; Luxem et al., 2023), this final step is still challenging.

There are two broad categories of methods used to classify behavioral syllables. The first category is a top-down approach, which involves predefining a set of rules to identify behavioral syllables or applying supervised machine learning to classify them (Kabra et al., 2013; Segalin et al., 2021; Harris et al., 2023). The second category is a bottom-up approach, which involves analyzing data patterns using unsupervised classification algorithms (Dunn et al., 2021; Hsu and Yttri, 2021; Gabriel et al., 2022; Jia et al., 2022; Luxem et al., 2022; Weinreb et al., 2023). Each approach has its own advantages and disadvantages. For example, although the top-down approach provides interpretable outcomes by definition, it can be laborious to prepare labeled datasets for model training. Conversely, while the bottom-up approach is less biased, setting optimal parameters and assigning each syllable an interpretable behavioral label can be a nontrivial task.

Here, we hypothesize that the long short-term memory (LSTM) model (Hochreiter and Schmidhuber, 1997) can be adopted for this purpose for the following reasons. First, behavioral sequences may contain contextual, long-term dependencies (Musall et al., 2019; Issa et al., 2020). Second, the LSTM model is designed to handle time series data with long-term dependencies and has shown promising results in other fields (Vinyals et al., 2019; Van Houdt et al., 2020). Although this model has not previously been adopted to classify behavioral syllables, it may be a suitable method for addressing the challenges associated with these sequences. Here, we introduce SaLSa (a combination of semi-automatic labeling and LSTM-based classification; Fig. 1). This is a combinatorial approach to creating labeled data semi-automatically and training an LSTM classifier. To demonstrate the capability of this approach, we examine how behavioral abnormalities in an Alzheimer’s mouse model emerge during aging.

Figure 1.

SaLSa. After recording videos and processing them with DeepLabCut (“pose estimation”), spatial and temporal features are extracted from the egocentric coordinates of tracked body parts (feature extraction). Based on a set of videos, an LSTM classifier is trained (model training). This component consists of two parts. First, through fully automated unsupervised processes, including dimensionality reduction and clustering, behavioral syllable candidates are identified and then manually labeled using a graphical user interface (semi-automatic labeling). Second, the labeled data are used for the training and evaluation of an LSTM classifier. Once the classifier is trained, the entire sequence of extracted features is processed to classify behavioral syllables.

Materials and Methods

Animals

All animal experiments were performed in accordance with the United Kingdom Animals (Scientific Procedures) Act of 1986 Home Office regulations and were approved by the University of Strathclyde Animal Welfare and Ethical Review Body and the Home Office (PPL0688994). 5xFAD mice (JAX006554; Oakley et al., 2006) were bred with wild-type (WT) mice on the C57BL/6J background (>F10). All genotyping was performed by Transnetyx using real-time PCR. In total, 43 mice (13 5xFAD males; 11 5xFAD females; 9 WT males; 10 WT females; age range, 1.3–9.4 months old) were used. WT mice were littermates. The animals had ad libitum access to food and water and were housed with their littermates on a 12/12 h light/dark cycle. All behavioral experiments were performed during the first quarter of the light period (zeitgeber time 2–3).

Video monitoring of exploratory behaviors

The behavioral arena was an acrylic transparent box (40 × 40 × 40 cm, Displaypro). Paper sheets covered the outside of all side walls, and landmark images (e.g., large stripes and crosses) were placed on two walls. Four boxes were placed close together on white tables (Lack Side Table, IKEA). A webcam (C900, NULAXY) was positioned over the boxes. The pixel resolution was 0.74 pixels/mm at the bottom of the arena. Video was captured at 25 frames/s (fps) by a custom-written program (LabVIEW, National Instruments). For each behavioral session, an animal was placed in the center of the arena and allowed to explore it for 20 min. After the test, the arena was cleaned with 70% ethanol. Only one session was held per day.

Pose estimation

Every video file was cropped into four videos, each containing one box. This was done with custom-written MATLAB code. A DeepLabCut model (Mathis et al., 2018; Lauer et al., 2022) was trained: four videos were chosen from one behavioral session. Fifty frames from each video were manually labeled. The labeled body parts consisted of (1) nose, (2) head, (3) left ear, (4) right ear, (5) neck, (6) anterior back (back 1), (7) posterior back (back 2), (8) tail base, (9) mid-tail, and (10) tail tip (Fig. 2A). The number of training iterations was 410,000. All videos were processed with this trained model. Filtered data were used. Because the tail shape did not reflect behavioral syllables, the mid-tail and tail tip were excluded from further analysis in this study. Thus, eight body parts were analyzed.

Figure 2.

Semi-automatic labeling. A, A frame with tracked body parts. Although 10 body parts were tracked, the mid-tail and tail tip were excluded from quantitative analyses. B, The categories of extracted features. C, A chunk of normalized feature values. Each feature was Z-scored for visualization purposes. Dotted lines separate different feature categories. The feature category number corresponds to the number indicated in B. D, Cumulative distribution of explained variance across principal components (PCs). The threshold for including PCs was set at 85% variance explained. E, UMAP representation of the entire video sequences and example frame sequences of labeled behavioral syllables. Data reduced by principal component analysis (PCA) were further reduced to two dimensions by UMAP. A spectral clustering algorithm was applied. After removing fragmented (<0.25 s) sequences, each cluster was manually annotated. The example sequences indicate a down-sampled sequence of each labeled behavioral syllable. Note that although the mid-tail and tail tip are shown in the sample frames, they were excluded from all quantitative analyses, including the UMAP analysis, because the tail shape did not reflect behavioral syllables.

Feature extraction

All analyses were implemented in MATLAB (https://github.com/Sakata-Lab/SaLSa). All coordinates were converted into egocentric coordinates, with the neck set as the body center and the nose-neck axis as the body-center axis. After conversion, a comprehensive set of features was derived. The features can be categorized into spatial and temporal features.
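
For illustration, the conversion to egocentric coordinates for a single frame can be sketched as follows. This is a minimal MATLAB sketch, not the released SaLSa code; the variable names (noseXY, neckXY, bodyXY) are placeholders.

% Minimal sketch of the egocentric conversion for one frame:
% neck = origin, nose-neck axis = positive x-axis.
% noseXY, neckXY: 1 x 2 coordinates; bodyXY: nParts x 2 coordinates.
origin = neckXY;                                    % body center
ax     = noseXY - neckXY;                           % body-center axis
phi    = atan2(ax(2), ax(1));                       % axis orientation (rad)
R      = [cos(phi) -sin(phi); sin(phi) cos(phi)];   % rotation for row vectors
ego    = (bodyXY - origin) * R;                     % egocentric coordinates

After this transform, the nose lies on the positive x-axis, so features become invariant to the animal's position and heading in the arena.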

The spatial features were as follows: (1) the relative coordinates across body parts, (2) the relative angle of each body part relative to the body-center axis, and (3) the distance between body parts. The temporal features were as follows: (1) the frame-by-frame velocity of each body part, (2) the frame-by-frame velocity of the distance between body parts, (3) the spectrotemporal characteristics of the relative coordinates across body parts, (4) the spectrotemporal characteristics of the relative angle of each body part, (5) the spectrotemporal characteristics of the frame-by-frame velocity of each body part, (6) the spectrotemporal characteristics of the distance between body parts, and (7) the spectrotemporal characteristics of the frame-by-frame velocity of the distance between body parts. To compute spectrotemporal characteristics, we applied a wavelet transformation (cwt function in MATLAB). The extracted frequency components were evenly spaced in the wavelet frequency domain: 0.62, 0.88, 1.25, 1.77, 2.50, 3.54, 5.01, 7.09, and 10.0 Hz. Overall, 910 features were extracted.
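
The wavelet-based spectrotemporal features can be sketched as below. This is a minimal example assuming the Wavelet Toolbox; the input signal name is a placeholder, and the released code may differ in detail.

% Minimal sketch: spectrotemporal features from one per-frame signal
% (e.g., a body-part velocity trace) sampled at the 25 fps video rate.
fs = 25;                                   % frames/s
sig = bodyPartVelocity;                    % placeholder [nFrames x 1] signal
[wt, f] = cwt(sig, fs);                    % continuous wavelet transform
targetF = [0.62 0.88 1.25 1.77 2.50 3.54 5.01 7.09 10.0];   % Hz (as listed above)
feat = zeros(numel(targetF), numel(sig));
for k = 1:numel(targetF)
    [~, idx] = min(abs(f - targetF(k)));   % nearest wavelet frequency bin
    feat(k, :) = abs(wt(idx, :));          % magnitude at that frequency
end
% Repeating this over all signals contributes the spectrotemporal part
% of the 910-dimensional feature set.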

Unsupervised processes

Candidates for behavioral syllables were identified automatically from the extracted features in three substeps. The first substep was dimensionality reduction. Principal component analysis (PCA) was performed to reduce the dimensionality. PCs that together explained >85% of the variance were used for uniform manifold approximation and projection (UMAP; https://www.mathworks.com/matlabcentral/fileexchange/71902). The parameters min_dist and n_neighbors were set to 0.3 and 5, respectively. The second substep was clustering. In the 2D UMAP space, spectral clustering was performed. The number of clusters was optimized between 35 and 50 using the Calinski–Harabasz criterion (Calinski and Harabasz, 1974). Because the main aim was to extract clusters of certain behavioral syllables with little contamination, we intentionally over-clustered the data. After clustering, each cluster contained a number of candidate behavioral syllable sequences. The final substep was postprocessing. Because short sequences were hard to label manually, fragmented (<0.25 s) sequences were excluded from labeling. This drastically reduced the number of sequences and clusters. After this processing, a snippet for each cluster was created for manual labeling.
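
A minimal sketch of these substeps in MATLAB follows. It assumes the File Exchange UMAP implementation linked above (run_umap) and the Statistics and Machine Learning Toolbox; X is a placeholder frames-by-features matrix, and parameter handling in the released code may differ.

% Minimal sketch: dimensionality reduction and over-clustering of the
% feature matrix X (nFrames x 910) to generate syllable candidates.
[~, score, ~, ~, explained] = pca(X);
nPC = find(cumsum(explained) >= 85, 1);              % PCs explaining >=85% variance
Xpc = score(:, 1:nPC);

% Two-dimensional UMAP embedding (File Exchange implementation)
Y = run_umap(Xpc, 'min_dist', 0.3, 'n_neighbors', 5);

% Choose the number of clusters (35-50) with the Calinski-Harabasz
% criterion, then apply spectral clustering to the 2-D embedding.
eva = evalclusters(Y, @(data, k) spectralcluster(data, k), ...
                   'CalinskiHarabasz', 'KList', 35:50);
clusterId = spectralcluster(Y, eva.OptimalK);        % per-frame cluster labels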

Manual labeling

A custom-written graphical user interface (GUI) was used to label each snippet. According to an initial evaluation, the following six behavioral syllables were often observed: (1) locomotion: walking/running behavior toward one direction; (2) turning: turning behavior at the same position; (3) rearing: rearing behavior toward a wall or at the center of the arena; (4) sniffing: rhythmic sniffing behavior; (5) grooming: grooming behavior; (6) pause: standing still at the same position. Each snippet was labeled as one of these six syllables, or as miscellaneous when multiple behavioral syllables were mixed or when initially unrecognized behaviors (such as jumping) were observed. Because this labeling procedure was for the training of an LSTM classifier, we took a conservative approach so that snippets labeled as one of the six syllables would be minimally contaminated by other syllables. Out of 43 videos, 29 were manually labeled. Nineteen videos were used for LSTM model training. Ten additional videos were used for the model performance assessment.

LSTM training and classification

The LSTM classifier comprised (1) an input layer, (2) an LSTM layer, (3) a fully connected layer, (4) a softmax layer, and (5) a classification layer. The input layer consisted of 910 units, as 910 features were extracted from each video (see above). The LSTM layer contained 200 hidden units. The classifier aimed to classify six behavioral syllables. For training, the Adam optimizer was used with L2 regularization. The relationship between three parameters (the gradient threshold, the maximum number of epochs, and the number of hidden units) and evaluation accuracy was systematically assessed as part of optimization. In this study, the gradient threshold was set to 1 and the maximum number of epochs was set to 60. For training, 80% of the labeled data were used, whereas the remaining labeled data were used for evaluation. The model with the best validation performance was used for further analysis. Once the classifier was trained, all videos were processed to determine behavioral syllables frame-by-frame.
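
A minimal sketch of this architecture in MATLAB (Deep Learning Toolbox) follows. The variable names, the exact regularization strength, and the sequence-to-sequence output mode are assumptions for illustration, not the released SaLSa configuration.

% Minimal sketch: frame-by-frame syllable classifier with 910 input
% features, 200 hidden LSTM units, and 6 output classes.
layers = [ ...
    sequenceInputLayer(910)
    lstmLayer(200, 'OutputMode', 'sequence')   % per-frame outputs (assumed)
    fullyConnectedLayer(6)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'GradientThreshold', 1, ...
    'MaxEpochs', 60, ...
    'L2Regularization', 1e-4, ...              % strength assumed for illustration
    'ValidationData', {XVal, YVal}, ...        % held-out 20% of labeled data
    'Shuffle', 'every-epoch', ...
    'Plots', 'none');

% XTrain: cell array of 910 x T feature sequences; YTrain: matching
% categorical label sequences (one label per frame).
net = trainNetwork(XTrain, YTrain, layers, options);
syllablePerFrame = classify(net, XAll);        % apply to the full recordings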

Classification performance assessment

To evaluate the LSTM classifier’s performance, an additional 10 videos were used. For each behavioral syllable, a receiver operating characteristic (ROC) curve and the area under the curve (AUC) were computed.
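
For illustration, a one-vs-rest ROC/AUC computation of this kind can be sketched as follows, assuming the per-frame classifier scores have been collected into a frames-by-classes matrix and the manual labels into a categorical vector (both placeholder names).

% Minimal sketch: one-vs-rest ROC and AUC per syllable on a held-out video.
% scores: nFrames x nClasses classifier scores; yTrue: nFrames x 1 categorical.
syllables = categories(yTrue);                 % e.g., {'locomotion','turning',...}
auc = zeros(numel(syllables), 1);
figure; hold on
for c = 1:numel(syllables)
    [fa, hit, ~, auc(c)] = perfcurve(yTrue, scores(:, c), syllables{c});
    plot(fa, hit)                              % ROC curve for this syllable
end
xlabel('False alarm rate'); ylabel('Hit rate'); legend(syllables)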

Benchmark testing

To compare SaLSa’s performance with that of an existing model, keypoint-MoSeq (Weinreb et al., 2023) was chosen for the following reasons. First, it outperformed other major models, such as B-SOiD (Hsu and Yttri, 2021) and VAME (Luxem et al., 2022). Second, its implementation is straightforward, with a limited set of parameters to explore. For training of the keypoint-MoSeq (kpms) model, we used the 29 videos used for SaLSa training and performance assessment, since manually labeled data were available for them. The hyperparameter κ was set to 9e4, and the number of training iterations was set to 250. The frequency cutoff for behavioral syllables was set to 0.5%, and subthreshold syllables were merged into a single syllable for post hoc analysis. The trained model was then used to analyze all 29 videos so that the performance of the SaLSa and kpms models could be compared. For more direct comparisons between the two models, kpms syllables were re-assigned based on the comparison with the manually labeled data: each original syllable was re-assigned to the most represented of the six predefined syllables (i.e., locomotion, turning, rearing, sniffing, grooming, and pause).

As an additional comparison, a multiclass support vector machine was trained to classify the six syllables with MATLAB’s fitcecoc function, and its performance was assessed. As for the LSTM model, 19 labeled videos were used for training, whereas the remaining 10 videos were used for performance assessment.
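
A minimal sketch of this baseline follows; the variable names are placeholders. Unlike the LSTM, this model classifies each frame independently, without temporal context.

% Minimal sketch: multiclass error-correcting output codes SVM baseline.
% XtrainFrames: frames x 910 feature matrix; YtrainFrames: categorical labels.
svmModel = fitcecoc(XtrainFrames, YtrainFrames);   % default binary SVM learners
YpredFrames = predict(svmModel, XtestFrames);
acc = mean(YpredFrames == YtestFrames);            % frame-wise accuracy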

Quantification of behavioral syllables

To quantify the features of each behavioral syllable, three metrics were computed. First, speed (cm/s) was calculated in each episode by measuring the nose travel distance across frames. Second, the turning angle (°/s) was computed for each episode by calculating the cumulative angle change of the nose-tail base axis across frames. Finally, “compactness” was defined as the mean pairwise distance between body parts. Smaller compactness results from a squeezed body pose.
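
These three metrics can be sketched as follows for a single episode; the coordinate arrays and the pixel-to-cm conversion are placeholders.

% Minimal sketch: per-episode speed, turning angle, and compactness.
% noseXY, tailbaseXY: nFrames x 2 coordinates (cm); bodyXY: nFrames x 8 x 2.
fs = 25;                                            % frames/s
dur = size(noseXY, 1) / fs;                         % episode duration (s)

% Speed: nose travel distance per second (cm/s)
speed = sum(hypot(diff(noseXY(:,1)), diff(noseXY(:,2)))) / dur;

% Turning angle: cumulative change in the nose-tail base axis orientation (deg/s)
ax = noseXY - tailbaseXY;
theta = unwrap(atan2(ax(:,2), ax(:,1)));
turnAngle = sum(abs(diff(theta))) * 180/pi / dur;

% Compactness: mean pairwise distance between the eight tracked body parts (cm)
d = zeros(size(bodyXY, 1), 1);
for t = 1:size(bodyXY, 1)
    d(t) = mean(pdist(squeeze(bodyXY(t, :, :))));
end
compactness = mean(d);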

Statistical analysis

All statistical analyses were performed in MATLAB (version 2022a). To compare properties across behavioral syllables, the Shapiro–Wilk test was performed with a Bonferroni correction to check normality. Then, a one-way ANOVA with a post hoc Tukey’s HSD test was conducted. To compare the linear relationship between age and the fraction of each behavioral syllable across animal groups, after performing the Shapiro–Wilk test, an analysis of covariance (ANCOVA) with a post hoc Tukey’s HSD test was conducted. A p-value <0.05 was considered significant. Unless otherwise stated, error bars represent SEM.
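
For reference, the syllable-wise ANOVA and the age-by-group ANCOVA can be sketched in MATLAB as below; the variable names and display options are placeholders, and the Shapiro–Wilk test (not built into MATLAB) is omitted.

% Minimal sketch: one-way ANOVA across syllables with Tukey's HSD.
% metricPerVideo: one value per video; syllableLabels: matching grouping labels.
[p, ~, stats] = anova1(metricPerVideo, syllableLabels, 'off');
c = multcompare(stats, 'CType', 'tukey-kramer', 'Display', 'off');

% Minimal sketch: ANCOVA of syllable fraction vs. age across animal groups.
[~, atab, ~, ancovaStats] = aoctool(age, fraction, group, 0.05, ...
    'Age', 'Fraction', 'Group', 'off', 'separate lines');
multcompare(ancovaStats, 'Display', 'off');        % post hoc group comparisons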

Code availability

MATLAB implementations of SaLSa are publicly available (https://github.com/Sakata-Lab/SaLSa).

Results

SaLSa (a combination of semi-automatic labeling and LSTM-based classification)

The general workflow was as follows (Fig. 1; see also Materials and Methods): (1) video recording, (2) body part tracking, (3) feature extraction, (4) classifier training, and (5) classification. The step of classifier training consists of two major components (Fig. 1). First, labeled data are prepared semi-automatically. This component starts with an unsupervised approach to automatically identify behavioral syllable candidates. This facilitates the subsequent manual labeling step. The second component is the training of an LSTM-based classifier. Using the labeled data, an LSTM classifier is trained to classify sequential data. We call this integrative approach “SaLSa” (a combination of semi-automatic labeling and LSTM-based classification).

To deploy SaLSa, we began by collecting 43 videos where mice explored an open arena for 20 min. As described below, 5xFAD mice and their littermates of both sexes (1.3–9.4 months old) were used. To track multiple body parts (Fig. 2A), a DeepLabCut (DLC) model was trained and all videos were analyzed. The DLC model test error was 1.84 pixels. Because the tail shape did not reflect behavioral syllables, the mid-tail and tail tip were excluded from further analysis in this study. Thus, eight body parts were analyzed.

After converting all body part coordinates into egocentric coordinates, with the neck defined as the body center, we extracted 910 spatial and temporal features (see Materials and Methods; Fig. 2B). Figure 2C shows an example sequence of all features. There were noticeable patterns across frames, implying distinct behavioral syllables.

To identify syllable candidates, we adopted unsupervised methods, including principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and spectral clustering (Fig. 2D,E). First, 910-dimensional data were reduced into ∼15 dimensions by using PCA so that >85% variance could be explained (Fig. 2D). UMAP was adopted for further reduction to two dimensions. In this two-dimensional UMAP space, around 40 clusters were separated using the spectral clustering algorithm. Because we aimed to identify clusters, each containing frames related to a certain behavioral syllable with minimum contamination of other syllables, we intentionally set the number of clusters high (between 35 and 50; Fig. 2E).

After removing fragmented, short (<0.25 s) sequences, each cluster was manually labeled as one of the following categories: (1) locomotion, (2) turning, (3) rearing, (4) sniffing, (5) grooming, (6) pause, and (7) miscellaneous (Fig. 2E). Out of 43 videos, 29 videos were manually labeled: 19 were used for classifier training whereas 10 were used for classifier evaluation in this study.

Classification performance

From 19 videos, 38,202 frames were labeled as one of the six behavioral syllables (Fig. 3A). While most of the frames were labeled as locomotion, <1% of the frames were labeled as sniffing. Despite this uneven distribution of labeled frames, the evaluation performance of the trained LSTM classifier was 97.9%. A range of parameters (i.e., the gradient threshold, the maximum number of epochs, and the number of hidden units) was explored to confirm that evaluation performance was similar across settings (Extended Data Fig. 3-1).

Figure 3.

LSTM training data and performance. A, The proportion of each behavioral syllable for LSTM training. B, Receiver operating characteristic curves for each behavioral syllable based on independently labeled 10 videos. FA, false alarm. C, The area under the curve (AUC) values across behavioral syllables. In Extended Data Figure 3-1, a range of parameters in the LSTM model were systematically assessed. The evaluation performance of a multiclass support vector machine is shown in Extended Data Figure 3-2.

Extended Data Figure 3-1

Systematic comparison of three parameters for evaluation accuracy and training duration. A, B, The evaluation accuracy of LSTM models with variable maximum numbers of epochs and hidden units. The gradient threshold was set at 1 in A and 2 in B. C, D, Training duration across conditions. Download Figure 3-1, TIF file.

Extended Data Figure 3-2

Performance of a multiclass support vector machine. A, Receiver operating characteristic curves for each behavioral syllable based on independently labeled 10 videos. FA, false alarm. B, The area under the curve (AUC) values across behavioral syllables. Download Figure 3-2, TIF file.

We further evaluated the classifier’s performance by processing an additional 10 labeled videos (Fig. 3B,C). To quantify performance, the receiver operating characteristic (ROC) curve (Fig. 3B) and the area under the curve (AUC; Fig. 3C) were calculated for each video and each behavioral syllable. In many cases, the performance was high (>0.95 AUC). As a comparison, a multiclass support vector machine model was also trained. Although the evaluation performance of the trained model was 98.4%, the model failed to generalize to the additional 10 labeled videos (Extended Data Fig. 3-2). Although the LSTM model’s performance was preserved for those videos, the classification performance for sniffing was not as good as for other syllables (Fig. 3B,C). Therefore, we excluded sniffing frames from further analysis (Figs. 5 and 6). Overall, the trained LSTM classifier provided reliable outcomes across most behavioral syllables.

Comparisons of SaLSa’s performance with a state-of-the-art model

We compared the performance of SaLSa with that of a state-of-the-art behavioral syllable classification algorithm. Recently, keypoint-MoSeq (kpms) has been introduced and outperforms other models (Weinreb et al., 2023). Therefore, the performance of a kpms model can serve as a benchmark to evaluate SaLSa. Since we had labeled data from 29 videos, we trained a kpms model with all 29 videos. First, we examined how each model classified the labeled data across the six syllables (Fig. 4A,B). Consistent with the assessment above, the trained LSTM model provided high classification performance except for sniffing (Fig. 4A). The trained kpms model identified 15 syllables, with all subthreshold syllables merged into an additional syllable (Fig. 4B). Although some behavioral syllables (locomotion, turning, rearing) were further divided into multiple syllables depending on the direction of the movement, other syllables (sniffing, grooming, and pause) tended to be misclassified together. In particular, a significant fraction of grooming behavior was identified as the subthreshold syllable. We also directly compared the outcomes of the two models (Fig. 4C). The general trend was qualitatively similar to Figure 4B. Although locomotion, turning, and rearing were classified into subclasses by the kpms model, other syllables were classified together. To make this trend clearer, syllables from kpms were re-assigned to one of the six predefined syllables based on the comparison with the labeled data (Fig. 4D), and the outcomes of SaLSa and kpms were then compared (Fig. 4E). In this additional analysis, it became clear that kpms misclassified grooming behavior. Thus, although the state-of-the-art model can classify detailed behavioral syllables in a fully automated fashion, SaLSa can reliably classify major behavioral syllables, including grooming.

Figure 4.

Benchmark testing of SaLSa. A, Confusion matrix between manually labeled data (y-axis) and SaLSa outputs (x-axis). The values indicate what fraction of manually labeled data were classified as each syllable by SaLSa. B, Confusion matrix between manually labeled data and keypoint-MoSeq (kpms) outputs. Syllable 16 is a subthreshold (0.5% cutoff) class. C, Confusion matrix between outputs from SaLSa and kpms. The values indicate what fraction of frames classified as each syllable by SaLSa were assigned to each kpms syllable. D, Confusion matrix between manually labeled data and re-assigned kpms syllables. The original kpms syllables were re-assigned to one of the six syllables based on comparison with the manually labeled data (B). E, Confusion matrix between SaLSa outputs and the re-assigned kpms syllables.

Quantification of behavioral syllables

Using the trained LSTM classifier, we processed all frames across all 43 videos. To examine whether each behavioral syllable has characteristic features, we quantified three simple metrics: speed (Fig. 5A), turning angle (Fig. 5B), and compactness (Fig. 5C). Speed was defined as the nose speed in each behavioral syllable episode. The turning angle was calculated as the cumulative turning angle of the nose-tail base axis in each episode. Compactness was defined as the average pair-wise distance between body parts. We computed the median value of each metric in each video and compared their averages across behavioral syllables (Fig. 5A–C).

Figure 5.

Quantification of behavioral syllables. A, The median speed per episode of each video across behavioral syllables. B, The median turning angle per episode of each video across behavioral syllables. C, The median compactness per episode of each video across behavioral syllables. D, The fraction of each behavioral syllable. E, The average episode duration of each video across behavioral syllables. Inset, p-values of post hoc pair-wise comparisons (n = 43, two-way ANOVA with post hoc Tukey’s HSD test). L, locomotion; T, turning; R, rearing; G, grooming; P, pause.

As expected, locomotion was the fastest syllable (F(4,210) = 730, p < 0.0001, one-way ANOVA with post hoc Tukey’s HSD test), whereas grooming and pause were the slowest syllables, even compared with rearing (p < 0.0005; Fig. 5A). The turning angle metric also provided expected outcomes (Fig. 5B): turning exhibited the largest turning angle (F(4,210) = 456, p < 0.01, one-way ANOVA with post hoc Tukey’s HSD test), whereas grooming and pause showed smaller angles than rearing (p < 0.0001). Grooming and rearing were the most compact syllables (F(4,210) = 75.2, p < 0.0001, one-way ANOVA with post hoc Tukey’s HSD test; Fig. 5C). Because our video monitoring was top-down, rearing typically results in a squeezed pose in the 2-D images. On the other hand, locomotion showed the most stretched pose compared with the other syllables (p < 0.001).

To assess the general structure of the animals’ exploratory behaviors in this particular experimental setting, we computed the fraction and average episode duration of each syllable (Fig. 5D,E). Animals typically spent more time on locomotion and rearing (F(4,210) = 129, p < 0.0001, one-way ANOVA with post hoc Tukey’s HSD test) and less on pause (p < 0.0001; Fig. 5D). However, while the effect of behavioral syllable on episode duration was significant (F(4,210) = 66.1, p < 0.0001, one-way ANOVA), the duration of grooming was comparable to that of pause (p = 0.18, post hoc Tukey’s HSD test; Fig. 5E). As expected, turning episodes were the shortest (p < 0.0001). Overall, these quantities are consistent with our intuition about each behavioral syllable.

Age-related and sex-specific changes in behavioral syllables of 5xFAD mice

To apply our approach, we examined how abnormalities in the exploratory behavior of 5xFAD mice emerge as they age (Fig. 6). To this end, we simply compared the fraction of each behavioral syllable as a function of age. The interaction effect between animal group and age on locomotion was significant (F(3) = 5.7, p < 0.005, ANCOVA): female 5xFAD mice exhibited significant hyperlocomotion compared with female controls (p < 0.05, post hoc Tukey’s HSD test; Fig. 6A). Consistent with this, the interaction effect on pause was also significant (F(3) = 3.8, p < 0.05, ANCOVA; Fig. 6E). On the other hand, although the interaction effect on rearing was not significant (F(3) = 1.85, p = 0.155, ANCOVA), the effect of age was significant (F(1) = 15.4, p < 0.0005), meaning that the fraction of rearing decreased with age regardless of animal group (Fig. 6C). We did not see any significant interaction effects on turning or grooming (F(3) = 2.87, p = 0.050 for turning; F(3) = 1.09, p = 0.36 for grooming, ANCOVA; Fig. 6B,D). Although hyperactivity of female 5xFAD mice has been well described, our approach could dissect the detailed behavioral abnormalities underlying it.

Figure 6.

Age-dependent effects of genotype and sex on behavioral contents in 5xFAD mice. The fraction of each behavioral syllable as a function of age. Behavioral syllables are (A) locomotion, (B) turning, (C) rearing, (D) grooming, and (E) pause. p-values of ANCOVA are shown.

Discussion

In the present study, we developed SaLSa, a combination of semi-automatic labeling and LSTM-based classification. The semi-automatic process facilitates the preparation of labeled data for LSTM training, whereas the LSTM-based classification provides accurate and generalizable behavioral syllable classification. Applying this approach, we found that hyperlocomotion in female 5xFAD mice emerges between four and eight months of age, whereas other active behaviors, such as rearing, are not affected by genotype or sex. Given its versatility, SaLSa can classify behavioral syllables without an expensive experimental setup.

Comparisons to other approaches

Over the last decade, a range of approaches have been developed to classify behavioral syllables (Kabra et al., 2013; Pereira et al., 2020; Wiltschko et al., 2020; Dunn et al., 2021; Hsu and Yttri, 2021; Segalin et al., 2021; Jia et al., 2022; Luxem et al., 2022, 2023; Harris et al., 2023; Weinreb et al., 2023). Since these approaches, including SaLSa, are applied after body-part detection, they can be applied to videos taken in relatively dark environments as long as body parts are detected reliably. To the best of our knowledge, SaLSa is the first approach to use an LSTM for this purpose. Deep learning-based approaches have increasingly been adopted for behavioral classification (Marks et al., 2022; Harris et al., 2023). Although convolutional deep learning models are powerful for classifying and segmenting images, they lack an intrinsic mechanism to hold contextual information. Recurrent neural networks are suitable for processing time series data, including behavioral tracking data (Luxem et al., 2022). One advantage of the LSTM over conventional recurrent networks is that it can learn long-term dependencies in the data (Hochreiter and Schmidhuber, 1997). Because the brain can deal with several orders of magnitude of time depending on its computational goals (Issa et al., 2020), adopting LSTMs is an extension of ongoing efforts to characterize natural behaviors comprehensively. Another advantage of the LSTM is its generalizability, like many other deep learning approaches. Once it has been trained, newly acquired videos can be processed by simply extracting features, as we demonstrated.

On the other hand, LSTMs require a large amount of labeled data for training. To mitigate this issue, we adopted semi-automatic labeling. By extracting features in an unbiased fashion, behavioral syllable candidates were automatically identified. Compared with an approach that creates snippets randomly (such as MuViLab), our approach reduces manual curation time. In the present study, labeling a 20-min video took several minutes. This makes it easy to label a number of videos to prepare training data. Thus, our approach is unique in that it combines unsupervised methods and a supervised LSTM model to classify behavioral syllables.

In a broader context, SaLSa takes a top-down approach where predefined behavioral syllables are identified semi-automatically and classified by a deep learning model. In the future, it may be interesting to integrate a fully automated bottom-up model with an LSTM classifier.

Behavioral abnormalities in 5xFAD

As an application of SaLSa, we investigated age-specific and sex-specific changes in the exploratory behaviors of 5xFAD mice. It has been well documented that female 5xFAD mice exhibit hyperactivity (Oblak et al., 2021). Our approach demonstrates that hyperactivity consists of hyperlocomotion whereas other active behaviors, such as turning and rearing, are similar across animal groups. In particular, rearing decreases with age regardless of genotype and sex, which has not been documented in this mouse model before.

The underlying mechanisms of sex-specific hyperlocomotion in 5xFAD mice are unknown. In this mouse model, amyloid plaques can be seen in the hippocampus (primarily the subiculum) as early as two months old and the pathology appears across brain regions as they age (Oakley et al., 2006; Oblak et al., 2021). Sex differences in amyloid pathology can also be apparent in the cortical subplate even at three months old and seen across multiple brain regions at four months old (Oblak et al., 2021). Consistent with this sex-specific pathologic progression, a transcriptomic analysis also revealed sex differences in a wide range of molecular pathways (Oblak et al., 2021). In the future, it would be crucial to determine how these sex-specific pathologic features link to dysfunctions in neural circuit activity, which lead to age-related, sex-specific hyperlocomotion. Despite this challenge, given the simplicity of our experimental setup, similar approaches can be applied to other animal models.

Limitations of the study

Our study has at least four limitations. First, our preset behavioral syllables were limited. We are also aware that some animals jumped or exhibited complex behaviors, such as mixed turning and rearing behaviors. This will require more labeled data or additional post hoc analysis. For example, based on the predicted score from a classifier, such complex behavioral syllables may be defined. Additionally, sniffing behaviors could not be identified and classified accurately. This could be partly because of the limited resolution of our camera and the accuracy of body-part tracking. Increasing the resolution and adding extra body parts to track may improve this aspect.

Second, the model must be retrained when videos are taken at different frame rates or from a different experimental setup. Because our temporal features assume a certain frame rate (25 fps), a change in frame rate leads to a change in the number of features. To deal with this issue, re-sampling can be considered.
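
For example, trajectories recorded at a different frame rate could be re-sampled to 25 fps before feature extraction; a minimal sketch (assuming the Signal Processing Toolbox and a hypothetical 30 fps recording) is shown below.

% Minimal sketch: re-sample a body-part trajectory to the 25 fps rate
% assumed by the temporal features.
fsOrig = 30;  fsTarget = 25;                        % hypothetical original and target rates
xResampled = resample(xOrig, fsTarget, fsOrig);     % xOrig: nFrames x 1 coordinate trace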

Third, while the present study replicated the results (i.e., sex-specific hyperactivity) of a recent comprehensive analysis of 5xFAD mice with a detailed classification of behavioral syllables, the estrous cycle might be a potential confounding factor, although a recent study did not find this to be the case (Levy et al., 2023). Increasing the sample size while monitoring the estrous cycle will address this issue in the future.

Finally, as widely appreciated, deep learning models are challenging to interpret. In this study, it is not straightforward to determine what spatiotemporal features contribute to classification. Several approaches may be considered, such as local interpretable model-agnostic explanations (LIME) and visualization.

In conclusion, behavioral syllable classification is important. In the present study, we developed a combinatory approach of semi-automatic labeling and LSTM-based classification, called SaLSa. Our approach assists manual curation by preparing labeled data semi-automatically. The LSTM classifier reliably classified behavioral syllables in new datasets that were not used for training, and its performance was comparable to that of a state-of-the-art model. Thus, our approach adds a versatile tool for behavioral syllable classification. Combined with other advanced technologies, SaLSa will facilitate efforts to better understand the neural basis of complex behavior.

Acknowledgments

I thank Abigail Hatcher Davies for her technical assistance.

Footnotes

  • The author declares no competing financial interests.

  • This work was supported by the Medical Research Council Grant MR/V033964/1 and Horizon2020-ICT (DEEPER, 101016787; to S.S.).

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

  1. Bohnslav JP, Wimalasena NK, Clausing KJ, Dai YY, Yarmolinsky DA, Cruz T, Kashlan AD, Chiappe ME, Orefice LL, Woolf CJ, Harvey CD (2021) DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels. Elife 10:e63377. https://doi.org/10.7554/eLife.63377
  2. Calinski T, Harabasz J (1974) A dendrite method for cluster analysis. Comm Stats Simul Comp 3:1–27. https://doi.org/10.1080/03610917408548446
  3. Dunn TW, Marshall JD, Severson KS, Aldarondo DE, Hildebrand DGC, Chettih SN, Wang WL, Gellis AJ, Carlson DE, Aronov D, Freiwald WA, Wang F, Ölveczky BP (2021) Geometric deep learning enables 3D kinematic profiling across species and environments. Nat Methods 18:564–573. https://doi.org/10.1038/s41592-021-01106-6
  4. Gabriel CJ, Zeidler Z, Jin B, Guo C, Goodpaster CM, Kashay AQ, Wu A, Delaney M, Cheung J, DiFazio LE, Sharpe MJ, Aharoni D, Wilke SA, DeNardo LA (2022) BehaviorDEPOT is a simple, flexible tool for automated behavioral detection based on markerless pose tracking. Elife 11:e74314. https://doi.org/10.7554/eLife.74314
  5. Harris C, Finn KR, Kieseler ML, Maechler MR, Tse PU (2023) DeepAction: a MATLAB toolbox for automated classification of animal behavior in video. Sci Rep 13:2688. https://doi.org/10.1038/s41598-023-29574-0
  6. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
  7. Hsu AI, Yttri EA (2021) B-SOiD, an open-source unsupervised algorithm for identification and fast prediction of behaviors. Nat Commun 12:5188. https://doi.org/10.1038/s41467-021-25420-x
  8. Issa JB, Tocker G, Hasselmo ME, Heys JG, Dombeck DA (2020) Navigating through time: a spatial navigation perspective on how the brain may encode time. Annu Rev Neurosci 43:73–93. https://doi.org/10.1146/annurev-neuro-101419-011117
  9. Jia Y, Li S, Guo X, Lei B, Hu J, Xu XH, Zhang W (2022) Selfee, self-supervised features extraction of animal behaviors. Elife 11:e76218. https://doi.org/10.7554/eLife.76218
  10. Kabra M, Robie AA, Rivera-Alba M, Branson S, Branson K (2013) JAABA: interactive machine learning for automatic annotation of animal behavior. Nat Methods 10:64–67. https://doi.org/10.1038/nmeth.2281
  11. Lauer J, Zhou M, Ye S, Menegas W, Schneider S, Nath T, Rahman MM, Di Santo V, Soberanes D, Feng G, Murthy VN, Lauder G, Dulac C, Mathis MW, Mathis A (2022) Multi-animal pose estimation, identification and tracking with DeepLabCut. Nat Methods 19:496–504. https://doi.org/10.1038/s41592-022-01443-0
  12. Levy DR, Hunter N, Lin S, Robinson EM, Gillis W, Conlin EB, Anyoha R, Shansky RM, Datta SR (2023) Mouse spontaneous behavior reflects individual variation rather than estrous state. Curr Biol 33:1358–1364.e4. https://doi.org/10.1016/j.cub.2023.02.035
  13. Luxem K, Mocellin P, Fuhrmann F, Kürsch J, Miller SR, Palop JJ, Remy S, Bauer P (2022) Identifying behavioral structure from deep variational embeddings of animal motion. Commun Biol 5:1267. https://doi.org/10.1038/s42003-022-04080-7
  14. Luxem K, Sun JJ, Bradley SP, Krishnan K, Yttri E, Zimmermann J, Pereira TD, Laubach M (2023) Open-source tools for behavioral video analysis: setup, methods, and best practices. Elife 12:e79305. https://doi.org/10.7554/eLife.79305
  15. Marks M, Qiuhan J, Sturman O, von Ziegler L, Kollmorgen S, von der Behrens W, Mante V, Bohacek J, Yanik MF (2022) Deep-learning based identification, tracking, pose estimation, and behavior classification of interacting primates and mice in complex environments. Nat Mach Intell 4:331–340. https://doi.org/10.1038/s42256-022-00477-5
  16. Mathis A, Mamidanna P, Cury KM, Abe T, Murthy VN, Mathis MW, Bethge M (2018) DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci 21:1281–1289. https://doi.org/10.1038/s41593-018-0209-y
  17. Mathis A, Schneider S, Lauer J, Mathis MW (2020) A primer on motion capture with deep learning: principles, pitfalls, and perspectives. Neuron 108:44–65. https://doi.org/10.1016/j.neuron.2020.09.017
  18. Musall S, Kaufman MT, Juavinett AL, Gluf S, Churchland AK (2019) Single-trial neural dynamics are dominated by richly varied movements. Nat Neurosci 22:1677–1686. https://doi.org/10.1038/s41593-019-0502-4
  19. Oakley H, Cole SL, Logan S, Maus E, Shao P, Craft J, Guillozet-Bongaarts A, Ohno M, Disterhoft J, Van Eldik L, Berry R, Vassar R (2006) Intraneuronal beta-amyloid aggregates, neurodegeneration, and neuron loss in transgenic mice with five familial Alzheimer’s disease mutations: potential factors in amyloid plaque formation. J Neurosci 26:10129–10140. https://doi.org/10.1523/JNEUROSCI.1202-06.2006
  20. Oblak AL, et al. (2021) Comprehensive evaluation of the 5XFAD mouse model for preclinical testing applications: a MODEL-AD study. Front Aging Neurosci 13:713726. https://doi.org/10.3389/fnagi.2021.713726
  21. Pereira TD, Aldarondo DE, Willmore L, Kislin M, Wang SS, Murthy M, Shaevitz JW (2019) Fast animal pose estimation using deep neural networks. Nat Methods 16:117–125. https://doi.org/10.1038/s41592-018-0234-5
  22. Pereira TD, Shaevitz JW, Murthy M (2020) Quantifying behavior to understand the brain. Nat Neurosci 23:1537–1549. https://doi.org/10.1038/s41593-020-00734-z
  23. Schneider A, Zimmermann C, Alyahyay M, Steenbergen F, Brox T, Diester I (2022) 3D pose estimation enables virtual head fixation in freely moving rats. Neuron 110:2080–2093.e10. https://doi.org/10.1016/j.neuron.2022.04.019
  24. Segalin C, Williams J, Karigo T, Hui M, Zelikowsky M, Sun JJ, Perona P, Anderson DJ, Kennedy A (2021) The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice. Elife 10:e63720. https://doi.org/10.7554/eLife.63720
  25. Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long short-term memory model. Artif Intell Rev 53:5929–5955. https://doi.org/10.1007/s10462-020-09838-1
  26. Vinyals O, et al. (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575:350–354. https://doi.org/10.1038/s41586-019-1724-z
  27. Weinreb C, Osman MAM, Zhang L, Lin S, Pearl J, Annapragada S, Conlin E, Gillis WF, Jay M, Shaokai Y, Mathis A, Mathis MW, Pereira T, Linderman SW, Datta SR (2023) Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. bioRxiv 532307. https://doi.org/10.1101/2023.03.16.532307
  28. Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR (2020) Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. https://doi.org/10.1038/s41593-020-00706-3

Synthesis

Reviewing Editor: Mark Laubach, American University

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Soma Tomihara.

Your paper was reviewed by two experts in the topic of your research. Their comments are given in full below. Please revise your manuscript to address all points that were raised. PLEASE PAY SPECIFIC ATTENTION TO COMMENTS FROM THE FIRST REVIEWER ABOUT ISSUES EXPERIENCED IN USING YOUR SOFTWARE. In addition, please prepare a visual abstract, which is invaluable for promoting papers on the eNeuro website.

Reviewer 1

Advances the Field: The authors submit a novel set of MATLAB scripts designed to leverage LSTM networks to classify animal behavior from pose estimation data.

This approach is reasonable and the rationale for using LSTM for this purpose is sound, although the report lacks a quantitative comparison between approaches.

The authors demonstrate the efficacy of their approach using the popular 5xFAD model of Alzheimer’s Disease. Their analysis package was able to replicate expected observations of 5xFAD mice in Open Field tests.

Of particular note are the UMAP-derived ‘snippets’ of behavior that are extracted in a manner analogous to DeepLabCut’s frame extraction; optimally distinctive snippets on behaviorally relevant dimensions are shown to the user for classification, increasing the value of labeling over randomly labeled sections.

A key limitation compared to similar unsupervised or semi-supervised methods is the limited flexibility of SaLSa in the classes of behavior that can be analyzed.

The described workflow seems to be an effective option for any laboratory using MATLAB that seeks to classify the animal behaviors using pose estimation data.

Statistics: The statistics used to quantify syllable metrics and assess genotype/sex/age effects are appropriate. Data are appropriately presented in figures. Figure captions are clear.

It isn’t stated whether the data conform to the assumptions of variance-based hypothesis testing (e.g. normality). Also, if any correction was made for multiple comparisons, it was not described. Given the secondary nature of these statistics to the overall interpretation of the work and the seemingly high power (n=43), I don’t believe these details would change the outcome either way so I’d leave it to the authors and editor to decide whether these are worth including.

There is a qualitative benchmarking between the current methodology and keypoint-moseq. While it is clear that there’s a qualitative similarity, this could be quantified in a number of useful ways.

Extended Data: There is no extended data in the manuscript. However, potential users may find it valuable if an example video/.csv/completed .mlx were available on the authors’ github.

Software Comments: This reviewer attempted to use the SaLSa package with the materials available in my current circumstances, which include using the version of MATLAB on our HPC. In theory, there should be no difference between this environment and a local install, but you never know. Further, our current licensure includes ONLY MATLAB 2023+, which may or may not change functionality relative to the stated stable build on 2022a.

Under these circumstances, I was unable to get the package working all the way through, although I was able to interact with each module. Whether this was a matter of user error, limited sample, inappropriate data, my computing environment, or an issue with the module is unclear.

With respect to the scripts, the methods employed make sense.

More specifically:

Usability:

Throughout, data validation and error handling would ensure better data fidelity especially in the hands of new/inexperienced users.

Throughout, although the notebook contains useful headers, in-line comments and more human-readable variable naming would help users new to the package navigate any issues or errors.

Modeling:

Unless MATLAB does something under the hood that I’m not aware of (a distinct possibility):

In step 2, it seems as though training progresses for a fixed number of epochs (60 by default). This means you’ll get the last model but not necessarily the best model. It may be preferable to hold out a validation fraction and cease training once a certain number of epochs have passed without improvement, holding the best model in memory.

I don’t think there is a dataloader, so large files could fill RAM and cause a crash. A dataloader may also allow for larger batch processing of videos.

Other comments: For setting the dimensions of the box in pixel-space, using an images.roi.polyline() or images.roi.rectangle() over an extracted frame from the video (imported via imread) is a simple and easy method to allow users using .mlx notebooks to input these dimensions. This may be a quick and easy improvement going forward!

This is far from necessary, but you may consider providing some comparative data of the performance of the LSTM with less appropriate network structures to demonstrate the (presumably) increased efficacy of the more robust LSTM structure.

Reviewer 2

Advances the Field: Through the elegant use of machine learning methods, the authors have established SaLSa, the analysis suite that automatically detects several behavioral syllables. Also, this study shows a solution to mitigate the amount of labeled data for training, which reduces manual curation time. Using this suite, the authors demonstrated sexually dimorphic hyperactivation in AD model mice, which may be important for pathological studies about AD.

Statistics: There are error bars on the bar graphs on the left side of Figure 6, although the bar graphs indicate average values. I assume that each average value was calculated from the fraction values shown as triangles/crosses in the right plots, so each average is a single value, not multiple values, and therefore cannot carry error bars. If the bar graphs instead indicate multiple average values, please explain how those multiple average values were calculated.

Comments to the Authors: SaLSa is a useful tool for the automated classification of behavioral syllables, which frees us from the laborious work of behavioral analysis. The data indicate that SaLSa successfully classified the behavioral syllables and found sexually dimorphic hyperactivation in AD mice. However, I have some concerns about the present work, noted below as major comments. There are also some minor comments that the authors should take note of:

Major comments

- Although the behaviors of female mice change drastically with their estrous cycle, this aspect is not considered in the present study. It is necessary to either use data from females in the same estrous state or demonstrate that behavioral activity does not vary with the estrous cycle (Line 73).

- Since mice are nocturnal, their behaviors are often analyzed in the dark period. However, SaLSa was trained with data obtained from behavior in the light period. It should be mentioned whether SaLSa can be applied to videos taken in dark environments (Line 75).

- The authors found that 5xFAD female mice exhibit hyperactivity compared with 5xFAD male mice and demonstrated that this hyperactivity consists of hyperlocomotion, but the discussion of this result is not sufficient. The Discussion should be expanded to consider the sex differences in the pathologic features of AD (Line 327).

Minor comments

- Were the mice used in this study maintained in a 12L/12D photoperiod environment? It may be better to state the testing period in Zeitgeber time rather than as “the first quarter of the light period” (Line 75).

- In the Materials and Methods section, it is written that “9” videos were used for the LSTM classifier’s performance assessment (Line 156), but “10” labeled videos are mentioned in the Results section (Lines 225 and 453). Which is correct?

- Different words are used between the manuscript and Figure 1 (e.g. “video monitoring” in the manuscript/ “video recording” in Figure 1). It is better to use the same words and phrases to understand the workflow (Line 192).

- Is “DLC” the abbreviation of DeepLabCut? Please note it (Line 201).

- Although the tail coordinates were not used for converting to egocentric coordinates, the video captures in Figure 2E have red/orange dots, which indicate the tail coordinates (Line 205 and Figure 2E).

- What does “sub-classes” mean in Line 242? Does it mean that a single manually labeled behavioral syllable is divided into multiple syllables? It may be better to use a different word.

- The legend of Figure 1 does not mention the video-recording phase (Line 431).

- Figure 2C indicates the feature values of ten feature categories separated by dotted lines, but it is not clear which category is which (Line 441).

- The word “FA” is in Figure 3, but there is no explanation in the legend of Figure 3 (Line 451).

- The legend of Figure 6 does not have enough information to be understood. It should be mentioned that “M,” “F,” “+,” and “−” indicate male, female, 5xFAD mice, and WT littermates, respectively. It should also be mentioned that the colors in the left bar graph correspond to the colors in the right plot (Line 476 and Figure 6).
