Abstract
Immediate early genes (IEGs) are useful markers of neuronal activation and essential components of neuronal response. While studies of gastropods have provided many insights into the basic learning and memory mechanisms, the genome-wide assessment of IEGs has been mainly restricted to vertebrates. In this study, we identified IEGs in the terrestrial snail Helix lucorum. In the absence of the genome, we conducted de novo transcriptome assembly using reads with short and intermediate lengths cumulatively covering more than 98 billion nucleotides. Based on this assembly, we identified 37 proteins corresponding to contigs differentially expressed (DE) in either the parietal ganglia (PaG) or two giant interneurons located within the PaG of the snail in response to the neuronal stimulation. These proteins included homologues of well-known mammalian IEGs, such as c-jun/jund, C/EBP, c-fos/fosl2, and Egr1, as well as homologues of genes not yet implicated in the neuronal response.
Significance Statement
Gastropods, which include snails and slugs, are widely used in studies of basic neuronal activity mechanisms. The first step of the transcriptional response to neuronal stimulation requires the activation of immediate early genes (IEGs). The identification of IEGs is important for the understanding of neuronal response mechanisms and for the visualization of activated neurons. However, genome-wide studies of IEGs have thus far been mainly restricted to vertebrates. Furthermore, a study of activity-regulated genes (ARGs), a gene group that includes IEGs, conducted in fruit flies did not reveal a clear overlap with vertebrate IEGs. In this study we present a transcriptome-wide study of snail IEGs, which reveals multiple homologues of well-known mammalian IEGs, as well as a number of novel IEG candidates.
Introduction
Immediate early genes (IEGs) are the composite group of genes rapidly and transiently upregulated in neuronal cells in response to stimulation. An induction of IEGs does not require de novo protein synthesis and commonly occurs within an hour after the stimulation event (Fowler et al., 2011). Stimulation experiments conducted in the presence of protein synthesis inhibitors revealed hundreds of mammalian IEGs, with most works conducted in mouse brain preparations (Thompson et al., 2010; Bojovic et al., 2015; Gerstner et al., 2016) and neuronal cultures (Kim et al., 2010; Saha et al., 2011; Spiegel et al., 2014). Although different types of stimulations induce different sets of IEGs, many genes were repeatedly found in most experiments (Dahmen et al., 1997; Bepari et al., 2012; Lacar et al., 2016). Many commonly induced IEGs, including the first identified one, c-fos, as well as c-jun, C/EBP, and Egr1, function as transcription factors triggering the expression of the secondary response genes (SRGs; West and Greenberg, 2011). As well as transcription factors, neuronal IEGs encode other functional proteins, such as cytoskeletal regulators (Arc), growth factors (β-activin), metabolic enzymes (Dusp1), and signal transduction proteins (Homer; Lanahan and Worley, 1998).
While mammalian and vertebrate IEGs are relatively well characterized, less is known about IEGs in invertebrate species. Whereas several studies have been performed exploring genes involved in the long-term memory in the Caenorhabditis elegans (Lakhina et al., 2015; Freytag et al., 2017) and Aplysia kurodai (Lee et al., 2008), the genome-wide systematic investigation focusing on IEGs in invertebrates to date has been restricted to fruit flies, with two studies characterizing neuronal activity-regulated genes (ARGs). Similar to IEGs, ARGs are defined as genes induced in neurons within approximately 1 h after the stimulation, but without de novo protein synthesis inhibition. The first study using microarrays to assess the transcriptome alterations in heads of mutant flies after the seizure induction identified 122 genes showing rapid expression changes. These genes included fly homologs of known mammalian IEGs, such as c-fos, c-jun, C/EBP, and Egr (Guan et al., 2005). The second study used transcriptome sequencing (RNA-Seq) to look for ARGs in the fly brain and various neuron types after the activation by three stimulation protocols (Chen et al., 2016). Although the study identified known insect IEGs of hr38 and sr induced in the fly brain by the three stimulation protocols, there was no detectable activation of many mammalian IEGs’ homologues, including c-fos and c-jun. Instead, largely independent sets of genes, ∼100 each, were induced by each stimulation procedure.
While no genome-wide screens for neuronal IEGs were conducted in other invertebrate species, previous studies identified a number of individual genes. Specifically, the analysis of IEGs in a sea slug Aplysia californica, an organism widely used in the studies of memory mechanisms, identified homologues of mammalian IEGs c-jun, C/EBP, CREB1, and Egr (Alberini et al., 1994; Sung et al., 2006; Bonnick et al., 2012; Cyriac et al., 2013). In addition, studies conducted in the terrestrial slug Limax valentianus identified homologues of mammalian IEGs C/EBP and KLF (Fukunaga et al., 2006). Similarly, homologues of mammalian IEGs C/EBP, CREB1, and CREB2 were identified in the extensively studied pond snail Lymnaea stagnalis (Sadamoto et al., 2004, 2010; Hatakeyama et al., 2006).
Here, we conducted a broad search for neuronal IEGs in another model invertebrate species, the terrestrial snail Helix lucorum. The nervous system of this organism containing five pairs of neuronal ganglia and one unpaired visceral ganglion has been used in electrophysiological studies for >35 years, yielding insights into basic learning and memory mechanisms (Balaban, 1980, 2002; Balaban et al., 2015). However, the absence of the genome sequence has limited molecular studies of the neuronal response mechanisms. To overcome this limitation, we conducted the de novo assembly of the snail neuronal transcriptome using >943 million reads cumulatively covering >98 billion nucleotides. The following analysis yielded 37 putative snail IEGs, including homologues of well-characterized mammalian ones: c-jun/jund, C/EBP, c-fos/fosl2, Egr1, Ier5l, Socs2, and Dusp10.
Materials and Methods
Sample preparations and RNA-Seq
We conducted experiments using adult H. lucorum taurica L. specimens weighing 30–35 g. The snails were kept in a wet environment and fed their usual diet of pieces of lettuce. The experimental procedures were in compliance with the Guide for the Care and Use of Laboratory Animals published by the National Institutes of Health and the protocol was approved by the Ethical Committee of the Institute of Higher Nervous Activity and Neurophysiology RAS. Before the experiment, the snails were kept in the active state for at least two weeks. Details of the preparation and identification of neurons are given elsewhere (Malyshev and Balaban, 2002). Briefly, animals were cooled to 4°C and injected with isotonic MgCl2 before the CNS isolation to minimize pain. The central ganglia complex was surgically isolated from anesthetized snails, pinned to a silicone-elastomer (Sylgard)-coated dish, and kept in high-Ca2+, high-Mg2+ Ringer saline (80 mМ NaCl, 4 mМ KCl, 28 mМ СаСl2, 25 mМ MgCl2, and 10 mМ Trisma; pH 7.6) to suppress electrical activity of the nervous system. Such treatment blocks electrical activity in the CNS and neuromuscular connections (Balaban and Chase, 1989), thereby minimizing the dissection effect on IEGs’ expression in our experiments. The central ganglia complex was stripped of connective tissue sheath using a fine forceps and scissors, ensuring the integrity of the thin layer adherent to the neurons, kept at 4°C for 24 (±1) h to further minimize any possible effect of dissection on IEGs’ expression, and then kept at room temperature for 1 h. Part of CNS which was used as a self-control was cut out, transferred to the dry-ice cooled plastic tube, and frozen at –80°C. The remaining part was washed with 50 ml of normal Ringer saline (80 mМ NaCl, 4 mМ KCl, 8 mМ СаСl2, 5 mМ MgCl2, and 10 mМ Trisma; pH 7.6) containing 20 µM anisomycin, and kept in this solution for 10 min, then was washed with 50 ml of normal Ringer saline containing 20 µM anisomycin, 100 µM caffeine and 5 µM 5-HT, and kept in this solution for 20–25 min. Part of activated CNS symmetrical to the control part was cut out, transferred to the dry-ice cooled plastic tube, and frozen at –80°C.
In the first experiment, we isolated the entire parieto-visceral complex. Self-control sets included three left parietal (+adhered visceral) ganglia and three right parietal ganglia (PaG). Activated CNS sets included three right PaG and three left parietal (+adhered visceral) ganglia, respectively. In the second experiment, a small medial part of the PaG mainly containing bodies of the two giant premotor interneurons Pa2 and Pa3 were surgically dissected. Self-control sets included similarly dissected parts of symmetrical ganglia with two left parietal Pa2+Pa3 neurons and four right Pa2+Pa3 neurons. Activated CNS sets included two right parietal Pa2+Pa3 neurons and four left parietal Pa2+Pa3 neurons, respectively (Table 1).
For RNA-Seq, we prepared cellular RNA from samples using the guanidine thiocyanate method (Chomczynski and Sacchi, 1987). A total of 24 RNA samples were analyzed using Agilent 2100 Bioanalyzer to confirm the RNA isolation purity and absence of RNA degradation. The peak of 28S rRNA is invisible in some species of snails because their 28S rRNA consists of two separate pieces held together by ribosome proteins, and after purification each half of 28S rRNA has the same length as 18S rRNA, so 28S peak merges with 18S peak. We therefore checked only the 18S peak integrity to estimate the total RNA quality.
A total of 500 ng RNA of each sample was depleted with rRNA Removal Mix (Ribo-Zero Human/Mouse/Rat) kit. cDNA preparations from RNA samples were performed using TruSeq Stranded Total RNA Sample Preparation kit (Illumina) following the supplier’s instruction. Briefly, RNAs were fragmented to 120–200 bp with a median size of 150 bp and reverse transcribed using random hexamers and SuperScript II Reverse Transcriptase. Single stranded cDNAs were converted to double stranded cDNAs. End repair protocol and subsequent adenine nucleoside addition to 5’-end of DNA were made for ligation of barcoded adapters. The quality of each prepared cDNA library was evaluated using Qubit 2.0 Fluorometer (with Qubit dsDNA HS Assay kit) and Agilent 2100 Bioanalyzer (Agilent High Sensitivity DNA kit). The amount of short cDNA fragments in our samples with length of 25–160 bp did not exceed 10%. Sequencing was performed using the HiSeq Illumina platform (Table 2). One load contained 10–12 pooled libraries tagged with different barcodes.
Transcriptome assembly
The transcriptome was assembled from the dataset mentioned above, as well as two additional RNA-Seq datasets measured in H. lucorum (Table 3), by SOAPdenovo-Trans (version 1.04; Xie et al., 2014) using the recommended parameters, after the adapter sequence removal by cutadapt (version 1.11; Martin, 2011).
Assessment of transcriptome assembly quality
The quality of the overall snail neuronal transcriptome assembly was assessed by Transrate (version 1.0.3; Smith-Unna et al., 2016), including N30/50/70 (the contig size at which 30/50/70% of bases are contained in contigs with greater sizes), GC content (percentage of nitrogenous bases), ORF percentage, and reciprocal best BLAST matches between the assembly and proteins of A. californica (California sea slug; Knudsen et al., 2006).
Quantification of contig expression levels
Raw reads from the 24 snail samples were mapped to the transcriptome assembly using bowtie (version 1.0.0) with the specifically chosen parameters by RSEM (version 1.2.18; Li and Dewey, 2011). We took advantage of the effective handling of ambiguously-mapping reads and the absence of reference genome implemented in RSEM to locate the expression abundance (raw count) at the contig level. In each experiment, only contigs with total counts across all 12 samples >10 were determined as expressed and used in the following analyses. Considering the intrinsically comparable character among samples, we normalized the abundance data using trimmed mean of M-values normalization method (TMM) provided by the Bioconductor package “edgeR” in R (Robinson et al., 2010). Moreover, for proper comparisons among genes, reads per kilobase per million mapped reads (RPKM) was subsequently obtained for each contig in each sample in consideration of the contig’s effective length.
Global pattern exploration
We performed multidimensional scaling (MDS) analyses to explore the global patterns of the snail samples in both experiments on the basis of sample dissimilarities defined as one minus Spearman’s rank correlation coefficient between pairwise samples based on expressed contigs using the “cmdscale” function in R.
Differential expression analysis
Differentially expressed (DE) contigs between activated and control samples in each experiment were identified using the Bioconductor package edgeR in R (Robinson et al., 2010). DE contigs were determined by the criteria of false discovery rate (FDR) <0.05 and fold change > 2. The significance of overlap of DE contigs between the two experiments was assessed by a hypergeometric test using the “phyper” function in R, with all expressed contigs in the first or second experiment as the statistical background.
Contig annotation
We resolved the homologous proteins of expressed contigs by the sequence search using BLASTX (version 2.2.24). Proteomes of five species were used as the search database: A. californica (Knudsen et al., 2006), Biomphalaria glabrata (VectorBase, version 1.2), C. elegans (Ensembl, release 77), Drosophila melanogaster (Ensembl, release 77), and Takifugu rubripes (Ensembl, release 77). Alignments with e-values below 1e-5 were selected as valid. If one contig was aligned to multiple proteins, we considered only the best alignment.
To define the consistent annotation across the five species, we performed BLASTP (version 2.2.24) between proteomes of pairwise species. The top five alignments with e-values below 1e-5 were considered as homologous. One contig was considered to be consistently annotated across species when the aligned protein in one species (e.g., the sea slug) and each of the aligned proteins in the other species (e.g., the remaining four species) were homologous based on the above criterion.
For each protein, we determined the direction of its change between activated and control samples by considering the expression changes of all the contigs mapped to the protein (both DE and non-DE contigs) in two experiments or one experiment. Specifically, for each protein, we calculated the percentage of contigs which were upregulated/downregulated in both experiments (E1 and E2) or in either experiment (E1 or E2) after activation out of all its mapped contigs. Under both experiments or one experiment, we next considered proteins with either upregulated or downregulated contigs’ percentage no <80% (i.e., no <80% of its mapped contigs showed consistent direction of expression change). The percentage of such proteins was then calculated and the significance of this percentage was estimated by randomly sampling 1000 times the same number of expressed contigs to be mapped to the protein, with other procedures being constant.
Design of in situ probe
The probe, which uniquely targeted the contig homologous to mammalian fos family IEGs c-fos and fosl2 and fruit fly IEG kayak (c-fos/fosl2 probe), was designed in an intron-spanned manner for the purpose of hybridizing only the transcript instead of the DNA. To find the junction site of the corresponding contig, we aligned the contig to the gene sequence of c-Fos in mice by BLASTN in Ensembl. With the aid of the gene structure and alignment visualization implemented in Ensembl, we chose the subset of sequence around the junction site from the contig as the probe (Table 4). Along with this criterion, a compromise was reached in consideration of the probe’s length (∼50 nt), GC content (40–50%) for an appropriate hybridization.
Electrophysiological experiment
Sample preparations were the same as the preparations before RNA-Seq. The CNS was treated with proteinase Type XIV for 2 min, followed by washing using Ringer saline (80 mМ NaCl, 4 mМ KCl, 7 mМ СаСl2, 5 mМ MgCl2, and 10 mМ Trisma; pH 7.8). We recorded intracellularly the activity of the readily identifiable giant premotor neurons located in both PaG involved in triggering withdrawal (RPa2, RPa3, LPa2, LPa3). Microelectrodes were filled with 2 М potassium acetate and had the conductance 10–30 MOhm. Nerve stimulations were performed via the polyethylene suction electrodes. The experimental protocol included the nerve stimulation with 10-stimulus trains (10 Hz) once per 30 s, the intensity of stimulation chosen to cause the appearance of several action potentials in the giant premotor neurons at the beginning of the training. The total time of stimulation was 2 h. Intracellular signals were recorded with preamplifiers (Axoclamp 2B, Molecular Devices), digitized and stored on a computer (Digidata 1400 A A/D converter and Axoscope 10.0 software, both from Molecular Devices).
Nerve backfills
To detect the population of neurons projecting to the stimulated nerve and to compare it with the population of neurons expressing genes of interest (here c-fos/fosl2), we performed the retrograde labeling of the neurons in the CNS. The cut end of the nerve was sucked into a pipette filled with 10% neurobiotin in 0.1 M KCl. The end of the nerve was left in place for 12–24 h at 18–22°C. The time of backfill was chosen experimentally. The ganglia were fixed with 4% paraformaldehyde in 0.1 M phosphate buffer for 2 h. The brains were processed with the Vectastain ABC kit.
In situ hybridization
For in situ hybridization, the CNS was processed as whole-mounts. The experimental procedure was described earlier (Balaban et al., 2001). Differently, as the short probe to the RNA of interest was used, we slightly modified the protocol. That is, the pre-hybridization was conducted at 50°C, and the hybridization itself was conducted at 37°C, with other procedure details being constant.
RT-qPCR
Five snails were used for RT-qPCR, with the sample preparation conducted as described above in Sample preparations and RNA-Seq. RNA was extracted from snail’s CNS with PureLink RNA Mini kit (Ambion) according to manufacturer’s recommendations. One RNA extraction failed for the stimulated part of the snail CNS. The quality of purified RNA was examined with Nanodrop spectrophotometer and 1 μg of RNA was taken for DNase I (Thermo Scientific) treatment conducted according to the manufacturer’s recommendations. DNA-free RNA was subsequently transcribed into cDNA for RT-qPCR procedure using Maxima First Strand cDNA Synthesis kit (Thermo Scientific). Quantitative PCR was performed using QuantStudio 3 Real-Time PCR System (Applied Biosystems) in triplicates for each sample. The protocol used SsoAdvanced Universal SYBR Green Supermix (Bio-Rad) according to manufacturer’s recommendations. Primer sequences used for c-fos and β-actin are provided in Table 5. The relative gene expression was calculated using 2-ΔΔCt method (Livak and Schmittgen, 2001). The relative expression levels of c-fos mRNA were normalized by the geometric mean of β-actin mRNA expression.
Training and unilateral stimulation
Training experiments were performed on adult H. lucorum weighing 25–30 g. Animals were housed in large plastic boxes with increased humidity and fed with cabbage ad libitum. The animals were food deprived for 3 d before experiments. A total of 36 snails were involved in a food-aversion experiment. 12 of them were trained by the association of the novel food (carrot, conditioned stimulus, CS) with a bitter taste (10% quinine hydrochloride solution, unconditioned stimulus, US). Three CS-US paired stimulations were applied to the 12 snails with 10-min intertrial interval. Another 12 out of the 36 snails were used in the unpaired training, with three carrot and three quinine presentations applied in a random order. The remaining 12 snails were used as controls. For each experimental group, five snails were randomly selected and subjected to immunohistochemistry (IHC) analyses 2 h later from the start of the training (Zangenehpour and Chaudhuri, 2002; Barry and Commins, 2017). The remaining seven snails were subjected to behavioral test 24 h later to assess the long-term memory formation.
To reveal the immediate gene activation, 2 h after the stimulus presentation trained, unpaired trained, and control snails (n = 5 in each experimental group) were anesthetized; control snails were anesthetized immediately after the removal from their home boxes. CNS was quickly removed and frozen in liquid nitrogen vapors for IHC analyses. The remaining snails (n = 7 in each experimental group) were used to assess the reaction time latency 24 h after the training. The cabbage was used as a differentiation stimulus (DS). Three CS and three DS were presented to each snail with 10-min intertrial intervals.
To prepare semi-intact preparations for unilateral stimulation experiments, five snails were anesthetized in ice-cold water for 30–40 min followed by the injection of 100- to 150-mg MgCl2 in 1-ml physiologic saline for mollusks. Quinine solution (10%, 600 μl) was applied unilaterally to the chemoreceptive snail lip area and triggered ipsilateral contractions of the skin and mantle bolster. After 2 h from the stimulation, CNS was extracted and frozen in liquid nitrogen vapors for the following IHC.
IHC
For the multiple immunofluorescence reaction, 20-μm serial sections were prepared from snail CNS on freezing microtome Leica CM1950 by the freeze–thaw method. Sections were fixed in fresh ice-cold 4% PFA solution for 7 min and washed in 0.01 M PBS (1× PBS; pH 7.4) three times for 5 min. Then sections were incubated in permeabilization buffer [5% Triton X-100, 5% DMSO and 5% normal horse serum (NHS) in 1× PBS] for 1 h at room temperature and washed. The reaction with primary antibodies (Table 6) was performed in the blocking buffer (1% Triton X-100, 5% DMSO, 5% NHS, 0.01% NaN3 in 1× PBS) overnight at 4°C followed by the washing. For the first experimental series, mouse anti-c-Fos antibodies (sc-8047, Santa-Cruz) were used. In the second series, to confirm the results, we used mouse anti-c-Fos antibodies (sc-166940, Santa-Cruz; Table 6). Then sections were incubated in biotinylated horse anti-mouse immunoglobulin IgG (BA-2000, VectorLabs; Table 6) in the blocking buffer for 2 h at room temperature and washed. After that sections were stained with streptavidin conjugated to Alexa Fluor 568 (S-11226, Invitrogen; Table 6) in the blocking buffer for 2 h at room temperature and washed. For double IHC, mouse anti-c-Fos (sc-166940, Santa-Cruz), rabbit anti-serotonin (S5545, Sigma), and corresponding biotinylated horse anti-mouse followed by streptavidin conjugated to Alexa Fluor 568 and donkey anti-rabbit Alexa Fluor 488 (A-21206, Invitrogen) were used (Table 6). After autofluorescence reduction in 1% Sudan black in 70% ethanol for 20 min followed by washing in 1× PBS, sections were mounted in FluoroMount aqueous mounting medium (Sigma) with fluorescent nuclear counterstain DAPI, coverslipped and sealed with nail polish. No signal was seen in negative control sections processed with primary antibody omission.
Images were obtained by Olympus FluoView 10i confocal laser scanning microscope with UPLSAPO 60×/1.20 W objective and Zeiss LSM800 AiryScan system with LD Plan-Neofluar 40×/0.6 objective. The image analysis was performed in Imaris (Bitplane) and ImageJ (NIH) software. Fluorescence intensities were measured in the outlined nuclei of identified neurons on three sections from each neuron and were further averaged for each snail. Statistical difference was determined using Mann–Whitney test.
Data and code accessibility
The RNA-Seq data and the assembled snail neuronal transcriptome were deposited in the Gene Expression Omnibus (GEO) under the accession number GSE123558. The code is available as Extended Data Code 1.
Extended Data Code 1
The script includes major procedures we used to perform the analyses. It is executed in Shell or R environment. Specifically, this script performed transcriptome assembly, quality examination of the assembled transcriptome, expression level quantification and normalization. Post-quantification analytical steps were also contained in the script, including global pattern exploration by MDS, differential expression analysis, and contig annotation. Download Extended Data 1, TXT file.
Results
Snail transcriptome assembly
We searched for IEGs in the terrestrial snail H. lucorum by comparing the transcriptome composition between the activated and control sections of the nervous system. In the first experiment (E1), we stimulated one half of the PaG network of six snails using serotonin (5-HT) in the presence of the protein synthesis inhibitor anisomycin, with the other half of the PaG network serving as a control (Fig. 1). In the second experiment (E2), we stimulated in the presence of anisomycin a medial part of the PaG containing mostly two giant premotor interneurons (Pa2/3) surgically dissected from the PaG network of the other six snails, and used the dissected Pa2/3 interneurons from the non-stimulated side of the network as a control. This resulted in a total of 12 activated and 12 control samples, dissected from 13 snails (Table 1).
We measured the poly A+ transcriptome of activated and control samples using RNA-Seq yielding a total of 460,233,404 reads with the read length of 100 nucleotides (nt; Table 2). Further, we generated 229,050,797 transcriptome read pairs with the read length of 101 nt and 12,701,459 transcriptome read pairs with the read length of 251 nt from H. lucorum neurons in additional experiments (Table 3). As the genome sequence of the snail H. lucorum or any other closely related invertebrate species was not yet deciphered, we conducted the de novo assembly of the neuronal transcriptome of the snail using all obtained reads.
The de novo transcriptome assembly yielded 693,041 contigs, far more than the anticipated transcript number, indicating the presence of alignment gaps. Nonetheless, the assessment criteria indicated reasonable assembly quality: N50 representing the minimal size of the contigs covering half of the assembled transcriptome sequence equaled 865 bp, the median GC content equaled 41%, and the median open reading frame (ORF) percentage equaled 71% (Extended Data Fig. 1-1).
Extended Data Figure 1-1
Assessment of snails’ transcriptome assembly. Curves represented empirical cumulative distributions of lengths, GC contents, and ORF percentages of all contigs in the transcriptome assembly. Note contigs longer than 2-k bp were not shown in the length distribution to avoid an elongated tail for visualization. Q1, Q2, and Q3: first, second, and third quartiles, respectively. Download Extended Data 1, TIF file.
To test the reliability of our assembly, we conducted conditional reciprocal best BLAST (CRBB) between the snail assembly and the annotated proteome of another gastropod, the California sea slug A. californica (Knudsen et al., 2006). Almost half (49%) of the slug proteins could be reciprocally matched with the assembled snail transcripts, despite >450 million years of evolution separating these two species.
Detection of DE contigs
To quantify gene expression at the contig level, we mapped RNA-Seq reads from each sample to the transcriptome assembly. On average, 60% of reads were mapped to the assembled transcripts (Table 2), resulting in 404,678 (58%) contigs classified as expressed in E1 and 332,423 (48%) contigs in E2. The MDS analysis based on the expression of these contigs revealed the absence of outliers and clustering of samples according to individuals (Extended Data Fig. 1-2).
Extended Data Figure 1-2
Overall patterns of snail samples. A, B, Global patterns across snail samples revealed by MDS analyses based on the expressed contigs in E1 (A) and E2 (B). Different shapes of dots indicated different individuals in each experiment. Colors showed the two groups: red, activated group; blue, control group. Download Figure 2, TIF file.
The statistical analysis identified 350 contigs DE between control and activated samples in E1 and 98 contigs in E2 [exact test, Benjamini and Hochberg (BH) FDR-corrected p < 0.05; fold change > 2]. Of them, 26 contigs were DE in both experiments. All 26 were upregulated after the neuronal activity stimulation, an observation not expected by chance (hypergeometric test, p < 0.0001). Overall, 95% of DE contigs in E1 and 70% of DE contigs in E2 were upregulated after the neuronal activity stimulation, consistent with the known IEGs’ response mechanism (Fig. 2A). Furthermore, the direction of expression difference was in good agreement between the experiments: 83% of the 422 DE contigs representing the union of the two experiments were upregulated and 7% downregulated in both experiments after the neuronal activation (χ2 test, p < 0.0001; Fig. 2A,B).
Annotation of expressed contigs
We then annotated all expressed contigs by translating their nucleotide sequences into the amino acid sequences in all six possible frames and aligning them to the protein sequences from five species: bloodfluck planorb (B. glabrata, Bg), California sea slug (A. californica, Ac), roundworm (C. elegans, Ce), fruit fly (D. melanogaster, Dm), and tiger puffer (T. rubripes, Tr; Fig. 3A).
Extended Data Figure 3-1
Expression regulation of 37 consistently annotated proteins. Heat map showing the log2-transformed fold changes between activated and control samples in E1 and E2 for all contigs corresponding to the 37 consistently annotated proteins derived from the annotation of DE contigs. Protein names colored in red indicated the 20 proteins with more than 80% of their contigs showing upregulation in activated samples. Download Figure 3-1, TIF file.
Out of all expressed contigs in E1 and E2, 49,695 (12.3%) and 44,958 (13.5%) were, respectively, mapped to a total of 12,711 and 12,104 protein sequences in at least one species (Fig. 3B). Of them, 76% and 77% were mapped in multiple species, yielding the consistent annotation in 79% and 81% of the cases (Fig. 3B). On average, each consistently annotated protein was represented by three contigs in each of the two experiments.
Among DE contigs, 46 were consistently mapped in multiple species to 37 proteins (Fig. 3C). Further, 138 contigs not classified as DE were mapped to these 37 proteins. Despite failing to pass the stringent significance cutoff, 86% of these 138 contigs were upregulated in activated samples in at least one experiment, an observation consistent with the expression behavior of DE contigs, and not expected by chance (permutations, p < 0.001; Fig. 3C). Accordingly, for 20 of the 37 proteins, >80% of its all mapped contigs, including both DE and non-DE contigs, showed consistent upregulation in activated samples (permutations, p < 0.004; Fig. 3D; Extended Data Fig. 3-1).
Notably, these 20 proteins contained snail homologs of seven previously characterized mammalian IEGs: c-jun/jund, C/EBP, c-fos/fosl2, Egr1, Ier5l, Socs2, and Dusp10 (Fig. 3D). Among the remaining 13 proteins, nine fell within well-defined Gene Ontology (GO) terms: “response to stimulus” and “immune system process” (Fig. 3D).
In vivo assessment of identified IEGs
To characterize the spatial expression of putative snail IEGs detected in our study, we conducted in situ hybridization experiments in snails’ pedal ganglia (PdG), pleuro-viscero-PaG complex (Par), and cerebral ganglia (CrG) using the customized probe targeting the snail transcript homologues to mammalian fos family IEGs c-fos and fosl2 and fruit fly IEG kayak (c-fos/fosl2 probe; Table 4).
The global neuronal activation of the circumphageal ganglia complex (cerebral, pleural, parietal, pedal and visceral ganglia) by 5 µM 5-HT bath application produced widespread hybridization signals of the c-fos/fosl2 probe within neurons compared to non-stimulated controls in all ganglia (Fig. 4A), which was further confirmed by the increased c-fos expression in activated parts of PaG measured using quantitative PCR (RT-qPCR; Fig. 4B; Table 7). The selective stimulation via the anal nerve elicited clear and specific hybridization patterns with the c-fos/fosl2 probe in the ganglia (Fig. 4C). Comparisons of neurons projecting to the anal nerve identified by the neurobiotin backfill with c-fos/fosl2 hybridization patterns further showed the accurate identification of these neurons by the probe in the PdG and Par, and revealed additional neurons possibly representing the secondary activation response. These results show the potential of identified IEGs to reveal gene activation patterns in the snail nervous system.
To further assess the expression of c-Fos protein homolog in snails, we conducted IHC experiments in the PaG containing the two giant premotor interneurons (Pa2/3) and PdG using the mouse monoclonal antibody against amino acids of c-Fos of human origin (Table 6). The IHC showed increased c-Fos expression in the Pa2/3 nuclei after the in vivo behavioral training using taste aversion paradigm (Fig. 4D–F), and after the unilateral stimulation of the lip by quinine in the semi-intact snail CNS preparation (Fig. 4E,F). The activation of c-Fos in Pa2 neurons after the behavioral training was not directly linked to associative learning (Fig. 4G,H), consistent with studies conducted in mice and other vertebrates (Herdegen and Leah, 1998; Kovacs, 1998; Kaczmarek et al., 1999; Burnham et al., 2010; Lopez et al., 2018; Fredericksen et al., 2019; Möser et al., 2019). Intriguingly, a group of serotoninergic neurons located in PdG showed greater c-Fos upregulation after paired CS and US presentation compared to unpaired ones (Fig. 4G,H). This suggests possible roles of these neurons in the formation of learning and memory in snails.
Discussion
Gastropods, including the sea slug A. californica, the terrestrial slug L. valentianus, the pond snail L. stagnalis, and the terrestrial snail H. lucorum are widely used in studies of nervous system organization and function due to their simpler organization and the presence of large neurons and axons, facilitating electrophysiological readings (Clarac and Pearlstein, 2007). These studies produced fundamental insights into the basic mechanisms of learning and memory formation. The potential of these model systems was not fully used, however, due to insufficient knowledge of the molecular mechanisms accompanying the neuronal activation.
In the absence of the genome sequence or its close homologues, the de novo transcriptome assembly represents a useful approach to identify transcripts and quantify their expression. In our study, we conducted the assembly of the snail neuronal transcriptome using >943 million reads with lengths from 100 to 251 nt cumulatively covering >98 billion positions. The resulting transcriptome contained gaps, with an average of three contigs mapping to the same protein. It is recognized that de novo transcriptome assembly intrinsically generates fewer complete transcripts than reference genome-based methods (Martin and Wang, 2011; Bushmanova et al., 2018). Moreover, the extent of fully-constructed transcripts drops drastically when the number of isoforms per gene increases (Chang et al., 2014). This represents a common issue of de novo transcriptome reconstruction for non-model organisms, which is exacerbated by factors such as homologous or repetitive genomic regions (Treangen and Salzberg, 2011), varying read coverage depths along a transcript (Chow et al., 2014), low-expression transcripts (Zhao et al., 2011) and sequencing artifacts. Taking into account the number of reads generated in our study (n = 943 millions), further improvement in the assembly quality can be achieved by the use of alternative protocols involving longer read lengths. Nonetheless, the assembly quality (N50 = 865 bp) was substantially better than that reported for another land snail Cornu aspersum (N50 = 365 bp; Parmakelis et al., 2017), and comparable to the transcriptome assembly quality of the sea snail Dicathais orbita (N50 = 608 bp; Baten et al., 2016), the land snail Aegista chejuensis (N50 = 788 bp; Kang et al., 2016), and the pond snail L. stagnalis (N50 ranged from 564 to 1614 bp depending on the assembly algorithm; Sadamoto et al., 2012).
The differential expression analysis revealed the substantial gene expression activation after the serotonin stimulation both within a specific ganglion of the snail nervous system (E1) and in a dissected medial part of a ganglion containing identified interneurons (E2). Notably, genes activated in the snail contained a number of homologues to well-characterized mammalian IEGs including c-jun/jund, C/EBP, c-fos/fosl2, Egr1, Ier5l, Socs2, and Dusp10. Some of these genes were characterized in other invertebrates with a similar nervous system organization. Specifically, c-jun was shown to modulate the synaptogenesis between sensory neurons and motor neurons in A. californica (Sung et al., 2006). Similarly, Egr1, a member of the Egr family, was upregulated after long-time sensitization training in A. californica (Cyriac et al., 2013). Another well-characterized mammalian IEG also found in the snail in our study, C/EBP, was detected in all 11 ganglia (Hatakeyama et al., 2004) and undergone expression changes during early memory consolidation in the pond snail L. stagnalis (Hatakeyama et al., 2006). Moreover, C/EBP was shown to be involved in the consolidation phase of long-term facilitation in A. californica (Alberini et al., 1994).
Besides known mammalian IEGs’ homologues, we found a number of not yet annotated putative snail IEGs. Genes activated in our experiments were particularly overrepresented in immune or stimulus response terms, indicating the potential activation of these systems by the serotonin stimulation procedures. Notably, the number of contigs activated after the stimulation of the PaG network exceeded the number detected in a dissected part of the ganglion containing two giant premotor interneurons by more than 3-fold. This observation was not caused by the difference in numbers of contigs detected in the two experiments, and matched reports indicating the variability of IEG repertoire among different neuron types (Guo et al., 2014; Chen et al., 2016).
IEGs are often used as neuronal activity markers to visualize the activity patterns within the nervous system (Wilson et al., 2002; Bepari et al., 2012; Fujita et al., 2013). Our in situ hybridization experiments conducted using the probe specific to the snail homolog of mammalian fos family IEGs c-fos and fosl2, and fly IEG kayak indicated the potential of identified snail transcripts to visualize activated neurons. Designs of antibodies specific to snail IEGs would be the next step to facilitate studies of the gene activation networks in a low-complexity model system.
Acknowledgments
Acknowledgements: We thank Zhisong He for his assistance, Grace Cuddihy for her comments on this manuscript, and all members of the Comparative Biology Group in Shanghai for their helpful suggestions.
Footnotes
The authors declare no competing financial interests.
This work was supported by the Russian Science Foundation Grant 16-14-00220 (to Q.L., O.E., V.N.I., P.M.B., and P.Kh.) and by the National Key R&D Program of China Grant 2017YFA0505700.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.