Research Article: Methods/New Tools, Novel Tools and Methods

Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data

B. Ogan Mancarci, Lilah Toker, Shreejoy J. Tripathy, Brenna Li, Brad Rocco, Etienne Sibille and Paul Pavlidis
eNeuro 20 November 2017, 4 (6) ENEURO.0212-17.2017; DOI: https://doi.org/10.1523/ENEURO.0212-17.2017
B. Ogan Mancarci
1Graduate Program in Bioinformatics, University of British Columbia, Vancouver V6T 1Z4, Canada
2Department of Psychiatry, University of British Columbia, Vancouver V6T 2A1, Canada
3Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, Canada
Lilah Toker
2Department of Psychiatry, University of British Columbia, Vancouver V6T 2A1, Canada
3Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, Canada
Shreejoy J. Tripathy
2Department of Psychiatry, University of British Columbia, Vancouver V6T 2A1, Canada
3Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, Canada
Brenna Li
2Department of Psychiatry, University of British Columbia, Vancouver V6T 2A1, Canada
3Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, Canada
Brad Rocco
4Campbell Family Mental Health Research Institute of CAMH
5Department of Psychiatry and the Department of Pharmacology and Toxicology, University of Toronto, Toronto M5S 1A8, Canada
Etienne Sibille
4Campbell Family Mental Health Research Institute of CAMH
5Department of Psychiatry and the Department of Pharmacology and Toxicology, University of Toronto, Toronto M5S 1A8, Canada
Paul Pavlidis
2Department of Psychiatry, University of British Columbia, Vancouver V6T 2A1, Canada
3Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, Canada


Abstract

Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single-cell RNA-sequencing (RNA-seq) studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets (MGSs) in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at www.neuroexpresso.org.

  • cell type
  • gene expression
  • marker gene
  • microarray
  • RNA sequencing

Significance Statement

Cell type markers are powerful tools in the study of the nervous system that help reveal properties of cell types and acquire additional information from large scale expression experiments. Despite their usefulness in the field, known marker genes for brain cell types are few in number. We present NeuroExpresso, a database of brain cell type-specific gene expression profiles, and demonstrate the use of marker genes for acquiring cell type-specific information from whole tissue expression. The database will prove itself as a useful resource for researchers aiming to reveal novel properties of the cell types and aid both laboratory and computational scientists to unravel the cell type-specific components of brain disorders.

Introduction

Brain cells can be classified based on features such as their primary type (e.g., neurons vs glia), location (e.g., cortex, hippocampus, cerebellum), electrophysiological properties (e.g., fast spiking vs regular spiking), morphology (e.g., pyramidal cells, granule cells), or the neurotransmitter/neuromodulator they release (e.g., dopaminergic cells, serotonergic cells, GABAergic cells). Marker genes, genes that are expressed in a specific subset of cells, are often used in combination with other cellular features to define different types of cells (Margolis et al., 2006; Hu et al., 2014) and facilitate their characterization by tagging the cells of interest for further studies (Tomomura et al., 2001; Lobo et al., 2006; Handley et al., 2015). Marker genes have also found use in the analysis of whole tissue “bulk” gene expression profiling data, which can be challenging to interpret because the source of an observed expression change is difficult to determine. For example, a decrease in a transcript level can indicate a regulatory event affecting the expression level of the gene, a decrease in the number of cells expressing the gene, or both. To address this issue, computational methods have been proposed to estimate cell type-specific proportion changes based on expression patterns of known marker genes (Xu et al., 2013; Chikina et al., 2015; Newman et al., 2015; Westra et al., 2015). Finally, marker genes are obvious candidates for having cell type-specific functional roles.

An ideal cell type marker has a strongly enriched expression in a single cell type in the brain. However, this criterion can rarely be met, and for many purposes, cell type markers can be defined within the context of a certain brain region; namely, a useful marker may be specific for the cell type in one region but not necessarily in another region or brain wide. For example, the calcium binding protein parvalbumin (PV) is a useful marker of both fast spiking interneurons in the cortex and Purkinje cells in the cerebellum (Celio and Heizmann, 1981; Kawaguchi et al., 1987). Whether the markers are defined brain-wide or in a region-specific context, the confidence in their specificity is established by testing their expression in as many different cell types as possible. This is important because a marker identified by comparing just two cell types might turn out to be expressed in a third, untested cell type, reducing its utility.

During the last decade, targeted purification of cell types of interest followed by gene expression profiling has been applied to many cell types in the brain. Such studies, targeted towards well-characterized cell types, have greatly promoted our understanding of the functional and molecular diversity of these cells (Chung et al., 2005; Cahoy et al., 2008; Doyle et al., 2008). However, individual studies of this kind are limited in their ability to discover specific markers as they often analyze only a small subset of cell types (Sugino et al., 2006; Okaty et al., 2009; Shrestha et al., 2015) or have limited resolution as they group subtypes of cells together (Cahoy et al., 2008). Recently, advances in technology have enabled the use of single-cell transcriptomics as a powerful tool to dissect neuronal diversity and derive novel molecular classifications of cells (Poulin et al., 2016). However, with single-cell analysis the classification of cells into different types is generally done post hoc, based on the clustering similarity in their gene expression patterns. These molecularly defined cell types are often uncharacterized otherwise (e.g., electrophysiologically, morphologically), which complicates both their identification outside of the original study and the understanding of their role in normal and pathological brain function. A notable exception is the single-cell RNA-sequencing (RNA-seq) study of Tasic et al. (2016), which analyzed single labelled cells from transgenic mouse lines to facilitate matching of the molecularly defined cell types they discovered to previously identified cell types. We hypothesized that by aggregating cell type-specific studies that analyze expression profiles of cell types previously defined in the literature, a more comprehensive dataset, better suited to marker gene identification, could be derived.

Here, we report the analysis of an aggregated cross-laboratory dataset of cell type-specific expression profiling experiments from mouse brain, composed both of pooled cell microarray data and single-cell RNA-seq data. We used these data to identify sets of brain cell marker genes more comprehensive than any previously reported, and validated the marker genes in external mouse and human single-cell datasets. We further show that the identified markers are applicable for the analysis of human brain and demonstrate the usage of marker genes in the analysis of bulk tissue data via the summarization of their expression into marker gene profiles (MGPs), which can be cautiously interpreted as correlates of cell type proportion. Finally, we made both the cell type expression profiles and marker sets available to the research community at www.neuroexpresso.org.

Materials and Methods

Figure 1A depicts the workflow and the major steps of this study. All the analyses were performed in R version 3.3.2; the R code and data files can be accessed through www.neuroexpresso.org (RRID: SCR_015724) or directly from https://github.com/oganm/neuroexpressoAnalysis.

Figure 1.

Mouse brain cell type-specific expression database compiled from publicly available datasets. A, Workflow of the study. Cell type-specific expression profiles are collected from publicly available datasets and personal communications. Acquired samples are grouped based on cell type and brain region. Marker genes are selected per brain region for all cell types. Marker genes are biologically and computationally validated and used in estimation of cell type proportions. B, Brain region hierarchy used in the study. Samples were included in a brain region based on the region they were extracted from. For instance, dopaminergic cells isolated from the midbrain were included when selecting marker genes in the context of brainstem and whole brain, and microglia extracted from whole brain isolates were added to all brain regions.

Pooled cell type-specific microarray datasets

We began with a collection of seven studies of isolated cell types from the brain, compiled by Okaty et al. (2011). We expanded this by querying PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/; RRID: SCR_007303; Edgar et al., 2002; Barrett et al., 2013) for cell type-specific expression datasets from the mouse brain that used Mouse Expression 430A Array (GPL339) or Mouse Genome 430 2.0 Array (GPL1261) platforms. We focused on these platforms because, together, they are the most popular platforms for analysis of mouse samples and are relatively comprehensive in gene coverage, and because using a reduced range of platforms reduced technical issues in combining studies. Query terms included names of specific cell types (e.g., astrocytes, pyramidal cells) along with blanket terms such as “brain cell expression” and “purified brain cells.” Only samples derived from postnatal (>14 d), wild-type, untreated animals were included. Datasets obtained from cell cultures or cell lines were excluded due to the reported expression differences between cultured cells and primary cells (Cahoy et al., 2008; Halliwell, 2003; Januszyk et al., 2015). We also considered RNA-seq data from pooled cells (2016; Zhang et al., 2014), but because such datasets are not available for many cell types, including them in the merged resource was not technically feasible without introducing biases (although we were able to incorporate a single-cell RNA-seq dataset, described in the next section). While we plan to incorporate more pooled cell RNA-seq data in the future, for this study we limited their use to validation of marker selection.

As a first step in the quality control of the data, we manually validated that each sample expressed the gene that was used as a marker for purification of the corresponding cell type in the original publication (expression greater than median expression among all gene signals in the dataset), along with other well-established marker genes for the relevant cell type (e.g., Pcp2 for Purkinje cells, Gad1 for GABAergic interneurons). We next excluded contaminated samples, namely, samples expressing established marker genes of nonrelated cell types at levels comparable to the cell type marker itself (for example, neuronal samples expressing high levels of glial marker genes), which led to the removal of 21 samples. In total, we have 30 major cell types compiled from 24 studies represented by microarray data (summarized in Table 1; a complete list of all samples, including those removed, is available from the authors).

Table 1.

Cell types in NeuroExpresso database

Single-cell RNA-seq data

The study of cortical single cells by Tasic et al. (2016) includes a supplementary file (Tasic et al., 2016, their supplementary Table 7) linking a portion of the molecularly defined cell clusters to known cell types previously described in the literature. Using this file, we matched the cell clusters from Tasic et al. (2016) with pooled cortical cell types represented by microarray data (Table 2). For most cell types represented by microarray (e.g., glial cells, Martinotti cells), the matching was based on the correspondence information provided by Tasic et al. (2016). However, for some of the cell clusters from Tasic et al. (2016), the cell types were matched manually, based on the description of the cell type in the original publication (e.g., cortical layer, high expression of a specific gene). For example, Glt25d2+ pyramidal cells from Schmidt et al. (2012), described by the authors as “layer 5b pyramidal cells with high Glt25d2 and Fam84b expression,” were matched with two cell clusters from Tasic et al. (2016), “L5b Tph2” and “L5b Cdh13,” two of the three clusters described as layer 5b glutamatergic cells by Tasic et al. (2016), since both of these clusters represented pyramidal cells from cortical layer 5b and exhibited high levels of the indicated genes. Cell clusters identified in Tasic et al. (2016) that did not match any of the pooled cell types were integrated into the combined data if they fulfilled the following criteria: (1) they represented well-characterized cell types, and (2) we could determine with high confidence that they did not correspond to more than one cell type represented by microarray data. Table 2 contains information regarding the matching between pooled cell types from microarray data and cell clusters from single-cell RNA-seq data from Tasic et al. (2016).

Table 2.

Matching single-cell RNA sequencing data from Tasic et al. (2016) to well-defined cell types

In total, the combined database contains expression profiles for 36 major cell types, 10 of which are represented by both pooled cell microarray and single-cell RNA-seq data, and five of which are represented by single-cell RNA-seq only (summarized in Table 2). Due to the substantial differences between microarray and RNA-seq technologies, we analyzed these data separately (see next sections). For visualization only, in neuroexpresso.org we rescaled the RNA-seq data to allow them to be plotted on the same axes. Details are provided on the web site.

Grouping and reassignment of cell type samples

When possible, samples were assigned to specific cell types based on the descriptions provided in their associated publications. When expression profiles of closely related cell types were too similar to each other and we could not find a sufficient number of differentiating marker genes meeting our criteria, they were grouped together into a single cell type. For example, A10 and A9 dopaminergic cells had no distinguishing markers (represented on the microarray platform and meeting our criteria) and were grouped as “dopaminergic neurons.” In the case of pyramidal cells, while we were able to detect marker genes for pyramidal cell subtypes, they were often few in number and most of them were not represented on the human microarray chip (Affymetrix Human Exon 1.0 ST Array) used in the downstream analysis. As a result, calculation of MGPs in human bulk tissue would not be feasible for the majority of these cell types. To address this, we created two gene lists, one created by considering pyramidal subtypes as separate cell types, and another where pyramidal subtypes are pooled into a pan-pyramidal cell type. Due to the scarcity of markers for pyramidal subtypes, we only consider the pan-pyramidal cell type in our downstream analysis. However, we still kept the pyramidal subtypes separate during marker gene selection (described below) for the nonpyramidal cell types to help ensure marker specificity.

Since our focus was identifying markers specific to cell types within a given brain region, samples were grouped based on the brain region from which they were isolated, guided by the anatomical hierarchy of brain regions (Fig. 1B). Brain subregions (e.g., locus coeruleus) were added to the hierarchy if there were multiple cell types represented in the subregion. Glial samples are an exception to this region assignment process. Since these samples were only available from either cortex or cerebellum regions or extracted from whole brain, the following assignments were made: cerebral cortex-derived astrocyte and oligodendrocyte samples were included in the analysis of other cerebral regions as well as thalamus, brainstem and spinal cord. Bergmann glia and cerebellum-derived oligodendrocytes were used in the analysis of cerebellum. The only microglia samples available were isolated from whole brain homogenates and were included in the analysis of all brain regions.
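The hierarchical grouping can be illustrated with a minimal Python sketch (the study itself used R). Only the midbrain, brainstem, whole brain chain comes from the text; the dictionary layout, the other region names, and the function names are assumptions made for illustration, and the special handling of glial samples is not covered.

```python
# Hypothetical fragment of the region hierarchy, stored as child -> parent.
# Only the midbrain -> brainstem -> whole brain chain is taken from the text.
PARENT = {
    "midbrain": "brainstem",
    "brainstem": "whole brain",
    "cerebellum": "whole brain",
    "cortex": "cerebrum",
    "cerebrum": "whole brain",
    "whole brain": None,
}

def regions_for_sample(source_region):
    """Return the source region followed by all of its ancestors."""
    regions = []
    region = source_region
    while region is not None:
        regions.append(region)
        region = PARENT[region]
    return regions

def samples_in_region(samples, region):
    """Select samples isolated from `region` or from any region beneath it."""
    return [name for name, source in samples
            if region in regions_for_sample(source)]

# Hypothetical sample list: (cell type, region of isolation).
samples = [("dopaminergic", "midbrain"), ("Purkinje", "cerebellum")]
print(regions_for_sample("midbrain"))
print(samples_in_region(samples, "brainstem"))
```

Storing only the parent of each region keeps the lookup simple: a sample contributes to marker selection in every region on the path from its source region to the root.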

Selection of cell type markers

Marker gene sets (MGSs) were selected for each cell type in each brain region, based on fold change and clustering quality (see below). For cell types that are represented by both microarray and single-cell data (cortical cells), two sets of MGSs were created and later merged as described below. Since there is no generally accepted definition of “marker gene”, our goal was to identify markers that were sufficiently specific and highly expressed to be useful in computational settings, but also likely to be of interest for potential laboratory applications. Thus, our threshold selections were guided in part by the expression patterns of previously well-established markers as well as our intended applications.

Marker genes were selected for each brain region based on the following steps:

  1. For RNA-seq data, each of the relevant clusters identified in Tasic et al. (2016) was considered as a single sample, where the expression of each gene was calculated by taking the mean RPKM values of the individual cells representing the cluster. Table 2 shows which clusters represent which cell types.

  2. Expression level of a gene in a cell type was calculated by taking the mean expression of all replicate samples originating from the same study and averaging the resulting values across different studies per cell type.

  3. The quality of clustering was determined by “mean silhouette coefficient” and “minimal silhouette coefficient” values, where the silhouette coefficient is a measure of group dissimilarity ranging between -1 and 1 (Rousseeuw, 1987). Mean silhouette coefficient was calculated by assigning the samples representing the cell type of interest to one cluster and samples from the remaining cell types to another, and then calculating the mean silhouette coefficient of all samples. The minimal silhouette coefficient is the minimal value of mean silhouette coefficient when it is calculated for samples representing the cell type of interest in comparison to samples from each of the remaining cell types separately. The two measures were used to ensure that the marker gene robustly differentiates the cell type of interest from other cell types. Silhouette coefficients were calculated with the “silhouette” function from the “cluster” R package version 1.15.3 (Maechler et al., 2016), using the expression difference of the gene between samples as the distance metric.

  4. A background expression value was defined as expression below which the signal cannot be discerned from noise. Different background values were selected for microarray (6, all values are log2 transformed) and RNA-seq (0.1 RPKM) data due to the differences in their distribution.
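Steps 2 and 3 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the function names and expression values are hypothetical, singleton clusters are given a silhouette width of 0, and the minimal silhouette is averaged over the samples of the cell type of interest only, which is one possible reading of the description.

```python
from statistics import mean

def cell_type_expression(values_by_study):
    """Step 2: average replicates within each study, then across studies."""
    return mean(mean(values) for values in values_by_study.values())

def silhouette_widths(own, other):
    """Silhouette width of each value in `own` against the cluster `other`.

    The distance between two samples is their absolute expression difference.
    """
    widths = []
    for i, x in enumerate(own):
        rest = own[:i] + own[i + 1:]
        if not rest:  # singleton cluster: width set to 0 (assumed convention)
            widths.append(0.0)
            continue
        a = mean(abs(x - y) for y in rest)    # cohesion within own cluster
        b = mean(abs(x - y) for y in other)   # separation from other cluster
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths

def mean_silhouette(expr, cell_type):
    """Cell type of interest vs all remaining samples, averaged over all samples."""
    target = expr[cell_type]
    others = [v for name, vals in expr.items() if name != cell_type for v in vals]
    return mean(silhouette_widths(target, others) + silhouette_widths(others, target))

def minimal_silhouette(expr, cell_type):
    """Smallest mean width of the target samples against each other cell type."""
    target = expr[cell_type]
    return min(mean(silhouette_widths(target, vals))
               for name, vals in expr.items() if name != cell_type)

# Hypothetical log2 expression of one gene across three cell types.
expr = {"Purkinje": [11.0, 11.4, 10.8], "Granule": [6.1, 6.3], "Astrocyte": [6.0, 6.5, 6.2]}
print(mean_silhouette(expr, "Purkinje"), minimal_silhouette(expr, "Purkinje"))
```

A gene that separates the cell type of interest cleanly yields values close to 1 for both measures, whereas a gene whose expression overlaps with even one other cell type yields a low or negative minimal silhouette.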

Based on these metrics, the following criteria were used:

  1. A threshold expression level was selected to help ensure that the gene’s transcripts will be detectable in bulk tissue. Genes with median expression level below this threshold were excluded from further analyses. For microarrays, this threshold was chosen to be 8. Theoretically, if a gene has an expression level of 8 in a cell type, and the gene is specific to the cell type, an expression level of 6 would be observed if 1/8th of a bulk tissue is composed of the cell type. As many of the cell types in the database are likely to be as rare as or rarer than 1/8th, and 6 is generally close to background for these data, we picked 8 as a lower level of marker gene expression. For RNA-seq data, we selected a threshold of 2.5 RPKM, which in terms of quantiles corresponds to the microarray level of 8.

  2. If the expression level in the cell type of interest is higher than 10 times the background threshold, there must be at least a 10-fold difference from the median expression level of the remaining cell types in the region. If the expression level in the cell type is less than 10 times the background, the expression level must be higher than the expression level of every other cell type in that region. This criterion was added because below this expression level, for a 10-fold expression change to occur, the expression median of other cell types needs to be lower than the background. Values below the background do not convey meaningful information but can prevent potentially useful marker genes from being selected.

  3. The mean silhouette coefficient for the gene must be higher than 0.5 and the minimum silhouette coefficient must be greater than zero for the associated cell type.

  4. The conditions above must be satisfied only by a single cell type in the region.
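The four criteria above can be combined into a single check, sketched here in Python for the microarray thresholds. The sketch assumes that “10 times” and “10-fold” are read on the linear scale, so that they correspond to a difference of log2(10) on the log2 scale; that reading, the function names, and the example values are assumptions rather than details given in the text. Silhouette coefficients are passed in as precomputed (mean, minimal) pairs.

```python
import math
from statistics import median

BACKGROUND = 6.0          # microarray background, log2 scale (from the text)
THRESHOLD = 8.0           # minimum expression in the cell type, log2 scale (from the text)
TENFOLD = math.log2(10)   # assumption: "10-fold" is interpreted on the linear scale

def passes_expression_criteria(expr, cell_type):
    """Criteria 1 and 2. `expr` maps cell type -> log2 expression of one gene."""
    level = expr[cell_type]
    rest = [v for name, v in expr.items() if name != cell_type]
    if level < THRESHOLD:
        return False
    if level > BACKGROUND + TENFOLD:
        # Well above background: require a 10-fold difference from the median.
        return level - median(rest) >= TENFOLD
    # Close to background: require the highest expression in the region.
    return all(level > v for v in rest)

def select_marker_cell_type(expr, sil):
    """Criteria 3 and 4. `sil` maps cell type -> (mean, minimal) silhouette."""
    passing = [ct for ct in expr
               if passes_expression_criteria(expr, ct)
               and sil[ct][0] > 0.5 and sil[ct][1] > 0]
    return passing[0] if len(passing) == 1 else None

# Hypothetical values for one gene in one region.
expr = {"Purkinje": 11.0, "Granule": 6.2, "Bergmann": 6.0}
sil = {"Purkinje": (0.9, 0.8), "Granule": (-0.1, -0.3), "Bergmann": (-0.2, -0.4)}
print(select_marker_cell_type(expr, sil))
```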

To ensure robustness against outlier samples, we used the following randomization procedure, repeated 500 times: one third (rounded) of all samples were removed. For microarray data, to prevent large studies from dominating the silhouette coefficient, when studies representing the same cell types did not have an equal number of samples, N samples were picked randomly from each of the studies, where N is the smallest number of samples coming from a single study. A gene was selected if it met our criteria in more than 95% of all permutations.
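A minimal Python sketch of this randomization procedure follows. The function names, the fixed seed, the toy criterion, and the example values are assumptions for illustration; in the study the criterion would be the full set of marker selection rules.

```python
import random

def balance_studies(samples_by_study, rng):
    """Keep N random samples per study, N being the smallest study size."""
    n = min(len(samples) for samples in samples_by_study.values())
    return {study: rng.sample(samples, n)
            for study, samples in samples_by_study.items()}

def robust_selection(samples, criterion, n_iter=500, min_fraction=0.95, seed=0):
    """Return True if `criterion` holds in more than `min_fraction` of subsamples."""
    rng = random.Random(seed)
    n_remove = round(len(samples) / 3)  # one third of the samples, rounded
    passes = 0
    for _ in range(n_iter):
        kept = rng.sample(samples, len(samples) - n_remove)
        if criterion(kept):
            passes += 1
    return passes / n_iter > min_fraction

def separates_target(kept):
    """Toy criterion: every target sample exceeds every other sample."""
    target = [v for ct, v in kept if ct == "target"]
    others = [v for ct, v in kept if ct != "target"]
    return bool(target) and bool(others) and min(target) > max(others)

# Hypothetical log2 expression values, with and without one outlier sample.
clean = ([("target", v) for v in (10.0, 10.2, 10.1, 9.9, 10.3, 10.0)]
         + [("other", v) for v in (6.1, 6.2, 6.0, 6.3, 6.1, 6.2)])
with_outlier = clean[:5] + [("target", 5.5)] + clean[6:]
print(robust_selection(clean, separates_target))
print(robust_selection(with_outlier, separates_target))
```

In the second call the single low target sample is retained in roughly two thirds of the subsamples, so the gene falls well short of the 95% requirement.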

Our next step was combining the MGSs created from the two expression data types. For cell types and genes represented by both microarray and RNA-seq data, we first looked at the intersection between the MGSs. For most of the cell types, the overlap between the two MGSs was about 50%. We reasoned that this could be partially due to numerous “near misses” in both data sources. Namely, since our method for marker gene selection relies on multiple steps with hard thresholds, it is very likely that some genes were not selected simply because they were just below one of the required thresholds. We thus adopted a soft intersection: a gene was considered as a marker if it fulfilled the marker gene criteria in one data source (pooled cell microarray or single-cell RNA-seq), and its expression in the corresponding cell type from the other data source was higher than in any other cell type in that region. For example, Ank1 was originally selected as a marker of FS Basket cells based on microarray data, but did not fulfil our selection criteria based on RNA-seq data. However, the expression level of Ank1 in the RNA-seq data is higher in FS Basket cells than in any other cell type from this data source, and thus, based on the soft intersection criterion, Ank1 is considered as a marker of FS Basket cells in our final MGS. For genes and cell types that were only represented by one data source, the selection was based on this data source only.
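The soft intersection can be sketched in Python as below. The data layout, function names, the gene "GeneX", and all expression values are hypothetical; only the rule itself and the Ank1 example come from the text.

```python
def supported(expr, gene, cell_type):
    """True if `gene` is highest in `cell_type` in this data source.

    A gene absent from the source counts as supported, so that selection
    falls back on the other source alone.
    """
    levels = expr.get(gene)
    if levels is None:
        return True
    target = levels[cell_type]
    return all(target > v for ct, v in levels.items() if ct != cell_type)

def soft_intersection(markers_array, markers_seq, expr_array, expr_seq, cell_type):
    """Genes passing the full criteria in one source and supported by the other."""
    from_array = {g for g in markers_array if supported(expr_seq, g, cell_type)}
    from_seq = {g for g in markers_seq if supported(expr_array, g, cell_type)}
    return from_array | from_seq

# Hypothetical marker sets and expression values for FS Basket cells.
markers_array = {"Ank1", "GeneX"}   # passed all criteria in microarray data
markers_seq = set()                 # passed all criteria in RNA-seq data
expr_array = {}
expr_seq = {
    "Ank1": {"FS Basket": 5.0, "Martinotti": 1.0, "Pyramidal": 0.5},
    "GeneX": {"FS Basket": 2.0, "Martinotti": 4.0, "Pyramidal": 0.5},
}
print(soft_intersection(markers_array, markers_seq, expr_array, expr_seq, "FS Basket"))
```

Here Ank1 is retained because its RNA-seq expression is highest in FS Basket cells, while the hypothetical GeneX is dropped because another cell type expresses it more strongly in the second source.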

It can be noted that some previously described markers (such as Prox1 for dentate gyrus granule cells) are absent from our marker gene lists. In some cases, this is due to the absence of the genes from the microarray platforms used, while in other cases the genes failed to meet our stringent selection criteria. Final marker gene lists, along with the data used to generate them, can be found at http://hdl.handle.net/11272/10527, also available from http://pavlab.msl.ubc.ca/supplement-to-mancarci-et-al-neuroexpresso/.

Human homologues of mouse genes were defined by NCBI HomoloGene (ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build68/homologene.data).

Microglia-enriched genes

Microglia expression profiles differ significantly between activated and inactivated states and to our knowledge, the samples in our database represent only the inactive state (Holtman et al., 2015). In order to acquire marker genes with stable expression levels regardless of microglia activation state, we removed the genes differentially expressed in activated microglia based on Holtman et al. (2015). This step resulted in removal of 408 out of the original 720 microglial genes in cortex (microarray and RNA-seq lists combined) and 253 of the 493 genes in the context of other brain regions (without genes from single-cell data). Microglial marker genes which were differentially expressed in activated microglia are referred to as Microglia_activation and Microglia_deactivation (up- or downregulated, respectively) in the marker gene lists provided.

S100a10+ pyramidal cell-enriched genes

The paper describing the cortical S100a10+ pyramidal cells (Schmidt et al., 2012) emphasizes the existence of non-neuronal cells expressing S100a10. Schmidt et al. (2012), therefore, limited their analysis to 7853 genes specifically expressed in neurons and advised third-party users of the data to do so as well. Since the contamination caveat concerned only the microarray samples from Schmidt et al. (2012; the only source of S100a10+ pyramidal cells in microarray data), we removed marker genes selected for S100a10+ pyramidal cells based on the microarray data if they were not among the 7853 genes indicated in Schmidt et al. (2012). We also removed S100a10 itself since, based on the authors’ description, it was not specific to this cell type. In total, 36 of the 47 S100a10+ pyramidal cell genes originally selected based on microarray data were removed in this step. Of note, none of the removed genes were selected as markers of S100a10+ pyramidal cells based on RNA-seq data.

Dentate gyrus granule cell-enriched genes

We used data from Cembrowski et al. (2016; Hipposeq, RRID: SCR_015730) for validation and refinement of dentate gyrus granule cell markers (as noted above, these data are not currently included in NeuroExpresso for technical reasons). FPKM values were downloaded (GEO accession GSE74985) and log2 transformed. Based on these values, dentate gyrus granule cell marker genes were removed if their expression in Hipposeq data (mean of dorsal and ventral granule cells) was lower than in other cell types represented in this dataset. In total, 15 of the 39 originally selected genes were removed in this step.

In situ hybridization (ISH)

Male C57BL/6J (RRID: IMSR_JAX:000664) mice aged 13–15 weeks at time of killing were used (n = 5). Mice were euthanized by cervical dislocation and then the brain was quickly removed, frozen on dry ice, and stored at −80°C until sectioned via cryostat. Brain sections containing the sensorimotor cortex were cut along the rostral-caudal axis using a block advance of 14 μm, immediately mounted on glass slides and dried at room temperature (RT) for 10 min, and then stored at −80°C until processed using multilabel fluorescent ISH procedures.

Fluorescent ISH probes were designed by Advanced Cell Diagnostics to detect mRNA encoding Cox6a2, Slc32a1, and Pvalb. Two sections per animal were processed using the RNAscope 2.5 Assay as previously described (Wang et al., 2012). Briefly, tissue sections were incubated in a protease treatment for 30 min at RT and then the probes were hybridized to their target mRNAs for 2 hours at 40°C. The sections were exposed to a series of incubations at 40°C that amplifies the target probes, and then counterstained with NeuroTrace blue-fluorescent Nissl stain (1:50; Molecular Probes) for 20 min at RT. Cox6a2, Pvalb, and Slc32a1 were detected with Alexa Fluor 488, Atto 550, and Atto 647, respectively.

Data were collected on an Olympus IX83 inverted microscope equipped with a Hamamatsu Orca-Flash4.0 V2 digital CMOS camera using a 60x 1.40 NA SC oil immersion objective. The equipment was controlled by cellSens (Olympus). 3D image stacks (2D images successively captured at intervals separated by 0.25 μm in the z-dimension) that are 1434 × 1434 pixels (155.35 × 155.35 μm) were acquired over the entire thickness of the tissue section. The stacks were collected using optimal exposure settings (i.e., those that yielded the greatest dynamic range with no saturated pixels), with differences in exposures normalized before analyses.

Laminar boundaries of the sensorimotor cortex were determined by cytoarchitectonic criteria using NeuroTrace labeling. Fifteen image stacks across the gray matter area spanning from layer 2 to 6 were sampled in a systematic random manner using a sampling grid of 220 × 220 μm², which yielded a total of 30 image stacks per animal (two sections per animal). Every NeuroTrace-labeled neuron within a 700 × 700 pixel counting frame was included for analyses; the counting frame was placed in the center of each image to ensure that the entire NeuroTrace-labeled neuron was in the field of view. The percentage (±standard deviation) of NeuroTrace-labeled cells containing Cox6a2 mRNA (Cox6a2+) and that did not contain Slc32a1 mRNA (Slc32a1-), that contained Slc32a1 but not Pvalb mRNA (Slc32a1+/Pvalb-), and that contained both Slc32a1 and Pvalb mRNAs (Slc32a1+/Pvalb+) were manually assessed.

Allen Brain Atlas (ABA) ISH data

We downloaded ISH images using the ABA API (http://help.brain-map.org/display/mousebrain/API). Assessment of expression patterns was done by visual inspection. If a probe used in an ISH experiment did not show expression in the region, an alternative probe targeting the same gene was sought. If none of the probes showed expression in the region, the gene was considered to be not expressed.

Validation of marker genes using external single-cell data

Mouse cortex single-cell RNA sequencing (RNA-seq) data were acquired from Zeisel et al. (2015; available from http://linnarssonlab.org/cortex/, GEO accession: GSE60361, 1691 cells). Human single-cell RNA sequencing data were acquired from Darmanis et al. (2015; GEO accession: GSE67835, 466 cells). For both datasets, pre-processed expression data were encoded in a binary matrix with 1 representing any nonzero value. For all MGSs, Spearman’s ρ was used to quantify internal correlation. A null distribution was estimated by calculating the internal correlation of 1000 randomly selected prevalence-matched gene groups. Gene prevalence was defined as the total number of cells with a nonzero expression value for the gene. Prevalence matching was done by choosing a random gene with a prevalence of ±2.5% of the prevalence of the marker gene; p values were calculated by comparing the internal correlation of the MGS to the internal correlations of random gene groups using the Wilcoxon rank-sum test.
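The binarization and prevalence-matching steps described above can be sketched as follows. This is a minimal Python illustration using only the standard library, not the authors' R code. All function names (binarize, prevalence, matched_random_group, internal_correlations) are hypothetical. Two readings are assumptions: that "±2.5%" is a tolerance relative to the marker's prevalence, and that marker genes and already-drawn genes are excluded from the random draw. For 0/1 vectors, Spearman's ρ equals the Pearson correlation, because the tied ranks are a linear transformation of the binary values, so a plain Pearson formula is used.

```python
import random
from itertools import combinations
from math import sqrt

def binarize(expr):
    """Map each gene's expression vector to 0/1 (1 = any nonzero value)."""
    return {gene: [1 if v != 0 else 0 for v in values]
            for gene, values in expr.items()}

def prevalence(binary):
    """Number of cells with nonzero expression, per gene."""
    return {gene: sum(values) for gene, values in binary.items()}

def matched_random_group(markers, prev, tolerance=0.025, rng=random):
    """Draw one random gene per marker whose prevalence lies within
    +/- tolerance (relative; an assumption) of the marker's prevalence.
    Markers and already-drawn genes are excluded (also an assumption)."""
    group = []
    for marker in markers:
        lo = prev[marker] * (1 - tolerance)
        hi = prev[marker] * (1 + tolerance)
        candidates = [g for g, p in prev.items()
                      if lo <= p <= hi and g not in markers and g not in group]
        if candidates:
            group.append(rng.choice(candidates))
    return group

def pearson(x, y):
    """Pearson correlation; returns None if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def internal_correlations(genes, binary):
    """Pairwise correlations among the genes of one group."""
    pairs = (pearson(binary[g], binary[h]) for g, h in combinations(genes, 2))
    return [r for r in pairs if r is not None]
```

Repeating matched_random_group 1000 times and collecting internal_correlations for each draw would give a null distribution against which the correlations of the marker set can be compared.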

Preprocessing of microarray data

For comparison of MGPs in white matter and frontal cortex, we acquired expression data from pathologically healthy brain samples from Trabzuni et al. (2013; GEO accession: GSE60862). For estimation of dopaminergic MGPs in Parkinson’s disease (PD) patients and controls, we acquired substantia nigra expression data from the Lesnick et al. (2007; GSE7621), Moran et al. (2006; GSE8397), and Zhang et al. (2005; GSE20295) studies. Expression data for the Stanley Medical Research Institute (SMRI), which included postmortem prefrontal cortex samples from bipolar disorder, major depression and schizophrenia patients along with healthy donors, were acquired through https://www.stanleygenomics.org/, study identifier 2.

All microarray data used in the study were pre-processed and normalized with the “rma” function of the “oligo” (RRID: SCR_015729; Affymetrix gene arrays) or “affy” (RRID: SCR_012835; Affymetrix 3’IVT arrays; Carvalho and Irizarry, 2010) R packages. Probeset to gene annotations were obtained from Gemma (Zoubarev et al., 2012; https://gemma.msl.ubc.ca/). Probesets with maximal expression level lower than the median among all probeset signals were removed. Of the remaining probesets, whenever several probesets were mapped to the same gene, the one with the highest variance among the samples was selected for further analysis.
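The two probeset filtering rules above can be illustrated with a short sketch. This is Python using only the standard library, not the authors' R pipeline. The function name filter_probesets and the dictionary-based inputs are hypothetical. Keeping the first-encountered probeset when variances are tied is an assumption.

```python
from statistics import median, variance

def filter_probesets(expr, probeset_to_gene):
    """expr: dict probeset -> expression values across samples.
    Returns dict gene -> the single probeset chosen for that gene."""
    # Rule 1: drop probesets whose maximum is below the median of all signals.
    all_signals = [v for values in expr.values() for v in values]
    cutoff = median(all_signals)
    kept = {p: vals for p, vals in expr.items() if max(vals) >= cutoff}

    # Rule 2: per gene, keep the probeset with the highest variance.
    best = {}
    for probeset, vals in kept.items():
        gene = probeset_to_gene.get(probeset)
        if gene is None:
            continue  # unannotated probeset
        if gene not in best or variance(vals) > variance(kept[best[gene]]):
            best[gene] = probeset
    return best
```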

Outliers and mislabelled samples were removed when applicable, if they were identified as an outlier in provided metadata, if expression of sex-specific genes did not match the sex provided in metadata (Toker et al., 2016), or if they clustered with data from another tissue type in the same dataset based on genes found to be most differentially expressed between the tissue types. This resulted in the removal of 18/194 samples from Trabzuni et al. (2013), 3/44 samples from expression data from SMRI, and 3/93 samples from the Zhang et al. (2005) dataset.

Samples from pooled cell types that make up the NeuroExpresso database were processed by an in-house modified version of the rma function that enabled collective processing of data from Mouse Expression 430A Array (GPL339) and Mouse Genome 430 2.0 Array (GPL1261) which share 22690 of their probesets. As part of the rma function, the samples are quantile normalized at the probe level. However, possibly due to differences in the purification steps used by different studies (Okaty et al., 2011), we still observed biases in signal distribution among samples originating from different studies. Thus, to increase the comparability across studies, we performed a second quantile normalization of the samples at a probeset level before selection of probes with the highest variance. After all processing, the final dataset included 11,564 genes.
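The second, probeset-level quantile normalization can be sketched as below. This is a basic Python illustration using only the standard library, not the in-house modified rma function. The function name quantile_normalize is hypothetical. Ties within a sample are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(samples):
    """samples: dict sample -> list of probeset values (same order in each).
    Forces every sample to share the same empirical distribution."""
    names = list(samples)
    n = len(samples[names[0]])
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(samples[s]) for s in names]
    reference = [sum(col[i] for col in sorted_cols) / len(names)
                 for i in range(n)]
    normalized = {}
    for s in names:
        values = samples[s]
        order = sorted(range(n), key=values.__getitem__)
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference quantile
        normalized[s] = out
    return normalized
```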

Estimation of MGPs

For each cell type relevant to the brain region analyzed, we used the first principal component of the corresponding MGS expression as a surrogate for cell type proportions. This method of MGP estimation is similar to the methodology of multiple previous works that aim to estimate the relative abundance of cell types in a whole tissue sample (Xu et al., 2013; Chikina et al., 2015; Westra et al., 2015). Principal component analysis was performed using the “prcomp” function from the “stats” R package, using the “scale = TRUE” option. It is plausible that some marker genes will be transcriptionally differentially regulated under some conditions (e.g., disease state), reducing the correspondence between their expression level and the relative cell proportion. A gene that is thus regulated is expected to have reduced correlation with the other marker genes, whose expression levels are primarily dictated by cell type proportions, which will reduce its loading in the first principal component. To reduce the impact of regulated genes on the estimation process, we removed marker genes from a given analysis if their loadings had the opposite sign to the majority of markers when calculated based on all samples in the dataset, and recalculated loadings and components using the remaining genes. This was repeated until all remaining genes had loadings with the same sign. Since the sign of the loadings of the rotation matrix (as produced by the prcomp function) is arbitrary, to ease interpretation between the scores and the direction of summarized change in the expression of the relevant genes, we multiplied the scores by −1 whenever the sign of the loadings was negative. For visualization purposes, the scores were normalized to the range 0–1. A two-sided Wilcoxon rank-sum test (“wilcox.test” function from the stats package in R, default options) was used to compare between the different experimental conditions.
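The iterative procedure described above can be sketched in a few lines. This is a Python illustration using only the standard library, not the markerGeneProfile implementation. The first principal component is obtained here by power iteration on the gene-by-gene correlation matrix, swapped in for prcomp so that the example needs no external packages. The function names (standardize, first_pc, estimate_mgp) are hypothetical. Treating a tie in loading signs as a positive majority is an assumption, and genes with constant expression are assumed to be absent.

```python
import random
from math import sqrt

def standardize(values):
    """Center to mean 0 and scale to unit sample SD (as with scale = TRUE)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def first_pc(z):
    """PC1 loadings for standardized genes z (dict gene -> values),
    found by power iteration on the correlation matrix."""
    genes = list(z)
    n = len(z[genes[0]])
    corr = [[sum(a * b for a, b in zip(z[g], z[h])) / (n - 1) for h in genes]
            for g in genes]
    rng = random.Random(0)  # fixed seed; the resulting sign is arbitrary
    v = [rng.gauss(0, 1) for _ in genes]
    for _ in range(1000):
        w = [sum(c * x for c, x in zip(row, v)) for row in corr]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return dict(zip(genes, v))

def estimate_mgp(expr):
    """expr: dict marker gene -> expression across samples.
    Returns (scores scaled to 0-1, list of genes kept)."""
    z = {g: standardize(vals) for g, vals in expr.items()}
    while True:
        loadings = first_pc(z)
        n_neg = sum(1 for l in loadings.values() if l < 0)
        majority = -1 if n_neg > len(loadings) - n_neg else 1
        # Markers whose loading sign disagrees with the majority are dropped.
        discordant = [g for g, l in loadings.items() if l * majority < 0]
        if not discordant:
            break
        for g in discordant:
            del z[g]
    n = len(next(iter(z.values())))
    # Multiplying by `majority` flips the scores when loadings are negative.
    scores = [majority * sum(loadings[g] * z[g][i] for g in z)
              for i in range(n)]
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores], list(z)
```

In this sketch a marker that moves against the rest of its set is discarded before the final scores are computed, so the returned profile reflects only the concordant markers.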

For estimations of cell type MGPs in samples from frontal cortex and white matter from the Trabzuni study (Trabzuni et al., 2013), results were subjected to multiple testing correction by the Benjamini and Hochberg method (Benjamini and Hochberg, 1995). For the PD datasets from Moran et al. (2006) and Lesnick et al. (2007), we estimated MGPs for dopaminergic neuron markers in control and PD subjects. The Moran et al. data included samples from two subregions of substantia nigra. Since some of the subjects were sampled in only one of the subregions while others were sampled in both, the two subregions were analyzed separately.
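For readers unfamiliar with the Benjamini and Hochberg adjustment applied to the frontal cortex versus white matter comparisons, a minimal sketch follows. It is written in Python with the standard library only. The function name benjamini_hochberg is hypothetical, and the example p values are made up for illustration rather than taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Walk from the largest p value to the smallest, keeping a running
    # minimum so that adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Illustrative p values only (not from the paper).
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```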

For the SMRI collection of psychiatric patients, we estimated oligodendrocyte MGPs based on expression data available through the SMRI website (as indicated above) and compared our results to experimental cell counts from the same cohort of subjects previously reported by Uranova et al. (2004). Figure 7B, representing the oligodendrocyte cell counts in each disease group, was adapted from Uranova et al. (2004). The data presented in the figure were extracted from Uranova et al. (2004), their Fig. 1A, using WebPlotDigitizer (http://arohatgi.info/WebPlotDigitizer/app/).

Code accessibility

All code is available as Extended Data. It is also maintained in the GitHub repositories listed below.

Marker gene selection and MGP estimation were performed with custom R functions provided within the “markerGeneProfile” R package available on GitHub (https://github.com/oganm/markerGeneProfile).

Human homologues of mouse genes were identified using the “homologene” R package available on GitHub (https://github.com/oganm/homologene). The code will be available as Extended Data 1.

Extended Data 1

R package to perform MGP estimations on whole tissue expression data and to select marker genes from cell type-specific expression data. Download Extended Data 1, ZIP file.

Code for data processing and analysis can be found in the “neuroExpressoAnalysis” repository available on GitHub (https://github.com/oganm/neuroExpressoAnalysis). The code will be available as Extended Data 2.

Extended Data 2

R package to find gene homologues across species. Download Extended Data 2, ZIP file.

Source code of the neuroexpresso.org web app can be found in the “NeuroExpresso” repository available on GitHub (https://github.com/oganm/neuroexpresso). The code will be available as Extended Data 3.

Extended Data 3

Code for data acquisition, analysis, and generation of all figures. Download Extended Data 3, ZIP file.

Results

Compilation of a brain cell type expression database

A key input to our search for marker genes is expression data from purified pooled brain cell types and single cells. Expanding on work from Okaty et al. (2011), we assembled and curated a database of cell type-specific expression profiles from published data (see Materials and Methods; Fig. 1A). The database represents 36 major cell types from 12 brain regions (Fig. 1B) from a total of 263 samples and 30 single cell clusters. Neocortex is represented by both microarray and RNA-seq data, with five of the 15 cortical cell types represented exclusively by RNA-seq data. We used rigorous quality control steps to identify contaminated samples and outliers (see Materials and Methods). In the microarray dataset, all cell types except for ependymal cells are represented by at least three replicates and in the entire database, 14/36 cell types are represented by multiple independent studies (Table 1). The database is in constant growth as more cell type data become available. To facilitate access to the data and allow basic analysis, we provide a simple search and visualization interface on the web, www.neuroexpresso.org (Fig. 2). The app provides means of visualizing gene expression in different brain regions based on the cell type, study or methodology, as well as differential expression analysis between groups of selected samples.

Figure 2.

The NeuroExpresso.org web application. The application allows easy visualization of gene expression across cell types in brain regions. Depicted is the expression of cell types from neocortex region. Alternatively, cell types can be grouped based on their primary neurotransmitter or the purification type. The application can be reached at www.neuroexpresso.org.

Identification of cell type-enriched MGSs

We used the NeuroExpresso data to identify MGSs for each of the 36 cell types. An individual MGS is composed of genes highly enriched in a cell type in the context of a brain region (Fig. 3A). Marker genes were selected based on (1) fold change relative to other cell types in the brain region and (2) a lack of overlap of expression levels in other cell types (see Materials and Methods for details). This approach captured previously known marker genes [e.g., Th for dopaminergic cells (Pickel et al., 1976), Tmem119 for microglia (Bennett et al., 2016); of note, Tmem119 was classified as downregulated in activated microglia in our analysis, corroborating previous reports of Erny et al. (2015) and Satoh et al. (2016)]. We also identified numerous new candidate markers such as Cox6a2 for fast spiking PV+ interneurons. Some marker genes previously reported by individual studies whose data were included in our database were not selected by our analysis. For example, Fam114a1 (9130005N14Rik), identified as a marker of fast spiking basket cells by Sugino et al. (2006), is highly expressed in oligodendrocytes and oligodendrocyte precursor cells (OPCs) (Fig. 3B). These cell types were not considered in the Sugino et al. (2006) study, and thus the lack of specificity of Fam114a1 could not be observed by the authors. In total, we identified 2671 marker genes (3–186 markers per cell type; Table 1). The next sections focus on verification and validation of our proposed markers, using multiple methodologies.

Figure 3.

Marker genes are selected for mouse brain cell types and used to estimate cell type profiles. A, Expression of top marker genes selected for cortical cell types in cell types represented by RNA-seq (left) and microarray (right) data in NeuroExpresso. Expression levels were normalized per gene to be between 0 and 1 for each dataset. B, Expression of Fam114a1 in neocortex in microarray (top) and RNA-seq (bottom) datasets. Fam114a1 is a proposed fast spiking basket cell marker. It was not selected as a marker in this study due to its high expression in oligodendrocytes and S100a10 expressing pyramidal cells that were both absent from the original study.

Verification of markers by ISH

Two cell types in our database (Purkinje cells of the cerebellum and hippocampal dentate gyrus granule cells) are organized in well-defined anatomical structures that can be readily identified in tissue sections. We exploited this fact to use ISH data from the ABA (http://mouse.brain-map.org; Sunkin et al., 2013) to verify colocalization of known and novel markers for these two cell types. There was a high degree of colocalization of the markers to the corresponding brain structures, and by implication, cell types (Fig. 4A,B). For dentate gyrus granule cell markers, all 16 genes were represented in ABA. Of these, 14 specifically colocalized with known markers (i.e., had the predicted expression pattern confirming our marker selection), one marker exhibited nonspecific expression and one marker showed no signal. For Purkinje cell markers, 41/43 genes were represented in ABA. Of these, 37 specifically colocalized with known markers, one marker exhibited nonspecific expression and three markers showed no signal in the relevant brain structure (Fig. 4B). Figure 4A shows representative examples for the two cell types (details of our ABA analysis, including images for all the genes examined and validation status of the genes, are provided in Extended Data).

Figure 4.

Validation of candidate markers using the ABA. A, ISH images from the ABA. Rightmost panels show the location of the image in the brain according to the Allen Brain mouse reference atlas. Panels on the left show the ISH image and normalized expression level of known and novel dentate gyrus granule cell (upper panels) and Purkinje cell (lower panels) markers. B, Validation status of marker genes detected for Purkinje and dentate gyrus granule cells. Figures used for validation and validation statuses of individual marker genes can be found in Extended Data (Extended Data Fig. 4-1,2,3,4).

Figure 4-1,2,3,4

Figure 4-1. Expression of DG cell markers discovered in the study in ABA mouse brain ISH database. The first gene is Prox1, a known marker of DG cells. The intensity is color coded to range from blue (low expression intensity), through green (medium intensity) to red (high intensity). All images except Ogn are taken from the sagittal view. Ogn is taken from the coronal view.

Figure 4-2. Expression of Purkinje markers discovered in the study in ABA mouse brain ISH database. The first gene is Pcp2, a known marker of Purkinje cells. The intensity is color coded to range from blue (low expression intensity), through green (medium intensity) to red (high intensity). All images are taken from the sagittal view.

Figure 4-3. Validation status of DG cell markers.

Figure 4-4. Validation status of Purkinje cell markers. Download Figure 4-1,2,3,4, PDF file.

The four markers for which no signal was detected (one marker of dentate gyrus granule cells and three markers of Purkinje cells) underwent additional scrutiny. For one of the markers of Purkinje cells (Eps8l2), the staining of cerebellar sections was inconsistent, with some sections showing no staining, some sections showing nonspecific staining and several sections showing the predicted localization. The three remaining genes had no signal in ABA ISH data brain wide. We considered such absence or inconsistency of ISH signal inconclusive. Further analysis of these cases (one dentate gyrus granule cell marker, three Purkinje) suggests that the ABA data is the outlier. As part of our marker selection procedure, Pter, the dentate gyrus granule cell marker in question, was found to have high expression in granule cells within Hipposeq, a dataset that was not used for primary selection of markers (see Materials and Methods). In addition, Hipposeq indicates specificity of Pter to dentate gyrus granule cells relative to the other neuron types in Hipposeq. For the Purkinje markers, specific expression for one gene (Sycp1) was supported by the work of Rong et al. (2004), who used degeneration of Purkinje cells to identify potential markers of these cells (20/43 Purkinje markers identified in our study were also among the list of potential markers reported by Rong et al., 2004). We could not find data to further establish expression for the two remaining markers of Purkinje cells (Eps8l2 and Smpx). However, we stress that the transcriptomic data for Purkinje cells in our database are from five independent studies using different methodologies for cell purification, all of which support the specific expression of Eps8l2 and Smpx in Purkinje cells. Overall, through a combination of examination of ABA and other data sources, we were able to find confirmatory evidence of cell type specificity for 53/57 genes, with two false positives, and inconclusive findings for two genes.

We independently verified Cox6a2 as a marker of cortical fast spiking PV+ interneurons using triple label ISH of mouse cortical sections for Cox6a2, Pvalb and Slc32a1 (a pan-GABAergic neuronal marker) transcripts. As expected, we found that approximately 25% of all identified neurons were GABAergic (i.e., Slc32a1 positive), while 46% of all GABAergic neurons were also Pvalb positive. 80% of all Cox6a2+ neurons were both Pvalb and Slc32a1 positive whereas Cox6a2 expression outside GABAergic cells was very low (1.65% of Cox6a2 positive cells), suggesting high specificity of Cox6a2 to PV+ GABAergic cells (Fig. 5).

Figure 5.

Single-plane image of mouse sensorimotor cortex labeled for Pvalb, Slc32a1, and Cox6a2 mRNAs and counterstained with NeuroTrace. Arrows indicate Cox6a2+ neurons. Scale bar: 10 µm.

Verification of MGSs in single-cell RNA-seq data

As a further validation of our marker gene signatures, we analyzed their properties in recently published single-cell RNA-seq datasets derived from mouse cortex (Zeisel et al., 2015) and human cortex (Darmanis et al., 2015). We could not directly compare our MGSs to markers of cell type clusters identified in the studies producing these datasets since their correspondence to the cell types in NeuroExpresso was not clear. However, since both datasets represent a large number of individual cells, they are likely to include individual cells corresponding to the cortical cell types in our database. Thus, if our MGSs are cell type specific, and the corresponding cells are present in the single-cell datasets, MGS should have a higher than random chance of being codetected in the same cells, relative to nonmarker genes. A weakness of this approach is that a failure to observe a correlation might be due to absence of the cell type in the dataset rather than a true shortcoming of the markers. Overall, all MGSs for all cell types with the exception of OPCs were successfully validated (p < 0.001, Wilcoxon rank-sum test) in both single-cell datasets (Table 3).

Table 3.

Coexpression of cortical MGSs in single-cell RNA-seq data

NeuroExpresso as a tool for understanding the biological diversity and similarity of brain cells

One of the applications of NeuroExpresso is as an exploratory tool for exposing functional and biological properties of cell types. In this section, we highlight three examples we encountered: We observed high expression of genes involved in GABA synthesis and release (Gad1, Gad2, and Slc32a1) in forebrain cholinergic neurons, suggesting the capability of these cells to release GABA in addition to their cognate neurotransmitter acetylcholine (Fig. 6A). Indeed, corelease of GABA and acetylcholine from forebrain cholinergic cells was recently demonstrated by Saunders et al. (2015). Similarly, the expression of the glutamate transporter Slc17a6, observed in thalamic (habenular) cholinergic cells, suggests corelease of glutamate and acetylcholine from these cells, recently supported experimentally (Ren et al., 2011; Fig. 6A). Surprisingly, we observed consistently high expression of Ddc (DOPA decarboxylase), responsible for the second step in the monoamine synthesis pathway, in oligodendrocyte cells (Fig. 6B). This result is suggestive of a previously unknown ability of oligodendrocytes to produce monoamine neurotransmitters upon exposure to an appropriate precursor, as previously reported for several populations of cells in the brain (Ugrumov, 2013; Ren et al., 2016). Alternatively, this finding might indicate a previously unknown function of Ddc. Lastly, we found overlap between the markers of spinal cord and brainstem cholinergic cells, and midbrain noradrenergic cells, suggesting previously unknown functional similarity between cholinergic and noradrenergic cell types. The common markers included Chodl, Calca, Cda, and Hspb8, which were recently confirmed to be expressed in brainstem cholinergic cells (Enjin et al., 2010), and Phox2b, a known marker of noradrenergic cells (Pattyn et al., 1997).

Figure 6.

NeuroExpresso reveals novel gene expression patterns. A, Expression of cholinergic, GABAergic, and glutamatergic markers in cholinergic cells from forebrain and thalamus. Forebrain cholinergic neurons express GABAergic markers while thalamus (habenular) cholinergic neurons express glutamatergic markers. B, left, Expression of Ddc in oligodendrocyte samples from Cahoy et al. (2008), Doyle et al. (2008), and Fomchenko et al. (2011) datasets and in comparison to dopaminergic cells and other (nonoligodendrocyte) cell types from the neocortex in the microarray dataset. In all three datasets, expression of Ddc in oligodendrocytes is comparable to expression in dopaminergic cells and is higher than in any of the other cortical cells. Oligodendrocyte samples show higher than background levels of expression across datasets. Right, Ddc expression in oligodendrocytes, OPCs, and other cell types from Tasic et al. (2016) single-cell dataset. C, Bimodal gene expression in two dopaminergic cell isolates by different labs. Genes shown are labeled as marker genes in the context of midbrain if the two cell isolates are labeled as different cell types.

MGPs can be used to infer changes in cellular proportions in the brain

Marker genes are by definition cell type specific, and thus changes in their expression observed in bulk tissue data can represent either changes in the number of cells or cell type-specific transcriptional changes (or a combination). Marker genes of four major classes of brain cell types (namely neurons, astrocytes, oligodendrocytes, and microglia) were previously used to gain cell type-specific information from brain bulk tissue data (Sibille et al., 2008; Kuhn et al., 2011; Tan et al., 2013b; Hagenauer et al., 2016; Skene and Grant, 2016; Ramaker et al., 2017), and infer changes in cellular abundance. Following the practice of others, we applied a similar approach to our marker genes, summarizing their expression profiles as the first principal component of their expression (see Materials and Methods; Xu et al., 2013; Chikina et al., 2015; Westra et al., 2015). We refer to these summaries as MGPs.

To validate the use of MGPs as surrogates for relative cell type proportions, we used bulk tissue expression data from conditions with known changes in cellular proportions. First, we calculated MGPs for human white matter and frontal cortex using data collected by Trabzuni et al. (2013). Comparing the MGPs in white versus grey matter, we observed the expected increase in oligodendrocyte MGP, as well as an increase in oligodendrocyte progenitor cell, endothelial cell, astrocyte and microglia MGPs, corroborating the previously reported higher numbers of these cell types in white versus grey matter (Ogura et al., 1994; Gudi et al., 2009; Williams et al., 2013). We also observed a decrease in MGPs of all neurons, corroborating the low neuronal cell body density in white versus grey matter (Fig. 7A; Table 4).

Figure 7.

MGPs reveal cell type-specific changes in whole tissue data. A, Estimation of cell type profiles for cortical cells in frontal cortex and white matter. Values are normalized to be between 0 and 1. (***p < 0.001). B, left, Oligodendrocyte MGPs in Stanley C cohort. Right, Morphology-based oligodendrocyte counts of Stanley C cohort. Figure adapted from Uranova et al. (2004). C, Estimations of dopaminergic cell MGPs in substantia nigra of controls and PD patients. Values are relative and are normalized to be between 0 and 1 and are not reflective of absolute proportions (**p < 0.01, ***p < 0.001).

Table 4.

Summaries of statistical analyses

A more specific form of validation was obtained from a pair of studies done on the same cohort of subjects, with one study providing expression profiles (study 2 from the SMRI microarray database; see Materials and Methods) and another providing stereological counts of oligodendrocytes (Uranova et al., 2004), for similar brain regions. We calculated oligodendrocyte MGPs based on the expression data and compared the results to experimental cell counts from Uranova et al. (2004). The MGPs were consistent with the reduction of oligodendrocytes observed by Uranova et al. (2004) in schizophrenia, bipolar disorder, and depression patients [Fig. 7B; Table 4; direct comparison between MGP and experimental cell count at a subject level was not possible, as Uranova et al. (2004) did not provide subject identifiers corresponding to each of the cell count values].

To further assess and demonstrate the ability of MGPs to correctly represent cell type-specific changes in neurological conditions, we calculated dopaminergic profiles of substantia nigra samples in three expression datasets of PD patients and controls from Moran et al. (2006; GSE8397), Lesnick et al. (2007; GSE7621), and Zhang et al. (2005; GSE20295). We tested whether the well-known loss of dopaminergic cells in PD could be detected using our MGP approach. MGP analysis correctly identified reduction in dopaminergic cells in substantia nigra of PD patients (Fig. 7C; Table 4).

Discussion

Cell type-specific expression database as a resource for neuroscience

We present NeuroExpresso, a rigorously curated database of brain cell type-specific gene expression data (www.neuroexpresso.org), and demonstrate its utility in identifying cell type markers and in the interpretation of bulk tissue expression profiles. To our knowledge, NeuroExpresso is the most comprehensive database of expression data for identified brain cell types. The database will be expanded as more data become available.

NeuroExpresso allows simultaneous examination of gene expression associated with numerous cell types across different brain regions. This approach promotes discovery of cellular properties that might have otherwise been unnoticed or overlooked when using gene-by-gene approaches or pathway enrichment analysis. For example, a simple examination of expression of genes involved in biosynthesis and secretion of GABA and glutamate suggested the corelease of these neurotransmitters from forebrain and habenular cholinergic cells, respectively.

Studies that aim to identify novel properties of cell types can benefit from our database as an inexpensive and convenient way to seek novel patterns of gene expression. For instance, our database shows significant bimodality of gene expression in dopaminergic cell types from the midbrain (Fig. 6C). The observed bimodality might indicate heterogeneity in the dopaminergic cell population, which could prove a fruitful avenue for future investigation. Another interesting finding from NeuroExpresso is the previously unknown overlap of several markers of motor cholinergic and noradrenergic cells. While the overlapping markers were previously shown to be expressed in spinal cholinergic cells, to our knowledge their expression in noradrenergic (as well as brain stem cholinergic) cells was previously unknown.

NeuroExpresso can also be used to facilitate interpretation of genomics and transcriptomics studies. Recently, Pantazatos et al. (2017) used an early release of the database to interpret expression patterns in the cortex of suicide victims, suggesting involvement of microglia. Moreover, this database has further applications beyond the use of marker genes, such as understanding the molecular basis of cellular electrophysiological diversity (Tripathy et al., 2017).

Importantly, NeuroExpresso is a cross-laboratory database. A consistent result observed across several studies raises the certainty that it represents a true biological finding rather than merely an artefact or contamination with other cell types. This is especially important for unexpected findings such as the expression of Ddc in oligodendrocytes (Fig. 6B).

Validation of cell type markers

To assess the quality of the marker genes, a subset of our cell type markers was validated by ISH (Cox6a2 as a marker of fast spiking basket cells, and multiple Purkinje and dentate gyrus granule cell markers). Further validation was performed with computational methods in independent single-cell datasets from mouse and human. This analysis validated all cortical marker gene sets except OPCs. In their paper, Zeisel et al. (2015) stated that none of the oligodendrocyte subclusters they identified were associated with OPCs, which likely explains why we were not able to validate the OPC MGS in their dataset. The Darmanis dataset, however, is reported to include OPCs (18/466 cells; Darmanis et al., 2015), but again our OPC MGS did not show good validation. In this case, the reason for negative results could be changes in the expression of the mouse marker gene orthologs in human, possibly reflecting functional differences between the human and mouse cell types (Shay et al., 2013; Zhang et al., 2016). Further work will be needed to identify a robust human OPC signature. However, the fact that most MGSs did validate in both mouse and human data suggests that most marker genes preserve their specificity despite cross-species gene expression differences.

Improving interpretation of bulk tissue expression profiles

Marker genes can assist with the interpretation of bulk tissue data in the form of MGPs. A parsimonious interpretation of a change in an MGP is a change in the relative abundance of the corresponding cell type. Similar summarizations of cell type-specific genes were previously used to analyse gene expression (Xu et al., 2013; Chikina et al., 2015; Newman et al., 2015; Westra et al., 2015) and methylation data (Jones et al., 2017; Shannon et al., 2017). Since our approach focuses on the overall trend of an MGS expression level, it should be relatively insensitive to expression changes in a subset of these genes. Still, we prefer to use the term MGP rather than “cell type proportions,” to emphasize the indirect nature of the approach.

Our results show that MGPs based on NeuroExpresso MGSs can reliably recapitulate relative changes in cell type abundance across different conditions. Direct validation of cell count estimation based on MGSs in human brain was not feasible due to the unavailability of cell counts coupled with expression data. Instead, we compared oligodendrocyte MGPs based on a gene expression dataset available through the SMRI database to experimental cell counts taken from a separate study (Uranova et al., 2004) of the same cohort of subjects and were able to recapitulate the reported reduction of oligodendrocyte proportions in patients with schizophrenia, bipolar disorder and depression. Based on analysis of dopaminergic MGPs we were also able to capture the well-known reduction in dopaminergic cell types in PD patients.

Limitations and caveats

While we took great care in the assembly of NeuroExpresso, there remain a number of limitations and room for improvement. First, the NeuroExpresso database was assembled from multiple datasets, based on different mouse strains and cell type extraction methodologies, which may lead to undesirable heterogeneity. We attempted to reduce interstudy variability by combined pre-processing of the raw data and normalization. However, due to insufficient overlap between cell types represented by different studies, many of the potential confounding factors such as age, sex, and methodology could not be explicitly corrected for. Thus, it is likely that some of the expression values in NeuroExpresso are affected by confounding factors. While our confidence in the data is increased when expression signals are robust across multiple studies, many of the cell types in NeuroExpresso are represented by a single study. Hence, we advise that small differences in expression between cell types, as well as previously unknown expression patterns based on a single data source, should be treated with caution. In our analyses, we address these issues by enforcing a stringent set of criteria for the marker selection process, reducing the impact of outlier samples, ignoring small changes in gene expression and validating the results in external data. However, it must be noted that it was not possible to validate our markers for all cell types and brain regions.

An additional limitation of our study is that the representation for many of the brain cell types is still lacking in the NeuroExpresso database. Therefore, despite our considerable efforts to ensure cell type specificity of the marker genes, we cannot rule out the possibility that some of them are also expressed in one or more of the nonrepresented cell types. This problem is partially alleviated in cortex due to the inclusion of single-cell data. As more such datasets become available, it will be easier to create a more comprehensive database.

A problem related to the coverage of cell types in NeuroExpresso lies in the definition of the term “cell type”. Most cell types represented in NeuroExpresso are heterogeneous populations. For instance, fast-spiking basket cells as defined by microarray data match five distinct clusters identified by Tasic et al. (2016) based on single-cell RNA sequencing data. By considering them as a single cell type, we lose the ability to detect unique properties of the individual clusters. Heterogeneity may also reduce the confidence we have in our marker genes. If a selected marker is expressed in a subtype of another cell type, this will not be noticed in pooled expression data as the signal will be suppressed by other subtypes that do not express the gene. We hope to remedy this problem with increased availability of single-cell data in the future. Where variability within a cell type ends and a new cell type begins is an ongoing discussion in the field. For the purposes of this study, we tried to ensure that the cell types we define are accepted and studied by a portion of the community, and that the expression profiles of the cell types were distinct enough to allow marker gene identification. The data we make available to other researchers may be partitioned into finer cell types or grouped together into broader cell type groups depending on the aims of the researchers.
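The dilution of a subtype-restricted signal in pooled data can be shown with a short calculation. This is a sketch under a stated assumption: that pooled expression on a linear scale is the abundance-weighted mean of the subtype expression levels. The function name and all numbers are hypothetical.

```python
def pooled_expression(subtype_fractions, subtype_expression):
    """Pooled expression as the fraction-weighted mean of subtype levels.

    Assumes linear-scale expression values and fractions that sum to 1.
    """
    if abs(sum(subtype_fractions) - 1.0) > 1e-9:
        raise ValueError("subtype fractions must sum to 1")
    return sum(f * e for f, e in zip(subtype_fractions, subtype_expression))


# Hypothetical cell type made of five equally abundant subtypes,
# only one of which expresses the gene of interest.
fractions = [0.2, 0.2, 0.2, 0.2, 0.2]
expression = [50.0, 0.0, 0.0, 0.0, 0.0]

pooled = pooled_expression(fractions, expression)
# The pooled level is one fifth of the level in the expressing subtype.
print(pooled)
```

With more subtypes, or a rarer expressing subtype, the pooled level falls further and can drop below whatever detection threshold is applied to the pooled profile.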

Finally, it must be noted that while we aim to infer changes in cell type abundance with MGPs, we do not attempt to estimate the cell type proportions themselves, even though many established deconvolution methods do accomplish this using databases of expression profiles (Grange et al., 2014; Chikina et al., 2015; Newman et al., 2015). These approaches operate on the assumption that the absolute expression levels of genes are conserved between the cell types in the reference database and the cell types that make up the whole tissue sample. In our work, we avoid these approaches because our database (mouse cell types) and the whole tissue samples we analyze (human brain tissue) come from different species. Absolute expression levels may therefore differ, whereas marker genes are more likely to be conserved.
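For orientation, reference-based deconvolution is commonly written as a linear mixing model. The notation below is an assumption introduced here for illustration and is not taken from the cited methods: $b_g$ is the bulk expression of gene $g$, $s_{g,c}$ is its expression in reference cell type $c$, $p_c$ is the proportion of cell type $c$, and $\varepsilon_g$ is an error term.

```latex
\[
  b_{g} \;=\; \sum_{c=1}^{C} p_{c}\, s_{g,c} \;+\; \varepsilon_{g},
  \qquad p_{c} \ge 0, \qquad \sum_{c=1}^{C} p_{c} = 1 .
\]
```

Estimating the $p_c$ requires that the reference values $s_{g,c}$ hold in the bulk tissue on an absolute scale, which is the conservation assumption discussed above. An MGP only requires that the marker genes remain specific to their cell type.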

In summary, we believe that NeuroExpresso is a valuable resource for neuroscientists. We identified numerous novel markers for 36 major cell types and used them to estimate cell type profiles in bulk tissue data, demonstrating high correlation between our estimates and experiment-based cell counts. This approach can be used to reveal cell type-specific changes in whole tissue samples and to re-evaluate previous analyses on brain whole tissues that might be biased by cell type-specific changes. Information about cell type-specific changes is likely to be very valuable since conditions like neuron death, inflammation, and astrogliosis are common hallmarks of neurological diseases.

Acknowledgments

We thank Ken Sugino for providing access to raw CEL files for Purkinje and TH+ cells from locus coeruleus, Chee Yeun Chung for providing access to raw CEL files for dopaminergic cells, Dean Attali for providing insight on the usage of the Shiny platform, the Pavlidis lab for their input to the project during its development, and Rosemary McCloskey for aid in editing the manuscript.

Footnotes

  • The authors declare no competing financial interests.

  • This work is supported by a NeuroDevNet grant (P.P.); the University of British Columbia Bioinformatics Graduate Training Program (B.O.M.); a Canadian Institutes of Health Research postdoctoral fellowship (S.J.T.); the Campbell Family Mental Health Research Institute of Centre for Addiction and Mental Health (E.S. and B.R.); National Institutes of Health Grants MH077159 (to E.S.) and MH111099 and GM076990 (to P.P.); and a Natural Sciences and Engineering Research Council of Canada Discovery Grant (P.P.).

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

  1. Anandasabapathy N, Victora GD, Meredith M, Feder R, Dong B, Kluger C, Yao K, Dustin ML, Nussenzweig MC, Steinman RM, Liu K (2011) Flt3L controls the development of radiosensitive dendritic cells in the meninges and choroid plexus of the steady-state mouse brain. J Exp Med 208:1695–1705. doi:10.1084/jem.20102657
  2. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A (2013) NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res 41:D991–D995.
  3. Beckervordersandforth R, Tripathi P, Ninkovic J, Bayam E, Lepier A, Stempfhuber B, Kirchhoff F, Hirrlinger J, Haslinger A, Lie DC, Beckers J, Yoder B, Irmler M, Götz M (2010) In vivo fate mapping and expression analysis reveals molecular hallmarks of prospectively isolated adult neural stem cells. Cell Stem Cell 7:744–758. doi:10.1016/j.stem.2010.11.017 pmid:21112568
  4. Bellesi M, Pfister-Genskow M, Maret S, Keles S, Tononi G, Cirelli C (2013) Effects of sleep and wake on oligodendrocytes and their precursors. J Neurosci 33:14288–14300. doi:10.1523/JNEUROSCI.5102-12.2013 pmid:24005282
  5. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 57:289–300.
  6. Bennett ML, Bennett FC, Liddelow SA, Ajami B, Zamanian JL, Fernhoff NB, Mulinyawe SB, Bohlen CJ, Adil A, Tucker A, Weissman IL, Chang EF, Li G, Grant GA, Hayden Gephart MG, Barres BA (2016) New tools for studying microglia in the mouse and human CNS. Proc Natl Acad Sci USA 113:E1738–E1746. doi:10.1073/pnas.1525528113 pmid:26884166
  7. Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL, Christopherson KS, Xing Y, Lubischer JL, Krieg PA, Krupenko SA, Thompson WJ, Barres BA (2008) A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci 28:264–278. doi:10.1523/JNEUROSCI.4178-07.2008 pmid:18171944
  8. Carvalho BS, Irizarry RA (2010) A framework for oligonucleotide microarray preprocessing. Bioinformatics 26:2363–2367. doi:10.1093/bioinformatics/btq431
  9. Celio MR, Heizmann CW (1981) Calcium-binding protein parvalbumin as a neuronal marker. Nature 293:300–302. pmid:7278987
  10. Cembrowski MS, Wang L, Sugino K, Shields BC, Spruston N (2016) Hipposeq: a comprehensive RNA-seq database of gene expression in hippocampal principal neurons. Elife 5:e14997. doi:10.7554/eLife.14997 pmid:27113915
  11. Chikina M, Zaslavsky E, Sealfon SC (2015) CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations. Bioinformatics 31:1584–1591.
  12. Chung CY, Seo H, Sonntag KC, Brooks A, Lin L, Isacson O (2005) Cell type-specific gene expression of midbrain dopaminergic neurons reveals molecules involved in their vulnerability and protection. Hum Mol Genet 14:1709–1725. doi:10.1093/hmg/ddi178 pmid:15888489
  13. Dalal J, Roh JH, Maloney SE, Akuffo A, Shah S, Yuan H, Wamsley B, Jones WB, Strong C, de G, Gray PA, Holtzman DM, Heintz N, Dougherty JD (2013) Translational profiling of hypocretin neurons identifies candidate molecules for sleep regulation. Genes Dev 27:565–578. doi:10.1101/gad.207654.112
  14. Darmanis S, Sloan SA, Zhang Y, Enge M, Caneda C, Shuer LM, Gephart MGH, Barres BA, Quake SR (2015) A survey of human brain transcriptome diversity at the single cell level. Proc Natl Acad Sci USA 112:7285–7290. doi:10.1073/pnas.1507125112
  15. Dougherty JD, Maloney SE, Wozniak DF, Rieger MA, Sonnenblick L, Coppola G, Mahieu NG, Zhang J, Cai J, Patti GJ, Abrahams BS, Geschwind DH, Heintz N (2013) The disruption of Celf6, a gene identified by translational profiling of serotonergic neurons, results in autism-related behaviors. J Neurosci 33:2732–2753. doi:10.1523/JNEUROSCI.4762-12.2013 pmid:23407934
  16. Doyle JP, Dougherty JD, Heiman M, Schmidt EF, Stevens TR, Ma G, Bupp S, Shrestha P, Shah RD, Doughty ML, Gong S, Greengard P, Heintz N (2008) Application of a translational profiling approach for the comparative analysis of CNS cell types. Cell 135:749–762. doi:10.1016/j.cell.2008.10.029 pmid:19013282
  17. Edgar R, Domrachev M, Lash AE (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210. pmid:11752295
  18. Enjin A, Rabe N, Nakanishi ST, Vallstedt A, Gezelius H, Memic F, Lind M, Hjalt T, Tourtellotte WG, Bruder C, Eichele G, Whelan PJ, Kullander K (2010) Identification of novel spinal cholinergic genetic subtypes disclose Chodl and Pitx2 as markers for fast motor neurons and partition cells. J Comp Neurol 518:2284–2304. doi:10.1002/cne.22332
  19. Erny D, Hrabě de Angelis AL, Jaitin D, Wieghofer P, Staszewski O, David E, Keren-Shaul H, Mahlakoiv T, Jakobshagen K, Buch T, Schwierzeck V, Utermöhlen O, Chun E, Garrett WS, McCoy KD, Diefenbach A, Staeheli P, Stecher B, Amit I, Prinz M (2015) Host microbiota constantly control maturation and function of microglia in the CNS. Nat Neurosci 18:965–977. doi:10.1038/nn.4030 pmid:26030851
  20. Fomchenko EI, Dougherty JD, Helmy KY, Katz AM, Pietras A, Brennan C, Huse JT, Milosevic A, Holland EC (2011) Recruited cells can become transformed and overtake PDGF-induced murine gliomas in vivo during tumor progression. PLoS One 6:e20605. doi:10.1371/journal.pone.0020605 pmid:21754979
  21. Galloway JN, Shaw C, Yu P, Parghi D, Poidevin M, Jin P, Nelson DL (2014) CGG repeats in RNA modulate expression of TDP-43 in mouse and fly models of fragile X tremor ataxia syndrome. Hum Mol Genet 23:5906–5915.
  22. Görlich A, Antolin-Fontes B, Ables JL, Frahm S, Slimak MA, Dougherty JD, Ibañez-Tallon I (2013) Reexposure to nicotine during withdrawal increases the pacemaking activity of cholinergic habenular neurons. Proc Natl Acad Sci USA 110:17077–17082. doi:10.1073/pnas.1313103110 pmid:24082085
  23. Grange P, Bohland JW, Okaty BW, Sugino K, Bokil H, Nelson SB, Ng L, Hawrylycz M, Mitra PP (2014) Cell-type–based model explaining coexpression patterns of genes in the brain. Proc Natl Acad Sci USA 111:5397–5402. doi:10.1073/pnas.1312098111 pmid:24706869
  24. Gudi V, Moharregh-Khiabani D, Skripuletz T, Koutsoudaki PN, Kotsiari A, Skuljec J, Trebst C, Stangel M (2009) Regional differences between grey and white matter in cuprizone induced demyelination. Brain Res 1283:127–138. doi:10.1016/j.brainres.2009.06.005 pmid:19524552
  25. Hagenauer MH, Li JZ, Walsh DM, Vawter MP, Thompson RC, Turner CA, Bunney WE, Myers RM, Barchas JD, Schatzberg AF, Watson SJ, Akil H (2016) Inference of cell-type composition from human brain transcriptomic datasets illuminates the effects of age, manner of death, dissection, and psychiatric diagnosis. bioRxiv 089391.
  26. Halliwell B (2003) Oxidative stress in cell culture: an under-appreciated problem? FEBS Lett 540:3–6. pmid:12681474
  27. Handley A, Schauer T, Ladurner AG, Margulies CE (2015) Designing cell-type-specific genome-wide experiments. Mol Cell 58:621–631. doi:10.1016/j.molcel.2015.04.024 pmid:26000847
  28. Heiman M, Heilbut A, Francardo V, Kulicke R, Fenster RJ, Kolaczyk ED, Mesirov JP, Surmeier DJ, Cenci MA, Greengard P (2014) Molecular adaptations of striatal spiny projection neurons during levodopa-induced dyskinesia. Proc Natl Acad Sci USA 111:4578–4583. doi:10.1073/pnas.1401819111 pmid:24599591
  29. Holtman IR, Noback M, Bijlsma M, Duong KN, van der Geest MA, Ketelaars PT, Brouwer N, Vainchtein ID, Eggen BJL, Boddeke HWGM (2015) Glia open access database (GOAD): a comprehensive gene expression encyclopedia of glia cells in health and disease. Glia 63:1495–1506.
  30. Hu H, Gan J, Jonas P (2014) Fast-spiking, parvalbumin+ GABAergic interneurons: from cellular design to microcircuit function. Science 345:1255263. doi:10.1126/science.1255263 pmid:25082707
  31. Januszyk M, Rennert RC, Sorkin M, Maan ZN, Wong LK, Whittam AJ, Whitmore A, Duscher D, Gurtner GC (2015) Evaluating the effect of cell culture on gene expression in primary tissue samples using microfluidic-based single cell transcriptional analysis. Microarrays 4:540–550. doi:10.3390/microarrays4040540 pmid:27600239
  32. Jones MJ, Islam SA, Edgar RD, Kobor MS (2017) Adjusting for cell type composition in DNA methylation data using a regression-based approach. Methods Mol Biol 1589:99–106. doi:10.1007/7651_2015_262 pmid:26126446
  33. Kawaguchi Y, Katsumaru H, Kosaka T, Heizmann CW, Hama K (1987) Fast spiking cells in rat hippocampus (CA1 region) contain the calcium-binding protein parvalbumin. Brain Res 416:369–374. pmid:3304536
  34. Kuhn A, Thu D, Waldvogel HJ, Faull RLM, Luthi-Carter R (2011) Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain. Nat Methods 8:945–947. doi:10.1038/nmeth.1710
  35. Lesnick TG, Papapetropoulos S, Mash DC, Ffrench-Mullen J, Shehadeh L, de Andrade M, Henley JR, Rocca WA, Ahlskog JE, Maraganore DM (2007) A genomic pathway approach to a complex disease: axon guidance and Parkinson disease. PLoS Genet 3:e98. doi:10.1371/journal.pgen.0030098 pmid:17571925
  36. Lobo MK, Karsten SL, Gray M, Geschwind DH, Yang XW (2006) FACS-array profiling of striatal projection neuron subtypes in juvenile and adult mouse brains. Nat Neurosci 9:443–452. doi:10.1038/nn1654 pmid:16491081
  37. Maechler M, Rousseeuw PR, Struyf A, Gonzalez J (2016) Cluster: “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al.
  38. Margolis EB, Lock H, Hjelmstad GO, Fields HL (2006) The ventral tegmental area revisited: is there an electrophysiological marker for dopaminergic neurons? J Physiol 577:907–924. doi:10.1113/jphysiol.2006.117069 pmid:16959856
  39. Maze I, Chaudhury D, Dietz DM, Von Schimmelmann M, Kennedy PJ, Lobo MK, Sillivan SE, Miller ML, Bagot RC, Sun H, Turecki G, Neve RL, Hurd YL, Shen L, Han M-H, Schaefer A, Nestler EJ (2014) G9a influences neuronal subtype specification in striatum. Nat Neurosci 17:533–539. doi:10.1038/nn.3670 pmid:24584053
  40. Moran LB, Duke DC, Deprez M, Dexter DT, Pearce RKB, Graeber MB (2006) Whole genome expression profiling of the medial and lateral substantia nigra in Parkinson’s disease. Neurogenetics 7:1–11. doi:10.1007/s10048-005-0020-2
  41. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, Hoang CD, Diehn M, Alizadeh AA (2015) Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 12:453–457. doi:10.1038/nmeth.3337 pmid:25822800
  42. Ogura K, Ogawa M, Yoshida M (1994) Effects of ageing on microglia in the normal rat brain: immunohistochemical observations. Neuroreport 5:1224–1226. pmid:7919169
  43. Okaty BW, Miller MN, Sugino K, Hempel CM, Nelson SB (2009) Transcriptional and electrophysiological maturation of neocortical fastspiking GABAergic interneurons. J Neurosci 29:7040–7052. doi:10.1523/JNEUROSCI.0105-09.2009
  44. Okaty BW, Sugino K, Nelson SB (2011) A quantitative comparison of cell-type-specific microarray gene expression profiling methods in the mouse brain. PLoS One 6:e16493. doi:10.1371/journal.pone.0016493 pmid:21304595
  45. Pantazatos SP, Huang Y-Y, Rosoklija GB, Dwork AJ, Arango V, Mann JJ (2017) Whole-transcriptome brain expression and exon-usage profiling in major depression and suicide: evidence for altered glial, endothelial and ATPase activity. Mol Psychiatry 22:760–773.
  46. Pattyn A, Morin X, Cremer H, Goridis C, Brunet JF (1997) Expression and interactions of the two closely related homeobox genes Phox2a and Phox2b during neurogenesis. Dev Camb Engl 124:4065–4075.
  47. Paul A, Cai Y, Atwal GS, Huang ZJ (2012) Developmental coordination of gene expression between synaptic partners during GABAergic circuit assembly in cerebellar cortex. Front Neural Circuits 6:37. doi:10.3389/fncir.2012.00037 pmid:22754500
  48. Perrone-Bizzozero NI, Tanner DC, Mounce J, Bolognani F (2011) Increased expression of axogenesis-related genes and mossy fibre length in dentate granule cells from adult HuD overexpressor mice. ASN Neuro 3:AN20110015. doi:10.1042/AN20110015
  49. Phani S, Gonye G, Iacovitti L (2010) VTA neurons show a potentially protective transcriptional response to MPTP. Brain Res 1343:1–13. doi:10.1016/j.brainres.2010.04.061 pmid:20462502
  50. Pickel VM, Joh TH, Reis DJ (1976) Monoamine-synthesizing enzymes in central dopaminergic, noradrenergic and serotonergic neurons. Immunocytochemical localization by light and electron microscopy. J Histochem Cytochem 24:792–792. doi:10.1177/24.7.8567
  51. Poulin J-F, Tasic B, Hjerling-Leffler J, Trimarchi JM, Awatramani R (2016) Disentangling neural cell diversity using single-cell transcriptomics. Nat Neurosci 19:1131–1141. doi:10.1038/nn.4366 pmid:27571192
  52. Ramaker RC, Bowling KM, Lasseigne BN, Hagenauer MH, Hardigan AA, Davis NS, Gertz J, Cartagena PM, Walsh DM, Vawter MP, Jones EG, Schatzberg AF, Barchas JD, Watson SJ, Bunney BG, Akil H, Bunney WE, Li JZ, Cooper SJ, Myers RM (2017) Post-mortem molecular profiling of three psychiatric disorders. Genome Med 9:72. doi:10.1186/s13073-017-0458-5 pmid:28754123
  53. Ren J, Qin C, Hu F, Tan J, Qiu L, Zhao S, Feng G, Luo M (2011) Habenula “cholinergic” neurons co-release glutamate and acetylcholine and activate postsynaptic neurons via distinct transmission modes. Neuron 69:445–452. doi:10.1016/j.neuron.2010.12.038 pmid:21315256
  54. Ren L, Wienecke J, Hultborn H, Zhang M (2016) Production of dopamine by aromatic L-amino acid decarboxylase cells after spinal cord injury. J Neurotrauma 33:1150–1160.
  55. Rong Y, Wang T, Morgan JI (2004) Identification of candidate Purkinje cell-specific markers by gene expression profiling in wild-type and pcd(3J) mice. Brain Res Mol Brain Res 132:128–145. doi:10.1016/j.molbrainres.2004.10.015 pmid:15582153
  56. Rossner MJ, Hirrlinger J, Wichert SP, Boehm C, Newrzella D, Hiemisch H, Eisenhardt G, Stuenkel C, von Ahsen O, Nave K-A (2006) Global transcriptome analysis of genetically identified neurons in the adult cortex. J Neurosci 26:9956–9966. doi:10.1523/JNEUROSCI.0468-06.2006
  57. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. doi:10.1016/0377-0427(87)90125-7
  58. Satoh J, Kino Y, Asahina N, Takitani M, Miyoshi J, Ishida T, Saito Y (2016) TMEM119 marks a subset of microglia in the human brain. Neuropathology 36:39–49. doi:10.1111/neup.12235 pmid:26250788
  59. Saunders A, Granger AJ, Sabatini BL (2015) Corelease of acetylcholine and GABA from cholinergic forebrain neurons. Elife 4. doi:10.7554/eLife.06412
  60. Schmidt EF, Warner-Schmidt JL, Otopalik BG, Pickett SB, Greengard P, Heintz N (2012) Identification of the cortical neurons that mediate antidepressant responses. Cell 149:1152–1163. doi:10.1016/j.cell.2012.03.038 pmid:22632977
  61. Shannon CP, Balshaw R, Chen V, Hollander Z, Toma M, McManus BM, FitzGerald JM, Sin DD, Ng RT, Tebbutt SJ (2017) Enumerateblood – an R package to estimate the cellular composition of whole blood from Affymetrix Gene ST gene expression profiles. BMC Genomics 18. doi:10.1186/s12864-016-3460-1
  62. Shay T, Jojic V, Zuk O, Rothamel K, Puyraimond-Zemmour D, Feng T, Wakamatsu E, Benoist C, Koller D, Regev A, ImmGen Consortium (2013) Conservation and divergence in the transcriptional programs of the human and mouse immune systems. Proc Natl Acad Sci USA 110:2946–2951. doi:10.1073/pnas.1222738110 pmid:23382184
  63. Shrestha P, Mousa A, Heintz N (2015) Layer 2/3 pyramidal cells in the medial prefrontal cortex moderate stress induced depressive behaviors. Elife 4. doi:10.7554/eLife.08752
  64. Sibille E, Arango V, Joeyen-Waldorf J, Wang Y, Leman S, Surget A, Belzung C, Mann JJ, Lewis DA (2008) Large-scale estimates of cellular origins of mRNAs: enhancing the yield of transcriptome analyses. J Neurosci Methods 167:198–206. doi:10.1016/j.jneumeth.2007.08.009 pmid:17889939
  65. Skene NG, Grant SGN (2016) Identification of vulnerable cell types in major brain disorders using single cell transcriptomes and expression weighted cell type enrichment. Front Neurosci 10. doi:10.3389/fnins.2016.00016
  66. Sugino K, Hempel CM, Miller MN, Hattox AM, Shapiro P, Wu C, Huang ZJ, Nelson SB (2006) Molecular taxonomy of major neuronal classes in the adult mouse forebrain. Nat Neurosci 9:99–107. doi:10.1038/nn1618 pmid:16369481
  67. Sugino K, Hempel CM, Okaty BW, Arnson HA, Kato S, Dani VS, Nelson SB (2014) Cell-type-specific repression by methyl-CpG-binding protein 2 is biased toward long genes. J Neurosci 34:12877–12883. doi:10.1523/JNEUROSCI.2674-14.2014
  68. Sunkin SM, Ng L, Lau C, Dolbeare T, Gilbert TL, Thompson CL, Hawrylycz M, Dang C (2013) Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res 41:D996–D1008. doi:10.1093/nar/gks1042 pmid:23193282
  69. Tan CL, Plotkin JL, Venø MT, von Schimmelmann M, Feinberg P, Mann S, Handler A, Kjems J, Surmeier DJ, O’Carroll D, Greengard P, Schaefer A (2013a) MicroRNA-128 governs neuronal excitability and motor behavior in mice. Science 342:1254–1258. doi:10.1126/science.1244193
  70. Tan PPC, French L, Pavlidis P (2013b) Neuron-enriched gene expression patterns are regionally anti-correlated with oligodendrocyte-enriched patterns in the adult mouse and human brain. Front Neurosci 7:5. doi:10.3389/fnins.2013.00005
  71. Tasic B, Menon V, Nguyen TN, Kim TK, Jarsky T, Yao Z, Levi B, Gray LT, Sorensen SA, Dolbeare T, Bertagnolli D, Goldy J, Shapovalova N, Parry S, Lee C, Smith K, Bernard A, Madisen L, Sunkin SM, Hawrylycz M, et al. (2016) Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat Neurosci 19:335–346. doi:10.1038/nn.4216 pmid:26727548
  72. Toker L, Feng M, Pavlidis P (2016) Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies. F1000Research 5:2103. doi:10.12688/f1000research.9471.2 pmid:27746907
  73. Tomomura M, Rice DS, Morgan JI, Yuzaki M (2001) Purification of Purkinje cells by fluorescence-activated cell sorting from transgenic mice that express green fluorescent protein. Eur J Neurosci 14:57–63. pmid:11488949
  74. Trabzuni D, Ramasamy A, Imran S, Walker R, Smith C, Weale ME, Hardy J, Ryten M, North American Brain Expression Consortium (2013) Widespread sex differences in gene expression and splicing in the adult human brain. Nat Commun 4. doi:10.1038/ncomms3771
  75. Tripathy SJ, Toker L, Li B, Crichlow CL, Tebaykin D, Mancarci BO, Pavlidis P (2017) Transcriptomic correlates of neuron electrophysiological diversity. PLoS Comput Biol 13:e1005814. doi:10.1371/journal.pcbi.1005814
  76. Ugrumov MV (2013) Brain neurons partly expressing dopaminergic phenotype: location, development, functional significance, and regulation. In: Advances in pharmacology, a new era of catecholamines in the laboratory and clinic, Chap 4 (Eiden LE, ed), pp 37–91. San Diego: Academic Press.
  77. Uranova NA, Vostrikov VM, Orlovskaya DD, Rachmanova VI (2004) Oligodendroglial density in the prefrontal cortex in schizophrenia and mood disorders: a study from the Stanley Neuropathology Consortium. Schizophr Res 67:269–275. doi:10.1016/S0920-9964(03)00181-6 pmid:14984887
  78. Wang Y, Winters J, Subramaniam S (2012) Functional classification of skeletal muscle networks. II. Applications to pathophysiology. J Appl Physiol 113:1902–1920. doi:10.1152/japplphysiol.01515.2011
  79. Westra HJ, Arends D, Esko T, Peters MJ, Schurmann C, Schramm K, Kettunen J, Yaghootkar H, Fairfax BP, Andiappan AK, Li Y, Fu J, Karjalainen J, Platteel M, Visschedijk M, Weersma RK, Kasela S, Milani L, Tserel L, Peterson P, Reinmaa E, et al. (2015) Cell specific eQTL analysis without sorting cells. PLoS Genet 11:e1005223. doi:10.1371/journal.pgen.1005223 pmid:25955312
  80. Williams MR, Hampton T, Pearce RKB, Hirsch SR, Ansorge O, Thom M, Maier M (2013) Astrocyte decrease in the subgenual cingulate and callosal genu in schizophrenia. Eur Arch Psychiatry Clin Neurosci 263:41–52. doi:10.1007/s00406-012-0328-5
  81. Xu X, Nehorai A, Dougherty JD (2013) Cell type-specific analysis of human brain transcriptome data to predict alterations in cellular composition. Syst Biomed 1:151–160. doi:10.4161/sysb.25630
  82. Zamanian JL, Xu L, Foo LC, Nouri N, Zhou L, Giffard RG, Barres BA (2012) Genomic analysis of reactive astrogliosis. J Neurosci 32:6391–6410. doi:10.1523/JNEUROSCI.6221-11.2012 pmid:22553043
  83. Zeisel A, Muñoz-Manchado AB, Codeluppi S, Lönnerberg P, Manno GL, Juréus A, Marques S, Munguba H, He L, Betsholtz C, Rolny C, Castelo-Branco G, Hjerling-Leffler J, Linnarsson S (2015) Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347:1138–1142. doi:10.1126/science.aaa1934
  84. Zhang Y, James M, Middleton FA, Davis RL (2005) Transcriptional analysis of multiple brain regions in Parkinson’s disease supports the involvement of specific protein processing, energy metabolism, and signaling pathways, and suggests novel disease mechanisms. Am J Med Genet Part B Neuropsychiatr Genet off Publ Int Soc Psychiatr Genet 137B:5–16. doi:10.1002/ajmg.b.30195
  85. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP, Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R, Maniatis T, Barres BA, Wu JQ (2014) An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci 34:11929–11947. doi:10.1523/JNEUROSCI.1860-14.2014 pmid:25186741
  86. Zhang Y, Sloan SA, Clarke LE, Caneda C, Plaza CA, Blumenthal PD, Vogel H, Steinberg GK, Edwards MSB, Li G, Duncan JA, Cheshier SH, Shuer LM, Chang EF, Grant GA, Gephart MGH, Barres BA (2016) Purification and characterization of progenitor and mature human astrocytes reveals transcriptional and functional differences with mouse. Neuron 89:37–53. doi:10.1016/j.neuron.2015.11.013
  87. Zoubarev A, Hamer KM, Keshav KD, McCarthy EL, Santos JRC, Van Rossum T, McDonald C, Hall A, Wan X, Lim R, Gillis J, Pavlidis P (2012) Gemma: a resource for the reuse, sharing and meta-analysis of expression profiling data. Bioinforma 28:2272–2273. doi:10.1093/bioinformatics/bts430

Synthesis

Reviewing Editor: Alfonso Araque, University of Minnesota

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Stephen Ginsberg, Michael Hawrylycz

The reviewers found merit and interest in the findings reported. I concur with the reviewers' comments expressing the interest of the study. However, they also expressed concerns about several issues that need to be addressed and clarified. I also consider the concerns expressed to be pertinent. I believe that addressing those concerns, which will improve the manuscript, is required to firmly validate the results and support the conclusions.

Specific comments of the reviewers:

Reviewer 1.

General comment:

Comparing single cell versus bulk tissue expression profiles is an interesting concept. It may or may not be attainable, which is the crux of the issue with this submission.

Specific comments:

Manuscript eN-MNT-0212-17 “Cross-laboratory analysis of brain cell type transcriptomes with applications to interpretation of bulk tissue data” is a bioinformatics endeavor using highly curated gene sets from multiple publications, both single-cell and bulk tissue studies. The authors have painstakingly compiled and curated a database of cell-type specific expression from previously published data. The authors go to great lengths methodologically to compare (single-cell) and contrast (bulk tissue) studies through the use of marker gene sets (MGSs). The approach is heavily informatics methods-driven, and many(!) assumptions are made about the existing datasets in order to generate the present results. Being able to extrapolate single-cell data from bulk data is a laudable goal. However, many assumptions need to be established to attain this goal.

MGSs were selected for each cell type (36 cell types!) in each brain region, based on fold change and clustering quality. Rather than relegating MGSs to one area (e.g., frontal cortex, where microarray and RNA-seq studies both existed), the authors evaluated multiple cell types from multiple regions. The rationale for this approach versus sticking to a single region should be enumerated.

A threshold expression level was chosen to enable the detection of individual genes in bulk tissue, ‘8’ for microarrays, and ‘2.5 RPKM’ for RNA-seq studies. Despite text describing these arbitrary cutoffs, the Reviewer is largely confused as to why they were selected- is there any evidence-based metric why they were chosen? At present, these cutoffs seem to have been ‘taken out of thin air’, which leads to the contention that many of the assumptions made to generate the database are arbitrary and in need of some form of calibration and/or quantitation to prove that they are viable.

Merging microarray and RNA-seq data has numerous pitfalls with transcript size and location bias, detection level, among many other parameters. The authors even acknowledge this for a specific gene, Ank1, for which they choose to essentially believe the microarray data over the RNA-seq study, which in and of itself is a bit perplexing. This needs to be addressed in a significantly more detailed manner, with the full cohort of limitations and pitfalls for this type of assessment.

The authors highlight the case for cell loss in Parkinson's disease (PD) rather than regulatory changes based on their database analysis. To be frank, the Reviewer found this to be one of the weaker aspects of the study, as it is well established in the literature that individual studies vary considerably in their ability to detect expected and not-expected gene changes, especially in disease states and clinical conditions. Importantly, the authors do not go into the similarities and/or differences of the studies in terms of patient cohorts, age, gender, disease duration, etc., that are crucial components of “head-to-head” study comparisons. It is the opinion of the Reviewer that the strength of the study is the ability to curate single-cell studies relative to bulk tissue studies, principally for modeling purposes. Extending these analyses AND interpretations into pathological or postmortem studies is risky and equivocal.

Validation through in situ hybridization is adequate. However, validation, or a lack thereof, from more bulk based methods, e.g., qPCR or immunoblot analyses, is strongly recommended to either support or refute some of the single cell versus bulk tissue findings. This is pretty industry standard at the current time and needs to be addressed.

The Discussion requires some realignment to address the numerous assumptions and pitfalls that come with trying to evaluate a single cell type expression profile in a sea of mixed cell types. Greater discourse on more subtle effects, such as masking of expression profile by admixed cell types, requires a little more in-depth and detailed thought.

In terms of the figures, each one was extremely dense. The authors may choose to cull or redistribute some of the information. For example, Figure 1C could either be removed or stand on its own. Figure 2D was off the plane of the figure and distracting. Although the text was clear in the Figures, the actual images- particularly Fig. 2A heatmap and Fig 2C in situ were pixelated and difficult to see (at least in the Reviewer's copy).

In summary, the manuscript has both positive aspects and areas that could use substantial improvement. The authors are commended for the state-of-the-art bioinformatics approach, which will ultimately yield a useful resource for transcriptomics.

Reviewer 2.

General comment:

The authors provide an interesting and reasonably comprehensive comparison of cell type-specific and bulk tissue marker genes through a cross analysis of expression patterns and clusters found in both. Such marker genes are key determinants of transcriptomic cell type, and identifying them is generally agreed to be a useful approach to classification.

Specific comments:

In considering computational methods to estimate cell type proportions, the authors should also consider/cite Grange et al. (PNAS) on estimating cell types in the brain from in situ hybridization methods.

The statistical analysis may well be correct, but the authors do not provide sufficient justification for many of the choices of thresholds and clustering used. There are many parameters used throughout that should be modeled more rigorously, or the rationale for these choices should be made more explicit. This aspect of the work needs more attention. In reworking the manuscript every such choice or cited statistic should be reviewed.

The authors present a nice overall computational approach to the problem and have constructed a supporting database to highlight the work and continue analysis.

A helpful validation of cell type is given using in situ hybridization.

The authors have considered a variety of datasets and taken a reasonably complete approach to the problem and produced a reassuring analysis.

The description of the results section is fairly ad hoc and could be better presented, potentially by taking a broader view of cell types and their significance in the brain, rather than certain specialized observations from the data.

The Neuroexpresso data resource is a nice idea and could be developed further, but provides an initial entry point to the analysis.

Author Response

Reviewer 1.

General comment:

Comparing single cell versus bulk tissue expression profiles is an interesting concept. It may or may not be attainable, which is the crux of the issue with this submission.

Authors: We thank the reviewer for the encouraging comment about the goals of the work. We address the attainability in the following responses.

Specific comments:

Manuscript eN-MNT-0212-17 “Cross-laboratory analysis of brain cell type transcriptomes with applications to interpretation of bulk tissue data” is a bioinformatics endeavor using highly curated gene sets from multiple publications, both single-cell and bulk tissue studies. The authors have painstakingly compiled and curated a database of cell-type specific expression from previously published data. The authors go to great lengths methodologically to compare (single-cell) and contrast (bulk tissue) studies through the use of marker gene sets (MGSs). The approach is heavily informatics methods-driven, and many(!) assumptions are made about the existing datasets in order to generate the present results. Being able to extrapolate single-cell data from bulk data is a laudable goal. However, many assumptions need to be established to attain this goal.

Authors: As we describe in the introduction, inferring something about cell types (which we believe is what the reviewer means by “single cell”) in bulk tissue is not a new concept, and there is a long literature on various methods for “cell type deconvolution” and similar ideas.

Our contribution is to provide a carefully curated database of brain cell type gene expression with higher granularity and anatomical coverage than before; we use that to identify potential and validated markers of those cell types, and then show how they can be meaningfully applied to interpretation of bulk tissue data. At each step of the way we validate our approach: we note that we capture previously known markers; we validate markers using in situ hybridization; we provide evidence that our analysis of bulk tissue gives results that are consistent with prior knowledge, and we are very careful to refer to the analysis method as Marker Gene Profiles, not “cell type proportion analysis” to acknowledge the indirect nature of the inference.

Thus, we don't disagree with the reviewer that there are caveats to the use of the approach, and indeed discuss these caveats, and have expanded that discussion in the revision.

MGSs were selected for each cell type (36 cell types!) in each brain region, based on fold change and clustering quality. Rather than relegating MGSs to one area (e.g., frontal cortex, where microarray and RNA-seq studies both existed), the authors evaluated multiple cell types from multiple regions. The rationale for this approach versus sticking to a single region should be enumerated.

Authors: We are certain that the breadth of coverage of NeuroExpresso is a strength, not a weakness, and that justifying this in the paper is unnecessary. To be clear, focusing on a single region would simply mean we throw out all the data on other regions; it does not mean we would be able to do a more in-depth analysis of a single region (if that is what the reviewer is implying). Thus, there are only downsides to focusing on a single region, so far as we can see. As an aside, we note that to the extent possible, we analyze each brain region separately when selecting marker genes, as is explained in the Methods.

A threshold expression level was chosen to enable the detection of individual genes in bulk tissue, ‘8’ for microarrays, and ‘2.5 RPKM’ for RNA-seq studies. Despite text describing these arbitrary cutoffs, the Reviewer is largely confused as to why they were selected- is there any evidence-based metric why they were chosen? At present, these cutoffs seem to have been ‘taken out of thin air’, which leads to the contention that many of the assumptions made to generate the database are arbitrary and in need of some form of calibration and/or quantitation to prove that they are viable.

Authors: Both reviewers raised this point and we regret that we did not sufficiently document our decision-making process. At the same time, we note that the original submission did have an explanation of the thresholds in the Materials and Methods - Selection of Cell Type Markers section.

The main place where thresholds come into play in our work is in picking marker genes, not in the generation of the database of expression profiles or other aspects of the work. “Marker gene” is not a biological concept: it is something that can be defined at best operationally as a gene that usefully distinguishes a “cell type”. The key word here is “usefully”.

Therefore, our goal was to identify genes that had the best chance of being useful markers for applications of interest to us (particularly the MGP analysis). This was partly based on observing the statistics for previously known (and generally accepted) markers, though even this cannot be done uncritically (some supposed markers aren't that specific after all). We have added text to the methods section explaining this in more detail, but it remains the case that some arbitrariness is unavoidable. Other researchers are welcome to download the NeuroExpresso data and use their own methods for choosing markers.

A key point is that rather than try to fully justify thresholds based on prior knowledge (which is not possible), we validate our decisions using complementary data, such as the in-situ hybridization.
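To make the role of such cutoffs concrete, the following is a minimal sketch of a platform-specific expression filter using the two values quoted in the reviewer's comment. It is an illustration only, not the authors' code: the function names, the data layout, and the toy gene names are hypothetical, and treating the microarray value of 8 as a log2 intensity is an assumption, since the scale is not stated in the comment.

```python
from typing import Dict

# Cutoffs as quoted in the reviewer's comment. The log2 scale for the
# microarray value is an assumption; it is not stated in the comment.
THRESHOLDS = {"microarray": 8.0, "rnaseq": 2.5}  # rnaseq value is in RPKM


def passes_cutoff(expression: float, platform: str) -> bool:
    """True when the expression value exceeds the platform's cutoff."""
    return expression > THRESHOLDS[platform]


def filter_candidates(candidates: Dict[str, float], platform: str) -> Dict[str, float]:
    """Drop candidate markers whose expression is at or below the cutoff."""
    return {gene: value for gene, value in candidates.items()
            if passes_cutoff(value, platform)}


# Toy values with made-up gene names (hypothetical, not from NeuroExpresso).
toy_microarray = {"GeneA": 9.3, "GeneB": 7.1}
kept = filter_candidates(toy_microarray, "microarray")
print(kept)
```

Keeping the cutoffs in one table makes it easy to rerun a selection with different values, which is in the spirit of the authors' invitation for others to apply their own marker criteria to the data.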

Merging microarray and RNA-seq data has numerous pitfalls with transcript size and location bias, detection level, among many other parameters. The authors even acknowledge this for a specific gene, Ank1, for which they choose to essentially believe the microarray data over the RNA-seq study, which in and of itself is a bit perplexing. This needs to be addressed in a significantly more detailed manner, with the full cohort of limitations and pitfalls for this type of assessment.

Authors: Merging these data types was indeed a challenge, but the benefits of combining the data outweighed the downsides, and combining them was better than using just one or the other data type.

The section of our methodology exemplified by Ank1 merely explains how we combine gene lists from two data sets. If the stringent criteria for marker gene selection are satisfied in one of the datasets, we only ensure that the gene has the highest expression in the cell type of interest in the other dataset before selecting it as a marker gene.

We do this because enforcing our stringent criteria in both datasets simultaneously results in many genes not selected because they are slightly below our thresholds. We think our approach is a reasonable compromise: it allows us to benefit from merging two datasets by increasing the number of recognized cell types, while allowing us to remove genes if data from microarray and single cell RNA-seq contradict each other. So “Believing the microarray data over the RNA-seq study” is not a correct representation of our methodology as we use both datasets in a complementary way.
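The combination rule described in this response can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names and the gene-to-cell-type dictionary layout are hypothetical, the gene and expression values are toy data, and the handling of a gene with no data in the second dataset (treated here as unsupported) is an assumption that the response does not address.

```python
from typing import Dict, Set

# gene -> {cell type -> expression}; a hypothetical layout for illustration
Expression = Dict[str, Dict[str, float]]


def is_top_in(expr: Expression, gene: str, cell_type: str) -> bool:
    """Relaxed check: the gene's highest expression is in `cell_type`."""
    profile = expr.get(gene)
    if not profile or cell_type not in profile:
        return False  # assumption: no data in this dataset means no support
    return profile[cell_type] >= max(profile.values())


def combine_markers(strict_a: Set[str], strict_b: Set[str],
                    expr_a: Expression, expr_b: Expression,
                    cell_type: str) -> Set[str]:
    """Keep genes that are strict in one dataset and top-expressed in the other."""
    selected = set()
    for gene in strict_a | strict_b:
        # A gene that passed the stringent criteria in a dataset needs no
        # further check there; otherwise it must at least peak in `cell_type`.
        ok_a = gene in strict_a or is_top_in(expr_a, gene, cell_type)
        ok_b = gene in strict_b or is_top_in(expr_b, gene, cell_type)
        if ok_a and ok_b:
            selected.add(gene)
    return selected


# Toy data with made-up gene names; both genes pass the strict criteria
# in the microarray data only.
microarray = {"GeneX": {"Purkinje": 11.0, "Granule": 6.0},
              "GeneY": {"Purkinje": 10.5, "Granule": 6.2}}
rnaseq = {"GeneX": {"Purkinje": 3.1, "Granule": 1.0},
          "GeneY": {"Purkinje": 0.4, "Granule": 2.0}}
markers = combine_markers({"GeneX", "GeneY"}, set(), microarray, rnaseq, "Purkinje")
print(sorted(markers))
```

In the toy data, GeneY is dropped because the second dataset places its peak in a different cell type, which corresponds to the case the authors describe as the two data sources contradicting each other.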

The authors highlight the case for cell loss in Parkinson's disease (PD) rather than regulatory changes based on their database analysis. To be frank, the Reviewer found this to be one of the weaker aspects of the study, as it is well established in the literature that individual studies vary considerably in their ability to detect expected and not-expected gene changes, especially in disease states and clinical conditions.

Authors: We are not making any claims about expected or not-expected changes in gene expression; we are simply interpreting the changes that are in fact observed. We agree with the original study authors that the genes they claim are differentially expressed are indeed differentially expressed. The fact that two independent studies give similar results is good, not a problem.

Our contribution is to provide an interpretation of those changes that is different than the one offered by the original authors: that a decrease in the numbers of dopaminergic neurons is a parsimonious explanation for the observed patterns, whereas regulation within living cells gives a minor contribution, at best. Since the reviewer is not disputing our conclusions in any specific way, we decided to keep this in the manuscript.

Importantly, the authors do not go into the similarities and/or differences of the studies in terms of patient cohorts, age, gender, disease duration, etc. that are crucial components of “head-to-head” study comparisons.

Authors: We are not making a head-to-head comparison of the Parkinson's data sets. See previous question. The fact that the studies are not identical makes our results more compelling, as it shows the findings of the MGP analysis generalize.

It is the opinion of the Reviewer that the strength of the study is the ability to curate single-cell studies relative to bulk tissue studies- principally for modeling purposes. Extending these analyses AND interpretations into pathological or postmortem studies is risky and equivocal.

Authors: The reviewer is not offering any specific criticism here, and we believe our conclusions, which we present with careful statements of caveats and cautions, are supported by the data.

We want to clarify that the curation was of both single-cell and purified cell data sets (of specific cell types), and was never “relative to bulk tissue”. We then apply the data to the interpretation of bulk tissue, which is in itself not novel (including postmortem and pathological samples) - see references cited in our introduction. In our view showing a real-life application using our own approach is far more satisfying than merely suggesting it be tried out by others; we fear we'd be criticized for not showing this utility directly. If we had not presented any data, the reviewer would be fully justified in being concerned. But we actually apply the approach and get sensible results (i.e. in the Parkinson's analysis), and we discuss the caveats and limitations.

Validation through in situ hybridization is adequate. However, validation, or a lack thereof, from more bulk based methods, e.g., qPCR or immunoblot analyses, is strongly recommended to either support or refute some of the single cell versus bulk tissue findings. This is pretty industry standard at the current time and needs to be addressed.

Authors: Our manuscript does not present any “single cell versus bulk tissue findings” (we take “single cell” to mean “cell type”, but the point stands regardless of what the reviewer meant).

We are not challenging or presenting any novel gene expression findings from bulk tissue data - we are only reinterpreting what has previously been reported, using external knowledge about cell types (i.e., what genes are expressed in which cell type).

Thus, we are very unclear on what immunoblot or qPCR experiments on bulk tissue would be helpful at this time. If anything, what is needed is more single-cell (individual cell) analysis of neurons, but this is clearly out of scope of the current work.

The Discussion requires some realignment to address the numerous assumptions and pitfalls that come with trying to evaluate a single cell type expression profile in a sea of mixed cell types. Greater discourse on more subtle effects, such as masking of expression profile by admixed cell types, requires a little more in-depth and detailed thought.

Authors: We have expanded our discussion of caveats.

In terms of the figures, each one was extremely dense. The authors may choose to cull or redistribute some of the information.

For example, Figure 1C could either be removed or stand on its own.

Authors: We felt that showing a screenshot of the NeuroExpresso web interface helps readers understand what we are providing, and at the reviewer's suggestion we have split it out as Figure 2.

Figure 2D was off the plane of the figure and distracting.

Authors: We have moved Figure 2C and D to a separate figure (Figure 4).

Although the text was clear in the Figures, the actual images- particularly Fig. 2A heatmap and Fig 2C in situ were pixelated and difficult to see (at least in the Reviewer's copy).

Authors: High resolution images are provided with the latest submission.

In summary, the manuscript has both positive aspects and areas that could use substantial improvement. The authors are commended for the state-of-the-art bioinformatics approach, which will ultimately yield a useful resource for transcriptomics.

Reviewer 2

General comment:

The authors provide an interesting and reasonably comprehensive comparison of cell type-specific and bulk tissue marker genes through a cross analysis of expression patterns and clusters found in both. Such marker genes are key determinants of transcriptomic cell type, and identifying them is generally agreed to be a useful approach to classification.

Specific comments:

In considering computational methods to estimate cell type proportions, the authors should also consider/cite Grange et al. (PNAS) on estimating cell types in the brain from in situ hybridization methods.

Authors: The limitations and caveats section of our discussion section now addresses this issue, citing Grange et al.

The statistical analysis may well be correct, but the authors do not provide sufficient justification for many of the choices of thresholds and clustering used. There are many parameters used throughout that should be modeled more rigorously, or the rationale for these choices should be made more explicit. This aspect of the work needs more attention. In reworking the manuscript every such choice or cited statistic should be reviewed.

Authors: See previous discussion about the choice of thresholds in response to Reviewer 1; we have added the requested revisions.

The authors present a nice overall computational approach to the problem and have constructed a supporting database to highlight the work and continue analysis.

A helpful validation of cell type is given using in situ hybridization.

The authors have considered a variety of datasets and taken a reasonably complete approach to the problem and produced a reassuring analysis.

Authors: We thank the reviewer for these positive comments.

The description of the results section is fairly ad hoc and could be better presented, potentially by taking a broader view of cell types and their significance in the brain, rather than certain specialized observations from the data.

Authors: We were unsure what the reviewer meant by “description of the results section”.

We have decided to leave the results section organized as in the original submission. The first section is an overview of the database, the middle sections are about the marker genes (including some specific observations) and the last sections are on the application to interpretation of bulk tissue data (again, with specific observations). We feel this is logical and easy to follow.

If the comment is referring to the discussion, we're not clear on the issue. We feel a broad review of the significance of cell types is out of place (and practically axiomatic among neuroscientists anyway), and that discussing our specific findings is not unreasonable.

The Neuroexpresso data resource is a nice idea and could be developed further, but provides an initial entry point to the analysis.

Authors: We thank the reviewer again for this constructive feedback.

eNeuro
Vol. 4, Issue 6
November/December 2017

Keywords

  • Cell Type
  • gene expression
  • Marker Gene
  • Microarray
  • RNA sequencing


Copyright © 2023 by the Society for Neuroscience.
eNeuro eISSN: 2373-2822

The ideas and opinions expressed in eNeuro do not necessarily reflect those of SfN or the eNeuro Editorial Board. Publication of an advertisement or other product mention in eNeuro should not be construed as an endorsement of the manufacturer’s claims. SfN does not assume any responsibility for any injury and/or damage to persons or property arising from or related to any use of any material contained in eNeuro.