Main

The whole-genome sequencing projects have thus far identified far fewer genes than previously expected, and it is particularly puzzling that organisms as different as the fruit fly, mouse and human seem to have rather similar numbers of genes. The number of genes alone therefore does not account for the greater complexity of higher organisms, and other regulatory principles must contribute to the use of genetic information during development and homeostasis1,2. This discrepancy is explained in part by alternative splicing, which allows different exons from the same primary transcript to be assembled combinatorially. It has been estimated that 38–59% of the genes in mouse and human are subject to alternative splicing2,3,4,5,6 and that 15% of the genetic disorders in humans are caused by mutations that lead to defects in pre-mRNA splicing7,8,9.

Although identifying all splice variants is important for understanding the transcriptome, only a few large-scale analyses have been carried out, and these relied on computational exon prediction, either within genomes or from expressed-sequence tags (ESTs) and cDNA sequences. Predicted exons have been used to design exon- and intron10-specific oligonucleotide arrays to monitor pre-mRNA splicing on a genome-wide scale11, including exon-junction arrays12 and arrays of predicted exons on chromosome 22 (ref. 1). These approaches, however, are limited to predicted or known exons and cannot discover new exons.

To identify genes regulated at the level of alternative splicing by experimental means, we developed an approach to selectively clone differentially spliced exons from distinct biological samples into alternative splicing libraries (ASLs) and to sequence alternative splicing sequence enriched tags (ASSETs). As ASSETs encompass exons along with their related splice junctions, they not only allow exon and gene identification but also enable the analysis of different splicing types, which are beyond the reach of present computational predictions.

Because cancer-related genes such as Mdm2 (ref. 13) and Cdkn2a14 are regulated or affected at the level of alternative splicing in tumors, altered splicing might be a general mechanism contributing to tumor development. We therefore prepared ASLs from the melanoma cell line B16-F10Y and the melanocyte cell line melan-c to study alternatively spliced genes and their contribution to melanoma development15. Here, ASSETs sequencing combined with computational annotation provides an overview of the extent of alternative splicing in the context of tumorigenesis.

Results

Alternative splicing libraries

The preparation of an ASL started from two full-length cDNA libraries derived from distinct samples. To avoid cloning intronic regions, ASLs were prepared from libraries of cytoplasmic RNA, which were further enriched by full-length selection to cover all exons within the transcripts. Single-stranded DNA (ssDNA) was prepared from the cDNA inserts of each library to obtain sets of sense and antisense strands (Fig. 1a–c). Complementary cDNAs formed double-stranded hybrids consisting of sense and antisense strands derived from the two distinct libraries (Fig. 1d). Where one cDNA contained an exon absent from its complementary cDNA, the hybrid contained an ssDNA region. Thus, exons that were differentially spliced between the two samples formed loop structures within regions of double-stranded DNA (dsDNA), and these loops were used for selective cloning. Unhybridized ssDNA was removed by treatment with exonuclease VII (Fig. 1e), and a mixture of 4–base pair (bp) cutters cleaved the dsDNA into fragments mainly of 100–500 bp (Fig. 1f). The remaining DNA hybrids with single-stranded loop structures were specifically enriched by annealing to randomized oligonucleotides and captured by magnetic beads (Fig. 1g). The recovered DNA fragments were ligated to Y-shaped linkers (Fig. 1h), amplified by PCR (Fig. 1i) and cloned into ASLs (Fig. 1j). Because the 4-bp cutters cut only within the double-stranded flanking exons, ASSETs obtained from ASLs represented individual exons derived from the loop structures together with their splice junctions.
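The loop-selection logic above can be illustrated in silico. The sketch below is a hypothetical model, not the published protocol. Isoforms are represented as lists of exon identifiers. An exon that is present in one isoform but absent from the other, with both neighbours shared, is treated as a single-stranded loop. The double-stranded flanking exons are then trimmed at the recognition site nearest the loop. The four recognition sequences in `SITES` are placeholders, because the enzymes in the 4-bp cutter mixture are not named here. For simplicity, cleavage is placed at the edge of each recognition site.

```python
# Placeholder 4-bp recognition sites; the actual enzyme mixture is an assumption.
SITES = ("GATC", "GTAC", "AGCT", "CCGG")


def loop_exons(isoform_a, isoform_b):
    """Return (upstream, loop, downstream) triples for exons of isoform_a that are
    missing from isoform_b while both neighbouring exons are shared (cassette loops)."""
    shared = set(isoform_b)
    loops = []
    for i in range(1, len(isoform_a) - 1):
        up, exon, down = isoform_a[i - 1], isoform_a[i], isoform_a[i + 1]
        if exon not in shared and up in shared and down in shared:
            loops.append((up, exon, down))
    return loops


def last_site_end(seq, sites=SITES):
    """Index just past the recognition site closest to the 3' end of seq (0 if none)."""
    ends = [seq.rfind(s) + len(s) for s in sites if s in seq]
    return max(ends) if ends else 0


def first_site_start(seq, sites=SITES):
    """Index of the recognition site closest to the 5' end of seq (len(seq) if none)."""
    starts = [seq.find(s) for s in sites if s in seq]
    return min(starts) if starts else len(seq)


def asset_fragment(upstream_seq, loop_seq, downstream_seq):
    """Model the captured fragment: the single-stranded loop is left uncut, while the
    double-stranded flanks are trimmed back to the nearest cutter site."""
    left = upstream_seq[last_site_end(upstream_seq):]
    right = downstream_seq[:first_site_start(downstream_seq)]
    return left, loop_seq, right


# Hypothetical exon sequences: E2 is included in isoform A and skipped in isoform B.
exons = {"E1": "ATGCCGATCTTAGGC", "E2": "CAGGTTTCCAAG", "E3": "GGATTTCAGGTACAA"}
for up, loop, down in loop_exons(["E1", "E2", "E3"], ["E1", "E3"]):
    print(loop, asset_fragment(exons[up], exons[loop], exons[down]))
```

Only the inclusion strand forms a loop, so calling `loop_exons` with the skipping isoform first returns an empty list. This matches the idea that the captured fragment is the loop plus short stretches of its flanking exons.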

Figure 1: Preparation of ASLs.

(a–c) Preparation of ssDNA using an RNA template. (d) Hybridization of ssDNAs. (e) Removal of remaining ssDNA. (f) Digestion of dsDNA regions by a set of 4-bp cutters. (g) Capture of DNA hybrids with loop structures. (h) Ligation of Y-shaped linkers to isolated DNA hybrids with loop structures. (i) PCR amplification of ligation products. (j) Cloning into ASL.

ASL preparations from melanocyte and melanoma cell lines

To test the ASL concept, cDNA libraries prepared from the melanocyte cell line melan-c (RIKEN ID: G2) and the melanoma cell line B16-F10Y (RIKEN ID: G3) were analyzed for differentially spliced genes that may contribute to tumor development. Three ASLs were prepared from the parental libraries using two different protocols (Table 1). Library L1 was prepared from PCR-derived dsDNA, which allowed detection of all differentially spliced transcripts within both samples. Libraries L2 and L3 were prepared under two distinct hybridization conditions from strand-specific ssDNA derived from intermediate RNA templates. They therefore allowed detection only of exons that were differentially spliced between the samples. Individual clones from all ASLs and the parental libraries were isolated and sequenced, yielding a total of 43,649 ASSETs and 33,602 ESTs. On average, the exons identified by ASSETs sequencing were 100–200 bp long, with flanking regions of 10–100 bp. One-pass sequencing was usually sufficient to cover an entire ASSET, and the smallest exon detected was only 15 bp long.

Table 1 Three ASLs (L1, L2 and L3) were prepared from two parental libraries (G2: melanocyte and G3: melanoma)

Genes related to ASSETs and ESTs were identified and analyzed for their distribution across libraries (Fig. 2). Libraries L2 and L3 had more genes in common with each other than with library L1. This indicates that the use of dsDNA or ssDNA could drive ASL specificity, whereas neither hybridization condition appeared preferable. ASLs identified an additional 1,575 genes not found by ESTs, including 81 corresponding to rare transcripts (r values > 8 correspond to probabilities of >98%; ref. 16 and data not shown). Thus, sequencing of ASLs covered the transcriptome more thoroughly than EST sequencing alone. Moreover, new splicing events of the cassette type were identified within 436 genes compared with the transcript sets in Ensembl and GenBank, demonstrating the further potential of ASLs for exon discovery (Supplementary Fig. 1 online).
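The library comparison can be expressed as simple set operations on gene identifiers. The sketch below is illustrative only. The gene sets are hypothetical, and the Jaccard index is one possible overlap measure that the text does not itself use. The code counts the genes shared between each pair of libraries and the genes found only in ASLs, analogous to the 1,575 ASL-only genes reported above.

```python
from itertools import combinations


def pairwise_overlap(libraries):
    """For each pair of libraries, return (number of shared genes, Jaccard index)."""
    result = {}
    for a, b in combinations(sorted(libraries), 2):
        shared = libraries[a] & libraries[b]
        union = libraries[a] | libraries[b]
        result[(a, b)] = (len(shared), len(shared) / len(union))
    return result


def asl_only_genes(asl_libraries, est_genes):
    """Genes detected in any ASL but not among the EST-derived genes."""
    return set().union(*asl_libraries.values()) - est_genes


# Hypothetical gene identifiers, for illustration only.
asls = {
    "L1": {"g1", "g2", "g3", "g4"},
    "L2": {"g3", "g4", "g5", "g6"},
    "L3": {"g4", "g5", "g6", "g7"},
}
ests = {"g1", "g2", "g8"}

for pair, (n_shared, jaccard) in pairwise_overlap(asls).items():
    print(pair, n_shared, round(jaccard, 2))
print(sorted(asl_only_genes(asls, ests)))
```

With real data, the gene sets would come from the genome or transcript mappings described in the annotation section.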

Figure 2: Library distribution of genes identified by EST and ASSETs sequencing (Table 1).

All ESTs from the parental libraries and all ASSETs from the ASLs were mapped to the mouse genome to identify the related genes and the known exons therein. The distribution of genes, known exons and new exons across the different libraries is shown for comparison of ASLs (a), of parental libraries (b) and between ASLs and parental libraries (c).

Enrichment of alternatively spliced exons in an ASL

To confirm that ASLs were enriched for alternatively spliced exons, we prepared one ASL from the same parental libraries with and without enrichment of loop structures (compare Fig. 1g). We then subjected the resulting libraries to colony hybridization (Fig. 3a). Six hybridization probes (alternative splicing, AS+) were selected from alternatively spliced exons of Zdhhc16 (also called Aph2; gi:9957214; rtsID:TB14871) and Sorbs1 (gi:667934; rtsID:TB5104), as verified by RT-PCR. As controls, hybridization probes (AS−) were designed from invariable 3′ untranslated regions of Zdhhc16 and Sorbs1, for which no ASSETs had been found. AS+ probes recognized at least six times more colonies in the ASL preparation with enrichment of loop structures than in the preparation without it. Conversely, the control probe AS− gave no signals in the enriched preparation but detected some colonies in the preparation omitting the selection step. Statistical analysis of colony numbers from two independent experiments showed a 40-fold enrichment within ASLs by the selection of loop structures (Fig. 3b). Thus, finding the 43,649 ASSETs obtained in this study by EST sequencing would hypothetically require scanning some 1.8 million exons.
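The arithmetic behind the enrichment estimate can be sketched as follows. This assumes that fold enrichment is the ratio of positive-colony fractions with and without loop selection, as plotted in Fig. 3b. The colony counts are hypothetical. Multiplying the 43,649 ASSETs by the 40-fold enrichment gives roughly 1.75 million, in line with the estimate of some 1.8 million exons.

```python
def fold_enrichment(positive_sel, plated_sel, positive_ctrl, plated_ctrl):
    """Ratio of the positive-colony fraction with loop selection to that without it.
    Cross-multiplied so that integer counts give an exact result."""
    return (positive_sel * plated_ctrl) / (positive_ctrl * plated_sel)


def exons_to_scan(n_assets, fold):
    """Exons an unselected (EST-style) screen would need to cover to recover
    n_assets at a hit rate `fold` times lower."""
    return n_assets * fold


# Hypothetical colony counts (positives, plated); not data from this study.
fold = fold_enrichment(120, 3000, 3, 3000)
print(fold)                        # 40.0
print(exons_to_scan(43_649, 40))   # 1745960
```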

Figure 3: Efficiency of selection step.

ASLs were prepared with and without enrichment of dsDNA hybrids with loop structures and analyzed for enrichment of differentially spliced exons. (a) Images from colony hybridizations with 32P-labeled probes comprising alternatively spliced exons (AS+) and invariable regions (AS−). (b) Ratios between positive signals obtained during colony hybridization versus total number of plated colonies.

Functional annotation of ASSETs and ESTs

In total, 43,649 ASSETs from three ASLs were analyzed along with 33,602 ESTs from the parental libraries to identify spliced transcripts in melanocytes and melanoma. ASSETs and ESTs were annotated by BLAST-like alignment tool (BLAT) searches against the Transcript Set in Ensembl (mouse 20.32b.1) and the mouse genome (NCBI build 32). A total of 29,891 ASSETs mapped with at least 70% overlap to unique locations in the mouse genome (Supplementary Fig. 2 online). In addition, 37,796 mapped to the Ensembl Transcript Set (Fig. 4 and Supplementary Table 1 online), identifying 5,401 genes that were further annotated with Gene Ontology terms (Supplementary Table 2 online). Information on all ASSETs and ESTs is summarized on our website (see URL in Methods) and in Supplementary Table 3 online. This includes gene, transcript and peptide identification numbers, together with Gene Ontology identification numbers and terms.
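The mapping criteria can be sketched as a filter over BLAT output in its standard 21-column PSL format. This illustration rests on assumptions and is not the pipeline used here. "At least 70% overlap" is interpreted as matched bases divided by query length, and "unique location" as exactly one hit passing that threshold. Other definitions, such as counting repMatches or using an alignment score, are equally plausible. The `toy_psl` helper and the records are hypothetical.

```python
from collections import defaultdict


def parse_psl(line):
    """Extract the fields used below from one 21-column BLAT PSL record."""
    f = line.rstrip("\n").split("\t")
    return {"matches": int(f[0]), "q_name": f[9], "q_size": int(f[10]),
            "t_name": f[13], "t_start": int(f[15]), "t_end": int(f[16])}


def unique_placements(psl_lines, min_coverage=0.70):
    """Return {query: (target, start, end)} for queries with exactly one hit whose
    matched bases cover at least min_coverage of the query length.
    Header and blank lines (anything not starting with a digit) are skipped."""
    hits = defaultdict(list)
    for line in psl_lines:
        if not line[:1].isdigit():
            continue
        rec = parse_psl(line)
        if rec["matches"] / rec["q_size"] >= min_coverage:
            hits[rec["q_name"]].append((rec["t_name"], rec["t_start"], rec["t_end"]))
    return {q: locs[0] for q, locs in hits.items() if len(locs) == 1}


def toy_psl(matches, q_name, q_size, t_name, t_start, t_end):
    """Build a minimal hypothetical PSL line; unused columns hold placeholder values."""
    cols = [matches, 0, 0, 0, 0, 0, 0, 0, "+", q_name, q_size, 0, matches,
            t_name, 200_000_000, t_start, t_end, 1, f"{matches},", "0,", f"{t_start},"]
    return "\t".join(map(str, cols))


records = [
    "psLayout version 3",
    toy_psl(90, "asset1", 100, "chr1", 10, 100),    # 90% coverage, single hit
    toy_psl(80, "asset2", 100, "chr2", 500, 580),   # passes the threshold, but
    toy_psl(80, "asset2", 100, "chr5", 900, 980),   # a second hit makes it non-unique
    toy_psl(50, "asset3", 100, "chr3", 0, 50),      # below 70% coverage
]
placed = unique_placements(records)
print(placed)
```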

Figure 4: Computational analysis of ASSETs.

Flowchart of the data analysis of ASSETs. ava, all versus all; GO, Gene Ontology.

Classification of splicing types

To identify the splicing types detectable by ASLs, all ASSETs were aligned against each other to find ASSETs that share at least one matching region with one or more other ASSETs (alignments can be found on our website). For this analysis, all ASSETs were treated equally, because the characterization of splice types is unrelated to whether dsDNA or ssDNA was used in ASL preparation. An ASSETs group was defined as a set of ASSETs with overlapping regions that mapped to the same genomic region and for which independent ASSETs confirmed the exon borders. A multiple alignment program17 was used for visual inspection of all ASSETs groups. This analysis identified 662 alternative splicing events, which were defined as high-quality targets and further annotated for their splice type (Supplementary Table 4 online). Among these targets, eight different splicing types could be distinguished (Fig. 5). About 31% of the exons were of the cassette type, and 11% were most likely retained introns (defined by the intronic GT-AG rule). Only 7.7% of the exons detected by ASSETs (deletion type) did not satisfy the GT-AG splicing consensus. In ten cases, sense and antisense pairs were detected in which a complementary region was spliced out from only one of the two strands. We also analyzed the location of ASSETs within transcripts (Supplementary Table 5 online) and found that 79% of the exons were located within coding regions.
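Grouping ASSETs by overlapping genomic coordinates is a standard interval-clustering step. The sketch below assumes that each ASSET has already been placed on one chromosome and strand with half-open coordinates. The tuples are hypothetical. The `min_size` filter is only a rough, hypothetical stand-in for requiring that independent ASSETs confirm exon borders.

```python
def group_overlapping(alignments, min_size=1):
    """Cluster (asset_id, chrom, strand, start, end) records whose half-open
    intervals overlap on the same chromosome and strand; keep groups with at
    least min_size members."""
    groups, current, key, end = [], [], None, None
    for aid, chrom, strand, s, e in sorted(alignments, key=lambda a: (a[1], a[2], a[3])):
        # Start a new group on a new chromosome/strand or when there is a gap.
        if (chrom, strand) != key or s >= end:
            if current:
                groups.append(current)
            current, key, end = [aid], (chrom, strand), e
        else:
            current.append(aid)
            end = max(end, e)
    if current:
        groups.append(current)
    return [g for g in groups if len(g) >= min_size]


# Hypothetical ASSET placements.
alns = [
    ("a", "chr1", "+", 100, 200),
    ("b", "chr1", "+", 150, 300),
    ("c", "chr1", "+", 400, 500),
    ("d", "chr2", "+", 100, 200),
]
print(group_overlapping(alns))
print(group_overlapping(alns, min_size=2))
```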

Figure 5: Alternative splicing patterns.

High-quality targets were analyzed for their splicing types. Schematic representation of the splicing types found in ASSETs.

Differential expression of exons identified by ASSETs

To confirm that ASSETs comprised specific exons, 96 high-quality targets were selected at random, and exon-specific primers were designed for RT-PCR (Supplementary Fig. 3 online). For exons of the cassette type, primers were placed in the upstream and downstream exons. For other types, one primer was placed in a neighboring exon and the second in an invariable part of the exon in question. Of the 96 RT-PCR reactions, 70 detected two different splicing patterns, 24 detected only one splice variant and two gave no amplification product. RT-PCR experiments confirmed alternative splicing in melanoma and melanocytes for important genes such as dickkopf homolog 3 (Dkk3), tuberous sclerosis 2 (Tsc2), Btg3-associated nuclear protein (Banp) and Hermansky-Pudlak syndrome 1 (Hps1). Although little is known about mouse Dkk3, products of the dickkopf gene family act as morphogens in the WNT pathway18. Tsc2 is one of two genes associated with tuberous sclerosis complex. Its product has tumor suppressor activity, and its GTPase-activating protein domain targets the small GTPase Rheb (Ras homologue enriched in brain)19. For Banp, at least two alternative splice variants have been identified, one of which was associated with reduced tumor growth20. Hps1 is involved in Hermansky-Pudlak syndrome (HPS or pale ear), an autosomal recessive disorder that causes oculocutaneous albinism through defective biosynthesis or processing of melanosomes. It has also been linked to the trafficking of tyrosinase or tyrosinase-related protein 1 (ref. 21).
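For cassette exons, the primer placement described above predicts two products whose sizes differ by exactly the cassette length. The following sketch computes those expected sizes and sorts reactions into the three outcome classes reported above. The distances and band sizes are hypothetical. Real band calling would also need a size tolerance, which this sketch omits.

```python
from collections import Counter


def cassette_amplicons(upstream_part, cassette_len, downstream_part):
    """Expected RT-PCR product sizes (bp) for the inclusion and skipping isoforms
    when primers sit in the exons flanking a cassette exon.

    upstream_part:   bases from the forward primer's 5' end to the end of the upstream exon
    downstream_part: bases from the start of the downstream exon to the reverse primer's 5' end
    """
    skipped = upstream_part + downstream_part
    return skipped + cassette_len, skipped


def classify(bands):
    """Summarize the distinct product sizes seen for one reaction."""
    n = len(set(bands))
    if n == 0:
        return "no amplification"
    return "one variant" if n == 1 else "two or more variants"


print(cassette_amplicons(120, 45, 95))  # (260, 215)

# Hypothetical band sizes observed for three reactions.
observed = [[260, 215], [215], []]
print(Counter(classify(b) for b in observed))
```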

We carried out real-time PCR on six transcripts identified by ASSETs to confirm melanoma- or melanocyte-specific exons. Cytoplasmic RNA was prepared from melan-c and B16-F10 cells, and real-time PCR was performed for each transcript with exon-specific primer sets for constitutive or alternatively spliced exons. Cell line–specific splicing patterns were detected for the following regions (Fig. 6):

- exons 15–16 of transmembrane channel-like gene family 6 (Tmc6) mRNA
- exons 2–3 of abl-interactor 1 (Abi-1) mRNA
- exons 11–12 of Sorbs1 mRNA
- exons 8–9 of Ndel1 mRNA
- exons 6–8 of sorting nexin 16 (Snx16) mRNA

Thus, ASSETs identified genes specifically regulated by splicing in these cell lines:

- Tmc6, which belongs to a larger gene family, members of which have been associated with a high risk of skin carcinoma after papillomavirus infection22,23
- Abi-1, which takes part in signaling and actin remodeling24
- Sorbs1, which has been implicated in insulin-stimulated glucose uptake and discussed as a candidate gene for insulin resistance25
- Ndel1, which is involved in dynein motor activity26
- Snx16, which belongs to a growing gene family of hydrophilic proteins, some members of which interact with various receptors and may have a role in intracellular trafficking27
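The text does not state how the real-time PCR data were quantified; Fig. 6 shows amounts alongside 18S rRNA. One common option is the comparative 2^−ΔΔCt method, which is used here purely as an assumed, swapped-in technique. In this method, the exon of interest is normalized to a reference such as 18S rRNA and compared against a calibrator sample, here taken to be melan-c. The method assumes roughly 100% amplification efficiency. All Ct values below are hypothetical.

```python
import statistics


def ddct_fold(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative amount of a target (e.g. an alternatively spliced exon) in a sample
    versus a calibrator sample, normalized to a reference (e.g. 18S rRNA), using
    the 2^-ddCt method, which assumes ~100% amplification efficiency."""
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values per replicate:
# (exon in B16-F10Y, 18S in B16-F10Y, exon in melan-c, 18S in melan-c)
replicates = [(24.0, 10.0, 27.0, 10.0), (24.5, 10.1, 27.0, 10.0), (23.6, 9.9, 27.1, 10.2)]
folds = [ddct_fold(*r) for r in replicates]
print(f"{statistics.mean(folds):.2f} +/- {statistics.stdev(folds):.2f}")
```

If amplification efficiencies differ between primer sets, a standard-curve or efficiency-corrected method would be more appropriate.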

Figure 6: Real-time PCR on selected ASSETs.

Real-time PCR was done for six mRNAs. For each mRNA, a constitutive exon (A) and an alternatively spliced exon (B) were analyzed in melan-c and B16-F10Y mRNA preparations. Mean amounts ± s.d. of RNA from three independent experiments are plotted along with 18S rRNA (control). Snx16: A, exons 6–7; B, exons 6–Δ7–8. Ndel1: A, exons 8–9; B, exons 8–cassette–9. Hps1: A, exons 17–18; B, exons long17–18. Sorbs1: A, exons 11–12; B, exons 11–long12. Abi1: A, exons 2–cassette–3; B, exons 2–3. Tmc6: A, exons 15–cassette–16; B, exons 15–16.

Full-length sequence comparison

Having confirmed that ASSETs identified specific exons, we studied their relationship to full-length cDNAs. As an example of an extensively spliced mRNA, we analyzed Zdhhc16 in more detail; its product interacts with the c-Abl oncogene during apoptosis28. Nine cDNA clones identified by ESTs from the parental libraries were subjected to full-length sequencing, and splice variants were identified by multiple alignment of the full-length sequences. In addition, 22 ASSETs were found, and exons were predicted by Genscan29. The nine full-length cDNAs grouped into three splice variants that differed in their use of exons 6 and 8. The 22 ASSETs covered nine different splicing patterns, and the full-length cDNAs and ASSETs agreed well for alterations in exon 6. As outlined in Supplementary Fig. 4 online, the numbers of ASSETs and full-length sequences obtained for Zdhhc16 were not sufficient to consistently identify alterations in exons 2, 4, 8 and 12. Nevertheless, ASL sequencing greatly expanded the number of splice variants detected.
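Grouping full-length cDNAs into splice variants amounts to collapsing clones that share the same pattern of exon inclusion. The sketch below assumes that each clone has already been reduced to the set of exon numbers it contains. The nine clones and their exon sets are hypothetical. They simply mirror the three-variant outcome described for exons 6 and 8.

```python
from collections import defaultdict


def group_by_signature(clones, variable_exons):
    """Group clone IDs by which of the variable exons each clone contains."""
    groups = defaultdict(list)
    for clone_id, exons in sorted(clones.items()):
        signature = tuple(e in exons for e in variable_exons)
        groups[signature].append(clone_id)
    return dict(groups)


# Hypothetical clones: four full-length, three lacking exon 6, two lacking exon 8.
all_exons = set(range(1, 13))
clones = {f"c{i}": all_exons for i in range(1, 5)}
clones.update({f"c{i}": all_exons - {6} for i in range(5, 8)})
clones.update({f"c{i}": all_exons - {8} for i in range(8, 10)})

for signature, members in group_by_signature(clones, (6, 8)).items():
    print(dict(zip(("exon6", "exon8"), signature)), members)
```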

Target genes found in ASLs

Genes covered by ASSETs include those encoding key regulators of cellular control such as MAPK12 and MAPK14, the integrin-linked kinase (ILK), cyclin D2, BRPK, ADP-ribosyltransferase and melastatin 1 (Supplementary Table 6 online). We further identified ASSETs for exon 12A of the tumor suppressor gene bridging integrator 1 (Bin1), whose aberrant splicing variant found in melanoma renders the protein nonfunctional30. Thus, an ASL in combination with ASSETs sequencing could confirm known melanoma-related alterations in the splicing patterns of Bin1.

Discussion

Alternative mRNA splicing is a key mechanism in higher eukaryotes for increasing proteome diversity from a limited gene repertoire1,2. For a better understanding of how splice variants differ in their cellular functions, we developed ASLs and explored ASSETs derived from these libraries to locate alternatively spliced exons in a high-throughput manner using experimental means rather than computational methods.

The ASL method was developed to detect exons derived from internal splicing rather than from untranslated regions. End sequences are easily obtained by EST sequencing31 or, for 5′ ends, by the cap analysis of gene expression (CAGE) approach32. It has been shown that 73% of alternatively spliced exons lie within coding sequences (CDSs)2, which preferentially comprise internal exons. The high-quality targets obtained in our study matched this expectation very well: 79% of the ASSETs contained coding exons. Thus, ASLs allow systematic screens for alternative exon usage and its effect on coding regions. It was also important for ASL preparation that ASSETs comprise not only entire exons but also the actual splice sites. Identifying the flanking exons contributed greatly to the analysis of splicing mechanisms and of splice-site structure.
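Deciding whether an alternative exon is coding reduces to an interval comparison against the annotated CDS. In this sketch, the coordinates are hypothetical, and an exon overlapping a CDS boundary is counted as coding. The criterion behind the 79% figure is not specified here and may differ.

```python
def exon_location(exon_start, exon_end, cds_start, cds_end):
    """Classify an exon by its position relative to the CDS, using half-open
    transcript coordinates. Any overlap with the CDS counts as coding."""
    if exon_end <= cds_start:
        return "5' UTR"
    if exon_start >= cds_end:
        return "3' UTR"
    return "CDS"


def coding_fraction(exons, cds_start, cds_end):
    """Fraction of (start, end) exons that overlap the CDS."""
    labels = [exon_location(s, e, cds_start, cds_end) for s, e in exons]
    return labels.count("CDS") / len(labels)


# Hypothetical transcript whose CDS spans positions 100-900.
print(coding_fraction([(0, 80), (150, 260), (600, 700), (880, 1000)], 100, 900))
```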

Another question during the development of ASLs was whether to focus on cloning individual exons or entire cDNAs of alternatively spliced transcripts. Here, we preferred to focus first on individual exons, because identifying them is sufficient for further analysis and manipulation of experimental systems. In addition to real-time PCR analysis, individual exons can be targeted by specific antisense oligonucleotides33,34,35 or by antibodies against exon-encoded protein domains36,37. Both approaches are already established in drug development and pharmacogenomics38. Thus, in combination with ASLs, exon-specific antisense oligonucleotides could be used to analyze the contribution of individual splicing events to gene function.

An important feature of ASL preparation is the use of full-length cDNA libraries as starting material. These libraries not only cover all internal exons within a transcript but also provide a pool from which individual full-length cDNAs related to ASSETs can be cloned. In our initial studies on melanoma and melanocyte libraries, EST clones were prepared, but ASSETs can easily be applied to screen cDNA libraries by classical methods. Alternatively, ASLs could in the future be prepared without the 4-bp cutter digestion step, allowing direct cloning and characterization of nearly full-length splice variants. When isolating full-length cDNAs, however, additional steps, most likely including full-length sequencing, will be necessary to identify the alternatively spliced exons within the clones.

Our characterization of previously unknown splicing events within ASLs showed that selecting loop structures comprising alternatively spliced exons was highly effective for enrichment and exon discovery. Searching for alternatively spliced isoforms with conventional ESTs or open reading frame ESTs39 is a powerful approach. However, millions of ESTs would be required to find splicing events as efficiently as with ASLs. A different approach38 for cloning alternatively spliced exons was disclosed in US patent 6,251,590. In that method, DNA/RNA hybrids are formed, and RNA loop structures comprising alternatively spliced exons are released by RNase H digestion of the RNA within the hybrids. Fragments derived from the loop structures are then cloned and used for exon-specific microarray design. Apart from the lack of publicly available data, this technology cannot determine exact exon borders, because only partial exon sequences are obtained. ASLs, by contrast, retrieve entire exons together with their flanking regions.

In our study, 92% of the 662 high-quality targets annotated for their splicing types followed the intronic GT-AG rule, compared with 98% previously estimated by computational means40. Even so, ASSETs discovered a large number of previously unknown splicing events. This observation emphasizes the need for sequencing-based methods to reliably detect splice sites. In the future, genome-wide tiling microarrays1 may detect exon-exon junctions. Such experiments would be difficult to interpret for complex loci, however, because tiling arrays are rather limited in resolution and restricted to a few organisms. ASSETs will undoubtedly facilitate microarray design, because exon-specific or tiled microarrays, rather than high-throughput sequencing, could be used to evaluate ASSETs. Similarly, ASSETs sequences could be presented on arrays to analyze splicing patterns, or they could benefit other methods, such as junction probes41. Thus, we foresee comprehensive splicing studies in which ASSETs drive probe design.
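Checking the GT-AG rule is a simple test on the first and last two bases of each intron. The sketch below assumes that introns are given as half-open genomic coordinates on a known strand; the genome strings are toy examples. Note that GC-AG and AT-AC introns are genuine minor classes. Failing this test therefore does not by itself imply an artifact.

```python
def intron_sequence(genome, start, end, strand="+"):
    """Extract an intron from half-open genomic coordinates, reverse-complementing
    it for minus-strand transcripts."""
    seq = genome[start:end].upper()
    if strand == "-":
        seq = seq[::-1].translate(str.maketrans("ACGT", "TGCA"))
    return seq


def is_gt_ag(intron):
    """True if the intron starts with GT and ends with AG (canonical consensus)."""
    return intron.startswith("GT") and intron.endswith("AG")


def gt_ag_fraction(introns):
    """Fraction of introns that satisfy the GT-AG rule."""
    introns = list(introns)
    return sum(is_gt_ag(i) for i in introns) / len(introns)


plus = intron_sequence("CCCGTAAGTTTTAGCCC", 3, 14)          # toy plus-strand locus
minus = intron_sequence("CTAAAAC", 0, 7, strand="-")        # toy minus-strand locus
print(plus, is_gt_ag(plus), minus, is_gt_ag(minus))
print(gt_ag_fraction(["GTAAAG", "GCAAAG", "GTTTAG", "ATAAAC"]))
```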

We applied ASL and ASSETs analysis to melanocytes and melanomas because in Western countries the number of patients with melanoma has risen faster than that for any other cancer15. Current genomics is dominated by expression profiling, despite the limitations of this technique. For example, in melanomas and metastases, downregulation has been observed for the tumor suppressor Cdkn2a and for genes involved in proliferation and growth control, such as CDK4, CDK6, CCND1, RB1, E2F, ARF, MDM2 and TP53 (refs. 42,43). In contrast, ASSETs derived from the melanocyte and melanoma cell lines revealed a complementary pattern that, at the present depth of ASL sequencing, did not include these genes. It is therefore interesting to speculate on the following possibility. Increasing evolutionary pressure on regulatory networks during the development of complex organisms may have preferentially expanded the number of splice variants of regulatory genes. Genes related to basic functions, meanwhile, may have remained conserved and be preferentially regulated at the expression level. This hypothesis places even more emphasis on splice patterns and their contribution to regulatory processes. Here, ASSETs identified splicing events in genes of great biological relevance to melanocytes and melanoma, which could be interesting targets for studies of melanoma development. The number of alternatively spliced genes covered by our ASLs supports the idea that protein variability follows different patterns than previously expected from expression analysis. This suggests that comparisons of biological phenomena by expression profiling alone should be revisited to include the detection of specific splice variants.

Methods

Cell lines and cell culture conditions.

Melanoma B16-F10 and B16-F1 cell lines were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (Gibco-BRL), 100 U/ml penicillin and 100 μg/ml streptomycin (Invitrogen), and grown at 37 °C in an incubator with 5% CO2. Melanocyte melan-c cells were maintained in RPMI 1640 medium supplemented with 10% fetal bovine serum (Gibco-BRL), 100 U/ml penicillin, 100 μg/ml streptomycin (Invitrogen), 200 nM 12-O-tetradecanoylphorbol 13-acetate (TPA; Sigma) and 100 μM β-mercaptoethanol (Gibco-BRL), at 37 °C in an incubator with 10% CO2. Melan-a2 cells were maintained under the same conditions as melan-c cells, except that the medium contained 200 nM TPA and 200 pM cholera toxin (Sigma) and omitted β-mercaptoethanol.

Full-length cDNA libraries.

Full-length cDNA libraries were prepared as described44.

Preparation of an ASL.

A detailed protocol for the preparation of an ASL is provided in Supplementary Methods online.

Computational analysis.

A detailed description of the computational analysis is provided in Supplementary Fig. 2 online. Sequence alignments are available from our website.

Colony hybridization, RT-PCR, real-time PCR and sequencing methods.

Colony hybridization, RT-PCR, real-time PCR and sequencing methods are described in Supplementary Methods online, and lists of primers and probes are given in Supplementary Tables 7–9 online. Supplementary Fig. 5 online provides further information on the real-time PCR experiments.

URL.

Our project website is available at http://genome.gsc.riken.jp/splicing/.

Accession numbers.

ESTs from melanocytes: BB751502–BB762024, BB851120–BB851347, BY151892–BY152033 and BY465889–BY474653. ESTs from melanomas: BB762025–BB772933, BB851348–BB858427, BY152034–BY153202 and BY474654–BY481097. ASSETs: AK176923–AK220149.

Note: Supplementary information is available on the Nature Methods website.