Review
Evolutionary conservation of long non-coding RNAs; sequence, structure, function

https://doi.org/10.1016/j.bbagen.2013.10.035

Highlights

  • Recent genomewide studies have revealed the presence of thousands of lncRNAs.

  • Many lncRNAs do not show the same pattern of conservation as protein-coding genes.

  • Due to the lack of sequence conservation, functional interpretation is challenging.

  • The presence, and conservation, of secondary structural elements have been suggested.

  • This phenomenon remains poorly studied, and we explore what is currently known.

Abstract

Background

Recent advances in genomewide studies have revealed the abundance of long non-coding RNAs (lncRNAs) in mammalian transcriptomes. The ENCODE Consortium has elucidated the prevalence of human lncRNA genes, which are as numerous as protein-coding genes. Surprisingly, many lncRNAs do not show the same pattern of high interspecies conservation as protein-coding genes. The absence of functional studies and the frequent lack of sequence conservation therefore make functional interpretation of these newly discovered transcripts challenging. Many investigators have suggested the presence and importance of secondary structural elements within lncRNAs, but mammalian lncRNA secondary structure remains poorly understood. It is intriguing to speculate that in this group of genes, RNA secondary structures might be preserved throughout evolution and that this might explain the lack of sequence conservation among many lncRNAs.

Scope of review

Here, we review the extent of interspecies conservation among different lncRNAs, with a focus on a subset of lncRNAs that have been functionally investigated. The functions of lncRNAs are diverse, and we investigate whether different forms of functionality may be conserved.

Major conclusions

Lack of conservation does not imply a lack of function. We highlight several examples of lncRNAs where RNA structure appears to be the main functional unit and evolutionary constraint. We survey existing genomewide studies of mammalian lncRNA conservation and summarize their limitations. We further review specific human lncRNAs which lack evolutionary conservation beyond primates but have proven to be both functional and therapeutically relevant.

General significance

Pioneering studies highlight a role for secondary structures in lncRNAs, and possibly the presence of functional “modules”, which are interspersed with longer and less conserved stretches of nucleotide sequence. Taken together, high-throughput analysis of conservation and functional composition of the still-mysterious lncRNA genes is only now becoming feasible.

Introduction

Studies using recent technical advances in genomewide platforms have revealed the human genome to be vastly more complex than previously anticipated. While only ~1.2% of the human genome encodes proteins [1], it is becoming increasingly apparent that the large majority of the human genome is transcribed into non-protein-coding RNAs (ncRNAs) [2], [3]. Thousands of long ncRNAs (lncRNAs) have been identified, but very few have been assigned any function. The lack of functional studies and, in many cases, the absence of evolutionary conservation have raised concerns about the importance of lncRNAs; some argue they are nothing more than transcriptional noise [4]. However, recent reports show that thousands of lncRNAs are evolutionarily conserved [5], though not to the same extent as many protein-coding genes [6]. While the transcripts of lncRNAs appear less conserved than protein-encoding mRNAs, the promoter regions of lncRNAs are often just as conserved as the promoters of many protein-coding genes [3], [7]. Furthermore, because they are RNAs, their conservation may lie in functional interactions with proteins and other RNAs rather than in specific sequence stretches. Functional equivalency of lncRNAs that appear to lack conservation across species may be feasible thanks to the chemical properties of nucleotides and protein interaction affinities.

The function of RNA is indeed widespread; mRNAs encode proteins, rRNA and tRNA are involved in translation, and microRNAs act by RNA:RNA interactions to modulate mRNA function. In contrast to microRNAs, almost all of which are post-transcriptional repressors, the diverse functions of lncRNAs include both positive and negative regulation of protein-coding genes, and range from lncRNA:RNA and lncRNA:protein to lncRNA:chromatin interactions [8], [9], [10], [11]. Due to this functional diversity, it seems reasonable to presume that different evolutionary constraints might be operative for different RNAs, such as mRNAs, microRNAs, and lncRNAs.

The functional importance of lncRNAs is only now being revealed. To date, of the tens of thousands of metazoan lncRNAs discovered from cDNA libraries and RNAseq data by high-throughput transcriptome projects, only a handful have been functionally characterized. However, this number has been increasing, with more lncRNAs recently found to be involved in disease [8], [10], [11], [12], [13]. Although the large majority of lncRNAs remain to be characterized, there is no longer any doubt that at least some are of functional importance. Yet the non-conservation conundrum remains: for many lncRNAs already proven functional, poor evolutionary conservation is paradoxical and in stark contrast to the conservation of protein-coding genes.

Section snippets

Lack of conservation does not imply a lack of function

While conservation almost always indicates functionality, lack of sequence conservation does not directly imply the opposite [10], [14]. The evidence that supports this statement arises from two vastly different classes of non-protein-coding genomic regions with completely opposite evolutionary properties: ultraconserved regions (UCRs), which are highly conserved with near-perfect sequence identity across all vertebrates, and human accelerated regions (HARs), which show unusually high sequence

LncRNAs and secondary structures

The vast majority of post-genomic lncRNA experimental biology has been an observational science, a modern equivalent to Darwin's voyage on the Beagle: high-throughput cDNA library construction and next-generation RNA sequencing have provided deep and comprehensive catalogs of lncRNA genes and transcripts, while the inherent bottleneck between the large size of these datasets and the low throughput of experimental validation methods has ensured that functional validation lags far behind. For

Cis-acting asRNAs

It has been estimated that approximately 20–40% of all protein-coding genes have antisense RNA (asRNA) transcription [21], [79], [80]. AsRNAs share complementarity to a sense-expressed transcript, which is usually a protein-coding gene. Promoters, untranslated regions, protein-coding regions, and introns all can be overlapped by antisense RNAs transcribed from the same locus as the protein-coding gene. With the exception of intronic overlaps, all these scenarios confer the possibility of

Concluding remarks

Tens of thousands of human lncRNAs have been identified during the first genomic decade. Functional studies for most of these lncRNAs are, however, still lacking, with only a handful having been characterized in detail [8], [10], [11], [87]. From these few studies it is apparent that some lncRNAs are important cellular effectors, with roles ranging from splice complex formation [34] to chromatin and chromosomal complex formation [43], [46] to epigenetic regulation of key cellular genes [11], [12], [87], [90].

Acknowledgments

The project was supported by the National Institute of Allergy and Infectious Diseases (NIAID) P01 AI099783-01 to KVM; the Swedish Childhood Cancer Foundation, the Swedish Cancer Society, Radiumhemmets Forskningsfonder, the Karolinska Institutet PhD support programme, and Vetenskapsrådet to DG; and the Erik and Edith Fernstrom Foundation for medical research to PJ.

References (95)

  • C.J. Brown

    The human Xist gene — analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus

    Cell

    (1992)
  • N. Brockdorff

    X-chromosome inactivation: closing in on proteins that bind XistRNA

    Trends Genet.

    (2002)
  • A. Romito et al.

    Origin and evolution of the long non-coding genes in the X-inactivation center

    Biochimie

    (2011)
  • J.L. Rinn

    Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs

    Cell

    (2007)
  • Y.J. Shi

    Histone demethylation mediated by the nuclear amine oxidase homolog LSD1

    Cell

    (2004)
  • J.E. Wilusz et al.

    3′ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA

    Cell

    (2008)
  • C. Chu et al.

    Genomic maps of long noncoding RNA occupancy reveal principles of RNA–chromatin interactions

    Mol. Cell

    (2011)
  • A. Ackley

    An algorithm for generating small RNAs capable of epigenetically modulating transcriptional gene silencing and activation in human cells

    Mol. Ther. Nucleic Acids

    (2013)
  • Z.L. Zhang et al.

    Comparative analysis of processed pseudogenes in the mouse and human genomes

    Trends Genet.

    (2004)
  • Y. Wan

    Genome-wide measurement of RNA folding energies

    Mol. Cell

    (2012)
  • F.S. Collins et al.

    Finishing the euchromatic sequence of the human genome

    Nature

    (2004)
  • S. Djebali

    Landscape of transcription in human cells

    Nature

    (2012)
  • T. Derrien

    The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression

    Genome Res.

    (2012)
  • K. Struhl

    Transcriptional noise and the fidelity of initiation by RNA polymerase II

    Nat. Struct. Mol. Biol.

    (2007)
  • M. Guttman

    Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals

    Nature

    (2009)
  • J. Ponjavic et al.

    Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs

    Genome Res.

    (2007)
  • R.A. Gupta

    Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis

    Nature

    (2010)
  • M. Kretz

    Control of somatic tissue differentiation by the long non-coding RNA TINCR

    Nature

    (2013)
  • P. Johnsson

    A pseudogene long-noncoding-RNA network regulates PTEN transcription and translation in human cells

    Nat. Struct. Mol. Biol.

    (2013)
  • W. Yu

    Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA

    Nature

    (2008)
  • K.V. Morris et al.

    Bidirectional transcription directs both transcriptional gene activation and suppression in human cells

    PLoS Genet.

    (2008)
  • L. Lipovich

    Activity-dependent human brain coding/noncoding gene regulatory networks

    Genetics

    (2012)
  • L. Poliseno

    A coding-independent function of gene and pseudogene mRNAs regulates tumour biology

    Nature

    (2010)
  • G. Bejerano

    Ultraconserved elements in the human genome

    Science

    (2004)
  • P. Mestdagh

    An integrative genomics screen uncovers ncRNA T-UCR functions in neuroblastoma tumours

    Oncogene

    (2010)
  • K.S. Pollard

    An RNA gene expressed during cortical development evolved rapidly in humans

    Nature

    (2006)
  • K.S. Pollard

    Forces shaping the fastest evolving regions in the human genome

    PLoS Genet.

    (2006)
  • A. Beniaminov et al.

    Distinctive structures between chimpanzee and human in a brain noncoding RNA

    RNA

    (2008)
  • P.G. Engstrom

    Complex loci in human and mouse genomes

    PLoS Genet.

    (2006)
  • C. Kutter

    Rapid turnover of long noncoding RNAs and the evolution of gene expression

    PLoS Genet.

    (2012)
  • E.J. Wood et al.

    Sense-antisense gene pairs: sequence, transcription, and structure are not conserved between human and mouse

    Front. Genet.

    (2013)
  • I.V. Novikova et al.

    Structural architecture of the human long non-coding RNA, steroid receptor RNA activator

    Nucleic Acids Res.

    (2012)
  • Y.H. Shi

    Sharp, an inducible cofactor that integrates nuclear receptor repression and activation

    Genes Dev.

    (2001)
  • R.B. Lanz

    Steroid receptor RNA activator stimulates proliferation as well as apoptosis in vivo

    Mol. Cell. Biol.

    (2003)
  • E. Leygue et al.

    Expression of the steroid receptor RNA activator in human breast tumors

    Cancer Res.

    (1999)
  • L.C. Murphy

    Altered expression of estrogen receptor coregulators during human breast tumorigenesis

    Cancer Res.

    (2000)
  • C.M. Smith et al.

    Classification of gas5 as a multi-small-nucleolar-RNA (snoRNA) host gene and a member of the 5′-terminal oligopyrimidine gene family reveals common features of snoRNA host genes

    Mol. Cell. Biol.

    (1998)