Long non-coding RNA

Long non-coding RNAs (long ncRNAs, lncRNAs) are a type of RNA, generally defined as transcripts longer than 200 nucleotides that are not translated into protein. This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Because some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, the latest definition of lncRNA is a class of transcripts of over 200 nucleotides that have no or limited coding capacity. However, John S. Mattick and colleagues have proposed changing the definition of long non-coding RNAs to transcripts longer than 500 nt, which are mostly generated by Pol II. The exact definition of lncRNA is therefore still under discussion in the field. Long intervening/intergenic noncoding RNAs (lincRNAs) are lncRNA transcripts that do not overlap protein-coding genes.
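
The effect of the two competing length cutoffs can be illustrated with a minimal Python sketch. The `Transcript` record, the `is_lncrna` helper, and the example transcripts are hypothetical and exist only for illustration; the sketch also treats coding status as a simple yes/no flag, which ignores the "limited coding capacity" nuance described above.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str        # hypothetical identifier, not a real gene name
    length_nt: int   # transcript length in nucleotides
    is_coding: bool  # simplification: True if annotated as protein-coding


def is_lncrna(t: Transcript, min_length_nt: int = 200) -> bool:
    """Return True if the transcript is non-coding and longer than the cutoff."""
    return (not t.is_coding) and t.length_nt > min_length_nt


# Made-up example transcripts.
transcripts = [
    Transcript("tx_a", 22, False),    # miRNA-sized, too short under either cutoff
    Transcript("tx_b", 350, False),   # long under the 200 nt cutoff only
    Transcript("tx_c", 1800, False),  # long under both cutoffs
    Transcript("tx_d", 2400, True),   # protein-coding, never an lncRNA
]

conventional = [t.name for t in transcripts if is_lncrna(t, 200)]
proposed = [t.name for t in transcripts if is_lncrna(t, 500)]

print(conventional)
print(proposed)
```

Under these assumptions, a 350 nt non-coding transcript counts as an lncRNA with the conventional 200 nt limit but not with the proposed 500 nt limit, which is why the choice of cutoff changes catalogue sizes.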

Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons.
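
The positional classes named above can be sketched as a small interval-overlap routine in Python. This is a simplified illustration and not an annotation standard: the `Gene` and `Locus` types, the `classify` function, and the rule order (intergenic, then intronic, then sense or antisense by strand) are assumptions made for this example, and real annotation pipelines use more detailed categories.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open coordinates [start, end)


@dataclass
class Gene:
    chrom: str
    strand: str            # "+" or "-"
    exons: List[Interval]  # exon coordinates on the chromosome

    @property
    def span(self) -> Interval:
        """Genomic span from the first exon start to the last exon end."""
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))


@dataclass
class Locus:
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify(lnc: Locus, genes: List[Gene]) -> str:
    """Assign a simplified positional class to an lncRNA locus."""
    region = (lnc.start, lnc.end)
    hits = [g for g in genes if g.chrom == lnc.chrom and overlaps(region, g.span)]
    if not hits:
        return "intergenic"  # no gene span overlapped on either strand
    exonic = [g for g in hits if any(overlaps(region, ex) for ex in g.exons)]
    if not exonic:
        return "intronic"    # inside a gene span but touching no exon
    if any(g.strand == lnc.strand for g in exonic):
        return "sense"       # overlaps exons of a gene on the same strand
    return "antisense"       # overlaps exons only on the opposite strand


# Made-up gene with two exons and one intron on the plus strand.
genes = [Gene("chr1", "+", [(100, 200), (400, 500)])]

print(classify(Locus("chr1", 1000, 1500, "+"), genes))
print(classify(Locus("chr1", 250, 350, "+"), genes))
print(classify(Locus("chr1", 150, 450, "+"), genes))
print(classify(Locus("chr1", 150, 450, "-"), genes))
```

The four example loci fall into the intergenic, intronic, sense, and antisense classes respectively under the rule order assumed here.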

The definition of lncRNAs differs from that of other RNAs such as siRNAs, mRNAs, miRNAs, and snoRNAs because it is not connected to the function of the RNA. A lncRNA is any transcript that is not one of the other well-characterized RNAs and is longer than 200 to 500 nucleotides, depending on the cutoff used. Some scientists think that most lncRNAs do not have a biologically relevant function because they are transcripts of junk DNA.

Long non-coding transcripts are found in many species. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM reveal the complexity of these transcripts in humans. The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and polyadenylation, but have little or no open reading frame (ORF). This number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated). Identifying ncRNAs within these cDNA libraries is challenging because it can be difficult to distinguish protein-coding transcripts from non-coding transcripts. Multiple studies suggest that testis and neural tissues express the greatest amount of long non-coding RNAs of any tissue type. Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources.
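
One simple signal used when separating coding from non-coding transcripts is the length of the longest ORF. The Python sketch below is only an illustration of that idea, not the method used by FANTOM: it scans the three forward reading frames only, the function names are hypothetical, and the 300 nt cutoff in `has_little_orf` is an assumed value chosen for the example rather than a figure from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length in nucleotides of the longest ATG-to-stop ORF in the forward frames.

    The stop codon is included in the reported length. ORFs without an
    in-frame stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close this ORF and look for the next ATG
    return best


def has_little_orf(seq: str, max_orf_nt: int = 300) -> bool:
    """True if the longest forward ORF is shorter than the assumed cutoff."""
    return longest_orf_nt(seq) < max_orf_nt


print(longest_orf_nt("ATGAAATAG"))  # one ORF: ATG AAA TAG
print(longest_orf_nt("CCCCCC"))     # no start codon at all
print(has_little_orf("ATGAAATAG"))
```

An ORF-length test alone cannot settle the question, since some lncRNAs contain short ORFs and some genuine proteins are small, which is part of why the classification is described above as challenging.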

Quantitatively, these transcripts show ~10-fold lower abundance than mRNAs, much of which is explained by higher cell-to-cell variation in the expression levels of lncRNAs in individual cells when compared to protein-coding genes and well-characterized non-coding genes. This is consistent with the idea that many of these transcripts are non-functional spurious transcripts and that the transcribed regions are not genes by any standard definition.

In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs. Only 3.6% of human lncRNAs are present in various biological contexts, and 34% of lncRNAs are present at a high level (top 25% of both lncRNAs and mRNAs) in at least one biological context. In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity and by cell subtype specificity in tissues such as the human neocortex and other parts of the brain, where they regulate correct brain development and function. In 2022, a comprehensive integration of lncRNAs from existing databases revealed that there are 95,243 lncRNAs and 323,950 transcripts in humans.

In comparison to mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study considering 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in silico approach, and also established the associated Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.

In 2005, the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of intergenic space. While some long ncRNAs are located within the intergenic stretches, the majority are overlapping sense and antisense transcripts that often include protein-coding genes, giving rise to a complex hierarchy of overlapping isoforms. Genomic sequences within these transcriptional foci are often shared among a number of coding and non-coding transcripts in the sense and antisense directions. For example, 3012 out of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs. While the abundance and conservation of these arrangements suggest they have biological relevance, the complexity of these foci frustrates easy evaluation.
