Long non-coding RNA
from Wikipedia

Different types of long non-coding RNAs.[1]

Long non-coding RNAs (long ncRNAs, lncRNAs) are a type of RNA, generally defined as transcripts longer than 200 nucleotides that are not translated into protein.[2] This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs.[3] Given that some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, the latest definition of lncRNA is a class of transcripts of over 200 nucleotides that have no or limited coding capacity.[4] However, John S. Mattick and colleagues have proposed redefining long non-coding RNAs as transcripts longer than 500 nt, which are mostly generated by Pol II.[5] The exact definition of lncRNA is therefore still under discussion in the field. Long intervening/intergenic noncoding RNAs (lincRNAs) are transcripts that do not overlap protein-coding genes.[6]

Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons.[1][3]
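The positional categories above can be illustrated with a minimal Python sketch. It assumes a single neighbouring gene on the same chromosome, half-open coordinates, and a simple order of checks (intergenic, then antisense, then intronic, then sense). The `Interval` type and the `classify_lncrna` function are hypothetical names introduced here for illustration; they are not taken from any annotation pipeline, and real annotation has to handle many genes and overlapping isoforms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def classify_lncrna(transcript: Interval, gene: Interval, exons: list) -> str:
    """Label a transcript relative to one gene (illustrative rule order, an assumption)."""
    if not overlaps(transcript, gene):
        return "intergenic"   # no overlap with the gene at all
    if transcript.strand != gene.strand:
        return "antisense"    # overlaps the gene on the opposite strand
    if not any(overlaps(transcript, exon) for exon in exons):
        return "intronic"     # same strand, lies entirely within introns
    return "sense"            # same strand, overlaps at least one exon


gene = Interval(1000, 5000, "+")
exons = [Interval(1000, 1500, "+"), Interval(4000, 5000, "+")]
print(classify_lncrna(Interval(2000, 3000, "+"), gene, exons))
```

In this toy case the transcript sits between the two exons on the gene's own strand, so it is labelled intronic.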

The definition of lncRNAs differs from that of other RNAs such as siRNAs, mRNAs, miRNAs, and snoRNAs because it is not connected to the function of the RNA. A lncRNA is any transcript that is not one of the other well-characterized RNAs and is longer than 200-500 nucleotides. Some scientists think that most lncRNAs do not have a biologically relevant function because they are transcripts of junk DNA.[7][8]

Abundance


Long non-coding transcripts are found in many species. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM reveal the complexity of these transcripts in humans.[9] The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and poly-adenylation, but have little or no open reading frame (ORF).[9] This number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data shows more than 40% of transcripts are non-polyadenylated).[10] Identifying ncRNAs within these cDNA libraries is challenging since it can be difficult to distinguish protein-coding transcripts from non-coding transcripts. It has been suggested through multiple studies that testis,[11] and neural tissues express the greatest amount of long non-coding RNAs of any tissue type.[12] Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources.[13]

Quantitatively, these transcripts demonstrate ~10-fold lower abundance than mRNAs,[14][15] much of which is explained by higher cell-to-cell variation of expression levels of lncRNAs in the individual cells, when compared to protein-coding genes and well-characterized non-coding genes.[16] This is consistent with the idea that many of these transcripts are non-functional spurious transcripts and the transcribed regions are not genes by any standard definition.[7][8]

In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs.[14] Only 3.6% of human lncRNAs are present in various biological contexts, and 34% of lncRNAs are present at high levels (top 25% of both lncRNAs and mRNAs) in at least one biological context.[17] In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity[18] and cell subtype specificity in tissues such as the human neocortex[19] and other parts of the brain, regulating correct brain development and function.[20] In 2022, a comprehensive integration of lncRNAs from existing databases revealed that there are 95,243 lncRNA genes and 323,950 transcripts in humans.[21]
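The text does not say how tissue specificity was measured in the cited studies. One commonly used summary statistic is the tau index, which is 0 for uniform expression across tissues and 1 for expression confined to a single tissue. The Python sketch below is offered only as an illustration of that general idea, not as the method behind the percentages quoted above; the function name `tau_index` and the example values are assumptions.

```python
def tau_index(expression):
    """Tissue-specificity index: 0 = uniform across tissues, 1 = one tissue only.

    `expression` holds non-negative expression levels, one per tissue.
    """
    values = [float(v) for v in expression]
    if len(values) < 2:
        raise ValueError("need expression values for at least two tissues")
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    peak = max(values)
    if peak == 0:
        return 0.0  # not expressed anywhere; treated as non-specific by convention
    # Sum each tissue's shortfall relative to the peak, normalised by n - 1.
    return sum(1 - v / peak for v in values) / (len(values) - 1)


print(tau_index([10, 0, 0, 0]))  # expressed in a single tissue
print(tau_index([5, 5, 5, 5]))   # expressed equally everywhere
```

A cut-off on such an index (the choice of cut-off is itself a modelling decision) is one way a transcript could be called tissue-specific.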

In comparison to mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study considering 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in-silico approach,[22] and also established the associated Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.

Genomic organization


In 2005 the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of intergenic space.[9] While some long ncRNAs are located within the intergenic stretches, the majority are overlapping sense and antisense transcripts that often include protein-coding genes,[23] giving rise to a complex hierarchy of overlapping isoforms.[24] Genomic sequences within these transcriptional foci are often shared among a number of coding and non-coding transcripts in the sense and antisense directions.[25] For example, 3012 out of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs.[9] While the abundance and conservation of these arrangements suggest they have biological relevance, the complexity of these foci frustrates easy evaluation.

The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles.[12] Their analysis indicates human lncRNAs show a bias toward two-exon transcripts.[12]

Translation


There has been considerable debate about whether some lncRNAs have been misannotated and actually encode proteins. Several lncRNAs have been found to encode peptides with biologically significant functions.[26][27][28] Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are translated,[29][30] although there is disagreement about the correct method for analyzing ribosome profiling data.[31] Additionally, it is thought that many of the peptides produced by lncRNAs may be highly unstable and without biological function.[30]

Conservation


The sequences of most long non-coding transcripts are not conserved, which supports the idea that most of them are spurious transcripts with no biological function. Initial studies into lncRNA conservation noted that some of them were enriched for conserved sequence elements,[32] depleted in substitution and insertion/deletion rates[33] and depleted in rare frequency variants,[34] indicative of purifying selection maintaining lncRNA function. However, further investigations into vertebrate lncRNAs revealed that while some lncRNAs are conserved in sequence, they are not conserved in transcription.[35][36][11] In other words, even when the sequence of a human lncRNA is conserved in another vertebrate species, there is often no transcription of a lncRNA in the orthologous genomic region. Some argue that these observations suggest non-functionality of the majority of lncRNAs,[37][38][7] while others argue that they may be indicative of rapid species-specific adaptive selection.[39]

While most long non-coding transcripts are not conserved, hundreds of lncRNAs are nevertheless conserved at the sequence level. There have been several attempts to delineate the different categories of selection signatures seen amongst lncRNAs, including: lncRNAs with strong sequence conservation across the entire length of the gene, lncRNAs in which only a portion of the transcript (e.g. 5′ end, splice sites) is conserved, and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity.[40][41][42] Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far produced conflicting results.[43][44] Several of the best-studied lncRNAs show conservation of structure within their functional domains despite a lack of sequence similarity across species.[45]

Functions


Some groups have claimed that the majority of long noncoding RNAs in mammals are likely to be functional,[46][47] but other groups have claimed the opposite.[7][8] This is an active area of research.

Some lncRNAs have been functionally annotated in LncRNAdb (a database of literature-described lncRNAs),[48][49] with the majority of these being described in humans. Over 2600 human lncRNAs with experimental evidence have been community-curated in LncRNAWiki (a wiki-based, publicly editable and open-content platform for community curation of human lncRNAs).[50] According to literature-based curation of their functional mechanisms, lncRNAs are extensively reported to be involved in ceRNA regulation, transcriptional regulation, and epigenetic regulation.[50] A further large-scale sequencing study provides evidence that many transcripts thought to be lncRNAs may, in fact, be translated into proteins.[51]

In the regulation of gene transcription


In gene-specific transcription


In eukaryotes, RNA transcription is a tightly regulated process. Noncoding RNAs act upon different aspects of this process, targeting transcriptional modulators, RNA polymerase (RNAP) II and even the DNA duplex to regulate gene expression.[52][53]

NcRNAs modulate transcription by several mechanisms, including functioning themselves as co-regulators, modifying transcription factor activity, or regulating the association and activity of co-regulators. For example, the noncoding RNA Evf-2 functions as a co-activator for the homeobox transcription factor Dlx2, which plays important roles in forebrain development and neurogenesis.[54][55] Sonic hedgehog induces transcription of Evf-2 from an ultra-conserved element located between the Dlx5 and Dlx6 genes during forebrain development.[54] Evf-2 then recruits the Dlx2 transcription factor to the same ultra-conserved element, whereby Dlx2 subsequently induces expression of Dlx5. The existence of other similar ultra- or highly conserved elements within the mammalian genome that are both transcribed and fulfill enhancer functions suggests that Evf-2 may be illustrative of a generalised mechanism that regulates developmental genes with complex expression patterns during vertebrate growth.[56][57] Indeed, the transcription and expression of similar non-coding ultraconserved elements was shown to be abnormal in human leukaemia and to contribute to apoptosis in colon cancer cells, suggesting their involvement in tumorigenesis in a similar fashion to protein-coding RNA.[58][59][60]

Local ncRNAs can also recruit transcriptional programmes to regulate adjacent protein-coding gene expression.

The RNA binding protein TLS binds and inhibits the CREB binding protein and p300 histone acetyltransferase activities on a repressed gene target, cyclin D1. The recruitment of TLS to the promoter of cyclin D1 is directed by long ncRNAs expressed at low levels and tethered to 5' regulatory regions in response to DNA damage signals.[61] Moreover, these local ncRNAs act cooperatively as ligands to modulate the activities of TLS. In the broad sense, this mechanism allows the cell to harness RNA-binding proteins, which make up one of the largest classes within the mammalian proteome, and integrate their function in transcriptional programs. Nascent long ncRNAs have been shown to increase the activity of CREB binding protein, which in turn increases the transcription of that ncRNA.[62] A study found that a lncRNA in the antisense direction of the Apolipoprotein A1 (APOA1) regulates the transcription of APOA1 through epigenetic modifications.[63]

Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains.[64]

Regulating basal transcription machinery


NcRNAs also target general transcription factors required for the RNAP II transcription of all genes.[52] These general factors include components of the initiation complex that assemble on promoters or are involved in transcription elongation. A ncRNA transcribed from an upstream minor promoter of the dihydrofolate reductase (DHFR) gene forms a stable RNA-DNA triplex within the major promoter of DHFR to prevent the binding of the transcriptional co-factor TFIIB.[65] This novel mechanism of regulating gene expression may represent a widespread method of controlling promoter usage, as thousands of RNA-DNA triplexes exist in eukaryotic chromosomes.[66] The U1 ncRNA can induce transcription by binding to and stimulating TFIIH to phosphorylate the C-terminal domain of RNAP II.[67] In contrast, the ncRNA 7SK is able to repress transcription elongation by, in combination with HEXIM1/2, forming an inactive complex that prevents PTEFb from phosphorylating the C-terminal domain of RNAP II,[67][68][69] repressing global elongation under stressful conditions. These examples, which bypass specific modes of regulation at individual promoters, provide a means of quickly bringing about global changes in gene expression.

The ability to quickly mediate global changes is also apparent in the rapid expression of non-coding repetitive sequences. The short interspersed nuclear (SINE) Alu elements in humans and analogous B1 and B2 elements in mice have succeeded in becoming the most abundant mobile elements within the genomes, comprising ~10% of the human and ~6% of the mouse genome, respectively.[70][71] These elements are transcribed as ncRNAs by RNAP III in response to environmental stresses such as heat shock,[72] where they then bind to RNAP II with high affinity and prevent the formation of active pre-initiation complexes.[73][74][75][76] This allows for the broad and rapid repression of gene expression in response to stress.[73][76]

A dissection of the functional sequences within Alu RNA transcripts has drafted a modular structure analogous to the organization of domains in protein transcription factors.[77] The Alu RNA contains two 'arms', each of which may bind one RNAP II molecule, as well as two regulatory domains that are responsible for RNAP II transcriptional repression in vitro.[76] These two loosely structured domains may even be concatenated to other ncRNAs such as B1 elements to impart their repressive role.[76] The abundance and distribution of Alu elements and similar repetitive elements throughout the mammalian genome may be partly due to these functional domains being co-opted into other long ncRNAs during evolution, with the presence of functional repeat sequence domains being a common characteristic of several known long ncRNAs including Kcnq1ot1, Xlsirt and Xist.[78][79][80][81]

In addition to heat shock, the expression of SINE elements (including Alu, B1, and B2 RNAs) increases during cellular stress such as viral infection[82] and in some cancer cells,[83] where they may similarly regulate global changes to gene expression. The ability of Alu and B2 RNA to bind directly to RNAP II provides a broad mechanism to repress transcription.[74][76] Nevertheless, there are specific exceptions to this global response where Alu or B2 RNAs are not found at activated promoters of genes undergoing induction, such as the heat shock genes.[76] This additional hierarchy of regulation that exempts individual genes from the generalised repression also involves a long ncRNA, heat shock RNA-1 (HSR-1). It was argued that HSR-1 is present in mammalian cells in an inactive state, but upon stress is activated to induce the expression of heat shock genes.[84] This activation involves a conformational alteration of HSR-1 in response to rising temperatures, permitting its interaction with the transcriptional activator HSF-1, which trimerizes and induces the expression of heat shock genes.[84] In the broad sense, these examples illustrate a regulatory circuit nested within ncRNAs whereby Alu or B2 RNAs repress general gene expression, while other ncRNAs activate the expression of specific genes.

Transcribed by RNA polymerase III


Many of the ncRNAs that interact with general transcription factors or RNAP II itself (including 7SK, Alu and B1 and B2 RNAs) are transcribed by RNAP III,[85] uncoupling their expression from RNAP II, which they regulate. RNAP III also transcribes other ncRNAs, such as BC2, BC200 and some microRNAs and snoRNAs, in addition to housekeeping ncRNA genes such as tRNAs, 5S rRNAs and snRNAs.[85] The existence of an RNAP III-dependent ncRNA transcriptome that regulates its RNAP II-dependent counterpart is supported by the finding of a set of ncRNAs transcribed by RNAP III with sequence homology to protein-coding genes. This prompted the authors to posit a 'cogene/gene' functional regulatory network,[86] showing that one of these ncRNAs, 21A, regulates the expression of its antisense partner gene, CENP-F in trans.

In post-transcriptional regulation


In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional mRNA processing. Similar to small regulatory RNAs such as microRNAs and snoRNAs, these functions often involve complementary base pairing with the target mRNA. The formation of RNA duplexes between complementary ncRNA and mRNA may mask key elements within the mRNA required to bind trans-acting factors, potentially affecting any step in post-transcriptional gene expression including pre-mRNA processing and splicing, transport, translation, and degradation.[87]
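The idea of complementary base pairing can be made concrete with a small Python sketch that looks for the longest stretch of an mRNA able to form a perfect antiparallel duplex with an ncRNA. It assumes strict Watson-Crick pairing only (no G-U wobble, no mismatches or bulges), which is a deliberate simplification of real RNA-RNA interactions; the function names and toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def longest_duplex(ncrna: str, mrna: str):
    """Return (length, mrna_start) of the longest perfectly paired stretch.

    A region of the mRNA can pair with the ncRNA if it matches the reverse
    complement of part of the ncRNA, so this is a longest-common-substring
    search between the mRNA and that reverse complement.
    """
    target = mrna.upper()
    probe = reverse_complement(ncrna)
    best_len, best_end = 0, 0
    prev = [0] * (len(probe) + 1)
    for i in range(1, len(target) + 1):
        curr = [0] * (len(probe) + 1)
        for j in range(1, len(probe) + 1):
            if target[i - 1] == probe[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the match ending at i, j
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, best_end - best_len


# Toy example: the ncRNA is complementary to the GUAAGU motif in the mRNA.
print(longest_duplex("ACUUAC", "CCCGUAAGUCCC"))
```

A region found this way is only a candidate site for masking; whether pairing actually occurs depends on factors such as RNA structure and protein binding that the sketch does not model.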

In splicing


The splicing of mRNA can induce its translation and functionally diversify the repertoire of proteins it encodes. The Zeb2 mRNA requires the retention of a 5'UTR intron that contains an internal ribosome entry site for efficient translation.[88] The retention of the intron depends on the expression of an antisense transcript that complements the intronic 5' splice site.[88] Therefore, the ectopic expression of the antisense transcript represses splicing and induces translation of the Zeb2 mRNA during mesenchymal development. Likewise, the expression of an overlapping antisense Rev-ErbAα2 transcript controls the alternative splicing of the thyroid hormone receptor ErbAα2 mRNA to form two antagonistic isoforms.[89]

In translation


NcRNA may also apply additional regulatory pressures during translation, a property particularly exploited in neurons, where the dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III-transcribed BC1 and BC200 ncRNAs, which were derived from tRNAs, are expressed in the mouse and human central nervous system, respectively.[90][91] BC1 expression is induced in response to synaptic activity and synaptogenesis and is specifically targeted to dendrites in neurons.[92] Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggests a role for BC1 in targeted translational repression.[93] Indeed, BC1 has been shown to be associated with translational repression in dendrites to control the efficiency of dopamine D2 receptor-mediated transmission in the striatum,[94] and BC1 RNA-deleted mice exhibit behavioural changes with reduced exploration and increased anxiety.[95]

In siRNA-directed gene regulation


In addition to masking key elements within single-stranded RNA, the formation of double-stranded RNA duplexes can also provide a substrate for the generation of endogenous siRNAs (endo-siRNAs) in Drosophila and mouse oocytes.[96] The annealing of complementary sequences, such as antisense or repetitive regions between transcripts, forms an RNA duplex that may be processed by Dicer-2 into endo-siRNAs. Long ncRNAs that form extended intramolecular hairpins may also be processed into siRNAs, as illustrated by the esi-1 and esi-2 transcripts.[97] Endo-siRNAs generated from these transcripts seem particularly useful in suppressing the spread of mobile transposon elements within the genome in the germline. However, the generation of endo-siRNAs from antisense transcripts or pseudogenes may also silence the expression of their functional counterparts via RISC effector complexes, acting as an important node that integrates various modes of long and short RNA regulation, as exemplified by Xist and Tsix (see below).[98]

In epigenetic regulation


Epigenetic modifications, including histone and DNA methylation, histone acetylation and sumoylation, affect many aspects of chromosomal biology, primarily including regulation of large numbers of genes by remodeling broad chromatin domains.[99][100] While it has been known for some time that RNA is an integral component of chromatin,[101][102] it is only recently that we are beginning to appreciate the means by which RNA is involved in pathways of chromatin modification.[103][104][105] For example, Oplr16 epigenetically induces the activation of stem cell core factors by coordinating intrachromosomal looping and recruitment of DNA demethylase TET2.[106]

In Drosophila, long ncRNAs induce the expression of the homeotic gene, Ubx, by recruiting and directing the chromatin modifying functions of the trithorax protein Ash1 to Hox regulatory elements.[105] Similar models have been proposed in mammals, where strong epigenetic mechanisms are thought to underlie the embryonic expression profiles of the Hox genes that persist throughout human development.[107][104] Indeed, the human Hox genes are associated with hundreds of ncRNAs that are sequentially expressed along both the spatial and temporal axes of human development and define chromatin domains of differential histone methylation and RNA polymerase accessibility.[104] One ncRNA, termed HOTAIR, that originates from the HOXC locus represses transcription across 40 kb of the HOXD locus by altering chromatin trimethylation state. HOTAIR is thought to achieve this by directing the action of Polycomb chromatin remodeling complexes in trans to govern the cells' epigenetic state and subsequent gene expression. Components of the Polycomb complex, including Suz12, EZH2 and EED, contain RNA binding domains that may potentially bind HOTAIR and probably other similar ncRNAs.[108][109][110] This example nicely illustrates a broader theme whereby ncRNAs recruit the function of a generic suite of chromatin modifying proteins to specific genomic loci, underscoring the complexity of recently published genomic maps.[100] Indeed, the prevalence of long ncRNAs associated with protein coding genes may contribute to localised patterns of chromatin modifications that regulate gene expression during development. 
For example, the majority of protein-coding genes have antisense partners, including many tumour suppressor genes that are frequently silenced by epigenetic mechanisms in cancer.[111] A recent study observed an inverse expression profile of the p15 gene and an antisense ncRNA in leukaemia.[111] A detailed analysis showed the p15 antisense ncRNA (CDKN2BAS) was able to induce changes to heterochromatin and DNA methylation status of p15 by an unknown mechanism, thereby regulating p15 expression.[111] Therefore, misexpression of the associated antisense ncRNAs may subsequently silence the tumour suppressor gene contributing towards cancer.

Imprinting


Many emergent themes of ncRNA-directed chromatin modification were first apparent within the phenomenon of imprinting, whereby only one allele of a gene is expressed from either the maternal or the paternal chromosome. In general, imprinted genes are clustered together on chromosomes, suggesting the imprinting mechanism acts upon local chromosome domains rather than individual genes. These clusters are also often associated with long ncRNAs whose expression is correlated with the repression of the linked protein-coding gene on the same allele.[112] Indeed, detailed analysis has revealed a crucial role for the ncRNAs Kcnq1ot1 and Igf2r/Air in directing imprinting.[113]

Almost all the genes at the Kcnq1 locus are maternally expressed, except the paternally expressed antisense ncRNA Kcnq1ot1.[114] Transgenic mice with truncated Kcnq1ot1 fail to silence the adjacent genes, suggesting that Kcnq1ot1 is crucial to the imprinting of genes on the paternal chromosome.[115] It appears that Kcnq1ot1 is able to direct the trimethylation of lysine 9 (H3K9me3) and lysine 27 (H3K27me3) of histone 3 to an imprinting centre that overlaps the Kcnq1ot1 promoter and actually resides within a Kcnq1 sense exon.[116] Similar to HOTAIR (see above), Eed-Ezh2 Polycomb complexes are recruited to the Kcnq1 locus on the paternal chromosome, possibly by Kcnq1ot1, where they may mediate gene silencing through repressive histone methylation.[116] A differentially methylated imprinting centre also overlaps the promoter of a long antisense ncRNA, Air, that is responsible for the silencing of neighbouring genes at the Igf2r locus on the paternal chromosome.[117][118] The presence of allele-specific histone methylation at the Igf2r locus suggests Air also mediates silencing via chromatin modification.[119]

Xist and X-chromosome inactivation


The inactivation of an X-chromosome in female placental mammals is directed by one of the earliest and best characterized long ncRNAs, Xist.[120] The expression of Xist from the future inactive X-chromosome, and its subsequent coating of the inactive X-chromosome, occurs during early embryonic stem cell differentiation. Xist expression is followed by irreversible layers of chromatin modifications that include the loss of the histone (H3K9) acetylation and H3K4 methylation that are associated with active chromatin, and the induction of repressive chromatin modifications including H4 hypoacetylation, H3K27 trimethylation,[120] H3K9 hypermethylation and H4K20 monomethylation as well as H2AK119 monoubiquitylation. These modifications coincide with the transcriptional silencing of the X-linked genes.[121] Xist RNA also localises the histone variant macroH2A to the inactive X-chromosome.[122] Additional ncRNAs are also present at the Xist locus, including an antisense transcript, Tsix, which is expressed from the future active chromosome and is able to repress Xist expression by the generation of endogenous siRNA.[98] Together these ncRNAs ensure that only one X-chromosome is active in female mammals.

Telomeric non-coding RNAs


Telomeres form the terminal region of mammalian chromosomes and are essential for stability and aging and play central roles in diseases such as cancer.[123] Telomeres have been long considered transcriptionally inert DNA-protein complexes until it was shown in the late 2000s that telomeric repeats may be transcribed as telomeric RNAs (TelRNAs)[124] or telomeric repeat-containing RNAs.[125] These ncRNAs are heterogeneous in length, transcribed from several sub-telomeric loci and physically localise to telomeres. Their association with chromatin, which suggests an involvement in regulating telomere specific heterochromatin modifications, is repressed by SMG proteins that protect chromosome ends from telomere loss.[125] In addition, TelRNAs block telomerase activity in vitro and may therefore regulate telomerase activity.[124] Although early, these studies suggest an involvement for telomeric ncRNAs in various aspects of telomere biology.

In regulation of DNA replication timing and chromosome stability


Asynchronously replicating autosomal RNAs (ASARs) are very long (~200kb) non-coding RNAs that are non-spliced, non-polyadenylated, and are required for normal DNA replication timing and chromosome stability.[126][127][128] Deletion of any one of the genetic loci containing ASAR6, ASAR15, or ASAR6-141 results in the same phenotype of delayed replication timing and delayed mitotic condensation (DRT/DMC) of the entire chromosome. DRT/DMC results in chromosomal segregation errors that lead to increased frequency of secondary rearrangements and an unstable chromosome. Similar to Xist, ASARs show random monoallelic expression and exist in asynchronous DNA replication domains. Although the mechanism of ASAR function is still under investigation, it is hypothesized that they work via similar mechanisms as the Xist lncRNA, but on smaller autosomal domains resulting in allele specific changes in gene expression.

Incorrect repair of DNA double-strand breaks (DSBs), leading to chromosomal rearrangements, is one of the primary causes of oncogenesis. A number of lncRNAs are crucial at different stages of the main pathways of DSB repair in eukaryotic cells: nonhomologous end joining (NHEJ) and homology-directed repair (HDR). Gene mutations or variation in the expression levels of such RNAs can lead to local DNA repair defects, increasing the frequency of chromosome aberrations. Moreover, it was demonstrated that some RNAs could stimulate long-range chromosomal rearrangements.[129]

Structure


It took over two decades after the discovery of the first human long non-coding transcripts for the functional significance of lncRNA structures to be fully recognized. Early structural studies led to the proposal of several hypotheses for classifying lncRNA architectures. One hypothesis suggests that lncRNAs may feature a compact tertiary structure, similar to ribozymes like the ribosome or self-splicing introns. Another possibility is that lncRNAs could have structured protein-binding sites arranged in a decentralized scaffold, lacking a compact core. A third hypothesis posits that lncRNAs might exhibit a largely unstructured architecture, with loosely organized protein-binding domains interspersed with long regions of disordered single-stranded RNA.[130]

Studying the tertiary structure of lncRNAs by conventional methods such as X-ray crystallography, cryo-EM and nuclear magnetic resonance (NMR) is still hampered by their size and conformational dynamics, and by the fact that too little is currently known about their mechanisms to reconstruct stable and functionally active lncRNA-ribonucleoprotein complexes. However, some pioneering studies have shown that lncRNAs can already be studied by low-resolution single-particle and in-solution methods, such as atomic force microscopy (AFM) and small-angle X-ray scattering (SAXS), in some cases even in complexes with small-molecule modulators.[131]

For instance, lncRNA MEG3 was shown to regulate transcription factor p53 thanks to its compact structured core.[132] Moreover, lncRNA Braveheart (Bvht) was shown to have a well-defined, albeit flexible, 3D structure that is remodeled upon binding CNBP (Cellular Nucleic-acid Binding Protein), which recognizes distal domains in the RNA.[133] Finally, Xist, a master regulator of X-chromosome inactivation, was shown to specifically bind a small-molecule compound, which alters the conformation of the Xist RepA motif and displaces two known interacting protein factors (PRC2 and SPEN) from the RNA. By this mechanism of action, the compound abrogates the initiation of X-chromosome inactivation.[134]

from Grokipedia
Long non-coding RNAs (lncRNAs) are a diverse class of non-coding RNA transcripts longer than 200 nucleotides that lack significant protein-coding potential and are primarily transcribed by RNA polymerase II. These molecules, often spanning from hundreds to over 100,000 nucleotides, exhibit low sequence conservation across species but harbor conserved functional domains, such as repetitive elements in cases like the XIST lncRNA involved in X-chromosome inactivation. Although they do not encode proteins, lncRNAs typically undergo processing similar to that of messenger RNAs (mRNAs), including capping, splicing, and polyadenylation, though some lack poly(A) tails and localize to specific subcellular compartments like the nucleus or cytoplasm. Discovered in the 1990s through studies of imprinted genes like H19 and dosage compensation mechanisms like Xist, lncRNAs have since been cataloged extensively via projects such as GENCODE, revealing thousands of such transcripts in the human genome. They function predominantly as regulators of gene expression through diverse mechanisms, including chromatin modification, transcriptional interference, and post-transcriptional modulation of mRNA stability and translation. For instance, lncRNAs can act as scaffolds to recruit chromatin-modifying complexes, form RNA-DNA hybrids (R-loops) to influence replication or repair, or sequester proteins to alter signaling pathways. Biologically, lncRNAs are essential for processes such as embryonic development, cell differentiation, and maintenance of genomic stability, with examples like NEAT1 forming paraspeckles for nuclear organization and preventing chromosomal instability. Dysregulation of lncRNAs is implicated in numerous diseases, including cancers, in which MALAT1 is implicated, and neurodegenerative disorders, positioning them as potential biomarkers and therapeutic targets due to their tissue-specific expression patterns. Despite advances, challenges persist in elucidating their precise mechanisms, given their low phenotypic visibility in studies and context-dependent functions, underscoring the need for standardization and enhanced functional assays.

Fundamentals

Definition and Discovery

Long non-coding RNAs (lncRNAs) are defined as transcripts longer than 200 nucleotides that lack significant protein-coding potential. This arbitrary length threshold distinguishes them from shorter non-coding RNAs, such as microRNAs or small nucleolar RNAs, while their non-coding nature is assessed through computational tools evaluating features like short open reading frames (typically fewer than 100 codons) and poor evolutionary conservation of potential coding sequences. Unlike protein-coding messenger RNAs (mRNAs), lncRNAs show minimal association with ribosomes and are primarily functional as RNA molecules rather than templates for translation.

The discovery of lncRNAs began with isolated examples in the late 1980s and early 1990s, predating genome-wide studies. One of the first identified was H19, a maternally expressed, imprinted transcript reported in 1991, which encodes an abundant RNA during mouse embryonic development without an open reading frame. Similarly, the Xist RNA, essential for X-chromosome inactivation, was characterized in 1991 as a 15 kb nuclear transcript expressed exclusively from the inactive X chromosome in female mammals. These early findings highlighted lncRNAs' roles in imprinting and dosage compensation but were viewed as exceptions in a predominantly protein-coding transcriptome paradigm.

A major shift occurred in the mid-2000s with high-throughput technologies that unveiled the non-coding transcriptome's scale. Large-scale cDNA sequencing by the FANTOM consortium in 2005, alongside tiling array experiments, detected thousands of unannotated transcripts across the mouse genome. RNA sequencing further accelerated discoveries from 2008 onward, identifying pervasive transcription beyond protein-coding genes and confirming lncRNAs' ubiquity in eukaryotic cells.

Nomenclature evolved alongside these advances, initially focusing on genomic context; the term "large intergenic non-coding RNAs" (lincRNAs) was coined in 2009 for intergenic transcripts longer than 200 nucleotides. By the early 2010s, the broader "lncRNA" designation encompassed diverse classes, including antisense and intronic types, as recommended by bodies like the HUGO Gene Nomenclature Committee (HGNC). The ENCODE project's 2012 GENCODE v7 annotation established rigorous criteria for lncRNA identification, requiring evidence of transcription, , and absence of protein-coding capacity via tools like and . This framework emphasized functional annotation over mere length, prioritizing transcripts with biochemical signatures akin to mRNAs but without translational output.
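The length and open-reading-frame criteria described above can be sketched as a simple filter. This is only an illustration, not how annotation pipelines assess coding potential: the function names are hypothetical, the 200-nucleotide and 100-codon figures are treated as hard cutoffs purely for the example, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest complete ATG-initiated ORF on the forward strand, in codons.

    The start codon is counted, the stop codon is not. ORFs that run off the
    end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = None  # codons counted in the ORF currently being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def looks_like_lncrna(seq: str, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """Assumed toy rule: longer than min_length and no ORF of max_orf_codons or more."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons
```

A transcript that passes this filter is only a candidate; conservation of the putative coding sequence and ribosome association, both mentioned above, are not modeled here.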

Abundance and Diversity

Long non-coding RNAs (lncRNAs) represent a substantial fraction of the eukaryotic transcriptome, with estimates indicating that they comprise over 68% of annotated genes in deep-sequencing surveys of the polyadenylated (polyA+) RNA fraction. In humans, conservative annotations from GENCODE release 49 (2025) identify 35,899 lncRNA genes, producing 191,079 transcripts, while broader catalogs such as NONCODE v6.0 (updated 2021) document around 173,112 lncRNA transcripts, highlighting the variability in detection and annotation approaches. These numbers underscore that lncRNAs may outnumber protein-coding genes, potentially exceeding 100,000 loci when including unannotated or lowly expressed variants, though many remain uncharacterized due to challenges in distinguishing them from transcriptional noise or degraded mRNAs.

LncRNAs exhibit considerable diversity in their genomic context and expression profiles. Based on their positional relationship to protein-coding genes, they are broadly categorized as intergenic (also known as lincRNAs), antisense, intronic, and enhancer-associated, with intergenic lncRNAs residing in regions between genes and antisense types overlapping the opposite strand of coding loci. Expression-wise, lncRNAs display higher tissue specificity than mRNAs, with studies reporting that up to 70% of lncRNAs are restricted to specific tissues or cell types compared to about 40% for protein-coding transcripts, reflecting their roles in fine-tuned regulatory networks during development and disease. This variability is further evident in their sequence features, where lncRNAs generally have lower sequence conservation and fewer exons than mRNAs, contributing to their functional plasticity.

Across eukaryotes, lncRNA abundance scales with organismal complexity, with humans harboring over 100,000 predicted lncRNAs compared to far fewer in simpler models like yeast (Saccharomyces cerevisiae), where only hundreds of stable lncRNAs have been identified despite pervasive transcription covering 75-85% of the genome. Recent advances in long-read and direct RNA sequencing (2023-2024) have doubled estimates of lncRNA diversity in non-model organisms, such as , revealing thousands more transcripts in species like and crops, often linked to stress responses and development. Despite these insights, annotation remains incomplete, as databases like NONCODE continue to expand with experimentally validated entries, emphasizing the need for improved computational tools to resolve the full spectrum of lncRNA repertoires.
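The text does not say how tissue specificity is scored. One widely used measure is the tau index, shown here as an assumed stand-in: it is 0 for a transcript expressed equally everywhere and 1 for a transcript detected in a single tissue. The function name and the example expression values are hypothetical.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one transcript.

    `expression` holds one non-negative expression value per tissue.
    Returns 0.0 for uniform expression and 1.0 for single-tissue expression.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("transcript is not expressed in any tissue")
    # Each tissue contributes its shortfall relative to the peak tissue.
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


# Hypothetical profiles: a testis-restricted lncRNA and a broadly expressed mRNA.
print(tau_index([0.0, 0.0, 0.0, 12.0]))
print(tau_index([8.0, 9.0, 10.0, 9.5]))
```

Calling a transcript "tissue-restricted" then amounts to choosing a tau cutoff, and that choice is one reason the percentages reported by different studies are not directly comparable.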

Biogenesis and Properties

Transcription and Processing

Long non-coding RNAs (lncRNAs) are primarily transcribed by RNA polymerase II (Pol II), similar to protein-coding messenger RNAs (mRNAs), with the vast majority of annotated lncRNA loci showing Pol II occupancy and associated chromatin marks such as histone H3 lysine 4 trimethylation (H3K4me3) at promoters and H3K36me3 in gene bodies. This Pol II-dependent transcription enables lncRNAs to undergo co-transcriptional processing akin to mRNAs, although a small subset of lncRNAs is transcribed by RNA polymerases I or III, such as certain intergenic transcripts derived from rDNA repeats or Pol III-driven elements like Alu-derived lncRNAs. LncRNA promoters often exhibit bidirectional activity, co-transcribing with nearby protein-coding genes in convergent or divergent orientations, which can influence local chromatin architecture and gene expression. Additionally, many lncRNAs arise from enhancer regions, where divergent transcription from bidirectional enhancers produces unstable enhancer RNAs (eRNAs) that can mature into longer lncRNA forms, contributing to regulatory feedback loops.

Following transcription initiation, lncRNA precursors undergo canonical processing steps that mirror those of mRNAs, including 5' capping with a 7-methylguanosine (m7G) cap shortly after Pol II initiation, which protects the transcript from exonucleases and facilitates subsequent maturation. Splicing removes introns via the spliceosome, with most lncRNAs being multi-exonic and featuring an average of ~2 exons (thus ~1 intron) per transcript, fewer than typical mRNAs, though intron lengths are typically longer than in mRNAs, leading to inefficient splicing or intron retention in some cases. At the 3' end, cleavage and polyadenylation occur at polyadenylation sites, resulting in many lncRNAs being polyadenylated (polyA+), similar to mRNAs, while others are polyA- and often more nuclear-retained, due to variability in polyadenylation signals. Mature polyA+ lncRNAs are exported to the cytoplasm via the TAP/NXF1 (also known as Mex67-Mtr2 in yeast) heterodimer, which binds the polyA tail through adaptor proteins like the TREX complex, facilitating translocation through nuclear pore complexes.

LncRNA maturation exhibits notable variations that distinguish it from mRNA processing, including instances of unspliced transcripts that retain introns and function in the nucleus, such as certain stress-responsive lncRNAs that evade full splicing due to weak splice sites. Although circular RNAs (circRNAs) form via backsplicing and are considered a distinct class, some lncRNAs may form stable secondary structures that mimic looped conformations, enhancing stability in specific contexts. Studies have highlighted Pol II pausing and termination as key factors in lncRNA biogenesis; for example, promoter-proximal pausing mediated by the NELF complex allows co-transcriptional regulation, while termination dynamics can generate isoform diversity beyond standard mRNA pathways.

Aberrant lncRNA transcripts, such as those with improper capping, splicing defects, or failed polyadenylation, are subject to rigorous surveillance mechanisms to prevent accumulation of dysfunctional RNAs. The 5'-3' exonuclease XRN2 plays a central role in degrading these aberrant species co-transcriptionally or shortly after release from Pol II, often in coordination with surveillance factors like the exosome complex, thereby maintaining genomic stability and preventing interference with normal gene expression. LncRNAs exhibit half-lives comparable to mRNAs, averaging around 10-15 hours depending on subcellular location, though some studies report medians of ~3-5 hours for specific subsets, reflecting variable turnover via pathways like XRN2-mediated decay. This variability, measured via genome-wide metabolic labeling, contrasts with the more stable profiles of mRNAs and highlights lncRNAs' adaptation for dynamic cellular responses.
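To make the half-life figures concrete, the sketch below assumes simple first-order (exponential) decay with no new synthesis, which is the usual idealization behind half-life estimates from labeling experiments. The function names and the two example half-lives (3 h and 12 h, picked from within the ranges quoted above) are illustrative assumptions.

```python
import math


def decay_rate_constant(half_life_hours: float) -> float:
    """First-order decay constant k (per hour) from a half-life: k = ln(2) / t_half."""
    return math.log(2) / half_life_hours


def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of an RNA pool left after `hours`, assuming no new transcription."""
    return 0.5 ** (hours / half_life_hours)


# Compare a short-lived and a long-lived transcript 12 hours after a transcription block.
for t_half in (3.0, 12.0):
    left = fraction_remaining(12.0, t_half)
    print(f"half-life {t_half:>4} h -> {left:.4f} of the pool remains")
```

Under this model a transcript with a 3-hour half-life falls to about 6% of its starting level in the time a 12-hour transcript falls to 50%, which is why short-lived subsets can track rapid changes in transcription.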

Structure and Stability

Long non-coding RNAs (lncRNAs) often adopt complex secondary structures characterized by stem-loops, hairpins, and modular domains that contribute to their functional architectures. These elements are predicted computationally using tools like RNAfold, which employs thermodynamic models to forecast stable folding patterns based on base-pairing probabilities. For instance, the lncRNA HOTAIR exhibits an intricate secondary structure comprising multiple hairpins and repeat motifs, including a 300-nucleotide domain with tandem stem-loops that facilitate modular interactions. Such structural motifs enhance the specificity of lncRNA recognition by binding partners, as demonstrated by studies altering stem lengths and loop sizes in HOTAIR.

Beyond secondary folding, lncRNAs engage in tertiary interactions that stabilize higher-order conformations, particularly through associations with RNA-binding proteins (RBPs). Heterogeneous nuclear ribonucleoproteins (hnRNPs), such as hnRNP A2/B1, bind to specific motifs in lncRNAs like HOTAIR, promoting compact tertiary structures via multivalent contacts that bridge distant RNA domains. Additionally, lncRNAs such as NEAT1 drive phase separation into membraneless organelles like paraspeckles by forming dynamic RNA-protein networks; NEAT1's modular domains, including repeat-containing regions, nucleate aggregation with RBPs like NONO and SFPQ, resulting in liquid-like condensates that sequester cellular components. These interactions underscore the role of lncRNA tertiary architecture in compartmentalizing nuclear processes.

The stability of lncRNAs is modulated by sequence elements and protein interactions that influence their degradation and subcellular persistence. AU-rich elements (AREs) in lncRNA transcripts promote rapid decay by recruiting decay factors, similar to their role in mRNAs, though lncRNAs generally exhibit variable half-lives due to processing differences. Stabilizing RBPs like HuR counteract this by binding AU-rich or cytosine-rich stretches; for example, the lncRNA HMS recruits HuR to its target sites, delaying exonucleolytic degradation and extending transcript lifespan. Furthermore, many lncRNAs display prolonged nuclear retention compared to cytoplasmic mRNAs, attributed to weak splice site motifs and interactions with nuclear RBPs that hinder export, thereby maintaining higher steady-state levels in the nucleus.

Structural studies, including imaging techniques, have revealed conformational flexibility in lncRNA-RBP complexes like NEAT1 in paraspeckles. As of 2025, genome-wide studies continue to refine lncRNA annotations, highlighting roles of additional modifications like in modulating structure and function. Concurrently, investigations into epitranscriptomic modifications demonstrate that N6-methyladenosine (m6A) marks subtly alter lncRNA folding by disrupting base stacking in stems, leading to more open conformations that influence RBP affinity and stability, as observed in thermodynamic analyses of m6A-modified lncRNAs like MALAT1.
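Secondary-structure predictors such as RNAfold minimize a thermodynamic free-energy model. The sketch below does not reproduce that model; it swaps in the much simpler Nussinov algorithm, which only maximizes the number of nested base pairs, to show the dynamic-programming idea behind stem-loop prediction. The function name, the minimum loop size of three unpaired bases, and the toy hairpin sequence are assumptions for illustration.

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Watson-Crick plus G-U wobble).

    A pair (k, j) is allowed only if at least `min_loop` bases lie between
    k and j, so hairpin loops cannot be unrealistically tight.
    """
    seq = seq.upper().replace("T", "U")
    can_pair = {("A", "U"), ("U", "A"), ("G", "C"),
                ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in can_pair:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# A toy hairpin: three G-C pairs closing a four-base loop.
print(nussinov_max_pairs("GGGAAAACCC"))
```

Because this scoring ignores stacking energies and loop penalties, it can favor structures that a thermodynamic model would reject. It is useful for seeing how stems and loops arise from the recursion, not for predicting the fold of a real lncRNA.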

Genomic Organization

Location and Architecture

Long non-coding RNAs (lncRNAs) are transcribed from diverse genomic contexts, including intergenic regions, where they are classified as long intergenic non-coding RNAs (lincRNAs), as well as genic regions involving intronic and exonic overlaps with protein-coding genes. Antisense lncRNAs, transcribed from the opposite strand of protein-coding genes, represent another major category, often overlapping with sense transcripts. A substantial proportion of lncRNA loci, estimated at around 20-40% in various annotations, are associated with regulatory elements such as enhancers and promoters, reflecting their potential integration into gene regulatory networks.

LncRNA loci exhibit distinctive architectural features that mirror those of protein-coding genes, including the presence of bidirectional promoters that drive divergent transcription from shared regulatory elements. These loci are frequently enriched within topologically associating domains (TADs), self-interacting regions that compartmentalize the genome, with lincRNA promoters often positioned relative to orthologous genes in syntenic blocks across species. Clustering of lncRNA genes in syntenic regions underscores their positional conservation, facilitating coordinated regulation within chromatin loops.

From an evolutionary perspective, lncRNA architectures arise through mechanisms such as tandem duplications, which contribute to the rapid birth and diversification of lncRNA repertoires, often leading to volatile turnover across lineages. Recent chromatin conformation capture studies, including analyses from 2023-2025, have revealed that many lncRNA loci localize at or near TAD boundaries and loops, influencing higher-order chromatin organization.

In terms of annotation, approximately 19-20% of annotated lincRNAs overlap with or are in proximity to pseudogenes, complicating their classification and highlighting evolutionary relationships between non-coding and defunct coding elements. Tools like lncRNASNP provide resources for assessing how single-nucleotide polymorphisms (SNPs) and mutations impact lncRNA secondary structures and regulatory architectures, aiding in the functional annotation of these loci.
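The positional categories above reduce to interval and strand comparisons. The following sketch classifies one lncRNA against one protein-coding gene; real annotation compares against every gene and exon in the genome. The `Gene` type, the function name, the closed 1-based coordinates, and the "sense-overlapping" label for same-strand exonic overlap are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A protein-coding gene with closed, 1-based coordinates (hypothetical model)."""
    start: int
    end: int
    strand: str          # "+" or "-"
    introns: tuple = ()  # ((intron_start, intron_end), ...)


def classify_lncrna(lnc_start: int, lnc_end: int, lnc_strand: str, gene: Gene) -> str:
    """Classify a lncRNA by its position relative to a single gene."""
    overlaps = lnc_start <= gene.end and gene.start <= lnc_end
    if not overlaps:
        return "intergenic"
    if lnc_strand != gene.strand:
        return "antisense"
    for intron_start, intron_end in gene.introns:
        # Fully contained in one intron on the same strand.
        if intron_start <= lnc_start and lnc_end <= intron_end:
            return "intronic"
    return "sense-overlapping"


gene = Gene(start=1000, end=5000, strand="+", introns=((2000, 3000),))
print(classify_lncrna(6000, 7000, "+", gene))  # no overlap with the gene
print(classify_lncrna(4500, 5500, "-", gene))  # overlaps on the opposite strand
print(classify_lncrna(2100, 2900, "+", gene))  # sits inside the intron
```

Enhancer-associated lncRNAs cannot be identified this way, since that category depends on chromatin annotations rather than on gene coordinates.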

Translation Potential

Long non-coding RNAs (lncRNAs) exhibit limited coding potential primarily due to the presence of short open reading frames (sORFs) typically shorter than 100 amino acids, which often fail to produce stable, functional proteins. Additionally, upstream ORFs (uORFs) in the 5' regions of many lncRNAs can inhibit downstream translation by stalling ribosomes or promoting premature termination. Ribosome profiling studies indicate that fewer than 5% of lncRNAs show ribosome association patterns consistent with productive translation, in contrast to protein-coding mRNAs, underscoring their predominantly non-coding nature.

Despite these limitations, emerging evidence from ribosome sequencing (Ribo-seq) demonstrates that a subset of lncRNAs undergoes translation to yield micropeptides. This is supported by a 2022 analysis in embryogenesis, which found that approximately 30% of lncRNAs harbor sORFs actively engaged by ribosomes, resulting in the regulated production of 100 to 300 micropeptides, and by human studies identifying approximately 299 lncRNA-encoded small encoded peptides (SEPs) across cell lines and tissues. For instance, the lncRNA ASncmtRNA, transcribed from the mitochondrial genome, encodes the micropeptide SHLP2, which localizes to mitochondria and modulates cellular processes such as energy metabolism and proliferation in tumor cells. These micropeptides often exert biologically significant roles, including regulation of cell-cycle progression; for example, certain lncRNA-derived peptides enhance mitochondrial complex IV activity, thereby promoting G1/S transition.

Translation of lncRNAs is further constrained by regulatory mechanisms that disfavor efficient initiation. Internal ribosome entry site (IRES) elements, which enable cap-independent translation, are rare in lncRNAs compared to viral or select cellular mRNAs. Moreover, cap-dependent scanning is often inefficient due to stable secondary structures at the 5' ends of lncRNAs, which impede ribosome progression.

The translational potential of lncRNAs remains controversial, with debates centering on whether observed ribosome associations represent true protein synthesis or experimental artifacts such as non-specific binding. Recent mass spectrometry-based validations have addressed these concerns, identifying approximately 85 to 100 unique lncRNA-derived peptides in human cells, confirming their existence and tissue-specific expression. As of 2025, recent reviews highlight advances in validating lncRNA-derived micropeptides using mass spectrometry and their roles in cellular metabolism. These findings suggest that while translation is not a dominant feature of lncRNAs, the resulting micropeptides may contribute to fine-tuned regulatory functions in cellular physiology.

Regulatory Functions

Transcriptional Regulation

Long non-coding RNAs (lncRNAs) play critical roles in transcriptional regulation by modulating the initiation, pausing, and elongation phases of RNA polymerase II (Pol II)-mediated transcription through both cis- and trans-acting mechanisms. These RNAs often interact with chromatin, transcription factors, or components of the basal transcriptional machinery to fine-tune gene expression in a context-specific manner.

In cis-acting mechanisms, lncRNAs primarily influence nearby genes on the same chromosome by facilitating chromatin architecture changes or recruiting regulatory complexes. For instance, enhancer-associated lncRNAs (eRNAs) can tether promoters to distal enhancers via interactions with the Mediator complex, promoting long-range DNA looping that enhances Pol II recruitment and transcriptional activation. A seminal example is the lncRNA ncRNA-a, which binds the Mediator subunit MED12 to stabilize enhancer-promoter loops, thereby activating the expression of target genes such as via looping. Similarly, the lncRNA FENDRR regulates mesodermal lineage genes during embryonic development by associating with Polycomb repressive complex 2 (PRC2) and Trithorax group (TrxG)/mixed-lineage leukemia (MLL) complexes, leading to balanced H3K27me3 and H3K4me3 modifications that either repress or activate transcription in a gene-specific fashion; in limb development models, FENDRR depletion disrupts these chromatin states, impairing patterning and differentiation. These cis effects ensure precise spatial control over transcription without requiring the lncRNA to diffuse far from its site of synthesis.

Trans-acting mechanisms allow lncRNAs to regulate distal or multiple genes by sequestering transcription factors or modulating the core transcriptional apparatus. One prominent mode involves lncRNAs acting as decoys to inhibit transcription factor activity; for example, the lncRNA NKILA binds the NF-κB/IκB complex in the cytoplasm, masking IκB phosphorylation sites to prevent IKK-mediated IκB phosphorylation and subsequent NF-κB activation, thereby suppressing NF-κB-dependent transcriptional activation of pro-inflammatory genes. Additionally, some lncRNAs target the basal machinery, such as the C-terminal domain (CTD) of Pol II, to influence pausing and elongation; certain nuclear lncRNAs bind the unphosphorylated Pol II CTD to sterically hinder progression or recruit kinases like P-TEFb for CTD phosphorylation at Ser2, facilitating pause release. Pol III-transcribed lncRNAs, such as 7SK RNA, further exemplify this by scaffolding the 7SK small nuclear ribonucleoprotein (snRNP) complex, which sequesters P-TEFb in an inhibitory state; upon signaling, 7SK dissociates, releasing active P-TEFb to phosphorylate the Pol II CTD and promote transcriptional elongation at paused genes.

Recent studies highlight lncRNAs' involvement in Pol II pausing release, a key checkpoint for productive transcription. For instance, a 2023 review of enhancer RNAs indicated that these lncRNAs coordinate with NELF and DSIF complexes at promoter-proximal regions to facilitate pause release of Pol II, and that their depletion stabilizes paused Pol II, impairing elongation at developmental genes. This mechanism underscores lncRNAs' role in balancing transcriptional poising for rapid responses. Overall, these regulatory strategies enable lncRNAs to integrate cellular signals into precise transcriptional outputs.

Post-Transcriptional Regulation

Long non-coding RNAs (lncRNAs) play a pivotal role in modulating mRNA splicing by interacting with spliceosome components and influencing alternative splicing patterns. A prominent example is MALAT1, which localizes to nuclear speckles and interacts with serine/arginine-rich (SR) splicing factors, thereby regulating their phosphorylation and distribution to promote efficient spliceosome assembly. MALAT1 also stabilizes small nuclear ribonucleoproteins (snRNPs) within these subnuclear structures, facilitating the recruitment of splicing machinery to nascent pre-mRNAs. Beyond direct interactions, lncRNAs such as NEAT1 and can induce switches in alternative splicing by altering the localization of splicing factors like PTBP1, leading to exon inclusion or skipping in target transcripts. These mechanisms ensure precise post-transcriptional processing, with dysregulation often linked to cellular phenotypes like proliferation.

In mRNA stability regulation, lncRNAs frequently act as competing endogenous RNAs (ceRNAs) by sequestering microRNAs (miRNAs), thereby preventing miRNA-mediated degradation of target mRNAs. The PTENP1 lncRNA exemplifies this, functioning as a ceRNA that sponges miR-21 and other miRNAs in the PTEN network, stabilizing PTEN mRNA and suppressing tumor progression in cancers like . Similarly, lncRNAs interfere with decay pathways, such as Staufen1-mediated mRNA decay (SMD), where they form RNA duplexes with 3' untranslated regions (UTRs) of target mRNAs to recruit Staufen1 and Upf1, triggering and 5'-to-3' exonucleolytic degradation. This SMD pathway, activated by lncRNAs like BACE1-AS, fine-tunes the stability of specific transcripts during stress or development. Quantitative models of ceRNA competition highlight that effective sponging requires substantial overlap in miRNA response elements (MREs), with derepression requiring competing sites to exceed endogenous MRE abundance by at least 1.5-fold before miRNA repression is significantly affected.

LncRNAs also exert control over mRNA translation through mechanisms involving upstream open reading frames (uORFs) and ribosome dynamics. Some lncRNAs, such as those with translated uORFs, regulate their own expression by modulating reinitiation or stalling at uORF stop codons, indirectly influencing associated protein networks. In trans, lncRNAs like CARDINAL bind and prevent stalling on nascent peptides during cardiac , thereby maintaining translational fidelity and attenuating pathological remodeling. Additionally, certain lncRNAs serve as precursors for small interfering RNAs (siRNAs) in specific contexts, such as in mammalian germ cells, where they are processed into pri-siRNAs to guide and aid siRNA biogenesis pathways.

Recent advances, including transcriptome-scale CRISPR-Cas13 screens in , have identified over 700 context-specific essential lncRNAs across human cell lines, with subsets implicated in splicing networks through perturbation of events and integrity. These screens revealed approximately 50 lncRNAs that, when disrupted, alter splicing factor localization and pre-mRNA processing efficiency, underscoring their non-redundant roles in post-transcriptional networks. Such findings, combined with computational models assessing MRE overlap in ceRNA interactions, provide a framework for predicting lncRNA-mediated regulatory impacts without exhaustive derivations.
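The notion of MRE overlap in ceRNA models can be illustrated by counting shared miRNA seed matches. The sketch below treats an MRE as an exact match to the reverse complement of miRNA nucleotides 2-8, which is a deliberate simplification of real target prediction. The function names, the labels miR-A and miR-B, and the transcript fragments are hypothetical toy inputs, not curated sequences.

```python
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Target-site sequence complementary to miRNA nucleotides 2-8 (the seed)."""
    mirna = mirna.upper().replace("T", "U")
    seed = mirna[1:8]
    return "".join(_COMPLEMENT[base] for base in reversed(seed))


def count_sites(transcript: str, mirna: str) -> int:
    """Number of exact seed matches for `mirna` in `transcript`."""
    transcript = transcript.upper().replace("T", "U")
    site = seed_site(mirna)
    return sum(
        1
        for i in range(len(transcript) - len(site) + 1)
        if transcript[i:i + len(site)] == site
    )


def shared_mirnas(lncrna: str, mrna_utr: str, mirnas: dict) -> list:
    """Names of miRNAs with at least one seed match in both transcripts."""
    return sorted(
        name
        for name, seq in mirnas.items()
        if count_sites(lncrna, seq) > 0 and count_sites(mrna_utr, seq) > 0
    )


# Toy inputs: two miRNA-like sequences and two short transcript fragments.
mirnas = {"miR-A": "UAGCUUAUCAGACUGAUGUUGA", "miR-B": "UGAGGUAGUAGGUUGUAUAGUU"}
lncrna = "GG" + "AUAAGCU" + "CC" + "AUAAGCU" + "AA"
utr = "CCC" + "AUAAGCU" + "GGG"
print(count_sites(lncrna, mirnas["miR-A"]), shared_mirnas(lncrna, utr, mirnas))
```

Site counts alone say nothing about competition strength. As the paragraph above notes, derepression also depends on how abundant the competing sites are relative to the endogenous MRE pool, so expression levels would have to be added before drawing any conclusion.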

Epigenetic Regulation

Long non-coding RNAs (lncRNAs) play crucial roles in epigenetic regulation by modulating chromatin states and facilitating heritable gene silencing through interactions with histone-modifying complexes. These RNAs often act in cis or trans to recruit enzymatic machinery that deposits repressive marks, such as histone H3 lysine 9 (H3K9) and lysine 27 (H3K27) trimethylation, thereby establishing and maintaining heterochromatin domains essential for developmental processes like genomic imprinting and dosage compensation. By serving as scaffolds or guides, lncRNAs enable precise targeting of these complexes to specific genomic loci, influencing gene expression without altering the underlying DNA sequence.

A prominent example of lncRNA-mediated imprinting control is the Airn lncRNA, which silences the imprinted Igf2r gene in cis during mouse embryonic development. Airn recruits the histone methyltransferase G9a to the Igf2r promoter, leading to H3K9 dimethylation and subsequent transcriptional repression in extraembryonic tissues. This mechanism ensures parent-of-origin-specific gene expression, with Airn transcription initiating as early as the stage and persisting to maintain silencing. Disruption of Airn or G9a abolishes this repression, highlighting the lncRNA's direct role in epigenetic imprinting.

In X-chromosome inactivation (XCI), the Xist lncRNA coats the future inactive X chromosome (Xi) in female mammals, recruiting Polycomb repressive complex 2 (PRC2) to catalyze H3K27me3 deposition across large chromatin domains. This coating spreads bidirectionally from the Xist locus, initiating chromosome-wide silencing and forming the Barr body. The antisense Tsix lncRNA antagonizes Xist by competing for shared regulatory elements and preventing Xist upregulation on the active X chromosome, thus ensuring monoallelic Xist expression and proper XCI choice. PRC2 recruitment by Xist involves specific RNA domains, such as the A-repeat, which directly bind EZH2, the catalytic subunit of PRC2, to propagate repressive marks.

Beyond imprinting and XCI, lncRNAs like HOTAIR mediate epigenetic modifications at HOX loci by bridging PRC2 and lysine-specific demethylase 1 (LSD1). HOTAIR tethers PRC2 to induce H3K27me3 and LSD1 to remove the active H3K4me mark, coordinately repressing target genes in trans across chromosomes. This dual action establishes bivalent chromatin states that poise developmental genes for activation or silencing. Similarly, telomeric repeat-containing RNA (TERRA) promotes heterochromatin formation at chromosome ends by recruiting PRC1 and PRC2, facilitating H3K9me3, H3K27me3, and H4K20me3 marks that protect telomeres from DNA damage and recombination. TERRA's localization to telomeric repeats stabilizes these repressive domains, preventing telomere dysfunction.

Recent studies have revealed advanced mechanisms, including lncRNAs' involvement in phase-separated condensates that sustain epigenetic memory. For instance, certain lncRNAs nucleate liquid-liquid phase separation of PRC2 and other factors, concentrating them at target loci to reinforce propagation during . Additionally, epitranscriptomic modifications like N6-methyladenosine (m6A) on lncRNAs influence PRC2 binding; m6A-marked chromatin-associated lncRNAs recruit PRC2 via readers like RBFOX2, enhancing focal deposition and in hematopoiesis. In , m6A on HOTAIR modulates its stability and interaction with epigenetic complexes, underscoring the interplay between RNA modifications and epigenetic regulation.

Genome Maintenance

Long non-coding RNAs (lncRNAs) play critical roles in regulating DNA replication timing by influencing chromatin states at replication origins. For instance, ASAR lncRNAs, such as ASAR6 and ASAR15, control chromosome-wide replication timing in cis by promoting the spreading of , which delays replication initiation in specific genomic regions. These lncRNAs interact with chromatin-modifying complexes during the , ensuring coordinated replication with phases like progression and preventing untimely origin firing that could lead to genomic instability.

In maintaining chromosome stability, lncRNAs contribute to telomere protection and centromere function. The telomeric repeat-containing RNA (TERRA), a lncRNA transcribed from subtelomeric regions, regulates telomere length by inhibiting telomerase activity through direct binding to the enzyme's RNA component, thus preventing excessive elongation. Additionally, TERRA promotes homologous recombination at telomeres by forming R-loops that facilitate homology-directed repair pathways, supporting alternative lengthening of telomeres (ALT) in telomerase-deficient cells. At centromeres, lncRNAs like CCTT mediate kinetochore assembly by facilitating RNA-DNA and RNA-protein interactions that recruit the centromere protein CENP-C to centromeric DNA, ensuring proper spindle attachment during mitosis.

LncRNAs are integral to the DNA damage response (DDR) by recruiting repair factors and modulating damage signaling. The lncRNA NORAD maintains genomic stability by sequestering PUMILIO proteins, which otherwise destabilize mRNAs involved in control and ; NORAD depletion leads to increased DNA damage and . In R-loop resolution, recent studies highlight lncRNAs' roles in preventing persistent RNA-DNA hybrids that trigger DNA breaks; for example, TERRA-associated R-loops at telomeres are resolved to support recombination-based repair, with dysregulation linked to ALT activation in cancer cells. Alu-derived lncRNAs, such as those in lincRNA-p21, aid in DDR by forming double-stranded RNA structures via inverted Alu repeats, which activate protein kinase R (PKR) to enhance p53-mediated repair pathways following double-strand breaks.

Quantitative impacts of lncRNAs on replication dynamics underscore their protective functions. Depletion of the chromatin-associated lncREST results in sustained replication fork progression, increasing fork speed by approximately 20-30% under stress conditions and leading to unrepaired DNA damage due to bypassed stalling signals. Similarly, knockdown of the ATR/Chk1-interacting lncRNA (ACIL) suppresses replication fork speed by up to 40% during hydroxyurea-induced stress, highlighting lncRNAs' role in stabilizing fork velocity for error-free duplication.

Evolutionary Aspects

Sequence Conservation

Long non-coding RNAs (lncRNAs) generally exhibit low sequence conservation at the nucleotide level compared to protein-coding mRNAs, with most showing less than 10% sequence identity beyond mammalian lineages due to rapid evolutionary turnover and relaxed selective constraints. In comparisons between human and mouse, exonic sequences of lncRNAs typically display low identity, often below 30%, such as 22% in cases like the Rmst lncRNA, reflecting weaker purifying selection on primary sequences. This limited conservation is evident in large-scale genomic alignments, where only a small fraction of lncRNAs, estimated at around 5%, maintain sequence similarity across broader clades.

Promoters of lncRNAs demonstrate higher conservation than their exonic regions, with PhastCons conservation scores around 0.2-0.3 in human-mouse comparisons, comparable to those of protein-coding genes, suggesting stronger selective pressure on transcriptional initiation sites. Functional domains within lncRNAs, such as RNA-binding motifs or repeat elements, often show elevated conservation; for instance, the GAS5 lncRNA retains approximately 70% nucleotide homology in its exons between human and mouse. Additionally, syntenic preservation occurs at lncRNA loci across mammals, with approximately 25% of human lncRNAs having orthologs in mice based on genomic synteny and expression patterns, indicating that genomic context contributes to evolutionary stability even when sequences diverge.

Comparative genomics tools reveal that lncRNAs have lower PhastCons scores than mRNAs, with mean placental scores for lncRNA exons typically below those of coding exons, underscoring reduced nucleotide-level preservation. Alignments via the facilitate these analyses by mapping multi-species sequences and identifying conserved patches. A 2023 pan-eukaryote study highlighted evolutionary bursts of lncRNA conservation particularly in vertebrates, where multidimensional analyses (including PhastCons thresholds >0.58) identified subsets with enhanced sequence retention linked to regulatory roles. Notable exceptions include ultra-conserved elements (UCEs) within lncRNAs, such as transcribed ultra-conserved RNAs (T-ucRNAs or ucRNAs), which exhibit 100% identity over >200 bp across the human, mouse, and rat genomes, comprising 481 such regions with high functional potential. As of 2024, computational tools have identified additional conserved lncRNA functions across vertebrates using multi-omics data.
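Identity figures such as the 22% and 70% quoted above come from pairwise alignments. The sketch below computes percent identity from two already-aligned sequences. It ignores columns containing a gap, which is only one of several conventions in use, so the convention, the function name, and the toy alignment are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of them identical.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Counting gapped columns as mismatches instead would lower the value, so identity percentages from different studies are comparable only when the same convention and alignment method were used.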

Functional Conservation

Functional conservation of long non-coding RNAs (lncRNAs) refers to the preservation of their biological roles across species despite limited sequence similarity, often through maintained interactions with conserved cellular machinery or regulatory networks. A key example is X-chromosome dosage compensation: the lncRNA Xist orchestrates X-chromosome inactivation in placental mammals, while a functional analog, Rsx (RNA-on-the-silent X), performs a similar cis-acting role in marsupials by silencing the paternal X chromosome during imprinted X-chromosome inactivation. This illustrates how lncRNA-mediated mechanisms can evolve to achieve equivalent dosage balance in diverse mammalian lineages. Similarly, approximately 25% of annotated human lncRNAs have functional orthologs in mice, as evidenced by shared expression patterns, genomic synteny, and phenotypic effects upon perturbation, which points to selective pressure on lncRNA loci to retain regulatory functions despite sequence divergence.

Adaptive evolution of lncRNAs is particularly evident in the rapid turnover of brain-specific lncRNAs, which has contributed to lineage-specific neurodevelopmental traits. For instance, many lncRNAs active during early development combine structural conservation with high sequence variability, suggesting that their roles in modulating neuronal differentiation are preserved through secondary structures or binding motifs rather than primary sequences. This turnover is exemplified by lineage-specific de novo genes of lncRNA origin that encode proteins influencing brain-specific functions, indicating accelerated evolution in cognition-related pathways.

Furthermore, lncRNA interactions with conserved RNA-binding proteins (RBPs), such as TLS/FUS, demonstrate functional preservation. The Drosophila lncRNA hsrω, which regulates FUS dimethylation, rescues ALS-like phenotypes in human cellular models, illustrating cross-species conservation of RBP-lncRNA complexes in neurodegeneration.

Cross-species evidence further supports lncRNA functional conservation. The lncRNA NEAT1, essential for paraspeckle assembly in mammals, is conserved in marsupials, where it also supports paraspeckle formation, indicating a preserved architectural role across mammals. In evolutionary developmental (evo-devo) contexts, recent studies describe plant lncRNAs that mirror animal immunity by fine-tuning stress responses; for example, conserved lncRNA motifs in plants regulate pathogen defense and stress tolerance via epigenetic silencing, paralleling immune modulation in animals. These findings highlight the adaptability of lncRNAs in stress adaptation across kingdoms.

Measures of functional conservation often reveal more than sequence-level analyses, as demonstrated by phenotypic rescue experiments and network-level preservation. In cross-species assays, human lncRNA orthologs rescue knockout phenotypes in zebrafish models, for example by correcting cellular proliferation defects, confirming functional equivalence despite sequence divergence. Network-level conservation is notably higher: competing endogenous RNA (ceRNA) hubs, in which lncRNAs act as miRNA sponges, are preserved across tumors and species, and hub lncRNAs dominate regulatory interactions in cancer hallmarks, indicating that interaction networks are retained more robustly than individual sequences. These approaches suggest that lncRNA functions are under stronger selective constraint at the systems level, facilitating orthologous roles in development and disease.
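The miRNA-sponge idea behind ceRNA hubs can be made concrete with a small Python sketch that scans a transcript for canonical seed-match sites, that is, sites complementary to nucleotides 2-8 of a miRNA. This is a simplified illustration only: the sequences are invented toy examples rather than real miRNAs or lncRNAs, the function names are hypothetical, and real target prediction also weighs site context, conservation, and expression levels.

```python
from typing import List

# RNA complement table (A<->U, C<->G).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Uppercase the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def seed_match_site(mirna: str) -> str:
    """Return the transcript motif that pairs with miRNA nucleotides 2-8."""
    seed = to_rna(mirna)[1:8]  # positions 2-8, written 5'->3'
    return seed.translate(RNA_COMPLEMENT)[::-1]  # reverse complement


def seed_sites(transcript: str, mirna: str) -> List[int]:
    """0-based start positions of every seed-match site in the transcript."""
    site = seed_match_site(mirna)
    rna = to_rna(transcript)
    positions = []
    start = rna.find(site)
    while start != -1:
        positions.append(start)
        start = rna.find(site, start + 1)  # allow overlapping sites
    return positions


# Toy sequences (hypothetical, not real genes).
toy_mirna = "UACGGAUCCAUUGCAAGCUAGU"
toy_lncrna = "AAGAUCCGUCCCGAUCCGUAA"
sites = seed_sites(toy_lncrna, toy_mirna)
```

A transcript carrying many such sites for the same miRNA is a candidate sponge; comparing site counts between orthologous transcripts is one simple way to ask whether a sponge relationship is preserved even when the surrounding sequence has diverged.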

Biological and Pathological Roles

Roles in Development and Physiology

Long non-coding RNAs (lncRNAs) play pivotal roles in developmental processes such as X-chromosome dosage compensation and genomic imprinting. The lncRNA Xist is crucial for X-chromosome inactivation in female mammals, where it coats the inactive X chromosome to silence gene expression and equalize dosage between the sexes. Similarly, Kcnq1ot1, a paternally expressed lncRNA within the Kcnq1 imprinting domain on chromosome 11p15.5, mediates epigenetic silencing of neighboring genes, ensuring proper fetal growth and development through monoallelic expression. In limb development, lncRNAs contribute to patterning by regulating key signaling pathways; for instance, the lncRNA Maenli acts as a limb-specific regulator of En1, which in turn influences Fgf8 expression to guide dorsal-ventral limb patterning.

In physiological contexts, lncRNAs modulate immune and stress responses to maintain organismal homeostasis. The lncRNA NEAT1 organizes nuclear paraspeckles that sequester antiviral factors, enhancing host defense against viral infections such as hantavirus by promoting the expression of antiviral genes. Recent studies have also implicated NEAT1 in natural killer (NK) cell function, where it can inhibit cytotoxicity in pathological contexts by acting as a competing endogenous RNA (ceRNA) for miR-125 to upregulate MCEMP1, as highlighted in 2025 reviews on lncRNA roles in NK cells. In plants, lncRNAs aid stress adaptation and growth under abiotic challenges; for example, various lncRNAs regulate signaling and uptake pathways to enhance tolerance to abiotic stresses, as summarized in 2023-2024 reviews.

LncRNAs further contribute to tissue homeostasis through metabolic and rhythmic regulation. The imprinted lncRNA H19 promotes insulin sensitivity by activating the AMPK pathway, thereby fine-tuning systemic glucose homeostasis. In circadian physiology, lncRNAs influence chromatin dynamics to sustain rhythmicity, and findings from 2024 describe oscillations in chromatin accessibility at core clock gene loci.

These functions are often integrated within multi-lncRNA networks. In cardiac development, one lncRNA coordinates a network of cardiovascular genes by associating with regulatory complexes, promoting the transition from multipotent to committed cardiac progenitors. Such networks underscore the broader physiological orchestration by lncRNAs, linking regulatory mechanisms to developmental and homeostatic outcomes.

Associations with Diseases

Long non-coding RNAs (lncRNAs) have been implicated in the pathogenesis of various human diseases: their dysregulation disrupts normal regulatory functions and contributes to disease progression.

In cancer, lncRNAs can act as oncogenes or tumor suppressors, influencing tumor growth, metastasis, and therapy resistance. For instance, the lncRNA HOTAIR is upregulated in primary breast tumors and promotes metastasis by reprogramming chromatin state through interactions with the Polycomb Repressive Complex 2 (PRC2), leading to epigenetic silencing of tumor suppressor genes. HOTAIR expression correlates with poorer metastasis-free and overall survival in patients. HOTAIR also drives progression in other cancers by modulating multiple signaling pathways, including those involved in invasion. Recent reviews highlight the role of lncRNAs such as HOTAIR in shaping the tumor microenvironment, where they facilitate immune evasion and stromal interactions that support tumor progression. In contrast, the lncRNA GAS5 functions as a tumor suppressor by inhibiting proliferation, migration, and invasion while promoting apoptosis across multiple cancers, including gastric carcinoma; its downregulation is associated with advanced disease stages and reduced patient survival.

In cardiovascular diseases, lncRNAs contribute to vascular dysfunction and cardiac remodeling. The lncRNA ANRIL, located at the 9p21 locus, promotes atherosclerosis by acting as a scaffold for the PRC1 and PRC2 complexes, repressing target genes in a way that leads to plaque formation and instability. Genetic variants in the ANRIL locus are strongly associated with coronary artery disease risk. In cardiac disease, the lncRNA MIAT is upregulated and regulates myocardial remodeling, making it a promising therapeutic target; studies from 2025 emphasize MIAT's role in pathological cardiac remodeling after myocardial infarction and its potential for siRNA-based interventions to improve cell viability.

Neurological and immune-related diseases also feature lncRNA dysregulation with causal implications. In Alzheimer's disease, the lncRNA BACE1-AS stabilizes BACE1 mRNA by forming RNA duplexes that prevent its degradation, thereby increasing β-secretase activity and amyloid-β production, which exacerbates neuronal damage. Elevated BACE1-AS levels have been observed in Alzheimer's patients and in amyloid precursor protein transgenic models. In immune disorders, lncRNAs modulate natural killer (NK) cell function and contribute to diseases such as cancers and autoimmune conditions; a 2025 review details how lncRNAs such as MALAT1 and RMRP regulate NK cell function and immune escape in tumor microenvironments, with dysregulation linked to impaired NK cell activity in NK cell-associated pathologies.

LncRNAs serve as valuable biomarkers because of their stability and disease-specific expression patterns. Circulating lncRNAs, detectable in body fluids, offer non-invasive diagnostic potential; for example, PCA3 is highly specific to prostate cancer and is used in an FDA-approved urine assay that aids prostate cancer diagnosis, and its overexpression correlates with disease progression. Genome-wide association studies (GWAS) have identified disease-associated variants in lncRNA loci. Recent analyses indicate that a substantial proportion of such variants, estimated at up to around 30%, fall within lncRNA regions as annotations have expanded, influencing susceptibility to cancers, cardiovascular diseases, and immune disorders through altered lncRNA expression or structure.
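The statement that a proportion of GWAS variants fall within lncRNA regions comes down to an interval-overlap count. The following Python sketch shows that calculation under simple assumptions: lncRNA loci are given as half-open `[start, end)` intervals per chromosome and variants as `(chromosome, position)` pairs. All coordinates are invented toy values, the function names are hypothetical, and a real analysis would use an annotation release and an indexed interval structure rather than a linear scan.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]   # half-open [start, end)
Variant = Tuple[str, int]    # (chromosome, position)


def in_any_interval(position: int, intervals: List[Interval]) -> bool:
    """True if the position lies inside at least one half-open interval."""
    return any(start <= position < end for start, end in intervals)


def fraction_in_lncrna(variants: List[Variant],
                       lncrna_loci: Dict[str, List[Interval]]) -> float:
    """Fraction of variants that overlap an annotated lncRNA locus."""
    if not variants:
        return 0.0
    hits = sum(
        1 for chrom, pos in variants
        if in_any_interval(pos, lncrna_loci.get(chrom, []))
    )
    return hits / len(variants)


# Toy annotation and toy variant list (hypothetical coordinates).
toy_loci = {"chr1": [(100, 200), (500, 600)], "chr2": [(50, 80)]}
toy_variants = [("chr1", 150), ("chr1", 300), ("chr2", 79),
                ("chr2", 80), ("chr3", 10)]
fraction = fraction_in_lncrna(toy_variants, toy_loci)
```

Because the result depends directly on which loci are annotated, the same variant set can yield a higher fraction as lncRNA annotations expand, which is the effect described above.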

Therapeutic Potential

Targeting Approaches

Antisense oligonucleotides (ASOs) are a primary strategy for targeting long non-coding RNAs (lncRNAs): they induce degradation of the target by recruiting RNase H. Gapmer ASOs, which feature a central DNA segment flanked by modified nucleotides, hybridize to target lncRNA sequences and trigger cleavage by endogenous RNase H, effectively reducing lncRNA levels. For instance, LNA-gapmer ASOs targeting the oncogenic lncRNA MALAT1 inhibited expression of proteasome subunit genes, producing anti-proliferative effects in multiple myeloma models, and have advanced to preclinical evaluation for broader cancer applications. Locked nucleic acids (LNAs) incorporated into ASOs enhance binding affinity and nuclease resistance, improving stability and potency in vivo; LNA-modified gapmers downregulate nuclear lncRNAs more effectively than unmodified counterparts, with applications in silencing disease-associated lncRNAs such as Xist.

RNA interference (RNAi) approaches use small interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs) to achieve sequence-specific degradation of lncRNA transcripts via the RNA-induced silencing complex (RISC). These tools have been widely applied to knock down lncRNAs in cellular studies, revealing functional roles in gene regulation; for example, siRNAs targeting nuclear lncRNAs such as NEAT1 effectively suppress paraspeckle formation and associated pathological processes. Recent advances in CRISPR-Cas13 systems further refine RNAi-like targeting by enabling precise RNA cleavage with high specificity while avoiding off-target effects on DNA; 2024 developments in Cas13-based screens have identified essential lncRNAs through transcriptome-wide perturbations, demonstrating strand- and transcript-specific silencing in cancer cells.

Small molecules offer an alternative for modulating lncRNA function: they bind specific structural motifs, such as triple helices or G-quadruplexes, and thereby disrupt protein interactions or transcript stability. Compounds targeting the triple helix in lncRNAs such as MALAT1 have been shown to reduce RNA levels and inhibit downstream oncogenic signaling, providing a scaffold for structure-based drug design. For NEAT1, small molecules that bind G-quadruplex motifs interfere with NONO protein recruitment, destabilizing paraspeckle assembly and attenuating immune responses in disease models. A 2025 review discussed the potential of small-molecule inhibitors targeting lncRNAs such as CHRF to block fibrotic pathways by preventing lncRNA-mediated epigenetic changes in cardiac tissue.

Gene therapy strategies employ viral vectors, such as adeno-associated virus (AAV), to deliver suppressors that overexpress RNAi components or decoy elements targeting lncRNAs. These approaches can be combined with miRNA mimics to enhance the therapeutic effect.
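The gapmer layout described above (a central DNA gap flanked by modified wings, complementary to the target RNA) can be sketched in a few lines of Python. This is an illustrative toy, not a design tool: the target sequence is invented, the default wing length of 3 and the `L`/`d` layout notation are assumptions made for this example, and practical ASO design also has to consider target accessibility, off-target matches, and chemistry.

```python
from typing import Tuple

# Map each RNA base to its complementary DNA base.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def design_gapmer(target_rna: str, wing_len: int = 3) -> Tuple[str, str]:
    """Return (ASO sequence 5'->3', layout string) for a target RNA region.

    In the layout string, 'L' marks a modified wing position (e.g. LNA)
    and 'd' marks a DNA position in the RNase H-recruiting gap.
    """
    target = target_rna.upper().replace("T", "U")
    if set(target) - set("ACGU"):
        raise ValueError("target must contain only A, C, G, U (or T)")
    if len(target) <= 2 * wing_len:
        raise ValueError("target too short for the requested wing length")
    # The ASO is the reverse complement of the target, written as DNA.
    aso = target.translate(RNA_TO_DNA_COMPLEMENT)[::-1]
    gap_len = len(aso) - 2 * wing_len
    layout = "L" * wing_len + "d" * gap_len + "L" * wing_len
    return aso, layout


# Toy 16-nt target region (hypothetical sequence).
aso_seq, aso_layout = design_gapmer("AUGGCUAGCUAGGCUA")
```

Reading the two returned strings side by side shows which ASO positions would carry the affinity-enhancing modification and which would remain DNA so that the ASO-RNA duplex is a substrate for RNase H.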

Challenges and Recent Advances

One major challenge in developing therapeutics that target long non-coding RNAs (lncRNAs) is efficient delivery. Many lncRNAs are localized to the nucleus, which complicates access for cytoplasmic-acting agents such as small interfering RNAs (siRNAs). Antisense oligonucleotides (ASOs) have shown greater efficacy in silencing nuclear lncRNAs than RNAi approaches, which perform better on cytoplasmic transcripts, highlighting the need for localization-specific strategies. Off-target effects further hinder progress, as non-specific binding can lead to unintended effects and toxicity. To address these problems, lipid nanoparticles (LNPs) have emerged as a promising delivery platform: formulations incorporating ionizable lipids enable liver-specific uptake via apolipoprotein E-mediated endocytosis, as demonstrated for mRNA therapeutics, and are adaptable to lncRNA modulators.

Specificity remains a critical barrier. LncRNAs often adopt complex secondary structures that impede the binding of therapeutic agents, and isoform variability across tissues can result in incomplete or heterogeneous targeting. In addition, synthetic RNA analogs risk activating innate immune responses through Toll-like receptors, leading to cytokine release, which necessitates chemical modifications such as 2'-O-methylation to enhance stability and reduce immunogenicity. These issues underscore the importance of high-fidelity design tools that minimize off-target interactions and account for isoform-specific effects.

Development remains largely preclinical. A 2024 study showed that an LNP-delivered agent targeting HOTAIR reduced tumorigenic properties in solid tumor models, suggesting potential for future evaluation in advanced cancers. In cardiovascular applications, 2025 preclinical results include silencing of the lncRNA MIR181A1HG, which significantly attenuated atherosclerotic lesion burden in aortic models by modulating vascular inflammation, offering promise for plaque stabilization therapies.

Emerging technologies are accelerating lncRNA therapeutic development. AI-driven models such as graph neural networks predict lncRNA-disease associations with high accuracy; for example, the 2025 LDA-GMCB framework integrates multi-head self-attention for scalable inference of novel links. Base editing strategies, particularly RNA-specific deaminases fused to guide RNAs, enable precise correction of lncRNA variants that influence stability and function, supported by resources such as the LNCediting database, which predicts the impact of editing. Insights from plant lncRNAs are also informing agricultural biotechnology: 2025 reviews highlight their roles in stress tolerance and development, inspiring engineered crops with enhanced resilience through lncRNA modulation.
