RNA splicing
from Grokipedia
RNA splicing is a fundamental process in eukaryotic cells in which non-coding introns are precisely excised from precursor messenger RNA (pre-mRNA) transcripts and the remaining coding exons are ligated together to produce mature mRNA ready for translation into proteins. This mechanism was first discovered in 1977 through studies on adenovirus transcripts, which revealed that genes are organized into discontinuous segments and challenged the prevailing view of continuous coding sequences. The process is catalyzed by the spliceosome, a large ribonucleoprotein complex composed of five small nuclear RNAs (snRNAs: U1, U2, U4, U5, and U6) and over 100 associated proteins, which assembles stepwise on the pre-mRNA to recognize splice sites and execute two sequential transesterification reactions: the first cleaves the 5' splice site and links the intron's 5' end to the branch point, forming a lariat intermediate, and the second ligates the exons while releasing the intron. Splicing is highly accurate and regulated, ensuring that only functional mRNAs are produced, and its dysregulation is implicated in numerous diseases, including cancers and neurodegenerative disorders.

A key feature of RNA splicing is alternative splicing, which allows a single gene to generate multiple mRNA isoforms by selectively including or excluding exons, thereby vastly expanding proteome diversity from a limited genome; over 90% of human multi-exon genes are estimated to undergo alternative splicing. This versatility is crucial for tissue-specific functions and responses to environmental cues, as seen in the nervous system, where alternative splicing modulates neuronal excitability. Regulation occurs through cis-acting elements (such as exonic and intronic splicing enhancers and silencers) and trans-acting factors (splicing regulators like SR proteins and hnRNPs) that fine-tune splice site selection, with recent structural insights from cryo-electron microscopy revealing dynamic conformational changes in the spliceosome that underpin these regulatory mechanisms.
Defects in splicing machinery, such as mutations in spliceosomal components, contribute to pathologies such as myelodysplastic syndromes, highlighting splicing as a therapeutic target. The discovery of RNA splicing by Phillip Sharp and Richard Roberts in 1977 earned them the 1993 Nobel Prize in Physiology or Medicine, underscoring its transformative impact on molecular biology. Since then, advances in high-throughput sequencing have illuminated the complexity of splicing across species, from yeast to humans, and its evolutionary conservation. Ongoing research continues to uncover novel splicing variants and modulators, promising new avenues for understanding gene regulation and developing splicing-targeted interventions.

Overview

Definition and process

RNA splicing is a critical process in which non-coding sequences known as introns are removed from a precursor messenger RNA (pre-mRNA) transcript, and the remaining coding sequences, called exons, are precisely joined together to produce a mature mRNA ready for translation into protein. This process ensures that only the functional coding information is retained, allowing for the accurate expression of genes in eukaryotic cells.

The basic mechanism of RNA splicing involves two sequential transesterification reactions. First, specific splice sites are recognized: the 5' splice site typically begins with a GU dinucleotide, the branch point features an adenosine residue located 20–50 nucleotides upstream of the 3' splice site, and the 3' splice site ends with an AG dinucleotide. In the initial step, the 2'-OH group of the branch point adenosine performs a nucleophilic attack on the phosphodiester bond at the 5' splice site, cleaving off the 5' exon and forming a lariat intermediate in which the intron is looped via a 2'-5' phosphodiester bond to the branch point. In the second step, the 3'-OH of the freed 5' exon attacks the phosphodiester bond at the 3' splice site, ligating the two exons and releasing the intron lariat.

RNA splicing is observed across eukaryotes, archaea, and certain bacteria, though its prevalence and machinery vary. In higher eukaryotes like humans, introns are present in over 97% of protein-coding genes, often comprising the majority of transcript length and making splicing indispensable for proper gene expression. Unlike other RNA processing events such as 5' capping or 3' polyadenylation, which primarily stabilize the mRNA, splicing directly alters the coding sequence by selecting and joining exons, thereby determining the final protein sequence.
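The net effect of the two steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: intron coordinates are supplied by the caller rather than discovered, the only signals checked are the GU and AG terminal dinucleotides, and the function name `splice` and the toy sequences are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns (half-open [start, end) intervals) and join the exons.

    Assumption: every intron must begin with GU and end with AG, mirroring
    the typical 5' and 3' splice site dinucleotides described in the text.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG termini")
        exons.append(pre_mrna[cursor:start])  # exon upstream of this intron
        cursor = end                          # resume after the 3' splice site
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


# Toy transcript: exon 1, one intron with a branch point-like CUAAC, exon 2.
exon1 = "AUGGCC"
intron = "GUAAGUCUAACUUUUUUCAG"
exon2 = "GGAUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
```

Here `mature` is simply `exon1 + exon2`; the lariat chemistry itself is not modeled, only the resulting exon junction.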

Historical discovery

The discovery of RNA splicing began in 1977 when researchers independently identified discontinuous gene structures in adenovirus, revealing that eukaryotic genes are composed of interrupted coding sequences separated by non-coding regions, later termed introns. Phillip A. Sharp's team at the Massachusetts Institute of Technology used electron microscopy to visualize hybrid molecules of adenovirus mRNA annealed to viral DNA, showing that mRNA sequences were spliced from separate genomic segments. Concurrently, Richard J. Roberts's group at Cold Spring Harbor Laboratory employed similar techniques to map cytoplasmic poly(A)+ RNA transcripts from adenovirus type 2, demonstrating collinear but non-contiguous alignment with the genome, thus establishing the concept of split genes. This breakthrough challenged the prevailing view of continuous genes and laid the foundation for understanding pre-mRNA processing; Sharp and Roberts were awarded the 1993 Nobel Prize in Physiology or Medicine for their contributions.

In the early 1980s, investigations into the machinery of splicing advanced significantly. Michael R. Lerner and Joan A. Steitz at Yale University proposed that small nuclear ribonucleoproteins (snRNPs), recently identified as abundant nuclear particles containing small nuclear RNAs (snRNAs), play a central role in pre-mRNA splicing, based on their apparent ability to bind specific pre-mRNA sequences and on assays with autoimmune sera. This work laid the groundwork for the concept of the spliceosome as a dynamic ribonucleoprotein complex mediating splicing in higher eukaryotes. Independently, Thomas R. Cech's laboratory at the University of Colorado Boulder discovered self-splicing in the ribosomal RNA precursor of Tetrahymena thermophila, where the intron itself catalyzed its excision without protein assistance, providing the first evidence of RNA's enzymatic activity and expanding the known catalytic repertoire of biological molecules. Cech and Sidney Altman, who identified catalytic RNA in RNase P, shared the 1989 Nobel Prize in Chemistry for these discoveries.

During the 1990s and 2000s, detailed mapping of spliceosome components emerged, particularly in humans, through proteomic and genetic approaches.
Comprehensive analyses identified over 300 proteins associated with the human spliceosome, including many novel factors involved in assembly and catalysis, using proteomic and genetic methods in yeast and human models. The Human Genome Project further highlighted splicing's prevalence, estimating that alternative splicing affects approximately 60% of human genes based on expressed sequence tag (EST) alignments to the draft genome, underscoring its role in proteomic diversity.

In the 2010s, structural biology revolutionized splicing research, with cryo-electron microscopy (cryo-EM) providing near-atomic insights into spliceosome dynamics. Kiyoshi Nagai's group at the MRC Laboratory of Molecular Biology resolved the structure of the yeast spliceosome immediately after the branching step at 3.8 Å resolution, revealing key conformational changes and interactions in the catalytic core.

Entering the 2020s, long-read sequencing technologies, such as PacBio and Oxford Nanopore, unveiled unprecedented splicing complexity in human transcriptomes, identifying thousands of novel isoforms and tissue-specific events that short-read methods overlooked, thus refining estimates of splicing diversity across cell types. In 2024, researchers published the first comprehensive blueprint of the human spliceosome, identifying its core composition of approximately 150 proteins with specialized regulatory functions, further advancing insights into splicing mechanisms and potential therapeutic targets.

Types of Splicing Pathways

Spliceosomal splicing

Spliceosomal splicing is the predominant mechanism for intron removal from nuclear pre-mRNA in eukaryotic cells, carried out by the spliceosome, a large ribonucleoprotein complex that assembles de novo on each intron. This process ensures the production of mature mRNA by excising non-coding introns and ligating coding exons, with the major spliceosome handling the vast majority of introns in most eukaryotes. The spliceosome comprises small nuclear ribonucleoproteins (snRNPs) and numerous associated proteins, enabling precise recognition and catalysis.

The major spliceosome includes four key snRNPs: U1, U2, U4/U6 (a di-snRNP), and U5, each containing a uridine-rich small nuclear RNA (snRNA) bound to specific proteins. These components recognize conserved splice site sequences at exon-intron boundaries and facilitate the splicing reaction. In contrast, the minor spliceosome processes a small subset of atypical U12-dependent introns using analogous but distinct snRNPs: U11, U12, U4atac/U6atac, and U5.

Assembly of the spliceosome proceeds through a series of dynamic, stepwise complexes on the pre-mRNA substrate. It begins with the commitment complex (E complex), where U1 snRNP binds the 5' splice site and U2 auxiliary factors associate with the polypyrimidine tract and 3' splice site, followed by U2 snRNP binding to form the pre-spliceosome (A complex). The tri-snRNP (U4/U6·U5) then joins to create the pre-catalytic B complex, which rearranges to the activated B complex and ultimately the C complex for intron excision. This ordered recruitment ensures fidelity, with rearrangements driven by ATP-dependent helicases and protein factors.

Two primary models describe how splice sites are recognized during assembly: intron definition and exon definition. In the intron definition model, prevalent in organisms with short introns like yeast, the spliceosome initially pairs the 5' and 3' splice sites across the intron.
Conversely, the exon definition model, common in vertebrates with longer introns, involves initial recognition across the exon, where U1 and U2 snRNPs bind the opposing splice sites flanking the exon, facilitating cross-exon interactions before intron removal. These models reflect adaptations to genomic architecture, with consensus sequences at the splice sites guiding initial binding.

Most spliceosomal introns are U2-dependent and recognized by the major spliceosome, while U12-dependent introns, comprising about 0.35% of human introns, require the minor spliceosome and often feature AU-AC termini instead of the typical GU-AG. In the human genome, introns average around 3 kb in length, vastly exceeding the typical exon size of about 145 nucleotides, which contributes to the difficulty of accurate splicing.

Trans-splicing is a specialized variant of spliceosomal splicing in certain eukaryotes, in which a short leader sequence from one RNA molecule is joined to the 5' end of an independent pre-mRNA exon, rather than exons from the same transcript being ligated. This process, mediated by snRNPs similar to those of cis-splicing, occurs prominently in trypanosomes, where it adds a spliced leader to all mRNAs to resolve polycistronic transcripts, and in Caenorhabditis elegans, where about 70% of genes receive either an SL1 or an SL2 leader. Though rare in vertebrates, it highlights the spliceosome's versatility.

For exceptionally long introns, recursive splicing provides a mechanism to subdivide removal into multiple steps, using internal "ratchet" sites that mimic 3' splice sites. In Drosophila melanogaster, where introns can exceed 50 kb, this stepwise process enhances splicing accuracy by iteratively excising portions, as seen in the 74-kb Ultrabithorax intron. Recursive sites are enriched and conserved in long introns, preventing aberrant splicing and maintaining efficiency.

Self-splicing

Self-splicing refers to a form of splicing in which the intron excises itself from the precursor RNA through ribozyme activity, independent of protein enzymes. This process was first demonstrated in 1982 with the ribosomal RNA precursor from the ciliate Tetrahymena thermophila, where the 413-nucleotide intervening sequence (IVS) was shown to autocatalytically excise and circularize under near-physiological conditions, without requiring additional factors beyond a guanosine cofactor.

Group I introns are the most extensively studied class of self-splicing introns, characterized by a conserved secondary structure featuring paired helices and an internal guide sequence that aligns the 5' splice site with the 3' hydroxyl of a guanosine cofactor. These introns are predominantly found in organellar genomes (mitochondria and chloroplasts), rRNA genes of protists and fungi, and bacterial genomes, with prokaryotic origins suggesting horizontal transfer to eukaryotic organelles. The splicing mechanism proceeds via two transesterification reactions: first, an exogenous guanosine (or GTP/GMP) attacks the 5' splice site, cleaving the upstream exon and attaching to the intron's 5' end; second, the newly freed 3' hydroxyl of the upstream exon attacks the 3' splice site, forming the ligated exons and releasing the linear intron, which can then circularize through a further transesterification. This guanosine-dependent pathway requires divalent metal ions like Mg²⁺ for catalysis and is highly efficient in vitro.

Group II introns, another major class of self-splicing elements, are structurally more complex, with six helical domains, and exhibit a branching mechanism analogous to that of spliceosomal introns, forming a lariat intermediate. These introns are common in mitochondrial and chloroplast genomes of fungi, plants, and algae, as well as in bacterial genomes, where they often encode a multifunctional reverse transcriptase-like protein that promotes their mobility as retroelements.
Splicing initiates with the 2' hydroxyl of a bulged adenosine (the branch point) within domain VI attacking the 5' splice site, generating a lariat and freeing the upstream exon; the second transesterification then joins the exons and releases the lariat intron, again facilitated by Mg²⁺ ions in the active site. Unlike Group I introns, Group II introns can splice in the absence of exogenous nucleotide cofactors, though some rely on maturase proteins encoded within the intron for stability in vivo.

Self-splicing introns of both groups have prokaryotic origins. Over 42,000 Group I introns had been identified across nature as of 2025, and Group II introns number in the thousands, primarily in bacterial and organellar contexts, reflecting a sporadic distribution and horizontal mobility that contributed to their spread into eukaryotic lineages. Group II introns in particular are thought to be evolutionarily linked to the emergence of spliceosomal splicing, with which they share a lariat-forming mechanism and a similar catalytic core.

tRNA and minor spliceosomal splicing

tRNA splicing occurs in eukaryotes and archaea, where introns are typically located within the anticodon loop of pre-tRNA transcripts. These introns are removed through a protein-dependent pathway involving distinct enzymatic steps, contrasting with self-splicing mechanisms in bacteria. The process begins with site-specific cleavage by a heterotetrameric tRNA splicing endonuclease complex, composed of subunits homologous to the Sen proteins in yeast (Sen2, Sen34, Sen54, and Sen15), which recognizes structural features of the pre-tRNA rather than sequence alone. The endonuclease generates exon halves with 5'-hydroxyl and 2',3'-cyclic phosphate termini, leaving the intron as a linear fragment. The subsequent ligation step seals the exons using a multifunctional ligase, such as Trl1 in yeast, which first opens the 2',3'-cyclic phosphate to a 2'-phosphate intermediate before forming the standard 3'-5' phosphodiester bond. This pathway ensures the production of mature tRNA capable of participating in translation, with the cyclic phosphate intermediate being a hallmark of the eukaryotic and archaeal tRNA splicing mechanism. A well-studied example is the intron in the yeast tRNATyr gene (SUP6), where intron removal is essential for producing a mature, functional tRNA.

In addition to tRNA processing, minor spliceosomal splicing handles a rare class of nuclear pre-mRNA introns known as U12-type, which constitute approximately 0.4% of human introns and often feature AU-AC terminal dinucleotides instead of the typical GU-AG. The minor spliceosome employs specialized small nuclear ribonucleoproteins (snRNPs), U11/U12 and U4atac/U6atac, along with the shared U5 snRNP, to recognize and excise these introns through a process analogous to, but distinct from, major spliceosomal activity. These U12-type introns are enriched in genes expressed in neurons, suggesting specialized roles in neural function and development.
Representative examples include AT-AC introns in human genes such as ATR, which encodes a key DNA damage response kinase and relies on minor spliceosome components for accurate isoform production.
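The terminal-dinucleotide distinction mentioned above (GU-AG versus AU-AC) lends itself to a small illustrative check. The sketch below is an assumption-laden simplification: the function name `classify_termini` is hypothetical, and terminal dinucleotides alone do not decide which spliceosome processes an intron, since many U12-type introns also carry GU-AG ends and are distinguished by their extended 5' splice site and branch point sequences.

```python
def classify_termini(intron: str) -> str:
    """Label an intron by its terminal dinucleotides.

    Accepts DNA (T) or RNA (U) input; returns "GU-AG", "AU-AC", or "other".
    This is a labeling aid only, not a U2-type vs U12-type classifier.
    """
    rna = intron.upper().replace("T", "U")
    if len(rna) < 4:
        return "other"
    ends = (rna[:2], rna[-2:])
    if ends == ("GU", "AG"):
        return "GU-AG"
    if ends == ("AU", "AC"):
        return "AU-AC"
    return "other"


labels = [
    classify_termini("GTAAGTCTAACTTTTTTCAG"),    # DNA-style input, GT...AG
    classify_termini("AUAUCCUUUUCCUUAACUUUAC"),  # AU...AC termini
    classify_termini("GCAAGUAG"),                # non-canonical 5' end
]
```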

Biochemical Mechanisms

Splice site recognition and consensus sequences

In eukaryotic pre-mRNA splicing, splice site recognition begins with the identification of conserved motifs at the exon-intron boundaries and within introns, which serve as docking sites for small nuclear ribonucleoproteins (snRNPs) and auxiliary factors. These motifs ensure precise excision of introns and ligation of exons, with deviations from consensus often requiring additional regulatory elements for efficient processing. The core signals include the 5' splice site (5' SS), branch point sequence (BPS), and 3' splice site (3' SS), each exhibiting species-specific consensus patterns derived from extensive sequence analyses and modern computational tools such as position weight matrices (PWMs) and databases like U12DB.

The 5' SS is defined by a nearly invariant GU dinucleotide at the start of the intron, forming part of a broader consensus such as CAG|GURAGU in mammals, where the vertical bar denotes the exon-intron boundary (the cleavage point) and R represents a purine. This GU motif, first identified in viral and cellular genes, is essential for base-pairing with the 5' end of U1 snRNA, initiating spliceosome assembly. Upstream of the 5' SS, sequences resembling polypyrimidine tracts can influence recognition in certain contexts, though they are more prominently associated with the 3' SS. Mutations altering the GU dinucleotide abolish splicing, underscoring its critical role.

The BPS, located approximately 20-50 nucleotides upstream of the 3' SS, features a conserved adenosine residue that acts as the nucleophile in the first transesterification step, forming a lariat intermediate. In mammals, the BPS consensus is YNCURAC (Y = pyrimidine, N = any nucleotide, R = purine; the A at the sixth position is the branch point adenosine), a motif identified through mutational analysis of β-globin pre-mRNA. This sequence binds SF1/mBBP and facilitates U2 snRNP association, with the distance to the 3' SS influencing efficiency; optimal spacing enhances lariat formation.

The 3' SS consists of an AG dinucleotide immediately downstream of a polypyrimidine tract (Py tract), typically 12-20 uridine/cytidine-rich nucleotides that promote U2 auxiliary factor (U2AF) binding.
This Py-AG arrangement, conserved across metazoans, was established in early intron sequencing studies and is crucial for defining the acceptor site, with the Py tract compensating for weak AG contexts by recruiting U2AF. The scanning model posits that the spliceosome searches downstream from the BPS for the first suitable AG, ensuring accurate cleavage.

Splice site recognition is modulated by cis-regulatory elements, including exonic splicing enhancers (ESEs) and intronic splicing enhancers (ISEs), which bind serine/arginine-rich (SR) proteins to stabilize core site interactions, particularly for suboptimal sequences. Conversely, exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs) recruit heterogeneous nuclear ribonucleoproteins (hnRNPs), such as hnRNP A1, to repress splice site usage. ESE motifs, often purine-rich (e.g., GAR repeats), were first characterized in the immunoglobulin μ chain and promote exon inclusion by recruiting SR proteins like SF2/ASF. ISEs and ISSs, identified through systematic screens, similarly influence site choice; for instance, G-rich ISEs bind hnRNP F/H to enhance upstream exon definition. These elements are essential for fine-tuning splicing fidelity.

Variations in splice site consensus exist, notably in U12-type introns, a minor class (fewer than 1% of human introns) processed by the minor spliceosome and often featuring AU-AC dinucleotides instead of GU-AG. These were discovered through computational analysis of divergent 5' splice site sequences and exhibit an extended consensus like /RTATCCTTT/, with higher conservation than their U2-type counterparts. U12-type introns also have a distinct BPS (UUCCUAAC). Weak splice sites, deviating significantly from consensus (e.g., non-GU 5' splice sites), depend on auxiliary factors like SR proteins binding ESEs/ISEs to compensate for poor base-pairing with snRNAs, as demonstrated in studies of β-globin introns. Such enhancements can increase splicing efficiency by 10- to 100-fold for suboptimal sites.
Splice Site   Consensus Motif (Mammals)              Key Features                                               Binding Factor
5' SS         CAG|GURAGU                             GU dinucleotide invariant; R = purine                      U1 snRNP
BPS           YNCURAC (20-50 nt upstream of 3' SS)   A at position 6 = branch point adenosine; Y = pyrimidine   SF1, U2 snRNP
3' SS         (YnC)AG|                               Py tract (YnC, n=12-20); AG invariant                      U2AF
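The consensus motifs in the table use degenerate nucleotide codes (R, Y, N), which can be expanded into regular expressions for a quick scan. The following is a minimal sketch, not a substitute for the position weight matrices mentioned above: it treats every consensus position as a hard requirement, and the helper names (`consensus_to_regex`, `find_branch_points`) and the toy intron are hypothetical.

```python
import re

# Degenerate codes used in the table: R = purine, Y = pyrimidine, N = any.
CODES = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def consensus_to_regex(motif: str) -> str:
    """Expand a degenerate RNA motif into a regex pattern string."""
    return "".join(CODES[base] for base in motif)


FIVE_SS = re.compile(consensus_to_regex("CAGGURAGU"))  # CAG|GURAGU, bar dropped
BPS_PATTERN = consensus_to_regex("YNCURAC")
BRANCH_A_OFFSET = 5  # 0-based index of the branch point A within YNCURAC


def find_branch_points(intron: str, min_dist: int = 20, max_dist: int = 50) -> list[int]:
    """Return 0-based positions of candidate branch point adenosines.

    Distance is counted from the A to the 3' end of the intron; the 20-50 nt
    window follows the range quoted in the text.
    """
    hits = []
    # Lookahead so that overlapping motif occurrences are all reported.
    for match in re.finditer(f"(?={BPS_PATTERN})", intron):
        a_pos = match.start() + BRANCH_A_OFFSET
        if min_dist <= len(intron) - a_pos <= max_dist:
            hits.append(a_pos)
    return hits


toy_intron = "GUAAGU" + "C" * 10 + "UACUAAC" + "U" * 20 + "CAG"
candidates = find_branch_points(toy_intron)
```

Real splice sites frequently deviate from consensus, which is why scoring schemes such as PWMs are generally preferred over exact matching.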

Spliceosome assembly and catalysis

The spliceosome assembles stepwise on pre-mRNA introns in a dynamic, ATP-dependent process involving small nuclear ribonucleoproteins (snRNPs) and numerous proteins. The initial commitment complex, known as the E complex, forms when U1 snRNP recognizes and base-pairs with the 5' splice site (5' SS), committing the pre-mRNA to splicing; this ATP-independent step also involves loosely associated U2 snRNP and proteins like SF1/mBBP and the U2AF65/U2AF35 heterodimer at the branch point sequence (BPS) and polypyrimidine tract (PPT), respectively. Subsequent ATP-dependent remodeling leads to the prespliceosome or A complex, where U2 snRNP stably binds the BPS via base-pairing between U2 snRNA and the BPS, displacing SF1/mBBP and forming a U1-U2-pre-mRNA network that bridges the 5' SS and BPS. The A complex then recruits the preformed U4/U6.U5 tri-snRNP, along with proteins like Prp19/CDC5L complex members, to form the B complex, marking a major compositional expansion with over 100 proteins.

Activation of the B complex to the B* state involves extensive structural rearrangements, including ATP-dependent release of U1 and U4 snRNPs, dissociation of U4 from U6, and base-pairing between U6 snRNA and the 5' SS; this repositions the catalytic RNA elements, with U6 snRNA's ACAGAGA sequence contributing to the active site alongside the U2-U6 duplex. Cryo-EM structures from the mid-2010s onward have revealed these conformational changes in near-atomic detail, such as the B* complex at 3.5 Å resolution showing U6 snRNA docking into the catalytic center and magnesium ions positioned for nucleophilic attack. The fully activated spliceosome then proceeds to the C complex states (C1 before and C2 after the first reaction), where the core undergoes further remodeling to enable catalysis while maintaining substrate alignment.

Splicing catalysis occurs via two sequential transesterification reactions within the spliceosome's RNA-based active site.
In the first step, the 2'-OH of the branch point adenosine (A) attacks the 5' SS phosphodiester bond, cleaving off the 5' exon and forming a lariat intermediate in which the intron's 5' end is linked via a 2'-5' phosphodiester bond to the BPS A; this reaction generates a free 3'-OH on the 5' exon and is facilitated by U6 snRNA positioning the reactive groups. The second transesterification follows, with the 5' exon's 3'-OH attacking the 3' SS phosphodiester bond, ligating the exons and releasing the lariat intron; cryo-EM snapshots of post-branching intermediates at 3.8 Å resolution show the branch point A remaining anchored near the active site during this transition, ensuring precise joining. These reactions highlight the spliceosome's role as a ribozyme, with proteins stabilizing the dynamic RNA scaffold.
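The assembly pathway described in this section can be summarized as an ordered list of snRNP arrivals and departures. The sketch below is a bookkeeping aid only, under stated simplifications: U2 is counted from the A complex, where it binds stably, rather than from its loose association in the E complex; ATP dependence is recorded only where this section states it explicitly; and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    complex_name: str
    joins: frozenset
    leaves: frozenset
    atp_dependent: Optional[bool]  # None = not stated in this section


PATHWAY = [
    Step("E", frozenset({"U1"}), frozenset(), False),
    Step("A", frozenset({"U2"}), frozenset(), True),
    Step("B", frozenset({"U4", "U5", "U6"}), frozenset(), None),  # tri-snRNP joins
    Step("B*", frozenset(), frozenset({"U1", "U4"}), True),       # activation
    Step("C", frozenset(), frozenset(), None),                    # catalytic complex
]


def snrnp_composition(pathway):
    """Return the set of snRNPs present in each complex, in pathway order."""
    present = set()
    composition = {}
    for step in pathway:
        present |= step.joins
        present -= step.leaves
        composition[step.complex_name] = frozenset(present)
    return composition


composition = snrnp_composition(PATHWAY)
```

Under these assumptions the catalytic complexes retain only U2, U5, and U6, which matches the description of U1 and U4 being released during activation.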

Energy and cofactor requirements

RNA splicing in the spliceosome is an energy-intensive process that relies heavily on ATP hydrolysis to drive conformational rearrangements and maintain fidelity. DEAD-box and DEAH-box RNA helicases, including the DEAH-box proteins Prp2 and Prp16, use ATP to remodel the spliceosomal structure during key transitions, including activation of the catalytic center and remodeling after the first step. These helicases hydrolyze numerous ATP molecules per splicing event to facilitate the dynamic assembly and disassembly of the spliceosome.

Magnesium ions (Mg²⁺) serve as essential cofactors for the catalytic steps of splicing, coordinating reactive groups within the U6 snRNA and pre-mRNA to enable the two transesterification reactions. In the active site, two Mg²⁺ ions are positioned to activate the nucleophile for the first step (branch point attack) and stabilize the leaving group for the second step (exon ligation), analogous to the two-metal-ion mechanism observed in group II intron self-splicing. Depletion of Mg²⁺ severely impairs splicing efficiency, underscoring its role in positioning substrates for precise phosphodiester bond formation.

GTP hydrolysis also contributes to splicing energetics through GTPases like Snu114, a U5 snRNP-associated protein homologous to translation elongation factors such as eEF2. Snu114 facilitates spliceosome activation by promoting U4/U6 snRNA unwinding in an ATP-independent but GTP-dependent manner, acting as a regulatory switch during early assembly stages. Additionally, GTPases with eEF1A-like features support post-splicing recycling by aiding in the release and reuse of spliceosomal components, ensuring efficient turnover in high-demand cellular contexts.

The necessity of these energy inputs is highlighted by inhibitors like spliceostatin A, which targets the SF3b subcomplex of U2 snRNP and prevents ATP-dependent branch point recognition, thereby blocking early spliceosome assembly and demonstrating how energy cofactors underpin substrate binding fidelity.

Alternative Splicing

Mechanisms of isoform generation

Alternative splicing generates protein diversity by producing multiple mRNA isoforms from a single pre-mRNA transcript, primarily through variations in exon inclusion, intron retention, or splice site selection. In humans, over 95% of multi-exon genes undergo alternative splicing, resulting in an average of approximately 7 transcript isoforms per gene. This process relies on the recognition of splice sites defined by consensus sequences at exon-intron boundaries, allowing the spliceosome to assemble in different configurations.

One prevalent mechanism is exon skipping, also known as cassette exon splicing, where one or more internal exons are omitted from the mature mRNA while the adjacent exons are joined directly. This is the most common form of alternative splicing in mammals, accounting for 50-60% of events, and can occur as a default process or be modulated to produce functional variants. For instance, skipping of a cassette exon alters the coding sequence, potentially changing protein structure without disrupting the reading frame.

Intron retention represents another key mechanism, in which one or more introns are not excised and remain in the mature mRNA, often leading to premature stop codons or regulatory non-coding transcripts. This event is particularly prominent in stress responses, where retained introns can delay protein production or trigger mRNA decay under conditions like hypoxia or heat shock. Intron retention is less frequent than exon skipping but plays a critical role in fine-tuning gene expression during environmental challenges.

Mutually exclusive exons provide a mechanism for precise isoform diversification, where only one exon from a pair or cluster is included in the mRNA, excluding the others. This mode is rarer than the others but essential for generating extensive isoform repertoires. A striking example is the Dscam gene in Drosophila melanogaster, which produces up to 38,016 isoforms through mutually exclusive splicing of exon clusters in four variable regions, enabling neuronal self-recognition.
Cassette exons, often overlapping with skipping events, further contribute by toggling inclusion or exclusion of individual exons, amplifying isoform complexity across the transcriptome.
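The combinatorics behind these isoform counts can be made concrete with a short sketch. Two assumptions are worth flagging: the per-cluster alternative counts for Dscam (12, 48, 33, and 2 for the exon 4, 6, 9, and 17 clusters) come from the general literature rather than from this article, and the function `cassette_isoforms` together with its exon labels is purely illustrative.

```python
from itertools import product
from math import prod

# Mutually exclusive exons: exactly one alternative is chosen per cluster,
# so the isoform count is the product of the cluster sizes.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}
dscam_isoforms = prod(DSCAM_CLUSTERS.values())


def cassette_isoforms(first_exon, cassette_exons, last_exon):
    """Yield every isoform obtained by independently including or skipping
    each cassette exon between two constitutive exons."""
    for choices in product([True, False], repeat=len(cassette_exons)):
        kept = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        yield [first_exon, *kept, last_exon]


isoforms = list(cassette_isoforms("E1", ["E2", "E3"], "E4"))
```

With n independent cassette exons the upper bound is 2**n isoforms; in practice regulatory coupling between splicing events means far fewer are usually observed.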

Regulatory factors

Regulatory factors in RNA splicing encompass a diverse array of proteins and signaling pathways that fine-tune splice site selection and splicing efficiency, often by binding to cis-regulatory elements in pre-mRNA or modulating the activity and localization of core splicing components. These factors act antagonistically or cooperatively to promote or repress specific splicing events, enabling tissue-specific and stimulus-responsive patterns essential for cellular diversity.

SR proteins, a family of serine/arginine-rich splicing factors, primarily promote exon inclusion by binding to exonic splicing enhancers (ESEs) within pre-mRNA exons, thereby recruiting the spliceosome and stabilizing splice site recognition. For instance, SRSF1 (also known as SF2/ASF) binds ESEs to enhance the use of nearby 5' and 3' splice sites, influencing alternative splicing of many target genes. This activation is mediated through interactions between the RNA recognition motifs (RRMs) of SR proteins and U1/U2 snRNPs, as well as protein-protein contacts via their RS domains.

In contrast, heterogeneous nuclear ribonucleoproteins (hnRNPs) often function as repressors of splicing by binding to exonic or intronic splicing silencers (ESS/ISS), which block splice site access or promote exon skipping. PTB (PTBP1, also classified as hnRNP I) exemplifies this role, binding to polypyrimidine-rich sequences upstream of weak 3' splice sites to inhibit spliceosome assembly and favor exon exclusion in genes like c-src. Similarly, hnRNP A1 competes with SR proteins at splice sites, favoring distal 5' splice site usage and repressing proximal exons in a concentration-dependent manner.

Additional coregulators, such as TIA1 and Sam68, provide context-specific modulation of splicing outcomes.
TIA1, an RNA-binding protein with three RRMs, promotes exon inclusion by binding to uridine-rich intronic sequences downstream of weak 5' splice sites, thereby facilitating U1 snRNP association, as in the apoptosis-related gene Fas. Sam68 (KHDRBS1), a STAR family protein, influences alternative splicing through phosphorylation-dependent interactions, promoting exon inclusion in CD44 and shifting splice site choice in Bcl-x to regulate cell survival. These coregulators often integrate with SR and hnRNP activities to fine-tune isoform ratios.

Phosphorylation of SR proteins by kinases like SRPK1 and CLK1 serves as a key post-translational modification that regulates their subcellular localization, activity, and splicing specificity. SRPK1, localized in the cytoplasm, phosphorylates RS domains to promote nuclear import of SR proteins, enhancing their availability for splice site activation during active transcription. CLK1, a nuclear kinase, further modifies SR proteins to alter their binding affinity and promote splicing of specific introns, such as those with non-consensus sequences. This dynamic phosphorylation cycle allows rapid adjustment of splicing in response to cellular signals.

Extracellular signaling pathways, particularly the ERK/MAPK cascade, integrate environmental cues into splicing regulation by acting on SRPK1, which in turn modulates SR protein shuttling and activity. Activation of ERK by growth factors leads to SRPK1 hyperactivation, increasing nuclear accumulation of SRSF1 and promoting pro-oncogenic splicing events in genes like Ron. This pathway exemplifies how signaling networks couple splicing to cellular states like proliferation or stress.

A 2025 study from MIT researchers revealed that features of the duplex formed between the 5' splice site and U1 snRNA influence spliceosome targeting through interactions involving Luc7 proteins, adding a further dimension to splice site selection. Specifically, Luc7 proteins distinguish the "handedness" (right- or left-handed) of 5' splice site duplexes formed with U1 snRNA, a classification based on which side of the exon-intron junction contributes more strongly to base-pairing, thereby enhancing spliceosome recruitment to approximately 50% of human introns and providing a layer of regulatory control beyond the core sequence motifs. This mechanism has implications for diseases such as myeloid malignancies, where LUC7L2 mutations disrupt this process.

Functional and evolutionary implications

Alternative splicing significantly expands the functional repertoire of the proteome by enabling a single gene to produce multiple protein isoforms with distinct properties. In humans, approximately 20,000 protein-coding genes generate over 100,000 distinct protein isoforms through alternative splicing, far exceeding the diversity that could arise from gene number alone. This mechanism allows for fine-tuned regulation of protein function, localization, and interactions, thereby enhancing cellular adaptability without requiring genomic expansion. For instance, isoforms can differ in enzymatic activity, subcellular targeting, or binding affinities, contributing to the complexity of cellular processes.

Tissue-specific alternative splicing further amplifies this diversity, with the brain exhibiting the highest levels of isoform variation to support specialized neuronal functions. In neural tissues, alternative splicing generates diverse synaptic proteins that modulate connectivity and signaling; a prominent example is the neurexin gene family, where combinatorial splicing at multiple sites produces thousands of isoforms critical for synapse formation and plasticity. These brain-enriched isoforms, such as those of the neurexins, enable precise control over neurotransmitter release and receptor interactions, underscoring how alternative splicing facilitates tissue-specific customization.

Evolutionarily, alternative splicing emerged around 1.5 billion years ago in early eukaryotes, providing a mechanism to increase proteomic complexity prior to the diversification of gene families. Its prevalence has steadily risen over the past 1.4 billion years, accelerating particularly in metazoans to drive adaptations in multicellular organisms by allowing rapid evolution of isoform repertoires without altering core gene sequences. This evolutionary advantage is evident in how alternative splicing supports developmental innovations and environmental responses in complex animals, where isoform diversity correlates with increased organismal complexity.
The core spliceosomal machinery remains highly conserved across eukaryotes, reflecting its ancient origins, while alternative splicing patterns diverge rapidly between species, enabling lineage-specific adaptations.
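The arithmetic behind these diversity figures can be made concrete with a small sketch. It assumes that each alternatively spliced site is chosen independently, which gives an upper bound rather than a measured isoform count; the per-site option counts below are hypothetical and are not neurexin data.

```python
from math import prod

def combinatorial_isoforms(options_per_site):
    """Upper bound on isoform count if every splice site is chosen independently."""
    return prod(options_per_site)

def mean_isoforms_per_gene(total_isoforms, total_genes):
    """Average number of protein isoforms per gene."""
    return total_isoforms / total_genes

# Figures quoted in the text: ~100,000 isoforms from ~20,000 genes.
print(mean_isoforms_per_gene(100_000, 20_000))  # 5.0

# Hypothetical gene with six alternatively spliced sites (illustrative values only).
hypothetical_sites = [2, 3, 2, 4, 2, 2]
print(combinatorial_isoforms(hypothetical_sites))  # 192
```

Because the count is a product, adding even a few sites with several options each pushes the total into the thousands, which is the sense in which combinatorial splicing "multiplies" diversity.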

Cellular and Co-transcriptional Context

Co-transcriptional splicing dynamics

Co-transcriptional splicing is the removal of pre-mRNA introns by the spliceosome while the nascent RNA is still being synthesized by RNA polymerase II (Pol II). This coupling lets splicing factors associate with the elongating transcript in real time, which improves efficiency and allows transcription dynamics to influence splicing outcomes. The C-terminal domain (CTD) of Pol II plays a central role in this coordination: it serves as a scaffold that recruits splicing factors such as the U1 snRNP through phosphorylation-dependent interactions, particularly on the serine 5 and serine 2 residues of the CTD heptad repeats. This recruitment facilitates spliceosome assembly on nascent RNA even before the entire intron is transcribed, preventing the accumulation of unspliced intermediates.

The kinetics of co-transcriptional splicing are rapid and tightly linked to Pol II elongation rates. Studies using live-cell imaging and nascent RNA sequencing have shown that splicing of the first intron often begins as soon as the intron emerges from Pol II, with 50% completion occurring when Pol II is just 45 nucleotides downstream of the 3' splice site. In many genes, the first intron is fully spliced while the second exon is being transcribed, and overall intron removal occurs within 20-30 seconds in cells. However, delays arise at weak splice sites, where suboptimal consensus sequences slow spliceosome assembly; in such cases, slower Pol II elongation provides more time for recognition and promotes inclusion of alternative exons, as described by the kinetic competition model.

Recent advances in long-read sequencing technologies, such as Oxford Nanopore and PacBio, have provided deeper insights into these dynamics by capturing full-length nascent transcripts. These methods reveal that ~75% of introns are spliced co-transcriptionally before 3' end cleavage and polyadenylation, underscoring the cross-regulation between splicing and termination steps. This timing ensures efficient mRNA maturation and prevents aberrant processing.
Co-transcriptional splicing also impacts regulatory processes, including alternative splicing (AS) and circular RNA (circRNA) formation. Promoter choice influences AS by altering Pol II pausing or CTD phosphorylation patterns, which in turn affect the recruitment of splicing regulators and exon inclusion rates; for instance, strong promoters can suppress distal exon usage in certain genes. Similarly, back-splicing events that generate circRNAs often occur co-transcriptionally, facilitated by intron-pairing elements that loop the RNA and promote splice site convergence during Pol II elongation. These mechanisms highlight how transcription-splicing coupling fine-tunes gene expression and RNA diversity.
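The kinetic competition idea described above reduces to simple arithmetic: the time a weak upstream splice site has before a competing downstream site is transcribed equals the distance between them divided by the Pol II elongation rate. The sketch below illustrates this under stated assumptions; the function name, the 2,000-nucleotide distance, and both elongation rates are hypothetical values chosen for illustration, not measurements from the studies cited here.

```python
def window_of_opportunity(distance_nt, elongation_nt_per_min):
    """Seconds available to a weak upstream splice site before Pol II
    has transcribed a competing site located distance_nt downstream."""
    if elongation_nt_per_min <= 0:
        raise ValueError("elongation rate must be positive")
    return 60.0 * distance_nt / elongation_nt_per_min

# Hypothetical example: competing sites 2,000 nt apart.
fast = window_of_opportunity(2000, 4000)  # assumed fast polymerase
slow = window_of_opportunity(2000, 1000)  # assumed slow polymerase
print(fast, slow)  # 30.0 120.0
```

In this toy case a fourfold slower polymerase gives the weak site a fourfold longer head start, which is the qualitative prediction of the kinetic competition model.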

Subnuclear organization and speckles

Nuclear speckles, also known as interchromatin granule clusters, are dynamic, membraneless subnuclear compartments that serve as major sites of organization for the splicing machinery within the eukaryotic nucleus. These structures are typically irregular and punctate, number approximately 25-50 per mammalian nucleus, and occupy interchromatin regions of the nucleoplasm. They form through liquid-liquid phase separation, a process driven by intrinsically disordered regions in key proteins, which concentrates splicing factors in a non-membranous environment conducive to rapid assembly and disassembly.

Nuclear speckles are enriched in serine/arginine-rich (SR) proteins and small nuclear ribonucleoproteins (snRNPs), which are essential components of the spliceosome, along with other pre-mRNA processing factors. While splicing itself occurs primarily co-transcriptionally at active loci, nuclear speckles function as storage and recycling hubs for these factors, which diffuse from transcription sites to speckles for replenishment and modification before redeployment. This spatial organization keeps factors readily available; SR proteins, for instance, show phosphorylation-dependent release from speckles to participate in splicing events.

The dynamics of nuclear speckles respond to cellular conditions that affect splicing activity. For example, inhibition of transcription with 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB) leads to the enlargement and fusion of speckles as splicing factors accumulate without active gene sites to target. Recent advances in super-resolution microscopy, such as expansion microscopy combined with structured illumination, have revealed the nanoscale mobility and internal organization of speckles, showing substructures that move directionally over micrometer scales in live cells and underscoring their role as adaptive platforms for gene expression regulation.

Response to cellular stress and DNA damage

Cellular stress, such as heat shock, triggers alternative splicing events that promote adaptive responses, notably through intron retention in heat shock protein transcripts. During heat stress, intron retention increases in these genes, producing transcripts that are often targeted for degradation, which fine-tunes the heat shock response and prevents protein overload. Heat shock can induce intron retention in multiple paralogs of a gene family, yielding mRNA isoforms that enhance stress tolerance. Furthermore, an RNA helicase couples this intron retention to nonsense-mediated decay (NMD), ensuring rapid turnover of retained-intron transcripts under heat shock conditions to maintain cellular homeostasis.

In response to DNA damage, the ATM and ATR kinases activate signaling pathways that phosphorylate SR proteins, thereby altering splice site selection and promoting splicing patterns conducive to repair and survival. ATM/ATR-mediated phosphorylation of SR proteins like SRSF1 modulates their activity, shifting alternative splicing toward isoforms suited to the damage response. For instance, DNA damage induces alternative splicing in p53 target genes, generating isoforms such as p53β that help prevent the propagation of damaged cells. This splicing switch amplifies the DNA damage response.

Recent studies in plants highlight the role of alternative splicing in adapting to temperature extremes, with splicing factors facilitating isoform diversity that bolsters heat and cold tolerance. Cold stress induces alternative splicing regulated by CBF transcription factors through protein condensation, which modulates the splicing efficiency of stress-responsive genes to enhance freezing tolerance. Similarly, under heat stress in wheat, alternative splicing of TaHSFA6e regulates gene expression, enabling transcriptional adaptation and improved thermotolerance. These dynamic splicing changes, observed in 2025 analyses, underscore splicing's contribution to plant resilience against abiotic stresses by generating functional isoforms that optimize metabolic and protective pathways.

DNA damage checkpoints integrate splicing inhibition to halt cell cycle progression, allowing time for repair and preventing genomic instability. Inhibition of pre-mRNA splicing following double-strand breaks impairs the expression of cell cycle regulators, thereby enforcing G2/M arrest via activation of ATM/CHK2 pathways. This splicing-mediated slowdown is evident in the altered processing of transcripts encoding checkpoint proteins like CHK1, where splice variants such as CHK1-short act as endogenous inhibitors to reinforce the DNA damage response. Overall, these mechanisms ensure that splicing adapts transiently to stress signals, prioritizing repair over proliferation.

Biological Roles and Evolution

Evolutionary origins and conservation

The evolutionary origins of RNA splicing can be traced to self-splicing introns in prokaryotes, where group I and group II introns represent ancient ribozyme-based mechanisms for intron removal. Group I introns, which rely on an external guanosine cofactor to initiate splicing via two transesterification reactions, are found in bacterial genomes, bacteriophages, and organelles such as mitochondria and chloroplasts. Group II introns, which self-splice to form a lariat intermediate without external cofactors, are more widespread, occurring in diverse bacterial lineages, in archaeal genomes, and, through horizontal transfer, in the organellar genomes of eukaryotes. These introns likely originated in ancient prokaryotes billions of years ago and spread via retrohoming mechanisms, with group II introns invading the protomitochondrion and contributing to early organellar splicing. Such horizontal transfers highlight the mobile nature of these elements, which facilitated their dissemination across prokaryotic domains before eukaryotic divergence.

The transition to eukaryotic spliceosomal splicing occurred during eukaryogenesis approximately 2 billion years ago, coinciding with the evolution of the nucleus and the compartmentalization of transcription and processing. The spliceosome is widely accepted to have arisen through the progressive disassembly and protein-assisted recruitment of group II intron components, transforming autonomous ribozymes into a dynamic ribonucleoprotein complex. A key piece of evidence for this origin is the structural and functional similarity between the U6 small nuclear RNA (snRNA) in the spliceosome and the catalytic domain V of group II introns, including conserved motifs that coordinate magnesium ions for the splicing reaction. This evolutionary repurposing enabled more precise and regulated splicing of numerous introns in eukaryotic pre-mRNAs, adapting the mechanism to the increased genomic complexity of early eukaryotes.

The core splicing machinery exhibits remarkable conservation across all eukaryotic lineages, with the U1, U2, U4, U5, and U6 snRNAs maintaining near-identical secondary structures and functional elements despite billions of years of divergence. This universality is evident even in early-branching protists like Giardia intestinalis, where the spliceosomal snRNAs and associated proteins show high similarity to their counterparts in other eukaryotes, indicating that the spliceosome was fully assembled in the last eukaryotic common ancestor. Such conservation underscores the essentiality of these components for viability, as disruptions lead to severe defects in splicing fidelity. Recent transcriptome-wide analyses in 2024, involving systematic knockdown of 305 spliceosome components in human cells, have reconstructed splicing regulatory networks that reveal the core machinery's interconnected and specialized functions, further demonstrating its evolutionary robustness and preservation across metazoans.

Roles in development and multicellularity

Alternative splicing (AS) plays a pivotal role in developmental processes by generating protein isoforms that drive cell fate decisions, tissue differentiation, and temporal regulation. In multicellular organisms, AS contributes to the complexity required for specialized cell types, enabling precise control over developmental timing and the establishment of tissue-specific functions. This mechanism allows a limited genome to produce diverse proteomes, supporting the transition from pluripotent states to differentiated lineages.

A classic example of AS in development occurs in Drosophila melanogaster sex determination, where the transformer (tra) gene undergoes female-specific splicing regulated by the Sex-lethal (Sxl) protein, which promotes productive splicing to produce a functional Tra protein. Tra, in turn, directs the female-specific splicing of the downstream doublesex (dsx) pre-mRNA, leading to sex-specific transcription factors that govern somatic sexual differentiation. In males, default splicing of tra yields a non-functional isoform, resulting in male development. This cascade illustrates how AS establishes binary developmental switches.

In neural development, AS drives isoform switches that coordinate neuronal maturation. For instance, during mammalian brain development, thousands of exons undergo temporal splicing changes, such as the exclusion of the NUMB exon in neural progenitors to promote neuronal differentiation, regulated by polypyrimidine tract-binding proteins (PTBPs). These switches enable the diversification of neuronal identities across brain regions. Similar patterns are observed in neocortical development, where isoform diversity supports circuit assembly.

AS also underpins multicellularity by promoting cell-type-specific isoforms, as seen in muscle differentiation, where the cardiac troponin T (Tnnt2) gene produces distinct isoforms via alternative exon inclusion. In embryonic muscle, muscle-specific splicing enhancers drive the incorporation of a cardiac-like exon, generating isoforms that modulate contractile properties and support the transition from embryonic to adult muscle fibers. This isoform diversity enhances functional specialization, allowing striated muscles to adapt to varying physiological demands across tissues.

Splicing factors like Muscleblind-like (MBNL) proteins further regulate pluripotency in stem cells by repressing embryonic-specific AS events. In human embryonic stem cells, MBNL1 and MBNL2 suppress pluripotency-associated isoforms, such as those of transcription factors like OCT4, promoting differentiation into lineages like myoblasts. Knockdown of MBNL enhances expression of pluripotency genes, underscoring the role of these proteins as negative regulators that fine-tune the balance between self-renewal and lineage commitment.

These developmental roles of AS are highly conserved across species, with similar patterns observed in nematodes like Caenorhabditis elegans and in mammals. In C. elegans, AS events in ~18% of genes shift more than 4-fold during development, mirroring mammalian neural and muscle isoform dynamics and supporting conserved mechanisms for tissue specification. For example, quantitative changes in splice variant ratios during C. elegans embryogenesis parallel those in mammalian development, indicating evolutionary retention of AS for multicellular complexity.

Splicing in pathogens like HIV

The human immunodeficiency virus type 1 (HIV-1) genome is transcribed as a single primary pre-mRNA transcript that undergoes complex alternative splicing to produce over 50 distinct mRNA isoforms essential for viral replication. This process uses four major splice donor sites (D1 to D4) and eight major splice acceptor sites (A1 to A7, including subtypes like A4a, A4b, and A4c), along with minor cryptic sites, to generate unspliced, singly spliced, and multiply spliced transcripts. For instance, multiply spliced isoforms encode regulatory proteins such as Tat, which activates viral transcription, and Rev, which facilitates nuclear export of viral RNAs; these are produced through specific combinations of splice sites, including usage of the A3 acceptor for Tat and A7 for Rev. HIV-1 employs exonic splicing silencers (ESS) and enhancers (ESE), often A-rich or purine-rich sequences, to fine-tune splice site selection and maintain a balance among isoforms, preventing over-splicing that would deplete genomic or structural protein-encoding RNAs.

A key viral mechanism for regulating splicing outcomes involves the Rev protein, which binds to the Rev response element (RRE) in unspliced and singly spliced viral transcripts and recruits the CRM1 exportin to enable their nuclear export to the cytoplasm. This bypasses the nuclear retention that the host machinery typically imposes on intron-containing RNAs, allowing production of Gag, Pol, Env, and accessory proteins like Vif and Vpr without complete splicing. Rev does not directly inhibit splicing but shifts the equilibrium toward unspliced and partially spliced forms by promoting their export, thereby coupling splicing regulation to the timing of viral gene expression during the replication cycle.

Following proviral integration into the host genome, HIV-1 transcription produces full-length pre-mRNAs that must be appropriately spliced by the host machinery to generate functional viral proteins. Defects in splice site usage or regulatory elements, such as mutations in donor or acceptor sites, disrupt isoform production and severely impair viral replication by reducing levels of essential transcripts such as those encoding Tat. For example, alterations in the major splice donor can activate cryptic sites but often fail to compensate for the loss of canonical isoforms, leading to diminished replication.

Similar splicing exploitation occurs in other pathogens, such as human T-lymphotropic virus type 1 (HTLV-1), where the viral oncoprotein Tax reprograms host alternative splicing to promote cellular transformation. Tax acts at intragenic sites to alter splice site selection in host genes, including genes involved in immune signaling, which contributes to oncogenesis in adult T-cell leukemia/lymphoma (ATLL). These changes include Tax-dependent splicing events that are recurrently observed in ATLL patient samples, underscoring the role of viral hijacking of host splicing in pathogenesis.

Errors, Diseases, and Variation

Splicing defects and genetic diseases

Splicing defects often result from mutations that disrupt canonical splice sites or impair the function of splicing factors, leading to aberrant mRNA isoforms and a range of inherited monogenic disorders. These mutations can cause exon skipping, intron retention, or activation of cryptic splice sites, ultimately producing truncated, elongated, or unstable proteins that underlie disease phenotypes. Such defects are particularly prevalent in genes with complex splicing patterns, where even subtle changes in consensus sequences at intron-exon boundaries can abolish normal recognition.

Estimates indicate that 15–30% of disease-causing variants in inherited disorders affect pre-mRNA splicing, highlighting the substantial contribution of splicing aberrations to human genetic pathology. Mutations that directly alter splice donor or acceptor sites are a common mechanism, as seen in β-thalassemia, where the IVS1-1 G>A substitution in the HBB gene disrupts the 5' splice site of intron 1. This change prevents normal splicing, leading to exon skipping or production of non-functional mRNA that undergoes nonsense-mediated decay, resulting in β^0-thalassemia major with severe anemia. Similar splice site alterations occur in other hemoglobinopathies and neuromuscular disorders, emphasizing the fragility of these regulatory elements to single nucleotide changes.

Mutations in genes required for spliceosome biogenesis can also cause widespread splicing dysregulation, as exemplified by spinal muscular atrophy (SMA), a leading genetic cause of infant mortality. SMA arises from homozygous deletion or point mutations in the SMN1 gene, which encodes the survival motor neuron (SMN) protein essential for assembly of small nuclear ribonucleoproteins (snRNPs). Defective SMN variants, such as those identified in SMA patients, exhibit reduced binding to Sm proteins and fail to facilitate snRNP biogenesis, leading to depleted functional spliceosomes and selective motor neuron degeneration. This results in progressive muscle weakness, with severity correlating inversely with the copy number of SMN2, which partially compensates for the loss of SMN1.

Diagnosis of splicing defects relies on functional assays to validate the pathogenicity of identified variants, particularly when in silico predictions are ambiguous. Minigene assays, which involve cloning patient-derived genomic fragments into reporter constructs and transfecting them into cell lines, provide direct evidence of splicing outcomes by analyzing the resulting transcripts via RT-PCR or sequencing. For example, these assays have confirmed spliceogenic effects of variants in genes like ATP7B for Wilson disease and FLCN for Birt-Hogg-Dubé syndrome, enabling reclassification of variants of uncertain significance and guiding clinical management. Such approaches are increasingly integrated into diagnostic pipelines to bridge the gap between genomic sequencing and functional impact.

Dysregulation in cancer

Dysregulation of RNA splicing is a hallmark of many cancers, where mutations in splicing factors or altered splicing patterns generate oncogenic isoforms that promote tumor growth, survival, and metastasis. In myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML), mutations in the splicing factor SF3B1 are among the most common, occurring in up to 20-30% of MDS cases and altering branch point recognition during spliceosome assembly. These mutations, such as the hotspot K700E variant, lead to aberrant 3' splice site usage and inclusion of cryptic exons, resulting in mis-spliced transcripts that disrupt hematopoiesis and confer a proliferative advantage on leukemic cells.

Alternative splicing events in cancer often produce pro-tumorigenic isoforms that enhance invasiveness and metastasis. For instance, the CD44v6 isoform, generated by inclusion of variable exon 6, is upregulated in colorectal and breast cancers, where it acts as a marker of cancer stem cells and drives epithelial-mesenchymal transition (EMT), facilitating tumor cell migration and distant colonization. In neuroblastoma, skipping of exon 12A in the BIN1 gene produces isoforms lacking the MYCN-interacting domain, reducing the tumor suppressor function of BIN1 and allowing unchecked MYCN activity that correlates with aggressive disease and poor prognosis.

Recent studies have illuminated the broader splicing regulatory landscape in cancer through systematic perturbation experiments. A 2024 transcriptome-wide analysis involving knockdown of 305 spliceosome components and regulators in human cancer cell lines revealed specialized subnetworks in which specific factors control distinct splicing programs, with disruptions promoting oncogenic splicing patterns such as exon skipping in tumor suppressors or exon inclusion in proto-oncogenes. These findings underscore how splicing factor dependencies create vulnerabilities exploitable for therapy.

Efforts to target splicing dysregulation have progressed, with splicing modulators showing promise against SF3B1-mutant cancers. Emerging therapies, including inhibitors targeting minor spliceosome components, have demonstrated preclinical efficacy in slowing tumor growth in various cancers as of 2025.

Natural variation and splicing errors

Natural variation in RNA splicing arises from genetic polymorphisms that influence splice site recognition and splicing patterns across human populations. Single nucleotide polymorphisms (SNPs) located within or near splice sites, such as the GT-AG dinucleotides, can subtly alter splicing efficiency and isoform ratios in healthy individuals. For instance, SNPs in the 5' splice site consensus sequence often lead to allele-specific splicing, contributing to inter-individual differences in gene expression without causing overt pathology. Studies have identified thousands of such variants, with common SNPs in intronic or exonic regions modulating splicing outcomes in a tissue-specific manner.

Splicing quantitative trait loci (sQTLs) represent a key mechanism underlying this population-level variation: genetic variants that quantitatively affect splicing efficiency rather than abolishing it. Genome-wide analyses across diverse ancestries have mapped over 100,000 sQTLs in human tissues, revealing that these loci often overlap with expression quantitative trait loci (eQTLs) and influence isoform abundance for many genes. In multi-ancestry cohorts, sQTL effects show partial conservation but also population-specific patterns, highlighting how evolutionary pressures shape splicing diversity. This variation enables adaptive responses, such as fine-tuning protein diversity in response to environmental factors, while maintaining overall splicing fidelity.

Stochastic splicing errors, though rare in young healthy cells, are filtered by cellular mechanisms like nonsense-mediated decay (NMD), which degrades aberrant isoforms containing premature termination codons. NMD targets approximately 5-10% of alternatively spliced transcripts under normal conditions, preventing the accumulation of potentially harmful proteins from erroneous splicing events such as exon skipping or cryptic splice site activation. This pathway ensures that functional isoforms predominate. Genes with multiple introns are under stronger selective pressure for high splicing fidelity, which minimizes the cumulative risk of errors across splice sites. Intron retention, a common event, can also trigger NMD if it introduces premature termination codons, further refining gene expression.

With advancing age, splicing fidelity declines globally, particularly in post-mitotic neurons, leading to errors that accumulate over time. Long-read RNA sequencing studies from 2025 demonstrate that splicing accuracy declines significantly in aged tissues compared to younger samples, with mis-splicing enriched in neuronal genes involved in synaptic function. These age-related perturbations, including higher rates of intron retention and exon skipping, correlate with progressive neuronal dysfunction and contribute to the molecular underpinnings of neurodegeneration in otherwise healthy aging brains.

By 2025, routine sequencing-based analytical tools have become integral for predicting the effects of splicing variants in clinical and research settings. High-throughput RNA sequencing combined with deep learning models, such as SpliceAI, enables accurate identification of splice-altering variants by scoring their impact on splice site strength and isoform production, with reported precision above 90%. These tools facilitate rapid assessment of polymorphic SNPs, integrating genomic and transcriptomic data to forecast functional consequences without invasive experimentation. Routine application in genomic diagnostics has nonetheless revealed limitations in purely in silico predictions, emphasizing the need for empirical validation to distinguish benign variation from deleterious changes.
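The NMD filtering step described above is often approximated in annotation pipelines by the "50-nucleotide rule": a stop codon lying more than roughly 50-55 nucleotides upstream of the last exon-exon junction is treated as premature. The following is a minimal sketch of that heuristic, not a description of any specific tool; the function names, the coordinate convention, and the example exon lengths are assumptions made for illustration.

```python
def last_junction_position(exon_lengths):
    """Position of the last exon-exon junction in spliced-mRNA coordinates
    (number of nucleotides preceding the final exon). None if single-exon."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])

def is_nmd_candidate(stop_codon_end, exon_lengths, threshold=50):
    """Apply the 50-nt rule. stop_codon_end is the number of nucleotides from
    the mRNA start through the last base of the stop codon (assumed convention)."""
    junction = last_junction_position(exon_lengths)
    if junction is None:
        return False  # no junction, so the rule cannot apply
    return (junction - stop_codon_end) > threshold

# Hypothetical transcript with exons of 200, 150, and 300 nt (last junction at 350).
exons = [200, 150, 300]
print(is_nmd_candidate(250, exons))  # True: stop is 100 nt upstream of the junction
print(is_nmd_candidate(320, exons))  # False: only 30 nt upstream
print(is_nmd_candidate(500, exons))  # False: stop lies in the last exon
```

The rule is a heuristic with known exceptions, so its output is best read as a flag for follow-up rather than a prediction of transcript fate.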

Experimental and Therapeutic Advances

Techniques for splicing manipulation

Techniques for manipulating RNA splicing in laboratory settings have evolved from chemical blockade methods to engineering tools and high-throughput assays, enabling researchers to dissect splicing mechanisms and test regulatory elements. These approaches target splice sites, enhancers, or silencers to alter exon inclusion or exclusion, providing insights into splicing dynamics without relying solely on computational predictions. By intervening at specific pre-mRNA sequences, such as the 5' or 3' splice sites, these techniques allow precise control over splicing outcomes.

Antisense oligonucleotides (ASOs), including morpholino variants, are a foundational method for splicing manipulation: they sterically block key regulatory sequences on pre-mRNA. Morpholinos, uncharged synthetic analogs of nucleic acids, bind with high affinity to target sites such as exon-intron boundaries or splicing enhancers, preventing recognition by the spliceosome and inducing exon skipping. This approach was pioneered in studies of Duchenne muscular dystrophy (DMD), where morpholinos targeting exon 23 of the dystrophin pre-mRNA restored the reading frame and produced truncated but functional dystrophin. For instance, in mdx mouse models of DMD, systemic delivery of morpholino ASOs achieved up to 20-30% dystrophin restoration, demonstrating efficient skipping without off-target effects on unrelated exons. These tools are versatile for both cell culture and animal models, with modifications like phosphorodiamidate linkages enhancing stability and cellular uptake.

CRISPR-based systems have advanced splicing manipulation by enabling programmable RNA targeting with high specificity, often using catalytically dead Cas13 variants (dCas13) to recruit regulatory factors or block splice sites. In one configuration, dCas13 is fused to MS2 coat protein domains, which bind MS2 hairpin loops engineered into guide RNAs (gRNAs); these loops then recruit splicing factors like SRSF1 or hnRNP A1 to enhance or repress exon inclusion. This recruitment strategy has been applied to modulate exon inclusion in endogenous transcripts, achieving up to 50% shifts in isoform ratios for genes like SMN2 in disease models. A notable 2023 development, CRISPR-mediated Splice Editing, uses Cas13d to deliver large RNA segments for insertion or replacement at splice junctions, correcting multi-exon deletions in transcripts with efficiencies exceeding 40% in human cell lines. These methods outperform traditional ASOs in multiplexing, allowing simultaneous targeting of multiple splice sites via arrayed gRNAs.

Reporter assays employing minigenes provide a quantitative framework for studying splicing regulation in cells, typically integrating luciferase genes to measure exon inclusion efficiency. A common design fuses a test exon-intron cassette upstream of a luciferase reporter, so that splicing events alter the reading frame to produce functional (light-emitting) or non-functional protein; a co-transfected Renilla luciferase normalizes for transfection variability. Dual-luciferase minigene systems have quantified changes such as variable exon inclusion modulated by ERK signaling, revealing up to 5-fold shifts in luciferase activity upon pathway activation. These assays are scalable for high-throughput screening of splicing modulators, with recent adaptations using ratiometric readouts to detect subtle isoform shifts in real time. Minigenes recapitulate endogenous splicing patterns when genomic context is included, making them essential for validating cis-regulatory elements.

Fluorescence in situ hybridization (FISH) techniques visualize splicing kinetics in fixed cells, particularly for co-transcriptional events, by probing unspliced pre-mRNA with intron-specific probes. Single-molecule FISH (smFISH) detects nascent transcripts at transcription sites, allowing measurement of intron retention times; for example, it has revealed splicing delays of 5-10 minutes for distal introns compared to proximal ones. High-throughput variants like HiFENS amplify signals via hybridization chain reaction to quantify endogenous splicing isoforms across thousands of cells, identifying co-transcriptional splicing efficiencies above 80% for most human genes. These imaging methods complement live-cell approaches, providing spatial resolution to track spliceosome assembly along the gene body.
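The normalization used in the dual-luciferase minigene assays described earlier in this section can be sketched in a few lines. This is an illustrative calculation under stated assumptions: the splicing-dependent reporter is taken to be firefly luciferase (the text names only the Renilla control), the function names are hypothetical, and the luminescence readings are invented for the example.

```python
def normalized_ratio(reporter, renilla):
    """Splicing-dependent reporter signal divided by the Renilla control,
    which corrects for well-to-well differences in transfection efficiency."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return reporter / renilla

def fold_change(treated, control):
    """Fold change in normalized reporter activity.
    Each argument is a (reporter, renilla) pair of raw luminescence readings."""
    return normalized_ratio(*treated) / normalized_ratio(*control)

# Invented readings (arbitrary luminescence units).
control = (1000.0, 500.0)   # normalized ratio 2.0
treated = (4500.0, 450.0)   # normalized ratio 10.0
print(fold_change(treated, control))  # 5.0
```

Dividing by the Renilla signal before comparing conditions is what lets a fold change be attributed to splicing of the test cassette rather than to how much plasmid each well received.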

Splicing-targeted therapies

Splicing-targeted therapies aim to correct aberrant RNA splicing patterns underlying various genetic diseases, particularly neuromuscular disorders like spinal muscular atrophy (SMA) and Duchenne muscular dystrophy (DMD), where splicing defects lead to loss of functional proteins such as SMN and dystrophin. These approaches use antisense oligonucleotides (ASOs) or small molecules to modulate splice site recognition and exon inclusion or exclusion, thereby restoring protein production without altering the genome.

One of the earliest successes is nusinersen (Spinraza), an ASO approved by the FDA in December 2016 for treating SMA across all ages and types. Nusinersen binds to an intronic splicing silencer in the SMN2 gene, promoting inclusion of exon 7 and increasing full-length SMN protein levels by up to 10-fold in preclinical models; it has shown sustained motor function improvements in clinical trials. Another ASO, eteplirsen (Exondys 51), received accelerated FDA approval in 2016 for DMD patients amenable to exon 51 skipping, who account for about 13-14% of cases with specific mutations in the DMD gene. By targeting exon-splicing enhancer sequences, eteplirsen induces skipping of exon 51 during pre-mRNA processing, restoring the reading frame and enabling production of a truncated but partially functional dystrophin protein, with clinical data indicating increased dystrophin expression in muscle biopsies.

For SMA, risdiplam (Evrysdi), the first orally bioavailable small-molecule splicing modulator, was approved in 2020 and continues to expand its indications through ongoing trials, such as those in presymptomatic infants; in February 2025, the FDA approved a tablet formulation to simplify administration. Risdiplam enhances SMN2 exon 7 inclusion by binding to an upstream splicing enhancer, elevating SMN protein levels in the central nervous system and peripheral tissues, with phase 3 trials demonstrating significant improvements in motor milestones for infants with type 1 SMA.

As of 2025, advances in RNA therapeutics have introduced novel strategies for splicing defects, including mRNA editing technologies like ADAR-mediated base editing to correct splice site mutations directly in transcripts. These approaches, such as guide RNA-directed editing, enable precise A-to-I changes at aberrant splice junctions, potentially treating a broader range of monogenic diseases beyond SMA and DMD, with early preclinical data showing up to 50% correction efficiency in cellular models. In oncology, splicing inhibitors targeting dysregulated factors, such as mutant SF3B1 in myelodysplastic syndromes, have shown promise in preclinical studies, though clinical development of early candidates like H3B-8800 was discontinued in 2024 due to insufficient efficacy. Ongoing efforts include CLK/DYRK inhibitors like cirtuvivint, which entered phase 2 clinical trials for AML/MDS in 2025 to reprogram alternative splicing and suppress oncogenic isoforms.

Despite these gains, splicing-targeted therapies face significant hurdles, including off-target effects in which ASOs or small molecules inadvertently alter the splicing of unintended transcripts, potentially causing toxicity or immune activation. Nuclear delivery remains a key challenge, as many agents struggle to cross the nuclear envelope efficiently in non-dividing cells like neurons, limiting efficacy and necessitating advanced carriers such as lipid nanoparticles or viral vectors. Ongoing research focuses on optimizing chemical modifications to enhance specificity and tissue penetration while minimizing these risks.

Computational modeling and AI predictions

Computational modeling of RNA splicing has evolved from rule-based scoring systems to deep learning architectures that predict splice site recognition and alternative splicing outcomes directly from genomic sequence. Early tools like MaxEntScan, introduced in 2004, employ the maximum entropy principle to quantify the strength of splice sites by modeling short sequence motifs around donor and acceptor sites, providing numerical scores that correlate with splicing efficiency. These scores have become a standard for assessing potential splice site disruptions caused by genetic variants, though they consider only short, fixed-length windows around each site and do not capture long-range regulatory elements.
A major advance came with SpliceAI in 2019, a deep neural network trained on human genomic data to predict splice junctions and the effects of variants on splicing from arbitrary pre-mRNA sequences. SpliceAI outperforms traditional tools like MaxEntScan by integrating contextual information over thousands of base pairs, achieving high accuracy in identifying de novo splice sites and quantifying changes in inclusion levels for alternative exons. Its predictions have been validated across diverse datasets, including variants of unknown significance, demonstrating superior sensitivity for intronic and exonic perturbations.
Recent studies have leveraged large-scale experimental data to refine these models. In 2024, a transcriptome-wide analysis involving systematic knockdown of 305 spliceosome components and regulators in human cancer cells reconstructed splicing regulatory networks, revealing specialized functions of core spliceosomal proteins and their impact on alternative splicing patterns. This has informed network-based models that predict cascading effects of splicing factor perturbations on global transcriptome architecture.
By 2025, generative AI approaches have emerged for designing tissue-specific splicing outcomes.
The TrASPr+BOS model combines a transformer-based predictor (TrASPr) with Bayesian optimization (BOS) to generate sequences that achieve desired splicing patterns in specific tissues, surpassing prior models like SpliceAI in accuracy for unseen conditions and enabling rational design of splice-modulating elements. TrASPr was trained on multi-tissue data, such as from GTEx, to forecast percent spliced in (PSI) values, while BOS optimizes sequence variants for targeted applications.
These computational tools extend to practical applications beyond human genomes. For instance, models like Spliceator facilitate splicing predictions in non-model organisms by training on multi-species datasets, aiding genome annotation in species lacking extensive transcriptomic resources. In drug discovery, AI-driven splicing predictions identify therapeutic targets by simulating how modulators affect splicing networks, as exemplified by platforms like SpliceCore that prioritize splicing factors for small-molecule intervention in diseases like cancer.
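The percent spliced in (PSI) value that such models forecast is commonly estimated from RNA-seq junction reads for a cassette exon. The sketch below shows one widely used convention, assumed here for illustration: inclusion is supported by two junctions (upstream and downstream of the exon), so their counts are averaged before comparison with the single skipping junction. The function names are hypothetical, and this is not the procedure of any specific tool named above.

```python
def percent_spliced_in(upstream_jn, downstream_jn, skipping_jn):
    """Estimate PSI for a cassette exon from junction read counts.

    upstream_jn / downstream_jn: reads spanning the two inclusion junctions.
    skipping_jn: reads spanning the exon-skipping junction.
    Returns a value in [0, 1], or None when there is no coverage.
    """
    inclusion = (upstream_jn + downstream_jn) / 2  # average the two inclusion junctions
    total = inclusion + skipping_jn
    if total == 0:
        return None
    return inclusion / total


def delta_psi(psi_reference, psi_condition):
    """Change in exon inclusion between two conditions (for example, two tissues)."""
    return psi_condition - psi_reference


psi_a = percent_spliced_in(40, 60, 50)   # inclusion 50 vs skipping 50
psi_b = percent_spliced_in(90, 90, 10)   # inclusion 90 vs skipping 10
print(psi_a, psi_b, delta_psi(psi_a, psi_b))
```

A PSI near 1 means the exon is almost always included, a PSI near 0 means it is almost always skipped, and tissue-specific prediction amounts to forecasting how this fraction differs across conditions.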

Protein splicing

Protein splicing is a post-translational process in which an internal protein segment, known as an intein, excises itself from a precursor protein and ligates the flanking N- and C-terminal regions, called exteins, to form a mature protein. This autocatalytic reaction occurs without the need for external enzymes or cofactors in many cases, though some inteins require specific conditions such as a particular pH or metal ions. Unlike RNA splicing, which processes pre-mRNA at the RNA level, protein splicing operates entirely at the protein level and has been identified in organisms across all three domains of life: Bacteria, Archaea, and Eukarya.
Protein splicing was discovered in 1990, when researchers identified an unexpected 50-kDa intervening sequence in the precursor of the 69-kDa subunit of the yeast vacuolar H+-ATPase, encoded by the VMA1 gene. This intein was found to self-excise, joining the exteins to yield the functional protein, marking the first reported instance of such a mechanism. Subsequent studies showed that many inteins carry homing endonuclease domains that allow their coding sequences to invade host genes, contributing to their distribution in nature.
The mechanism of intein-mediated protein splicing typically proceeds in four coordinated steps involving nucleophilic displacements. In the canonical class 1 pathway, the N-terminal residue of the intein, usually a cysteine or serine, performs a nucleophilic attack on the carbonyl carbon at the N-extein-intein junction, converting the peptide bond into a thioester or ester intermediate. The first residue of the C-extein (often a cysteine, serine, or threonine) then attacks this intermediate, transferring the N-extein onto its side chain and creating a branched structure. The C-terminal asparagine of the intein then cyclizes by attacking its own backbone carbonyl, which cleaves the intein from the C-extein. Finally, an acyl shift converts the ester or thioester linking the exteins into a native peptide bond.
Variations exist, such as class 2 and 3 mechanisms that differ in the order of steps or in the residues used for catalysis, but the cyclization step remains a conserved feature essential for intein excision.
Inteins have become powerful tools in biotechnology due to their precise and efficient splicing activity. A prominent application is affinity purification, where inteins fused to a chitin-binding domain enable tagless isolation of recombinant proteins; upon induction with thiols like dithiothreitol (DTT), the intein self-cleaves, releasing the target protein without additional protease treatment. Split inteins, which function in trans, allow segmental protein expression and ligation, facilitating applications such as cyclization of therapeutic peptides and reconstitution of functional proteins from separate fragments. These methods have been widely adopted for producing homogeneous proteins that are difficult to express intact.
In recent years, protein splicing has been harnessed for gene therapy to deliver large genes that exceed adeno-associated virus (AAV) packaging limits by using split-intein systems for protein trans-splicing. As of 2025, SpliceBio's SB-007, an investigational therapy for Stargardt disease, received FDA IND clearance in December 2024 and dosed the first patient in the Phase 1/2 ASTRA trial in March 2025, marking the first clinical application of protein splicing technology.
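The net outcome of protein splicing, excision of the intein and ligation of the exteins, can be mimicked at the sequence level. The following is a minimal sketch, not a model of the chemistry: the precursor sequence, the boundary coordinates, and both function names are hypothetical, and the residue check encodes only the typical class 1 pattern (intein starting with cysteine or serine and ending with asparagine, C-extein starting with cysteine, serine, or threonine).

```python
def splice_protein(precursor, intein_start, intein_end):
    """Excise precursor[intein_start:intein_end] and join the flanking exteins.

    Returns (mature_protein, intein) as one-letter amino acid strings.
    """
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    return n_extein + c_extein, intein


def looks_canonical(precursor, intein_start, intein_end):
    """Check the residue pattern typical of class 1 inteins at the splice junctions."""
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    if not intein or not c_extein:
        return False
    return (
        intein[0] in "CS"         # N-terminal nucleophile of the intein
        and intein[-1] == "N"     # C-terminal asparagine that cyclizes
        and c_extein[0] in "CST"  # first C-extein residue attacks the intermediate
    )


# Made-up precursor: N-extein "MKTAY", intein "CLSGDTLIHN", C-extein "CFNGT".
precursor = "MKTAY" + "CLSGDTLIHN" + "CFNGT"
mature, intein = splice_protein(precursor, 5, 15)
print(mature)                               # MKTAYCFNGT
print(intein)                               # CLSGDTLIHN
print(looks_canonical(precursor, 5, 15))    # True
```

In split-intein applications the two halves of the precursor are expressed as separate polypeptides, but the ligated product is the same joined extein sequence.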

Circular RNA biogenesis via splicing

Circular RNAs (circRNAs) are primarily generated through back-splicing, a non-canonical splicing event in which a downstream 5' splice site covalently joins an upstream 3' splice site on the pre-mRNA, forming a closed-loop RNA devoid of free ends. This process competes with linear splicing and can occur via two main mechanisms: direct back-splicing, where flanking intronic sequences bring the splice sites into proximity through base-pairing, and the lariat intermediate pathway, in which an exon-skipping event produces an exon-containing lariat that is then spliced internally, removing the intronic sequence and joining the exons into a circle. The lariat model, supported by detection of exon-containing lariat intermediates in cells, underscores how back-splicing uses the canonical machinery in an inverted configuration. Seminal genome-wide studies in human cells revealed thousands of such circRNAs, often enriched in neuronal tissues and associated with repetitive intronic elements that facilitate their formation.
Back-splicing is tightly controlled by cis-acting elements and trans-acting factors that modulate splice site accessibility. Inverted repeat sequences, such as Alu elements in primate genomes, commonly flank the exons destined for circularization, enabling RNA secondary structures that loop out the intervening sequence and promote spliceosomal recognition of the back-splice junction. RNA-binding proteins also contribute: ELAV, for instance, promotes neuronal circRNA biogenesis by binding to specific RNA regions and modulating back-splicing decisions, as demonstrated in Drosophila and mammalian models. Conversely, the double-stranded RNA-binding enzyme ADAR1 acts as a suppressor by catalyzing A-to-I editing within the paired regions, which disrupts base-pairing stability and favors linear splicing over back-splicing; knockdown of ADAR1 accordingly increases circRNA abundance.
This editing-dependent competition highlights a broader interplay between RNA modification and splicing fidelity, with ADAR1's activity particularly pronounced in genes harboring hyper-edited introns.
Functionally, many circRNAs serve as miRNA sponges, binding and sequestering miRNAs to derepress their mRNA targets and thereby fine-tune gene expression. The circRNA ciRS-7 (circular RNA sponge for miR-7, also known as CDR1as), a highly conserved example, contains more than 70 binding sites for miR-7 and potently inhibits its activity in the brain, influencing neuronal development. Beyond sponging, select circRNAs have coding potential and are translated via cap-independent mechanisms, such as internal ribosome entry sites (IRES) or m6A-driven recruitment of the translation machinery, yielding functional peptides that modulate cellular processes like proliferation. circRNAs are also emerging as therapeutic platforms because of their stability and efficient antigen expression. As of 2025, advances include circRNA-based cancer therapeutics, with IND clearances reported for companies such as Ribox Pharmaceuticals and CirCode Biotech in late 2024 to early 2025.
Advances in 2023 refined co-transcriptional biogenesis models, revealing that PARP1-mediated pausing of RNA polymerase II enhances back-splicing kinetics by allowing nascent pre-mRNA to adopt conformations favorable for circularization. In neurodegeneration, dysregulated circRNAs contribute to pathology; for example, neuron-specific circRNAs are altered in Alzheimer's disease brains, where they influence synaptic gene networks and exacerbate amyloid-beta toxicity through miRNA sequestration or altered translation. These findings position circRNAs as potential biomarkers and therapeutic targets in disorders like Alzheimer's and Parkinson's.
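The defining feature of back-splicing, a downstream 5' splice site joined to an upstream 3' splice site, translates into a simple coordinate test when junctions are examined in sequencing data. The sketch below assumes a hypothetical `Junction` record holding genomic positions of the donor (5' splice site) and acceptor (3' splice site) on the same chromosome; it is not the algorithm of any particular circRNA detection tool.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    donor: int      # genomic position of the 5' splice site
    acceptor: int   # genomic position of the 3' splice site
    strand: str     # "+" or "-"


def is_back_splice(junction):
    """True if the donor lies downstream of the acceptor in transcript orientation."""
    if junction.strand == "+":
        # On the plus strand, transcription runs toward higher coordinates.
        return junction.donor > junction.acceptor
    if junction.strand == "-":
        # On the minus strand, transcription runs toward lower coordinates.
        return junction.donor < junction.acceptor
    raise ValueError("strand must be '+' or '-'")


linear = Junction(donor=1500, acceptor=2000, strand="+")
circular = Junction(donor=3000, acceptor=1000, strand="+")
print(is_back_splice(linear))    # False, an ordinary forward splice
print(is_back_splice(circular))  # True, consistent with a circRNA junction
```

Reads that span a junction passing this test are candidates for circRNA-derived sequences, although other events, such as trans-splicing or genomic rearrangements, can produce the same signature and need to be excluded separately.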
