Alternative splicing
from Wikipedia
Alternative splicing produces three protein isoforms. Protein A includes all of the exons, whereas Proteins B and C result from exon skipping.

Alternative splicing, alternative RNA splicing, or differential splicing, is a process during gene expression that allows a single gene to produce different splice variants. For example, some exons of a gene may be included within or excluded from the final RNA product of the gene.[1] This means the exons are joined in different combinations, leading to different splice variants. In the case of protein-coding genes, the proteins translated from these splice variants may contain differences in their amino acid sequence and in their biological functions (see Figure).

Biologically relevant alternative splicing occurs as a normal phenomenon in eukaryotes, where it increases the number of proteins that can be encoded by the genome.[1] In humans, it is widely believed that ~95% of multi-exonic genes are alternatively spliced to produce functional alternative products from the same gene[2] but many scientists believe that most of the observed splice variants are due to splicing errors and the actual number of biologically relevant alternatively spliced genes is much lower.[3][4]

Discovery


Alternative splicing was first observed in 1977.[5][6] The adenovirus produces five primary transcripts early in its infectious cycle, prior to viral DNA replication, and an additional one later, after DNA replication begins. The early primary transcripts continue to be produced after DNA replication begins. The additional primary transcript produced late in infection is large and comes from 5/6 of the 32kb adenovirus genome. This is much larger than any of the individual adenovirus mRNAs present in infected cells. Researchers found that the primary RNA transcript produced by adenovirus type 2 in the late phase was spliced in many different ways, resulting in mRNAs encoding different viral proteins. In addition, the primary transcript contained multiple polyadenylation sites, giving different 3' ends for the processed mRNAs.[7][8][9]

In 1981, the first example of alternative splicing in a transcript from a normal, endogenous gene was characterized.[7] The gene encoding the thyroid hormone calcitonin was found to be alternatively spliced in mammalian cells. The primary transcript from this gene contains 6 exons; the calcitonin mRNA contains exons 1–4, and terminates after a polyadenylation site in exon 4. Another mRNA is produced from this pre-mRNA by skipping exon 4, and includes exons 1–3, 5, and 6. It encodes a protein known as CGRP (calcitonin gene related peptide).[10][11] Examples of alternative splicing in immunoglobulin gene transcripts in mammals were also observed in the early 1980s.[7][12]

Since then, many other examples of biologically relevant alternative splicing have been found in eukaryotes.[1] The "record-holder" for alternative splicing is a D. melanogaster gene called Dscam, which could potentially have 38,016 splice variants.[13]
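The figure of 38,016 is a product of independent choices: one alternative is picked from each cluster of mutually exclusive exons. The short sketch below reproduces that arithmetic. The cluster sizes (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17) are the commonly cited values and are not given in the text above, so they should be read as an assumption of this example.

```python
from math import prod

# Commonly cited numbers of mutually exclusive alternatives per Dscam exon
# cluster. These values are an assumption of this sketch; the article itself
# only states the total.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def potential_isoforms(clusters):
    """One alternative is chosen per cluster, so the counts multiply."""
    return prod(clusters.values())


print(potential_isoforms(DSCAM_CLUSTERS))  # 12 * 48 * 33 * 2 = 38016
```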

In 2021, it was discovered that the genome of adenovirus type 2, the adenovirus in which alternative splicing was first identified, was able to produce a much greater variety of splice variants than previously thought.[14] By using next generation sequencing technology, researchers were able to update the human adenovirus type 2 transcriptome and document the presence of 904 splice variants produced by the virus through a complex pattern of alternative splicing. Very few of these splice variants have been shown to be functional, a point that the authors raise in their paper.

"An outstanding question is what roles the menagerie of novel RNAs play or whether they are spurious molecules generated by an overloaded splicing machinery."[14]

Modes

Traditional classification of basic types of alternative RNA splicing events. Exons are represented as blue and yellow blocks, introns as lines in between.
Relative frequencies of types of alternative splicing events differ between humans and fruit flies.[15]

Five basic modes of alternative splicing are generally recognized.[1][2][16][15]

  • Exon skipping or cassette exon: in this case, an exon may be spliced out of the primary transcript or retained. This is the most common mode in mammalian pre-mRNAs.[15]
  • Mutually exclusive exons: One of two exons is retained in mRNAs after splicing, but not both.
  • Alternative donor site: An alternative 5' splice junction (donor site) is used, changing the 3' boundary of the upstream exon.
  • Alternative acceptor site: An alternative 3' splice junction (acceptor site) is used, changing the 5' boundary of the downstream exon.
  • Intron retention: A sequence may be spliced out as an intron or simply retained. This is distinguished from exon skipping because the retained sequence is not flanked by introns. If the retained intron is in the coding region, the intron must encode amino acids in frame with the neighboring exons, or a stop codon or a shift in the reading frame will cause the protein to be non-functional. This is the rarest mode in mammals but the most common in plants.[15][17]
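The five modes listed above can be illustrated on a toy transcript. The following is a minimal sketch, not a model of real splicing: the exon and intron sequences, the function names, and the three-base trims are all made up for illustration. The last function checks the condition stated for intron retention, namely that a retained intron in the coding region must keep the reading frame and must not introduce a stop codon.

```python
# Toy pre-mRNA (hypothetical sequences): exons in upper case, intron in lower case.
EXONS = {"E1": "AUGGCU", "E2": "GAACCG", "E3": "UUCGGA", "E4": "CCAUAA"}
INTRON_2 = "guaagucuaacuuuuucag"  # made-up intron between E2 and E3
STOP_CODONS = {"UAA", "UAG", "UGA"}


def constitutive():
    """All exons joined in order."""
    return "".join(EXONS.values())


def exon_skipping(skipped="E2"):
    """Cassette exon: the named exon is spliced out with its flanking introns."""
    return "".join(seq for name, seq in EXONS.items() if name != skipped)


def mutually_exclusive(choice):
    """Exactly one of E2 or E3 is retained, never both."""
    if choice not in ("E2", "E3"):
        raise ValueError("choice must be 'E2' or 'E3'")
    return EXONS["E1"] + EXONS[choice] + EXONS["E4"]


def alternative_donor(trim=3):
    """An alternative 5' splice site inside E2 moves the 3' boundary of E2."""
    return EXONS["E1"] + EXONS["E2"][:-trim] + EXONS["E3"] + EXONS["E4"]


def alternative_acceptor(trim=3):
    """An alternative 3' splice site inside E3 moves the 5' boundary of E3."""
    return EXONS["E1"] + EXONS["E2"] + EXONS["E3"][trim:] + EXONS["E4"]


def intron_retention():
    """The intron between E2 and E3 is simply kept in the mRNA."""
    return EXONS["E1"] + EXONS["E2"] + INTRON_2 + EXONS["E3"] + EXONS["E4"]


def retention_preserves_frame(upstream_coding, intron):
    """True if the retained intron neither shifts the frame nor adds a stop.

    Codons are read from the start of `upstream_coding`. A trailing partial
    codon that would be completed by the downstream exon is not checked.
    """
    if len(intron) % 3 != 0:
        return False  # a length not divisible by 3 shifts the reading frame
    seq = (upstream_coding + intron).upper()
    first = (len(upstream_coding) // 3) * 3  # first codon touching the intron
    return all(seq[i:i + 3] not in STOP_CODONS
               for i in range(first, len(seq) - 2, 3))
```

For the toy intron above, `retention_preserves_frame(EXONS["E1"] + EXONS["E2"], INTRON_2)` is false because its length (19 bases) is not a multiple of three.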

In addition to these primary modes of alternative splicing, there are two other main mechanisms by which different mRNAs may be generated from the same gene: multiple promoters and multiple polyadenylation sites. Use of multiple promoters is properly described as a transcriptional regulation mechanism rather than alternative splicing; by starting transcription at different points, transcripts with different 5'-most exons can be generated. At the other end, multiple polyadenylation sites provide different 3' end points for the transcript. Both of these mechanisms are found in combination with alternative splicing and provide additional variety in mRNAs derived from a gene.[1][16]

Schematic cutoff from 3 splicing structures in the murine hyaluronidase gene. Directionality of transcription from 5' to 3' is shown from left to right. Exons and introns are not drawn to scale.


These modes describe basic splicing mechanisms, but may be inadequate to describe complex splicing events. For instance, the figure to the right shows 3 spliceforms from the mouse hyaluronidase 3 gene. Comparing the exonic structure shown in the first line (green) with the one in the second line (yellow) shows intron retention, whereas the comparison between the second and the third spliceform (yellow vs. blue) exhibits exon skipping. A model nomenclature to uniquely designate all possible splicing patterns has recently been proposed.[15]

Mechanisms


General splicing mechanism

Spliceosome A complex defines the 5' and 3' ends of the intron before removal[16]

When the pre-mRNA has been transcribed from the DNA, it includes several introns and exons. (In nematodes, the mean is 4–5 exons and introns; in the fruit fly Drosophila there can be more than 100 introns and exons in one transcribed pre-mRNA.) The exons to be retained in the mRNA are determined during the splicing process. The regulation and selection of splice sites are done by trans-acting splicing activator and splicing repressor proteins as well as cis-acting elements within the pre-mRNA itself such as exonic splicing enhancers and exonic splicing silencers.

The typical eukaryotic nuclear intron has consensus sequences defining important regions. Each intron has the sequence GU at its 5' end. Near the 3' end there is a branch site. The nucleotide at the branchpoint is always an A; the consensus around this sequence varies somewhat. In humans the branch site consensus sequence is yUnAy.[18] The branch site is followed by a series of pyrimidines – the polypyrimidine tract – then by AG at the 3' end.[16]
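The consensus features described above (GU at the 5' end, a yUnAy branch site, a polypyrimidine tract, and AG at the 3' end) can be expressed as simple pattern checks. The sketch below is only an illustration: the example intron is invented, the function name is hypothetical, and the minimum tract length of six pyrimidines is an arbitrary threshold chosen for the example rather than a biological constant.

```python
import re

# yUnAy: y = pyrimidine (C or U), n = any base. The lookahead lets
# overlapping candidate branch sites all be reported.
BRANCH_SITE = re.compile(r"(?=([CU]U[ACGU]A[CU]))")

# Invented intron sequence used only to exercise the function.
EXAMPLE_INTRON = "GUAAGUAUCGACUAACGGUUUCUUUCUUCCAG"


def describe_intron(intron, min_tract=6):
    """Report which consensus features a candidate intron sequence shows."""
    seq = intron.upper().replace("T", "U")
    report = {
        "starts_with_GU": seq.startswith("GU"),
        "ends_with_AG": seq.endswith("AG"),
        "branch_sites": [],
    }
    for match in BRANCH_SITE.finditer(seq):
        start = match.start()
        # Region between the branch site and the terminal AG.
        downstream = seq[start + 5:-2]
        tract = re.search(r"[CU]{%d,}" % min_tract, downstream)
        report["branch_sites"].append({
            "motif": match.group(1),
            "branchpoint_index": start + 3,  # position of the A in yUnAy
            "has_polypyrimidine_tract": tract is not None,
        })
    return report
```

Real splice site prediction relies on position weight matrices or learned models rather than exact string matches, so this check is far stricter in some respects and far looser in others than what the spliceosome recognizes.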

Splicing of mRNA is performed by an RNA and protein complex known as the spliceosome, containing snRNPs designated U1, U2, U4, U5, and U6 (U3 is not involved in mRNA splicing).[19] U1 binds to the 5' GU and U2, with the assistance of the U2AF protein factors, binds to the branchpoint A within the branch site. The complex at this stage is known as the spliceosome A complex. Formation of the A complex is usually the key step in determining the ends of the intron to be spliced out, and defining the ends of the exon to be retained.[16] (The U nomenclature derives from their high uridine content).

The U4, U5, U6 complex binds, and U6 replaces the U1 position. U1 and U4 leave. The remaining complex then performs two transesterification reactions. In the first transesterification, the 5' end of the intron is cleaved from the upstream exon and joined to the branch site A by a 2',5'-phosphodiester linkage. In the second transesterification, the 3' end of the intron is cleaved from the downstream exon, and the two exons are joined by a phosphodiester bond. The intron is then released in lariat form and degraded.[1]

Regulatory elements and proteins

Splicing repression

Splicing is regulated by trans-acting proteins (repressors and activators) and corresponding cis-acting regulatory sites (silencers and enhancers) on the pre-mRNA. However, as part of the complexity of alternative splicing, it is noted that the effects of a splicing factor are frequently position-dependent. That is, a splicing factor that serves as a splicing activator when bound to an intronic enhancer element may serve as a repressor when bound to its splicing element in the context of an exon, and vice versa.[20] The secondary structure of the pre-mRNA transcript also plays a role in regulating splicing, such as by bringing together splicing elements or by masking a sequence that would otherwise serve as a binding element for a splicing factor.[21][22] Together, these elements form a "splicing code" that governs how splicing will occur under different cellular conditions.[23][24]

There are two major types of cis-acting RNA sequence elements present in pre-mRNAs and they have corresponding trans-acting RNA-binding proteins. Splicing silencers are sites to which splicing repressor proteins bind, reducing the probability that a nearby site will be used as a splice junction. These can be located in the intron itself (intronic splicing silencers, ISS) or in a neighboring exon (exonic splicing silencers, ESS). They vary in sequence, as well as in the types of proteins that bind to them. The majority of splicing repressors are heterogeneous nuclear ribonucleoproteins (hnRNPs) such as hnRNPA1 and polypyrimidine tract binding protein (PTB).[16][23] Splicing enhancers are sites to which splicing activator proteins bind, increasing the probability that a nearby site will be used as a splice junction. These also may occur in the intron (intronic splicing enhancers, ISE) or exon (exonic splicing enhancers, ESE). Most of the activator proteins that bind to ISEs and ESEs are members of the SR protein family. Such proteins contain RNA recognition motifs and arginine and serine-rich (RS) domains.[16][23]

Splicing activation

In general, the determinants of splicing work in an inter-dependent manner that depends on context, so that the rules governing how splicing is regulated form a splicing code.[24] The presence of a particular cis-acting RNA sequence element may increase the probability that a nearby site will be spliced in some cases, but decrease the probability in other cases, depending on context. The context within which regulatory elements act includes cis-acting context that is established by the presence of other RNA sequence features, and trans-acting context that is established by cellular conditions. For example, some cis-acting RNA sequence elements influence splicing only if multiple elements are present in the same region so as to establish context. As another example, a cis-acting element can have opposite effects on splicing, depending on which proteins are expressed in the cell (e.g., neuronal versus non-neuronal PTB). The adaptive significance of splicing silencers and enhancers is attested by studies showing that there is strong selection in human genes against mutations that produce new silencers or disrupt existing enhancers.[25][26]

Examples


Exon skipping: Drosophila dsx

Alternative splicing of dsx pre-mRNA

Pre-mRNAs from the D. melanogaster gene dsx contain 6 exons. In males, exons 1, 2, 3, 5, and 6 are joined to form the mRNA, which encodes a transcriptional regulatory protein required for male development. In females, exons 1, 2, 3, and 4 are joined, and a polyadenylation signal in exon 4 causes cleavage of the mRNA at that point. The resulting mRNA encodes a transcriptional regulatory protein required for female development.[27]

This is an example of exon skipping. The intron upstream from exon 4 has a polypyrimidine tract that doesn't match the consensus sequence well, so that U2AF proteins bind poorly to it without assistance from splicing activators. This 3' splice acceptor site is therefore not used in males. Females, however, produce the splicing activator Transformer (Tra) (see below). The SR protein Tra2 is produced in both sexes and binds to an ESE in exon 4; if Tra is present, it binds to Tra2 and, along with another SR protein, forms a complex that assists U2AF proteins in binding to the weak polypyrimidine tract. U2 is recruited to the associated branchpoint, and this leads to inclusion of exon 4 in the mRNA.[27][28]

Alternative acceptor sites: Drosophila Transformer

Alternative splicing of the Drosophila Transformer gene product.

Pre-mRNAs of the Transformer (Tra) gene of Drosophila melanogaster undergo alternative splicing via the alternative acceptor site mode. The gene Tra encodes a protein that is expressed only in females. The primary transcript of this gene contains an intron with two possible acceptor sites. In males, the upstream acceptor site is used. This causes a longer version of exon 2 to be included in the processed transcript, including an early stop codon. The resulting mRNA encodes a truncated protein product that is inactive. Females produce the master sex determination protein Sex lethal (Sxl). The Sxl protein is a splicing repressor that binds to an ISS in the RNA of the Tra transcript near the upstream acceptor site, preventing U2AF protein from binding to the polypyrimidine tract. This prevents the use of this junction, shifting the spliceosome binding to the downstream acceptor site. Splicing at this point bypasses the stop codon, which is excised as part of the intron. The resulting mRNA encodes an active Tra protein, which itself is a regulator of alternative splicing of other sex-related genes (see dsx above).[1]
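The two Drosophila examples form a short regulatory cascade: Sxl decides which Tra acceptor site is used, and functional Tra decides which dsx exons are joined. The sketch below encodes that logic as booleans. It is a deliberate simplification with hypothetical function names; Tra2 is treated as always present, as the text states it is produced in both sexes.

```python
def tra_acceptor_site(sxl_present):
    """Sxl blocks the upstream acceptor, shifting splicing downstream."""
    return "downstream" if sxl_present else "upstream"


def tra_is_functional(sxl_present):
    """The upstream acceptor keeps an early stop codon in the mRNA."""
    return tra_acceptor_site(sxl_present) == "downstream"


def dsx_exons(tra_functional):
    """Tra (with Tra2) helps U2AF use the weak acceptor site of exon 4."""
    if tra_functional:
        return [1, 2, 3, 4]      # female form, ends at the exon 4 poly(A) site
    return [1, 2, 3, 5, 6]       # male form, exon 4 skipped


def dsx_isoform(sxl_present):
    """Chain the two decisions: Sxl -> Tra -> dsx."""
    return dsx_exons(tra_is_functional(sxl_present))


print(dsx_isoform(sxl_present=True))   # [1, 2, 3, 4]
print(dsx_isoform(sxl_present=False))  # [1, 2, 3, 5, 6]
```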

Exon definition: Fas receptor

Alternative splicing of the Fas receptor pre-mRNA

Multiple isoforms of the Fas receptor protein are produced by alternative splicing. Two normally occurring isoforms in humans are produced by an exon-skipping mechanism. An mRNA including exon 6 encodes the membrane-bound form of the Fas receptor, which promotes apoptosis, or programmed cell death. Increased expression of Fas receptor in skin cells chronically exposed to the sun, and absence of expression in skin cancer cells, suggests that this mechanism may be important in elimination of pre-cancerous cells in humans.[29] If exon 6 is skipped, the resulting mRNA encodes a soluble Fas protein that does not promote apoptosis. The inclusion or skipping of the exon depends on two antagonistic proteins, TIA-1 and polypyrimidine tract-binding protein (PTB).

  • The 5' donor site in the intron downstream from exon 6 in the pre-mRNA agrees only weakly with the consensus sequence, and is not usually bound by the U1 snRNP. If U1 does not bind, the exon is skipped (see "a" in accompanying figure).
  • Binding of TIA-1 protein to an intronic splicing enhancer site stabilizes binding of the U1 snRNP.[16] The resulting 5' donor site complex assists in binding of the splicing factor U2AF to the 3' splice site upstream of the exon, through a mechanism that is not yet known (see b).[30]
  • Exon 6 contains a pyrimidine-rich exonic splicing silencer, ure6, where PTB can bind. If PTB binds, it inhibits the effect of the 5' donor complex on the binding of U2AF to the acceptor site, resulting in exon skipping (see c).

This mechanism is an example of exon definition in splicing. A spliceosome assembles on an intron, and the snRNP subunits fold the RNA so that the 5' and 3' ends of the intron are joined. However, recently studied examples such as this one show that there are also interactions between the ends of the exon. In this particular case, these exon definition interactions are necessary to allow the binding of core splicing factors prior to assembly of the spliceosomes on the two flanking introns.[30]
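The three cases for Fas exon 6 can be written as a small truth table. This sketch treats each binding event as all-or-nothing, which is an assumption made for clarity; in cells these interactions are probabilistic, and the function names are hypothetical.

```python
def fas_exon6_included(tia1_bound, ptb_bound):
    """Boolean simplification of the exon definition logic for Fas exon 6."""
    # Case a: the weak 5' donor site is not bound by U1 without help.
    # Case b: TIA-1 on the intronic enhancer stabilizes U1 binding.
    u1_bound = tia1_bound
    # Case c: PTB on the exonic silencer blocks the donor complex from
    # assisting U2AF at the upstream 3' splice site.
    u2af_recruited = u1_bound and not ptb_bound
    return u2af_recruited


def fas_isoform(tia1_bound, ptb_bound):
    if fas_exon6_included(tia1_bound, ptb_bound):
        return "membrane-bound Fas (promotes apoptosis)"
    return "soluble Fas (does not promote apoptosis)"
```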

Repressor-activator competition: HIV-1 tat exon 2

Alternative splicing of HIV-1 tat exon 2

HIV, the retrovirus that causes AIDS in humans, produces a single primary RNA transcript, which is alternatively spliced in multiple ways to produce over 40 different mRNAs.[31] Equilibrium among differentially spliced transcripts provides multiple mRNAs encoding different products that are required for viral multiplication.[32] One of the differentially spliced transcripts contains the tat gene, in which exon 2 is a cassette exon that may be skipped or included. The inclusion of tat exon 2 in the RNA is regulated by competition between the splicing repressor hnRNP A1 and the SR protein SC35. Within exon 2 an exonic splicing silencer sequence (ESS) and an exonic splicing enhancer sequence (ESE) overlap. If A1 repressor protein binds to the ESS, it initiates cooperative binding of multiple A1 molecules, extending into the 5' donor site upstream of exon 2 and preventing the binding of the core splicing factor U2AF35 to the polypyrimidine tract. If SC35 binds to the ESE, it prevents A1 binding and maintains the 5' donor site in an accessible state for assembly of the spliceosome. Competition between the activator and repressor ensures that both mRNA types (with and without exon 2) are produced.[31]

Adaptive significance


Genuine alternative splicing occurs in both protein-coding genes and non-coding genes to produce multiple products (proteins or non-coding RNAs). External information is needed in order to decide which product is made, given a DNA sequence and the initial transcript. Since the methods of regulation are inherited, this provides novel ways for mutations to affect gene expression.[33]

Alternative splicing may provide evolutionary flexibility. A single point mutation may cause a given exon to be occasionally excluded or included from a transcript during splicing, allowing production of a new protein isoform without loss of the original protein.[1] Studies have identified intrinsically disordered regions (see Intrinsically unstructured proteins) as enriched in the non-constitutive exons[34] suggesting that protein isoforms may display functional diversity due to the alteration of functional modules within these regions. Such functional diversity achieved by isoforms is reflected by their expression patterns and can be predicted by machine learning approaches.[35][36] Comparative studies indicate that alternative splicing preceded multicellularity in evolution, and suggest that this mechanism might have been co-opted to assist in the development of multicellular organisms.[37]

Research based on the Human Genome Project and other genome sequencing has shown that humans have only about 30% more genes than the roundworm Caenorhabditis elegans, and only about twice as many as the fly Drosophila melanogaster. This finding led to speculation that the perceived greater complexity of humans, or vertebrates generally, might be due to higher rates of alternative splicing in humans than are found in invertebrates.[38][39] However, a study on samples of 100,000 expressed sequence tags (EST) each from human, mouse, rat, cow, fly (D. melanogaster), worm (C. elegans), and the plant Arabidopsis thaliana found no large differences in frequency of alternatively spliced genes among humans and any of the other animals tested.[40] Another study, however, proposed that these results were an artifact of the different numbers of ESTs available for the various organisms. When they compared alternative splicing frequencies in random subsets of genes from each organism, the authors concluded that vertebrates do have higher rates of alternative splicing than invertebrates.[41]
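The concern about unequal EST numbers can be made concrete with a simple sampling argument. If a gene has two isoforms and the minor one makes up a fraction f of its transcripts, then among n independently sampled ESTs both isoforms are seen with probability 1 − f^n − (1 − f)^n. The sketch below evaluates this; the independence assumption and the example numbers are illustrative and are not taken from the cited studies.

```python
def detection_probability(minor_fraction, n_ests):
    """Chance that n independent ESTs include both of two isoforms."""
    f = minor_fraction
    only_minor = f ** n_ests          # every EST came from the minor isoform
    only_major = (1.0 - f) ** n_ests  # every EST came from the major isoform
    return 1.0 - only_minor - only_major


# A minor isoform at 10% of transcripts, sampled at two depths.
for n in (5, 50):
    print(n, round(detection_probability(0.1, n), 4))
```

With five ESTs per gene the alternative isoform is detected less than half the time, while with fifty it is almost always detected, so organisms with deeper EST coverage will appear to have more alternatively spliced genes unless the comparison controls for depth.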

Disease


Changes in the RNA processing machinery may lead to mis-splicing of multiple transcripts, while single-nucleotide alterations in splice sites or cis-acting splicing regulatory sites may lead to differences in splicing of a single gene, and thus in the mRNA produced from a mutant gene's transcripts. A study in 2005 involving probabilistic analyses indicated that greater than 60% of human disease-causing mutations affect splicing rather than directly affecting coding sequences.[42] A more recent study indicates that one-third of all hereditary diseases are likely to have a splicing component.[20] Regardless of exact percentage, a number of splicing-related diseases do exist.[43] As described below, a prominent example of splicing-related diseases is cancer.

Abnormally spliced mRNAs are also found in a high proportion of cancerous cells.[44][45][46] Combined RNA-Seq and proteomics analyses have revealed striking differential expression of splice isoforms of key proteins in important cancer pathways.[47] It is not always clear whether such aberrant patterns of splicing contribute to the cancerous growth, or are merely a consequence of cellular abnormalities associated with cancer. For certain types of cancer, such as colorectal and prostate cancer, the number of splicing errors per cancer has been shown to vary greatly between individual cancers, a phenomenon referred to as transcriptome instability.[48][49] Transcriptome instability has further been shown to correlate strongly with reduced expression levels of splicing factor genes. Mutation of DNMT3A has been demonstrated to contribute to hematologic malignancies, and DNMT3A-mutated cell lines exhibit transcriptome instability as compared to their isogenic wildtype counterparts.[50]

There is in fact a reduction of alternative splicing in cancerous cells compared to normal ones, and the types of splicing differ; for instance, cancerous cells show higher levels of intron retention than normal cells, but lower levels of exon skipping.[51] Some of the differences in splicing in cancerous cells may be due to the high frequency of somatic mutations in splicing factor genes,[46] and some may result from changes in phosphorylation of trans-acting splicing factors.[33] Others may be produced by changes in the relative amounts of splicing factors produced; for instance, breast cancer cells have been shown to have increased levels of the splicing factor SF2/ASF.[52] One study found that a relatively small percentage (383 out of over 26,000) of alternative splicing variants were significantly higher in frequency in tumor cells than normal cells, suggesting that there is a limited set of genes which, when mis-spliced, contribute to tumor development.[53] It is believed, however, that the deleterious effects of mis-spliced transcripts are usually limited because such transcripts are eliminated by a cellular post-transcriptional quality control mechanism termed nonsense-mediated mRNA decay (NMD).[54]

One example of a specific splicing variant associated with cancers is in one of the human DNMT genes. Three DNMT genes encode enzymes that add methyl groups to DNA, a modification that often has regulatory effects. Several abnormally spliced DNMT3B mRNAs are found in tumors and cancer cell lines. In two separate studies, expression of two of these abnormally spliced mRNAs in mammalian cells caused changes in the DNA methylation patterns in those cells. Cells with one of the abnormal mRNAs also grew twice as fast as control cells, indicating a direct contribution to tumor development by this product.[33]

Another example is the Ron (MST1R) proto-oncogene. An important property of cancerous cells is their ability to move and invade normal tissue. Production of an abnormally spliced transcript of Ron has been found to be associated with increased levels of SF2/ASF in breast cancer cells. The abnormal isoform of the Ron protein encoded by this mRNA leads to cell motility.[52]

Overexpression of a truncated splice variant of the FOSB gene – ΔFosB – in a specific population of neurons in the nucleus accumbens has been identified as the causal mechanism involved in the induction and maintenance of an addiction to drugs and natural rewards.[55][56][57][58]

Recent provocative studies point to a key function of chromatin structure and histone modifications in alternative splicing regulation. These insights suggest that epigenetic regulation determines not only what parts of the genome are expressed but also how they are spliced.[59]

Genome-scale (transcriptome-wide) analysis


Transcriptome-wide analysis of alternative splicing is typically performed by high-throughput RNA sequencing, most commonly short-read sequencing, such as on Illumina instruments. Long-read sequencing, such as on Nanopore or PacBio instruments, is even more informative. Transcriptome-wide analyses can, for example, be used to measure the amount of deviating alternative splicing, such as in a cancer cohort.[60]

Deep sequencing technologies have been used to conduct genome-wide analyses of both unprocessed and processed mRNAs; thus providing insights into alternative splicing. For example, results from use of deep sequencing indicate that, in humans, an estimated 95% of transcripts from multiexon genes undergo alternative splicing, with a number of pre-mRNA transcripts spliced in a tissue-specific manner.[2] Functional genomics and computational approaches based on multiple instance learning have also been developed to integrate RNA-seq data to predict functions for alternatively spliced isoforms.[36] Deep sequencing has also aided in the in vivo detection of the transient lariats that are released during splicing, the determination of branch site sequences, and the large-scale mapping of branchpoints in human pre-mRNA transcripts.[61]
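In RNA-seq data, inclusion of a cassette exon is often summarized as "percent spliced in" (PSI): the share of junction-spanning reads that support inclusion. PSI is not named in the text above; it is introduced here as a widely used convention, and the function name, argument names, and read counts below are illustrative assumptions.

```python
def percent_spliced_in(upstream_junction_reads,
                       downstream_junction_reads,
                       skipping_junction_reads):
    """PSI for one cassette exon from junction-spanning read counts.

    Inclusion is supported by two junctions (into and out of the exon), so
    their counts are averaged before comparison with the single junction
    that supports skipping. Returns None when no reads cover the event.
    """
    inclusion = (upstream_junction_reads + downstream_junction_reads) / 2
    total = inclusion + skipping_junction_reads
    if total == 0:
        return None
    return inclusion / total


# Hypothetical counts for the same exon in two tissues.
print(percent_spliced_in(30, 30, 10))  # 0.75
print(percent_spliced_in(4, 6, 45))    # 0.1
```

Comparing PSI values between samples is one simple way to flag tissue-specific splicing; dedicated tools additionally model read depth and uncertainty, which this sketch ignores.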

More historically, alternatively spliced transcripts have been found by comparing EST sequences, but this requires sequencing of very large numbers of ESTs. Most EST libraries come from a very limited number of tissues, so tissue-specific splice variants are likely to be missed in any case. High-throughput approaches to investigate splicing have, however, been developed, such as: DNA microarray-based analyses, RNA-binding assays, and deep sequencing. These methods can be used to screen for polymorphisms or mutations in or around splicing elements that affect protein binding. When combined with splicing assays, including in vivo reporter gene assays, the functional effects of polymorphisms or mutations on the splicing of pre-mRNA transcripts can then be analyzed.[20][23][62]

In microarray analysis, arrays of DNA fragments representing individual exons (e.g. Affymetrix exon microarray) or exon/exon boundaries (e.g. arrays from ExonHit or Jivan) have been used. The array is then probed with labeled cDNA from tissues of interest. The probe cDNAs bind to DNA from the exons that are included in mRNAs in their tissue of origin, or to DNA from the boundary where two exons have been joined. This can reveal the presence of particular alternatively spliced mRNAs.[63]
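A probe for an exon/exon boundary is simply the end of one exon followed by the start of another, a sequence that exists only in mRNAs where those two exons were joined. The sketch below builds such probes for every ordered exon pair. The exon sequences are invented, the function names are hypothetical, and the 4-base flank is far shorter than real array probes; strand orientation is also ignored.

```python
from itertools import combinations


def junction_probe(upstream_exon, downstream_exon, flank=4):
    """Last `flank` bases of one exon joined to the first `flank` of another."""
    if len(upstream_exon) < flank or len(downstream_exon) < flank:
        raise ValueError("exon shorter than the requested flank")
    return upstream_exon[-flank:] + downstream_exon[:flank]


def all_junction_probes(exons, flank=4):
    """Probes for joining exon i to any later exon j, keyed by (i, j)."""
    probes = {}
    for (i, up), (j, down) in combinations(enumerate(exons, start=1), 2):
        probes[(i, j)] = junction_probe(up, down, flank)
    return probes


# Invented exon sequences (DNA alphabet, as on an array).
EXAMPLE_EXONS = ["ATGGCT", "GAACCG", "TTCGGA"]
PROBES = all_junction_probes(EXAMPLE_EXONS)
```

In this example a signal on probe `(1, 3)` but not on `(1, 2)` or `(2, 3)` would indicate that exon 2 was skipped in the sampled tissue.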

CLIP (Cross-linking and immunoprecipitation) uses UV radiation to link proteins to RNA molecules in a tissue during splicing. A trans-acting splicing regulatory protein of interest is then precipitated using specific antibodies. When the RNA attached to that protein is isolated and cloned, it reveals the target sequences for that protein.[64] Another method for identifying RNA-binding proteins and mapping their binding to pre-mRNA transcripts is "Microarray Evaluation of Genomic Aptamers by shift (MEGAshift)".[65] This method involves an adaptation of the "Systematic Evolution of Ligands by Exponential Enrichment (SELEX)" method[66] together with a microarray-based readout. Use of the MEGAshift method has provided insights into the regulation of alternative splicing by allowing for the identification of sequences in pre-mRNA transcripts surrounding alternatively spliced exons that mediate binding to different splicing factors, such as ASF/SF2 and PTB.[67] This approach has also been used to aid in determining the relationship between RNA secondary structure and the binding of splicing factors.[22]

Use of reporter assays makes it possible to find the splicing proteins involved in a specific alternative splicing event by constructing reporter genes that will express one of two different fluorescent proteins depending on the splicing reaction that occurs. This method has been used to isolate mutants affecting splicing and thus to identify novel splicing regulatory proteins inactivated in those mutants.[64]

Recent advancements in protein structure prediction have facilitated the development of new tools for genome annotation and alternative splicing analysis. For instance, isoform.io, a platform guided by protein structure predictions, has evaluated hundreds of thousands of isoforms of human protein-coding genes assembled from numerous RNA sequencing experiments across a variety of human tissues. This comprehensive analysis has led to the identification of numerous isoforms with more confidently predicted structure and potentially superior function compared to canonical isoforms in the latest human gene database. By integrating structural predictions with expression and evolutionary evidence, this approach has demonstrated the potential of protein structure prediction as a tool for refining the annotation of the human genome.[68]

Databases


There is a collection of alternative splicing databases.[69][70][71] These databases are useful for finding genes having pre-mRNAs undergoing alternative splicing and alternative splicing events or to study the functional impact of alternative splicing.

from Grokipedia
Alternative splicing is a key post-transcriptional regulatory mechanism in eukaryotic cells that allows a single precursor (pre-mRNA) to be processed into multiple distinct mature mRNA isoforms through the selective inclusion or exclusion of exons, or the use of alternative splice sites. This process dramatically expands the proteome's diversity from a limited , with over 95% of multi-exon estimated to undergo alternative splicing, enabling the production of hundreds or even thousands of protein variants from one , such as in the Drosophila Dscam that generates up to 38,000 isoforms. By facilitating tissue-specific expression, , and adaptive responses, alternative splicing underpins the complexity of multicellular organisms despite their relatively modest number of approximately 20,000 protein-coding . The discovery of alternative splicing emerged from groundbreaking studies on RNA processing in the 1970s, building on the initial identification of introns in adenoviruses. In 1977, researchers including Susan Berget, Phillip Sharp, and Louise Chow reported the first evidence of alternative splicing in the adenovirus genome, demonstrating that a single pre-mRNA could be spliced in multiple ways to yield distinct mRNA products. This finding was soon extended to cellular genes, such as the immunoglobulin M (IgM) gene in 1980, highlighting its role in immune diversity. In 1978, Walter Gilbert formalized the "introns-early" hypothesis, proposing that genes are composed of exons and introns to allow combinatorial splicing and increase informational capacity beyond the linear DNA sequence. 
At its core, alternative splicing is executed by the spliceosome, a dynamic ribonucleoprotein complex assembled from five small nuclear ribonucleoproteins (snRNPs: U1, U2, U4, U5, and U6) and numerous protein factors. The spliceosome identifies conserved splice site sequences (typically GU at the 5' end and AG at the 3' end of introns) and removes introns through two sequential transesterification reactions that form a lariat intermediate. The main types of alternative splicing events are cassette exon inclusion or exclusion (the most common, accounting for roughly 30 to 40% of events), alternative 5' or 3' splice sites, mutually exclusive exons, and intron retention, each contributing to isoform variability. While the majority of splicing follows the U2-type pathway, a minor U12-type pathway handles a rare class of introns, including those with AT-AC termini.

Alternative splicing is tightly regulated to ensure specificity and efficiency. Cis-acting elements, namely exonic splicing enhancers (ESEs) and silencers (ESSs) and their intronic counterparts (ISEs and ISSs), modulate splice site recognition. These elements interact with trans-acting factors, including serine/arginine-rich (SR) proteins, which generally promote splicing, and heterogeneous nuclear ribonucleoproteins (hnRNPs), which generally repress it. The activities of these factors are influenced by cellular signaling, transcription elongation rates, chromatin modifications, and RNA secondary structure. For instance, co-transcriptional splicing couples the process to nascent RNA production, allowing real-time adjustments based on promoter usage or chromatin marks. Defects in these regulators, such as mutations in the spliceosomal components SF3B1 or SRSF2, can lead to aberrant splicing patterns.

Biologically, alternative splicing is indispensable for development and is exemplified by its role in immune diversity, such as alternative splicing of immunoglobulin genes to produce both membrane-bound receptors and secreted antibodies. It also drives neuronal complexity, muscle function, and cell differentiation, with tissue-specific isoforms enabling functional specialization. However, aberrant alternative splicing is a hallmark of many diseases; for example, it contributes to oncogenesis in over 30 cancer types, with upregulated splicing events observed in large cohorts such as The Cancer Genome Atlas (TCGA), and to neurodegenerative conditions such as amyotrophic lateral sclerosis (ALS) through mutations in RNA-binding proteins. Recent advances, including high-throughput sequencing and CRISPR-based editing, have illuminated these pathways, paving the way for spliceosome-targeted therapies, such as small-molecule modulators in clinical trials for myelodysplastic syndromes and leukemias.

History and Discovery

Early Observations

In the 1960s and early 1970s, studies on eukaryotic cells revealed the existence of heterogeneous nuclear RNA (hnRNA), a class of rapidly labeled, short-lived transcripts that were significantly larger and more heterogeneous in size than the mature messenger RNAs (mRNAs) found in the cytoplasm. These observations, made through techniques such as pulse-labeling and sedimentation analysis, suggested that hnRNA served as a precursor to mRNA but required extensive processing to remove non-coding sequences, challenging the prevailing view of direct, continuous transcription from DNA to mRNA. Researchers such as James Darnell and Sheldon Penman noted that hnRNA molecules could be up to 10 times larger than their cytoplasmic counterparts, hinting at an unknown mechanism for trimming or rearranging RNA segments.

This puzzle intensified with investigations into viral transcripts, particularly from adenoviruses, where discrepancies between genomic DNA and mRNA sequences became evident. In 1977, independent experiments by Phillip A. Sharp's group at the Massachusetts Institute of Technology and Richard J. Roberts's team at Cold Spring Harbor Laboratory used electron microscopy to visualize RNA:DNA hybrids, revealing "loops" of unpaired DNA corresponding to intervening sequences (later termed introns) that were absent from the mature mRNA. Sharp's team, including Susan M. Berget and Claire Moore, mapped spliced segments at the 5' terminus of adenovirus 2 late mRNA, showing that a single mRNA molecule aligned with multiple discontinuous DNA regions. Similarly, Roberts's collaborators, including Louise T. Chow, Thomas R. Broker, and Robert E. Gelinas, produced detailed maps of cytoplasmic adenovirus type 2 transcripts, confirming the split gene structure through R-loop formations observed under the electron microscope.

These findings overturned the pre-1977 dogma of colinearity, which held that genes were continuous DNA segments directly transcribed into uninterrupted mRNA, as established by earlier bacterial studies and the one-gene-one-polypeptide hypothesis. The discovery of split genes in adenovirus demonstrated that eukaryotic genes contain non-coding introns that are excised during RNA processing, establishing RNA splicing as a fundamental step in gene expression. For this work, Sharp and Roberts shared the 1993 Nobel Prize in Physiology or Medicine, recognizing its profound implications for eukaryotic genome organization.

Key Milestones and Pioneers

The discovery of introns and RNA splicing in 1977 by Susan Berget, Claire Moore, and Phillip A. Sharp marked a foundational milestone in understanding eukaryotic gene expression, as their work on adenovirus transcripts revealed non-contiguous gene segments joined during mRNA maturation. This breakthrough, made in parallel with Richard Roberts's independent findings, earned Sharp and Roberts the 1993 Nobel Prize in Physiology or Medicine for demonstrating split genes. Concurrently, Joan A. Steitz's identification of small nuclear ribonucleoproteins (snRNPs) in the late 1970s and her 1980 proposal of their role in pre-mRNA splicing laid the groundwork for elucidating the splicing machinery.

In the 1980s, the first alternative splicing events were identified in cellular genes, including the immunoglobulin μ heavy chain locus, where alternative polyadenylation and splicing generate membrane-bound and secreted IgM isoforms from a single pre-mRNA. Similarly, alternative splicing in the fibronectin gene was documented, producing cell-type-specific isoforms through inclusion or exclusion of the extra type III domains (EDA and EDB) and the IIICS region, highlighting the role of splicing in protein diversity. Parallel efforts characterized spliceosome components; Walter Keller's group developed the first in vitro splicing system using cell extracts in 1983, enabling mechanistic studies of splice site recognition and catalysis.

The 1990s and 2000s saw technological advances that revealed how prevalent alternative splicing is across the genome. Reverse transcription polymerase chain reaction (RT-PCR), developed in the late 1980s, facilitated isoform-specific detection and quantification, revealing widespread splicing variation in development and disease. High-throughput sequencing technologies in the mid-2000s enabled genome-wide profiling, leading to estimates that over 90% of human multi-exon genes undergo alternative splicing. Tom Maniatis pioneered studies of regulatory networks, demonstrating in 1992 how splicing enhancers and silencers, bound by SR proteins and hnRNP factors, control exon inclusion in a combinatorial manner.

Post-2010 milestones include long-read sequencing technologies such as Pacific Biosciences (PacBio) SMRT and Oxford Nanopore, which capture full-length transcripts and so resolve complex isoforms that short-read methods cannot. For instance, PacBio sequencing in 2015 identified thousands of novel isoforms in cell lines by spanning entire genes without assembly artifacts. In the 2020s, integration with single-cell sequencing has produced splicing atlases, such as long-read single-nucleus analyses of tissues, which map cell-type-specific isoform usage and dynamic splicing changes across states such as development and stress. One example is SCSES, developed in 2025 for deciphering splicing heterogeneity at single-cell resolution.

Fundamentals of RNA Splicing

Core Splicing Machinery

The core splicing machinery consists of the spliceosome, a large ribonucleoprotein complex that catalyzes the removal of introns from pre-mRNA in eukaryotic cells through two transesterification reactions. It is composed primarily of five small nuclear ribonucleoproteins (snRNPs: U1, U2, U4, U5, and U6), each containing a uridine-rich small nuclear RNA (snRNA) associated with Sm core proteins (B/B', D1, D2, D3, E, F, G; U6 instead carries the related LSm proteins) and additional specific proteins, along with numerous non-snRNP factors such as U2AF, SF1/mBBP, and DEAH-box helicases like Prp2, Prp16, and Prp22. These components enable the precise recognition of splice sites and the execution of splicing chemistry, which is essential for constitutive intron excision and serves as the foundation for alternative splicing variations.

Spliceosome assembly occurs in a stepwise, ATP-dependent manner on the pre-mRNA, forming distinct intermediates: the commitment (E) complex, the prespliceosome (A) complex, the pre-catalytic (B) complex, the catalytically activated (B*) complex, and the catalytic (C) complex. In the E complex, U1 snRNP binds the 5' splice site via base-pairing between U1 snRNA and the consensus sequence MAG|GURAGU (where M = A/C, R = A/G, and | denotes the exon-intron boundary), while SF1/mBBP binds the branch point sequence (BPS) YNYURAC and U2AF65/35 binds the polypyrimidine tract and the AG at the 3' splice site. The transition to the A complex requires the ATP-dependent helicase Prp5 and allows U2 snRNP to bind the BPS through base-pairing with U2 snRNA, positioning the bulged branch point adenosine for nucleophilic attack. Recruitment of the U4/U6·U5 tri-snRNP then forms the B complex, during which the helicase Prp28 displaces U1 from the 5' splice site. The B complex contains over 100 proteins and prepares the spliceosome for activation.

The splicing reaction proceeds in two catalytic steps within the activated spliceosome. First, the 2'-OH of the branch point adenosine attacks the 5' splice site, cleaving the 5' exon and forming a lariat intron-3' exon intermediate. This step follows the near-invariant GU-AG rule: introns typically begin with GU (GT in DNA) at the 5' end and end with AG at the 3' end. In the second step, the 3'-OH of the freed 5' exon attacks the 3' splice site (consensus YAG|G), ligating the exons and releasing the lariat intron. The conformational rearrangements that accompany these transesterifications are driven by ATP-dependent DEAH-box helicases: Prp2 for catalytic activation to the B* complex, Prp16 for remodeling between the first and second steps, and Prp22 for release of the spliced mRNA after the second step. Catalysis occurs in a magnesium-dependent active site formed by U6 snRNA and supported by the Prp8 protein, where two Mg²⁺ ions stabilize the transition states by neutralizing negative charges on the RNA backbone during phosphoryl transfer. Alternative splicing arises when competing recognition elements direct this same machinery to different splice sites.
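The net outcome of the two catalytic steps (intron out, exons joined) can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of the chemistry: the `splice` function, the half-open coordinate convention, and the toy sequences are assumptions made for this example.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Join exons after removing introns given as half-open (start, end) spans."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # GU-AG rule: U2-type introns start with GU and end with AG.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU-AG rule")
        exons.append(pre_mrna[cursor:start])  # exon upstream of this intron
        cursor = end                          # resume after the 3' splice site
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


# Toy transcript (hypothetical sequences): exon 1, one intron, exon 2.
EXON1 = "AUGGCAG"
INTRON = "GUAAGUCCUACUAACUUUUUCCAG"
EXON2 = "GUCUAA"
PRE_MRNA = EXON1 + INTRON + EXON2

mature = splice(PRE_MRNA, [(len(EXON1), len(EXON1) + len(INTRON))])
print(mature)  # exon 1 followed directly by exon 2
```

Passing a different list of intron spans for the same pre-mRNA is the simplest way to picture alternative splicing in this model: the transcript is unchanged, only the chosen boundaries differ.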

Recognition of Splice Sites

The recognition of splice sites in pre-mRNA is a critical initial step in splicing, relying on specific sequence motifs at the 5' and 3' boundaries of introns, as well as auxiliary elements that modulate accuracy. The 5' splice site (5' SS) is typically defined by a consensus sequence that spans the exon-intron junction, often represented as MAG|GURAGU in mammals, where the intronic GU dinucleotide is nearly invariant and essential for recognition. This GU is preceded by an exonic nucleotide (commonly A or G) and followed by additional intronic residues that contribute to specificity. The 3' splice site (3' SS), located at the intron-exon boundary, features a conserved AG dinucleotide immediately upstream of the exon, preceded by a polypyrimidine tract (Py tract) consisting of multiple U and C residues, which enhances binding affinity. These motifs are degenerate, allowing variability while maintaining core functionality, and their identification initiates recruitment of splicing factors.

Central to 3' SS recognition is the branch point sequence (BPS), a conserved motif located 18-40 nucleotides upstream of the 3' SS in mammalian introns, with the consensus YNYURAC (Y = pyrimidine, R = purine, N = any nucleotide), where the bulged adenosine serves as the nucleophile in the first catalytic step. The BPS is bound specifically by splicing factor 1 (SF1, also known as mammalian branch point binding protein or mBBP), which recognizes the single-stranded RNA via its KH domain and protects approximately seven nucleotides centered on the branch point. SF1/mBBP binding is cooperative with U2AF65, a subunit of the U2 auxiliary factor that interacts with the adjacent Py tract, increasing affinity up to 20-fold and stabilizing the E complex during early assembly. Additionally, exonic and intronic splicing enhancers (ESEs/ISEs) and silencers (ESSs/ISSs) provide contextual modulation, promoting or inhibiting splice site usage through interactions with regulatory proteins, though their effects are secondary to core motif recognition.

At the 5' SS, initial molecular recognition occurs via base-pairing between the pre-mRNA and the 5' end of U1 small nuclear RNA (snRNA) within the U1 snRNP, forming an RNA duplex that spans up to 11 nucleotides across the exon-intron junction, with the intronic GU pairing with C8 and A7 of U1 snRNA for stability. This interaction is dynamic and can accommodate atypical sites through shifted registers, ensuring broad yet specific targeting. The 3' SS elements, including the BPS and Py tract, are then engaged by U2 snRNP components after U1 binding, bridging the two sites. These recognition events precede full spliceosome assembly.

Splice site consensus sequences are evolutionarily conserved across eukaryotes, though with varying degeneracy. The GU-AG rule is nearly universal for U2-type introns, while BPS motifs show greater divergence outside vertebrates, with a CURAY-type consensus reported in some lineages versus YNYURAC in metazoans. Comparative analyses across 30 eukaryotic species reveal shared two-site statistical patterns in the 5' SS, indicating ancient origins, but with clade-specific adaptations that influence splicing fidelity. This conservation underscores the fundamental role of these motifs in eukaryotic gene expression.
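The consensus motifs above are written in IUPAC degenerate notation, which maps directly onto regular-expression character classes. The sketch below shows that mapping for the two motifs quoted in this section; the helper names and the test sequences are invented for illustration, and exact consensus matching is far stricter than real splice site recognition, which tolerates many mismatches.

```python
import re

# IUPAC degenerate codes used in this section, written for RNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "M": "[AC]",
    "N": "[ACGU]",  # any nucleotide
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus))


def find_sites(pattern: re.Pattern, rna: str) -> list[int]:
    """Return start positions of all matches, including overlapping ones."""
    lookahead = re.compile(f"(?={pattern.pattern})")
    return [m.start() for m in lookahead.finditer(rna)]


FIVE_PRIME_SS = consensus_to_regex("MAGGURAGU")  # MAG|GURAGU without the boundary mark
BRANCH_POINT = consensus_to_regex("YNYURAC")

print(find_sites(FIVE_PRIME_SS, "CAGGUAAGU"))
print(find_sites(BRANCH_POINT, "GGUACUAACGG"))
```

A position weight matrix, as used by tools that score splice site strength, is the usual next step when partial matches need to be ranked rather than accepted or rejected.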

Types of Alternative Splicing

Exon Skipping

Exon skipping, also known as cassette exon exclusion, is a fundamental mode of alternative splicing in which one or more internal exons are omitted from the mature messenger RNA (mRNA), resulting in the direct ligation of the upstream and downstream exons by the spliceosome. This process generates protein isoforms that lack the encoded sequences, often leading to truncated or structurally altered proteins. In contrast to constitutive splicing, where all exons are included, exon skipping allows diverse mRNA variants to be produced from a single pre-mRNA transcript, enhancing proteomic complexity in eukaryotic cells.

The mechanistic basis of exon skipping is the failure of the spliceosome to assemble across the boundaries of the target exon, typically because its 5' or 3' splice sites are weaker than the consensus sequences and thus less efficiently bound by core splicing factors such as the U1 and U2 small nuclear ribonucleoproteins (snRNPs). This inefficiency leads the spliceosome to pair the splice sites of the adjacent exons instead, effectively bypassing the intervening sequence. Additionally, cis-acting regulatory elements, such as exonic or intronic splicing silencers (ESS or ISS), can recruit repressive trans-acting factors like heterogeneous nuclear ribonucleoproteins (hnRNPs) to further inhibit exon inclusion by blocking enhancer-mediated activation or directly interfering with splice site accessibility.

Exon skipping is the predominant form of alternative splicing in humans, comprising approximately 40% of all documented alternative splicing events across multi-exon genes, far outpacing other modes such as alternative splice site usage or intron retention. This prevalence underscores its role as a primary mechanism for generating isoform diversity, with high-throughput sequencing studies revealing tens of thousands of such events in the human transcriptome, many of which are tissue-specific or developmentally regulated. The abundance of exon skipping is particularly pronounced in vertebrates, where it facilitates adaptive responses to physiological demands without requiring genomic duplication.

Functionally, exon skipping often results in the deletion of specific protein domains or motifs, yielding shorter isoforms that differ from their full-length counterparts in localization, stability, interactions, or enzymatic activity. For instance, the exclusion of an exon encoding a regulatory domain can shift a protein's conformational state or abolish binding to partners, thereby altering downstream signaling or structural roles within cellular complexes. Such outcomes enable fine-tuned protein functions tailored to distinct contexts, though skipping an exon whose length is not a multiple of three introduces a frameshift, which can create a premature termination codon and expose the transcript to nonsense-mediated decay. Overall, these isoform variations contribute to cellular plasticity and functional specialization across tissues.
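Whether skipping an exon deletes a clean block of amino acids or shifts the reading frame depends only on the exon's length modulo three. The following sketch makes that arithmetic concrete; the function names and the three-exon toy transcript are hypothetical, and the model assumes the skipped exon lies entirely within the coding region.

```python
def skip_exon(exons: list[str], index: int) -> str:
    """Return the mRNA produced when the internal exon at `index` is skipped."""
    if index <= 0 or index >= len(exons) - 1:
        raise ValueError("only internal exons can be skipped as cassette exons")
    # Ligate the upstream exons directly to the downstream exons.
    return "".join(exons[:index] + exons[index + 1:])


def frame_effect(exon: str) -> str:
    """Classify the consequence of skipping a coding exon by its length."""
    # A length divisible by 3 removes whole codons; anything else shifts the frame.
    return "in-frame deletion" if len(exon) % 3 == 0 else "frameshift"


# Toy transcript (hypothetical): three exons, the middle one is the cassette exon.
exons = ["AUGGCU", "GAAGGC", "UUUUAA"]

print("".join(exons))             # full-length isoform
print(skip_exon(exons, 1))        # skipped isoform
print(frame_effect(exons[1]))     # 6 nt cassette exon
print(frame_effect("GAAG"))       # 4 nt cassette exon
```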

Alternative Splice Site Usage

Alternative splice site usage refers to the process in which multiple potential 5' donor or 3' acceptor sites within or near an exon are recognized during pre-mRNA splicing, producing transcript isoforms with different exon boundaries. This type of alternative splicing includes alternative 5' splice sites, where different donor sites at the upstream end of an intron are selected, and alternative 3' splice sites, where different acceptor sites at its downstream end are chosen. In effect, site selection includes or excludes part of an exon rather than the whole exon, allowing fine-tuned regulation of isoform diversity without altering entire exons.

The mechanism underlying alternative splice site usage primarily involves competition among nearby splice sites for recognition by the spliceosome, with selection influenced by the intrinsic strength of each site. For 5' donors, strength is determined by the degree of base-pairing complementarity between the splice site sequence and U1 snRNA; for 3' acceptors, it depends on how well the polypyrimidine tract, AG dinucleotide, and branch point are recognized by U2AF and U2 snRNP. Stronger matches promote higher binding affinity and preferential usage. For instance, a 5' splice site with extended complementarity to the 5' end of U1 snRNA enhances early spliceosome assembly, outcompeting weaker nearby sites and directing isoform production. Regulatory factors can further modulate this competition by altering site accessibility or snRNP binding efficiency.

In humans, alternative 5' and 3' splice site usage each accounts for approximately 25% of all alternative splicing events, making this a prevalent mode that contributes significantly to transcriptome complexity across multi-exon genes. These events often result in isoforms with altered coding sequences, such as insertions or deletions of a few amino acids, or variable untranslated regions (UTRs) that affect mRNA stability, localization, or translation efficiency. For example, alternative 3' processing in the calcitonin gene generates isoforms with distinct C-terminal regions, affecting protein function.

Many isoforms arising from alternative splice site usage introduce premature termination codons (PTCs), particularly when the reading frame is shifted, rendering them substrates for nonsense-mediated mRNA decay (NMD), which prevents accumulation of truncated proteins. Up to one-third of human alternative splicing events, including those from alternative site usage, are coupled to NMD, thereby regulating steady-state mRNA levels and providing a quality-control mechanism. This AS-NMD linkage is widespread, with genomic analyses showing that a substantial portion of transcripts produced by alternative site selection are degraded post-transcriptionally.
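To illustrate how a choice between two competing 5' donor sites can introduce a premature termination codon, the sketch below builds both isoforms from one toy pre-mRNA and scans each for its first in-frame stop codon. Everything here is an assumption made for illustration: the sequences, the coordinates, and the simplification that a stop is "premature" whenever it is not the final codon (real NMD depends on the position of the stop relative to exon junctions, which this model does not capture).

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def isoform(pre_mrna: str, donor: int, acceptor: int) -> str:
    """Remove the intron between a 5' donor position and a 3' acceptor position."""
    return pre_mrna[:donor] + pre_mrna[acceptor:]


def first_stop(mrna: str):
    """Index of the first in-frame stop codon, reading from position 0, or None."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(mrna: str) -> bool:
    """True when the first in-frame stop is not the last codon of the mRNA."""
    stop = first_stop(mrna)
    return stop is not None and stop + 3 < len(mrna)


# Toy pre-mRNA (hypothetical): exon 1, a 6 nt extension used only by the
# distal donor, the shared intron, and exon 2.
PRE_MRNA = "AUGGCC" + "GUUUGA" + "GUAAGUCCCAG" + "GCUGGAUAA"
PROXIMAL_DONOR, DISTAL_DONOR, ACCEPTOR = 6, 12, 23

short_isoform = isoform(PRE_MRNA, PROXIMAL_DONOR, ACCEPTOR)
long_isoform = isoform(PRE_MRNA, DISTAL_DONOR, ACCEPTOR)

for name, mrna in [("proximal donor", short_isoform), ("distal donor", long_isoform)]:
    print(name, mrna, "premature stop:", has_premature_stop(mrna))
```

In this toy case the distal donor retains six extra nucleotides that happen to contain an in-frame UGA, so the longer isoform terminates early even though the frame itself is preserved.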

Intron Retention and Mutually Exclusive Exons

Intron retention is a form of alternative splicing in which one or more introns are not removed from the pre-mRNA and remain in the mature mRNA transcript. Retention often introduces premature stop codons that trigger nonsense-mediated decay (NMD), thereby reducing the levels of the corresponding protein, but it can also produce novel protein isoforms if the retained intron does not disrupt the reading frame or if the transcript evades NMD. Intron retention is particularly prevalent in plants, where it is the most common alternative splicing event, accounting for 28% to 64% of such events depending on growth conditions and species. In mammals, it is less dominant but widespread, affecting transcripts from over 80% of coding genes across tissues and representing approximately 15% of alternative splicing events in certain contexts such as neurodevelopment or stress. It frequently emerges as a regulatory response to environmental stresses in both plants and mammals, modulating gene expression by delaying or fine-tuning protein production.

Mutually exclusive exons represent another specialized mode of alternative splicing, in which only one exon from a pair (or cluster) of adjacent, cassette-like exons is included in the mature mRNA while the other is excluded, effectively swapping structurally similar protein domains. This pattern arises from evolutionary exon duplication events. Strict reciprocity in exon choice is ensured through mechanisms such as steric hindrance due to minimal spacing between splice sites (often around 50 nucleotides), which prevents the spliceosome from accommodating both exons simultaneously. Additional enforcement occurs via incompatible splice site pairings or regulatory factors that block one pathway. In the human alpha-tropomyosin gene, for example, exons 2 and 3 are mutually exclusive because the lariat branch point is positioned unusually close to the donor site of exon 2, only 42 nucleotides away, which inhibits direct splicing between the two exons and favors exon 3 inclusion in most cell types. This mechanism shows that proximity to the branch point, rather than to the acceptor site, can dictate exclusion.

Together, intron retention and mutually exclusive exons contribute roughly 15-20% of alternative splicing events in mammals, with intron retention being more abundant (around 10-15%) and mutually exclusive exons rarer (less than 5%, with pairs comprising 80% of cases). Functionally, intron retention notably shapes the expression of immune-related genes. During granulocyte differentiation, for example, it orchestrates the coordinated downregulation of 86 genes, including genes involved in nuclear morphology such as Lmnb1, through NMD-sensitive transcripts, thereby promoting cell-type-specific adjustments and functional specialization. This process is conserved between humans and mice, highlighting its regulatory precision in immune development.

Regulatory Mechanisms

Cis-Acting Regulatory Elements

Cis-acting regulatory elements are sequence motifs within pre-mRNA that modulate splice site selection during alternative splicing. They act as signals built into the transcript itself, in contrast to the diffusible trans-acting factors that bind them. These elements include enhancers that promote exon inclusion and silencers that repress it, often located in exons or introns near splice sites. Their activity influences the balance between competing splice sites, contributing to isoform diversity.

Exonic splicing enhancers (ESEs) are purine-rich motifs, typically 6 to 8 nucleotides long, embedded within exons, that facilitate recognition of adjacent weak splice sites. These sequences, such as those responsive to SR proteins, stabilize interactions at the exon-intron boundaries. Intronic splicing enhancers (ISEs), found within introns, similarly boost splicing efficiency by aiding spliceosome recruitment to nearby sites. For instance, ISEs in the calcitonin/CGRP pre-mRNA promote exon inclusion through cooperative effects with exonic elements.

In contrast, exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs) inhibit splicing by masking splice sites or competing with enhancers. ESSs, often bound by heterogeneous nuclear ribonucleoproteins (hnRNPs), suppress exon definition, as seen in the Fas receptor gene, where an ESS promotes skipping of exon 6, producing a soluble isoform that inhibits apoptosis by acting as a decoy receptor. ISSs within introns can cause exons to be looped out, reducing their inclusion, as in the regulation of certain variable exons.

Mechanistically, enhancers such as ESEs promote splicing by facilitating the recruitment of U2AF to the 3' splice site, enhancing polypyrimidine tract recognition. Silencers, conversely, may inhibit U1 snRNP binding to the 5' splice site, blocking early spliceosome assembly. These effects can be modulated by RNA secondary structures: stable stem-loops as short as 7 base pairs can sequester enhancers and abolish their activity, or expose silencers, as observed in the fibronectin EDA exon.

Identification of these elements relies on computational tools that scan sequences for conserved motifs. ESEfinder, for example, uses position weight matrices derived from in vitro selection (SELEX) experiments to predict ESEs responsive to specific SR proteins, reporting potential sites that score above a threshold (e.g., 1.956 for SF2/ASF) and highlighting the impact of mutations. Similar algorithms extend to ISEs, ISSs, and ESSs, aiding in the annotation of splicing regulatory landscapes.
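The position-weight-matrix approach described above amounts to sliding a fixed-width window along the sequence, summing one weight per position, and reporting windows whose score clears a threshold. The sketch below illustrates that procedure only. The 4-position matrix, its weights, and the threshold of 3.0 are invented for this example and are not ESEfinder's actual matrices or cutoffs.

```python
# Hypothetical 4-position weight matrix favouring a purine-rich motif (GAAG-like).
# Real ESE matrices are longer and derived from selection experiments.
TOY_PWM = [
    {"A": 0.5, "C": -1.0, "G": 1.0, "U": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "U": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "U": -1.0},
    {"A": 0.5, "C": -1.0, "G": 1.0, "U": -1.0},
]


def score_window(window: str, pwm: list[dict[str, float]]) -> float:
    """Sum the per-position weights for one window of the same width as the matrix."""
    return sum(pwm[i][nt] for i, nt in enumerate(window))


def scan(sequence: str, pwm: list[dict[str, float]], threshold: float):
    """Return (position, motif, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = score_window(window, pwm)
        if score >= threshold:
            hits.append((i, window, score))
    return hits


hits = scan("CUGAAGAC", TOY_PWM, threshold=3.0)
print(hits)

# A single point mutation can drop a site below threshold, which is how such
# tools flag variants that may disrupt an enhancer.
print(scan("CUGACGAC", TOY_PWM, threshold=3.0))
```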

Trans-Acting Splicing Factors

Trans-acting splicing factors are soluble proteins and protein complexes that bind to cis-acting RNA elements on pre-mRNA to modulate splice site selection and promote or repress specific alternative splicing events. These factors include activator families such as SR proteins and repressor families such as heterogeneous nuclear ribonucleoproteins (hnRNPs), as well as spliceosome-associated components such as U2 auxiliary factor (U2AF) and the SF3b complex. By interacting with regulatory sequences, they influence the recruitment and assembly of the spliceosome, thereby determining exon inclusion, skipping, or other splicing patterns.

SR proteins function as key activators of splicing, binding primarily to exonic splicing enhancers (ESEs) through their RNA recognition motifs (RRMs) and recruiting spliceosomal components via their serine/arginine-rich (RS) domains. They promote exon inclusion by helping recruit U1 snRNP to the 5' splice site and U2AF to the 3' splice site, thereby enhancing spliceosome assembly on weak or regulated splice sites. Representative members include SRSF1, SRSF2, and SRSF6, whose combinatorial actions fine-tune alternative splicing outcomes.

Heterogeneous nuclear ribonucleoproteins (hnRNPs) typically act as repressors, binding to intronic splicing silencers (ISSs) or exonic splicing silencers (ESSs) to inhibit splice site recognition and exon inclusion. For example, PTB (also known as hnRNP I) forms multimers that loop out exons or block U2AF binding, competing directly with SR proteins for access to overlapping motifs. Other hnRNPs, such as hnRNP A1, can also promote splicing when bound to intronic enhancers, highlighting their context-dependent roles in alternative splicing regulation.

Spliceosome-associated trans-acting factors such as U2AF and SF3b play essential roles in early splice site recognition. The U2AF heterodimer, comprising U2AF65 and U2AF35, facilitates 3' splice site selection: U2AF65 binds the polypyrimidine tract upstream of the AG dinucleotide, while U2AF35 interacts with the AG itself, and together they recruit U2 snRNP to commit the site to splicing. The SF3b complex, integrated into the U2 snRNP, recognizes and stabilizes the branch point sequence through interactions involving the SF3b1, SF3b6, and SF3b7 subunits, enabling proper spliceosome activation and influencing alternative 3' splice site choices.

The activities of trans-acting splicing factors are tightly regulated by their concentration, post-translational modifications, and expression profiles. Variations in factor concentration, such as higher levels of SR proteins favoring exon inclusion, can shift splicing equilibria towards specific isoforms. Phosphorylation, particularly of SRSF1 by kinases such as SRPK1 and CLK1, modulates its RNA-binding affinity, subcellular localization, and interactions during cell cycle progression, thereby linking splicing to temporal cellular states. Tissue-specific expression of factors such as hnRNPs and SR proteins further contributes to the generation of diverse splicing programs across cell types.

Illustrative Examples

Invertebrate Development: Drosophila dsx and tra

In Drosophila melanogaster, sex determination is governed by a cascade of genes in which alternative splicing plays a central role in producing sex-specific proteins that direct somatic sexual differentiation. The master regulator, Sex-lethal (Sxl), is transcribed in both sexes but undergoes female-specific alternative splicing, so that functional protein is produced only in females, where the X chromosome-to-autosome ratio is 1.0. The female Sxl protein maintains its own expression through positive autoregulation: it blocks the use of a male-specific exon in its own pre-mRNA, ensuring persistent female-specific splicing and preventing reversion to the male state.

Downstream of Sxl, the transformer (tra) gene exemplifies alternative 3' splice site selection regulated by Sxl. In females, Sxl protein binds to uridine-rich sequences near the non-sex-specific (proximal) 3' splice site in the tra pre-mRNA, repressing its use and promoting use of the distal, female-specific 3' splice site. The resulting mRNA lacks a premature stop codon and produces a functional Tra protein essential for female development. In males, which lack functional Sxl, the default proximal splice site is used, incorporating a stop codon that yields a non-functional, truncated Tra protein.

The doublesex (dsx) gene, further downstream, illustrates female-specific exon inclusion orchestrated by Tra in conjunction with Transformer-2 (Tra2). In females, Tra and Tra2 form a complex that enhances recognition of the weak female-specific 3' splice site upstream of exon 4. Exon 4 is therefore included after the common exons 1-3, and the transcript is polyadenylated downstream of exon 4, producing the DsxF protein, which represses male traits and promotes female differentiation in somatic tissues. In males, without functional Tra, default splicing skips exon 4 and joins exon 3 to the male-specific exons 5 and 6, yielding the DsxM protein, which represses female traits and activates male-specific genes. These opposing Dsx isoforms thus coordinate sexual dimorphism in structures such as the genitalia and in behaviors.

This splicing-based sex determination cascade in Drosophila is evolutionarily conserved in its downstream effectors, particularly the DM domain-containing transcription factors such as Dsx, which have orthologs such as MAB-3 in Caenorhabditis elegans that similarly implement sex-specific gene regulation despite differences in upstream signals.
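The cascade described above is essentially a chain of switches, each set by the splicing outcome of the one before it. The sketch below encodes that logic as a small Python function. It is a deliberately coarse model: the names are invented, the X:A threshold of 1.0 is taken from the text, intermediate ratios are not handled, and treating the absence of Tra2 as producing the male isoform is an assumption based on Tra and Tra2 acting together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeState:
    sxl_functional: bool   # female-specific Sxl splicing yields functional protein
    tra_functional: bool   # distal 3' splice site used, no premature stop codon
    dsx_isoform: str       # "DsxF" (exon 4 included) or "DsxM" (exon 4 skipped)


def sex_determination(x_to_a_ratio: float, tra2_present: bool = True) -> CascadeState:
    """Follow the Sxl -> tra -> dsx splicing cascade for a given X:A ratio."""
    # An X:A ratio of 1.0 switches on functional Sxl (females).
    sxl_functional = x_to_a_ratio >= 1.0
    # Sxl blocks the proximal tra 3' splice site, so functional Tra needs Sxl.
    tra_functional = sxl_functional
    # Tra acts together with Tra2 to include the female-specific dsx exon 4;
    # otherwise the default (male) splicing pattern is used.
    dsx_isoform = "DsxF" if (tra_functional and tra2_present) else "DsxM"
    return CascadeState(sxl_functional, tra_functional, dsx_isoform)


print(sex_determination(1.0))   # female pattern
print(sex_determination(0.5))   # male pattern
```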

Vertebrate Apoptosis: Fas Receptor

The Fas receptor, encoded by the FAS gene, undergoes alternative splicing that plays a critical role in regulating apoptosis in vertebrates, particularly within the immune system. In the Fas pre-mRNA, the inclusion or exclusion of exon 6 determines which functional isoform is produced. During T-cell activation, the polypyrimidine tract-binding protein (PTB) represses the inclusion of exon 6 by binding to an upstream regulatory element (URE6), an exonic splicing silencer, thereby promoting exon skipping. This PTB-mediated repression is prominent in activated T-cells, where it shifts splicing toward the exclusion of exon 6 to modulate apoptotic responses.

Two primary isoforms arise from this splicing event: membrane-bound Fas, which includes exon 6 and encodes the transmembrane domain, so that the receptor triggers apoptosis upon ligand binding; and soluble Fas, generated by skipping exon 6, which lacks this domain and circulates extracellularly. The soluble form exerts an anti-apoptotic effect by acting as a decoy receptor, sequestering Fas ligand and inhibiting the pro-apoptotic signaling of the membrane-bound isoform. This isoform switch allows immune cells to fine-tune apoptosis, preventing excessive cell death during immune responses.

The regulation of exon 6 follows the exon definition model, in which the weak 3' splice site of exon 6 requires coordinated recognition across the exon for efficient inclusion. PTB binding to URE6 disrupts U2AF recruitment to this weak 3' site, impairing U2 snRNP association and favoring skipping, while antagonistic factors such as TIA-1 can promote inclusion by enhancing U1 snRNP binding at the 5' splice site. Dysregulation of this splicing event has implications for autoimmune diseases such as systemic lupus erythematosus (SLE), where splicing is skewed toward the soluble isoform, leading to elevated soluble Fas levels that impair apoptosis of autoreactive lymphocytes. In SLE patients, the soluble-to-membrane Fas mRNA ratio is significantly higher than in healthy individuals, correlating with disease activity and contributing to immune dysregulation.

Viral Gene Regulation: HIV-1 tat

In human immunodeficiency virus type 1 (HIV-1), alternative splicing is essential for viral gene expression because the virus's compact 9.2 kb genome encodes approximately 15 proteins through the production of over 30 distinct mRNA isoforms from a single primary transcript driven by one promoter. This strategy maximizes coding capacity by using four 5' splice donors and eight 3' splice acceptors, enabling the virus to generate regulatory proteins such as Tat and Rev from overlapping genomic regions. The regulation of splicing at the tat locus exemplifies how the virus exploits host splicing machinery to balance protein production across its lifecycle.

The Tat protein, critical for viral transcription elongation, exists in two main isoforms that differ by the inclusion of a second exon. The multi-exon (two-exon) isoform incorporates an 86-nucleotide exon 2 (from splice donor D4 to acceptor A7), encoding the full 101-amino-acid protein with the C-terminal domain necessary for optimal transactivation via TAR RNA binding and recruitment of cellular factors such as CDK9. In contrast, the single-exon isoform (using D1 to A3) produces a truncated Tat72 lacking this domain, resulting in reduced transactivation efficiency and more limited support for viral transcription. This alternative splicing pattern overlaps with Rev production: Rev mRNA uses D4 to A6 for its second exon, sharing sequence elements but diverging at the acceptor site choice, so that transcripts are competitively allocated between full-length Tat and Rev.

Splicing of tat exon 2 is tightly regulated by competitive interactions between cis-acting elements and trans-acting factors of the host splicing machinery. An exonic splicing silencer (ESS) in tat exon 2 binds heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1), which represses splice site recognition by blocking U2 snRNP association and inhibiting upstream donor usage, thereby reducing exon 2 inclusion. Counteracting this, an exonic splicing enhancer (ESE) in the same exon recruits serine/arginine-rich (SR) proteins such as ASF/SF2 (also known as SRSF1), which promote exon definition by stabilizing U2AF65 binding to the 3' splice site and facilitating U2 snRNP recruitment. This repressor-activator competition fine-tunes splicing efficiency, with hnRNP A1 dominance favoring Rev transcripts over multi-exon Tat, while ASF/SF2 shifts the balance toward Tat production. Additionally, an intronic splicing silencer (ISS) upstream of A7 further enhances hnRNP A1-mediated inhibition by overlapping the branch point sequence, additively suppressing splicing when combined with the ESS.

This regulatory balance integrates with nuclear export pathways and the viral lifecycle. Early in infection, when Rev levels are low, multi-exon Tat and Rev mRNAs, being fully spliced, are exported via the cellular NXF1/TAP pathway to produce the initial regulatory proteins. As Rev accumulates, it binds the Rev-responsive element (RRE) on unspliced and partially spliced transcripts, redirecting export through CRM1 to enable structural protein production (Gag, Pol, Env). Splicing efficiency at tat exon 2 thus modulates the Tat/Rev ratio, ensuring early transactivation for genome expression while allowing the transition to late-phase replication; disruptions in this equilibrium, such as ESS mutations, impair viral propagation. Such dependence on host factors highlights HIV-1's adaptation of alternative splicing for temporal control in a constrained genome.

Evolutionary and Functional Roles

Adaptive Benefits

Alternative splicing significantly expands the functional diversity of the proteome from a limited number of genes, allowing organisms to generate multiple protein isoforms without requiring additional genes. The human genome encodes approximately 20,000 protein-coding genes, yet alternative splicing produces a far larger number of distinct transcripts, with up to 95% of multi-exon genes undergoing this process to yield diverse isoforms that perform specialized roles. This mechanism enhances proteomic complexity, enabling fine-tuned cellular responses that contribute to organismal adaptability.

Tissue-specific alternative splicing further exemplifies these adaptive advantages by producing isoforms tailored to the physiological demands of distinct cell types, thereby optimizing function without altering the underlying genome. For instance, in neurons and muscle cells, voltage-gated calcium channels exhibit splice variants that differ in their biophysical properties, such as activation thresholds and inactivation kinetics, which are crucial for precise signaling in excitable tissues. These variants, including those of the CACNA1C gene encoding Cav1.2 channels, promote efficient calcium handling in muscle for contraction while supporting synaptic transmission in neurons.

Alternative splicing also facilitates rapid adaptation to environmental stresses, shifting isoform ratios to enhance survival under adverse conditions such as hypoxia or heat shock. In hypoxic environments, such as those encountered in tumors or at high altitude, splicing changes in stress-responsive genes produce pro-survival isoforms that mitigate the effects of oxygen deprivation. Similarly, during heat shock, alternative splicing adjusts transcripts of stress-response factors, such as heat shock proteins, to stabilize the cellular proteome and prevent protein aggregation, thereby improving thermal tolerance and viability. These dynamic shifts allow cells to rewire their proteome swiftly in response to transient stressors.

Quantitative models of splicing treat isoform ratios as tunable parameters that directly influence fitness landscapes, where optimal splicing efficiencies balance functional diversity against the costs of erroneous splicing. In evolutionary terms, these ratios are shaped by selection pressures to maximize adaptive outcomes, with deviations from optimal proportions incurring fitness penalties through reduced protein functionality or wasteful expression. Such models underscore how alternative splicing acts as a regulatory dial, fine-tuning expression in response to environmental cues while the core splicing machinery remains evolutionarily conserved across species.

Evolutionary Conservation and Diversity

Alternative splicing is absent in prokaryotes, which lack spliceosomal introns and the spliceosomal machinery necessary for intron removal, but it emerged as an ancient feature in the last eukaryotic common ancestor (LECA), where spliceosomal introns first appeared and enabled basic splicing processes. This origin is tied to the evolution of the spliceosome, a complex of snRNPs and proteins that recognizes conserved splice sites, with early splicing errors likely contributing to the initial diversification of transcripts in unicellular eukaryotes. In metazoans, alternative splicing expanded dramatically, coinciding with increased intron density and multicellularity, allowing for greater regulatory complexity and proteomic diversity than gene number alone could provide.

Core splice site sequences, such as the GU-AG dinucleotides at exon-intron boundaries and branch point motifs, are universally conserved across eukaryotes, reflecting their essential role in accurate splicing and the deep evolutionary roots of the spliceosome. Alternative splicing events themselves show substantial conservation, particularly among vertebrates; for instance, studies comparing human and mouse transcriptomes indicate that a majority (approximately 60%) of cassette exon events and other alternative isoforms are preserved between these species, underscoring the functional importance of many AS patterns. This conservation is higher for tissue-specific events in complex organs such as the brain, where selective pressures maintain isoform diversity for specialized functions.

The diversity of alternative splicing varies markedly across eukaryotic lineages. Unicellular organisms such as yeast (Saccharomyces cerevisiae) exhibit low complexity: only about 5% of genes produce alternatively spliced transcripts, primarily through rare intron retention or alternative splice site usage. In contrast, mammals display high AS prevalence, with over 90% of multi-exon genes undergoing alternative splicing, generating multiple isoforms per gene and contributing to phenotypic complexity across tissues. This gradient reflects evolutionary expansions in intron number and spliceosomal components, with invertebrate metazoans such as arthropods showing intermediate levels.

Evolutionary drivers of alternative splicing include intronic mutations that create novel splice sites, such as point changes introducing cryptic GU or AG sequences, which can activate new exons or shift inclusion levels without disrupting core coding frames. These cis-regulatory mutations often arise neutrally and become fixed if they confer adaptive isoform variants, as seen in the emergence of tissue-specific splicing patterns. Exon shuffling, facilitated by recombination events at intron borders or by transposon insertions, further promotes diversity by rearranging modular exons, enabling the assembly of novel protein domains in multidomain genes during metazoan evolution. Such mechanisms have been pivotal in expanding functional repertoires, particularly in signaling proteins.
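The idea that a single point change can introduce a cryptic GU or AG dinucleotide can be illustrated with a small scan. This is only a sketch: the function names and the example sequence are hypothetical, DNA letters are used (so GU appears as GT), and a new dinucleotide is merely a candidate site, since real splice site usage also depends on the surrounding consensus and regulatory context.

```python
from __future__ import annotations


def dinucleotide_positions(seq: str, dinuc: str) -> set[int]:
    """Return the 0-based start positions of every occurrence of `dinuc` in `seq`."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc}


def created_splice_dinucleotides(ref: str, pos: int, alt_base: str) -> dict[str, list[int]]:
    """List GT/AG dinucleotides present after a substitution but absent before it.

    `ref` is the reference DNA sequence, `pos` the 0-based position of the
    substitution, and `alt_base` the new base at that position.
    """
    if ref[pos] == alt_base:
        raise ValueError("alt_base must differ from the reference base")
    alt = ref[:pos] + alt_base + ref[pos + 1:]
    gained = {}
    for dinuc in ("GT", "AG"):
        new_sites = dinucleotide_positions(alt, dinuc) - dinucleotide_positions(ref, dinuc)
        gained[dinuc] = sorted(new_sites)
    return gained


# Hypothetical intronic fragment containing no G, so it has no GT or AG to begin with.
intron = "CCTACCTTCACATTCC"

# C>G at position 10 creates an AG (candidate cryptic acceptor) starting at position 9.
print(created_splice_dinucleotides(intron, 10, "G"))

# A>G at position 11 creates a GT (candidate cryptic donor) starting at position 11.
print(created_splice_dinucleotides(intron, 11, "G"))
```

A scan like this only narrows down where a cryptic site could arise; whether the site is actually used would have to be assessed with a splice site strength model or experimental data.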

Implications in Disease

Oncogenic Splicing Aberrations

Alternative splicing aberrations play a pivotal role in oncogenesis by generating isoforms that promote tumor cell survival, proliferation, invasion, and resistance to therapy. These dysregulation events often arise from genetic mutations, altered expression of splicing factors, or disruptions in cis-regulatory elements, leading to the production of oncogenic protein variants. In a large proportion of cancers, such aberrations contribute to key hallmarks of cancer, including evasion of apoptosis and enhanced metastatic potential.

Mutations in core components of the spliceosome, particularly the U2 snRNP component SF3B1, are among the most frequent splicing-related alterations in cancer, occurring in up to 20% of myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML) cases. These hotspot mutations, often at lysine 700 (K700E), impair branch point recognition during spliceosome assembly, resulting in widespread aberrant splicing of target genes involved in DNA damage response and epigenetic regulation, among other processes. For instance, SF3B1 mutations lead to inclusion of cryptic exons in genes such as MAP3K7, promoting leukemic transformation through altered signaling pathways.

Overexpression of trans-acting splicing factors, such as SRSF1 (also known as ASF/SF2), is a common mechanism in solid tumors, exemplified by its upregulation in breast cancers, where it drives oncogenic splicing programs. SRSF1 overexpression shifts splicing toward pro-tumorigenic isoforms, including those enhancing cell migration and survival, by binding to exonic splicing enhancers and promoting exon inclusion in targets such as BIN1 and S6K1. Similarly, core splice site mutations, such as those creating or disrupting 5' or 3' splice sites, directly alter isoform ratios in cancer genes; for example, mutations in the splice sites of the TP53 tumor suppressor gene generate dominant-negative isoforms that impair tumor suppression.

A prominent example of oncogenic splicing is the skipping of exon 11 in the RON receptor (MST1R), yielding the ΔRON isoform that confers invasive and metastatic properties on epithelial cells. This skipping event, upregulated in colorectal and other epithelial tumors due to enhanced activity of splicing regulators such as hnRNP A1 or SF2/ASF, activates downstream pathways such as Rac1/PAK1, promoting epithelial-mesenchymal transition without ligand stimulation. Another critical case involves the BCL2L1 gene, where alternative splicing favors the anti-apoptotic Bcl-xL isoform over the pro-apoptotic Bcl-xS in over 60% of analyzed tumors, inhibiting mitochondrial outer membrane permeabilization and conferring resistance to therapy.

These aberrations are therapeutically targetable, as demonstrated by spliceostatin A, a natural compound that binds SF3B1 and inhibits spliceosome function, selectively killing SF3B1-mutant MDS cells by inducing aberrant splicing of survival genes such as MCL1. H3B-8800 was investigated in clinical trials for splicing factor-mutant myeloid malignancies, but development was discontinued in 2024 due to lack of efficacy. As of 2025, research continues into novel spliceosome-targeted therapies that exploit cancer-specific vulnerabilities.

Neurological and Other Disorders

Alternative splicing plays a critical role in brain function, with over 90% of multi-exon genes undergoing alternative splicing events that contribute to neuronal diversity and synaptic plasticity. Dysregulation of these processes is implicated in various neurological disorders, often through mechanisms such as sequestration of splicing factors by toxic RNA expansions or loss of nuclear function in RNA-binding proteins, leading to aberrant exon inclusion or skipping.

In spinal muscular atrophy (SMA), a neurodegenerative disease characterized by motor neuron loss, the survival motor neuron 2 (SMN2) gene undergoes inefficient splicing of exon 7 because a C-to-T transition within the exon weakens its recognition, resulting in frequent exon skipping and production of a truncated, unstable SMN protein. This skipping reduces the level of functional SMN protein, which is essential for snRNP biogenesis and splicing fidelity, exacerbating motor neuron degeneration. Therapeutic intervention with nusinersen, an antisense oligonucleotide, targets an intronic splicing silencer in SMN2 intron 7, blocking the binding of repressive factors such as hnRNP A1/A2 and promoting exon 7 inclusion to restore SMN protein production. Clinical trials have demonstrated that nusinersen increases full-length SMN2 transcripts and improves motor function in SMA patients.

Amyotrophic lateral sclerosis (ALS), a neurodegenerative disease, frequently involves misregulation of TAR DNA-binding protein 43 (TDP-43), which normally represses cryptic exons in target transcripts. In ALS, cytoplasmic mislocalization and depletion of nuclear TDP-43 lead to derepression and inclusion of a cryptic exon in stathmin-2 (STMN2) pre-mRNA, generating non-functional transcripts that reduce the level of STMN2 protein, which is critical for axonal stability and regeneration. This cryptic exon inclusion is observed in ALS patient brains and spinal cords, correlating with TDP-43 pathology and contributing to neurodegeneration. Similar TDP-43-dependent cryptic splicing events occur in other ALS/FTD-related genes, highlighting a broader mechanism of splicing disruption.

Myotonic dystrophy type 1 (DM1), a multisystem disorder affecting muscle and brain, arises from CTG trinucleotide expansions in the DMPK 3' UTR, producing toxic CUG-repeat RNA that sequesters muscleblind-like (MBNL) splicing factors into nuclear foci. MBNL sequestration disrupts its regulatory role in alternative splicing, leading to aberrant inclusion or exclusion of exons in targets such as CLCN1 (causing myotonia) and to fetal-like splicing patterns in neuronal genes, contributing to cognitive deficits. This toxic RNA mechanism parallels factor sequestration in other disorders but manifests as widespread splicing misregulation across muscle and brain tissues.

Alternative splicing aberrations in synaptic genes are also linked to autism spectrum disorders (ASD), where dysregulation affects neuronal connectivity and synaptic transmission. Genes such as neurexin-1 (NRXN1), neuroligin-3 (NLGN3), and SHANK3 exhibit altered splicing patterns in ASD brains, with specific isoforms influencing trans-synaptic adhesion and excitatory/inhibitory balance. For instance, mutations affecting splicing in NLGN4X disrupt synaptic function, and RNA-binding proteins such as RBFOX1, often mutated in ASD, regulate splicing of hundreds of synaptic targets, so that their loss leads to network-wide dysregulation. These changes underscore how splicing defects in ASD converge on synaptic organization, distinct from oncogenic splicing but sharing therapeutic potential in splice-modulating approaches.

Analysis and Computational Approaches

Transcriptome-Wide Profiling

Transcriptome-wide profiling of alternative splicing (AS) has revolutionized the study of isoform diversity by enabling the detection and quantification of splicing events across thousands of genes simultaneously. High-throughput RNA sequencing (RNA-seq) techniques, particularly short-read RNA-seq, serve as the cornerstone for identifying AS events through the mapping of junction-spanning reads that skip exons or include alternative junctions. This approach allows for the genome-wide annotation of splicing patterns, revealing tissue-specific and condition-dependent regulation, as demonstrated in early deep-sequencing studies of human transcriptomes.

A key metric in short-read RNA-seq profiling is the percent spliced in (PSI or Ψ), which quantifies the inclusion level of a specific exon or cassette in the mature mRNA population by comparing the density of inclusion versus exclusion junction reads. PSI values range from 0 (complete exclusion) to 1 (complete inclusion) and provide a robust measure for comparing splicing efficiency across samples. For differential AS analysis, tools such as replicate multivariate analysis of transcript splicing (rMATS) employ statistical models to identify significant changes in PSI between conditions, accounting for biological replicates and multiple testing, thereby facilitating the discovery of regulated events in large-scale datasets.

Long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), complement short-read methods by enabling full-length isoform assembly without fragmentation, which is particularly advantageous for resolving complex splicing patterns involving multiple variable exons or overlapping transcripts that confound short-read alignment. These platforms capture entire transcripts in single reads, allowing de novo reconstruction of isoforms and precise quantification of low-abundance variants that are often missed in short-read data.

Recent advances extend transcriptome-wide profiling to finer resolutions, including single-cell RNA-seq (scRNA-seq), which uncovers cell-type-specific AS heterogeneity by analyzing splicing in individual cells, revealing dynamic isoform usage that underlies cellular identity and responses. Time-series RNA-seq further elucidates the temporal dynamics of AS, such as during development or stress, by modeling isoform abundance changes over multiple time points to detect coordinated splicing shifts. These approaches can inform downstream predictive modeling of AS regulation. However, challenges persist, including alignment biases from repetitive genomic regions or multi-mapping reads, which can skew junction detection, and the under-detection of low-expression isoforms due to insufficient read depth or stochastic noise in sparse data.
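The PSI metric described in this section can be sketched for a single cassette exon as follows. This is a minimal illustration and not the estimator used by rMATS or any other specific tool: the function name and default junction counts are assumptions. Inclusion of a cassette exon is supported by two junctions (upstream exon to cassette, cassette to downstream exon) and exclusion by one, so read counts are divided by the number of junctions to compare densities.

```python
from __future__ import annotations


def psi_from_junction_counts(
    inclusion_reads: int,
    exclusion_reads: int,
    n_inclusion_junctions: int = 2,
    n_exclusion_junctions: int = 1,
) -> float | None:
    """Estimate percent spliced in (PSI) for one cassette exon.

    Counts are converted to per-junction densities so that the two inclusion
    junctions do not outweigh the single exclusion junction. Returns None when
    no junction reads are available, since PSI is undefined in that case.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    inclusion_density = inclusion_reads / n_inclusion_junctions
    exclusion_density = exclusion_reads / n_exclusion_junctions
    total = inclusion_density + exclusion_density
    if total == 0:
        return None
    return inclusion_density / total


# 40 reads over the two inclusion junctions versus 20 reads over the skipping junction.
print(psi_from_junction_counts(40, 20))  # 0.5

# No skipping reads at all: the exon is fully included.
print(psi_from_junction_counts(30, 0))   # 1.0
```

Returning `None` rather than 0 for events without reads keeps uncovered events distinguishable from events with complete exclusion, which matters when low-expression isoforms are under-sampled.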

Predictive Modeling and Tools

Predictive modeling of alternative splicing relies on computational algorithms that analyze genomic sequences to forecast splice site usage, exon inclusion, and isoform production. Early approaches employed machine learning techniques, such as support vector machines (SVMs), to identify splicing regulatory motifs such as exonic splicing enhancers (ESEs) and silencers (ESSs) by training on sequence features that promote or inhibit exon recognition. These methods extract motifs from pre-mRNA contexts and classify potential regulatory elements with accuracies approaching 90% in motif detection tasks.

Thermodynamic models complement sequence-based predictions by simulating RNA secondary structure folding, which influences splice site accessibility and alternative isoform formation. These models compute free energy minima for RNA conformations, integrating base-pairing probabilities to predict how structural motifs near splice junctions affect splicing efficiency. For instance, thermodynamic frameworks have been applied to donor splice site recognition, achieving high specificity in identifying functional sites across eukaryotic genes. Such approaches unify evolutionary conservation with folding energetics to refine predictions of splice site strength.

A seminal tool for splice site strength prediction is MaxEntScan, which uses maximum entropy modeling to score consensus sequences at 5' and 3' splice sites based on positional frequencies and dinucleotide dependencies. Developed in 2004, it quantifies the likelihood of site usage, with scores typically ranging from 0 to 12 for strong sites, and has become a standard for evaluating variant impacts on splicing. MaxEntScan is integrated into plugins for the Ensembl Variant Effect Predictor (VEP), enabling rapid assessment of how single-nucleotide variants alter splice site potentials in genomic datasets.

Deep learning has advanced predictive capabilities, particularly with SpliceAI, a 32-layer convolutional neural network trained on human genomic data to predict splice junctions and cryptic site activation from raw pre-mRNA sequences up to 10 kb in length. Released in 2019, SpliceAI incorporates sequence context over long ranges, outperforming traditional tools in detecting variant-induced splicing aberrations, with an area under the receiver operating characteristic curve (AUROC) exceeding 0.95 for acceptor and donor site prediction. In variant effect prediction, it identifies ~75% of cryptic splice variants validated by RNA-seq, making it valuable for clinical genomics applications such as prioritizing pathogenic mutations in rare diseases.

These models are routinely applied in pipelines for variant interpretation, such as VEP integrations that flag splicing disruptions in sequencing data, aiding prioritization of non-coding variants with potential regulatory effects. Overall accuracies for predicting common alternative splicing events, such as exon skipping or inclusion, hover around 80-90% when benchmarked against reporter assays, though performance varies by event type and tissue context. Transcriptome-wide profiling data from sources such as GTEx serve as gold standards for validating these predictions.

Post-2020 advancements leverage transformer architectures to address gaps in tissue-specific and isoform-level predictions. SpliceTransformer, introduced in 2024, employs self-attention mechanisms to model long-range dependencies in splicing patterns across 49 human tissues, achieving superior AUROC scores (up to 0.98) for tissue-aware splice site usage compared to prior models. For isoform simulation, generative transformer-based approaches such as TrASPr enable de novo design of sequences with desired splicing outcomes, capturing probabilistic distributions of alternative isoforms with state-of-the-art accuracy on GTEx datasets.
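To make the notion of scoring splice site strength and variant impact concrete, the sketch below uses a plain position weight matrix (PWM) with log-odds scores. This is deliberately a simpler stand-in, not MaxEntScan: a PWM treats positions as independent, whereas maximum entropy models also capture dependencies between positions. The training sequences are a small illustrative set of 9-mer donor sites (3 exonic plus 6 intronic bases), and all names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_log_odds_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        column = {
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        }
        pwm.append(column)
    return pwm


def score_site(pwm, seq):
    """Sum the per-position log-odds for a candidate site of matching length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


# Illustrative donor sites; index 3 and 4 hold the conserved GT (GU in RNA).
TOY_DONORS = [
    "CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA",
    "GAGGTAAGT", "CAGGTGAGC", "TAGGTAAGT",
]
pwm = build_log_odds_pwm(TOY_DONORS)

ref = "CAGGTAAGT"
alt = "CAGATAAGT"  # G>A at the first intronic base of the donor site

# A negative delta means the variant weakens the site under this toy model.
delta = score_site(pwm, alt) - score_site(pwm, ref)
print(round(delta, 3))
```

The same reference-versus-alternate comparison is the general pattern used when any splice site scoring model is applied to variants; only the scoring function changes.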

Databases and Resources

Major Splicing Databases

Ensembl serves as a primary repository for alternative splicing data through its integration of GENCODE annotations, which provide comprehensive catalogs of protein-coding and non-coding isoforms for the human and mouse genomes. As of release 113 (October 2024), GENCODE annotations (version 47) emphasize the diversity of alternative splicing events, including exon skipping, alternative donors and acceptors, and intron retention, with detailed tracks in Ensembl displaying isoform structures and splicing patterns for over 20,000 genes. These resources enable researchers to explore tissue-specific and developmental splicing variations, supported by evidence from short- and long-read sequencing alignments.

The UCSC Genome Browser offers visualization-focused resources for alternative splicing, featuring tracks that depict splice junctions derived from aligned mRNA, ESTs, and gene predictions. Its alternative splicing track (altGraphX) summarizes exon-intron structures and potential isoform variants in current assemblies such as hg38, while custom tracks allow users to upload and overlay personal datasets, such as RNA-seq-derived junctions, for comparative analysis. Post-2020 updates have incorporated long-read sequencing data, including Capture Long-Seq tracks for full-length transcripts in human (hg38) and mouse (mm10) assemblies, enhancing the resolution of complex splicing events such as mutually exclusive exons.

ASTALAVISTA provides a specialized catalog of alternative splicing events extracted from genomic annotations across multiple species, classifying patterns into standardized types such as tandem cassettes, alternative positions, and complex events using a coordinate-based notation. Originally developed for custom datasets, it processes annotations from sources such as Ensembl to generate comprehensive event inventories, facilitating cross-species comparisons of splicing. Because the core tool operates on transcript annotations, it has been applied in post-2020 studies that integrate long-read-derived transcriptomes to improve detection of rare isoforms in non-model organisms.

Additional resources include the Alternative Splicing and Transcript Diversity (ASTD) database, which integrates data on alternative transcripts, including transcription initiation, polyadenylation, and splicing variants across human tissues. These databases form the backbone for computational tools in alternative splicing analysis, supplying annotated data for predictive modeling and event quantification.

Annotation and Visualization Tools

Annotation and visualization tools for alternative splicing facilitate the quantification, differential analysis, and graphical representation of splicing events derived from RNA-seq data, often leveraging outputs from major splicing databases for contextual annotation. These tools enable researchers to identify isoform diversity, detect regulatory changes, and explore spatial aspects of splice sites without requiring extensive manual curation.

SUPPA is a suite of software designed for rapid quantification of alternative splicing events, computing percent spliced-in (PSI) values across seven event types from transcript abundance estimates produced by quantification tools such as Salmon. The original SUPPA tool emphasizes speed by avoiding read realignment, processing large datasets in minutes, while SUPPA2 extends this with differential splicing analysis that accounts for variability between replicates when comparing conditions. SUPPA's output supports isoform switching detection by tracking changes in PSI profiles across samples, aiding the identification of context-specific splicing regulation.

The Integrative Genomics Viewer (IGV) serves as a high-performance platform for visualizing RNA-seq alignments, highlighting splice junctions as arcs connecting exons to reveal alternative splicing patterns, such as exon skipping or mutually exclusive exons, in interactive sessions. Users can overlay junction tracks with annotations to inspect read coverage and evidence of novel isoforms, making it invaluable for validating predictions from quantification tools. IGV's ability to handle BAM files directly allows for zooming into specific loci, where split reads illustrate intron retention or alternative donor/acceptor usage.

LeafCutter provides an annotation-independent approach to splicing analysis by clustering introns into excision sets based on shared splice junctions from RNA-seq reads, then deriving intron excision ratios as proxies for splicing efficiency. This method excels at detecting differential splicing in large cohorts, such as GTEx tissues, by focusing on local intron clusters rather than full transcripts, reducing bias from annotation incompleteness. LeafCutter's output, including per-cluster PSI-like metrics, supports downstream network analysis of splicing factor-target interactions when combined with regulatory motif data.

Advanced features in these tools include automated isoform switching detection, where shifts in dominant isoforms are flagged via PSI deltas exceeding thresholds (e.g., >0.1), and integration with graph-based methods for visualizing splicing factor networks, such as those linking splicing factors to target exons. Emerging platforms such as SpliceWiz enable interactive R-based exploration of splicing landscapes, generating coverage plots, sashimi-style junction visualizations, and enrichment analyses for differentially spliced events. For structure visualization tied to splicing, tools such as RNAfold extensions in web interfaces model secondary structures of alternatively spliced isoforms to predict functional impacts on folding.

Post-2023 AI-integrated tools, such as TrASPr, employ generative models to annotate and predict tissue-specific splicing outcomes from sequence data, achieving high fidelity in simulating isoform probabilities. Splam, another recent AI-driven annotator, pinpoints splice sites with precision surpassing traditional tools by training on diverse datasets to distinguish canonical from cryptic junctions. These AI approaches enhance visualization by generating probabilistic heatmaps of splice site usage.

Most annotation and visualization tools, including SUPPA, IGV, LeafCutter, and SpliceWiz, are released under open-source licenses, promoting widespread adoption, whereas proprietary options such as some commercial genome browsers offer enhanced support but limit customization. Integration with workflow platforms streamlines pipelines, allowing seamless chaining of SUPPA quantification with IGV-like previews and LeafCutter clustering within a reproducible, web-accessible environment.
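The transcript-abundance approach to PSI and the PSI-delta threshold for isoform switching described in this section can be sketched as below. This is an illustration of the general idea only, not SUPPA's code or command-line interface: the function names, transcript identifiers, and TPM values are hypothetical, and the 0.1 threshold is the example value mentioned above. It also omits the significance testing across replicates that a real differential analysis would require.

```python
from __future__ import annotations


def event_psi(tpm: dict[str, float], inclusion_ids: set[str], event_ids: set[str]) -> float | None:
    """PSI of an event from transcript abundances.

    `inclusion_ids` are the transcripts that contain the inclusion form of the
    event; `event_ids` are all transcripts that define the event (inclusion and
    exclusion forms). Returns None when none of them is expressed.
    """
    inclusion = sum(tpm.get(t, 0.0) for t in inclusion_ids)
    total = sum(tpm.get(t, 0.0) for t in event_ids)
    if total == 0:
        return None
    return inclusion / total


def is_switch(psi_a: float | None, psi_b: float | None, threshold: float = 0.1) -> bool:
    """Flag an event whose absolute PSI change between two conditions exceeds `threshold`."""
    if psi_a is None or psi_b is None:
        return False
    return abs(psi_b - psi_a) > threshold


# Hypothetical abundances (TPM) for two transcripts of one gene in two conditions.
control = {"tx_incl": 30.0, "tx_skip": 10.0}
treated = {"tx_incl": 10.0, "tx_skip": 30.0}

inclusion_ids = {"tx_incl"}
event_ids = {"tx_incl", "tx_skip"}

psi_control = event_psi(control, inclusion_ids, event_ids)
psi_treated = event_psi(treated, inclusion_ids, event_ids)

print(psi_control, psi_treated, is_switch(psi_control, psi_treated))
```

Working from transcript abundances avoids realigning reads, which is why this style of quantification is fast, but it also means the result is only as complete as the transcript annotation used for quantification.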
