Intron
from Wikipedia

An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene.[1] The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts.[2] The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.[3]

Introns are found in the genes of most eukaryotes and many eukaryotic viruses, and they can be located in both protein-coding genes and genes that function as RNA (noncoding genes). There are four main types of introns: tRNA introns, group I introns, group II introns, and spliceosomal introns (see below). Introns are rare in Bacteria and Archaea (prokaryotes).

Discovery and etymology

Introns were first discovered in protein-coding genes of adenovirus,[4][5] and were subsequently identified in genes encoding transfer RNA and ribosomal RNA. Introns are now known to occur within a wide variety of genes in organisms, including bacteria,[6] and viruses from all of the biological kingdoms.

The fact that genes were split or interrupted by introns was discovered independently in a number of labs in 1977, including those run by Phillip Allen Sharp and Richard J. Roberts, for which they shared the Nobel Prize in Physiology or Medicine in 1993.[7] Other labs that contributed to the discovery were those of Louise Chow and Thomas Broker. Much of the work in the Sharp lab was done by postdoctoral fellow Susan Berget.[8][9]

The term intron was introduced by American biochemist Walter Gilbert:[1]

"The notion of the cistron [i.e., gene] ... must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger – which I suggest we call introns (for intragenic regions) – alternating with regions which will be expressed – exons." (Gilbert 1978)

The term intron also refers to intracistron, i.e., an additional piece of DNA that arises within a cistron.[10]

Although introns are sometimes called intervening sequences,[11] the term "intervening sequence" can refer to any of several families of internal nucleic acid sequences that are not present in the final gene product, including inteins, untranslated regions (UTR), and nucleotides removed by RNA editing, in addition to introns.

Distribution

The frequency of introns within different genomes is observed to vary widely across the spectrum of biological organisms. For example, introns are extremely common within the nuclear genome of jawed vertebrates (e.g. humans, mice, and pufferfish (fugu)), where protein-coding genes almost always contain multiple introns, while introns are rare within the nuclear genes of some eukaryotic microorganisms,[12] for example baker's/brewer's yeast (Saccharomyces cerevisiae). In contrast, the mitochondrial genomes of vertebrates are entirely devoid of introns, while those of eukaryotic microorganisms may contain many introns.[13]

Simple illustration of an unspliced mRNA precursor, with two introns and three exons (top). After the introns have been removed via splicing, the mature mRNA sequence is ready for translation (bottom).
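
The situation in the illustration (three exons separated by two introns) can be mimicked with a short Python sketch. The sequences, the coordinate convention (0-based, end-exclusive) and the function name `splice` are illustrative assumptions, not part of the source; real splicing is carried out by the spliceosome or by the intron itself, not by coordinate lookup.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the transcript with the given intron intervals removed.

    Each intron is a 0-based, end-exclusive (start, end) interval.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end < start or end > len(pre_mrna):
            raise ValueError("introns must be ordered, non-overlapping and inside the transcript")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Toy precursor (hypothetical sequences): exons upper case, introns lower case.
EXON1, INTRON1, EXON2 = "AUGGCC", "guaaguacuaacag", "GAUUAC"
INTRON2, EXON3 = "guaugucuaacuuag", "UGGUAA"
pre_mrna = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3

intron1 = (len(EXON1), len(EXON1) + len(INTRON1))
intron2_start = intron1[1] + len(EXON2)
intron2 = (intron2_start, intron2_start + len(INTRON2))

mature = splice(pre_mrna, [intron1, intron2])
print(mature)
```

The mature sequence is simply the three exons joined in their original order, which is what the bottom half of the illustration shows.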

A particularly extreme case is the Drosophila Dhc7 gene, which contains a ≥3.6 megabase (Mb) intron that takes roughly three days to transcribe.[14][15] At the other extreme, a 2015 study suggests that the shortest known metazoan intron is 30 base pairs (bp) long and belongs to the human MST1L gene.[16] The shortest known introns belong to the heterotrich ciliates, such as Stentor coeruleus, in which most (>95%) introns are 15 or 16 bp long.[17]

Classification

Splicing of all intron-containing RNA molecules is superficially similar, as described above. However, different types of introns were identified through the examination of intron structure by DNA sequence analysis, together with genetic and biochemical analysis of RNA splicing reactions. At least four distinct classes of introns have been identified: spliceosomal introns, tRNA introns, group I introns, and group II introns.

Group III introns are proposed to be a fifth family, but little is known about the biochemical apparatus that mediates their splicing. They appear to be related to group II introns, and possibly to spliceosomal introns.[18]

Spliceosomal introns

Nuclear pre-mRNA introns (spliceosomal introns) are characterized by specific intron sequences located at the boundaries between introns and exons.[19] These sequences are recognized by spliceosomal RNA molecules when the splicing reactions are initiated.[20] In addition, they contain a branch point, a particular nucleotide sequence near the 3' end of the intron that becomes covalently linked to the 5' end of the intron during the splicing process, generating a branched (lariat) intron. Apart from these three short conserved elements, nuclear pre-mRNA intron sequences are highly variable. Nuclear pre-mRNA introns are often much longer than their surrounding exons.

tRNA introns

Transfer RNA introns that depend upon proteins for removal occur at a specific location within the anticodon loop of unspliced tRNA precursors, and are removed by a tRNA splicing endonuclease. The exons are then linked together by a second protein, the tRNA splicing ligase.[21] Note that self-splicing introns are also sometimes found within tRNA genes.[22]

Group I and group II introns

Group I and group II introns are found in genes encoding proteins (messenger RNA), transfer RNA and ribosomal RNA in a very wide range of living organisms.[23][24] Following transcription into RNA, group I and group II introns also make extensive internal interactions that allow them to fold into a specific, complex three-dimensional architecture. These complex architectures allow some group I and group II introns to be self-splicing, that is, the intron-containing RNA molecule can rearrange its own covalent structure so as to precisely remove the intron and link the exons together in the correct order. In some cases, particular intron-binding proteins are involved in splicing, acting in such a way that they assist the intron in folding into the three-dimensional structure that is necessary for self-splicing activity. Group I and group II introns are distinguished by different sets of internal conserved sequences and folded structures, and by the fact that splicing of RNA molecules containing group II introns generates branched introns (like those of spliceosomal RNAs), while group I introns use a non-encoded guanosine nucleotide (typically GTP) to initiate splicing, adding it on to the 5'-end of the excised intron.

On the accuracy of splicing

The spliceosome is a very complex structure containing up to one hundred proteins and five different RNAs. The substrate of the reaction is a long RNA molecule, and the transesterification reactions catalyzed by the spliceosome require the bringing together of sites that may be thousands of nucleotides apart.[25][26] All biochemical reactions are associated with known error rates – and the more complicated the reaction, the higher the error rate. Therefore, it is not surprising that the splicing reaction catalyzed by the spliceosome has a significant error rate even though there are spliceosome accessory factors that suppress the accidental cleavage of cryptic splice sites.[27]

Under ideal circumstances, the splicing reaction is likely to be 99.999% accurate (error rate of 10⁻⁵) and the correct exons will be joined and the correct intron will be deleted.[28] However, these ideal conditions require very close matches to the best splice site sequences and the absence of any competing cryptic splice site sequences within the introns, and those conditions are rarely met in large eukaryotic genes that may cover more than 40 kilobase pairs. Recent studies have shown that the actual error rate can be considerably higher than 10⁻⁵ and may be as high as 2% or 3% per gene (error rate of 2–3 × 10⁻²).[29][30][31] Additional studies suggest that the error rate is no less than 0.1% per intron.[32][33] This relatively high level of splicing errors explains why most splice variants are rapidly degraded by nonsense-mediated decay.[34][35]
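
The per-intron and per-gene figures above can be related with a small calculation. The sketch below assumes, purely for illustration, that every intron in a gene is mis-spliced independently and with the same probability; neither assumption is stated in the source, and the function name is hypothetical.

```python
def prob_any_splicing_error(per_intron_error: float, n_introns: int) -> float:
    """Probability that at least one of n_introns is spliced incorrectly.

    Assumes independent, identically distributed errors (an idealization).
    """
    return 1.0 - (1.0 - per_intron_error) ** n_introns


# Per-intron error rate of 0.1% (the lower bound quoted in the text).
for n in (1, 8, 20, 30):
    p = prob_any_splicing_error(1e-3, n)
    print(f"{n:>2} introns: {p:.2%} of transcripts carry at least one error")
```

Under these assumptions, a per-intron error rate of 0.1% gives roughly 2% to 3% erroneous transcripts for a gene with 20 to 30 introns, which is the same order as the per-gene estimates quoted above.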

The presence of suboptimal binding sites within genes causes splicing errors, which raises the question of why these sites have not been eliminated by natural selection. The argument for their persistence is similar to the argument for junk DNA.[32][36]

"Although mutations which create or disrupt binding sites may be slightly deleterious, the large number of possible such mutations makes it inevitable that some will reach fixation in a population. This is particularly relevant in species, such as humans, with relatively small long-term effective population sizes. It is plausible, then, that the human genome carries a substantial load of suboptimal sequences which cause the generation of aberrant transcript isoforms. In this study, we present direct evidence that this is indeed the case."[32]

While the catalytic reaction may be accurate enough for effective processing most of the time, the overall error rate may be partly limited by the fidelity of transcription because transcription errors will introduce mutations that create cryptic splice sites. In addition, the transcription error rate of 10⁻⁵ to 10⁻⁶ is high enough that one in every 25,000 transcribed exons will have an incorporation error in one of the splice sites, leading to a skipped intron or a skipped exon. Almost all multi-exon genes will produce incorrectly spliced transcripts, but the frequency of this background noise will depend on the size of the genes, the number of introns, and the quality of the splice site sequences.[30][33]
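
The text does not say how the figure of one in 25,000 exons was obtained. One back-of-the-envelope reading that reproduces it, offered only as an assumption, is that about four essential splice-site nucleotides flank each exon and that each is misincorporated at the upper error rate of 10⁻⁵. The function name and the count of four critical nucleotides are hypothetical.

```python
def exons_per_splice_site_error(error_rate: float, critical_nt: int) -> float:
    """Expected number of transcribed exons per splice-site incorporation error.

    error_rate  -- per-nucleotide transcription error rate
    critical_nt -- assumed number of essential splice-site nucleotides per exon
    """
    p_error = 1.0 - (1.0 - error_rate) ** critical_nt
    return 1.0 / p_error


# Assumption: four critical nucleotides per exon, error rate 1e-5.
print(round(exons_per_splice_site_error(1e-5, 4)))
```

With the lower error rate of 10⁻⁶, the same four-nucleotide assumption would give about one error per 250,000 exons, so the quoted figure corresponds to the upper end of the stated range under this reading.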

In some cases, splice variants will be produced by mutations in the gene (DNA). These can be SNP polymorphisms that create a cryptic splice site or mutate a functional site. They can also be somatic cell mutations that affect splicing in a particular tissue or a cell line.[37][38][39] When the mutant allele is in a heterozygous state, this will result in production of two abundant splice variants: one functional, and one non-functional. In the homozygous state, the mutant alleles may cause a genetic disease, such as the hemophilia found in descendants of Queen Victoria, where a mutation in one of the introns in a blood clotting factor gene creates a cryptic 3' splice site, resulting in aberrant splicing.[40] A significant fraction of human deaths by disease may be caused by mutations that interfere with normal splicing, mostly by creating cryptic splice sites.[41][38]

Incorrectly spliced transcripts can easily be detected and their sequences entered into the online databases. They are usually described as "alternatively spliced" transcripts, which can be confusing because the term does not distinguish between real, biologically relevant, alternative splicing and processing noise due to splicing errors. One of the central issues in the field of alternative splicing is working out the differences between these two possibilities. Many scientists have argued that the null hypothesis should be splicing noise, putting the burden of proof on those who claim biologically relevant alternative splicing. According to those scientists, the claim of function must be accompanied by convincing evidence that multiple functional products are produced from the same gene.[42][43]

Biological functions and evolution

While introns do not encode protein products, they are integral to gene expression regulation. Some introns themselves encode functional RNAs through further processing after splicing to generate noncoding RNA molecules.[44] Alternative splicing is widely used to generate multiple proteins from a single gene. Furthermore, some introns play essential roles in a wide range of gene expression regulatory functions such as nonsense-mediated decay[45] and mRNA export.[46]

After the initial discovery of introns in protein-coding genes of the eukaryotic nucleus, there was significant debate as to whether introns in modern-day organisms were inherited from a common ancient ancestor (termed the introns-early hypothesis), or whether they appeared in genes rather recently in the evolutionary process (termed the introns-late hypothesis). Another theory is that the spliceosome and the intron-exon structure of genes are relics of the RNA world (the introns-first hypothesis).[47] There is still considerable debate about which of these hypotheses is most correct, but the popular consensus at the moment is that, following the formation of the first eukaryotic cell, group II introns from the bacterial endosymbiont invaded the host genome. In the beginning these self-splicing introns excised themselves from the mRNA precursor, but over time some of them lost that ability and their excision had to be aided in trans by other group II introns. Eventually a number of specific trans-acting introns evolved and these became the precursors to the snRNAs of the spliceosome. The efficiency of splicing was improved by association with stabilizing proteins to form the primitive spliceosome.[48][49][50][51]

Early studies of genomic DNA sequences from a wide range of organisms showed that the intron-exon structure of homologous genes in different organisms can vary widely.[52] More recent studies of entire eukaryotic genomes have shown that the lengths and density (introns/gene) of introns vary considerably between related species. For example, while the human genome contains an average of 8.4 introns/gene (139,418 in the genome), the unicellular fungus Encephalitozoon cuniculi contains only 0.0075 introns/gene (15 introns in the genome).[53] Since eukaryotes arose from a common ancestor (common descent), there must have been extensive gain or loss of introns during evolutionary time.[54][55] This process is thought to be subject to selection, with a tendency towards intron gain in larger species due to their smaller population sizes, and the converse in smaller (particularly unicellular) species.[56] Biological factors also influence which genes in a genome lose or accumulate introns.[57][58][59]
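
The intron densities quoted above imply a gene count for each genome, since density is total introns divided by genes. The short sketch below only rearranges the two figures given in the text; the function name is hypothetical, and the resulting gene counts are implied values rather than numbers reported by the cited study.

```python
def implied_gene_count(total_introns: int, introns_per_gene: float) -> float:
    """Gene count implied by a total intron count and an average density."""
    return total_introns / introns_per_gene


human = implied_gene_count(139_418, 8.4)
e_cuniculi = implied_gene_count(15, 0.0075)

print(f"human: about {human:,.0f} genes")
print(f"E. cuniculi: about {e_cuniculi:,.0f} genes")
print(f"density ratio: about {8.4 / 0.0075:,.0f}-fold")
```

The two densities differ by about three orders of magnitude, which is the scale of gain or loss that the common-descent argument has to account for.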

Alternative splicing of exons within a gene after intron excision acts to introduce greater variability of protein sequences translated from a single gene, allowing multiple related proteins to be generated from a single gene and a single precursor mRNA transcript. The control of alternative RNA splicing is performed by a complex network of signaling molecules that respond to a wide range of intracellular and extracellular signals.

Introns contain several short sequences that are important for efficient splicing, such as acceptor and donor sites at either end of the intron as well as a branch point site, which are required for proper splicing by the spliceosome. Some introns are known to enhance the expression of the gene that they are contained in by a process known as intron-mediated enhancement (IME).

Actively transcribed regions of DNA frequently form R-loops that are vulnerable to DNA damage. In highly expressed yeast genes, introns inhibit R-loop formation and the occurrence of DNA damage.[60] Genome-wide analysis in both yeast and humans revealed that intron-containing genes have decreased R-loop levels and decreased DNA damage compared to intronless genes of similar expression.[60] Insertion of an intron within an R-loop prone gene can also suppress R-loop formation and recombination. Bonnet et al. (2017)[60] speculated that the function of introns in maintaining genetic stability may explain their evolutionary maintenance at certain locations, particularly in highly expressed genes.

Starvation adaptation

The physical presence of introns promotes cellular resistance to starvation by enhancing the repression of ribosomal protein genes that lie downstream of nutrient-sensing pathways.[61]

As mobile genetic elements

Introns may be lost or gained over evolutionary time, as shown by many comparative studies of orthologous genes. Subsequent analyses have identified thousands of examples of intron loss and gain events, and it has been proposed that the emergence of eukaryotes, or the initial stages of eukaryotic evolution, involved an intron invasion.[62] Two definitive mechanisms of intron loss, reverse transcriptase-mediated intron loss (RTMIL) and genomic deletions, have been identified, and are known to occur.[63] The definitive mechanisms of intron gain, however, remain elusive and controversial. At least seven mechanisms of intron gain have been reported thus far: intron transposition, transposon insertion, tandem genomic duplication, intron transfer, intron gain during double-strand break repair (DSBR), insertion of a group II intron, and intronization. In theory it should be easiest to deduce the origin of recently gained introns due to the lack of host-induced mutations, yet even introns gained recently did not arise from any of the aforementioned mechanisms. These findings thus raise the question of whether or not the proposed mechanisms of intron gain fail to describe the mechanistic origin of many novel introns because they are not accurate mechanisms of intron gain, or if there are other, yet to be discovered, processes generating novel introns.[64]

In intron transposition, the most commonly purported intron gain mechanism, a spliced intron is thought to reverse splice into either its own mRNA or another mRNA at a previously intron-less position. This intron-containing mRNA is then reverse transcribed and the resulting intron-containing cDNA may then cause intron gain via complete or partial recombination with its original genomic locus.

Transposon insertions have been shown to generate thousands of new introns across diverse eukaryotic species.[65] Transposon insertions sometimes duplicate the target-site sequence on each side of the transposon. Such an insertion could intronize the transposon without disrupting the coding sequence when the transposon inserts into the sequence AGGT, or when it encodes the splice sites within its own sequence. Where intron-generating transposons do not create target site duplications, the elements include both splice sites, GT (5') and AG (3'), and are thereby spliced precisely without affecting the protein-coding sequence.[65] It is not yet understood why these elements are spliced, whether by chance or by some preferential action of the transposon.

In tandem genomic duplication, due to the similarity between consensus donor and acceptor splice sites, which both closely resemble AGGT, the tandem genomic duplication of an exonic segment harboring an AGGT sequence generates two potential splice sites. When recognized by the spliceosome, the sequence between the original and duplicated AGGT will be spliced, resulting in the creation of an intron without alteration of the coding sequence of the gene. Double-stranded break repair via non-homologous end joining was recently identified as a source of intron gain when researchers identified short direct repeats flanking 43% of gained introns in Daphnia.[64] These numbers must be compared to the number of conserved introns flanked by repeats in other organisms, though, for statistical relevance. For group II intron insertion, the retrohoming of a group II intron into a nuclear gene was proposed to cause recent spliceosomal intron gain.
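
The tandem-duplication mechanism described above can be checked on a toy sequence. The sketch below is a minimal illustration with a made-up 21-nucleotide gene and hypothetical function names: it duplicates a segment containing AGGT, then removes everything from the GT of the first copy through the AG of the second copy, and confirms that the original sequence is restored. Real introns are far longer than the nine-nucleotide one produced here.

```python
def tandem_duplicate(seq: str, start: int, end: int) -> str:
    """Insert a second copy of seq[start:end] directly after the first."""
    return seq[:end] + seq[start:end] + seq[end:]


def splice_between_aggt(seq: str) -> tuple[str, str]:
    """Splice from the GT of the first AGGT to the AG of the second AGGT.

    Returns (spliced_sequence, removed_intron).
    """
    first = seq.find("AGGT")
    second = seq.find("AGGT", first + 1)
    if first < 0 or second < 0:
        raise ValueError("two AGGT sites are required")
    intron_start = first + 2   # 'GT' of the first copy becomes the 5' splice site
    intron_end = second + 2    # 'AG' of the second copy becomes the 3' splice site
    intron = seq[intron_start:intron_end]
    return seq[:intron_start] + seq[intron_end:], intron


# Hypothetical gene with a single AGGT at position 8.
gene = "ATGGCTCAAGGTACCGATTAA"
duplicated = tandem_duplicate(gene, 6, 15)   # duplicates "CAAGGTACC"
spliced, intron = splice_between_aggt(duplicated)

print(duplicated)
print(intron)
print(spliced == gene)
```

Because the removed stretch begins with GT and ends with AG, the remaining AG and GT halves rejoin into the original AGGT, which is why the coding sequence is unchanged.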

Intron transfer has been hypothesized to result in intron gain when a paralog or pseudogene gains an intron and then transfers this intron via recombination to an intron-absent location in its sister paralog. Intronization is the process by which mutations create novel introns from formerly exonic sequence. Thus, unlike other proposed mechanisms of intron gain, this mechanism does not require the insertion or generation of DNA to create a novel intron.[64]

The only hypothesized mechanism of recent intron gain lacking any direct evidence is that of group II intron insertion, which when demonstrated in vivo, abolishes gene expression.[66] Group II introns are therefore likely the presumed ancestors of spliceosomal introns, acting as site-specific retroelements, and are no longer responsible for intron gain.[67][68] Tandem genomic duplication is the only proposed mechanism with supporting in vivo experimental evidence: a short intragenic tandem duplication can insert a novel intron into a protein-coding gene, leaving the corresponding peptide sequence unchanged.[69] This mechanism also has extensive indirect evidence lending support to the idea that tandem genomic duplication is a prevalent mechanism for intron gain. The testing of other proposed mechanisms in vivo, particularly intron gain during DSBR, intron transfer, and intronization, is possible, although these mechanisms must be demonstrated in vivo to solidify them as actual mechanisms of intron gain. Further genomic analyses, especially when executed at the population level, may then quantify the relative contribution of each mechanism, possibly identifying species-specific biases that may shed light on varied rates of intron gain amongst different species.[64]

from Grokipedia
An intron is a non-coding sequence of DNA located within a gene in eukaryotic organisms, interrupting the coding regions known as exons, and is transcribed into pre-messenger RNA (pre-mRNA) but subsequently removed during RNA splicing to form mature mRNA for translation into proteins. Introns were discovered in 1977 through independent studies by Phillip Sharp and Richard Roberts, who showed that eukaryotic genes, including those in viruses like adenovirus, are discontinuous with non-coding intervening sequences that are spliced out of the primary RNA transcript. This finding, which challenged the prevailing view of genes as continuous coding units, earned Sharp and Roberts the Nobel Prize in Physiology or Medicine in 1993. Spliceosomal introns, the predominant type in eukaryotes, are absent in prokaryotes but present in the vast majority of eukaryotic genes, sometimes numbering in the dozens per gene and varying widely in length from tens to tens of thousands of base pairs. Their positions are often conserved across distant species, suggesting evolutionary and functional importance. Beyond being removed during splicing, introns fulfill diverse roles in gene regulation, such as enhancing transcriptional efficiency (sometimes by over 100-fold), enabling alternative splicing that generates multiple protein isoforms from a single gene (affecting up to 95% of human multi-exon genes), promoting mRNA nuclear export and translation, and encoding functional non-coding RNAs like microRNAs and small nucleolar RNAs.

Definition and Discovery

Definition and Characteristics

Introns are non-coding sequences of DNA located within genes, primarily in eukaryotes but also present in some bacteria and archaea, that are transcribed into precursor messenger RNA (pre-mRNA) yet excised during RNA splicing to yield mature mRNA for translation into proteins. These sequences interrupt the coding regions of genes and do not contribute to the final protein product in the standard case. Structurally, introns are positioned between exons, the segments that are retained in mature mRNA, and exhibit a wide range of lengths, typically spanning from 50 to more than 6,000 nucleotides, though extremes can exceed 100,000 nucleotides in some cases. Their boundaries are defined by highly conserved sequence motifs essential for recognition by the splicing machinery: the 5' splice site begins with a GU dinucleotide, the 3' splice site ends with an AG dinucleotide, and an internal branch point sequence features a critical adenine residue approximately 20–50 nucleotides upstream of the 3' site. Additionally, many introns contain a polypyrimidine tract, a stretch of pyrimidine-rich nucleotides, near the 3' splice site that aids in spliceosome assembly. In contrast to exons, which encode protein sequences or regulatory elements in the mature transcript, introns are generally non-coding and removed prior to translation. Exceptions exist where introns harbor genes for functional RNAs, such as small nucleolar RNAs (snoRNAs) that guide RNA modifications. In eukaryotic genomes, introns are abundant; for instance, in humans, they comprise over 95% of the length of many protein-coding genes, with approximately 210,000 introns distributed across the protein-coding genes.
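
The boundary motifs listed above (GU at the 5' end, AG at the 3' end, a branch-point adenine 20–50 nucleotides upstream of the 3' site, and a pyrimidine-rich stretch near the 3' end) can be turned into a crude screening function. This is a sketch under stated assumptions: the function name, the 20-nucleotide tract length and the example sequence are all hypothetical, and a real branch point is defined by more than the mere presence of an adenine.

```python
def check_intron_motifs(intron: str, branch_window=(20, 50), tract_len=20) -> dict:
    """Report simple motif checks for a candidate intron sequence (RNA or DNA)."""
    seq = intron.upper().replace("T", "U")       # work in RNA letters
    near, far = branch_window
    # Region lying between `far` and `near` nucleotides upstream of the 3' end.
    region = seq[max(0, len(seq) - far):max(0, len(seq) - near)]
    # Stretch immediately upstream of the terminal AG.
    tract = seq[-(tract_len + 2):-2]
    pyrimidines = sum(base in "CU" for base in tract)
    return {
        "donor_GU": seq.startswith("GU"),
        "acceptor_AG": seq.endswith("AG"),
        "adenine_in_branch_window": "A" in region,
        "pyrimidine_fraction": pyrimidines / len(tract) if tract else 0.0,
    }


# Hypothetical 51-nt intron: GUAAGU ... UACUAAC ... U-rich tract ... CAG
example = "GUAAGU" + "C" * 10 + "UACUAAC" + "U" * 25 + "CAG"
report = check_intron_motifs(example)
print(report)
```

A sequence passing all four checks is only a candidate; the checks say nothing about splicing efficiency or about competing cryptic sites.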

Historical Discovery and Etymology

The discovery of introns marked a paradigm shift in understanding eukaryotic gene structure, emerging from investigations into adenovirus transcription in the mid-1970s. In 1977, Phillip A. Sharp and colleagues at the Massachusetts Institute of Technology hybridized late mRNA from adenovirus type 2 with its genomic DNA and used electron microscopy to observe the resulting structures. These revealed regions where the DNA looped out, unpaired with the mRNA, indicating that the gene contained non-coding intervening sequences separating expressed segments. This work, published in the Proceedings of the National Academy of Sciences, provided the first direct evidence of discontinuous genes in eukaryotes. Independently, in the same year, Richard J. Roberts and his team at Cold Spring Harbor Laboratory analyzed adenovirus type 2 transcripts using similar RNA-DNA hybridization and electron microscopy techniques. Their mapping showed an "amazing sequence arrangement" at the 5' ends of the mRNAs, confirming the presence of spliced intervening sequences that were removed during RNA processing to form mature mRNA. These findings, detailed in Cell, demonstrated that eukaryotic genes are composed of coding exons interrupted by non-coding introns, challenging the long-held assumption of gene-protein colinearity observed in prokaryotes. The 1977 discoveries by Sharp and Roberts were initially met with skepticism, as they contradicted the prevailing view that genes were continuous stretches of DNA directly encoding proteins, a model well-established from bacterial studies. Many scientists questioned whether split genes were artifacts of viral genomes or unique to eukaryotes, given the absence of introns in prokaryotic systems at the time. Confirmation came swiftly through additional electron microscopy studies, which consistently visualized the looped-out intron regions in RNA-DNA hybrids from various eukaryotic genes, solidifying the reality of this "gene-in-pieces" architecture. 
Earlier hints of RNA processing complexity had appeared in the 1970s from studies on heterogeneous nuclear RNA (hnRNA) in eukaryotic cells, which showed that large precursor transcripts were trimmed to smaller mRNAs, though these did not yet reveal the splicing mechanism. The groundbreaking 1977 papers in Cell and PNAS earned Sharp and Roberts the 1993 Nobel Prize in Physiology or Medicine for their "discoveries concerning split genes." The term "intron," short for intragenic region or intervening sequence, was coined by biochemist Walter Gilbert in 1978 to describe these non-coding elements, while "exon" denoted the expressed sequences joined during splicing. Gilbert introduced these terms in his Nature article "Why genes in pieces?," proposing that introns facilitated evolutionary flexibility by allowing exon shuffling. Before Gilbert's nomenclature, the sequences were commonly called intervening sequences (IVS) in the original 1977 publications. By the early 1980s, further milestones included the elucidation of splicing mechanisms, with the discovery of self-splicing introns in ribosomal RNA precursors, such as the group I intron in Tetrahymena thermophila reported by Thomas Cech in 1982, highlighting the catalytic potential of RNA itself.

Distribution and Occurrence

In Eukaryotic Genomes

Introns are highly abundant in eukaryotic genomes, particularly in more complex organisms. In the human genome, protein-coding genes contain an average of approximately 8 introns per gene, resulting in a total of around 180,000 to 200,000 introns across roughly 20,000 genes. Introns constitute about 24–25% of the total genomic DNA in mammals, significantly contributing to genome size despite not being translated into proteins. Intron sizes exhibit considerable variation across eukaryotic lineages, reflecting differences in genome architecture. In the budding yeast Saccharomyces cerevisiae, introns are rare and small: only about 5% of genes contain an intron, usually a single one, and typical lengths range from 50 to 400 nucleotides. In contrast, vertebrate genomes feature much larger introns; the average human intron is approximately 3.4 kilobases (kb), though sizes can extend up to 100 kb or more in some cases. This expansion contributes to the overall size of mammalian genomes, where introns often dwarf exon lengths. Patterns of intron distribution correlate with organismal complexity and gene features. Simpler eukaryotes like yeast have few introns, primarily in ribosomal protein genes, while multicellular organisms show increased numbers; for instance, the fruit fly Drosophila melanogaster genome harbors over 48,000 introns, averaging about 4 per gene with lengths around 487 base pairs. In plants, such as Arabidopsis thaliana, genes average nearly 5 introns with short lengths of about 165 nucleotides, but many plant species exhibit exceptionally long introns exceeding several kilobases, correlating with larger overall genome sizes. Intron number and length also tend to increase with gene length and are more prevalent in housekeeping versus tissue-specific genes in vertebrates.

In Prokaryotic and Archaeal Genomes

Introns are exceedingly rare in bacterial genomes, with the vast majority of bacterial genes lacking them entirely. Among the known cases, self-splicing group I and group II introns predominate, often functioning as mobile genetic elements that insert into host genes such as those encoding ribosomal RNAs or surface proteins. For instance, in Clostridium tetani, a group II intron interrupts a surface layer protein gene and undergoes alternative splicing in vivo, representing one of the few documented examples of introns in bacterial protein-coding sequences. Comprehensive analyses indicate that while group II introns occur in approximately 25% of surveyed bacterial genomes, they average only about 5.3 per affected genome, underscoring their scarcity relative to eukaryotic counterparts. In archaeal genomes, introns are more prevalent than in bacteria, though still limited in scope and primarily confined to transfer RNA (tRNA) and ribosomal RNA (rRNA) genes. These introns are typically processed by archaeal tRNA splicing endonucleases that recognize bulge-helix-bulge motifs at the exon-intron boundaries. Notable examples include multiple introns in the 23S rRNA genes of hyperthermophilic species like Pyrobaculum aerophilum and Pyrobaculum islandicum, where a 713-nucleotide intron interrupts the 16S rRNA gene in the former. Additionally, group I introns are widespread in archaeal rRNA and tRNA loci, and according to a 2024 preprint, group II introns have been identified in certain lineages, including members of the Asgard superphylum such as Lokiarchaeota, which exhibit structural and mechanistic parallels to eukaryotic spliceosomal introns. Archaeal and bacterial introns are generally short, ranging from 15 to around 600 nucleotides, in stark contrast to the often kilobase-scale introns in eukaryotes, and genomes harbor only a handful (typically 1 to 10 in total) rather than the thousands found in eukaryotic nuclear genes.
This paucity suggests that introns were likely acquired relatively late in the evolution of prokaryote-like ancestors, possibly through horizontal transfer or independent insertions, rather than being ancestral features retained from a common origin.

Classification and Types

Spliceosomal Introns

Spliceosomal introns are non-coding sequences within eukaryotic pre-mRNA transcripts that are excised by the spliceosome, a large ribonucleoprotein complex composed of small nuclear RNAs (snRNAs) and associated proteins. This process is essential for generating mature messenger RNA (mRNA) in the nucleus, where spliceosomal introns predominate as the primary type interrupting protein-coding genes. Unlike other intron classes, spliceosomal introns require the coordinated action of multiple spliceosomal components for accurate removal, distinguishing them as a hallmark of eukaryotic gene architecture. Key structural features of spliceosomal introns include conserved consensus sequences that guide spliceosome recognition and assembly. The 5' splice site typically begins with a GT dinucleotide, while the 3' splice site ends with an AG dinucleotide, and an internal branch point sequence, often featuring an adenine residue, facilitates the splicing reaction. These motifs are recognized by specific small nuclear ribonucleoproteins (snRNPs): U1 snRNP binds the 5' splice site, U2 snRNP interacts with the branch point, and the U4/U6.U5 tri-snRNP complex contributes to catalysis and exon ligation. Introns vary widely in length, from tens to thousands of base pairs, but these conserved elements ensure splicing fidelity across diverse eukaryotic lineages. Spliceosomal introns constitute approximately 99% of all introns in eukaryotic genomes and are entirely absent in prokaryotes, reflecting their role in the complex regulation of eukaryotic gene expression. Their prevalence underscores the evolutionary expansion of nuclear pre-mRNA processing machinery, with densities varying from fewer than one per gene in some unicellular eukaryotes to approximately 8-9 per gene in vertebrates such as humans. This abundance enables alternative splicing, which generates proteomic diversity without expanding gene number. 
A subset of spliceosomal introns, known as U12-type introns, deviates from the standard GT-AG rule, in many cases featuring AT-AC termini instead; these are processed by a distinct minor spliceosome comprising U11, U12, U4atac, U5, and U6atac snRNPs. In humans, U12-type introns represent about 0.5% of total introns, occurring in roughly 700-800 genes, often in clusters within the same transcript. These rare introns highlight the spliceosome's adaptability while maintaining core mechanistic principles shared with the major pathway.
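The terminal dinucleotide rules above can be illustrated with a minimal sketch. The function name `classify_intron_termini` is hypothetical, the input is assumed to be the intron's DNA-alphabet sequence from its first to its last nucleotide, and the check looks only at the termini; it does not decide whether the major or the minor spliceosome processes the intron, which depends on further sequence features.

```python
def classify_intron_termini(intron_dna: str) -> str:
    """Label an intron by its first and last two nucleotides (DNA alphabet)."""
    seq = intron_dna.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct terminal dinucleotides")
    donor, acceptor = seq[:2], seq[-2:]  # 5' splice site start, 3' splice site end
    if (donor, acceptor) == ("GT", "AG"):
        return "GT-AG"
    if (donor, acceptor) == ("AT", "AC"):
        return "AT-AC"
    # Anything else is reported verbatim rather than guessed at.
    return f"non-canonical ({donor}-{acceptor})"
```

For example, `classify_intron_termini("GTAAGTTTTTCAG")` reports a GT-AG intron, while a sequence beginning with AT and ending with AC is reported as AT-AC.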

Self-Splicing Introns

Self-splicing introns represent a class of introns capable of catalyzing their own excision from precursor RNA transcripts through intrinsic ribozyme activity, independent of protein enzymes for the core splicing steps. These introns are primarily classified into Group I and Group II based on their distinct secondary structures and catalytic mechanisms, with both types facilitating two sequential transesterification reactions to remove the intron and ligate the flanking exons. Unlike spliceosomal introns, self-splicing introns rely on RNA folding to form active sites, highlighting their role as ancient ribozymes in organellar and prokaryotic genomes. Group I introns were first identified in the ribosomal RNA (rRNA) precursor of Tetrahymena thermophila, where their self-splicing activity was demonstrated in vitro without added proteins. These introns occur in diverse genes, including rRNA, transfer RNA (tRNA), and mitochondrial protein-coding genes across eukaryotes, bacteria, and organelles. Their splicing mechanism begins with an exogenous guanosine cofactor, whose 3'-hydroxyl group attacks the 5' splice site in the first transesterification step, cleaving the 5' exon and attaching the guanosine to the intron's 5' end; the freed 3'-hydroxyl of the 5' exon then attacks the 3' splice site in the second step, joining the exons and releasing the linear intron. Structurally, Group I introns feature a conserved core of nine helical elements (P1 through P9), where P1 pairs the 5' exon with the internal guide sequence (IGS) to position the splice sites, and the P4-P6 domain forms a key catalytic scaffold, as revealed by crystallographic studies. This ribozyme architecture enables precise recognition and catalysis, with the UGU triplet in P7 coordinating a guanosine-binding pocket. 
Group II introns, initially characterized in yeast mitochondrial genes such as the cox1 locus, are prevalent in organellar genomes of plants, fungi, and algae, as well as in bacterial chromosomes and plasmids. Their splicing mirrors the lariat-forming pathway of spliceosomal introns, starting with the 2'-hydroxyl of an adenosine bulge in domain VI attacking the 5' splice site to form a lariat intermediate and release the 5' exon; the subsequent attack by the 5' exon's 3'-hydroxyl on the 3' splice site ligates the exons and excises the branched intron. The conserved secondary structure comprises six double-helical domains (I through VI), with domain I serving as the scaffold that organizes the active site through tertiary interactions and domain V forming the catalytic core, including coordination of two Mg²⁺ ions for phosphodiester bond chemistry. This ribozyme function is often enhanced by an intron-encoded maturase protein, though core self-splicing occurs in vitro without it. Group II introns exhibit mobility through retrohoming, where a reverse transcriptase-maturase fusion protein facilitates target-primed reverse transcription into homologous DNA sites. In terms of distribution, self-splicing introns are rare in eukaryotic nuclear genomes, where Group I examples are limited to specific fungal and protist rRNA or protein genes, but they are abundant in organelles. For instance, the mitochondrial genome of Saccharomyces cerevisiae contains at least 10 Group I and several Group II introns interrupting genes like cox1, cob, and rRNAs, contributing to genome complexity and enabling independent splicing in isolated transcripts. This organellar prevalence underscores their adaptation to compact, maternally inherited genomes, contrasting with the protein-dependent splicing dominant in nuclear pre-mRNAs.

tRNA and Other Specialized Introns

Introns in transfer RNA (tRNA) genes represent a specialized class distinct from those in messenger RNA (mRNA), as they occur in non-coding RNAs essential for translation and are processed through unique enzymatic mechanisms. In eukaryotes, tRNA introns are invariably positioned at a conserved site within the anticodon loop, specifically one nucleotide downstream of the anticodon between positions 37 and 38 of the mature tRNA sequence. These introns vary in length from 6 to over 100 nucleotides but do not interact extensively with the splicing machinery, allowing accommodation of diverse sequences by the processing enzymes. Unlike spliceosomal introns, tRNA introns are excised by the heterotetrameric tRNA splicing endonuclease (TSEN) complex, composed of subunits TSEN2, TSEN15, TSEN34, and TSEN54 in humans, which employs a molecular ruler mechanism to recognize the pre-tRNA structure and cleave at the exon-intron boundaries without requiring guanosine cofactors or lariat formation. Following cleavage, the exons are ligated by a tRNA ligase (in mammals, the RTCB-containing ligase complex, which includes CGI-99), ensuring precise restoration of the tRNA's functional cloverleaf structure. The prevalence of tRNA introns in eukaryotic genomes is relatively low and variable across species, reflecting evolutionary dynamics rather than a universal requirement for tRNA maturation. For instance, in the yeast Saccharomyces cerevisiae, approximately 22% of tRNA genes (59 out of 274) contain introns, distributed across 10 isodecoder families, with all introns at the canonical position. In humans, about 7% of the roughly 400 tRNA genes harbor introns, totaling around 28 intron-containing genes, primarily in the tRNA-Arg and tRNA-Tyr families. Across broader eukaryotic diversity, the proportion ranges from 5% to 25%, with higher incidences in lower eukaryotes like yeast compared to vertebrates, and the introns often serving no essential role in tRNA function, as evidenced by viable intronless mutants in yeast. 
In archaea, introns in tRNA and ribosomal RNA (rRNA) genes exhibit distinct structural features adapted to the domain's splicing machinery, emphasizing RNA motifs over extensive secondary structures. These introns are typically recognized by the archaeal splicing endonuclease (aSen) through a conserved bulge-helix-bulge (BHB) motif at the exon-intron boundaries, consisting of two 2- to 3-nucleotide bulges flanking a 4-base-pair helix. The BHB structure facilitates precise cleavage, after which ligation occurs via ATP-dependent RNA ligase, distinguishing this from eukaryotic TSEN-mediated splicing despite superficial similarities in endonuclease recognition. While most archaeal tRNA and rRNA introns rely on this protein-assisted mechanism, some rare group I introns in archaeal rRNA can undergo self-splicing, paralleling organellar variants but remaining infrequent. Examples include BHB-motif introns in the pre-tRNA^{Ile} of Haloferax volcanii and rRNA precursors of Desulfurococcus mobilis, where the motif ensures fidelity in harsh environmental conditions typical of archaeal habitats. Other specialized introns include twintrons, which are nested arrangements where an internal intron is embedded within an external one, requiring sequential splicing for resolution. Twintrons occur rarely in tRNA and rRNA contexts, primarily in organellar genomes, such as group II twintrons in algal chloroplast rRNA genes or mitochondrial tRNA precursors in lycophytes, where the internal intron must be excised first to expose the external splice sites. Another rare variant involves permuted exons associated with group I introns in ciliate rRNA, as seen in Tetrahymena thermophila, where the linear order of exon segments is rearranged, yet self-splicing proceeds via trans-esterification to yield functional circular or linear RNAs. 
These configurations, including permuted intron-exon (PIE) structures, highlight evolutionary innovations in intron architecture, with the permuted group I introns in ciliates demonstrating autocatalytic activity despite disrupted sequential order. Such specialized forms underscore the adaptability of introns in non-mRNA RNAs, maintaining low overall prevalence to minimize processing burdens.

Splicing Mechanism

Splicing Process Overview

In eukaryotic cells, the splicing process begins with the transcription of pre-mRNA by RNA polymerase II, producing a primary transcript that contains both exons and introns. This pre-mRNA is then subject to splicing, a co-transcriptional process where introns are precisely removed and exons are joined to form mature mRNA. The spliceosome, a large ribonucleoprotein complex, assembles dynamically on the pre-mRNA to catalyze this removal through two sequential transesterification reactions. Recent cryo-electron microscopy (cryo-EM) studies, as of 2024, have provided the first atomic-level blueprint of the human spliceosome, revealing intricate details of its assembly and conformational changes. Spliceosome assembly occurs in a stepwise manner, starting with the E (commitment) complex, where U1 binds the 5' splice site, and additional factors recognize the branch point and polypyrimidine tract near the 3' splice site. This progresses to the A (pre-spliceosome) complex with U2 binding the branch point, forming base-pairing interactions that position key sites. The B complex forms upon recruitment of the U4/U6.U5 tri-snRNP, bringing all five major snRNPs (U1, U2, U4, U5, U6) together, followed by structural rearrangements driven by ATP-dependent DExD/H-box helicases like Prp28 and Brr2, which release U1 and activate the catalytic core. The process culminates in the B* and C complexes, where the spliceosome becomes catalytically active, with further ATPase activity (e.g., Prp2) facilitating the first reaction. This assembly is ATP-dependent throughout, relying on helicases for conformational changes, and occurs co-transcriptionally in eukaryotes, coupling splicing to nascent RNA production. The splicing reactions involve two transesterification steps without net consumption of chemical energy. 
In the first step, the 2'-OH group of an adenosine at the branch point acts as a nucleophile, attacking the phosphate at the 5' splice site, cleaving the 5' exon and forming a lariat structure with the intron via a 2'-5' phosphodiester bond. The second step follows, where the newly freed 3'-OH of the 5' exon attacks the phosphate at the 3' splice site, ligating the exons with a standard 3'-5' phosphodiester bond and releasing the intron lariat. These reactions are facilitated by the snRNAs in U2, U5, and U6, which mimic ribozyme-like catalysis in the active site. While most introns undergo constitutive splicing, where all introns are removed in a fixed manner, variations arise through alternative splicing, allowing a single pre-mRNA to produce multiple isoforms; for example, exon skipping excludes specific exons from the mature mRNA, regulated by splicing factors that modulate snRNP binding or complex stability. This core mechanism is highly conserved across eukaryotes but can be tuned for regulated splicing in response to cellular signals.

Fidelity and Error Correction

The spliceosome achieves remarkably high fidelity in intron removal, with in vivo splicing error rates typically below 1% per intron, often approaching 0.7% on average across human genes, ensuring that over 99% of splicing events produce accurate exon ligation. This precision is essential given the sequence similarity between splice sites and potential cryptic sites in pre-mRNA, where errors such as exon skipping or intron retention can disrupt coding frames and lead to non-functional transcripts. In specific cases, like the SMN2 gene, splicing errors result in exon 7 skipping in approximately 90% of transcripts, contributing to spinal muscular atrophy pathogenesis by producing unstable SMN protein variants. Cellular proofreading mechanisms enhance this accuracy during spliceosome assembly and catalysis. DEAH-box ATPases, such as Prp16, act as molecular clocks by unwinding suboptimal lariat intermediates formed after the first transesterification step, directing aberrant substrates into a discard pathway that prevents their progression to the second step of splicing. This kinetic proofreading process discriminates against slowly reacting complexes, rejecting those with mismatched branch points or splice sites and thereby reducing error propagation. Recent structural studies (as of 2025) highlight roles for factors like DHX35-GPATCH1 in ensuring splice site fidelity during assembly. Additionally, post-splicing surveillance via nonsense-mediated decay (NMD) degrades aberrant mRNAs containing premature termination codons often introduced by splicing errors like frameshift-inducing exon skips or retained introns. Several factors influence splicing fidelity, including pre-mRNA secondary structure, which can mask or expose splice sites and promote alternative or erroneous pairings, and the concentration of splicing factors like SR proteins that stabilize canonical splice site recognition. 
Imbalances in these factors, such as reduced SR protein levels, can increase error rates by favoring cryptic sites or inefficient assembly. In disease contexts, mutations altering these elements exacerbate inaccuracies, as seen in SMN2 where a single nucleotide change disrupts an exonic splicing enhancer, leading to predominant exon skipping. Experimental studies highlight differences in fidelity between in vitro and in vivo conditions, with reconstituted spliceosomes showing reduced efficiency, such as 10-fold slower cleavage under suboptimal conditions, due to the absence of cellular chaperones and surveillance pathways. Kinetic models, informed by inhibition assays, demonstrate how energy-dependent branches in the splicing cycle amplify discrimination between correct and aberrant substrates. These models illustrate the spliceosome's capacity to balance speed and accuracy, with Prp16-mediated rejection preventing the accumulation of defective lariats in cellular extracts.
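How a discard step amplifies discrimination can be seen in a toy calculation. This is a deliberately simplified checkpoint model, not a fitted kinetic scheme from the literature: the function name `error_fraction` and every pass probability are hypothetical, and each checkpoint is assumed to act independently.

```python
def error_fraction(initial_ratio: float, pass_correct: float,
                   pass_aberrant: float, checkpoints: int) -> float:
    """Fraction of aberrant products after a number of independent discard checkpoints.

    initial_ratio  -- aberrant:correct ratio entering the first checkpoint
    pass_correct   -- probability a correct substrate survives one checkpoint
    pass_aberrant  -- probability an aberrant substrate survives one checkpoint
    """
    if not (0 < pass_correct <= 1 and 0 <= pass_aberrant <= 1):
        raise ValueError("pass probabilities must lie in (0, 1] and [0, 1]")
    if initial_ratio < 0 or checkpoints < 0:
        raise ValueError("initial_ratio and checkpoints must be non-negative")
    # Each checkpoint multiplies the aberrant:correct ratio by the same factor.
    ratio = initial_ratio * (pass_aberrant / pass_correct) ** checkpoints
    return ratio / (1.0 + ratio)
```

Under these assumed numbers, a 10-fold preference at each checkpoint lowers the aberrant share from about 9% with no checkpoint (`error_fraction(0.1, 1.0, 0.1, 0)`) to about 1% with one and about 0.1% with two, which is the sense in which an energy-consuming rejection branch trades throughput for accuracy.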

Biological Roles and Evolution

Regulatory and Functional Roles

Introns exert significant influence on gene regulation, primarily through intron-mediated enhancement (IME) and alternative splicing. IME enables specific introns, particularly those located near the 5' end of genes and in their native orientation, to amplify mRNA levels by boosting transcription efficiency, promoting nuclear export, and stabilizing transcripts. This enhancement can increase gene expression by several-fold, with effects observed across diverse organisms from yeast to mammals, underscoring introns' role in fine-tuning protein output without altering coding sequences. Complementing IME, alternative splicing leverages introns to produce multiple mRNA isoforms from a single pre-mRNA, expanding proteomic diversity; in humans, this process affects approximately 95% of multi-exon genes, enabling tissue-specific and developmental regulation of gene function. Beyond direct enhancement, introns serve as reservoirs for non-coding RNAs that regulate cellular processes. A substantial portion (around 50-60%) of human microRNAs (miRNAs) originates from intronic sequences within protein-coding or non-coding host genes, where these miRNAs are excised and mature independently to silence target mRNAs post-transcriptionally. Likewise, the majority of small nucleolar RNAs (snoRNAs), exceeding 95% in vertebrates, are processed from introns, guiding chemical modifications on ribosomal and other RNAs essential for ribosome biogenesis and translation fidelity. Introns also contribute to the formation of circular RNAs (circRNAs) via backsplicing, where flanking intronic repeats or structures facilitate exon circularization, yielding stable RNAs that act as miRNA sponges or modulators of splicing. Introns further support mRNA maturation and nuclear export by facilitating assembly of the exon junction complex (EJC) approximately 20-24 nucleotides upstream of exon-exon junctions during splicing, which recruits export factors to ensure processed transcripts reach the cytoplasm efficiently. 
They also modulate chromatin structure, with intronic sequences influencing nucleosome positioning and histone modifications to promote or repress transcription in a context-dependent manner. In specific examples, intron retention during stress responses, such as heat shock or nutrient deprivation, delays expression of sensor genes by sequestering premature transcripts in the nucleus, providing a rapid post-transcriptional brake on protein synthesis. For 40 years, intron retention was often dismissed as splicing noise but is now recognized as a dynamic and evolutionarily conserved mechanism of gene regulation. Similarly, in immune gene diversity, introns within the immunoglobulin heavy chain locus enable V(D)J recombination and class-switch recombination, generating varied antibody specificities and isotypes critical for adaptive immunity.
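The positioning of the exon junction complex roughly 20-24 nucleotides upstream of each exon-exon junction, noted above, can be turned into coordinates on a spliced transcript. The following is a sketch under assumptions: the function name `ejc_windows`, the 0-based coordinate convention, and the example exon lengths are illustrative only.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def ejc_windows(exon_lengths: Sequence[int], upstream_min: int = 20,
                upstream_max: int = 24) -> List[Tuple[int, int]]:
    """Return (start, end) windows on the mature mRNA, one per exon-exon junction.

    A junction at position j lies between nucleotides j-1 and j (0-based);
    the window spans j - upstream_max to j - upstream_min, clipped at zero.
    """
    if any(length <= 0 for length in exon_lengths):
        raise ValueError("exon lengths must be positive")
    # Cumulative exon ends give the junction positions; the last end is the
    # transcript's 3' end, not a junction, so it is dropped.
    junctions = list(accumulate(exon_lengths))[:-1]
    return [(max(0, j - upstream_max), max(0, j - upstream_min)) for j in junctions]
```

For a transcript with exons of 100, 50, and 80 nucleotides, the junctions fall at positions 100 and 150, so `ejc_windows([100, 50, 80])` gives the windows (76, 80) and (126, 130). A single-exon transcript has no junctions and therefore no windows.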

Evolutionary Origins and Significance

The evolutionary origins of introns remain a subject of debate, encapsulated by the "introns-early" and "introns-late" hypotheses. The introns-early theory proposes that introns were abundant in the last universal common ancestor (LUCA) or even predated the RNA-protein world, facilitating early exon shuffling, with extensive losses occurring in prokaryotic lineages due to streamlining pressures. This view is bolstered by the conservation of intron positions in orthologous genes across eukaryotes, where roughly 25-30% of introns align in sequences from animals, fungi, and plants, often at protosplice sites such as (A/C)AG||G that suggest ancient insertions rather than independent gains. Conversely, the introns-late theory argues that spliceosomal introns emerged as a eukaryotic innovation after the divergence from prokaryotes, driven by the need for complex gene regulation, with evidence from their near-absence in bacterial and archaeal genomes—where introns constitute less than 1% of genes—and the sporadic distribution in eukaryotic paralogs indicating ongoing gains and losses. Recent comparative genomic analyses have identified hundreds of recent intron gain events in human genes, supporting continued intron dynamics in modern eukaryotes. A prevailing compromise reconciles these perspectives by positing that self-splicing group II introns, originally from bacterial endosymbionts like the mitochondrial progenitor, massively invaded the early eukaryotic nuclear genome during eukaryogenesis, creating an intron-rich ancestor before differential losses in descendant lineages. This scenario aligns with the mechanistic parallels between group II intron ribozyme activity and spliceosomal transesterification reactions, where conserved structural domains in group II RNAs mirror spliceosomal snRNAs. Such an invasion likely coincided with the emergence of nuclear-cytoplasmic compartmentalization, enabling intron proliferation without immediate lethality to the host. 
Introns have profoundly influenced genome evolution by enabling exon shuffling, which promotes the recombination of protein-coding modules to generate functional diversity. In vertebrates, for example, introns in immunoglobulin loci allow V(D)J recombination, shuffling variable exons to produce antibody diversity essential for adaptive immunity. Broader analyses indicate that exon shuffling has assembled modular domains in a significant fraction of eukaryotic multidomain proteins, underscoring introns' role in expanding proteome complexity without de novo sequence invention. Intron proliferation, mediated by gene duplication and retrotransposition-like mobility, correlates strongly with the transition to multicellularity, particularly in metazoans, where intron density surged at the lineage's base to support tissue-specific regulation. Phylogenetic reconstructions reveal that early metazoan ancestors acquired thousands of novel introns, far exceeding those in unicellular relatives, facilitating alternative splicing variants that underpin developmental complexity. This expansion contrasts with the rarity of introns in bacteria and archaea, where they appear sporadically, often as mobile group II elements. The fossil record of introns is embodied in ancient group II introns preserved in bacterial genomes, such as those in Sinorhizobium species, indicating their pre-eukaryotic antiquity as retroelements capable of self-propagation. Debates persist on spliceosome evolution, with structural and phylogenetic evidence supporting its derivation from disassembled group II intron components: the intron's catalytic core likely fragmented into snRNAs (U2, U6), while maturase proteins evolved into splicing factors like Prp8. This transition highlights introns' significance in driving eukaryotic innovation, from modular protein evolution to the architectural foundations of complex genomes.
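The protosplice motif (A/C)AG||G mentioned in the discussion of conserved intron positions lends itself to a simple scan of an intronless coding sequence. This is a minimal sketch: the function name `protosplice_positions` is hypothetical, the input is assumed to use the DNA alphabet, and a motif match only marks a candidate insertion point, not evidence that an intron was ever present there.

```python
import re
from typing import List

# Zero-width match at the "||" of (A/C)AG||G: preceded by AAG or CAG, followed by G.
PROTOSPLICE = re.compile(r"(?<=[AC]AG)(?=G)")


def protosplice_positions(coding_dna: str) -> List[int]:
    """Return 0-based indices at which an intron would sit between ...AG and G...."""
    return [match.start() for match in PROTOSPLICE.finditer(coding_dna.upper())]
```

For the toy sequence `TTCAGGTTAAGGAT`, the scan reports positions 5 and 11, the points immediately after `CAG` and `AAG` where the next base is G.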

Specific Adaptations (e.g., Starvation Response)

One prominent example of intron-mediated adaptation to nutrient stress involves phosphate starvation in Arabidopsis thaliana. Under phosphate (Pi) deficiency, intron retention increases in numerous root transcripts, particularly those associated with phosphate transport and cellular responses, leading to the production of truncated protein isoforms that enhance resource efficiency and stress tolerance. This splicing shift allows plants to fine-tune gene expression without altering transcription levels, promoting survival in low-Pi soils. In heat stress responses, introns facilitate decay mechanisms in plants such as Arabidopsis and tomato, where elevated temperatures induce widespread intron retention, often introducing premature termination codons that trigger nonsense-mediated decay (NMD) of transcripts. This reduces the synthesis of non-essential proteins, conserving energy during thermal stress and contributing to thermotolerance; for instance, retention in heat shock factor genes like HsfA2 modulates isoform production for adaptive protein functions. Similarly, in viral contexts, stable introns from latency-associated transcripts in herpes simplex virus type 1 accumulate post-splicing in infected neurons, suppressing lytic gene expression and maintaining viral latency by interfering with host or viral transcription. Osmotic stress in yeast (Saccharomyces cerevisiae) triggers concerted intron retention in ribosomal protein genes, such as RPS22B, generating bimodal expression patterns that create phenotypic heterogeneity within cell populations. This bet-hedging strategy enables some cells to endure prolonged stress via low protein output while others recover quickly upon relief, enhancing overall population fitness. These adaptations arise through stress-induced shifts in splicing factors, including hnRNP-like proteins in yeast that relocalize to the nucleus under osmotic or thermal stress, altering splice site recognition and favoring retention. 
In plants, similar changes in SR and hnRNP proteins modulate intron inclusion, often via phosphorylation or binding affinity alterations. Responsive elements, such as weak 5' splice sites or upstream open reading frames in introns, show evolutionary conservation across plant species and even kingdoms, underscoring their adaptive utility. Studies from the 2010s, including genome-wide analyses in Arabidopsis, revealed that approximately 10-20% of intron-containing genes exhibit modulated splicing under various stresses, with intron retention being the dominant event in nutrient and abiotic responses. These findings highlight introns' role in rapid, post-transcriptional adjustments, distinct from broader alternative splicing mechanisms.
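The link between intron retention and premature termination codons described in this section can be checked on toy sequences. The sketch below makes several assumptions: the names `first_in_frame_stop` and `retention_creates_ptc` are hypothetical, the reading frame is taken to start at the first nucleotide of the first exon, and only stop codons that lie within or overlap the retained intron are counted, so frameshift-induced stops further downstream are not reported.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str) -> Optional[int]:
    """Index of the first in-frame stop codon (frame 0), or None if there is none."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def retention_creates_ptc(exon1: str, intron: str, exon2: str) -> bool:
    """True if retaining `intron` places the first in-frame stop inside or across it."""
    stop = first_in_frame_stop(exon1 + intron + exon2)
    if stop is None:
        return False
    intron_start = len(exon1)
    intron_end = intron_start + len(intron)
    # The stop codon occupies [stop, stop + 3); require overlap with the intron.
    return stop + 3 > intron_start and stop < intron_end
```

With `exon1 = "ATGGCT"`, `intron = "GTAAGTTGAACAG"`, and `exon2 = "GGTTAA"`, the retained transcript reads ATG GCT GTA AGT TGA, so the function reports a premature stop; a retained intron such as `GTCAGTCCCCAG`, which keeps the frame open, does not.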

Mobility and Genetic Dynamics

Mechanisms of Intron Mobility

Introns exhibit mobility through distinct biochemical mechanisms that enable their spread within and between genomes, primarily observed in self-splicing group I and group II introns, as well as rarer events in spliceosomal introns. Retrohoming is the primary mobility mechanism for group II introns, involving an RNA intermediate that invades a homologous target DNA site. These introns encode a multifunctional intron-encoded protein (IEP) with reverse transcriptase (RT) and endonuclease domains, which assembles with the excised intron RNA to form a ribonucleoprotein (RNP) particle. The RNP targets an intronless allele, where the intron RNA performs reverse splicing directly into one strand of the target DNA, creating an RNA/DNA hybrid; the IEP's endonuclease then nicks the opposite DNA strand, and its RT activity synthesizes the second DNA strand using the intron RNA as a template, resulting in intron insertion. For instance, the Ll.LtrB intron from Lactococcus lactis demonstrates this process with homing efficiencies reaching up to 1.3 × 10^{-3} per recipient cell in vivo, though frequencies can drop to 10^{-5} without full IEP function. In group I introns, mobility occurs through homing endonucleases encoded within the intron open reading frame (ORF). These enzymes, such as I-SceI from the Saccharomyces cerevisiae mitochondrial 21S rRNA intron, recognize and cleave a specific 18-40 base pair sequence in the intronless target DNA, generating a double-strand break. Cellular double-strand break repair via homologous recombination then uses the intron-containing donor allele as a template, copying the intron into the recipient site. This DNA-based mechanism contrasts with the RNA intermediate in retrohoming and is highly site-specific, promoting unidirectional spread. Spliceosomal introns, which rely on the spliceosome for excision, exhibit rare transposition events mediated by DNA intermediates rather than RNA. 
Experimental evidence from yeast reporter systems has captured intron gain through transposition, where an intron sequence is duplicated and inserted into a new genomic location, potentially via non-long terminal repeat (non-LTR) retrotransposon-like processes or direct DNA copying. Such events are infrequent and contribute to intron proliferation in eukaryotic genomes. Horizontal transfer facilitates intron dissemination across bacterial species, particularly for group I introns, with phylogenetic evidence indicating spread via phage-mediated vectors or direct gene exchange. For example, group I introns in cyanobacterial and α-proteobacterial tRNA genes show patterns inconsistent with vertical inheritance, supporting horizontal transmission events. Mobility rates for such transfers are low, estimated at approximately 10^{-5} per generation in bacterial populations, limiting widespread invasion but enabling occasional colonization of new hosts. Experimental studies of intron mobility often employ in vitro assays to reconstitute these processes. For group II introns, such assays involve assembling RNP particles from purified IEP and intron RNA, then incubating with target DNA to measure reverse splicing and cDNA synthesis efficiencies, as demonstrated for the Ll.LtrB system where insertion occurs preferentially at replication forks. Computational approaches detect ancient "intron fossils"—degenerate or remnant sequences—by scanning genomes for intron-like motifs using sequence homology searches and phylogenetic reconciliation to identify transfer events or losses.
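The site-specific recognition step of group I intron homing, in which an endonuclease finds its long target sequence in an intronless allele, can be sketched as an exact-match search on both DNA strands. The 18 bp string below is the recognition sequence commonly quoted for I-SceI and is included here as an assumption, since the text above does not give it; the function names are hypothetical, and real enzymes tolerate some mismatches that exact matching ignores.

```python
from typing import List, Tuple

# Commonly quoted I-SceI recognition sequence (assumption, not from the text above).
I_SCEI_SITE = "TAGGGATAACAGGGTAAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_homing_sites(dna: str, site: str = I_SCEI_SITE) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for each exact occurrence of `site`.

    A palindromic site would be reported on both strands at the same position.
    """
    dna = dna.upper()
    site = site.upper()
    hits: List[Tuple[int, str]] = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)
    return sorted(hits)
```

An intronless allele containing the site returns its position and strand, whereas an allele already interrupted by the intron no longer contains the contiguous site and returns an empty list, which is consistent with the unidirectional spread described above.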

Implications as Mobile Elements

Mobile introns function as selfish genetic elements that promote their own propagation within host genomes, often at the expense of host fitness, thereby driving dynamic patterns of intron gain and loss that shape evolutionary trajectories. In bacteria, mobile group II introns exhibit remarkable abundance and diversity, with many facilitating horizontal gene transfer and contributing to genetic variation across prokaryotic lineages. This mobility enables introns to insert into new genomic sites, influencing gene structure and function over evolutionary time. In eukaryotes, the accumulation of such introns contributes to genome expansion, where non-essential insertions lead to increased genome size through a process of random genetic drift and bloating of non-coding regions. As parasitic entities, mobile introns, particularly those encoding homing endonucleases, can disrupt host genes by inserting into coding sequences, potentially reducing host viability unless counterbalanced by splicing efficiency or host repair mechanisms. These elements exhibit super-Mendelian inheritance, spreading rapidly in populations until fixation, after which endonuclease activity often decays, though persistence in asexual lineages suggests ongoing evolutionary pressures like recombination or rare beneficial roles. Host genomes counteract this parasitism through genetic conflicts, including suppression mechanisms that limit excessive proliferation and maintain genome stability. The broader evolutionary implications of intron mobility include facilitation of speciation by promoting divergence in splicing patterns and gene regulation across populations. In bacteria, mobile introns contribute to the dissemination of genetic material via horizontal transfer, which can indirectly support the spread of adaptive traits such as antibiotic resistance genes embedded in mobile contexts. 
In modern applications, homing endonucleases derived from group I introns serve as precise tools for genome editing in gene therapy, enabling targeted disruptions or repairs in therapeutic contexts like viral interference and disease correction. Post-2020 advances, as of 2025, have expanded their engineered use in synthetic biology; for example, ARCUS nucleases—derived from the homing endonuclease I-CreI—enable high-efficiency homology-directed insertions in bacterial genomes, while synthetic homing endonuclease gene drives have been developed for applications such as mosquito population control. These developments also support antiviral strategies, including leveraging endonuclease activity for viral interference in phage systems.
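The super-Mendelian inheritance of homing elements discussed in this section can be illustrated with a one-line recursion. This is a toy deterministic model under stated assumptions: random mating, no fitness cost, and a hypothetical `homing_rate` giving the fraction of heterozygotes in which the intronless allele is converted. Under those assumptions the intron-bearing allele frequency p changes to p + c·p·(1 − p) each generation, where c is the homing rate.

```python
from typing import Optional


def next_frequency(p: float, homing_rate: float) -> float:
    """Allele frequency after one generation of random mating with homing."""
    if not (0.0 <= p <= 1.0 and 0.0 <= homing_rate <= 1.0):
        raise ValueError("p and homing_rate must lie in [0, 1]")
    # Heterozygotes (frequency 2p(1-p)) transmit the element in (1 + c)/2 of
    # gametes instead of 1/2, which adds c * p * (1 - p) to the frequency.
    return p + homing_rate * p * (1.0 - p)


def generations_to_threshold(p0: float, homing_rate: float, threshold: float = 0.99,
                             max_generations: int = 1000) -> Optional[int]:
    """Generations needed for the frequency to reach `threshold`, or None if never."""
    p = p0
    for generation in range(max_generations + 1):
        if p >= threshold:
            return generation
        p = next_frequency(p, homing_rate)
    return None
```

With perfect homing and a starting frequency of 0.5, the frequency rises to 0.75 and then 0.9375 before passing 0.99 in the third generation; with a homing rate of zero the frequency stays where it started, as expected for ordinary Mendelian transmission without selection.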
