Protein splicing
Protein splicing is an intramolecular reaction in which an internal protein segment (called an intein) is removed from a precursor protein and the flanking N-terminal and C-terminal segments (called exteins) are ligated together. The residue at the splicing junction of the precursor protein is usually a cysteine or a serine, both amino acids with a nucleophilic side chain. The protein splicing reactions known so far do not require exogenous cofactors or energy sources such as adenosine triphosphate (ATP) or guanosine triphosphate (GTP). The term "splicing" is normally associated with pre-mRNA splicing; protein splicing instead acts on the translated polypeptide. The precursor protein contains three segments: an N-extein, followed by the intein, followed by a C-extein. After splicing has taken place, the resulting protein contains the N-extein linked to the C-extein; this splicing product is also termed an extein.
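The segment bookkeeping described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names `Precursor` and `splice` are hypothetical, the sequences are made up, and the code tracks which residues end up in which product without modeling any chemistry.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Precursor:
    """Toy precursor protein, written in one-letter amino acid codes."""
    n_extein: str
    intein: str
    c_extein: str

    @property
    def sequence(self) -> str:
        # The precursor is translated as one continuous chain.
        return self.n_extein + self.intein + self.c_extein


def splice(precursor: Precursor) -> Tuple[str, str]:
    """Return (ligated exteins, excised intein) for a toy precursor."""
    ligated = precursor.n_extein + precursor.c_extein
    return ligated, precursor.intein


# Made-up sequences, not taken from any real protein.
toy = Precursor(n_extein="MKTAG", intein="CLSEGN", c_extein="SWQLE")
product, excised = splice(toy)
print(product)  # MKTAGSWQLE
print(excised)  # CLSEGN
```

Every residue of the precursor ends up in exactly one of the two products, which is why the lengths of the ligated exteins and the excised intein add up to the length of the precursor.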
The first intein was discovered in 1988 through sequence comparison between the vacuolar ATPase genes of Neurospora crassa and carrot (without intein) and the homologous gene in yeast (with intein), which had first been described as a putative calcium ion transporter. In 1990, Hirata et al. demonstrated that the extra sequence in the yeast gene was transcribed into mRNA and removed itself from the host protein only after translation. Since then, inteins have been found in all three domains of life (eukaryotes, bacteria, and archaea) and in viruses.
Protein splicing was unanticipated, and its mechanism was discovered by two groups (Anraku and Stevens) in 1990. Both groups found it in Saccharomyces cerevisiae VMA1, which encodes a precursor of a vacuolar H+-ATPase enzyme. The N- and C-terminal regions of the deduced amino acid sequence showed about 70% identity to vacuolar H+-ATPases from other organisms, while the central region showed about 30% identity to the yeast HO nuclease.
Many genes have unrelated intein-coding segments inserted at different positions. For these and other reasons, inteins (or more properly, the gene segments coding for inteins) are sometimes called selfish genetic elements, but it may be more accurate to call them parasitic. According to the gene-centered view of evolution, most genes are "selfish" only insofar as they compete with other genes or alleles, but they usually fulfill a function for the organism, whereas "parasitic genetic elements", at least initially, do not make a positive contribution to the fitness of the organism.
As of December 2019, the UniProtKB database contains 188 entries manually annotated as inteins, ranging in length from just tens of amino acid residues to thousands. The first intein was found encoded within the VMA gene of Saccharomyces cerevisiae. Inteins were later found across the fungi (ascomycetes, basidiomycetes, zygomycetes and chytrids) and in diverse proteins. A protein from Glomeromycota that is only distantly related to known intein-containing proteins, but closely related to metazoan hedgehog proteins, has also been described as carrying an intein sequence. Many of the newly described inteins contain homing endonucleases, and some of these are apparently active. The abundance of inteins in fungi indicates lateral transfer of intein-containing genes. In eubacteria and archaea, 289 and 182 inteins, respectively, are currently known. Not surprisingly, most inteins in eubacteria and archaea are inserted into proteins of nucleic acid metabolism, as in fungi.
Inteins vary greatly, but many of the same intein-containing proteins are found in a number of species. For example, the pre-mRNA processing factor 8 (Prp8) protein, instrumental in the spliceosome, has seven different intein insertion sites across eukaryotic species. Intein-containing Prp8 is most commonly found in fungi, but is also seen in Amoebozoa, Chlorophyta, Capsaspora, and Choanoflagellida. Many mycobacteria contain inteins within DnaB (bacterial replicative helicase), RecA (bacterial DNA recombinase), and SufB (FeS cluster assembly protein). There is remarkable variety in the structure and number of DnaB inteins, both within the genus Mycobacterium and beyond. Intein-containing DnaB is also found in the chloroplasts of algae. Intein-containing proteins found in archaea include RadA (RecA homolog), RFC, PolB, and RNR. Many of the same intein-containing proteins (or their homologs) are found in two or even all three domains of life. Inteins are also seen in the proteomes encoded by bacteriophages and eukaryotic viruses. Viruses may have acted as vectors that distributed inteins across the wide variety of intein-containing organisms.
The process for class 1 inteins begins with an N-O or N-S shift: the side chain of the first residue of the intein (a serine, threonine, or cysteine) nucleophilically attacks the peptide bond of the residue immediately upstream (that is, the final residue of the N-extein) to form a linear ester (or thioester) intermediate. A transesterification follows when the side chain of the first residue of the C-extein attacks the newly formed (thio)ester, freeing the N-terminal end of the intein. This forms a branched intermediate in which the N-extein and C-extein are attached, albeit not through a peptide bond. The last residue of the intein is typically an asparagine (Asn), and the amide nitrogen atom of its side chain cleaves the peptide bond between the intein and the C-extein, releasing a free intein segment with a terminal cyclic imide. Finally, the free amino group of the C-extein attacks the (thio)ester linking the N- and C-exteins together. This O-N or S-N shift produces a peptide bond and the functional, ligated protein.
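The four class 1 steps above can be walked through with a small Python sketch. It is an illustration under stated assumptions, not a chemical simulation: the function name `class1_steps` and the toy sequences are hypothetical, and the residue checks (Ser, Thr, or Cys at the first position of both the intein and the C-extein, Asn at the last intein position) encode only the canonical case described in the text.

```python
# Residue (one-letter code) -> side-chain atom that acts as the nucleophile.
NUCLEOPHILES = {"S": "O", "T": "O", "C": "S"}


def linkage(atom: str) -> str:
    """Sulfur nucleophiles give a thioester, oxygen nucleophiles an ester."""
    return "thioester" if atom == "S" else "ester"


def class1_steps(n_extein: str, intein: str, c_extein: str) -> list:
    """Describe the four canonical class 1 steps for a toy precursor.

    Raises ValueError if the junction residues do not fit the canonical
    class 1 pattern assumed in this sketch.
    """
    if not (n_extein and intein and c_extein):
        raise ValueError("all three segments must be non-empty")
    first, last, plus_one = intein[0], intein[-1], c_extein[0]
    if first not in NUCLEOPHILES:
        raise ValueError(f"intein starts with {first}; expected S, T or C")
    if plus_one not in NUCLEOPHILES:
        raise ValueError(f"C-extein starts with {plus_one}; expected S, T or C")
    if last != "N":
        raise ValueError(f"intein ends with {last}; expected N (Asn)")

    a1 = NUCLEOPHILES[first]     # atom used in the first acyl shift
    a2 = NUCLEOPHILES[plus_one]  # atom used in the transesterification
    return [
        f"1. N-{a1} shift: {first} at intein position 1 attacks the peptide "
        f"bond after {n_extein[-1]}, giving a linear {linkage(a1)}",
        f"2. Transesterification: {plus_one} at C-extein position 1 attacks "
        f"the {linkage(a1)}, giving a branched {linkage(a2)} intermediate",
        "3. Asn cyclization: the terminal Asn side chain cleaves the "
        "intein/C-extein bond, releasing the intein with a cyclic imide",
        f"4. {a2}-N shift: the free amino group of the C-extein attacks the "
        f"{linkage(a2)}, joining {n_extein} and {c_extein} by a peptide bond",
    ]


# Made-up sequences, not taken from any real protein.
for step in class1_steps("MKTAG", "CLSEGN", "SWQLE"):
    print(step)
```

With a cysteine at intein position 1 and a serine at C-extein position 1, the sketch reports an N-S shift first and an O-N shift last, matching the "N-O or N-S" and "O-N or S-N" alternatives in the description.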
Class 2 inteins have no nucleophilic side chain at the first position, only an alanine. Instead, the reaction starts directly with a nucleophilic displacement, in which the side chain of the first residue of the C-extein attacks the carbonyl carbon of the peptide bond at the final residue of the N-extein. The rest proceeds as in class 1, starting with Asn turning into a cyclic imide.
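The distinction between the two classes, as described here, comes down to the first residue of the intein. A minimal Python sketch of that rule follows; the function name `intein_class` is hypothetical, and classifying from the first residue alone is a simplifying assumption made for illustration, not an annotation method.

```python
def intein_class(intein: str) -> int:
    """Return 1 or 2 based only on the first intein residue.

    Class 1: Ser, Thr or Cys at position 1 (nucleophilic side chain).
    Class 2: Ala at position 1 (no nucleophilic side chain).
    Any other residue falls outside the two classes described in the text.
    """
    if not intein:
        raise ValueError("intein sequence must be non-empty")
    first = intein[0]
    if first in {"S", "T", "C"}:
        return 1
    if first == "A":
        return 2
    raise ValueError(f"first residue {first} is not covered by class 1 or 2")


# Made-up sequences, not taken from any real protein.
print(intein_class("CLSEGN"))  # 1
print(intein_class("ALSEGN"))  # 2
```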