Start codon
from Wikipedia
Start codon (blue circle) of the human mitochondrial DNA MT-ATP6 gene. For each nucleotide triplet (square brackets), the corresponding amino acid is given (one-letter code), either in the +1 reading frame for MT-ATP8 (in red) or in the +3 frame for MT-ATP6 (in blue). In this genomic region, the two genes overlap.

The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and archaea and for N-formylmethionine (fMet) in bacteria, mitochondria and plastids.

The start codon is often preceded by a 5' untranslated region (5' UTR). In prokaryotes this includes the ribosome binding site.

Decoding

In all three domains of life, the start codon is decoded by a special "initiation" transfer RNA different from the tRNAs used for elongation. There are important structural differences between an initiating tRNA and an elongating one, with distinguishing features serving to satisfy the constraints of the translation system. In bacteria and organelles, an acceptor stem C1:A72 mismatch guides formylation, which directs recruitment by the 30S ribosome into the P site; so-called "3GC" base pairs allow assembly into the 70S ribosome.[1] In eukaryotes and archaea, the T stem prevents the elongation factors from binding, while eIF2 specifically recognizes the attached methionine and an A1:U72 base pair.[2]

In any case, the natural initiating tRNA only codes for methionine.[3] Knowledge of the key recognizing features has allowed researchers to construct alternative initiating tRNAs that code for different amino acids; see below.

Alternative start codons

Alternative start codons are different from the standard AUG codon and are found in both prokaryotes (bacteria and archaea) and eukaryotes. Alternate start codons are still translated as Met when they are at the start of a protein (even if the codon encodes a different amino acid otherwise). This is because a separate tRNA is used for initiation.[3]
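
The rule described above (an alternate start codon is read as Met at the first position but keeps its usual meaning internally) can be illustrated with a minimal sketch. The function name `translate_orf` and the example sequence are hypothetical, the codon table is the standard code written in the conventional UCAG order, and the sketch assumes the input already begins at a genuine start codon; it does not model initiation itself.

```python
from itertools import product

# Standard genetic code in UCAG order (first base outermost, third base innermost).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(mrna: str) -> str:
    """Translate an ORF, reading the first codon as Met whatever its sequence.

    Assumption: mrna starts at a real start codon; this is not checked.
    """
    protein = []
    for index in range(0, len(mrna) - 2, 3):
        codon = mrna[index:index + 3]
        # The initiator tRNA supplies Met (fMet in bacteria) at position 1.
        residue = "M" if index == 0 else CODON_TABLE[codon]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


# GUG at the start is read as Met; the internal GUG is read as Val.
print(translate_orf("GUGAAAGUGGCUUAA"))  # MKVA
```

The same codon therefore yields two different residues depending on whether the initiator or an elongator tRNA decodes it.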

Eukaryotes

Alternate start codons (non-AUG) are very rare in eukaryotic genomes: a wide range of mechanisms work to guarantee the relative fidelity of AUG initiation.[4] However, naturally occurring non-AUG start codons have been reported for some cellular mRNAs.[5] Seven out of the nine possible single-nucleotide substitutions at the AUG start codon of dihydrofolate reductase are functional as translation start sites in mammalian cells.[6]
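
As a small illustration of where the "nine possible single-nucleotide substitutions" come from, the sketch below enumerates them: three positions, each with three alternative bases. The function name is hypothetical, and the code does not say which seven variants were found to be functional, since the text above does not list them.

```python
def single_nucleotide_variants(codon: str, alphabet: str = "ACGU") -> list[str]:
    """Return every codon that differs from `codon` at exactly one position."""
    variants = []
    for position, original in enumerate(codon):
        for base in alphabet:
            if base != original:
                variants.append(codon[:position] + base + codon[position + 1:])
    return variants


variants = single_nucleotide_variants("AUG")
print(len(variants))  # 9
print(variants)
```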

Bacteria

Bacteria do not generally have the wide range of translation factors monitoring start codon fidelity. GUG and UUG are the main, even "canonical", alternate start codons.[4] GUG in particular is important to controlling the replication of plasmids.[4]

E. coli uses 83% AUG (3542/4284), 14% (612) GUG, 3% (103) UUG[7] and one or two others (e.g., an AUU and possibly a CUG).[8][9]
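
The percentages above can be recomputed from the quoted counts as a quick arithmetic check. This is only a sketch using the numbers as they appear in the text; the variable names are illustrative.

```python
# Counts exactly as quoted in the text, out of 4284 genes.
counts = {"AUG": 3542, "GUG": 612, "UUG": 103}
total_genes = 4284

shares = {codon: 100 * n / total_genes for codon, n in counts.items()}
unaccounted = total_genes - sum(counts.values())

for codon, share in shares.items():
    print(f"{codon}: {share:.1f}%")
print("genes not covered by the three counts:", unaccounted)
```

With these inputs AUG and GUG reproduce the quoted 83% and 14%, while 103 UUG genes correspond to about 2.4% rather than 3%, and the three counts leave 27 genes unaccounted for. The quoted UUG figure may therefore reflect rounding or a transcription issue somewhere in the source numbers; the sketch does not attempt to resolve it.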

Well-known coding regions that do not have AUG initiation codons are those of lacI (GUG)[10][11] and lacA (UUG)[12] in the E. coli lac operon. Two more recent studies have independently shown that 17 or more non-AUG start codons may initiate translation in E. coli.[13][14]

Mitochondria

Mitochondrial genomes use alternate start codons more extensively (for example, AUA alongside AUG in humans).[15] Many such examples, with codons, systematic range, and citations, are given in the NCBI list of translation tables.[16]

Archaea

Archaea, which are prokaryotes with a translation machinery similar to but simpler than that of eukaryotes, allow initiation at UUG and GUG.[4]

Upstream start codons

These are "alternative" start codons in the sense that they are upstream of the regular start codons and thus could be used as alternative start codons. More than half of all human mRNAs have at least one AUG codon upstream (uAUG) of their annotated translation initiation sites (TIS) (58% in the current versions of the human RefSeq sequence). Their potential use as TISs could result in translation of so-called upstream open reading frames (uORFs). uORF translation usually results in the synthesis of short polypeptides, some of which have been shown to be functional, e.g., in ASNSD1, MIEF1, MKKS, and SLC35A4.[17] However, it is believed that most translated uORFs have only a mild inhibitory effect on downstream translation, either because most uORF starts are leaky (i.e., they often fail to initiate translation) or because ribosomes terminating after translation of short ORFs are often capable of reinitiating.[17]

Standard genetic code

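Amino-acid biochemical properties (color key in the original table): nonpolar, polar, basic, acidic; termination: stop codon.

Standard genetic code (NCBI table 1)[18]

| 1st base | 2nd base: U | 2nd base: C | 2nd base: A | 2nd base: G | 3rd base |
|---|---|---|---|---|---|
| U | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine | UAU (Tyr/Y) Tyrosine | UGU (Cys/C) Cysteine | U |
| U | UUC (Phe/F) | UCC (Ser/S) | UAC (Tyr/Y) | UGC (Cys/C) | C |
| U | UUA (Leu/L) Leucine | UCA (Ser/S) | UAA Stop (Ochre)[B] | UGA Stop (Opal)[B] | A |
| U | UUG[A] (Leu/L) | UCG (Ser/S) | UAG Stop (Amber)[B] | UGG (Trp/W) Tryptophan | G |
| C | CUU (Leu/L) | CCU (Pro/P) Proline | CAU (His/H) Histidine | CGU (Arg/R) Arginine | U |
| C | CUC (Leu/L) | CCC (Pro/P) | CAC (His/H) | CGC (Arg/R) | C |
| C | CUA (Leu/L) | CCA (Pro/P) | CAA (Gln/Q) Glutamine | CGA (Arg/R) | A |
| C | CUG (Leu/L) | CCG (Pro/P) | CAG (Gln/Q) | CGG (Arg/R) | G |
| A | AUU (Ile/I) Isoleucine | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine | AGU (Ser/S) Serine | U |
| A | AUC (Ile/I) | ACC (Thr/T) | AAC (Asn/N) | AGC (Ser/S) | C |
| A | AUA (Ile/I) | ACA (Thr/T) | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine | A |
| A | AUG[A] (Met/M) Methionine | ACG (Thr/T) | AAG (Lys/K) | AGG (Arg/R) | G |
| G | GUU (Val/V) Valine | GCU (Ala/A) Alanine | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine | U |
| G | GUC (Val/V) | GCC (Ala/A) | GAC (Asp/D) | GGC (Gly/G) | C |
| G | GUA (Val/V) | GCA (Ala/A) | GAA (Glu/E) Glutamic acid | GGA (Gly/G) | A |
| G | GUG[A] (Val/V) | GCG (Ala/A) | GAG (Glu/E) | GGG (Gly/G) | G |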
[A] Possible start codons in NCBI table 1. AUG is most common.[19] The two other start codons listed by table 1 (GUG and UUG) are rare in eukaryotes.[20] Prokaryotes have less stringent start codon requirements; they are described by NCBI table 11.
[B] The historical basis for designating the stop codons as amber, ochre and opal is described in an autobiography by Sydney Brenner[21] and in a historical article by Bob Edgar.[22]

Non-methionine start codons

Natural

Translation started by an internal ribosome entry site (IRES), which bypasses a number of regular eukaryotic initiation systems, can have a non-methionine start with GCU or CAA codons.[23]

Mammalian cells can initiate translation with leucine using a specific leucyl-tRNA that decodes the codon CUG. This mechanism is independent of eIF2. No secondary structure similar to that of an IRES is needed. It proceeds by ribosomal scanning, and a Kozak context enhances initiation efficiency.[24][25][26]

Engineered start codons

Engineered initiator tRNAs (tRNAfMet with a CUA anticodon, changed from the CAU anticodon of the metY tRNAfMet) have been used to initiate translation at the amber stop codon UAG in E. coli. Initiation with this tRNA inserts not only the traditional formylmethionine but also formylglutamine, as glutaminyl-tRNA synthetase also recognizes the new tRNA.[27] (Recall from above that the bacterial translation initiation system does not specifically check for methionine, only for the formyl modification.)[1] One study has shown that the amber initiator tRNA does not initiate translation to any measurable degree from genomically encoded UAG codons, only from plasmid-borne reporters with strong upstream Shine-Dalgarno sites.[28]

from Grokipedia
The start codon is a specific sequence of three nucleotides in messenger RNA (mRNA) that marks the point at which translation, the process of synthesizing a protein, begins; it directs the ribosome to place the first amino acid of the polypeptide. In both prokaryotes and eukaryotes, the most common start codon is AUG, which specifies methionine in eukaryotes and archaea and N-formylmethionine in bacteria, and which serves both as the initiation signal and as the ordinary codon for internal methionine.[1][2] This codon is recognized by initiator transfer RNA (tRNA), which binds to the ribosome's P-site to begin synthesis of the polypeptide chain.[3]

While AUG predominates, alternative start codons exist and can expand the proteome's diversity, particularly under specific cellular conditions or in certain organisms. In bacteria, GUG and UUG can also function as start codons, often leading to the incorporation of formylmethionine, though with lower efficiency than AUG.[2] In eukaryotes, non-AUG codons such as CUG, GUG, and UUG are used in a subset of mRNAs, sometimes resulting in proteins with non-methionine N-termini, and their selection is influenced by the surrounding nucleotide context known as the Kozak sequence.[3] A 2017 study indicated that at least 47 of the 64 possible triplet codons may initiate translation in bacteria, challenging traditional views and highlighting the flexibility of start codon recognition.[4]

The accuracy of start codon selection is critical for proper gene expression, as errors can lead to out-of-frame translation or truncated proteins, potentially causing cellular dysfunction or disease. In eukaryotes, ribosomal scanning from the mRNA's 5' cap ensures the first suitable AUG is chosen, modulated by initiation factors like eIF2 and eIF1.[5] Non-canonical start codons, while less efficient, play roles in regulating translation during stress, development, or in mitochondrial and viral genomes, underscoring their biological significance beyond canonical initiation.[3]

Overview

Definition and Function

A start codon is a sequence of three nucleotides, or trinucleotide, in messenger RNA (mRNA) that specifies the initiation site for protein translation by the ribosome.[6] In the standard genetic code, the primary start codon is AUG, which codes for the amino acid methionine but serves a distinct role in signaling the beginning of translation.[6]

The primary function of the start codon is to recruit the initiator transfer RNA (tRNA), which carries N-formylmethionine in bacteria or unmodified methionine in eukaryotes and archaea, to the ribosome-mRNA complex.[6] This recruitment facilitates the assembly of the ribosomal initiation complex, including the binding of the small ribosomal subunit to mRNA and subsequent joining of the large subunit, thereby establishing the correct reading frame for decoding subsequent codons.[6] By defining the start point, the start codon ensures the synthesis of the polypeptide chain proceeds accurately from the N-terminus to the C-terminus, preventing misinterpretation of the genetic message.[6]

Start codons exhibit near-universal conservation across all domains of life, underscoring the shared evolutionary ancestry of the genetic code.[7] This universality, with AUG as the predominant initiator in most organisms, reflects the code's ancient origins, though rare exceptions occur in specialized systems such as mitochondria.[6]

The absence or mutation of a start codon typically prevents translation initiation, resulting in no protein production or the use of an alternative downstream start site, which often yields truncated or non-functional proteins.[6] Such alterations can also induce frameshift errors if translation begins out of frame, leading to aberrant polypeptides with incorrect amino acid sequences and potential loss of biological activity.[6]

Context in the Genetic Code

The standard genetic code consists of 64 possible triplets (codons) formed from the four nucleotide bases adenine (A), cytosine (C), guanine (G), and uracil (U) in messenger RNA (mRNA), which specify the 20 standard amino acids and three stop signals during protein translation.[8] This code is nearly universal across all domains of life, with the codon AUG assigned to the amino acid methionine (Met) both at internal positions and as the primary initiation signal.[9] The following table summarizes the standard genetic code, with the first base in the left column, the second base across the top, and the third base in the right column:
| First base | Second base U | Second base C | Second base A | Second base G | Third base |
|---|---|---|---|---|---|
| U | UUU (Phe) | UCU (Ser) | UAU (Tyr) | UGU (Cys) | U |
| U | UUC (Phe) | UCC (Ser) | UAC (Tyr) | UGC (Cys) | C |
| U | UUA (Leu) | UCA (Ser) | UAA (Stop) | UGA (Stop) | A |
| U | UUG (Leu) | UCG (Ser) | UAG (Stop) | UGG (Trp) | G |
| C | CUU (Leu) | CCU (Pro) | CAU (His) | CGU (Arg) | U |
| C | CUC (Leu) | CCC (Pro) | CAC (His) | CGC (Arg) | C |
| C | CUA (Leu) | CCA (Pro) | CAA (Gln) | CGA (Arg) | A |
| C | CUG (Leu) | CCG (Pro) | CAG (Gln) | CGG (Arg) | G |
| A | AUU (Ile) | ACU (Thr) | AAU (Asn) | AGU (Ser) | U |
| A | AUC (Ile) | ACC (Thr) | AAC (Asn) | AGC (Ser) | C |
| A | AUA (Ile) | ACA (Thr) | AAA (Lys) | AGA (Arg) | A |
| A | AUG (Met, Start) | ACG (Thr) | AAG (Lys) | AGG (Arg) | G |
| G | GUU (Val) | GCU (Ala) | GAU (Asp) | GGU (Gly) | U |
| G | GUC (Val) | GCC (Ala) | GAC (Asp) | GGC (Gly) | C |
| G | GUA (Val) | GCA (Ala) | GAA (Glu) | GGA (Gly) | A |
| G | GUG (Val) | GCG (Ala) | GAG (Glu) | GGG (Gly) | G |

The codon AUG exhibits dual functionality within this code: it encodes methionine for incorporation at internal sites during elongation, and it serves as the start codon that initiates translation, where it specifies N-formylmethionine (fMet) in bacteria or unmodified methionine in eukaryotes.[10] This distinction arises because the initiator form of methionine tRNA recognizes AUG in a specific ribosomal context that prioritizes initiation over routine amino acid addition.[10]

The start codon AUG also establishes the reading frame for the entire mRNA sequence, defining the correct phase (offset of 0, +1, or +2 nucleotides) to ensure accurate grouping of subsequent codons into amino acids without frameshift errors.[11] Without this defined starting point, translation could produce non-functional polypeptides due to misaligned triplets.

The role of AUG as the initiating triplet was elucidated in the 1960s through cell-free translation experiments, notably by Marshall Nirenberg and Philip Leder, who used synthetic RNA triplets and ribosome-binding assays to identify AUG as the codon that promotes the binding of methionyl-tRNA and initiates polypeptide synthesis.[12]
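
To make the idea of phase concrete, the following sketch translates one short, made-up mRNA at offsets 0, +1, and +2. The sequence and the function name `translate_frame` are illustrative assumptions; the codon table is the standard code in UCAG order. Only the +1 offset places the AUG in frame.

```python
from itertools import product

# Standard genetic code in UCAG order (first base outermost, third base innermost).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(mrna: str, offset: int) -> str:
    """Translate every complete codon of `mrna` starting at `offset` (0, 1, or 2)."""
    residues = []
    for index in range(offset, len(mrna) - 2, 3):
        residues.append(CODON_TABLE[mrna[index:index + 3]])
    return "".join(residues)


mrna = "GAUGGCCAAAUGA"  # hypothetical sequence; the AUG begins at index 1
for offset in range(3):
    print(offset, translate_frame(mrna, offset))
# 0 DGQM
# 1 MAK*
# 2 WPN
```

The three offsets give three unrelated peptides from the same nucleotides, which is why the position of the start codon fixes the meaning of everything downstream.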

Recognition and Initiation

Decoding by the Ribosome

In prokaryotes, the ribosome recognizes the start codon through base-pairing between a purine-rich Shine-Dalgarno (SD) sequence, typically AGGAGG, located 4–12 nucleotides upstream of the AUG, and the anti-SD sequence (CCUCC) at the 3' end of the 16S rRNA in the 30S small ribosomal subunit.[13] This interaction positions the start codon precisely in the ribosomal P site, facilitating the assembly of the 70S initiation complex.[14]

In eukaryotes, recognition involves the Kozak consensus sequence, such as GCCAUGG, surrounding the AUG start codon, which optimizes binding and enhances selection by the 40S small ribosomal subunit during the scanning process from the mRNA 5' cap.[15] The anticodon of the initiator tRNA pairs with the start codon during this decoding step.

The start codon is decoded directly in the peptidyl (P) site of the ribosome, unlike during elongation, where incoming codons occupy the aminoacyl (A) site; this ensures the initiator tRNA is positioned for the first peptide bond. Fidelity of start codon selection is maintained by proofreading mechanisms involving GTP hydrolysis by initiation factors: IF2 in bacteria and eIF2 in eukaryotes hydrolyze GTP upon correct codon-anticodon pairing, which commits the ribosome to initiation and allows mismatched complexes to be rejected.[16][17]
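
The SD-based positioning described above can be sketched as a simple sequence search. Everything named here is an assumption for illustration: the function names, the example mRNA, the choice of GGAGG (the reverse complement of the quoted anti-SD CCUCC) as the core motif, and the convention that the 4–12 nucleotide spacing is counted between the end of the core motif and the A of AUG. Real ribosome binding depends on more than an exact motif match.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def find_sd_start_sites(mrna, sd_core="GGAGG", min_gap=4, max_gap=12):
    """List (sd_index, aug_index) pairs where an SD core lies min_gap..max_gap nt
    upstream of an AUG (gap = nucleotides between the core and the A of AUG)."""
    sites = []
    aug = mrna.find("AUG")
    while aug != -1:
        for gap in range(min_gap, max_gap + 1):
            sd_start = aug - gap - len(sd_core)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(sd_core)] == sd_core:
                sites.append((sd_start, aug))
        aug = mrna.find("AUG", aug + 1)
    return sites


# The anti-SD quoted in the text pairs with the core of AGGAGG.
print(reverse_complement("CCUCC"))  # GGAGG

mrna = "UUAAGGAGGUAACAUAUGGCUAAA"  # hypothetical leader plus start of an ORF
print(find_sd_start_sites(mrna))  # [(4, 15)]
```

Here the core motif ends six nucleotides before the AUG at index 15, inside the assumed spacing window.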

Initiator tRNA and Methionine

The initiator transfer RNA (tRNAiMet) is a specialized tRNA that recognizes the start codon AUG through its anticodon sequence CAU, enabling the delivery of methionine to initiate protein synthesis.[18] Unlike elongator tRNAs, tRNAiMet exhibits unique structural features, including (in eukaryotic initiators) a conserved A1:U72 base pair at the aminoacyl acceptor stem and three consecutive G:C base pairs in the anticodon stem (positions 29–31 paired with 41–39), which collectively promote direct binding to the ribosomal P site and enhance fidelity in start codon selection.[18] In eukaryotes, additional post-transcriptional modifications, such as N6-threonylcarbamoyladenosine (t6A) at position 37 adjacent to the anticodon, stabilize codon-anticodon interactions and improve decoding efficiency at the AUG start site.[18]

In bacteria, the methionine attached to tRNAiMet is formylated to N-formylmethionine (fMet) after aminoacylation, a reaction catalyzed by the enzyme methionyl-tRNA formyltransferase (encoded by the fmt gene) using 10-formyltetrahydrofolate as the formyl donor.[19] This formylation occurs specifically on the initiator tRNAiMet and not on the elongator tRNAMet, due to structural determinants in the tRNA that allow selective recognition by the formyltransferase, thereby committing the amino acid exclusively to initiation.[20] In eukaryotes, the methionine remains unmodified, as cytosolic formyltransferase activity is absent, though fMet is utilized in mitochondrial and chloroplast translation, reflecting their bacterial ancestry.[19] The N-terminal fMet in bacterial proteins is frequently processed after synthesis: the formyl group is removed by peptide deformylase, and methionine aminopeptidases can then remove the methionine, exposing the penultimate residue for further processing or degradation signals.[19]

Both initiator and elongator tRNAMet are charged with methionine by the same methionyl-tRNA synthetase (MetRS), which recognizes conserved identity elements like the anticodon CAU and acceptor stem sequences without distinguishing between the two tRNAs during aminoacylation.[20] The specificity for initiation arises downstream: in bacteria, formylation by Fmt ensures fMet-tRNAiMet is directed to initiation factors like IF2, while in eukaryotes, unmodified Met-tRNAiMet preferentially binds eIF2-GTP due to unique structural motifs, such as the A1:U72 pair and T-loop features (e.g., A54:U55).[18]

Key distinctions between initiator and elongator tRNAs prevent the former from participating in elongation. In eukaryotes, fungal and plant tRNAiMet often feature a 2'-O-ribosyl phosphate modification at adenosine 64 (A64) in the TψC stem, which sterically hinders binding to the elongation factor eEF1A and favors P-site accommodation.[21] Mammalian tRNAiMet lacks this A64 modification but relies on base-pairing differences (e.g., U50:A64 instead of the elongator's G:C) and reduced affinity for eEF1A to exclude elongation participation.[18] In bacteria, tRNAfMet shows lower affinity for elongation factor Tu (EF-Tu) compared to elongator tRNAMet, attributed to mismatches in the acceptor stem and T-loop, while exhibiting higher affinity for initiation factor IF2 to ensure preferential recruitment to the ribosome during start codon decoding.[22] These adaptations collectively ensure that initiator tRNA binds the ribosomal P site first, establishing the reading frame without competing in internal elongation cycles.[18]

Domain-Specific Variations

Bacterial Start Codons

In bacteria, the primary start codon is AUG, accounting for approximately 83% of protein-coding genes in model organisms such as Escherichia coli. This codon is recognized through the Shine-Dalgarno (SD) sequence, a purine-rich motif typically located 4–9 nucleotides upstream of the AUG, which base-pairs with a complementary anti-SD sequence at the 3' end of the 16S rRNA in the 30S ribosomal subunit. This interaction positions the ribosome precisely at the start site, facilitating efficient initiation complex formation.[17][23] Alternative start codons include GUG (approximately 14% of genes) and UUG (approximately 3% of genes), which are decoded by the same initiator tRNAfMet as AUG. The anticodon of tRNAfMet (5'-CAU-3') enables wobble pairing at the first codon position, allowing recognition of GUG and UUG while incorporating N-formylmethionine (fMet) as the initial amino acid in all cases. These non-AUG codons are typically associated with stronger or more conserved SD sequences, which compensate for reduced base-pairing stability with the initiator tRNA.[17][24][23] Non-AUG start codons are more frequently utilized in specific contexts, such as leaderless mRNAs lacking extended 5' untranslated regions or for internal initiation within polycistronic transcripts. Their translation efficiency is lower than that of AUG, with GUG and UUG supporting initiation at roughly 10–70% the rate of AUG depending on the SD context and assay conditions. Genes initiating with non-AUG codons often exhibit reduced expression levels compared to AUG-initiated genes.[23][17] In vitro translation assays using reporter constructs like GFP and nanoluciferase have demonstrated the functionality of GUG and UUG as start codons, producing full-length proteins with correct N-terminal fMet. These experiments confirm that non-AUG initiation occurs without altering the reading frame, as the initiator tRNA occupies the ribosomal P site and elongation proceeds from the defined start position.[17]
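
The pairing between the initiator anticodon and the three bacterial start codons can be tabulated with a short sketch. The classification used here (Watson-Crick, G:U wobble, or mismatch) is a simplification chosen for illustration, and the function name is hypothetical; it is not a model of how the ribosome actually scores these interactions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def pairing_pattern(codon: str, anticodon: str = "CAU") -> list[str]:
    """Classify each codon position against the antiparallel anticodon.

    Both sequences are written 5'->3', so codon position 1 faces
    anticodon position 3.
    """
    pattern = []
    for codon_base, anticodon_base in zip(codon, reversed(anticodon)):
        pair = (codon_base, anticodon_base)
        if pair in WATSON_CRICK:
            pattern.append("WC")
        elif pair in GU_WOBBLE:
            pattern.append("GU")
        else:
            pattern.append("mismatch")
    return pattern


for codon in ("AUG", "GUG", "UUG"):
    print(codon, pairing_pattern(codon))
# AUG ['WC', 'WC', 'WC']
# GUG ['GU', 'WC', 'WC']
# UUG ['mismatch', 'WC', 'WC']
```

In this simplified view, GUG and UUG differ from AUG only at the first codon position, which is consistent with the text's point that the second and third positions remain conventionally paired while the first is tolerated.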

Eukaryotic Start Codons

In eukaryotic cytoplasmic translation, the primary start codon is AUG, which accounts for over 99% of annotated protein-coding open reading frames across diverse eukaryotes, including mammals, yeast, and plants. This codon is recognized by the initiator methionyl-tRNA during the scanning process, where the 43S pre-initiation complex, comprising the 40S ribosomal subunit, eukaryotic initiation factors, and Met-tRNAi^Met, binds near the 5' cap structure of the mRNA and migrates downstream in a 5'-to-3' direction to locate the first suitable AUG.[25][26] This cap-dependent scanning mechanism ensures efficient initiation at the optimal site, contrasting with the Shine-Dalgarno-mediated direct binding in bacteria.[25]

The efficiency of AUG recognition is strongly influenced by the surrounding nucleotide context, known as the Kozak consensus sequence. The optimal motif in vertebrates is GCCGCCACCAUGG, where the purine (A or G) at position -3 relative to the A of AUG and guanine at +4 are particularly critical for ribosomal positioning and stable anticodon-codon pairing.[5] Mutations in these positions, especially at -3, can reduce translation initiation efficiency by >10-fold in mammalian cell systems, as demonstrated through site-directed mutagenesis experiments that quantified reporter gene expression.[5] Suboptimal contexts thus modulate protein synthesis levels, providing a layer of post-transcriptional regulation.[5]

Although AUG predominates, rare non-canonical start codons such as CUG (coding for leucine), GUG, and UUG are utilized in eukaryotic cytoplasmic translation, comprising approximately 1-2% of initiation sites in organisms like yeast under specific conditions.[27] These alternatives, often with efficiencies 20-60% of AUG depending on context, occur in stress responses or for particular genes, such as the CUG-initiated isoform of the GRS1 tRNA synthetase in Saccharomyces cerevisiae during amino acid starvation.[27][28] Context-dependent selection of these sites enables the production of protein isoforms with distinct N-termini, as seen in oncogenes where alternative initiation at upstream CUG codons generates longer, more stable variants; for instance, the CUG-initiated c-Myc1 isoform in human cells promotes cell proliferation and is implicated in tumorigenesis.[27][29]
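
The two Kozak positions singled out above (a purine at -3 and a G at +4, counting the A of AUG as +1) can be checked with a few lines of code. The three-level labels returned here ("strong", "adequate", "weak") and the function name are assumptions made for this sketch; the text itself only states that both positions matter, not how they should be scored.

```python
def kozak_strength(mrna: str, aug_index: int) -> str:
    """Label the context of the AUG at `aug_index` by its -3 and +4 positions.

    Assumed scoring: both positions match -> "strong", one -> "adequate",
    none -> "weak". Positions that fall outside the sequence count as no match.
    """
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("aug_index does not point at an AUG codon")
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else ""
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else ""
    matches = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[matches]


print(kozak_strength("GCCGCCACCAUGG", 9))  # strong (the vertebrate consensus)
print(kozak_strength("GCCGCCACCAUGC", 9))  # adequate (purine at -3 only)
print(kozak_strength("GCCGCCUCCAUGC", 9))  # weak (neither position matches)
```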

Archaeal Start Codons

In archaea, the primary start codon for translation initiation is AUG, which is utilized in the vast majority of cases, often exceeding 90% across analyzed genomes and transcripts.[30] This codon pairs with an initiator tRNA that is structurally similar to its eukaryotic counterpart, carrying an unmodified methionine residue without N-formylation, distinguishing it from the bacterial system.[30] The archaeal initiator tRNAiMet features a CAU anticodon and is charged by a methionyl-tRNA synthetase homologous to eukaryotic enzymes, ensuring precise recognition of AUG during initiation complex assembly.[31]

Alternative start codons are employed infrequently in archaea, reflecting a partial similarity to bacterial mechanisms but with lower overall frequency. GUG and UUG serve as non-canonical initiators in a minority of transcripts, typically less than 10-20% depending on the species and mRNA type, and they also code for methionine when functioning as starts despite their standard assignments to valine and leucine, respectively.[30] In certain archaea, such as Sulfolobus solfataricus, GUG accounts for about 12% of leaderless mRNAs and 20% of leadered ones, while UUG is even rarer at 0-7%.[31] Additionally, AUA, which normally decodes as isoleucine, has been documented as a start codon in specific genes, such as the L12 ribosomal protein in S. solfataricus, where it initiates translation of a shortened protein isoform in natural contexts.[32] CUG usage remains exceptionally rare and is not a dominant alternative in archaeal systems.[23]

Archaeal translation initiation exhibits a hybrid character, incorporating eukaryotic-like factors with prokaryotic ribosomal elements. The key GTPase aIF2, a homolog of eukaryotic eIF2, consists of α, β, and γ subunits and delivers the initiator tRNA to the P-site of the 30S ribosomal subunit in a manner akin to eukaryotes, promoting fidelity in start codon selection.[30] However, the ribosomes themselves resemble bacterial 70S particles, with prokaryotic-like rRNA and proteins that facilitate direct binding rather than cap-dependent recruitment.[31] Accessory factors like aIF1 and aIF1A further enhance specificity by monitoring the anticodon-codon interaction and preventing initiation at suboptimal sites.

A distinctive feature of archaeal mRNAs is the prevalence of leaderless transcripts, which lack 5' untranslated regions and initiate directly at the start codon, mirroring bacterial strategies and comprising 69% of mRNAs in S. solfataricus and 72% in Haloferax volcanii.[30] Leadered mRNAs, which do carry a 5' UTR, can instead be recruited via Shine-Dalgarno (SD) sequences base-pairing with the 16S rRNA anti-SD, enabling efficient 30S subunit binding without scanning.[31] In addition, some leadered mRNAs in certain archaeal lineages incorporate scanning elements, where the pre-initiation complex moves along the 5' UTR to locate the first suitable AUG, blending prokaryotic direct binding with eukaryotic scanning principles.[33] Efficiency studies in Sulfolobus species underscore AUG dominance; for instance, in cell-free systems, AUG-initiated leaderless mRNAs exhibit up to 88% usage and higher translational output compared to GUG alternatives, with SD motifs boosting overall initiation rates by stabilizing ribosome-mRNA interactions.[34]

Mitochondrial and Chloroplast Start Codons

In vertebrate mitochondria, the genetic code deviates from the standard code such that the codons AUA and AUG both encode methionine, with AUA serving as an alternative start codon in addition to AUG, which remains the primary initiator.[8] Furthermore, the codon AUU, which typically encodes isoleucine in internal positions, can also function as a start codon and is translated as methionine during initiation.[35] In human mitochondrial DNA, which encodes 13 proteins, four of these (ND1, ND2, ND3, and ND5) utilize non-AUG start codons (AUA for ND1, ND3, and ND5; AUU for ND2), representing approximately 31% of the protein-coding genes.[35] This expanded usage is facilitated by a single mitochondrial tRNA^Met (mt-tRNA^Met) with the anticodon CAU, modified at the wobble position with 5-formylcytosine (f^5C), enabling it to decode both AUG and AUA as methionine for both initiation and elongation.[36]

Mitochondrial translation initiates with N-formylmethionine (fMet), similar to bacterial systems, where the charged mt-tRNA^Met is formylated by mitochondrial methionyl-tRNA formyltransferase before binding to the ribosomal P-site.[19] Unlike the bacterial fMet, which often remains N-terminal or is processed by deformylation and further cleavage, the mitochondrial fMet is typically deformylated post-translationally by peptide deformylases, resulting in an unmodified N-terminal methionine in the mature protein.[37] In human mitochondria, this process is exemplified by the mt-tRNA^Met, which is the sole tRNA for methionine and supports efficient initiation at these alternative codons despite the compact genome's high A+T bias.[38]

Chloroplasts, like mitochondria, employ a genetic code closely resembling the bacterial standard, with AUG as the primary start codon encoding methionine, but they also utilize non-AUG alternatives such as GUG and UUG, which initiate translation with methionine at efficiencies of about 10-15% relative to AUG.[8] These non-AUG starts are observed in specific genes, such as the psbC gene in plants like tobacco and Chlamydomonas, where GUG serves as the functional initiator despite an upstream AUG that is not utilized.[39] Unlike vertebrate mitochondria, chloroplast codes do not reassign AUA to methionine; instead, AUA encodes isoleucine internally, and non-AUG initiation is limited to GUG and UUG without broader codon expansions.[40] Chloroplast translation also begins with fMet, charged to the initiator tRNA^fMet (with anticodon CAU), which is formylated similarly to its bacterial counterpart and derived from the cyanobacterial endosymbiont.[41]

The use of expanded or alternative start codons in mitochondria and chloroplasts traces back to their endosymbiotic origins from free-living bacteria (alpha-proteobacteria for mitochondria and cyanobacteria for chloroplasts), where the ancestral bacterial machinery allowed flexible initiation at GUG and UUG.[42] Over evolutionary time, mitochondrial codes underwent further modifications, such as the reassignment of AUA to methionine, likely driven by genome compaction and tRNA reduction to a minimal set, enabling broader start codon recognition without additional isoacceptors.[43] In contrast, chloroplast codes retained more bacterial fidelity, with non-AUG starts providing regulatory layers for gene expression, as seen in plant cpDNA where about 5% of genes may initiate at non-AUG sites.[40] This bacterial ancestry is evident in the shared formylation of initiator methionine and the reliance on prokaryotic-like ribosomes for decoding.[19]

Non-Canonical Start Codons

Upstream Open Reading Frames

Upstream open reading frames (uORFs) are short sequences within the 5′ untranslated regions (UTRs) of mRNAs that begin with an AUG start codon and terminate at an in-frame stop codon, typically encoding peptides of 10 to 100 amino acids in length. Although uORFs utilize the canonical AUG start codon, they represent non-canonical initiation events due to their location in the 5' UTR and regulatory function.[44] These elements are prevalent in eukaryotic transcripts, with over 50% of human mRNAs containing at least one uORF.[45] In contrast, uORFs are rare in bacterial mRNAs but occur in leader peptides associated with riboswitches, where their translation influences downstream gene expression or mRNA stability.[46]

uORFs primarily regulate translation of the main open reading frame (ORF) by impeding ribosomal scanning from the 5′ cap, leading to reduced initiation at the primary coding sequence.[47] Common mechanisms include ribosome stalling during uORF translation, which blocks access to the main ORF, or modulation of reinitiation, where ribosomes completing a uORF may reacquire initiation factors to translate downstream sequences under specific conditions.[47]

A classic example of reinitiation control occurs in the GCN4 gene of yeast (Saccharomyces cerevisiae), where four uORFs in the 5′ UTR respond to amino acid starvation. Under nutrient-replete conditions, ribosomes that have translated the permissive uORF1 quickly reacquire the ternary complex and reinitiate at the inhibitory uORFs 2–4, which prevents initiation at the main ORF; during starvation, reduced ternary complex availability allows scanning ribosomes to bypass uORFs 2–4 after translating uORF1 and to reinitiate at GCN4, activating the general amino acid control pathway.[48]

In mammals, the ATF4 transcription factor exemplifies stress-responsive uORF regulation, with two uORFs in its 5′ leader: uORF1 permits efficient reinitiation, while uORF2 overlaps the main AUG and inhibits basal translation. Under integrated stress conditions like endoplasmic reticulum stress, phosphorylation of eIF2α delays reinitiation after uORF1, enabling ribosomes to skip uORF2 and initiate at the ATF4 ORF, thereby inducing adaptive responses such as antioxidant gene expression.[49]
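
The definition of a uORF given above (an upstream AUG followed by an in-frame stop codon) translates directly into a small search routine. This is a minimal sketch: the function name, the example transcript, and the returned tuple layout are assumptions for illustration, and the code ignores Kozak context, reinitiation, and non-AUG uORF starts.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int) -> list[tuple[int, int, bool]]:
    """Find AUG-initiated ORFs that begin upstream of the annotated main start.

    Returns (start, end, overlaps_main) tuples, where `end` is the index just
    past the stop codon and `overlaps_main` is True when the uORF runs past
    the main start codon (as described for ATF4 uORF2).
    """
    uorfs = []
    for start in range(0, main_start - 2):  # AUG must lie fully upstream
        if mrna[start:start + 3] != "AUG":
            continue
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                uorfs.append((start, end, end > main_start))
                break
    return uorfs


# Hypothetical transcript: a two-codon uORF (AUG GCU UAA), then the main AUG.
mrna = "GGAUGGCUUAAGGCCACCAUGGCAUAA"
main_start = mrna.find("AUG", 3)
print(main_start)                    # 18
print(find_uorfs(mrna, main_start))  # [(2, 11, False)]
```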

Natural Non-AUG Starts

In various organisms, translation can initiate at non-AUG codons. In most cases the initiator tRNA still delivers formylmethionine (in bacteria) or methionine (in eukaryotes), but in some cases a different N-terminal amino acid is incorporated via specialized mechanisms. In bacteria such as Escherichia coli, GUG (normally encoding valine) serves as a start codon for approximately 14% of genes, including the lacI repressor, but initiation occurs with N-formylmethionine delivered by the initiator tRNA^fMet, whose anticodon pairs with the second and third positions of the codon and tolerates a mismatch at the first position. Similarly, UUG (normally leucine) initiates translation for about 3% of E. coli genes, again with fMet, as demonstrated by comprehensive measurements of initiation efficiencies across all 64 codons. These non-AUG starts in prokaryotes typically exhibit 20–50% of the efficiency of AUG and rely on non-standard pairing with the initiator tRNA and on strong Shine-Dalgarno sequences for recognition.[17]

In eukaryotes, non-AUG initiation more frequently results in alternative N-terminal residues, particularly under stress conditions or in viral contexts, where elongator tRNAs may be recruited instead of the standard initiator tRNA^iMet. For instance, in human cells, the CUG codon (normally leucine) initiates translation of an N-terminally extended isoform of fibroblast growth factor 2 (FGF2), incorporating leucine at the start via eIF2A-mediated decoding; this isoform localizes to the nucleus and is upregulated in cancer and stress responses. Another example is the gag region of murine leukemia virus (MLV), where an upstream CUG codon initiates a longer glycosylated Gag protein (gPr80gag) with leucine rather than methionine, contributing to viral protein diversity and immune evasion through altered MHC class I presentation. The Sendai virus P/C mRNA uses an ACG codon (normally threonine) to initiate the C' protein, decoding it as threonine and extending the protein by 11 residues compared to the C protein initiated at a downstream AUG; this nested protein expression is essential for viral replication. These eukaryotic and viral non-AUG starts generally show reduced efficiency, ranging from 10–70% relative to AUG depending on Kozak context and cellular conditions, and often involve wobble pairing or alternative initiation factors such as eIF2A during stress, when eIF2 is phosphorylated.[50]

Such non-canonical initiations expand the proteome, enabling regulatory isoforms in developmental genes, stress adaptation, and viral strategies, and are sometimes followed by post-translational cleavage that refines the N-terminus. For example, in mammalian heat shock responses, CUG initiation at the MRPL18 gene produces a leucine-started truncated isoform that enhances synthesis of heat shock proteins by altering ribosomal composition. Recent work indicates that non-canonical start codons confer context-dependent advantages in carbohydrate utilization for commensal E. coli in the murine gut.[51] Discovery of these events has been advanced by ribosome profiling, which maps ribosome-protected fragments to reveal non-AUG sites in endogenous transcripts, and by site-directed mutagenesis studies confirming functional protein production from these codons.[52][3]
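The bacterial case described above, in which GUG or UUG at the start of a gene is still read as (formyl)methionine because the initiator tRNA decodes it, can be sketched in Python as follows. The helper names `near_cognate_starts` and `translate_from_start` are hypothetical, the set of accepted start codons is an illustrative assumption, and the letter "M" stands in for either Met or fMet. The sketch does not model the reported cases in which an elongator tRNA supplies leucine or threonine at the N-terminus.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def near_cognate_starts() -> list[str]:
    """Codons that differ from AUG at exactly one nucleotide position."""
    return sorted(
        codon for codon in CODON_TABLE
        if sum(a != b for a, b in zip(codon, "AUG")) == 1
    )


def translate_from_start(orf: str,
                         start_codons=frozenset({"AUG", "GUG", "UUG"})) -> str:
    """Translate an ORF whose first codon is read by the initiator tRNA.

    The first codon always yields "M" (Met or fMet) if it is an accepted start
    codon; later codons follow the standard table until a stop codon.
    """
    rna = orf.upper().replace("T", "U")
    if rna[:3] not in start_codons:
        raise ValueError(f"{rna[:3]!r} is not an accepted start codon")
    peptide = ["M"]
    for i in range(3, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":  # stop codon ends translation
            break
        peptide.append(residue)
    return "".join(peptide)


if __name__ == "__main__":
    print(near_cognate_starts())
    # The same GUG codon gives M at the start and V internally.
    print(translate_from_start("GUGGUGAAAUAA"))
```

The second call shows the point made in the text: the same codon yields a different residue depending on whether it is read during initiation or elongation.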

Engineered and Synthetic Starts

Engineered and synthetic start codons expand the genetic code beyond canonical AUG initiation, enabling the site-specific incorporation of non-canonical amino acids (ncAAs) at protein N-termini for biotechnology applications. Reassignment techniques related to amber suppression repurpose the UAG stop codon as a start codon using orthogonal initiator tRNAs in genomically recoded Escherichia coli strains lacking TAG codons. An engineered amber initiator tRNA^fMet_CUA decodes UAG with high orthogonality, minimizing off-target initiation at internal sites while achieving 20–60% efficiency relative to AUG, depending on context and strain optimization. Quadruplet codons, such as AUGC, further enable code expansion through frameshift-capable tRNAs that read four bases as one unit, allowing ncAA insertion; these systems yield 1–3% efficiency compared to triplets but support multiplexed decoding for enhanced proteome diversity.[53]

In E. coli, the pyrrolysyl-tRNA synthetase (PylRS)/tRNA^Pyl_CUA pair has been adapted so that UAG functions as a start codon, with the tRNA charged with ncAAs such as azido-lysine variants for bioorthogonal reactions. This orthogonal system, which is of archaeal origin and was evolved for bacterial compatibility, facilitates N-terminal ncAA incorporation at efficiencies up to 50% in optimized recoded strains, avoiding competition from release factor 1.[54] Orthogonal initiator tRNAs, such as itRNA^Ty2, extend this approach to encode multiple distinct ncAAs (e.g., p-azidophenylalanine or propargyl-lysine) at UAG starts in single proteins, as confirmed by mass spectrometry and by functional assays such as fluorescence reporting.

These synthetic starts support protein engineering by installing reactive handles, such as azides at the N-terminus for copper-free click chemistry, enabling conjugation to drugs, fluorophores, or polymers without disrupting folding. In directed evolution, they allow library diversification with ncAAs, yielding variants with improved stability or activity, as demonstrated in selections for enzyme enhancement. Overall efficiencies reach 30–80% with tuned expression of synthetases and tRNAs and with reduced release factor activity, though toxicity from mischarging limits scalability.

Post-2020 advances include quadruplet-decoding tRNAs for simultaneous ncAA incorporation at multiple sites, including N-termini, in eukaryotic models such as Caenorhabditis elegans. In mammalian cells, engineered initiator tRNAs initiate at non-AUG codons (e.g., CUG or GUG) with up to 70% efficiency, diversifying N-terminal residues for therapeutic protein production; these approaches are inspired by natural non-AUG starts but rely on synthetic anticodon modifications. CRISPR-based editing integrates such orthogonal systems into mammalian genomes, enabling stable, heritable expression of ncAA-modified therapeutics such as cytokines or antibodies with enhanced pharmacokinetics.[55][56][57]
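As a small illustration of the sequence-level bookkeeping behind amber (UAG/TAG) start codons, the Python sketch below swaps the ATG of a coding sequence for TAG and lists in-frame TAG codons, which are the positions an amber-decoding tRNA could also read in a strain that has not been recoded. Both helper names and the example sequences are hypothetical; the sketch makes no claim about how any published system designs its constructs and does not model tRNA charging or initiation efficiency.

```python
def to_amber_start(cds: str) -> str:
    """Replace the leading ATG of a DNA coding sequence with TAG."""
    dna = cds.upper()
    if not dna.startswith("ATG"):
        raise ValueError("expected a coding sequence beginning with ATG")
    return "TAG" + dna[3:]


def inframe_amber_positions(cds: str) -> list[int]:
    """Return 0-based indices of in-frame TAG codons, counting from position 0.

    Index 0 is the engineered start (if present); any other hit is an internal
    or terminal amber codon that an amber-decoding tRNA might also read.
    """
    dna = cds.upper()
    return [i for i in range(0, len(dna) - 2, 3) if dna[i:i + 3] == "TAG"]


if __name__ == "__main__":
    reporter = to_amber_start("ATGAAATAGTAA")  # hypothetical toy sequence
    print(reporter)
    print(inframe_amber_positions(reporter))
```

In a genomically recoded host of the kind described above, the second function would be expected to return only index 0 for a well-designed construct, since other TAG codons have been removed or replaced.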

References
