Expanded genetic code
from Wikipedia
There must not be crosstalk between the new tRNA/synthase pair and the existing tRNA/synthase molecules, only with the ribosomes

An expanded genetic code is an artificially modified genetic code in which one or more specific codons have been re-allocated to encode an amino acid that is not among the 22 common naturally-encoded proteinogenic amino acids.[1]

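The key prerequisites to expand the genetic code are:

  • the non-standard amino acid to encode,
  • an unused codon to adopt,
  • a tRNA that recognizes this codon, and
  • a tRNA synthetase that recognizes only that tRNA and only the non-standard amino acid.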

Expanding the genetic code is an area of research of synthetic biology, an applied biological discipline whose goal is to engineer living systems for useful purposes. The genetic code expansion enriches the repertoire of useful tools available to science.

In May 2019, researchers reported a milestone: the creation of a new synthetic (possibly artificial) form of viable life, a variant of the bacterium Escherichia coli, made by reducing the natural number of 64 codons in the bacterial genome to 61 codons (eliminating two out of the six codons coding for serine and one out of three stop codons), of which 59 are used to encode 20 amino acids.[2][3]

Introduction

The genetic code is essentially the same in all organisms, so that all living beings use the same 'genetic language'.[4] In general, the introduction of new functional unnatural amino acids into proteins of living cells breaks the universality of the genetic language, which ideally leads to alternative life forms.[5] Proteins are produced by the molecules of the translational system, which decode RNA messages into a string of amino acids. The translation of genetic information contained in messenger RNA (mRNA) into a protein is catalysed by ribosomes. Transfer RNAs (tRNAs) are used as keys to decode the mRNA into its encoded polypeptide. A tRNA recognizes a specific three-nucleotide codon in the mRNA through a complementary sequence, called the anticodon, on one of its loops. Each three-nucleotide codon is translated into one of twenty naturally occurring amino acids.[6] Every amino acid-coding codon is read by at least one tRNA, and sometimes multiple codons code for the same amino acid. Many tRNAs are compatible with several codons.

An enzyme called an aminoacyl tRNA synthetase covalently attaches the amino acid to the appropriate tRNA.[7] Most cells have a different synthetase for each amino acid (20 or more synthetases). Some bacteria, however, have fewer than 20 aminoacyl tRNA synthetases and introduce the "missing" amino acid(s) by modification of a structurally related amino acid by an aminotransferase enzyme.[8] A feature exploited in the expansion of the genetic code is that the aminoacyl tRNA synthetase often does not recognize the anticodon but another part of the tRNA, meaning that if the anticodon were mutated, the encoding of that amino acid would change to a new codon. In the ribosome, the information in mRNA is translated into a specific amino acid when the mRNA codon matches the complementary anticodon of a tRNA, and the attached amino acid is added onto a growing polypeptide chain. When it is released from the ribosome, the polypeptide chain folds into a functioning protein.[7]

Incorporating a novel amino acid into the genetic code requires several changes. First, for successful translation of a novel amino acid, the codon to which it is assigned cannot already code for one of the 20 natural amino acids. Usually a nonsense codon (stop codon) or a four-base codon is used.[6] Second, a novel pair of tRNA and aminoacyl tRNA synthetase is required; together these are called the orthogonal set. The orthogonal set must not crosstalk with the endogenous tRNA and synthetase sets, while still being functionally compatible with the ribosome and other components of the translation apparatus. The active site of the synthetase is modified to accept only the novel amino acid. Most often, a library of mutant synthetases is screened for one that charges the tRNA with the desired amino acid. The synthetase is also modified to recognize only the orthogonal tRNA.[6] The tRNA/synthetase pair is often engineered in other bacteria or eukaryotic cells.[9]
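
The effect of reassigning a stop codon can be illustrated with a small sketch. The codon table below is a deliberately tiny subset, the mRNA is invented, and the symbol "X" for the novel amino acid is a hypothetical placeholder; none of these come from the cited work.

```python
# Toy codon table: only the codons used in this example (not the full 64).
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # '*' marks a stop codon
}

def translate(mrna, reassigned=None):
    """Translate an mRNA in frame; `reassigned` maps codons to new residues."""
    table = dict(CODON_TABLE)
    table.update(reassigned or {})
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break  # termination at a stop codon
        peptide.append(residue)
    return "".join(peptide)

# Invented message with an in-frame amber codon (UAG) at the third position.
mrna = "AUGUUUUAGGGCAAAUAA"
wild_type = translate(mrna)                          # stops at UAG
expanded = translate(mrna, reassigned={"UAG": "X"})  # UAG now encodes "X"
```

With the default table the peptide ends at the amber codon, while the reassigned table reads through it and terminates only at UAA. The sketch models the outcome of a working orthogonal set, not the competition with release factor 1 that limits efficiency in real cells.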

In this area of research, the 20 encoded proteinogenic amino acids are referred to as standard amino acids, or alternatively as natural or canonical amino acids, while the added amino acids are called non-standard amino acids (NSAAs), or unnatural amino acids (UAAs; term not used in papers dealing with natural non-proteinogenic amino acids, such as phosphoserine), or non-canonical amino acids.

Non-standard amino acids

Tyrosine and some synthetic tyrosine variants used for protein labeling. Different variants of tyrosine have been synthesized and can be incorporated into proteins using an expanded genetic code. The variants depicted here are all used for chemical or photochemical linking. This means that the incorporated AA specifically reacts with either a particular chemical group (such as hydrazides, amines, azides, or thiols) or can be UV-activated to crosslink with other AAs.

The first element of the system is the amino acid that is added to the genetic code of a certain strain of organism.

Over 71 different NSAAs have been added to different strains of E. coli, yeast or mammalian cells.[10] For technical reasons (easier chemical synthesis of NSAAs, less crosstalk and easier evolution of the aminoacyl-tRNA synthetase), the NSAAs are generally larger than standard amino acids and most often have a phenylalanine core with a large variety of substituents. These allow a large repertoire of new functions, such as labeling (see figure), use as a fluorescent reporter (e.g. dansyl alanine),[11] or production of proteins in E. coli with eukaryotic post-translational modifications (e.g. phosphoserine, phosphothreonine, and phosphotyrosine).[10][12]

The founding work was reported by Rolf Furter, who used the yeast tRNAPhe/PheRS pair to incorporate p-iodophenylalanine in E. coli.[13]

Unnatural amino acids incorporated into proteins include heavy atom-containing amino acids to facilitate certain x-ray crystallographic studies; amino acids with novel steric/packing and electronic properties; photocrosslinking amino acids which can be used to probe protein-protein interactions in vitro or in vivo; keto, acetylene, azide, and boronate-containing amino acids which can be used to selectively introduce a large number of biophysical probes, tags, and novel chemical functional groups into proteins in vitro or in vivo; redox active amino acids to probe and modulate electron transfer; photocaged and photoisomerizable amino acids to photoregulate biological processes; metal binding amino acids for catalysis and metal ion sensing; amino acids that contain fluorescent or infra-red active side chains to probe protein structure and dynamics; α-hydroxy acids and D-amino acids as probes of backbone conformation and hydrogen bonding interactions; and sulfated amino acids and mimetics of phosphorylated amino acids as probes of post-translational modifications.[14][15][16]

Availability of the non-standard amino acid requires that the organism either import it from the medium or biosynthesize it. In the first case, the unnatural amino acid is first synthesized chemically in its optically pure L-form.[17] It is then added to the growth medium of the cell.[10] A library of compounds is usually tested for use in incorporation of the new amino acid, but this is not always necessary; for example, various transport systems can handle unnatural amino acids with apolar side-chains. In the second case, a biosynthetic pathway needs to be engineered. One example is an E. coli strain that biosynthesizes a novel amino acid (p-aminophenylalanine) from basic carbon sources and includes it in its genetic code.[16][18][19] Another example is phosphoserine: it is a natural metabolite, so its pathway flux had to be altered to increase its production.[12]

Codon assignment

Another element of the system is a codon to allocate to the new amino acid.[citation needed]

A major problem for genetic code expansion is that there are no free codons. The genetic code has a non-random layout that shows tell-tale signs of various phases of primordial evolution; however, it has since frozen into place and is near-universally conserved.[20] Nevertheless, some codons are rarer than others. In E. coli, as in all organisms, codon usage is not uniform: there are several rare codons (see table), the rarest being the amber stop codon (UAG).
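
Finding rare codons amounts to tallying codon counts over all coding sequences. The following is a minimal sketch on five invented mini-genes; the sequences are hypothetical and chosen only so that UAG comes out as the rarest stop codon, as the text describes for E. coli.

```python
from collections import Counter

def codon_usage(coding_sequences):
    """Count in-frame codons over a collection of coding sequences."""
    counts = Counter()
    for seq in coding_sequences:
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    return counts

# Hypothetical toy genes (mRNA alphabet), each ending in a stop codon.
genes = [
    "AUGGCUAAAUAA",
    "AUGAAAGCUUAA",
    "AUGGCUUGA",
    "AUGAAAUGA",
    "AUGGCUUAG",
]

usage = codon_usage(genes)
stop_codons = ["UAA", "UAG", "UGA"]
rarest_stop = min(stop_codons, key=lambda codon: usage[codon])
```

On a real genome the same tally would be run over every annotated open reading frame, and the least used codons become the candidates for reassignment.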

Amber codon suppression

The possibility of reassigning codons was realized by Normanly et al. in 1990, when a viable mutant strain of E. coli read through the UAG ("amber") stop codon.[22] This was possible thanks to the rarity of this codon and the fact that release factor 1 alone makes the amber codon terminate translation. Later, in the Schultz lab, the tRNATyr/tyrosyl-tRNA synthetase (TyrRS) pair from Methanococcus jannaschii, an archaebacterium,[6] was used to introduce a tyrosine instead of STOP, the default value of the amber codon.[23] This was possible because of the differences between the endogenous bacterial synthetases and the orthologous archaeal synthetase, which do not recognize each other's tRNAs. Subsequently, the group evolved the orthogonal tRNA/synthetase pair to utilize the non-standard amino acid O-methyltyrosine.[6] This was followed by the larger naphthylalanine[24] and the photocrosslinking benzoylphenylalanine,[25] which proved the potential utility of the system.

The amber codon is the least used codon in Escherichia coli, but hijacking it results in a substantial loss of fitness. One study found that at least 83 peptides were majorly affected by the readthrough.[26] Additionally, the labelling was incomplete. As a consequence, several strains have been made to reduce the fitness cost, including the removal of all amber codons from the genome. In most E. coli K-12 strains (see Escherichia coli (molecular biology) for strain pedigrees) there are 314 UAG stop codons, so a substantial amount of work has gone into replacing them. One approach, pioneered by the group of George Church at Harvard, was dubbed MAGE in CAGE: it relied on multiplex transformation and subsequent strain recombination to remove all UAG codons. The latter part presented a halting point in a first paper,[27] but was overcome. This resulted in the E. coli strain C321.ΔA, which lacks all UAG codons and RF1.[28] This strain was then used in an experiment to make it "addicted" to the amino acid biphenylalanine by evolving several key enzymes to require it structurally, thereby putting its expanded genetic code under positive selection.[29]

Rare sense codon reassignment

In addition to the amber codon, rare sense codons have also been considered for use. The AGG codon codes for arginine, but a strain has been successfully modified to make it code for 6-N-allyloxycarbonyl-lysine.[30] Another candidate is the AUA codon, which is unusual in that its respective tRNA has to differentiate against AUG that codes for methionine (primordially, isoleucine, hence its location). In order to do this, the AUA tRNA has a special base, lysidine. The deletion of the synthase (tilS) was possible thanks to the replacement of the native tRNA with that of Mycoplasma mobile (no lysidine). The reduced fitness is a first step towards pressuring the strain to lose all instances of AUA, allowing it to be used for genetic code expansion.[31]

E. coli strain Syn61 is a variant where all uses of TCG (Ser), TCA (Ser), TAG (STOP) codons are eliminated using a synthetic genome (see § Recoded synthetic genome below) containing 18,214 replacements. By removing the unneeded tRNA genes and RF1, strain Syn61Δ3 was produced. The three freed codons then become available for adding three special residues, as demonstrated in strain "Syn61Δ3(ev4)".[32] A newer strain Syn57 frees up 7 codons (101,553 replacements) and is expected to allow more special residues to be added.[33]
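
Freeing codons in this way comes down to replacing every in-frame occurrence of a target codon with a synonymous one. The sketch below assumes the replacement scheme TCG→AGC, TCA→AGT and TAG→TAA, which is given here for illustration only, and applies it to an invented open reading frame. Real genome recoding also has to deal with overlapping genes and regulatory sequences, which this sketch ignores.

```python
# Assumed synonymous replacements (serine to serine, stop to stop).
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}

def recode(orf):
    """Replace target codons in an in-frame ORF with synonymous codons."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(RECODING.get(codon, codon) for codon in codons)

def uses_freed_codon(orf):
    """True if any in-frame codon is one of the codons being removed."""
    return any(orf[i:i + 3] in RECODING for i in range(0, len(orf), 3))

orf = "ATGTCGAAATCATAG"   # hypothetical ORF: Met-Ser-Lys-Ser-stop
recoded = recode(orf)     # same protein, none of the three target codons
codons_left = 64 - len(RECODING)
```

After recoding, the three target codons no longer appear in frame, so the tRNAs and release factor that read them can be removed and the codons reassigned.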

Four base (quadruplet) codons

While triplet codons are the basis of the genetic code in nature, programmed +1 frameshifting is a natural process that allows the use of a four-nucleotide sequence (quadruplet codon) to encode an amino acid.[34] Recent developments in genetic code engineering also showed that quadruplet codons could be used to encode non-standard amino acids under experimental conditions.[35][36][37] This allowed the simultaneous usage of two unnatural amino acids, p-azidophenylalanine (pAzF) and N6-[(2-propynyloxy)carbonyl]lysine (CAK), which cross-link with each other by Huisgen cycloaddition.[38] Quadruplet decoding in wild-type, non-recoded strains is very inefficient.[38] This stems from the fact that the interaction of engineered tRNAs with ternary complexes or other translation components is not as favorable and strong as that of the cell's endogenous translation elements.[39] This problem can be overcome by specifically engineering and evolving tRNAs that can decode quadruplet codons in non-recoded strains.[40] Up to 4 different quadruplet orthogonal tRNA/tRNA synthetase pairs can be generated in this manner.[41] The quadruplet codon decoding approach has also been applied to the construction of an HIV-1 vaccine.[42]

tRNA/synthetase pair

Another key element is the tRNA/synthetase pair.

The orthogonal set of synthetase and tRNA can be mutated and screened through directed evolution to charge the tRNA with a different, even novel, amino acid. Mutations to the plasmid containing the pair can be introduced by error-prone PCR or through degenerate primers for the synthetase's active site. Selection involves multiple rounds of a two-step process, in which the plasmid is first transferred into cells expressing chloramphenicol acetyl transferase with a premature amber codon. In the presence of toxic chloramphenicol and the non-natural amino acid, the surviving cells will have overridden the amber codon using the orthogonal tRNA aminoacylated with either a standard amino acid or the non-natural one. To remove the former, the plasmid is inserted into cells carrying a (toxic) barnase gene with a premature amber codon, but without the non-natural amino acid, which removes all the orthogonal synthetases that do not specifically recognize the non-natural amino acid.[6] In addition to recoding the tRNA to a different codon, the tRNA can be mutated to recognize a four-base codon, allowing additional free coding options.[43] The non-natural amino acid, as a result, introduces diverse physicochemical and biological properties that can be used as a tool to explore protein structure and function or to create novel or enhanced proteins for practical purposes.

Several methods for selecting a synthetase that accepts only the non-natural amino acid have been developed. One of these uses a combination of positive and negative selection.
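
The logic of the two-step selection can be sketched as a pair of filters over a synthetase library. The variant names and the two boolean properties are hypothetical simplifications; real selections act on survival rates, not clean yes/no outcomes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SynthetaseVariant:
    name: str
    charges_ncaa: bool     # charges the orthogonal tRNA with the non-natural amino acid
    charges_natural: bool  # charges the orthogonal tRNA with a standard amino acid

def positive_selection(library):
    """Chloramphenicol + non-natural amino acid present.

    Cells survive only if the amber codon in the resistance gene is
    suppressed, which any charging activity achieves.
    """
    return [v for v in library if v.charges_ncaa or v.charges_natural]

def negative_selection(library):
    """Barnase with an amber codon, non-natural amino acid absent.

    Variants that charge a standard amino acid suppress the amber codon,
    produce the toxic barnase and are lost.
    """
    return [v for v in library if not v.charges_natural]

library = [
    SynthetaseVariant("inactive", False, False),
    SynthetaseVariant("promiscuous", True, True),
    SynthetaseVariant("natural_only", False, True),
    SynthetaseVariant("specific", True, False),
]

survivors = negative_selection(positive_selection(library))
```

Only the variant that charges the non-natural amino acid and nothing else passes both steps, which is why the two selections are applied in alternating rounds.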

Orthogonal sets in model organisms

The orthogonal pairs of synthetase and tRNA that work for one organism may not work for another, as the synthetase may mis-aminoacylate endogenous tRNAs or the tRNA be mis-aminoacylated itself by an endogenous synthetase. As a result, the sets created to date differ between organisms.

| Pair | Source | E. coli | Yeast | Mammals | Notes and references |
| --- | --- | --- | --- | --- | --- |
| tRNATyr-TyrRS | Methanococcus jannaschii | Yes | No | No | |
| tRNALys–LysRS | Pyrococcus horikoshii | Yes | No | No | [44] |
| tRNAGlu–GluRS | Pyrococcus horikoshii | Yes | No | No | [45] |
| tRNALeu–LeuRS | tRNA: mutant Halobacterium sp.; RS: Methanobacterium thermoautotrophicum | Yes | No | No | [46] |
| tRNAAmber-PylRS | Methanosarcina barkeri and Methanosarcina mazei | Yes | Yes | Yes | [47] |
| tRNAAmber-3-iodotyrosyl-RS | RS: variant Methanocaldococcus jannaschii aaRS | Yes | No | No | [48] |
| tRNATyr/Amber-TyrRS | Escherichia coli | No | Yes | No | Reported in 2003,[49] mentioned in 2014 LeuRS[50] |
| tRNAiMet-GlnRS | tRNA: human; RS: Escherichia coli | No | Yes | No | Switched to Amber codon.[51] |
| tRNAifMet-TyrRS | tRNA: Escherichia coli; RS: S. cerevisiae | Yes | Yes | No | Switched to Amber codon.[51] |
| tRNALeu/Amber-LeuRS | Escherichia coli | No | Yes | Yes | Reported in 2004 and mutated for 2-Aminooctanoic acid, o-methyl tyrosine, and o-nitrobenzyl cysteine.[50] Evolved in yeast for 4,5-dimethoxy-2-nitrobenzyl serine,[52][53] tested in mice and mammalian cells with photosensitive 4,5-dimethoxy-2-nitrobenzyl-cysteine.[54][55] |
| tRNATyr-TyrRS | Bacillus stearothermophilus | No | No | Yes | [9] |
| tRNATrp-TrpRS | Bacillus subtilis, RS modified | No | No | Yes | New AA is 5-OH Trp.[56] |
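
Whether a pair counts as orthogonal in a given host can be phrased as a check over observed charging reactions. This is a minimal sketch; all synthetase and tRNA names, and the two charging data sets, are hypothetical.

```python
def is_orthogonal(o_rs, o_trna, charging, host_synthetases, host_trnas):
    """Check a candidate pair against one host.

    `charging` is a set of (synthetase, tRNA) tuples for which
    aminoacylation is observed in that host.
    """
    charges_own_trna = (o_rs, o_trna) in charging
    # The introduced synthetase must not mis-aminoacylate host tRNAs.
    rs_crosstalk = any((o_rs, t) in charging for t in host_trnas)
    # Host synthetases must not mis-aminoacylate the introduced tRNA.
    trna_crosstalk = any((rs, o_trna) in charging for rs in host_synthetases)
    return charges_own_trna and not rs_crosstalk and not trna_crosstalk

host_synthetases = ["host_TyrRS", "host_LeuRS"]
host_trnas = ["host_tRNA_Tyr", "host_tRNA_Leu"]
native = {("host_TyrRS", "host_tRNA_Tyr"), ("host_LeuRS", "host_tRNA_Leu")}

# Host A: the introduced pair only charges itself.
clean = native | {("o_RS", "o_tRNA")}
# Host B: a host synthetase also charges the introduced tRNA.
leaky = clean | {("host_LeuRS", "o_tRNA")}

orthogonal_in_host_a = is_orthogonal("o_RS", "o_tRNA", clean, host_synthetases, host_trnas)
orthogonal_in_host_b = is_orthogonal("o_RS", "o_tRNA", leaky, host_synthetases, host_trnas)
```

The same pair passes in one host and fails in the other, which is the situation the Yes/No columns of the table record.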

In 2017, a mouse engineered with an extended genetic code that can produce proteins with unnatural amino acids was reported.[57]

Orthogonal ribosomes

Similarly to orthogonal tRNAs and aminoacyl tRNA synthetases (aaRSs), orthogonal ribosomes have been engineered to work in parallel to the natural ribosomes. Orthogonal ribosomes ideally use different mRNA transcripts than their natural counterparts and ultimately should draw on a separate pool of tRNA as well. This should alleviate some of the loss of fitness which currently still arises from techniques such as Amber codon suppression. Additionally, orthogonal ribosomes can be mutated and optimized for particular tasks, like the recognition of quadruplet codons. Such an optimization is not possible, or highly disadvantageous for natural ribosomes.

o-Ribosome

In 2005, three sets of ribosomes were published that did not recognize natural mRNA but instead translated a separate pool of orthogonal mRNA (o-mRNA).[58] This was achieved by changing the recognition sequence of the mRNA, the Shine-Dalgarno sequence, together with the corresponding recognition sequence in the 16S rRNA of ribosomes, the so-called anti-Shine-Dalgarno sequence. In this way the base pairing, which is usually lost if either sequence is mutated alone, stays available. However, the mutations in the 16S rRNA were not limited to the obviously base-pairing nucleotides of the classical anti-Shine-Dalgarno sequence.
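
The principle of compensating mutations can be shown with a strict Watson-Crick pairing check. The natural sequences below are an idealized consensus, the orthogonal sequences are invented for illustration and are not those of the published ribosomes, and G-U wobble pairing is ignored.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the strand that pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))

def pairs_fully(sd, anti_sd):
    """True if every position of the two sequences forms a Watson-Crick pair."""
    return anti_sd == reverse_complement(sd)

# Idealized natural sequences (both written 5' to 3').
natural_sd, natural_asd = "AGGAGG", "CCUCCU"

# Hypothetical orthogonal pair: the mRNA site is changed and the rRNA
# site is co-mutated so that pairing is restored.
o_sd = "ACCACA"
o_asd = reverse_complement(o_sd)
```

Under this simplified model each recognition sequence pairs only with its own partner: `pairs_fully(o_sd, o_asd)` holds, while the mixed combinations do not, so the orthogonal ribosome and the orthogonal mRNA find each other without engaging the natural pool.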

Ribo-X

In 2007, the group of Jason W. Chin presented an orthogonal ribosome that was optimized for amber codon suppression.[59] The 16S rRNA was mutated in such a way that it bound the release factor RF1 less strongly than the natural ribosome does. This ribosome did not eliminate the problem of lowered cell fitness caused by suppressed stop codons in natural proteins. However, through its improved specificity it raised the yields of correctly synthesized target protein significantly (from ~20% to >60% for one amber codon to be suppressed and from <1% to >20% for two amber codons).

Ribo-Q

In 2010, the group of Jason W. Chin presented a further optimized version of the orthogonal ribosome. The Ribo-Q is a 16S rRNA optimized to recognize tRNAs, which have quadruplet anti-codons to recognize quadruplet codons, instead of the natural triplet codons.[38] With this approach the number of possible codons rises from 64 to 256. Even accounting for a variety of stop codons, more than 200 different amino acids could potentially be encoded this way.
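
The codon counts quoted above follow directly from the four-letter RNA alphabet, as this short enumeration shows.

```python
from itertools import product

BASES = "ACGU"

# Every possible codon of a given length is one choice of base per position.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
quadruplets = ["".join(p) for p in product(BASES, repeat=4)]

n_triplets = len(triplets)        # 4 ** 3
n_quadruplets = len(quadruplets)  # 4 ** 4
```

How many of the quadruplets would have to be reserved as stop signals is a design choice, so the figure of more than 200 encodable amino acids is an upper-bound estimate.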

Ribosome stapling

The orthogonal ribosomes described above all focus on optimizing the 16S rRNA. Thus far, this optimized 16S rRNA has been combined with natural large subunits to form orthogonal ribosomes. If the 23S rRNA, the main RNA component of the large ribosomal subunit, is to be optimized as well, it has to be ensured that there is no crosstalk in the assembly of orthogonal and natural ribosomes (see figure B). To ensure that optimized 23S rRNA would only form ribosomes with the optimized 16S rRNA, the two rRNAs were combined into one transcript.[60] When the sequence for the 23S rRNA is inserted into a loop region of the 16S rRNA sequence, both subunits still adopt functioning folds. Since the two rRNAs are linked and thus in constant proximity, they preferentially bind each other rather than other free-floating ribosomal subunits.[citation needed]

Engineered peptidyl transferase center

In 2014, it was shown that by altering the peptidyl transferase center of the 23S rRNA, ribosomes could be created which draw on orthogonal pools of tRNA.[61] The 3' end of tRNAs is universally conserved to be CCA. The two cytidines base pair with two guanosines in the 23S rRNA to bind the tRNA to the ribosome. This interaction is required for translational fidelity. However, by co-mutating the binding nucleotides in such a way that they can still base pair, the translational fidelity can be conserved. The 3' end of the tRNA is mutated from CCA to CGA, while the two guanosine nucleotides that pair with it in the ribosome's A- and P-sites are mutated to cytidine. This leads to ribosomes which do not accept naturally occurring tRNAs as substrates and to tRNAs which cannot be used as substrates by natural ribosomes.
To use such tRNAs effectively, they would have to be aminoacylated by specific, orthogonal aaRSs. Most naturally occurring aaRSs recognize the 3' end of their corresponding tRNA.[62][63] aaRSs for these 3'-mutated tRNAs are not available yet. Thus far, this system has only been shown to work in an in vitro translation setting where the aminoacylation of the orthogonal tRNA was achieved using so-called "flexizymes". Pioneered by the laboratory of Hiroaki Suga at the University of Tokyo, flexizymes are ribozymes with tRNA-aminoacylation activity.[64]

Applications

With an expanded genetic code, the unnatural amino acid can be genetically directed to any chosen site in the protein of interest. The high efficiency and fidelity of this process allows a better control of the placement of the modification compared to modifying the protein post-translationally, which, in general, will target all amino acids of the same type, such as the thiol group of cysteine and the amino group of lysine.[65] Also, an expanded genetic code allows modifications to be carried out in vivo. The ability to site-specifically direct lab-synthesized chemical moieties into proteins allows many types of studies that would otherwise be extremely difficult, such as:

  • Probing protein structure and function: By using amino acids with slightly different size such as O-methyltyrosine or dansyl alanine instead of tyrosine, and by inserting genetically coded reporter moieties (color-changing and/or spin-active) into selected protein sites, chemical information about the protein's structure and function can be measured.
  • Probing the role of post-translational modifications in protein structure and function: By using amino acids that mimic post-translational modifications such as phosphoserine, biologically active protein can be obtained, and the site-specific nature of the amino acid incorporation can lead to information on how the position, density, and distribution of protein phosphorylation affect protein function.[66][67][68][69]
  • Identifying and regulating protein activity: By using photocaged amino acids, protein function can be "switched" on or off by illuminating the organism.
  • Changing the mode of action of a protein: One can start with the gene for a protein that binds a certain sequence of DNA and, by inserting a chemically active amino acid into the binding site, convert it to a protein that cuts the DNA rather than binding it.
  • Improving immunogenicity and overcoming self-tolerance: By replacing strategically chosen tyrosines with p-nitro phenylalanine, a tolerated self-protein can be made immunogenic.[70]
  • Selective destruction of selected cellular components: using an expanded genetic code, unnatural, destructive chemical moieties (sometimes called "chemical warheads") can be incorporated into proteins that target specific cellular components.[71]
  • Producing better protein: the evolution of T7 bacteriophages on a non-evolving E. coli strain that encoded 3-iodotyrosine on the amber codon resulted in a population fitter than wild-type thanks to the presence of iodotyrosine in its proteome.[72]
  • Probing protein localization and protein-protein interaction in bacteria.[73]

Future

The expansion of the genetic code is still in its infancy. Current methodology typically uses only one non-standard amino acid at a time, whereas ideally multiple could be used. In fact, the group of Jason Chin has set a record with a genetically recoded E. coli strain that can simultaneously incorporate up to 4 unnatural amino acids.[74] Moreover, software has been developed that allows orthogonal ribosomes and unnatural tRNA/RS pairs to be combined in order to improve protein yield and fidelity.[74]

Recoded synthetic genome

One way to achieve the encoding of multiple unnatural amino acids is by synthesising a rewritten genome.[75] In 2010, at a cost of $40 million, an organism, Mycoplasma laboratorium, was constructed that was controlled by a synthetic, but not recoded, genome.[76] The first genetically recoded organism was created by a collaboration between George Church's and Farren Isaacs' labs, when the wild-type E. coli MG1655 was recoded in such a way that all 321 known UAG stop codons were substituted with synonymous UAA codons and release factor 1 was knocked out in order to eliminate the interaction with the exogenous stop codon and improve unnatural protein synthesis.[28] In 2019, Escherichia coli Syn61 was created, with a 4 megabase recoded genome consisting of only 61 codons instead of the natural 64.[3][2] In addition to the elimination of the usage of rare codons, the specificity of the system needs to be increased, as many tRNAs recognise several codons.[75]

Expanded genetic alphabet

Another approach is to expand the number of nucleobases to increase the coding capacity.

An unnatural base pair (UBP) is a designed subunit (or nucleobase) of DNA which is created in a laboratory and does not occur in nature. A demonstration of UBPs was achieved in vitro by Ichiro Hirao's group at the RIKEN institute in Japan. In 2002, they developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in vitro in transcription and translation for the site-specific incorporation of non-standard amino acids into proteins.[77] In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription.[78] Afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) were discovered to be a high-fidelity pair in PCR amplification.[79][80] In 2013, they applied the Ds-Px pair to DNA aptamer generation by in vitro selection (SELEX) and demonstrated that genetic alphabet expansion significantly augments DNA aptamer affinities to target proteins.[81]

In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, reported that they had designed an unnatural base pair (UBP).[82] The two new artificial nucleotides were named "d5SICS" and "dNaM". More technically, these artificial nucleotides, which bear hydrophobic nucleobases, feature two fused aromatic rings that form a (d5SICS–dNaM) complex or base pair in DNA.[83][84] In 2014, the same team reported that they had synthesized a stretch of circular DNA, known as a plasmid, containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and had inserted it into cells of the common bacterium E. coli, which successfully replicated the unnatural base pairs through multiple generations.[85] This is the first known example of a living organism passing along an expanded genetic code to subsequent generations.[83][86] This was in part achieved by the addition of a supportive algal gene that expresses a nucleotide triphosphate transporter which efficiently imports the triphosphates of both d5SICS and dNaM (d5SICSTP and dNaMTP) into E. coli bacteria.[83] The natural bacterial replication pathways then use them to accurately replicate the plasmid containing d5SICS–dNaM.

The successful incorporation of a third base pair into a living micro-organism is a significant breakthrough toward the goal of greatly expanding the number of amino acids which can be encoded by DNA, thereby expanding the potential for living organisms to produce novel proteins.[85] The artificial strings of DNA do not encode for anything yet, but scientists speculate they could be designed to manufacture new proteins which could have industrial or pharmaceutical uses.[87]

In May 2014, researchers announced that they had successfully introduced two new artificial nucleotides into bacterial DNA, and by including individual artificial nucleotides in the culture media, were able to induce amplification of the plasmids containing the artificial nucleotides by a factor of 2 × 10^7 (24 doublings); they did not create mRNA or proteins able to use the artificial nucleotides.[83][88][89][90]
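
The reported amplification factor and the number of doublings can be cross-checked with a line of arithmetic, reading the factor as roughly 2 × 10^7.

```python
import math

doublings = 24
fold_from_doublings = 2 ** doublings  # exact amplification after 24 doublings

reported_fold = 2e7                   # rounded figure quoted in the text
doublings_from_fold = math.log2(reported_fold)
```

Twenty-four doublings give 16,777,216-fold amplification, about 1.7 × 10^7, which rounds to the quoted figure; conversely, 2 × 10^7 corresponds to a little over 24 doublings.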

Selective pressure incorporation (SPI) method for production of alloproteins

Many studies have produced proteins with non-standard amino acids without altering the genetic code. These proteins, called alloproteins, are made by incubating cells with an unnatural amino acid in the absence of a similar coded amino acid, so that the former is incorporated into protein in place of the latter, for example L-2-aminohexanoic acid (Ahx) for methionine (Met).[91]

These studies rely on the natural promiscuous activity of the aminoacyl tRNA synthetase, which can add to its target tRNA an unnatural amino acid (i.e. an analog) similar to the natural substrate, for example methionyl-tRNA synthetase mistaking isoleucine for methionine.[92] In protein crystallography, for example, the addition of selenomethionine to the media of a culture of a methionine-auxotrophic strain results in proteins containing selenomethionine as opposed to methionine (see Multi-wavelength anomalous dispersion for the reason).[93] Another example is that L-photo-leucine and photomethionine are added instead of leucine and methionine to cross-label protein.[94] Similarly, some tellurium-tolerant fungi can incorporate tellurocysteine and telluromethionine into their protein instead of cysteine and methionine.[95] The objective of expanding the genetic code is more radical, as it does not replace an amino acid but adds one or more to the code. On the other hand, proteome-wide replacements are most efficiently performed by global amino acid substitutions. For example, global proteome-wide substitutions of natural amino acids with fluorinated analogs have been attempted in E. coli[96] and B. subtilis.[97] A complete tryptophan substitution with thienopyrrole-alanine in response to 20899 UGG codons in E. coli was reported in 2015 by Budisa and Söll.[98] Moreover, many biological phenomena, such as protein folding and stability, are based on synergistic effects at many positions in the protein sequence.[99]

In this context, the SPI method generates recombinant protein variants or alloproteins directly by substitution of natural amino acids with unnatural counterparts.[100] An amino acid auxotrophic expression host is supplemented with an amino acid analog during target protein expression.[101] This approach avoids the pitfalls of suppression-based methods[102] and is superior to them in terms of efficiency, reproducibility and simplicity of the experimental setup.[103] Numerous studies demonstrated how global substitution of canonical amino acids with various isosteric analogs caused minimal structural perturbations but dramatic changes in thermodynamic,[104] folding,[105] aggregation,[106] and spectral properties[107][108] and in enzymatic activity.[109]
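
The difference between residue-specific substitution (SPI) and site-specific incorporation through an expanded genetic code can be made concrete on a protein sequence. The sequence and the one-letter symbols "Z" (a methionine analog) and "X" (a non-standard amino acid) are hypothetical placeholders.

```python
def residue_specific(protein, natural, analog):
    """SPI-style outcome: every occurrence of one residue is replaced."""
    return protein.replace(natural, analog)

def site_specific(protein, position, ncaa):
    """Code-expansion outcome: one chosen position (0-based) is replaced."""
    return protein[:position] + ncaa + protein[position + 1:]

protein = "MKWTWMLW"  # invented sequence

alloprotein = residue_specific(protein, "M", "Z")  # all Met become the analog
labelled = site_specific(protein, 2, "X")          # only position 2 changes
```

The first function changes every methionine and leaves the amino acid count of the code at twenty, while the second adds a twenty-first building block at a single site.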

In vitro synthesis


The genetic code expansion described above is in vivo. An alternative is to change the coding in in vitro translation experiments. This requires the depletion of all tRNAs and the selective reintroduction of certain aminoacylated tRNAs, some chemically aminoacylated.[110]

Chemical synthesis


There are several techniques to produce peptides chemically; generally this is done by solid-phase protection chemistry. This means that any (protected) amino acid can be added into the nascent sequence.[citation needed]

In November 2017, a team from the Scripps Research Institute reported having constructed a semi-synthetic E. coli genome using six different nucleotides (versus the four found in nature). The two extra 'letters' form a third, unnatural base pair. The resulting organisms were able to thrive and synthesize proteins using "unnatural amino acids".[111][112] The unnatural base pair used is dNaM–dTPT3.[112] This unnatural base pair had been demonstrated previously,[113][114] but this was the first report of transcription and translation of proteins using an unnatural base pair.

from Grokipedia
The expanded genetic code refers to the extension of the canonical genetic code, which normally encodes the standard amino acids using 64 triplets of nucleotides (codons), by incorporating non-canonical or unnatural amino acids (ncAAs) into proteins at specific sites through engineered translational machinery. This is achieved primarily via orthogonal aminoacyl-tRNA synthetase (aaRS)/tRNA pairs that recognize unique codons, such as the amber stop codon (UAG), without interfering with the host's native system, allowing site-specific insertion of ncAAs with novel chemical properties such as photocrosslinking or bioorthogonal handles.

Pioneered in the late 20th century, the concept builds on early demonstrations of tRNA-mediated amino acid reassignment in cell-free systems dating back to 1962, but significant breakthroughs occurred in the 1980s and 1990s through the work of Peter G. Schultz and colleagues at the Scripps Research Institute. In 1989, Schultz's group first incorporated ncAAs using amber suppression in vitro, followed by the development of orthogonal pairs derived from archaeal organisms, such as the tyrosyl-tRNA synthetase (TyrRS) from Methanocaldococcus jannaschii and the pyrrolysyl-tRNA synthetase (PylRS) from Methanosarcina mazei. By 2001, Wang et al. achieved the first genetic encoding of an ncAA (O-methyl-L-tyrosine) in living Escherichia coli, marking the birth of in vivo genetic code expansion and enabling the incorporation of over 13 distinct ncAAs by 2003. Subsequent innovations, including quadruplet codon decoding and directed evolution of synthetases via methods like phage-assisted continuous evolution (PACE), have expanded the repertoire to more than 400 ncAAs across diverse organisms, from bacteria to mammalian cells and even mice.

Key methods for expansion include amber suppression, where the UAG codon is reassigned via suppressor tRNAs, and advanced strategies like frameshift suppression with quadruplet codons or genome recoding to free up additional codons in engineered strains such as Syn61Δ3 E. coli. Orthogonal systems ensure fidelity, with PylRS variants particularly versatile for charging diverse ncAAs, including backbone-modified ones, as demonstrated in recent evolutions by the Chin and Badran labs in 2024. In vitro approaches, such as cell-free translation systems (e.g., PURExpress), complement in vivo methods for rapid prototyping and incorporation of unstable ncAAs.

Applications of the expanded genetic code span several fields, including therapeutics, enabling precise studies of protein dynamics and interactions through site-specific labeling or modifications. For instance, ncAAs like p-acetyl-L-phenylalanine facilitate protein conjugation, while photocaged amino acids allow light-controlled protein activation. Expanded codes have also produced optimized biologics, including antibody-drug conjugates like ARX788 (in clinical trials NCT04829604), achieving yields up to 5 g/L in mammalian systems. These tools are likewise used to create proteins with enhanced enzymatic activity or novel functions, such as multiplexed ncAA incorporation (up to four per protein) for complex post-translational mimicry.

Despite progress, challenges persist, including limited codon availability in the universal code, reduced translation efficiency in higher eukaryotes, and the need for orthogonal pairs that evade cellular toxicity or immune responses. Ongoing research focuses on ncAA biosynthesis pathways and semi-synthetic organisms with unnatural base pairs to further broaden the code's scope.

Overview

Introduction

The expanded genetic code refers to the engineered modification of the canonical genetic code, which normally encodes 20 standard amino acids, to enable the site-specific incorporation of non-standard amino acids (nsAAs) into proteins during translation. This is achieved through the development of orthogonal translation components, such as engineered tRNA-aminoacyl tRNA synthetase (aaRS) pairs, that recognize unique codons and charge tRNAs with nsAAs without interfering with the host organism's native protein synthesis machinery. The approach allows for precise control over protein composition, introducing chemical functionalities not found in natural amino acids, such as photocrosslinkers, fluorescent probes, or bioorthogonal handles.

A pivotal historical milestone occurred in 1989, when Schultz and colleagues demonstrated the first site-specific incorporation of an unnatural amino acid into a protein using an amber suppressor tRNA chemically acylated with a non-standard residue, such as p-fluorophenylalanine, in an E. coli in vitro system. This method laid the foundation for subsequent expansions, including the 2001 achievement of incorporating O-methyl-L-tyrosine in living E. coli via evolved orthogonal aaRS/tRNA pairs. These advances built on earlier suppressor tRNA technologies but shifted toward genetically encoded systems for broader applicability.

The expansion of the genetic code has profound implications for several fields, enabling the creation of proteins with enhanced therapeutic properties, such as antibody-drug conjugates for targeted cancer treatment (e.g., ARX788 for HER2-positive cancers), and tools for studying protein dynamics. By facilitating the integration of over 200 distinct nsAAs in various organisms, from bacteria to mammals, this technology supports innovations such as biosensors while minimizing disruption to endogenous protein synthesis.

Non-Standard Amino Acids

Non-standard amino acids (nsAAs) encompass a diverse class of building blocks beyond the canonical 20 amino acids, broadly classified into naturally occurring rare variants and unnatural, synthetically engineered ones. Naturally occurring nsAAs include selenocysteine (Sec) and pyrrolysine (Pyl), which are incorporated into proteins in specific organisms despite not being part of the standard genetic code. Selenocysteine, the 21st amino acid, features a selenol group in place of the thiol in cysteine and is found in enzymes across prokaryotes, eukaryotes, and archaea, where it participates in redox reactions. Pyrrolysine, the 22nd, contains a pyrroline ring and is utilized in methanogenic archaea and certain bacteria in methylamine metabolism. In contrast, unnatural nsAAs are chemically synthesized to introduce novel functionalities, such as p-acetyl-L-phenylalanine (pAcPhe), which bears a ketone group enabling bioorthogonal conjugation reactions for site-specific protein labeling without interfering with native biochemistry.

These nsAAs expand the chemical repertoire of proteins by imparting specialized properties that enhance structural, functional, or analytical capabilities. For instance, photocrosslinking is facilitated by nsAAs like p-benzoyl-L-phenylalanine (pBpa), which generates a reactive intermediate upon UV irradiation to form covalent bonds with nearby residues or biomolecules, aiding in the mapping of protein interactions. Fluorescent properties can be introduced via nsAAs such as p-azido-L-phenylalanine (pAzF), whose azide moiety allows post-incorporation attachment of fluorophores through click chemistry, enabling real-time visualization of protein dynamics. Metal binding is another key attribute, exemplified by nsAAs with bipyridine side chains that bind transition metals, supporting applications such as imaging. These properties allow precise modulation of protein behavior while maintaining compatibility with cellular machinery.

Synthesis of nsAAs typically involves chemical routes tailored to their modified structures, with variants of the Strecker synthesis being particularly versatile for producing D-amino acids and other stereoisomers. The classical Strecker reaction condenses an aldehyde, ammonia, and cyanide to form α-amino nitriles, which are hydrolyzed to amino acids; adaptations incorporate chiral catalysts or modified precursors to yield enantiopure unnatural variants, such as fluorinated or azido-substituted D-phenylalanines. Additionally, nsAAs can serve as post-translational modification (PTM) mimics, replicating effects like phosphorylation (e.g., via O-phosphoserine analogs) or ubiquitination through designed side chains that emulate native PTM chemistry without requiring enzymatic processing.

To date, over 200 distinct nsAAs have been successfully incorporated into recombinant proteins, and databases like iNClusive catalog over 466 as of 2023, demonstrating the breadth of this expansion. Representative examples include fluorinated amino acids like (2,3,4,5,6-pentafluorophenyl)glycine, which enhance protein thermal stability by strengthening hydrophobic cores through fluorine's electronegativity and low polarizability. In brief, such nsAAs are incorporated site-specifically via the codon suppression techniques described in the following sections.

Codon Reassignment Methods

Amber Codon Suppression

Amber codon suppression utilizes the amber stop codon (UAG in mRNA, TAG in DNA) to enable site-specific incorporation of non-standard amino acids (nsAAs) into proteins during translation. In this approach, an orthogonal transfer RNA (tRNA) with its anticodon mutated to CUA is engineered to recognize the UAG codon, while an orthogonal aminoacyl-tRNA synthetase (aaRS) specifically charges this tRNA with the desired nsAA. This suppresses translation termination by competing with release factor 1 (RF1), allowing the ribosome to insert the nsAA at the UAG site and continue polypeptide elongation. The orthogonality of the tRNA/aaRS pair ensures minimal cross-reactivity with endogenous components, preserving the fidelity of the standard genetic code elsewhere.

The efficiency of amber suppression is influenced by several factors, primarily the competition between the suppressor tRNA and RF1 at UAG codons, which often results in truncated proteins or reduced yields. Suppression efficiency can be enhanced by overexpressing the orthogonal tRNA or optimizing the aaRS for better charging, but a significant improvement comes from using RF1-deficient strains, such as the genomically recoded Syn61ΔprfA, where amber suppression yields increase by up to 250% for multi-site incorporations. In these strains, the absence of RF1 eliminates termination at UAG, though RF2 still handles other stop codons, maintaining cell viability.

Pioneering work in amber suppression began with the demonstration of site-specific incorporation of p-fluorophenylalanine into a protein using chemically aminoacylated suppressor tRNA in an in vitro translation system, marking the first use of this method for unnatural amino acid insertion. This proof-of-concept evolved into in vivo applications in the 1990s and 2000s, with the development of orthogonal pairs derived from archaeal or eukaryotic sources expressed in bacterial hosts. By the 2010s, amber suppression had enabled the incorporation of over 100 distinct nsAAs, including photocrosslinkers, fluorescent probes, and post-translationally modified residues, into recombinant proteins in various organisms.

Despite its versatility, amber suppression faces limitations due to off-target suppression at the approximately 300 natural UAG stop codons in the E. coli genome, which can disrupt essential protein termination and reduce cellular fitness. This issue is particularly pronounced in RF1-deficient strains, where unintended readthrough may occur more frequently. To mitigate such effects and expand coding capacity, strategies like quadruplet codon decoding have been developed as alternatives or complements, allowing nsAA incorporation without relying on stop codon reassignment.
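
The decoding logic described in this section can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a simulation of real translation: the codon table is deliberately incomplete, the names `CODON_TABLE`, `anticodon` and `translate` are hypothetical, and the letter `X` is only a placeholder for the nsAA. Competition with RF1 is reduced to an all-or-nothing switch.

```python
# Toy codon table covering only the codons used below (not the full genetic code).
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "AAA": "K", "UGG": "W",
    "UAA": "*", "UGA": "*", "UAG": "*",  # "*" marks a stop codon
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the anticodon (written 5'->3') that base-pairs with an mRNA codon."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def translate(mrna, amber_nsaa=None):
    """Translate an mRNA in frame, starting at its first base.

    If amber_nsaa is given, UAG is read as that residue (suppression);
    otherwise UAG terminates translation like any other stop codon.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG" and amber_nsaa is not None:
            peptide.append(amber_nsaa)  # suppressor tRNA reads the amber codon
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # a release factor terminates translation
        peptide.append(residue)
    return "".join(peptide)


mrna = "AUGGCUUAGAAAUGGUAA"      # AUG GCU UAG AAA UGG UAA
print(anticodon("UAG"))          # CUA
print(translate(mrna))           # MA (truncated at UAG)
print(translate(mrna, "X"))      # MAXKW (X stands in for the nsAA)
```

The first call reproduces the truncated product expected when RF1 wins at UAG; the second shows the full-length product when the CUA-anticodon suppressor tRNA decodes it.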

Rare Sense Codon Reassignment

Rare sense codon reassignment involves repurposing low-frequency sense codons, such as AGG or AGA for arginine in Escherichia coli, to encode non-standard amino acids (nsAAs) in organisms engineered to minimize interference with native translation. The strategy entails depleting the natural tRNAs that decode these rare codons through genome recoding (replacing all instances of the target codon with synonymous alternatives) and subsequently introducing an orthogonal tRNA/synthetase pair specific for the desired nsAA. This approach leverages the low usage of rare codons (e.g., AGG occurs only about 1,400 times in the E. coli genome, primarily in non-essential genes) to avoid disrupting proteome function while expanding the code.

A seminal example is the reassignment of the AGG codon to incorporate the non-canonical L-homoarginine in E. coli. Researchers engineered a variant of the pyrrolysyl-tRNA synthetase (HarRS) to charge tRNAPylCCU with L-homoarginine, while eliminating AGG from 38 sites in 32 essential genes via synonymous substitutions. Incorporation of L-homoarginine at AGG sites in model proteins was confirmed, reaching detectable levels in the proteome (approximately 0.4% relative abundance) under optimized conditions and demonstrating efficient suppression of the depleted codon. Similar strategies have been applied to other nsAAs, though efficiencies vary based on synthetase affinity and cellular context.

This method offers key advantages over amber (UAG) suppression, as it avoids competition with release factor 1 (RF1), which can limit incorporation efficiency at amber codons to 20-50% in standard strains. By targeting sense codons in recoded backgrounds, rare sense reassignment enables the simultaneous incorporation of multiple distinct nsAAs into a single protein using different reassigned codons, without the toxicity or yield penalties associated with stop codon readthrough. Orthogonal tRNA/synthetase pairs ensure specificity and minimal cross-reactivity with the host machinery.

Despite these benefits, challenges persist, primarily the need for extensive genome-wide recoding to fully eliminate natural codon usage and prevent mistranslation or growth defects. For instance, the 2016 design and partial assembly of a recoded E. coli strain involved synonymous replacement of all 62,214 instances of seven codons (including rare sense codons for arginine, leucine, and serine, plus UAG) across the 4.3 Mb genome, reducing the code to 57 codons and enabling safer reassignment, though full assembly required iterative synthesis and validation of essential genes. This effort culminated in 2025 with the realization of a fully viable 57-codon E. coli strain, further enhancing the scalability of rare sense codon reassignment. Partial recoding, as in early AGG studies, limits scalability due to residual endogenous tRNA activity, and achieving high-fidelity incorporation (>90%) often demands further optimization of orthogonal components.
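
The synonymous replacement step behind genome recoding can be sketched in a few lines of Python. This is only an illustration on a single, made-up coding sequence: the function names are hypothetical, and the choice of CGT as the replacement arginine codon is an assumption for the example, not the substitution used in any published recoding design. Real recoding must also account for overlapping genes, regulatory motifs and mRNA structure, none of which is modelled here.

```python
def split_codons(cds):
    """Split a DNA coding sequence into its in-frame codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def recode(cds, replacements):
    """Replace every in-frame occurrence of a target codon with its synonym."""
    return "".join(replacements.get(codon, codon) for codon in split_codons(cds))


def count_codon(cds, codon):
    """Count in-frame occurrences of a codon."""
    return split_codons(cds).count(codon)


# Assumed mapping: both rare arginine codons are swapped for CGT (also arginine).
ARG_RECODING = {"AGG": "CGT", "AGA": "CGT"}

cds = "ATGAGGAAGGCTAGATAA"  # ATG AGG AAG GCT AGA TAA
recoded = recode(cds, ARG_RECODING)
print(recoded)                                               # ATGCGTAAGGCTCGTTAA
print(count_codon(cds, "AGG"), count_codon(recoded, "AGG"))  # 1 0
```

Note that the out-of-frame `AGG` spanning the codons `AAG GCT` is left untouched: only in-frame codons are read by the ribosome, so only those need to be removed before the codon can be reassigned.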

Quadruplet Codons

Quadruplet codons represent an approach to expand the genetic code by introducing four-base codon sequences that trigger a +1 frameshift, enabling the decoding of non-standard amino acids (nsAAs) without repurposing the standard 64 triplet codons. Engineered transfer RNAs (tRNAs) feature extended anticodon loops, typically seven or eight nucleotides long, which pair with the quadruplet codon in the ribosomal A-site, facilitating a four-nucleotide translocation during translation. This mechanism operates through models such as the "yardstick" model, where the tRNA anticodon fully base-pairs with the four-base codon, or the "slippery" model involving partial slippage and realignment, with recent refinements suggesting a four-base interaction in the post-translocation state.

The development of quadruplet codon decoding began with in vitro demonstrations using synthetic mRNAs and mutant tRNAs to incorporate amino acids at four-base sites like AGGU or CGGG. In vivo applications followed, with seminal work by Moore and colleagues engineering a suppressor tRNA to decode an expanded codon (UAGA), achieving functional protein expression. Further advancements involved orthogonal tRNA-synthetase pairs, such as the pyrrolysyl-tRNA synthetase (PylRS)/tRNA^Pyl^ system, optimized for quadruplet anticodons like CUA G, which improved charging specificity for nsAAs. Efficiency, initially below 5% relative to triplet decoding, later reached 20-50% through tRNA body modifications and anticodon stem-loop optimizations that reduced ribosomal slippage.

This strategy theoretically expands the codon repertoire to 256 possibilities (4^4), allowing simultaneous incorporation of multiple nsAAs at distinct sites without interference from the natural code. Applications include multiplexed protein labeling and the creation of logic-gated circuits, in which quadruplet codons control protein function in response to environmental cues. In therapeutic contexts, quadruplet decoding has enabled the production of antigens with nsAAs mimicking modifications.

Despite these advances, quadruplet decoding suffers from lower fidelity and efficiency compared to triplet suppression, primarily due to ribosomal slippage causing frameshift errors or competition with endogenous triplet tRNAs, often yielding only 1-25% of wild-type protein levels for multi-site incorporations. Cross-decoding of embedded triplets can further reduce orthogonality, limiting the number of usable quadruplet codons to around 20-30. Recent progress in eukaryotic systems, such as Caenorhabditis elegans, has achieved stable quadruplet frameshifting with 30-50% efficiency using hybrid PylRS/tRNA variants, enabling applications like photocaged nsAA incorporation for optogenetic control of neuronal proteins. These eukaryotic adaptations address prior prokaryotic biases and pave the way for broader in vivo expansion.
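
The codon arithmetic and the frameshift behaviour described above can be made concrete with a short Python sketch. It is a simplified model under explicit assumptions: `TRIPLET_TABLE` contains only the codons needed for the example, the quadruplet `AGGA` and the residue letter `Z` are hypothetical placeholders, and competition between triplet and quadruplet tRNAs is reduced to "the quadruplet tRNA always wins when present".

```python
from itertools import product

BASES = "ACGU"
triplets = ["".join(p) for p in product(BASES, repeat=3)]
quadruplets = ["".join(p) for p in product(BASES, repeat=4)]
print(len(triplets), len(quadruplets))  # 64 256

# Partial triplet table: only the codons that occur in the example mRNA.
TRIPLET_TABLE = {
    "AUG": "M", "AGG": "R", "AGC": "S", "UUA": "L", "GCU": "A", "UAA": "*",
}


def decode(mrna, quad_table):
    """Decode an mRNA, reading four bases wherever a quadruplet tRNA applies."""
    peptide = []
    i = 0
    while i + 3 <= len(mrna):
        quad = mrna[i:i + 4]
        if quad in quad_table:
            peptide.append(quad_table[quad])  # four-base read shifts the frame by +1
            i += 4
            continue
        residue = TRIPLET_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
        i += 3
    return "".join(peptide)


mrna = "AUGAGGAGCUUAA"               # AUG AGGA GCU UAA when AGGA is read as a quadruplet
print(decode(mrna, {"AGGA": "Z"}))   # MZA
print(decode(mrna, {}))              # MRSL
```

With the quadruplet tRNA present, the reading frame after `AGGA` lines up with the downstream `GCU UAA`; without it, the ribosome reads `AGG` as a triplet, stays in the original frame and produces an unrelated sequence that misses the intended stop codon.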

Orthogonal Translation Components

Engineered tRNA/Synthetase Pairs

The principle of orthogonality in engineered tRNA/aminoacyl-tRNA synthetase (aaRS) pairs relies on importing components from archaea or eukaryotes into bacterial hosts, where they do not interact with the endogenous translation machinery. A seminal example is the tyrosyl-tRNA synthetase (TyrRS) and its cognate tRNATyr from Methanocaldococcus jannaschii, which exhibit minimal cross-reactivity with Escherichia coli aaRSs and tRNAs due to differences in identity elements such as the anticodon and acceptor stem. This pair, first demonstrated in 2001, allows selective charging of non-canonical amino acids (ncAAs) without disrupting canonical protein synthesis.

Engineering these pairs for ncAA specificity typically involves positive and negative selection schemes. Libraries of aaRS mutants are generated through methods such as error-prone PCR, then screened for activation of the target ncAA (positive selection, often via suppression of an amber codon in a reporter gene like chloramphenicol acetyltransferase) while eliminating charging of the 20 standard amino acids (negative selection, using a toxic reporter like barnase). A key illustration is the evolution of an M. jannaschii TyrRS variant specific for p-acetyl-L-phenylalanine (pAcPhe), achieved through iterative selection yielding mutants with altered active-site residues that accommodate the ketone group. This approach has been widely adopted, enabling high-fidelity charging where optimized pairs achieve >90% efficiency for the target ncAA and <1% misacylation of canonical amino acids.

Prominent examples include variants derived from M. jannaschii TyrRS, which have been tailored for over 50 aromatic amino acids and phenylalanine derivatives. Another major class stems from the pyrrolysyl-tRNA synthetase (PylRS) and tRNAPyl pair from Methanosarcina species, naturally orthogonal in E. coli and engineered for pyrrolysine analogs like Nε-acetyl-L-lysine or cyclopropynylated lysine through mutations in the aaRS binding pocket.
Recent evolutions of PylRS by the Chin and Badran groups in 2024 have further expanded its versatility to charge diverse ncAAs, including backbone-modified ones. By the 2020s, dozens of such orthogonal pairs had been developed across various aaRS scaffolds, including leucyl- and tryptophanyl-tRNA synthetases, facilitating diverse ncAA incorporations with robust orthogonality and minimal off-target effects.

Orthogonal Sets in Model Organisms

Orthogonal tRNA/aminoacyl-tRNA synthetase (aaRS) pairs, originally derived from archaea or bacteria, have been successfully implemented in bacterial model organisms such as Escherichia coli to enable the incorporation of non-canonical amino acids (ncAAs). A prominent example is the pyrrolysyl-tRNA synthetase (PylRS) and its cognate tRNAPylCUA from Methanosarcina mazei, which was imported into E. coli to genetically encode pyrrolysine and later engineered variants for diverse ncAAs like Nε-acetyl-L-lysine. This system demonstrates high orthogonality in bacteria, minimizing cross-reactivity with endogenous translation machinery, and has been used to produce proteins with up to 70% suppression efficiency at amber codons in optimized strains. However, overexpression of these exogenous components can lead to cellular toxicity due to interference with native aaRS activities or resource competition, which is often mitigated through inducible expression systems, such as arabinose-inducible promoters, to control levels during protein synthesis. In eukaryotic model organisms, adapting orthogonal sets presents unique challenges, including the nuclear export of engineered tRNAs and competition from host synthetases, which can reduce charging efficiency and lead to misacylation. In Saccharomyces cerevisiae, early efforts in the 2000s faced issues with tRNA nuclear retention and poor cytoplasmic functionality, but by 2010, optimized PylRS/tRNAPylCUA pairs achieved functional expression through modifications enhancing tRNA export signals and synthetase stability, enabling ncAA incorporation at amber sites with moderate yields. 
Similarly, in mammalian cells like HEK293, initial implementations in the early 2000s using archaeal or bacterial orthogonal pairs encountered low orthogonality due to host editing mechanisms and tRNA instability, but engineered variants improved fidelity by the mid-2000s, allowing site-specific ncAA insertion in proteins expressed via transient transfection.

Comparisons across model organisms highlight differences in implementation efficiency, often attributed to variations in translation machinery and codon usage biases. In E. coli, orthogonal systems typically achieve 60-80% amber suppression efficiency, benefiting from streamlined bacterial ribosomes and minimal tRNA competition, whereas in HEK293 cells, efficiencies range from 20-50% due to eukaryotic regulatory layers and higher background suppression. Codon bias adjustments, such as optimizing tRNA anticodon loops to match host preferences or using codon-optimized genes, further enhance performance; for instance, in yeast, tailoring tRNA sequences to S. cerevisiae codon usage increased ncAA incorporation by 2-3 fold compared to unmodified prokaryotic tRNAs.

As of 2024, advances have focused on stable genomic integration of orthogonal sets to enable long-term, heritable expression without reliance on plasmids, reducing toxicity and improving consistency. In E. coli, chromosomally integrated PylRS/tRNAPylCUA systems created auxotrophic strains with altered genetic codes, achieving sustained ncAA-dependent growth and up to 90% incorporation fidelity. In mammalian models, transgenic integration of engineered aaRS/tRNA pairs into genomes has supported efficient ncAA labeling, paving the way for applications in multicellular organisms.

Specialized Ribosome Systems

o-Ribosome

The o-ribosome, or orthogonal ribosome, is an engineered ribosomal system designed to selectively translate orthogonal messenger RNAs (o-mRNAs) bearing altered Shine-Dalgarno (SD) sequences, thereby dedicating translation resources to the synthesis of proteins incorporating non-standard amino acids (nsAAs) without interfering with the host cell's proteome. This system achieves specificity through mutations in the anti-Shine-Dalgarno (ASD) sequence of the 16S ribosomal RNA (rRNA), which is the complement of the canonical SD sequence (AGGAGG) and located at positions 1535–1541 in Escherichia coli 16S rRNA. By altering the ASD (originally CCUCCU), the o-ribosome preferentially binds and initiates translation on o-mRNAs with complementary mutated SD sequences, such as those changed at one or more positions from the wild-type GGAGG, enabling up to 100-fold greater selectivity for o-mRNAs compared to natural mRNAs.

The foundational design was developed in 2005 by Rackham and Chin using a dual positive-negative selection strategy in E. coli, where libraries of mutated 16S rRNA variants (targeting all seven nucleotides in the ASD region) were screened against corresponding o-mRNA libraries to identify orthogonal pairs that confer resistance (positive selection via a reporter) while avoiding 5-fluorouracil toxicity (negative selection). This approach generated a network of at least four orthogonal ribosome-o-mRNA pairs that operate independently of the wild-type translation machinery, demonstrating the system's capacity for parallel, non-competing translation pathways.

In 2007, the Chin laboratory further evolved the o-ribosome (termed ribo-Xm in the study) specifically for enhanced genetic code expansion, introducing the key mutation A1191G in the 16S rRNA to improve amber (TAG) codon suppression efficiency when paired with orthogonal amber suppressor tRNAs. This evolution was achieved through directed selection for improved decoding of o-mRNAs containing amber codons, resulting in a ribosome that boosts nsAA incorporation yields: from approximately 20% to over 60% full-length protein for constructs with a single amber codon, and from less than 1% to over 20% for those with two amber codons. The A1191G mutation is thought to reduce interactions with release factor 1 at the amber codon, thereby favoring suppressor tRNA accommodation and enabling the production of proteins with multiple nsAAs.

Subsequent variants of the o-ribosome have incorporated additional 16S rRNA mutations to refine initiation efficiency and orthogonality, such as combined ASD alterations with helix 44 modifications, allowing for more robust translation of recoded genes in model organisms like E. coli. These improvements maintain the core selectivity of the original design while supporting applications in multi-site nsAA incorporation for proteins up to hundreds of residues long.
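
The SD/ASD pairing rule that underlies o-ribosome selectivity can be checked with a small Python sketch. It assumes strict Watson-Crick complementarity as the only criterion, which is a simplification of real initiation, and the orthogonal SD sequence `AGUGGG` is a made-up example rather than a sequence taken from the Rackham and Chin selections; the function names are likewise hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def pairs(sd, asd):
    """True if the ASD is the exact Watson-Crick partner of the SD sequence."""
    return reverse_complement(sd) == asd


WILD_SD = "AGGAGG"    # canonical SD sequence on natural mRNAs
WILD_ASD = "CCUCCU"   # wild-type ASD in 16S rRNA

o_sd = "AGUGGG"                   # hypothetical altered SD on an o-mRNA
o_asd = reverse_complement(o_sd)  # matching ASD for the o-ribosome

print(reverse_complement(WILD_SD))   # CCUCCU
print(o_asd)                         # CCCACU
print(pairs(o_sd, o_asd), pairs(o_sd, WILD_ASD), pairs(WILD_SD, o_asd))  # True False False
```

The last line captures the intended orthogonality: the o-ribosome pairs with the o-mRNA, while neither cross-combination with the wild-type partner satisfies the complementarity test.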

Ribo-X

The Ribo-X system is an advanced orthogonal ribosome engineered to boost the efficiency of non-standard amino acid (nsAA) incorporation in the expanded genetic code. Introduced in 2007 by the Chin laboratory (Wang et al.), it evolves from the foundational orthogonal ribosome by applying directed evolution to select variants that preferentially translate orthogonal mRNAs containing amber (TAG) stop codons, enabling precise nsAA insertion without competing with the host machinery. This development addresses limitations in earlier systems, where low suppression efficiency led to truncated proteins and reduced yields.

At its core, Ribo-X operates through mutations acquired during selection that diminish the ribosome's affinity for release factor 1 (RF1), the endogenous factor that terminates translation at amber codons. This alteration allows the orthogonal suppressor tRNA, charged with an nsAA by a paired orthogonal synthetase, to outcompete RF1 more effectively during decoding. Consequently, Ribo-X selectively engages orthogonal mRNAs (those with altered Shine-Dalgarno sequences that evade native ribosomes) while minimizing off-target translation. In vitro assays confirm Ribo-X's reduced RF1 binding, supporting its role in favoring productive suppression over termination.

Ribo-X offers substantial advantages in performance, elevating nsAA incorporation yields to over 60% for proteins with a single amber codon and over 20% for those with two, compared to approximately 20% and less than 1% with unmodified orthogonal ribosomes. These gains stem from lower background activity on native mRNAs and higher fidelity in suppression, achieving overall efficiencies exceeding 80% in optimized E. coli strains when paired with refined components. The system's orthogonality also mitigates toxicity from leaky expression, as it confines activity to designated mRNAs.

In applications, Ribo-X facilitates tunable expression of nsAA-containing proteins, particularly those prone to toxicity when truncated or harboring reactive groups like azides or alkynes. By integrating with inducible promoters for rRNA expression, it enables temporal control of translation, allowing researchers to synchronize nsAA incorporation with cellular growth phases. This has proven valuable for producing modified superfolder GFP or variants with photocrosslinkers, advancing studies in protein dynamics and therapeutic design.

Ribo-Q

Ribo-Q is an engineered orthogonal ribosome variant designed to facilitate the incorporation of non-standard amino acids (nsAAs) by efficiently decoding quadruplet codons. Developed through directed evolution, it features specific mutations in the 16S rRNA that alter the decoding center to favor binding of quadruplet-decoding tRNAs, thereby reducing frameshifting errors and enhancing fidelity for expanded genetic codes. Introduced in 2010, this system builds on prior orthogonal designs by targeting residues in the A-site to improve selectivity for modified tRNAs.

Key mutations in the 16S rRNA of Ribo-Q include A1196G, often combined with alterations at nearby positions such as C1195A or C1195T and A1197G, depending on the specific variant (e.g., Ribo-Q1 with A1196G and A1197G). These changes, identified from libraries derived from earlier ribosome variants, preferentially accommodate extended anticodons of quadruplet tRNAs while maintaining compatibility with standard triplet decoding when needed. The design ensures orthogonality, allowing Ribo-Q to translate orthogonal mRNAs containing quadruplet codons without significant interference from the host machinery.

In terms of functionality, Ribo-Q enhances frameshift suppression during translation of quadruplet codons, such as CGGG or CUA G, by stabilizing the interaction between the ribosome's A-site and quadruplet tRNAs. This selectivity arises from the reconfigured A-site geometry, which discriminates against standard triplet tRNAs and promotes accurate nsAA insertion at defined sites. When paired with orthogonal tRNA/synthetase systems, Ribo-Q enables the site-specific incorporation of multiple distinct nsAAs in a single protein.

Performance evaluations demonstrate that Ribo-Q achieves incorporation yields of 50-70% for nsAAs at quadruplet codons, a substantial improvement over unmodified orthogonal ribosomes, which often suffer from lower efficiency due to frameshifting. It is fully compatible with orthogonal synthetases, such as pyrrolysyl-tRNA synthetase variants, allowing seamless integration into existing expansion workflows. These yields were measured in E. coli expression systems using reporter proteins like superfolder GFP.

Ribo-Q evolved from the o-Ribosome, an earlier orthogonal system, through iterative selection to specifically support quadruplet codon decoding and broader expansion of the genetic code. This derivation involved introducing the A-site mutations into o-Ribosome scaffolds to optimize for four-base codon recognition, resulting in a more versatile platform for synthetic biology.

Ribosome Stapling

Ribosome stapling is an engineering approach that generates orthogonal ribosomes by covalently linking the small (30S) and large (50S) subunits through an RNA staple, enabling selective translation of orthogonal mRNAs for expanded genetic code applications in Escherichia coli. Introduced in 2015, the method addresses limitations in subunit association for orthogonal systems by inserting a circularly permuted 23S rRNA into the 16S rRNA, joining helix 44 of the 16S rRNA to helix 101 of the 23S rRNA through a flexible RNA hinge derived from the J5/J5a region of the Tetrahymena group I intron. The resulting ~4500-nucleotide fused rRNA assembles into a functional ribosome that preferentially recognizes orthogonal Shine-Dalgarno (oSD) sequences on target mRNAs, isolating its translation from that of endogenous ribosomes and facilitating the incorporation of non-standard amino acids (nsAAs) through amber suppression.

To improve control over assembly and purity, later implementations add an MS2 stem-loop tag to the stapled orthogonal rRNA (o-rRNA), allowing affinity purification through binding to the MS2 coat protein. This step enriches stapled ribosomes relative to native ones, minimizing cross-assembly and supporting high-fidelity orthogonal translation. The purified stapled ribosomes are then used to translate o-mRNAs containing amber codons, where orthogonal tRNA/synthetase pairs deliver nsAAs, such as p-acetylphenylalanine, with reduced competition from release factor 1 (RF1). Because activity is confined to engineered mRNAs, overall system orthogonality improves.

The primary benefit of ribosome stapling is that it enforces subunit pairing and insulates the ribosome from cellular competitors, which boosts nsAA incorporation efficiency while achieving translation activities comparable to non-stapled orthogonal ribosomes. For instance, the optimized O-d2d8 variant initially supports suppression and cellular growth at 30–40% of wild-type rates, with subsequent evolution yielding variants that reach 81% growth efficiency while incorporating nsAAs at sites previously limited by RF1 competition. This enhancement is particularly valuable for challenging sequences, such as polyproline stretches, where stapled ribosomes enable elongation without auxiliary factors like EF-P, demonstrating up to 10-fold improvements in incorporation of non-canonical building blocks.

Variants of the technique differ in RNA staple linker length to fine-tune subunit geometry and activity: shorter staples like O-d0d0 promote tight association but lower flexibility, while longer ones like O-d2d8 balance assembly efficiency and translational fidelity. Evolved iterations, such as O-d2d8(5), further expand capabilities by incorporating β-amino acids or other non-proteinogenic monomers via amber suppression. Photocaged variants for spatiotemporal control have been explored in related systems, but their direct application to stapled ribosomes remains under development.

Despite these advantages, ribosome stapling faces limitations, including reduced initial rRNA expression (~25% of wild-type levels) due to folding challenges in the extended transcript, and potential steric hindrance during subunit docking. In crowded cellular environments, the bulky fused structure may exacerbate assembly inefficiencies, necessitating strain optimizations like RF1 depletion to maximize nsAA yields. These constraints highlight the need for further engineering to approach full wild-type performance.

Engineered Peptidyl Transferase Center

The peptidyl transferase center (PTC) of the bacterial ribosome, formed primarily by the 23S rRNA in the large subunit, catalyzes the formation of peptide bonds between amino acids during protein synthesis. Engineering the PTC through targeted mutations in this rRNA region enables the incorporation of non-standard amino acids (nsAAs), such as D-amino acids, by alleviating steric clashes and altering the geometry of the active site to accommodate atypical substrates. These modifications expand the substrate range beyond the standard L-amino acids, facilitating the synthesis of proteins with novel properties, though they often require orthogonal systems to minimize interference with native translation.

Initial efforts to engineer the PTC emerged in the early 2000s, focusing on mutations to enhance compatibility with D-amino acids. Researchers overexpressed modified 23S rRNA from multicopy plasmids in E. coli, introducing alterations in the PTC loop (nucleotides 2447–2450) and helix 89 (nucleotides 2457–2462), such as substitutions converting 2447GAUA2450 to 2447UUAC2450 or 2447GAUAA2450 paired with changes in helix 89 like U2460G. These mutant ribosomes incorporated D-methionine and D-phenylalanine into model proteins with efficiencies of 11–23% relative to L-amino acid counterparts, while retaining up to 45% suppression activity compared to wild-type levels. The mechanism involves local conformational changes in the PTC that reduce discrimination against the reversed stereochemistry of D-aminoacyl-tRNAs, allowing peptide bond formation without major disruption to the catalytic core. Subsequent refinements, such as additional mutations in the same regions, improved D-amino acid tolerance in cell-free systems, yielding functional proteins with up to 50% activity retention.

Further engineering has targeted PTC variants for other non-canonical amino acids (ncAAs), including β-amino acids, using selection methods based on β-puromycin. For example, selected ribosomes enable incorporation of certain β-amino acids, such as methyl-β-alanine isomers, at relative efficiencies up to ~20% in reporter proteins. Ongoing computational design and selections continue to refine PTC mutations (e.g., in helices H73 and H75) to improve general performance and support diverse nsAAs in orthogonal systems, though efficiencies for sterically hindered residues like α-aminoisobutyric acid (Aib) remain low (typically 3–15%). These designs tolerate non-standard backbone chemistries, such as altered bond angles, but carry risks of global translation impairment, including reduced cell fitness and off-target effects, so mutant rRNAs must be expressed selectively to limit dominant-negative effects on endogenous ribosomes. Engineered PTCs continue to support ncAA incorporation in contemporary genetic code expansion workflows, as reviewed in 2024.

Applications

Biomedical and Therapeutic Uses

The expanded genetic code enables the site-specific incorporation of non-standard amino acids (nsAAs) into therapeutic proteins, enhancing their pharmacological properties for biomedical applications. By introducing chemically reactive nsAAs, such as p-acetylphenylalanine, researchers can perform precise modifications like PEGylation, which extends protein half-life and reduces product heterogeneity compared to random conjugation methods. For instance, site-specific PEGylation of cytokines has been used to improve their pharmacological properties; similarly, PEGylated interleukin-2 (IL-2) engineered via genetic code expansion selectively activates regulatory T cells, showing promise in preclinical models without broad immune activation.

Photocaging groups incorporated as nsAAs allow spatiotemporal control of protein function, facilitating light-activated therapeutics. Photocaged nsAAs, such as nitrobenzyl-lysine, block activity until UV or visible light uncages them, enabling on-demand drug release or activation in targeted tissues. In therapeutic contexts, photocaged nanobodies have been generated using genetic code expansion to inhibit target proteins upon illumination, offering potential for precise modulation in cancer or neurological treatments. For example, placing photocaged amino acids at key residues permits optical control of protein signaling pathways, advancing light-responsive biologics.

Clinical applications of expanded code-modified proteins include antibody-drug conjugates (ADCs) and fluorinated therapeutics. ADCs with site-specifically incorporated nsAAs, such as p-acetylphenylalanine for payload attachment, have entered clinical trials, improving homogeneity over traditional cysteine-based conjugates; ARX788, an anti-HER2 ADC from Ambrx, has progressed to Phase III trials (as of 2025) for HER2-positive breast cancer, with improved outcomes reported in pivotal studies. Fluorinated nsAAs, like 3-fluoro-tyrosine, enhance protein stability and metabolic resistance; developments in 2023 built on prior preclinical successes by using them in engineered therapeutics, though no FDA approvals for expanded code-derived fluorinated proteins were reported that year. As of 2025, no therapeutics produced via an expanded genetic code have been FDA-approved, though several are in late-stage clinical trials.

In gene therapy, the expanded code facilitates site-specific nsAA incorporation into viral capsids, enhancing targeting and reducing off-target effects. Adeno-associated virus (AAV) vectors modified with azide-bearing nsAAs in capsid proteins allow conjugation of targeting ligands, improving delivery to specific cell types like neurons or hepatocytes. This approach has been applied to AAV2 variants, where nsAA insertion during production enables precise functionalization without disrupting assembly, advancing therapies for genetic disorders.

A key challenge in translating nsAA-containing therapeutics to humans is potential immunogenicity, as noncanonical structures may elicit anti-drug antibodies. Studies indicate that certain nsAAs, particularly those with bulky or charged side chains, can alter antigen presentation to T cells and potentially trigger immune responses, though conservative nsAAs that mimic natural ones show minimal reactivity. Strategies like selecting low-immunogenicity nsAAs and preclinical screening in humanized models are essential to mitigate this risk.

Industrial and Synthetic Biology Uses

In industrial and synthetic biology, the expanded genetic code enables the incorporation of non-standard amino acids (nsAAs) into enzymes to enhance their stability and performance as biocatalysts. For instance, nsAAs such as fluorinated or cyclopropyl derivatives can rigidify protein structures, improving thermostability for processes that run at high temperatures. A notable example involves engineering a stereoselective biocatalyst with genetically encoded staples formed via nsAAs, which increased its melting temperature by up to 30°C while maintaining catalytic efficiency, allowing sustained activity under industrial conditions. Similarly, keratinase variants incorporating non-canonical amino acids (ncAAs) like p-bromophenylalanine (pBpF) exhibited an 8.2-fold longer half-life at 60°C, facilitating robust degradation of keratin waste.

The use of nsAAs has also advanced the design of protein-based materials. Proteins engineered with azide- or alkyne-bearing nsAAs, such as p-azidophenylalanine, enable site-specific crosslinking via copper-free click reactions to form stable hydrogels with tunable mechanical properties for use in scaffolds. These approaches have also contributed to printable biomaterials, where self-assembling peptides modified with nsAAs improve printability, yielding constructs with shear-thinning behavior suitable for extrusion-based fabrication of complex structures.

Fluorescent nsAAs integrated via genetic code expansion have been used to build biosensors for real-time industrial monitoring. For example, fluorogenic nsAAs inserted into protein scaffolds enable sensitive detection of metabolites in fermentation broths. This allows precise control of bioprocesses, optimizing yields in microbial production of chemicals or biofuels.

Scale-up efforts leverage engineered Escherichia coli strains to achieve high-density fermentation for nsAA-containing proteins, with reported yields reaching hundreds of mg/L in optimized cultures. These advances support industrial viability.

Future Directions

Recoded Synthetic Genomes

Recoded synthetic genomes expand the genetic code by systematically redesigning the DNA sequence of an entire organism through synonymous codon compression. This method replaces selected codons with synonymous alternatives across the genome, eliminating them from native genes and creating "blank" codons available for encoding non-standard amino acids (nsAAs) without interfering with native protein synthesis. Unlike targeted suppression techniques, this global recoding ensures that freed codons are absent from essential genes, enabling orthogonal tRNA/aminoacyl-tRNA synthetase pairs to incorporate nsAAs with minimal off-target effects.

A seminal milestone in bacterial recoding was the development of the genomically recoded strain C321.ΔA, in which all 321 instances of the TAG (amber) stop codon were replaced with the synonymous TAA stop codon, and the prfA gene encoding release factor 1 (RF1) was deleted. This freed the TAG codon for reassignment to nsAAs, allowing efficient amber suppression without RF1-mediated termination, and the strain also showed enhanced viral resistance. Building on this, later compression efforts have further reduced the codon repertoire; for instance, a 2025 study engineered an E. coli variant with a 57-codon genetic code by replacing four serine codons, two alanine codons, and the amber stop codon with synonymous alternatives, freeing seven codons for potential nsAA encoding while maintaining viability. In eukaryotic systems, the Synthetic Yeast Genome Project (Sc2.0 and the proposed Sc3.0) has advanced recoding by synonymously refactoring open reading frames, with designs that free codons through systematic replacement of synonymous sets, paving the way for multi-nsAA incorporation in complex pathways. These efforts point toward freeing many more codons for nsAAs in highly compressed genomes (in principle up to 43, if each of the 20 canonical amino acids and the stop signal kept a single codon), vastly expanding proteomic diversity.

The primary benefits of recoded genomes are that nsAAs can be encoded without suppression artifacts, such as ribosomal stalling or premature termination in native transcripts, and that biocontainment improves through genetic isolation from natural organisms. Because freed codons are never encountered in the host genome, this approach avoids the inefficiency of partial suppression systems and facilitates high-fidelity multi-site nsAA incorporation for applications like novel therapeutics.

However, significant technical hurdles persist. Altered codon bias can reduce cell fitness by disrupting translation speed and mRNA stability, often necessitating extensive adaptive laboratory evolution to restore growth rates comparable to wild-type strains. Synthesizing and assembling multi-megabase recoded chromosomes without introducing deleterious mutations is itself an immense engineering challenge. Ethical concerns also arise regarding the environmental release of such organisms, given their potential to outcompete natural variants or evade standard antibiotics.
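The core operation behind recoding, swapping a target codon for a synonym wherever it occurs in frame, can be sketched in a few lines. This is a minimal illustration, not any published pipeline: the function name `recode_orf` and the example sequence are hypothetical, and real recoding must also handle overlapping genes and regulatory elements, which this sketch ignores.

```python
# Minimal sketch of in-frame synonymous recoding (hypothetical helper, not a published tool).
SWAPS = {"TAG": "TAA"}  # amber stop -> ochre stop, as in the C321.ΔA design described above


def recode_orf(orf: str, swaps: dict = SWAPS) -> str:
    """Replace target codons with synonyms, considering only in-frame codons."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(swaps.get(codon, codon) for codon in codons)


# The out-of-frame 'TAG' inside 'CTAGCA' is left alone; only the in-frame stop changes.
print(recode_orf("ATGCTAGCATAG"))
```

Splitting the sequence into codons before substitution is the key design choice: a plain string replacement would also rewrite out-of-frame occurrences and change the encoded protein.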

Expanded Genetic Alphabet

The expanded genetic alphabet refers to the incorporation of synthetic nucleobases into DNA and RNA, forming unnatural base pairs (UBPs) that function orthogonally to the natural A-T and G-C pairs, thereby increasing the informational density of the genetic code. This approach contrasts with codon reassignment by directly augmenting the nucleotide repertoire, allowing for a larger number of unique codons. Pioneering work by Floyd Romesberg's laboratory at The Scripps Research Institute developed the d5SICS-dNaM UBP, in which two hydrophobic nucleotide analogs pair selectively through hydrophobic and shape-complementary interactions rather than hydrogen bonding. This pair was first shown to replicate efficiently in vitro using DNA polymerases, including Taq-derived enzymes, with fidelities approaching those of natural bases. In 2014, Romesberg and colleagues engineered an Escherichia coli strain harboring a plasmid with the d5SICS-dNaM UBP, creating the first semi-synthetic organism (SSO) with an expanded six-letter genetic alphabet in vivo, in which the unnatural nucleotides were retained through multiple rounds of cell division with little loss.

Integrating UBPs into functional genetic systems involves transcribing them into RNA containing unnatural bases, which can then form codons recognized by orthogonal tRNAs charged with non-standard amino acids (nsAAs). In a 2017 advance, the same group reported an SSO in which a codon containing an unnatural nucleotide was transcribed and decoded by an orthogonal tRNA-synthetase pair, enabling site-specific incorporation of nsAAs into a reporter protein expressed in E. coli. This process required orthogonal tRNAs modified to carry the unnatural anticodon, paired with an engineered synthetase specific for the nsAA, ensuring minimal crosstalk with the natural code. Such systems expand the proteome's chemical diversity, allowing proteins with novel properties such as new chemical reactivity.

Work toward an eight-letter alphabet is exemplified by the hachimoji system, reported in 2019 by Steven Benner and colleagues, which adds two hydrogen-bonded pairs, P-Z and S-B, to the four natural bases (Z, for example, is 6-amino-5-nitropyridin-2-one). This system supports stable duplex formation and transcription in vitro, with a theoretical capacity of 512 triplet codons (8³), far exceeding the natural 64. Full implementation of an eight-letter code in living cells remains under development, but partial successes include polymerase-mediated transcription of six- and eight-letter templates with high reported selectivity (greater than 99.9% for unnatural base incorporation). With a six-letter alphabet, the theoretical expansion yields 216 total codons (6³), providing 152 additional codons beyond the natural set for encoding new nsAAs or functions.

Despite these advances, limitations persist, particularly in replication fidelity, which for UBPs like d5SICS-dNaM averages around 99% in vivo, slightly lower than the near-100% of natural pairs. Without selective pressure, this leads to gradual erosion of unnatural nucleotides over generations. Toxicity from unnatural nucleoside triphosphates and the need for evolved polymerases or transporters to maintain cellular uptake further challenge scalability. Ongoing efforts aim to improve these metrics through computational design and directed evolution to enable robust, heritable expansion.
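The codon arithmetic and the erosion of an unnatural base pair can be checked with a short calculation. This is a sketch under stated assumptions: the function names are illustrative, the retention model treats each replication as an independent event with a fixed per-replication fidelity, and it ignores selection, so it is only a rough guide.

```python
NATURAL_CODONS = 4 ** 3  # 64 triplet codons with the four natural letters


def codon_count(letters: int, codon_length: int = 3) -> int:
    """Number of distinct codons for an alphabet of the given size."""
    return letters ** codon_length


def ubp_retention(fidelity: float, replications: int) -> float:
    """Fraction of copies still carrying the UBP, assuming independent replications."""
    return fidelity ** replications


for letters in (4, 6, 8):
    total = codon_count(letters)
    print(letters, "letters:", total, "codons,", total - NATURAL_CODONS, "beyond the natural 64")

# With 99% fidelity per replication, retention after 24 replications is roughly 79%.
print(round(ubp_retention(0.99, 24), 3))
```

The totals are upper bounds on sequence space; how many of the extra codons can actually be transcribed and decoded is an experimental question that this arithmetic does not address.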

Selective Pressure Incorporation

Selective Pressure Incorporation (SPI) is a method for residue-specific incorporation of non-standard amino acids (nsAAs) into recombinant proteins expressed in vivo, relying on the substrate tolerance of the endogenous translation machinery in auxotrophic host cells. The protocol involves culturing auxotrophic bacterial strains, such as Escherichia coli mutants deficient in biosynthesis of a specific canonical amino acid (e.g., methionine or proline auxotrophs), in chemically defined minimal medium that lacks that amino acid but is supplemented with a structurally similar nsAA analog. As the cells deplete any residual natural amino acid and enter starvation, the endogenous aminoacyl-tRNA synthetases charge the cognate tRNAs with the analog, so it is incorporated at all codon positions corresponding to the targeted natural amino acid. The resulting proteins are functional if the analog maintains structural compatibility. Because protein synthesis can continue only by using the supplied nsAA, the method typically yields highly substituted recombinant proteins after purification.

The technique traces its origins to the 1950s, with early demonstrations using methionine-auxotrophic E. coli to replace methionine with selenomethionine in active enzymes such as beta-galactosidase. Over time, SPI has been extended to other nsAA analogs, including fluorinated variants, with reported incorporation levels reaching up to 100% for simple, single-domain proteins under optimized conditions. As of 2025, advances in SPI include residue-specific ncAA exchange in auxotrophic hosts for global protein engineering and antimicrobial design.

A primary advantage of SPI is that it requires no genetic engineering of tRNAs, synthetases, or the host ribosome, allowing straightforward adaptation of standard expression systems for nsAA production. This makes it particularly suitable for global, residue-specific incorporation across the entire proteome or in targeted recombinant proteins, facilitating studies of protein stability or peptide design without complex orthogonal components.

Despite these benefits, SPI is limited to nsAAs that host synthetases accept as analogs, which restricts its scope to chemically similar variants rather than highly diverse or unnatural structures. Furthermore, proteome-wide replacement can cause off-target effects, such as misfolding of essential cellular proteins or reduced host viability, so auxotrophs and growth conditions must be chosen carefully to minimize toxicity.
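Because SPI replaces every occurrence of the target residue, per-site incorporation efficiency compounds across sites. The sketch below estimates the fraction of fully substituted molecules under the simplifying assumption that each site is substituted independently with the same probability; the function names and the example sequence are illustrative only.

```python
from math import comb


def count_sites(protein: str, residue: str = "M") -> int:
    """Number of positions that SPI would target for the given canonical residue."""
    return protein.count(residue)


def fully_substituted(p_site: float, n_sites: int) -> float:
    """Fraction of molecules with the analog at every target site (independent sites)."""
    return p_site ** n_sites


def species_distribution(p_site: float, n_sites: int) -> list:
    """Binomial distribution over the number of substituted sites, from 0 to n_sites."""
    return [comb(n_sites, k) * p_site ** k * (1 - p_site) ** (n_sites - k)
            for k in range(n_sites + 1)]


n = count_sites("MKTAYIAKQRMMLEM")  # hypothetical sequence with four methionines
print(n, round(fully_substituted(0.95, n), 3))
```

Under this model even 95% per-site incorporation leaves a mixed population once a protein has several target residues, which is one reason near-complete substitution is reported mainly for small, simple proteins.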

In Vitro Protein Synthesis

In vitro protein synthesis enables the translation of expanded genetic codes using cell-free systems, which provide a controlled environment for incorporating non-standard amino acids (nsAAs) without the constraints of living cells. These platforms typically rely on supplemented extracts or reconstituted components to support orthogonal translation machinery, allowing synthesis and modification of proteins with nsAAs at specified codons. By decoupling synthesis from cellular metabolism, such systems permit direct addition of orthogonal tRNA/synthetase pairs and nsAAs, often targeting amber (TAG) stop codons read by engineered suppressor tRNAs.

Key systems include the PURExpress platform, a commercially available reconstituted E. coli-based system built from purified translation factors, which can be augmented with orthogonal tRNA/synthetase pairs for nsAA incorporation. For instance, the pyrrolysyl-tRNA synthetase (PylRS)/tRNA(Pyl)CUA pair enables site-specific integration of nsAAs like Nε-acetyl-lysine into target proteins. Similarly, crude E. coli lysates derived from release factor 1 (RF1)-deficient strains, such as the genomically recoded C321.ΔA, support high-fidelity nsAA incorporation when supplemented with orthogonal components and nsAAs, achieving yields of 0.9–1.7 mg/mL for superfolder GFP variants bearing p-azido-L-phenylalanine or similar non-canonical amino acids (ncAAs). These setups avoid the cellular uptake and toxicity problems associated with some nsAAs, enabling rapid production of protein variants.

Advantages of these cell-free approaches include faster iteration cycles for genetic code expansion, as reactions can be optimized in hours rather than days, and the absence of cellular barriers, which allows direct testing of potentially cytotoxic nsAAs or synthetases. Integration with microfluidics has further increased throughput, with droplet-based or continuous-flow formats yielding nsAA-modified proteins at scales of several mg/L per reaction while minimizing reagent consumption. For example, continuous-exchange cell-free (CECF) formats, which use dialysis to replenish energy sources and remove byproducts, have enabled the production of proteins with multiple nsAAs, such as dual incorporation of acetyl-lysine and thioacetyl-lysine into histone tails, supporting studies of post-translational mimicry.

Despite these advances, limitations persist, including short active windows (typically 4–20 hours) due to energy depletion and factor instability, which restrict overall productivity compared to cell-based systems. Scalability also poses challenges: batch reactions remain dominant for preparative scales, while microfluidic setups, though efficient for screening, are difficult to scale to gram-level production without specialized equipment. Ongoing refinements, such as energy-regenerating modules, aim to address these issues. As of 2025, cell-free systems have advanced through automated biofoundries for scalable ncAA incorporation and through integration with AI tools.

Chemical Protein Synthesis

Chemical protein synthesis provides a non-biological route to incorporating non-standard amino acids (nsAAs) into proteins by assembling synthetic fragments, enabling precise control over sequence and modifications independent of ribosomal machinery. This method is particularly valuable for the expanded genetic code, as it allows nsAAs (such as fluorophores, photocrosslinkers, or post-translationally modified residues) to be placed directly at defined positions during fragment synthesis. Unlike biological systems, chemical synthesis is not bound by the limits of cellular translation, offering flexibility for studying protein function with unnatural building blocks.

The cornerstone technique is native chemical ligation (NCL), developed in the 1990s, which chemoselectively joins unprotected peptide segments in aqueous solution to form a native peptide bond. In NCL, a C-terminal thioester on one fragment reacts with an N-terminal cysteine on the second fragment, proceeding via a transient thioester-linked intermediate that rearranges to yield the ligated product without racemization or side reactions. This process is highly efficient for peptides of up to approximately 50 residues per fragment and has been demonstrated with all 20 proteinogenic amino acids, readily extending to nsAAs incorporated during solid-phase peptide synthesis. For instance, nsAAs like β-turn mimics or metal-binding residues can be placed at ligation sites or elsewhere, facilitating the creation of modified proteins for biophysical studies.

A key variant, expressed protein ligation (EPL), combines chemical and recombinant methods to extend NCL to larger proteins by generating recombinant thioester-tagged polypeptides via intein-mediated cleavage. Developed in 1998, EPL allows ligation of a synthetic peptide containing nsAAs to the C-terminus of an expressed protein, enabling semisynthesis of full-length proteins with site-specific modifications. This approach has been instrumental for introducing nsAAs in contexts requiring post-translational additions, such as ubiquitination, where NCL joins ubiquitin thioesters to cysteines on the target protein, yielding defined ubiquitin conjugates for functional assays.

Advantages of these techniques include atomic-level precision in nsAA placement and the ability to perform ligations under mild conditions that preserve protein folding, in contrast to global incorporation methods. They enable the synthesis of proteins with homogeneous modifications, such as singly ubiquitinated histones, which are challenging to produce biologically. In the 2020s, advances in automated flow chemistry have improved scalability, allowing rapid synthesis and sequential NCL of peptide fragments to assemble proteins exceeding 100 amino acids, such as antibody fragments and enzymes, in hours rather than days. These developments have broadened applications in probing nsAA effects on protein stability and interactions. As of 2025, recent NCL enhancements include traceless multi-segment ligations for complex protein assemblies and improved strategies for N-terminal modifications with ncAAs.
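The two NCL constraints described above, a cysteine at the N-terminus of each downstream fragment and a practical length limit of about 50 residues per fragment, suggest a simple planning step before synthesis. The sketch below is a hypothetical helper, not an established tool: it greedily picks the farthest cysteine junction that keeps each fragment within the limit, and it ignores practical factors such as solubility and the identity of the residue preceding the junction.

```python
def plan_ncl_fragments(seq: str, max_len: int = 50) -> list:
    """Split seq at cysteines so that every fragment has at most max_len residues.

    Each fragment after the first starts with 'C', matching the N-terminal
    cysteine that NCL needs. Raises ValueError if no valid junction exists.
    """
    fragments = []
    start = 0
    while len(seq) - start > max_len:
        # Farthest cysteine that keeps the current fragment within max_len.
        cut = seq.rfind("C", start + 1, start + max_len + 1)
        if cut == -1:
            raise ValueError(f"no cysteine junction within {max_len} residues of position {start}")
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


# Hypothetical 92-residue sequence with cysteines at positions 30 and 61 (0-based).
example = "A" * 30 + "C" + "A" * 30 + "C" + "A" * 30
print([len(fragment) for fragment in plan_ncl_fragments(example)])
```

Choosing the farthest reachable junction at each step never rules out a later junction that a nearer choice would have allowed, so the greedy pass finds a valid split whenever one exists.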
