Biomolecular structure
from Wikipedia
This diagram (which is interactive) of protein structure uses PCNA as an example. (PDB: 1AXC​)
Interactive image of nucleic acid structure (primary, secondary, tertiary, and quaternary) using DNA helices and examples from the VS ribozyme and telomerase and nucleosome. (PDB: ADNA, 1BNA, 4OCB, 4R4V, 1YMO, 1EQZ​)

Biomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary. The scaffold for this multiscale organization of the molecule arises at the secondary level, where the fundamental structural elements are the molecule's various hydrogen bonds. This leads to several recognizable domains of protein structure and nucleic acid structure, including such secondary-structure features as alpha helices and beta sheets for proteins, and hairpin loops, bulges, and internal loops for nucleic acids. The terms primary, secondary, tertiary, and quaternary structure were introduced by Kaj Ulrik Linderstrøm-Lang in his 1951 Lane Medical Lectures at Stanford University.

Primary structure

The primary structure of a biopolymer is the exact specification of its atomic composition and the chemical bonds connecting those atoms (including stereochemistry). For a typical unbranched, un-crosslinked biopolymer (such as a molecule of a typical intracellular protein, or of DNA or RNA), the primary structure is equivalent to specifying the sequence of its monomeric subunits, such as amino acids or nucleotides.

The primary structure of a protein is reported starting from the amino N-terminus to the carboxyl C-terminus, while the primary structure of a DNA or RNA molecule is known as the nucleic acid sequence and is reported from the 5' end to the 3' end. The nucleic acid sequence refers to the exact sequence of nucleotides that comprise the whole molecule. Often, the primary structure encodes sequence motifs that are of functional importance. Some examples of such motifs are: the C/D[1] and H/ACA boxes[2] of snoRNAs, the LSm binding site found in spliceosomal RNAs such as U1, U2, U4, U5, U6, U12 and U3, the Shine-Dalgarno sequence,[3] the Kozak consensus sequence[4] and the RNA polymerase III terminator.[5]

Secondary structure

Secondary (inset) and tertiary structure of tRNA demonstrating coaxial stacking (PDB: 6TNA)

Secondary structure is the pattern of hydrogen bonds in a biopolymer. These hydrogen bonds determine the general three-dimensional form of local segments of the biopolymer, but they do not describe the global arrangement of specific atomic positions in three-dimensional space, which is considered to be tertiary structure. Secondary structure is formally defined by the hydrogen bonds of the biopolymer, as observed in an atomic-resolution structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amide (N–H) and carbonyl (C=O) groups (sidechain–mainchain and sidechain–sidechain hydrogen bonds are irrelevant), where the DSSP definition of a hydrogen bond is used.

The secondary structure of a nucleic acid is defined by the hydrogen bonding between the nitrogenous bases.

For proteins, however, the hydrogen bonding is correlated with other structural features, which has given rise to less formal definitions of secondary structure. For example, helices can adopt backbone dihedral angles in some regions of the Ramachandran plot; thus, a segment of residues with such dihedral angles is often called a helix, regardless of whether it has the correct hydrogen bonds. Many other less formal definitions have been proposed, often applying concepts from the differential geometry of curves, such as curvature and torsion. Structural biologists solving a new atomic-resolution structure will sometimes assign its secondary structure by eye and record their assignments in the corresponding Protein Data Bank (PDB) file.

The secondary structure of a nucleic acid molecule refers to the base pairing interactions within one molecule or set of interacting molecules. The secondary structure of biological RNAs can often be uniquely decomposed into stems and loops. Often, these elements or combinations of them can be further classified, e.g. tetraloops, pseudoknots and stem loops. There are many secondary structure elements of functional importance to biological RNA. Famous examples include the Rho-independent terminator stem loops and the transfer RNA (tRNA) cloverleaf. Determining the secondary structure of RNA molecules is an active area of research, and approaches include both experimental and computational methods (see also the List of RNA structure prediction software).
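
The decomposition into stems and loops can be illustrated with a small sketch. It assumes the structure is written in dot-bracket notation (matching parentheses for paired bases, dots for unpaired ones), a common convention that the text above does not itself mention. The function names and the example string are hypothetical, and pseudoknots are not handled.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base pairs (0-based) from dot-bracket text."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_loops(pairs):
    """Return the pairs that close a hairpin loop, i.e. enclose no other pair."""
    return [
        (i, j)
        for i, j in pairs
        if not any(i < k and l < j for k, l in pairs)
    ]


# Hypothetical example: two stem-loops separated by two unpaired bases.
example = "((((....))))..((...))"
example_pairs = parse_dot_bracket(example)
example_hairpins = hairpin_loops(example_pairs)
```

Each closing pair returned by `hairpin_loops` marks the end of one stem, so the number of entries gives the number of stem-loops in a pseudoknot-free structure.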

Tertiary structure

The tertiary structure of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates.[6] Proteins and nucleic acids fold into complex three-dimensional structures which result in the molecules' functions. While such structures are diverse and complex, they are often composed of recurring, recognizable tertiary structure motifs and domains that serve as molecular building blocks. Tertiary structure is considered to be largely determined by the biomolecule's primary structure (its sequence of amino acids or nucleotides).

Quaternary structure

Protein quaternary structure[a] refers to the number and arrangement of multiple protein molecules in a multi-subunit complex.

For nucleic acids, the term is less common, but can refer to the higher-level organization of DNA in chromatin,[7] including its interactions with histones, or to the interactions between separate RNA units in the ribosome[8][9] or spliceosome.

Viruses, in general, can be regarded as molecular machines. Bacteriophage T4 is a particularly well studied virus and its protein quaternary structure is relatively well defined.[10] A study by Floor (1970)[11] showed that, during the in vivo construction of the virus by specific morphogenetic proteins, these proteins need to be produced in balanced proportions for proper assembly of the virus to occur. Insufficiency (due to mutation) in the production of one particular morphogenetic protein (e.g. a critical tail fiber protein), can lead to the production of progeny viruses almost all of which have too few of the particular protein component to properly function, i.e. to infect host cells.[11] However, a second mutation that reduces another morphogenetic component (e.g. in the base plate or head of the phage) could in some cases restore a balance such that a higher proportion of the virus particles produced are able to function.[11] Thus it was found that a mutation that reduces expression of one gene, whose product is employed in morphogenesis, may be partially suppressed by a mutation that reduces expression of a second morphogenetic gene resulting in a more balanced production of the virus gene products. The concept that, in vivo, a balanced availability of components is necessary for proper molecular morphogenesis may have general applicability for understanding the assembly of protein molecular machines.

Structure determination

Structure probing is the process by which biochemical techniques are used to determine biomolecular structure.[12] This analysis can be used to define patterns from which the molecular structure can be inferred, to support experimental analysis of molecular structure and function, and to inform the development of smaller molecules for further biological research.[13] Structure probing analysis can be done through many different methods, which include chemical probing, hydroxyl radical probing, nucleotide analog interference mapping (NAIM), and in-line probing.[12]

Protein and nucleic acid structures can be determined using nuclear magnetic resonance spectroscopy (NMR), X-ray crystallography, or single-particle cryo-electron microscopy (cryoEM). The first published reports of X-ray diffraction patterns for DNA (by Rosalind Franklin and Raymond Gosling in 1953), covering A-DNA and also B-DNA, used analyses based on Patterson function transforms that provided only a limited amount of structural information for oriented fibers of DNA isolated from calf thymus.[14][15] An alternate analysis was then proposed by Wilkins et al. in 1953 for B-DNA X-ray diffraction and scattering patterns of hydrated, bacterial-oriented DNA fibers and trout sperm heads in terms of squares of Bessel functions.[16] Although the B-DNA form is most common under the conditions found in cells,[17] it is not a well-defined conformation but a family or fuzzy set of DNA conformations that occur at the high hydration levels present in a wide variety of living cells.[18] The corresponding X-ray diffraction and scattering patterns are characteristic of molecular paracrystals with a significant degree of disorder (over 20%),[19][20] and the structure is not tractable using only the standard analysis.

In contrast, the standard analysis, involving only Fourier transforms of Bessel functions[21] and DNA molecular models, is still routinely used to analyze A-DNA and Z-DNA X-ray diffraction patterns.[22]

Structure prediction

Saccharomyces cerevisiae tRNA-Phe structure space: the energies and structures were calculated using RNAsubopt and the structure distances computed using RNAdistance.

Biomolecular structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, or of a nucleic acid from its nucleobase (base) sequence. In other words, it is the prediction of secondary and tertiary structure from its primary structure. Structure prediction is the inverse of biomolecular design, as in rational design, protein design, nucleic acid design, and biomolecular engineering.

Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. Protein structure prediction is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). Every two years, the performance of current methods is assessed in the Critical Assessment of protein Structure Prediction (CASP) experiment.

There has also been a significant amount of bioinformatics research directed at the RNA structure prediction problem. A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only the nucleic acid sequence. However, in the case of RNA, much of the final structure is determined by the secondary structure or intra-molecular base-pairing interactions of the molecule. This is shown by the high conservation of base pairings across diverse species.

Secondary structure of small nucleic acid molecules is determined largely by strong, local interactions such as hydrogen bonds and base stacking. Summing the free energy for such interactions, usually using a nearest-neighbor method, provides an approximation for the stability of a given structure.[23] The most straightforward way to find the lowest free energy structure would be to generate all possible structures and calculate the free energy for them, but the number of possible structures for a sequence increases exponentially with the length of the molecule.[24] For longer molecules, the number of possible secondary structures is vast.[23]
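
To give a feel for how quickly the search space grows, the sketch below counts pseudoknot-free secondary structures for a chain of a given length. It is an upper bound rather than a count for any real sequence, because it assumes that any two bases may pair; the function name and the `min_loop` parameter (the minimum number of unpaired bases enclosed by a pair) are illustrative choices, not taken from the cited references.

```python
def count_secondary_structures(length, min_loop=3):
    """Count non-crossing pairings of `length` bases, assuming any two bases can pair.

    The last base is either unpaired, or paired with base k such that at
    least `min_loop` unpaired-capable positions lie between them.
    """
    counts = [1] * (length + 1)   # counts[0] = 1: the empty chain has one structure
    for n in range(1, length + 1):
        total = counts[n - 1]     # case 1: base n is unpaired
        # case 2: base n pairs with base k (1-based), k <= n - min_loop - 1
        for k in range(1, n - min_loop):
            left = counts[k - 1]        # structures on bases 1 .. k-1
            inside = counts[n - k - 1]  # structures enclosed by the pair (k, n)
            total += left * inside
        counts[n] = total
    return counts[length]


# Growth with chain length under the default minimum loop size of 3.
growth = {n: count_secondary_structures(n) for n in (10, 20, 40, 80)}
```

Printing `growth` shows the count rising by many orders of magnitude as the chain lengthens, which is why exhaustive enumeration is impractical and dynamic programming over substructures is typically used instead.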

Sequence covariation methods rely on the existence of a data set composed of multiple homologous RNA sequences with related but dissimilar sequences. These methods analyze the covariation of individual base sites in evolution; maintenance at two widely separated sites of a pair of base-pairing nucleotides indicates the presence of a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete.[25]
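
One common way to quantify covariation between two alignment columns is their mutual information; the sketch below is a minimal version of that idea and is not necessarily the measure used by any particular published method. The four-sequence alignment is invented for illustration, and real analyses need many more sequences plus corrections for phylogeny and sampling bias.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. High values mean that
    knowing the base at column i tells you a lot about the base at column j.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical alignment: columns 0 and 4 always form a Watson-Crick pair
# (G-C, A-U, C-G, U-A), while columns 1-3 never change.
toy_alignment = ["GAAAC", "AAAAU", "CAAAG", "UAAAA"]
mi_paired = mutual_information(toy_alignment, 0, 4)
mi_unrelated = mutual_information(toy_alignment, 0, 2)
```

In this toy case the covarying columns carry the maximum possible signal for a four-letter alphabet, while a variable column compared with a constant one carries none.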

Design

Biomolecular design can be considered the inverse of structure prediction. In structure prediction, the structure is determined from a known sequence, whereas, in protein or nucleic acid design, a sequence that will form a desired structure is generated.

Other biomolecules

Other biomolecules, such as polysaccharides, polyphenols and lipids, can also have higher-order structure of biological consequence.

from Grokipedia
Biomolecular structure refers to the three-dimensional arrangement of atoms in biological macromolecules, which dictates their function, stability, and interactions within living organisms. These structures belong primarily to four major classes of biomolecules: proteins, nucleic acids, carbohydrates, and lipids, each exhibiting distinct architectural features essential for cellular processes. Proteins, the most diverse class, are linear polymers of 20 standard amino acids linked by peptide bonds, folding into complex shapes that enable roles in catalysis, transport, and structural support. Nucleic acids, including DNA and RNA, consist of nucleotide monomers with nitrogenous bases, sugars, and phosphates, forming double helices (in DNA) or single strands (in RNA) that store and transmit genetic information through base pairing. Carbohydrates include polysaccharides built from monosaccharide units such as glucose, connected by glycosidic bonds to create linear or branched chains that provide energy storage and structural integrity, as in cellulose. Lipids, including fats, phospholipids, and steroids, feature hydrophobic hydrocarbon chains and often amphipathic properties, assembling into membranes and serving as energy reserves or signaling molecules.

The organization of these biomolecules occurs across hierarchical levels of structure, particularly evident in proteins and nucleic acids. Primary structure defines the linear sequence of monomers (e.g., amino acid order in proteins). Secondary structure involves local folding patterns stabilized by hydrogen bonds, such as alpha helices and beta sheets in proteins or base-paired stems in RNA. Tertiary structure encompasses the overall three-dimensional fold of a single chain, driven by non-covalent interactions like hydrophobic effects and electrostatic forces. Quaternary structure, when applicable, describes the assembly of multiple subunits into functional complexes, as in hemoglobin. Understanding these levels is crucial, as disruptions in structure, whether due to mutations or environmental factors, can lead to loss of function and disease.

Overview

Definition and Scope

Biomolecular structure refers to the three-dimensional arrangement of atoms in biological molecules, which determines their shape, stability, and function at atomic, molecular, and hierarchical levels. This organization arises from the precise positioning of atoms connected by chemical bonds and influenced by surrounding environmental factors, enabling molecules to perform essential roles in cellular processes.

The scope of biomolecular structure encompasses the primary classes of biomolecules: proteins, which serve as enzymes and structural components; nucleic acids, including DNA and RNA for genetic information storage and transfer; carbohydrates, involved in energy storage and cell recognition; and lipids, which form membranes and act as signaling molecules. These macromolecules, along with their smaller constituents, constitute the building blocks of living organisms.

The field originated in early 20th-century biochemistry, with foundational progress such as Frederick Sanger's determination of the amino acid sequence of insulin between 1945 and 1955, which provided the first complete primary structure of a protein and established sequencing as a key tool. Central to biomolecular architecture are covalent interactions, such as peptide and phosphodiester bonds, which define the primary connectivity. These contrast with non-covalent interactions (including hydrogen bonds, van der Waals forces, ionic bonds, and hydrophobic effects), which drive folding and assembly into functional three-dimensional forms.

Biological Importance

The three-dimensional structure of biomolecules fundamentally dictates their biological function, enabling precise molecular interactions essential for cellular processes. For instance, the shape of an enzyme's active site determines its catalytic specificity and efficiency, allowing substrates to bind and reactions to proceed. Similarly, the structural features of receptor binding sites govern ligand recognition, which is critical for processes like signaling and immune responses. This structure-function paradigm underscores how biomolecular conformations enable the diverse activities that sustain life, from catalysis to cellular communication.

Evolutionary pressures have conserved key structural motifs across species, reflecting their indispensable roles in core biological functions. A prominent example is the Rossmann fold, a β-α-β sandwich domain found in nucleotide-binding enzymes like dehydrogenases, which has been preserved throughout evolution due to its efficiency in cofactor binding and catalysis. Such conservation highlights how structural stability and functionality are selected for, allowing homologous proteins to perform analogous tasks in distant organisms and providing insights into the origins of metabolic pathways.

Aberrant biomolecular structures contribute significantly to disease, often through misfolding or mutations that disrupt normal function. In Alzheimer's disease, misfolded amyloid-β peptides aggregate into insoluble fibrils, leading to neurotoxic plaques that impair neuronal health and contribute to cognitive decline. Likewise, in sickle cell anemia, a single amino acid substitution in hemoglobin (glutamic acid to valine at position 6 of the β-chain) alters its quaternary structure, promoting polymerization into rigid fibers that deform red blood cells and cause vascular occlusion. These examples illustrate how structural deviations can cascade into systemic disorders, emphasizing the need for structural biology in diagnostics and therapeutics.

Understanding biomolecular structures has revolutionized applications in medicine and biotechnology, enabling targeted interventions. Structure-based drug design leverages atomic-level models to develop inhibitors that bind specific protein pockets, accelerating the discovery of therapies for diseases like cancer and infections. In biotechnology, advances in protein engineering since the 2000s have used structural insights to create novel enzymes and therapeutics with enhanced stability and function, powering innovations in industrial biocatalysis.

Protein Structure

Primary Structure

The primary structure of a protein is defined as the linear sequence of amino acids covalently linked by peptide bonds to form a polypeptide chain, with a free amino group at the N-terminus and a free carboxyl group at the C-terminus. This sequence is typically denoted using one-letter codes for the amino acids, such as A for alanine, C for cysteine, and G for glycine, as standardized by the International Union of Pure and Applied Chemistry (IUPAC). Proteins are composed of 20 standard amino acids, each distinguished by a unique side chain (R group) that imparts specific chemical properties, including hydrophobicity, polarity, or charge. The average length of a protein is approximately 300 amino acids, though this varies widely across organisms and functions, with eukaryotic proteins often longer than those in bacteria.

Historically, primary structure was determined using Edman degradation, a method developed by Pehr Edman in the 1950s that sequentially removes and identifies the N-terminal amino acid through reaction with phenylisothiocyanate, enabling automated sequencing of up to 50–60 residues. Modern approaches primarily rely on mass spectrometry, such as tandem mass spectrometry (MS/MS) coupled with liquid chromatography, which fragments peptides and analyzes their mass-to-charge ratios to reconstruct the sequence by matching against databases.

The primary structure serves as the foundational blueprint for all higher levels of protein organization. Alterations in the sequence, often caused by mutations such as single nucleotide polymorphisms (SNPs) that change a codon, can disrupt protein function and lead to diseases such as sickle cell anemia, which results from a single amino acid substitution in hemoglobin. For instance, SNPs in coding regions may introduce missense mutations, replacing one amino acid with another and thereby affecting the protein's overall properties. The sequence also directly influences the propensity for local folding patterns in secondary structure.

Secondary Structure

Secondary structure refers to the local spatial arrangement of the polypeptide backbone in proteins, primarily stabilized by hydrogen bonds between the carbonyl oxygen and amide hydrogen atoms of the peptide bonds, excluding those involving side chains. These conformations arise from the inherent flexibility and steric constraints of the backbone, allowing segments of the chain to adopt repeating patterns that contribute to the overall folding without considering distant interactions. The most prevalent secondary structures are the α-helix and β-sheet, first proposed by Linus Pauling and Robert Corey in 1951 based on model-building constrained by known bond lengths and angles.

The α-helix is a right-handed coiled conformation in which the polypeptide backbone forms a cylindrical structure with 3.6 residues per turn and a rise of about 1.5 Å per residue along the helical axis, resulting in a pitch of approximately 5.4 Å per turn. In this configuration, hydrogen bonds form between the carbonyl oxygen of residue i and the amide N–H group of residue i+4, creating a stable, intra-chain network that aligns the peptide dipoles nearly parallel to the axis. The side chains project outward from the helix, enabling hydrophobic residues to interact with the surrounding environment. α-Helices are particularly common in transmembrane proteins, where helices of 20–25 residues span the lipid bilayer, often packed into bundles.

The β-sheet consists of two or more β-strands (extended polypeptide segments) aligned laterally to form a pleated, sheet-like structure, with hydrogen bonds between adjacent strands stabilizing the assembly. β-Sheets can adopt parallel or antiparallel orientations. In antiparallel sheets, adjacent strands run in opposite directions (N-to-C terminus), so the hydrogen bonds run nearly perpendicular to the strands, an optimal geometry that enhances stability. Parallel sheets have strands running in the same direction, with slightly offset and less direct hydrogen bonds. These sheets often twist due to the chirality of L-amino acids, and in proteins they frequently form closed cylindrical structures known as β-barrels, which are prevalent in porins and outer membrane proteins for channel formation.

The conformational space available to the polypeptide backbone is visualized in the Ramachandran plot, which maps the dihedral angles φ (phi, rotation around the N-Cα bond) and ψ (psi, rotation around the Cα-C bond) for each residue, revealing regions allowed by steric constraints from van der Waals repulsions. Allowed regions cluster around φ ≈ -60°, ψ ≈ -45° for α-helices and φ ≈ -120°, ψ ≈ 120° for β-sheets, while glycine's lack of a side chain permits broader access, and proline's ring restricts φ to about -60°. Disallowed areas highlight conformations that would cause atomic clashes, guiding the feasibility of secondary structures.

Beyond helices and sheets, other secondary structure motifs include β-turns and loops, which introduce reversals or irregular segments in the chain to connect regular elements. A β-turn typically spans four residues, with a tight bend stabilized by a hydrogen bond between the carbonyl of residue i and the amide N–H of residue i+3, classified into types (I, II, etc.) based on φ and ψ angles at positions i+1 and i+2. Loops are longer, non-repetitive segments lacking regular hydrogen bonding patterns, often solvent-exposed and functionally important for flexibility.

Early methods for predicting secondary structure from primary sequence relied on empirical parameters derived from known protein structures, such as the Chou-Fasman rules from the 1970s, which assign propensity values P_α, P_β, and P_t to each amino acid for helix, sheet, and turn formation, respectively, to identify potential segments where local averages exceed thresholds (e.g., P_α > 1.03 for nucleation). These parameters, calculated from statistical analysis of 29 proteins, reflect residue preferences encoded in the primary sequence and were refined over time for better accuracy.
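
The φ and ψ angles plotted on a Ramachandran plot are ordinary dihedral (torsion) angles defined by four consecutive backbone atoms: C(i-1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. The sketch below computes a dihedral from four 3D points using the standard atan2 formulation. The function name and the test coordinates are illustrative and are not taken from any real structure file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) about the p2-p3 bond, in (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal to the plane p1-p2-p3
    n2 = _cross(b2, b3)          # normal to the plane p2-p3-p4
    b2_length = math.sqrt(_dot(b2, b2))
    y = b2_length * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical geometries around a bond along the z-axis.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter_turn = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applied to the backbone atoms of each residue in a structure, this gives the (φ, ψ) pair that can then be compared with the helical and sheet regions quoted above.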

Tertiary Structure

The tertiary structure of a protein describes the overall spatial arrangement of a single polypeptide chain, including its side chains, resulting in a compact, globular fold that positions distant residues in close proximity to enable a functional conformation. This fold is primarily stabilized by the formation of a hydrophobic core, where nonpolar side chains cluster in the interior away from the aqueous environment, driven by the hydrophobic effect, an entropy-dominated process in which water molecules gain disorder upon release from ordered shells around hydrophobic residues. Additional stabilization arises from covalent disulfide bonds between cysteine residues, which lock specific regions in place, and noncovalent salt bridges (ionic interactions between oppositely charged side chains like lysine and aspartate), which contribute to overall stability, particularly in thermophilic proteins. Pi-stacking interactions between aromatic rings, such as those in phenylalanine or tryptophan, further reinforce the core by providing attractive forces between electron clouds.

Protein tertiary structures often consist of modular domains and motifs, which are recurrent folding patterns that confer specific functions. For example, the immunoglobulin fold is a beta-sandwich domain composed of two antiparallel beta-sheets stabilized by a conserved disulfide bond, commonly found in antibody variable regions and other immunoglobulin-superfamily molecules. Another prominent motif is the zinc finger, a compact structure where a zinc ion coordinates cysteine and histidine residues to stabilize a beta-beta-alpha fold, enabling DNA binding in transcription factors. These elements demonstrate how tertiary folding integrates secondary structural features, such as alpha-helices and beta-strands, into functional units.

The principle that a protein's tertiary structure is dictated by its primary amino acid sequence, known as Anfinsen's dogma, was established through experiments showing that denatured ribonuclease A could spontaneously renature to its native fold upon removal of denaturants, regaining full enzymatic activity as the thermodynamically most stable conformation under physiological conditions. This renaturation highlights the reversibility of tertiary folding and the absence of obligatory covalent information beyond the sequence itself. Intermediates in this process, such as the molten globule state, represent partially compact forms with native-like secondary structure but fluctuating side-chain packing and a less defined hydrophobic core, serving as kinetic waypoints during folding. Typical globular proteins have characteristic dimensions on the order of 20 to 50 Å, reflecting the scale of these compact folds for chains of 100–500 residues.

Quaternary Structure

Quaternary structure describes the spatial arrangement and non-covalent interactions between multiple polypeptide subunits that form a functional protein complex. These interactions occur at specific interfaces, often exhibiting symmetry to maximize stability and efficiency, such as in homodimers composed of two identical subunits or heterotetramers consisting of four subunits of more than one type. A prominent example is hemoglobin, a heterotetramer comprising two α and two β subunits, where oxygen binding induces allosteric conformational changes that transition the complex from a low-affinity tense (T) state to a high-affinity relaxed (R) state, enhancing cooperative oxygen transport. In viral capsids, icosahedral symmetry organizes numerous identical protein subunits into a geometrically efficient shell, as seen in many viruses where quasi-equivalent positions allow for stable assembly without genetic redundancy.

The stability of quaternary structures arises from hydrophobic and electrostatic interactions at subunit interfaces, typically burying 1000–2000 Ų of surface area per interface, with dissociation constants (K_d) ranging from micromolar to nanomolar, reflecting affinities sufficient for physiological function. Approximately 30–50% of proteins function as oligomers, enabling regulatory control and metabolic efficiency. Evolutionarily, many such assemblies arise from gene duplication events, where paralogous subunits diverge to form heteromeric complexes, diversifying function while retaining core interactions. The tertiary folds of individual subunits provide the scaffolds for these inter-subunit associations.
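
To put the micromolar-to-nanomolar range in energetic terms, a dissociation constant can be converted to a standard binding free energy with the textbook relation ΔG° = RT ln(K_d / 1 M). The sketch below applies that relation; it is an added illustration, and the temperature of 298.15 K and the function name are assumptions rather than values from the text.

```python
import math

GAS_CONSTANT_J_PER_MOL_K = 8.314462618  # molar gas constant R


def binding_free_energy_kj_per_mol(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / 1 M); tighter binding (smaller Kd) gives a
    more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_J_PER_MOL_K * temperature_k * math.log(kd_molar) / 1000.0


# The two ends of the range quoted in the text (assumed temperature: 25 degrees C).
dg_micromolar = binding_free_energy_kj_per_mol(1e-6)
dg_nanomolar = binding_free_energy_kj_per_mol(1e-9)
```

Under these assumptions a micromolar interface corresponds to roughly -34 kJ/mol and a nanomolar one to roughly -51 kJ/mol, so a thousandfold change in K_d shifts the binding free energy by about 17 kJ/mol.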

Nucleic Acid Structure

DNA Structure

The structure of deoxyribonucleic acid (DNA), the primary genetic material in most organisms, was determined in 1953 by James D. Watson and Francis H. C. Crick, who proposed a double-helical model based on X-ray diffraction data from Rosalind Franklin and Maurice H. F. Wilkins. This model revealed DNA as two antiparallel polynucleotide strands wound around a common axis, stabilized by hydrogen bonds between complementary bases. The human diploid genome comprises approximately 6.4 billion base pairs, extending to about 2 meters in length if uncoiled.

Under physiological conditions, DNA predominantly adopts the B-form, a right-handed double helix characterized by 10.5 base pairs per helical turn and an axial rise of 0.34 nm per base pair, giving a pitch of roughly 3.4 to 3.6 nm. The strands are connected by Watson-Crick base pairing, where adenine (A) pairs with thymine (T) through two hydrogen bonds, and guanine (G) pairs with cytosine (C) through three, ensuring specific and stable complementarity. This configuration allows the molecule to compactly store genetic information while permitting access for replication and transcription.

DNA can assume alternative conformations depending on environmental conditions and sequence. The A-form, observed in dehydrated states, is a shorter, wider right-handed helix with 11 base pairs per turn and a pitch of about 2.8 nm, resembling the double-helical structure of RNA. In contrast, Z-DNA is a left-handed helix formed preferentially in sequences with alternating purines and pyrimidines, such as poly(dG-dC), featuring 12 base pairs per turn and a zigzag backbone that gives it its name. These non-B forms can influence local DNA flexibility and interactions, though B-DNA remains the predominant physiological structure.

To achieve further compaction in cells, DNA undergoes supercoiling, where the double helix twists upon itself beyond its relaxed state. The topology is described by the linking number (Lk), defined as Lk = Tw + Wr, where Tw is the twist (helical turns) and Wr is the writhe (superhelical turns). In eukaryotes, negative supercoiling facilitates packaging; for instance, each nucleosome wraps about 147 base pairs of DNA in 1.65 left-handed turns, introducing negative supercoils that aid in chromatin folding. This topological constraint is essential for fitting the genome into the nucleus while regulating access to genetic information.
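
The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below uses the B-DNA rise and helical repeat from the text to estimate contour length and the relaxed linking number, and rearranges Lk = Tw + Wr to obtain writhe. The superhelical density σ = (Lk − Lk0) / Lk0 is a standard measure added here for illustration, and the 1050 bp circular DNA in the example is hypothetical.

```python
RISE_NM_PER_BP = 0.34   # axial rise per base pair in B-DNA (from the text)
BP_PER_TURN = 10.5      # helical repeat of relaxed B-DNA (from the text)


def contour_length_m(base_pairs):
    """End-to-end length in meters of fully extended B-DNA."""
    return base_pairs * RISE_NM_PER_BP * 1e-9


def relaxed_linking_number(base_pairs):
    """Lk0: the number of helical turns in relaxed B-DNA of this length."""
    return base_pairs / BP_PER_TURN


def writhe(linking_number, twist):
    """Rearranged Lk = Tw + Wr."""
    return linking_number - twist


def superhelical_density(linking_number, base_pairs):
    """sigma = (Lk - Lk0) / Lk0; negative values mean negative supercoiling."""
    lk0 = relaxed_linking_number(base_pairs)
    return (linking_number - lk0) / lk0


genome_length = contour_length_m(6.4e9)        # human diploid genome from the text
lk0_example = relaxed_linking_number(1050)     # hypothetical 1050 bp circle
sigma_example = superhelical_density(94, 1050) # same circle with six turns removed
wr_example = writhe(94, lk0_example)           # if the twist stays at its relaxed value
```

With these inputs the genome length comes out near 2.2 m, consistent with the "about 2 meters" stated above, and removing six turns from the 1050 bp circle gives σ of about -0.06.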

RNA Structure

RNA, unlike DNA, is typically single-stranded and folds into complex three-dimensional structures that enable diverse functions beyond genetic information storage. A key distinguishing feature is the presence of a hydroxyl group (-OH) at the 2' position of the sugar, which imparts chemical reactivity absent in and facilitates RNA's catalytic capabilities by participating in nucleophilic attacks during reactions such as cleavage. RNA bases pair via Watson-Crick rules (A-U, G-C) but also form non-canonical pairs like the G-U wobble, where guanine's amino group hydrogen-bonds with uracil's carbonyl, allowing structural flexibility and stability in folded regions. This wobble pairing is ubiquitous in RNA motifs and contributes to functional diversity across RNA classes. (mRNA) molecules, which carry protein-coding information, typically range from 1000 to 5000 in length, allowing for the encoding of polypeptides of varying sizes. At the secondary structure level, RNA forms double-helical stems through intramolecular base pairing, often terminated by unpaired loops that create motifs like stem-loops (hairpins), where a short double-stranded region connects to a single-stranded loop. These stem-loops are critical for RNA stability, protein recognition, and regulatory functions, appearing in precursor microRNAs and ribozymes. More complex secondary elements include pseudoknots, formed when bases in a loop pair with a distant single-stranded region, creating intertwined helices that enhance and are common in viral RNAs for frameshifting during . A classic example is (tRNA), whose secondary structure adopts a cloverleaf model with four stems—acceptor, D-arm, anticodon, and T-arm—connected by loops, which folds into a compact L-shaped tertiary conformation essential for delivery during protein synthesis. Tertiary RNA structures arise from long-range interactions stabilizing secondary motifs into functional folds, often involving metal ions and non-canonical base pairs. 
Ribozymes exemplify this, as catalytic RNAs that perform self-cleavage or ligation. The first were discovered in the early 1980s, when Thomas Cech identified self-splicing introns in Tetrahymena pre-rRNA and Sidney Altman discovered the catalytic activity of RNase P; in both cases the RNA component carries out the reaction without protein assistance, revealing RNA's enzymatic potential. These discoveries earned Cech and Altman the 1989 Nobel Prize in Chemistry. Self-splicing introns fold into intricate tertiary structures whose active sites coordinate Mg²⁺ ions for catalysis.
Ribosomal RNA (rRNA) domains further illustrate tertiary complexity. In the revised secondary-structure model, the 23S rRNA of the large subunit comprises seven domains, with Domains I–VI radiating from a central Domain 0 core, and it contains the peptidyl transferase center, while the 16S rRNA of the small subunit has four domains that assemble into the decoding site. Together these enable peptide bond formation and mRNA reading.
Functional RNA motifs often rely on precise structural recognition for regulation, as seen in microRNAs (miRNAs), small non-coding RNAs (~22 nucleotides) that post-transcriptionally repress gene expression by binding target mRNAs. miRNA regulation occurs primarily through seed pairing, in which nucleotides 2–8 at the miRNA 5' end form complementary base pairs with a site in the mRNA 3' untranslated region (UTR), leading to translational inhibition or mRNA degradation. This mechanism underscores RNA's role in fine-tuning cellular processes via structural specificity.
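The seed-pairing rule described above can be illustrated with a short sketch. It assumes the simplest possible criterion, perfect Watson-Crick complementarity to miRNA nucleotides 2–8, and ignores G-U wobbles, flanking context, and site accessibility; the function names and the UTR fragment are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: locate canonical seed-match sites for a miRNA in a 3' UTR.
# Assumption: only perfect Watson-Crick pairing to miRNA nucleotides 2-8 counts.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` that pair with miRNA nts 2-8."""
    seed = mirna[1:8]                  # nucleotides 2-8 (1-based numbering)
    target = reverse_complement(seed)  # sequence expected in the mRNA, 5'->3'
    width = len(target)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == target]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # let-7a-like sequence, for illustration
    utr = "AAACUACCUCAGGGCUACCUCAUU"   # hypothetical 3' UTR fragment
    print(seed_sites(mirna, utr))
```

Real target-prediction tools score many additional features; this sketch only shows how the seed (nucleotides 2–8) maps onto its reverse-complement site in the mRNA.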

Structures of Other Biomolecules

Carbohydrates

Carbohydrates are essential biomolecules composed primarily of carbon, hydrogen, and oxygen, often in the ratio 1:2:1, forming polyhydroxy aldehydes or ketones known as sugars. Their structural diversity arises from monomeric units called monosaccharides, which polymerize into oligosaccharides (2–10 units) and polysaccharides (more than 10 units) through glycosidic linkages. These structures enable carbohydrates to serve as energy stores and structural components in cells, with their configurations influencing solubility, digestibility, and biological function.
Monosaccharides, the simplest carbohydrates, are classified as aldoses, which possess an aldehyde group at the carbonyl carbon (C1), or ketoses, which have a ketone group, typically at C2. For example, glucose, an aldohexose with six carbons, exists predominantly in cyclic ring forms rather than the open-chain structure. In its pyranose form, glucose cyclizes via an intramolecular reaction between the aldehyde at C1 and the hydroxyl at C5, forming a six-membered hemiacetal ring. This cyclization creates a new chiral center at C1, termed the anomeric carbon, resulting in two anomers: α-D-glucopyranose, where the hydroxyl at C1 is axial, and β-D-glucopyranose, where it is equatorial.
Oligosaccharides and polysaccharides form through condensation reactions that create glycosidic bonds between the anomeric carbon of one monosaccharide and a hydroxyl group of another, releasing one water molecule per bond. These bonds can be α or β, depending on the anomeric configuration, and are named by the linkage position, such as α-1,4 or β-1,4. In glycogen, a branched polysaccharide, glucose units link via α-1,4-glycosidic bonds in linear chains, with branches introduced every 8–12 residues through α-1,6-glycosidic bonds at C6, enhancing solubility and allowing rapid enzymatic access for mobilization.
The three-dimensional conformations of carbohydrates significantly affect their properties.
Pyranose rings, common in hexoses like glucose, adopt a chair conformation as the most stable form, with substituents positioned either equatorially (preferred for bulkier groups) or axially; less stable boat conformations can occur but are rare under physiological conditions. Stereoisomerism further diversifies structures: D and L forms are mirror images distinguished by the configuration at the penultimate carbon (C5 in hexoses), with D-sugars predominant in nature. Epimers are diastereomers differing at a single chiral center, such as glucose and mannose, which differ at C2.
A key example of structural variation is seen in cellulose and starch, both glucose polymers but with distinct linkages. Cellulose consists of linear chains of β-D-glucose linked by β-1,4-glycosidic bonds, promoting an extended, rigid conformation stabilized by hydrogen bonds between chains, which form microfibrils that provide tensile strength to plant cell walls. In contrast, starch features α-1,4-glycosidic bonds in its linear component, amylose, and branching via α-1,6 linkages every 24–30 residues in amylopectin, yielding a helical, compact structure suited for energy storage in plants. These differences render cellulose indigestible by most animals, while starch is readily hydrolyzed.
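Because each glycosidic bond is formed by condensation with loss of one water molecule, the elemental composition of a glucose polymer follows directly from the number of residues. The sketch below works through that arithmetic; the function names are hypothetical, the atomic masses are rounded standard values, and the polymer is assumed to be acyclic (n residues joined by n − 1 bonds, which holds for linear and branched chains alike).

```python
# Rounded standard atomic masses in g/mol (assumed values for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def glucan_composition(n_residues: int) -> dict[str, int]:
    """Atom counts for an acyclic glucan built from n glucose (C6H12O6) units."""
    if n_residues < 1:
        raise ValueError("a glucan needs at least one glucose residue")
    waters_lost = n_residues - 1  # one H2O released per glycosidic bond
    return {
        "C": 6 * n_residues,
        "H": 12 * n_residues - 2 * waters_lost,
        "O": 6 * n_residues - waters_lost,
    }


def molar_mass(composition: dict[str, int]) -> float:
    """Molar mass in g/mol for a composition given as element -> count."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


if __name__ == "__main__":
    for n in (1, 2, 10):
        comp = glucan_composition(n)
        print(n, comp, round(molar_mass(comp), 3))
```

For two residues this gives C12H22O11, the formula shared by disaccharides such as maltose and cellobiose, which differ only in the α or β configuration of the linkage.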

Lipids

Lipids constitute a diverse group of hydrophobic or amphipathic biomolecules essential for cellular architecture, primarily forming the structural basis of biological membranes and energy storage depots. Unlike proteins or nucleic acids, lipids do not form linear polymers but instead self-assemble into dynamic supramolecular structures, driven by hydrophobic interactions among their nonpolar tails and by the interactions of their polar heads with water. This amphipathicity enables lipids to create barriers that compartmentalize cellular processes while allowing selective permeability.
In biomolecular structure, lipids are classified by their core scaffolds, with fatty acids serving as the fundamental hydrophobic components. Fatty acids are long-chain carboxylic acids typically containing 12 to 24 carbon atoms, with a polar carboxyl group at one end and a hydrocarbon chain that can be saturated or unsaturated. Saturated fatty acids, such as palmitic acid (16:0), have fully hydrogenated chains with no carbon-carbon double bonds, resulting in straight structures that pack tightly through van der Waals interactions. In contrast, unsaturated fatty acids incorporate one or more cis double bonds, which introduce kinks in the chain; for instance, oleic acid (18:1 Δ9 cis) has a single cis double bond between carbons 9 and 10, disrupting alignment and reducing packing density. These variations in chain saturation and configuration profoundly influence the physical properties of lipid assemblies.
Major lipid classes in membranes include phospholipids, steroids, and sphingolipids, each contributing distinct structural motifs. Phospholipids, the predominant membrane lipids, consist of a glycerol backbone esterified to two fatty acid tails and a phosphorylated polar head group, such as choline in phosphatidylcholine, creating a classic head-tail architecture.
This amphipathic design drives spontaneous formation of bilayers, where hydrophilic heads face aqueous environments and hydrophobic tails are sequestered inward, as observed in cell plasma membranes. Steroids, exemplified by cholesterol, feature a rigid, planar tetracyclic ring system with a hydroxyl group at C3 and a nonpolar isooctyl tail, allowing intercalation between phospholipid tails to enhance membrane stability without disrupting the bilayer core. Cholesterol's fused rings confer rigidity, counteracting excessive fluidity at high temperatures or in membranes rich in unsaturated lipids. Sphingolipids share a ceramide backbone, formed by sphingosine (an 18-carbon amino alcohol) amide-linked to a fatty acid chain, and bear diverse head groups, such as phosphocholine in sphingomyelin, enabling roles in signaling domains.
Lipid assemblies vary with molecular geometry and environmental conditions, yielding structures such as micelles, vesicles, and phase-separated domains. Micelles form from single-tailed amphiphiles, such as lysophospholipids or detergents, which arrange into spherical monolayers with tails inward to minimize water contact, as often seen in solubilization processes. Vesicles, or liposomes, arise from bilayer-forming lipids like phospholipids, enclosing an aqueous core in closed spherical structures that mimic cellular compartments and are widely used as model systems. In native membranes, lipid rafts emerge as ordered microdomains through lateral phase separation; they are enriched in cholesterol, sphingomyelin, and glycosphingolipids, which adopt a liquid-ordered phase distinct from the surrounding liquid-disordered phase, facilitating protein clustering and signaling. These rafts highlight how lipid composition dictates heterogeneous membrane organization.
Membrane fluidity, critical for protein mobility and permeability, is finely tuned by fatty acid properties. Longer acyl chains increase van der Waals interactions, promoting tighter packing and reduced fluidity, whereas shorter chains enhance disorder and mobility.
Unsaturation further modulates this: each cis double bond introduces a bend that hinders tight packing and crystallization, raising fluidity, as seen in polyunsaturated fatty acids like linoleic acid (18:2 Δ9,12 cis,cis), which help maintain membrane flexibility at physiological temperatures. This modulation supports adaptive responses to environmental stresses, such as temperature changes.
The foundational lipid bilayer model, proposing a bimolecular leaflet arrangement, was established in 1925 by Gorter and Grendel through monolayer experiments: lipids extracted from red blood cells and spread as a monolayer covered roughly twice the surface area of the cells, indicating a two-layer configuration. Glycolipids, hybrids of lipids and carbohydrates, extend this diversity by attaching sugar moieties to glycerol or ceramide backbones, influencing cell-surface recognition.
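The shorthand used above for fatty acids (carbon count, number of double bonds, and Δ positions, as in 16:0 or 18:2 Δ9,12) is regular enough to parse mechanically. The following is a minimal sketch of such a parser; the `FattyAcid` class and `parse_shorthand` function are hypothetical names, and the parser ignores any trailing cis/trans annotation.

```python
import re
from dataclasses import dataclass

# Accept either the Greek capital delta or the increment sign before positions.
_PATTERN = re.compile(r"\s*(\d+):(\d+)(?:\s*[Δ∆](\d+(?:,\d+)*))?")


@dataclass(frozen=True)
class FattyAcid:
    carbons: int
    double_bonds: int
    positions: tuple[int, ...]  # carbon number where each double bond starts

    @property
    def saturated(self) -> bool:
        return self.double_bonds == 0


def parse_shorthand(text: str) -> FattyAcid:
    """Parse notation such as '16:0' or '18:2 Δ9,12 cis,cis'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized fatty acid notation: {text!r}")
    carbons, double_bonds = int(match.group(1)), int(match.group(2))
    listed = match.group(3)
    positions = tuple(int(p) for p in listed.split(",")) if listed else ()
    if positions and len(positions) != double_bonds:
        raise ValueError("number of positions does not match double-bond count")
    return FattyAcid(carbons, double_bonds, positions)


if __name__ == "__main__":
    for name in ("16:0", "18:1 Δ9 cis", "18:2 Δ9,12 cis,cis"):
        print(name, "->", parse_shorthand(name))
```

A parsed record of this kind makes it straightforward to compare chain length and unsaturation, the two properties the text identifies as the main determinants of packing and fluidity.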

Experimental Structure Determination

X-ray Crystallography

X-ray crystallography is a cornerstone technique for elucidating the atomic-level three-dimensional structures of biomolecules, such as proteins and nucleic acids, by exploiting the diffraction of X-rays by ordered molecular crystals. The method measures the interference patterns generated when X-rays scatter off the electrons in the atoms, yielding data that can be transformed into electron density maps for model building. This approach has been instrumental in understanding biomolecular function, as structures reveal key features like active sites and folding motifs.
The process commences with purification and crystallization, typically the most labor-intensive step, in which biomolecules are screened against thousands of conditions involving salts, polymers, or ligands to nucleate and grow diffraction-quality crystals, often using vapor diffusion or microbatch methods. Suitable crystals, typically tens to hundreds of micrometers in size, are then mounted and irradiated with monochromatic X-rays, producing a diffraction pattern of discrete spots whose positions and intensities correspond to the Fourier components of the electron density. These intensities provide the magnitudes of the structure factors, but reconstructing the full density requires solving for the missing phases.
The phase problem arises because X-ray detectors record only intensities (proportional to the square of the amplitudes), so phases must be inferred indirectly. Multiple isomorphous replacement (MIR) derives phases from differences in diffraction between the native crystal and isomorphous heavy-atom derivatives, such as mercury or platinum compounds, which change the diffracted intensities without disrupting the lattice. Complementing MIR, multiwavelength anomalous diffraction (MAD) uses tunable synchrotron X-rays near the absorption edge of atoms like selenium (incorporated via selenomethionine substitution), collecting data at multiple wavelengths so that anomalous scattering differences determine the phases, which offers higher accuracy and avoids non-isomorphism issues.
With phases in hand, an electron density map is calculated via an inverse Fourier transform and contoured to display regions of high electron density, into which atoms are placed manually or automatically, followed by refinement to minimize discrepancies with the observed data. High-quality structures reach resolutions of 1–2 Å, sufficient to define bond geometry and side-chain orientations and, toward the high-resolution end, to resolve individual atoms; resolutions better than about 1.5 Å are ideal for unambiguous interpretation.
Historically, the technique's application to proteins culminated in 1959–1960, when John Kendrew and colleagues determined the 2 Å structure of myoglobin using MIR, marking the first visualization of a protein's polypeptide chain folded into α-helices and revealing its oxygen-binding pocket. This seminal work earned Kendrew the 1962 Nobel Prize in Chemistry, shared with Max Perutz for hemoglobin, and established X-ray crystallography as viable for complex biomolecules. The advent of synchrotron sources in the 1980s dramatically accelerated progress by delivering collimated, high-flux X-rays orders of magnitude brighter than rotating anodes, enabling rapid data collection from tiny or weakly diffracting crystals and facilitating time-resolved studies. Facilities like the UK's Daresbury Laboratory, operational since 1980, broadened access and boosted output. As of 2025, the Protein Data Bank holds 199,418 entries from X-ray crystallography, representing the majority of deposited biomolecular structures and enabling comparative analyses across diverse systems.
Despite these advances, the method's reliance on crystals introduces limitations, as packing forces can induce conformational artifacts not present in solution, potentially misrepresenting dynamic or flexible regions. X-ray crystallography excels at static, high-resolution snapshots of compact biomolecules but is often paired with cryo-electron microscopy for large, heterogeneous complexes.
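The phase problem described in this section can be made concrete with a one-dimensional toy model. The sketch below is an illustration only: it uses a made-up eight-point "electron density", a plain discrete Fourier transform in place of real crystallographic structure-factor calculations, and hypothetical function names. It shows that amplitudes combined with the correct phases recover the density, whereas amplitudes alone (here paired with zero phases) do not.

```python
import cmath


def dft(values):
    """Forward discrete Fourier transform: density -> 'structure factors'."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]


def inverse_dft(factors):
    """Inverse transform: structure factors -> real-space density."""
    n = len(factors)
    return [sum(f * cmath.exp(2j * cmath.pi * h * x / n) for h, f in enumerate(factors)).real / n
            for x in range(n)]


# Toy 1D unit cell with two "atoms" of different scattering power (assumed values).
density = [0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 8.0, 0.0]

structure_factors = dft(density)
amplitudes = [abs(f) for f in structure_factors]      # measurable: sqrt(intensity)
phases = [cmath.phase(f) for f in structure_factors]  # not recorded by the detector

# Amplitudes plus the true phases reproduce the density.
with_phases = inverse_dft([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])
# Amplitudes with all phases set to zero give a different, uninterpretable map.
without_phases = inverse_dft([complex(a, 0.0) for a in amplitudes])

if __name__ == "__main__":
    print("true density  :", density)
    print("with phases   :", [round(v, 3) for v in with_phases])
    print("phases zeroed :", [round(v, 3) for v in without_phases])
```

Methods such as MIR and MAD exist precisely to supply estimates of the `phases` list that the experiment itself cannot record.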

Nuclear Magnetic Resonance (NMR) Spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy provides atomic-level insights into biomolecular structures in solution, complementing techniques like X-ray crystallography by capturing dynamic ensembles rather than static crystal structures. It relies on the magnetic properties of atomic nuclei, such as ¹H, ¹³C, and ¹⁵N, to probe interatomic interactions and conformations under near-physiological conditions. The method has been instrumental in determining structures of proteins, nucleic acids, and their complexes, with over 14,600 entries in the Protein Data Bank derived from NMR data as of 2025.
The core principles of NMR structure determination involve spectral parameters that report on local environments and spatial relationships. Chemical shifts reflect the electronic surroundings of nuclei and correlate with secondary structure elements like α-helices and β-sheets through deviations from random-coil values, often quantified via the chemical shift index. The nuclear Overhauser effect (NOE) yields through-space distance restraints up to approximately 5 Å, with intensity scaling as 1/r⁶, where r is the interproton distance, enabling mapping of tertiary contacts such as cross-strand contacts in β-sheets. J-couplings, mediated through bonds, provide dihedral angle information via the Karplus relation; for instance, the three-bond ³J_{HN-Hα} coupling (typically 3–9 Hz) distinguishes backbone φ angles in helices (<4 Hz) from those in sheets (>8 Hz).
Structural elucidation proceeds through multidimensional NMR experiments, usually on isotope-labeled samples. In 2D spectroscopy, COSY detects J-coupled protons within spin systems for initial residue identification, while ¹H-¹⁵N HSQC correlates each amide proton with its nitrogen, producing a "fingerprint" spectrum with one peak per backbone amide group; ¹H-¹³C HSQC does the same for carbon-bound protons. Higher-dimensional (3D/4D) spectra, such as HNCA and HN(CA)CO, facilitate sequential resonance assignment by linking intra- and inter-residue correlations through backbone nuclei, tracing the polypeptide chain in "sequential walks"; the original sequential assignment strategy for unlabeled proteins instead relied on NOE connectivities between neighboring residues.
Sequential assignment was pioneered in the early 1980s with the bovine pancreatic trypsin inhibitor (BPTI) and led to the first complete protein structure determinations by NMR in the mid-1980s, which yielded bundles of conformers with root-mean-square deviations below 1 Å in rigid regions. The resulting distance and angle restraints are input into molecular modeling software to generate ensembles of structures.
Despite its strengths, solution NMR faces limitations, including a practical size threshold of about 50 kDa for comprehensive studies, because slower molecular tumbling broadens lines and reduces sensitivity and resolution. Uniform isotope labeling with ¹³C and ¹⁵N is essential to access heteronuclear experiments and resolve spectral overlap, and is typically achieved by expressing proteins in minimal media containing ¹⁵N-NH₄Cl and ¹³C-glucose. NMR excels at probing dynamics, such as conformational exchange measured by CPMG relaxation dispersion, which quantifies exchange rates (k_{ex} ≈ 100–3,000 s⁻¹) and populations of excited states in enzymes. These dynamic insights, often validated by comparison to crystal structures, highlight functional flexibility invisible in crystal lattices.
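Two of the relations mentioned in this section, the 1/r⁶ dependence of NOE intensity and the Karplus relation for ³J_{HN-Hα}, lend themselves to a short worked sketch. The Karplus coefficients below are one commonly quoted parameterization (attributed to Vuister and Bax, 1993) and are an assumption of this example, as are the function names; other coefficient sets give somewhat different values. The NOE calibration assumes a single reference proton pair of known distance and ignores spin diffusion and internal motion.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN-Halpha); one published set among several.
KARPLUS_A, KARPLUS_B, KARPLUS_C = 6.51, -1.76, 1.60


def karplus_j_hn_ha(phi_degrees: float) -> float:
    """Predicted 3J(HN-Halpha) in Hz for a backbone phi angle in degrees."""
    theta = math.radians(phi_degrees - 60.0)
    return KARPLUS_A * math.cos(theta) ** 2 + KARPLUS_B * math.cos(theta) + KARPLUS_C


def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Distance estimate from NOE intensity, using I proportional to 1/r**6."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    print("helix-like phi = -57 deg :", round(karplus_j_hn_ha(-57.0), 2), "Hz")
    print("sheet-like phi = -120 deg:", round(karplus_j_hn_ha(-120.0), 2), "Hz")
    # A cross peak 64 times weaker than a 1.8 Angstrom reference pair:
    print("estimated distance:", round(noe_distance(1.0, 64.0, 1.8), 2), "Angstrom")
```

Because of the sixth-root dependence, a 64-fold drop in intensity corresponds to only a doubling of distance, which is why NOE restraints are usually binned into broad distance ranges rather than treated as precise measurements.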

Cryo-Electron Microscopy (Cryo-EM)

Cryo-electron microscopy (cryo-EM) is a pivotal technique for determining the three-dimensional structures of biomolecular complexes at near-atomic resolution, particularly those that are large, dynamic, or resistant to crystallization. Developed over decades, it involves imaging biological samples preserved in a frozen-hydrated state to minimize structural perturbations, enabling visualization of proteins, nucleic acids, and assemblies in near-native conditions. Unlike methods requiring ordered crystals, cryo-EM accommodates heterogeneous and flexible biomolecules, making it well suited to macromolecular machines such as viruses and ribosomes.
The core process begins with sample preparation, where purified biomolecules are applied to a holey carbon grid and rapidly frozen by plunging into liquid ethane, forming a thin layer of vitreous ice that embeds the particles without ice crystal formation. This vitrification technique, pioneered by Jacques Dubochet in the 1980s, preserves the native hydration and conformation of the samples. The grid is then transferred to a cryo-electron microscope, where low-dose electron beams (on the order of tens of e⁻/Å²) are used to capture 2D projection images at cryogenic temperatures, often as dose-fractionated movies recorded on direct detectors to mitigate beam-induced motion. Particle picking follows, involving automated or semi-automated identification and extraction of individual macromolecular projections from thousands of micrographs, followed by 2D classification to remove junk particles and generate class averages. These are then used for 3D reconstruction via iterative alignment and refinement algorithms, such as projection matching, to build a density map that can be interpreted with atomic models.
A major advance, termed the "resolution revolution," occurred in the 2010s with the introduction of direct electron detectors, which improved signal-to-noise ratios and enabled movie-mode imaging to correct for specimen drift, routinely achieving resolutions better than 4 Å.
These detectors, such as the Gatan K2 Summit and Thermo Fisher Falcon, register individual electron events with high quantum efficiency, dramatically enhancing data quality compared to earlier CCD cameras. By 2025, resolutions of 2–4 Å have become standard for well-behaved samples, allowing de novo model building and visualization of side-chain densities in many cases. The breakthrough was recognized with the 2017 Nobel Prize in Chemistry, awarded to Jacques Dubochet, Joachim Frank, and Richard Henderson for their foundational contributions: Dubochet's vitrification method, Frank's development of single-particle reconstruction algorithms in the 1970s–1980s, and Henderson's demonstration of atomic-resolution potential in the 1990s.
Early milestones included the first near-atomic resolution structures of icosahedral viruses, achieved between 2008 and 2010, such as the 3.8 Å reconstruction of rotavirus double-layer particles and reconstructions of other viral capsids, which resolved secondary structures and interfaces previously inaccessible. Applications have since expanded to complex assemblies like ribosomes, where structures at 2.5–3 Å have elucidated translation mechanisms across organisms, and to viruses, revealing entry and assembly pathways for pathogens like Zika and SARS-CoV-2. To address sample heterogeneity (variations in conformation, composition, or occupancy), modern methods employ focused classification, 3D variability analysis, or Gaussian mixture models during refinement, allowing distinct states to be separated rather than averaged together. As of November 2025, the Electron Microscopy Data Bank (EMDB) holds 51,509 entries, predominantly cryo-EM maps, underscoring the method's prominence in structural biology. Cryo-EM data can also be integrated with X-ray crystallography to build hybrid models in which crystal structures of subdomains are fitted into lower-resolution maps.
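The reason single-particle analysis averages many images is that individual low-dose projections are dominated by noise. The sketch below illustrates the principle on a one-dimensional toy signal; it assumes the copies are identical and already perfectly aligned, with independent Gaussian noise, which real particle images are not. All names and parameter values are hypothetical.

```python
import random
import statistics


def average_particles(signal, n_particles, noise_sd, rng):
    """Average n noisy, already-aligned copies of the same 1D 'projection'."""
    total = [0.0] * len(signal)
    for _ in range(n_particles):
        for i, value in enumerate(signal):
            total[i] += value + rng.gauss(0.0, noise_sd)
    return [t / n_particles for t in total]


def residual_noise(signal, estimate):
    """Standard deviation of the difference between an estimate and the true signal."""
    return statistics.pstdev([e - s for e, s in zip(estimate, signal)])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    signal = [1.0 if 80 <= i < 120 else 0.0 for i in range(200)]
    for n in (1, 100, 1000):
        averaged = average_particles(signal, n, noise_sd=5.0, rng=rng)
        print(f"{n:5d} particles: residual noise ~ {residual_noise(signal, averaged):.3f}")
```

Under these idealized assumptions the residual noise falls roughly as the square root of the number of particles averaged, which is why 2D classification and 3D refinement rely on large particle sets and accurate alignment.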

Computational Structure Analysis

Structure Prediction

Structure prediction in biomolecular science involves computational approaches that infer three-dimensional (3D) conformations from primary sequences, such as amino acid or nucleotide chains, without experimental data on the target itself. Traditional methods include homology modeling, which constructs models by aligning a target sequence to structurally characterized templates in databases like the Protein Data Bank (PDB), and ab initio prediction, which uses physics-based energy minimization to explore conformational space from first principles. Homology modeling relies on evolutionary conservation and gives reliable results when sequence identity to a known structure exceeds about 30%, as implemented in tools like SWISS-MODEL. Ab initio methods, exemplified by the Rosetta protocol, employ fragment assembly and Monte Carlo sampling to generate low-energy decoys, and proved effective for small proteins lacking close homologs in early Critical Assessment of Structure Prediction (CASP) experiments.
The advent of artificial intelligence (AI) has revolutionized prediction, particularly through deep learning models that leverage multiple sequence alignments (MSAs) to capture coevolutionary signals indicating residue proximity. DeepMind's AlphaFold, which first entered CASP13 in 2018, outperformed competitors by combining convolutional neural networks with MSA-derived features. Its successor, AlphaFold2, dominated CASP14 in 2020 with a median global distance test (GDT_TS) score of 92.4, achieving backbone root-mean-square deviations (RMSD) below 1 Å for many targets. The method uses an Evoformer module to process MSAs and pairwise representations, followed by a structure module that iteratively refines coordinates via invariant point attention, enabling atomic-level predictions even for novel folds. In July 2021, DeepMind released an initial AlphaFold database containing over 365,000 predicted models for 20 proteomes, later expanded to more than 200 million models covering nearly all catalogued proteins.
Subsequent AI developments, such as Meta AI's ESMFold released in 2023, further accelerated prediction by using protein language models trained on evolutionary-scale sequence data to infer structures directly from single sequences, bypassing MSA computation and approaching AlphaFold2 accuracy for many targets in seconds rather than hours. These methods typically yield RMSD values under 2 Å for ordered regions of globular proteins, a precision approaching that of experimental techniques. However, limitations persist: AlphaFold2 struggles with intrinsically disordered regions (IDRs), where low-confidence predictions (pLDDT < 50) reflect weak MSA signals caused by rapid sequence evolution, and with protein complexes, particularly those dominated by heterotypic interactions lacking strong intra-chain contacts. ESMFold shares similar challenges for IDRs and multi-chain assemblies. Predictions also remain static snapshots, often requiring experimental validation for functional insights.
Building on these methods, AlphaFold 3, released by DeepMind in May 2024, extends prediction to complexes of proteins with DNA, RNA, ligands, and ions using a diffusion-based architecture, achieving improved accuracy for biomolecular interactions. Additionally, ESM3, developed by EvolutionaryScale (founded by former Meta AI researchers) and released in June 2024, is a generative multimodal model that jointly reasons over protein sequence, structure, and function, simulating evolutionary processes to design novel proteins.
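The accuracy measures quoted in this section, RMSD and GDT, can be sketched in a few lines. The example below is a simplification under stated assumptions: the model and reference are lists of Cα coordinates that have already been superimposed, and the GDT-style score uses that single superposition, whereas the official GDT_TS searches over many superpositions for each distance cutoff. Function names and coordinates are hypothetical.

```python
import math


def _pairwise_distances(model, reference):
    """Per-residue distances between matched (already superimposed) coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model, reference):
    """Root-mean-square deviation in the units of the input coordinates."""
    d = _pairwise_distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts_fixed_superposition(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-like score (0-100) evaluated on one fixed superposition."""
    d = _pairwise_distances(model, reference)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]
    print("RMSD:", round(rmsd(model, reference), 3))
    print("GDT-like score:", gdt_ts_fixed_superposition(model, reference))
```

The toy case also shows why both measures are reported: a single badly placed residue inflates RMSD sharply, while the cutoff-based score degrades more gradually.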

Molecular Modeling and Simulation

Molecular modeling and simulation play a crucial role in elucidating the dynamic aspects of biomolecular structures, complementing static experimental structures by capturing conformational changes, interactions, and energetic landscapes over time. These techniques primarily employ molecular dynamics (MD) simulations, which compute the time evolution of atomic positions and velocities in a biomolecular system based on classical mechanics. By solving Newton's equations of motion for thousands to millions of atoms, MD reveals how structures fluctuate, fold, and interact at the atomic level, providing insights into processes that occur on timescales inaccessible to many experiments.
The core of MD simulations involves empirical force fields that approximate the potential energy of the system, such as AMBER and CHARMM, which parameterize bonded (bonds, angles, dihedrals) and non-bonded (van der Waals, electrostatic) interactions. These force fields enable the calculation of the force on each atom as the negative gradient of the potential energy. The dynamics are governed by Newton's second law of motion, $\mathbf{F}_i = m_i \mathbf{a}_i$, where $\mathbf{F}_i$ is the force on atom $i$, $m_i$ its mass, and $\mathbf{a}_i$ its acceleration. To propagate the system in time, these equations are discretized and numerically integrated using algorithms like the Verlet or velocity Verlet methods, typically with timesteps of 1–2 femtoseconds to maintain numerical stability and energy conservation. The first biomolecular MD simulation, performed on the bovine pancreatic trypsin inhibitor (BPTI) protein in 1977, covered less than 10 picoseconds and demonstrated atomic fluctuations consistent with experimental observations.
Standard all-atom MD simulations, which treat every atom explicitly, are generally limited to timescales of nanoseconds to microseconds by computational demands, restricting their ability to capture slower processes like large-scale conformational changes. Coarse-grained models, which represent groups of atoms as single beads, extend the accessible time and length scales by reducing the number of degrees of freedom, though at the cost of atomic detail.
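The velocity Verlet scheme mentioned above can be sketched for the simplest possible system, a single particle on a harmonic spring. This is an illustration in reduced units with hypothetical names, not a biomolecular force field: the `force` callable stands in for the negative gradient of the potential energy, and a real MD engine applies the same update to every atom in three dimensions.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in one dimension; returns a list of (x, v) pairs."""
    a = force(x) / mass                       # Newton's second law: a = F / m
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt    # position update
        a_new = force(x) / mass               # force at the new position
        v = v + 0.5 * (a + a_new) * dt        # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


def harmonic_energy(x, v, mass, k):
    """Total energy of a harmonic oscillator: kinetic plus potential."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    mass, k, dt = 1.0, 1.0, 0.01              # reduced units (assumed values)
    traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt, 1000)
    e_start = harmonic_energy(*traj[0], mass, k)
    e_end = harmonic_energy(*traj[-1], mass, k)
    print("final position:", round(traj[-1][0], 4))
    print("energy drift  :", abs(e_end - e_start))
```

The small, bounded energy drift is the property that makes this family of integrators the default choice for long MD trajectories; increasing `dt` too far makes the drift grow and eventually destabilizes the simulation, which is the origin of the femtosecond timestep limit.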
To overcome sampling limitations for rare events, enhanced sampling techniques such as umbrella sampling and metadynamics are employed; umbrella sampling applies biasing potentials along a reaction coordinate to sample a series of overlapping windows, while metadynamics deposits Gaussian hills in collective variable space to flatten the free-energy landscape and accelerate exploration.
Applications of MD simulations in biomolecular structure include probing protein folding pathways, where trajectories reveal intermediate states and transition mechanisms, and studying ligand binding, where they capture association kinetics and induced-fit adaptations. Free energy calculations, often using thermodynamic integration or free energy perturbation within MD frameworks, quantify binding affinities via the relation $\Delta G = -RT \ln K$, where $\Delta G$ is the standard free energy change, $R$ the gas constant, $T$ the absolute temperature, and $K$ the equilibrium (association) constant; these enable relative ranking of ligands for drug discovery. Advances in hardware, such as the Anton supercomputer developed by D.E. Shaw Research, enabled millisecond-scale simulations of proteins like BPTI by the early 2010s, unveiling rare events like domain motions and folding transitions that were previously inaccessible.
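As a worked example of the relation ΔG = −RT ln K given above, the sketch below converts a binding constant into a standard free energy and compares two ligands. It assumes that K is an association constant expressed relative to a 1 M standard state and that the temperature is 298.15 K unless stated otherwise; the function names are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)


def binding_free_energy(k_assoc: float, temperature: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from an association constant."""
    if k_assoc <= 0:
        raise ValueError("the equilibrium constant must be positive")
    return -GAS_CONSTANT * temperature * math.log(k_assoc)


def relative_free_energy(k_a: float, k_b: float, temperature: float = 298.15) -> float:
    """Delta-Delta-G (kJ/mol) of ligand A relative to ligand B; negative favors A."""
    return binding_free_energy(k_a, temperature) - binding_free_energy(k_b, temperature)


if __name__ == "__main__":
    # A 1 nM dissociation constant corresponds to K_assoc = 1e9 (per molar).
    print("dG for Kd = 1 nM :", round(binding_free_energy(1e9), 2), "kJ/mol")
    # Ligand A binds ten times more tightly than ligand B.
    print("ddG for 10x gain :", round(relative_free_energy(1e9, 1e8), 2), "kJ/mol")
```

Each tenfold change in affinity corresponds to about 5.7 kJ/mol at room temperature, which sets the accuracy that free energy calculations need in order to rank ligands usefully.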

Biomolecular Design

Biomolecular design involves the rational and computational creation of novel biomolecules, primarily proteins, with predefined structures and functions not found in nature. The field leverages physics-based modeling and machine learning to engineer sequences and folds for applications such as therapeutics. De novo design starts from scratch, generating entirely new backbones and sequences, while inverse folding designs sequences compatible with target backbones. These approaches have enabled the development of stable, functional proteins, with over 1,500 structurally characterized de novo designs reported by 2025.
Key approaches in de novo protein design include blueprint-based methods that assemble secondary structure elements into novel topologies, as exemplified by the Rosetta software suite. RosettaDesign, introduced in 2000, optimizes amino acid sequences for given backbones by minimizing energy under a physics-based potential, allowing the creation of proteins with atomic-level accuracy. For instance, it has been used to redesign nine natural protein folds with sequences that fold correctly and maintain stability comparable to wild-type proteins. Inverse folding addresses the complementary problem of generating sequences likely to adopt a specified 3D structure. The ProteinMPNN model, a deep learning-based inverse folding method from 2022, achieves high success rates in designing functional sequences for diverse motifs, outperforming traditional methods in both in silico and experimental validation.
Recent AI-assisted tools have accelerated de novo design, particularly diffusion models that generate protein backbones from random noise. RFdiffusion, released in 2023, fine-tunes a RoseTTAFold-derived network to produce diverse, high-fidelity structures conditioned on specifications like symmetry or binding sites, enabling the design of monomers, oligomers, and binders with experimental success rates exceeding 20% for novel folds.
Generative models like ESM3 (2024) further advance this by simulating evolutionary trajectories to create proteins with integrated sequence, structure, and function, facilitating the design of entirely novel entities such as fluorescent proteins.
Hybrid methods combine computational design with directed evolution, in which initial designs are iteratively improved through random mutagenesis and selection. Frances Arnold's pioneering work on directed evolution, recognized with the 2018 Nobel Prize in Chemistry, demonstrated the creation of enzymes with new specificities, such as variants active in organic solvents; integrating this approach with computational tools like Rosetta has yielded enzymes with catalytic efficiencies rivaling natural ones. Notable examples include computationally designed enzymes for the Kemp elimination, a benchmark reaction for proton abstraction that is not catalyzed by natural proteins. In 2008, eight de novo enzymes were created using Rosetta, achieving rate accelerations of up to 10^5-fold over the uncatalyzed reaction through theozyme-based active-site placement.
For therapeutics, de novo miniproteins, compact scaffolds of 40–60 residues, have been designed as high-affinity binders. RFdiffusion-generated miniproteins inhibit viral proteins like the MERS-CoV spike with picomolar affinity, offer advantages over antibodies in stability and manufacturability, and have advanced to preclinical testing for infectious diseases and cancer targets. These designs are often validated using molecular simulations to confirm folding and dynamics.
