Nucleic acid structure
from Wikipedia

Interactive image of nucleic acid structure (primary, secondary, tertiary, and quaternary) using DNA helices and examples from the VS ribozyme, telomerase, and the nucleosome. (PDB: ADNA, 1BNA, 4OCB, 4R4V, 1YMO, 1EQZ)

Nucleic acid structure refers to the structure of nucleic acids such as DNA and RNA. Chemically speaking, DNA and RNA are very similar. Nucleic acid structure is often divided into four different levels: primary, secondary, tertiary, and quaternary.

Primary structure

Chemical structure of DNA

Primary structure consists of a linear sequence of nucleotides linked together by phosphodiester bonds. This linear sequence makes up the primary structure of DNA or RNA. Nucleotides consist of three components:

  1. Nitrogenous base
    1. Adenine
    2. Guanine
    3. Cytosine
    4. Thymine (present in DNA only)
    5. Uracil (present in RNA only)
  2. A 5-carbon sugar, which is deoxyribose in DNA and ribose in RNA.
  3. One or more phosphate groups.[1]

The nitrogenous bases adenine and guanine are purines and form a glycosidic bond between their N9 nitrogen and the 1' -OH group of the sugar. Cytosine, thymine, and uracil are pyrimidines, so their glycosidic bonds form between the N1 nitrogen and the 1' -OH of the sugar. For both purine and pyrimidine bases, the phosphate group is attached to the sugar through an ester bond between one of its negatively charged oxygen atoms and the 5' -OH of the sugar.[2] The polarity in DNA and RNA is derived from the oxygen and nitrogen atoms in the backbone. Nucleic acids are formed when nucleotides come together through phosphodiester linkages between the 5' and 3' carbon atoms.[3]

A nucleic acid sequence is the order of nucleotides within a DNA (GACT) or RNA (GACU) molecule, represented by a series of letters. Sequences are written from the 5' to the 3' end and determine the covalent structure of the entire molecule. Two sequences are complementary when the base at each position pairs with the base at the corresponding position of the other strand, which runs in the reverse (antiparallel) direction. For example, the complement of 5'-AGCT-3' is 3'-TCGA-5'. DNA is double-stranded, containing both a sense strand and an antisense strand, so the antisense strand carries the sequence complementary to the sense strand.[4]
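The complementarity rule can be illustrated with a short Python sketch. The function names are illustrative only, and the code assumes an unambiguous DNA alphabet (A, C, G, T) written 5' to 3'.

```python
# Watson-Crick partners for an unambiguous DNA alphabet.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the partner strand position by position (it reads 3'->5')."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the partner strand rewritten in the conventional 5'->3' order."""
    return complement(seq)[::-1]


print(complement("AGCT"))           # TCGA (read 3'->5')
print(reverse_complement("AGCT"))   # AGCT (this sequence is palindromic)
print(reverse_complement("GATTC"))  # GAATC
```

Because strands are antiparallel, the partner of a sequence is usually reported as its reverse complement so that both strands are written 5' to 3'.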

Nucleic acid design can be used to create nucleic acid complexes with complicated secondary structures such as this four-arm junction. These four strands associate into this structure because it maximizes the number of correct base pairs, with As matched to Ts and Cs matched to Gs. Image from Mao, 2004.[5]

Complexes with alkali metal ions

There are three potential metal-binding groups on nucleic acids: the phosphate, sugar, and base moieties. Solid-state structures of complexes with alkali metal ions have been reviewed.[6]

Secondary structure

DNA

Secondary structure is the set of interactions between bases, i.e., which parts of the strands are bound to each other. In the DNA double helix, the two strands are held together by hydrogen bonds: the nucleotides on one strand base-pair with the nucleotides on the other strand. The secondary structure is responsible for the shape that the nucleic acid assumes. The bases in DNA are classified as purines and pyrimidines. The purines, adenine and guanine, have a double-ring structure: a six-membered ring fused to a five-membered ring, both containing nitrogen. The pyrimidines, cytosine and thymine, have a single six-membered ring containing nitrogen. A purine base always pairs with a pyrimidine base: guanine (G) pairs with cytosine (C), and adenine (A) pairs with thymine (T) or, in RNA, uracil (U). DNA's secondary structure is predominantly determined by base pairing of the two polynucleotide strands, which wrap around each other to form a double helix. Although the two strands are aligned by hydrogen bonds in base pairs, the stronger forces holding the two strands together are stacking interactions between the bases. These stacking interactions are stabilized by van der Waals forces and hydrophobic interactions, and show a large amount of local structural variability.[7] The double helix also has two grooves, called the major groove and the minor groove based on their relative size.

RNA

An example of RNA secondary structure. This image includes several structural elements: single-stranded and double-stranded areas, bulges, internal loops, and hairpin loops. Double-stranded RNA forms an A-type helical structure, unlike the common B-type conformation taken by double-stranded DNA molecules.

The secondary structure of RNA arises within a single polynucleotide. Base pairing occurs when the RNA folds so that complementary regions come together. Both single- and double-stranded regions are often found in RNA molecules.

The four basic elements in the secondary structure of RNA are:

  • Helices
  • Bulges
  • Loops
  • Junctions

The antiparallel strands form a helical shape.[3] Bulges and internal loops are formed by separation of the double helical tract on either one strand (bulge) or on both strands (internal loops) by unpaired nucleotides.

The stem-loop, or hairpin loop, is the most common element of RNA secondary structure.[8] A stem-loop forms when an RNA chain folds back on itself to form a double-helical tract called the 'stem'; the unpaired nucleotides form a single-stranded region called the 'loop'. A tetraloop is a hairpin whose loop contains four nucleotides. There are three common families of tetraloops in ribosomal RNA: UNCG, GNRA, and CUUG (N is any of the four nucleotides and R is a purine). UNCG is the most stable tetraloop.[9]

The pseudoknot is an RNA secondary structure first identified in turnip yellow mosaic virus.[10] It is minimally composed of two helical segments connected by single-stranded regions or loops. H-type pseudoknots are the best characterized. In the H-type fold, nucleotides in the hairpin loop pair with bases outside the hairpin stem, forming a second stem and loop. The result is a pseudoknot with two stems and two loops.[11] Pseudoknots are functional elements with diverse functions and are found in most classes of RNA.

The secondary structure of RNA can be predicted from experimental data on its secondary structure elements: helices, loops, and bulges. The DotKnot-PW method is used for comparative pseudoknot prediction. Its main step is scoring the similarities found in stems, secondary elements, and H-type pseudoknots.[12]
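As a rough illustration of computational secondary structure prediction, the sketch below uses Nussinov-style base-pair maximization, a classic dynamic-programming approach. It is not the DotKnot-PW method and cannot represent pseudoknots. The function name, the minimum hairpin loop length of three nucleotides, and the decision to allow G-U pairs are assumptions made for the example.

```python
# Pairs allowed in this sketch: Watson-Crick plus the G-U wobble (an assumption).
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired nucleotides enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # 3: a three-pair stem closing an AAA loop
```

Counting pairs is a crude stand-in for stability; practical predictors score structures with experimentally derived free energies instead.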

Tertiary structure

DNA structure and bases
A-B-Z-DNA Side View

Tertiary structure refers to the locations of the atoms in three-dimensional space, taking into consideration geometrical and steric constraints. It is a higher order than the secondary structure, in which large-scale folding in a linear polymer occurs and the entire chain is folded into a specific 3-dimensional shape. There are 4 areas in which the structural forms of DNA can differ.

  1. Handedness – right or left
  2. Length of the helix turn
  3. Number of base pairs per turn
  4. Difference in size between the major and minor grooves[3]

The tertiary arrangement of DNA's double helix in space includes B-DNA, A-DNA, and Z-DNA. Triple-stranded DNA structures have been demonstrated in repetitive polypurine:polypyrimidine microsatellite sequences and satellite DNA.

B-DNA is the most common form of DNA in vivo and is a narrower, more elongated helix than A-DNA. Its wide major groove makes it more accessible to proteins, while its minor groove is narrow. B-DNA is favored at high water concentrations; hydration of the minor groove appears to favor the B-form. B-DNA base pairs are nearly perpendicular to the helix axis. The sugar pucker, which determines whether the helix adopts the A-form or the B-form, is C2'-endo in B-DNA.[13]

A-DNA is a form of the DNA duplex observed under dehydrating conditions. It is shorter and wider than B-DNA. RNA adopts this double-helical form, and RNA-DNA duplexes are mostly A-form, although B-form RNA-DNA duplexes have been observed.[14] In localized single-strand dinucleotide contexts, RNA can also adopt the B-form without pairing to DNA.[15] A-DNA has a deep, narrow major groove that is not easily accessible to proteins. Its wide, shallow minor groove is accessible to proteins but has lower information content than the major groove. The A-form is favored at low water concentrations. A-DNA's base pairs are tilted relative to the helix axis and displaced from it. The sugar pucker is C3'-endo, and in RNA the 2'-OH inhibits the C2'-endo conformation.[13] Long considered little more than a laboratory artifice, A-DNA is now known to have several biological functions.

Z-DNA is a relatively rare left-handed double-helix. Given the proper sequence and superhelical tension, it can be formed in vivo but its function is unclear. It has a more narrow, more elongated helix than A or B. Z-DNA's major groove is not really a groove, and it has a narrow minor groove. The most favored conformation occurs when there are high salt concentrations. There are some base substitutions but they require an alternating purine-pyrimidine sequence. The N2-amino of G H-bonds to 5' PO, which explains the slow exchange of protons and the need for the G purine. Z-DNA base pairs are nearly perpendicular to the helix axis. Z-DNA does not contain single base-pairs but rather a GpC repeat with P-P distances varying for GpC and CpG. On the GpC stack there is good base overlap, whereas on the CpG stack there is less overlap. Z-DNA's zigzag backbone is due to the C sugar conformation compensating for G glycosidic bond conformation. The conformation of G is syn, C2'-endo; for C it is anti, C3'-endo.[13]

A linear DNA molecule with free ends can rotate to adjust to various dynamic processes in the cell by changing how many times the two chains of its double helix twist around each other. Some DNA molecules are circular and are topologically constrained. More recently, circular RNA has also been described as a natural, pervasive class of nucleic acids expressed in many organisms (see CircRNA).

A covalently closed, circular DNA (also known as cccDNA) is topologically constrained because the number of times the chains are coiled around one another cannot change. This cccDNA can be supercoiled, which is the tertiary structure of DNA. Supercoiling is characterized by the linking number, twist, and writhe. The linking number (Lk) for circular DNA is defined as the number of times one strand would have to pass through the other strand to completely separate the two strands. The linking number for circular DNA can only be changed by breaking a covalent bond in one of the two strands. Always an integer, the linking number of a cccDNA is the sum of two components: twist (Tw) and writhe (Wr).[16]

Twist is the number of times the two strands of DNA are twisted around each other. Writhe is the number of times the DNA helix crosses over itself. DNA in cells is negatively supercoiled and has a tendency to unwind. Hence, the separation of strands is easier in negatively supercoiled DNA than in relaxed DNA. The two forms of supercoiling are solenoidal and plectonemic. Plectonemic supercoils are found in prokaryotes, while solenoidal supercoiling is mostly seen in eukaryotes.
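The relation Lk = Tw + Wr can be made concrete with a small worked example. The sketch below assumes about 10.5 base pairs per helical turn for relaxed B-DNA and uses a hypothetical 5250 bp plasmid; the function names are illustrative only.

```python
def relaxed_linking_number(n_bp, bp_per_turn=10.5):
    """Linking number of relaxed circular DNA, rounded to the nearest integer."""
    return round(n_bp / bp_per_turn)


def writhe_from(lk, tw):
    """Solve Lk = Tw + Wr for the writhe."""
    return lk - tw


lk0 = relaxed_linking_number(5250)  # hypothetical 5250 bp plasmid
lk = lk0 - 25                       # 25 turns removed: negatively supercoiled
wr = writhe_from(lk, tw=lk0)        # assume twist stays at its relaxed value
print(lk0, lk, wr)                  # 500 475 -25
```

In this idealized case the whole linking deficit appears as negative writhe; in real molecules it is partitioned between twist and writhe.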

Quaternary structure

The quaternary structure of nucleic acids is similar to that of proteins. Although some of the concepts are not exactly the same, quaternary structure refers to a higher level of organization of nucleic acids, and in particular to their interactions with other molecules. The most commonly seen form of higher-level organization is chromatin, in which DNA interacts with small proteins called histones. Quaternary structure also refers to the interactions between separate RNA units in the ribosome or spliceosome.[17]

from Grokipedia
Nucleic acids are a class of large biomolecules essential to all known forms of life, serving as the primary molecules for storing and transmitting genetic information in cells and viruses. They consist of two main types: deoxyribonucleic acid (DNA), which encodes the genetic instructions for protein synthesis and is primarily found in the nucleus of eukaryotic cells, and ribonucleic acid (RNA), which plays diverse roles in protein synthesis, gene regulation, and other cellular processes. DNA and RNA are both polymers composed of repeating monomeric units called nucleotides, linked together by phosphodiester bonds to form long chains.

Each nucleotide monomer comprises three key components: a nitrogenous base, a five-carbon sugar (ribose or deoxyribose), and one or more phosphate groups. In DNA, the sugar is deoxyribose, and the four nitrogenous bases are adenine (A), thymine (T), cytosine (C), and guanine (G); in RNA, the sugar is ribose, and thymine is replaced by uracil (U), with the other bases remaining the same. The sequence of these bases along the nucleic acid chain encodes genetic information, and specific base-pairing rules (A with T, or U in RNA, and C with G) make the strands complementary.

The most iconic structural feature of DNA is its double helix, consisting of two antiparallel polynucleotide strands twisted around a common axis and stabilized by hydrogen bonds between complementary base pairs and by hydrophobic interactions in the core. This right-handed helix, with a diameter of approximately 2 nm and about 10 base pairs per helical turn (spaced 0.34 nm apart), was first proposed by James Watson and Francis Crick in 1953 based on X-ray diffraction data from Rosalind Franklin and Maurice Wilkins. In contrast, RNA is typically single-stranded and flexible, allowing it to fold into intricate secondary structures such as hairpins, loops, and stems through intramolecular base pairing, which are crucial for functional roles such as catalysis in ribozymes and molecular recognition. These structural differences underpin the distinct biological functions of DNA as a stable genetic archive and RNA as a versatile intermediary.

Basic components

Nucleobases

Nucleobases are the aromatic nitrogenous compounds that form the core informational components of nucleic acids, and specific variants distinguish DNA from RNA. They attach to the sugar-phosphate backbone via glycosidic bonds and enable sequence-specific recognition through their hydrogen-bonding patterns. The primary nucleobases are classified as purines or pyrimidines based on their ring structures, and their pKa values keep them neutral at physiological pH.

Purine nucleobases, adenine (A) and guanine (G), possess a bicyclic structure comprising a six-membered ring fused to a five-membered ring, providing an extended aromatic system and rigidity. Adenine, chemically known as 6-aminopurine, features an amino group at position 6, while guanine, or 2-amino-6-oxopurine, includes an amino group at position 2 and a keto group at position 6. Both exist predominantly in their amino-keto tautomeric forms under neutral conditions, with rare enol or imino tautomers occurring transiently and potentially influencing base-pairing fidelity. The pKa values for these purines, approximately 4.15 for adenine (protonation at N1) and 9.2 for guanine (deprotonation at N1-H), make them neutral species at pH 7, minimizing electrostatic repulsion in polymers.

Pyrimidine nucleobases, cytosine (C), thymine (T) in DNA, and uracil (U) in RNA, are characterized by a single six-membered heterocyclic ring with nitrogens at positions 1 and 3. Cytosine is 4-amino-2-oxopyrimidine, bearing an amino group at position 4 and a keto group at position 2; thymine is 5-methyl-2,4-dioxopyrimidine, with keto groups at positions 2 and 4 and a methyl substituent at position 5; uracil is 2,4-dioxopyrimidine, identical to thymine but lacking the 5-methyl group. The methyl group in thymine enhances hydrophobic interactions and stability in DNA compared with uracil in RNA, contributing to distinct evolutionary roles in genetic storage versus expression.
Their pKa values, around 4.5 for cytosine (protonation at N3), 9.7 for thymine, and 9.5 for uracil (deprotonation at N3-H), similarly favor neutral forms at physiological pH.

The hydrogen-bonding capabilities of these nucleobases dictate complementary pairing. Adenine forms two hydrogen bonds with thymine or uracil: its N6-H donor pairs with O4 of T/U, and its N1 acceptor pairs with N3-H of T/U. Guanine forms three hydrogen bonds with cytosine: its O6 acceptor pairs with N4-H of cytosine, its N1-H donor with N3, and its N2-H donor with O2. These patterns, listed below, ensure specificity and stability in nucleic acid duplexes.
Adenine-Thymine/Uracil Pair (2 H-bonds):
  • N6-H (A) ... O4 (T/U)
  • N1 (A) ... H-N3 (T/U)
Guanine-Cytosine Pair (3 H-bonds):
  • O6 (G) ... H-N4 (C)
  • N1-H (G) ... N3 (C)
  • N2-H (G) ... O2 (C)
Beyond the canonical set, rare modified nucleobases occur in specific contexts. One example is hypoxanthine (C₅H₄N₄O; 1,7-dihydropurin-6-one, a deaminated adenine derivative), present as the nucleoside inosine in the wobble position of certain tRNA anticodons, where it enables flexible codon recognition during translation.

Sugars and phosphate backbone

The sugar-phosphate backbone forms the structural scaffold of nucleic acids, consisting of alternating deoxyribose (in DNA) or ribose (in RNA) sugars linked to phosphate groups via phosphodiester bonds. In DNA, the sugar is 2-deoxy-D-ribose, a pentose lacking a hydroxyl group at the 2' carbon position, while RNA incorporates D-ribose, which includes this 2'-OH group. Both sugars exist predominantly in the β-D-furanose (five-membered ring) conformation, with the anomeric carbon (C1') linked to the nucleobase via a β-glycosidic bond, ensuring a consistent orientation in the polymer chain. This furanose form provides rigidity to the backbone while allowing rotational flexibility around the C4'-C5' and C3'-O3' bonds.

The absence of the 2'-OH in deoxyribose enhances DNA's chemical stability by preventing intramolecular nucleophilic attacks that could disrupt the phosphodiester linkages, making DNA suitable for long-term genetic storage. In contrast, ribose's 2'-OH group increases RNA's susceptibility to hydrolysis but also imparts greater conformational flexibility, particularly in single-stranded regions, enabling diverse folding motifs essential for RNA's functional roles. This structural difference influences overall polymer dynamics: RNA tends toward A-form helices, with a deep, narrow major groove, because of the 2'-OH's steric and hydrogen-bonding effects, while DNA favors the more elongated B-form.

Phosphodiester bonds are formed through condensation polymerization, in which the 5'-phosphate group of one nucleotide reacts with the 3'-OH group of another, eliminating a water molecule and creating a covalent linkage between the 5' carbon and the 3' carbon across the phosphate group. This unidirectional 5' to 3' polarity defines the orientation of nucleic acid chains, with synthesis and replication proceeding exclusively in this direction. The resulting backbone is polyanionic, as each phosphate carries a negative charge at physiological pH (pKa ≈ 1-2), necessitating counterions for charge neutralization and structural integrity.

Monovalent cations such as Na⁺ and K⁺ serve as primary counterions, coordinating directly with the negatively charged phosphate oxygens to screen electrostatic repulsion and stabilize the structure. Studies indicate that Na⁺ interacts more strongly with phosphate groups because of its higher charge density, often forming closer ion-phosphate contacts, whereas K⁺ prefers interactions with nucleobase atoms in the grooves, influencing hydration patterns and minor conformational adjustments. These ion-specific bindings are crucial for maintaining backbone structure and preventing aggregation in cellular environments.

The 2'-OH group in RNA uniquely enables base-catalyzed hydrolysis of the phosphodiester bond via a transesterification mechanism, in which the deprotonated 2'-O⁻ acts as a nucleophile and attacks the adjacent phosphorus, forming a 2',3'-cyclic phosphate intermediate and cleaving the chain. This reaction proceeds efficiently under alkaline conditions (pH > 7), with rate enhancements from general base catalysis, rendering RNA far less stable than DNA: phosphodiester bonds in DNA are approximately 100-200 times more resistant to such cleavage because the 2'-OH is missing. This inherent lability contributes to RNA's transient nature, contrasting with DNA's robustness.

Nucleotides and polymerization

Nucleotides consist of a nucleobase linked to a pentose sugar (ribose in RNA or deoxyribose in DNA) via a β-N-glycosidic bond, with one to three phosphate groups attached to the 5'-oxygen of the sugar, forming nucleoside monophosphates (NMPs), diphosphates (NDPs), or triphosphates (NTPs). The triphosphate forms are the primary substrates for nucleic acid synthesis: deoxyribonucleoside triphosphates (dNTPs) for DNA and ribonucleoside triphosphates (NTPs) for RNA, providing the energy needed for polymerization through cleavage of the high-energy phosphoanhydride bonds. In cells, NTP levels are maintained higher than NMP or NDP levels to support efficient synthesis, with kinases such as nucleoside diphosphate kinase catalyzing the transfer of phosphate from ATP to NDPs.

Nucleic acid polymerization occurs enzymatically via DNA or RNA polymerases, which catalyze the template-directed addition of nucleotides to a growing chain. DNA polymerase, first isolated by Arthur Kornberg in 1956, incorporates dNTPs complementary to a DNA template, forming a phosphodiester bond between the 3'-hydroxyl of the primer terminus and the α-phosphate of the incoming dNTP, with concomitant release of pyrophosphate (PPi) that drives the reaction forward. RNA polymerase follows a similar mechanism but initiates de novo without a primer, using NTPs to synthesize RNA complementary to a DNA template starting from a promoter sequence, also releasing PPi; this process was elucidated through studies of bacterial enzymes such as E. coli RNA polymerase. Both enzymes require a template strand to ensure base-pairing specificity, with the incoming nucleotide selected via hydrogen bonding to the template base in the active site.

Polymerization proceeds exclusively in the 5' to 3' direction: new nucleotides are added to the 3'-hydroxyl end of the chain, resulting in a linear polymer with a 5'-phosphate terminus and a 3'-hydroxyl terminus. This directionality arises from the chemical mechanism of nucleophilic attack by the 3'-OH on the α-phosphate of the incoming NTP or dNTP, which prevents 3' to 5' synthesis. The released PPi is often hydrolyzed by pyrophosphatases, shifting the equilibrium toward polymer elongation.

DNA and RNA synthesis differ in substrates and fidelity. DNA polymerases use dNTPs lacking the 2'-hydroxyl group, enabling a more stable double helix, while RNA polymerases incorporate rNTPs with the 2'-OH, which introduces greater flexibility but higher reactivity. DNA polymerases possess 3' to 5' proofreading activity, achieving error rates of approximately 10^{-7} to 10^{-9} per nucleotide, whereas RNA polymerases lack robust proofreading, resulting in higher error rates of about 10^{-4} to 10^{-5}, which is acceptable for transient RNA molecules.

Primary structure

Nucleotide sequence and composition

The primary structure of nucleic acids is defined as the precise linear sequence of nucleotides, determined by the specific order of their nitrogenous bases: adenine (A), cytosine (C), guanine (G), and thymine (T) in deoxyribonucleic acid (DNA), with uracil (U) replacing thymine in ribonucleic acid (RNA). The nucleotides are connected via phosphodiester bonds in the 5' to 3' direction. This sequence encodes genetic information and is conventionally denoted as a string of single-letter symbols, such as 5'-ATGC-3' for a short DNA segment.

The nucleotide composition, particularly the GC content (the percentage of guanine and cytosine bases), significantly affects the stability and thermal properties of the nucleic acid. In double-stranded DNA, higher GC content correlates with a higher melting temperature (Tm), the point at which half of the double helix has dissociated into single strands, because G-C pairs are held by more hydrogen bonds than A-T pairs. For oligonucleotides shorter than 20 bases under standard PCR conditions (e.g., 50 mM monovalent salt), an approximate Tm is given by the Wallace rule: T_m = 2 × (A + T) + 4 × (G + C) °C, where A, T, G, and C are the numbers of each base in the oligonucleotide.

Chargaff's rules describe the equimolar base ratios observed in most double-stranded DNA molecules: the quantity of adenine equals that of thymine (A = T), and guanine equals cytosine (G = C), as a consequence of complementary base pairing along the two strands. These parity relationships, established through biochemical analyses of DNA from various organisms, do not apply universally; exceptions occur in single-stranded DNA or certain viral genomes where base pairing is absent or incomplete.

Sequence motifs are recurring patterns within the primary structure, such as simple tandem repeats. A prominent example is the poly-A tail in eukaryotic messenger RNA (mRNA), a homopolymeric stretch of 50 to 250 adenosine residues added post-transcriptionally at the 3' end.
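A minimal Python sketch of the GC content and Wallace rule calculations described above follows. The function names and the 12-base example sequence are illustrative, and the rule itself is only a rough estimate for short oligonucleotides.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees Celsius: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


oligo = "ATGCATGCATGC"     # hypothetical 12-base oligonucleotide
print(gc_content(oligo))   # 50.0
print(wallace_tm(oligo))   # 36
```

Each G or C contributes twice as much as each A or T, which reflects the greater stability of G-C pairs.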

Chemical modifications and stability

Chemical modifications to the primary structure of nucleic acids involve the addition of functional groups to nucleobases or to the sugar-phosphate backbone, altering their chemical properties without changing the underlying sequence. In DNA, one of the most prevalent modifications is 5-methylcytosine (5mC), which occurs primarily at CpG dinucleotides and plays a central role in epigenetic regulation, influencing gene expression through transcriptional repression. This modification is catalyzed by DNA methyltransferases and is essential for processes such as genomic imprinting and X-chromosome inactivation. Another significant DNA modification is N6-methyladenine (m6A), which is widespread in bacterial genomes, where it contributes to restriction-modification systems that protect against foreign DNA invasion. In bacteria, m6A is installed by dedicated methyltransferases and helps regulate replication and repair pathways.

In RNA, over 170 distinct chemical modifications have been identified, particularly in ribosomal RNA (rRNA) and transfer RNA (tRNA), where they fine-tune structure and function. Key examples include pseudouridine (Ψ), formed by isomerization of uridine, which enhances base stacking and hydrogen-bonding stability; N6-methyladenosine (m6A), the most abundant internal modification in eukaryotic mRNA, which affects splicing, export, and translation; and 2'-O-methylation (Nm), which protects against degradation and modulates ribosome assembly. These modifications are installed enzymatically by writer proteins, such as pseudouridine synthases (PUS enzymes), which catalyze formation of the C-C glycosidic bond in Ψ without requiring cofactors. For instance, families such as TruA and TruB in bacteria and Pus1-Pus10 in eukaryotes target specific sites in tRNA and rRNA.

These modifications significantly affect nucleic acid stability by conferring resistance to enzymatic degradation and modulating helical properties. In therapeutic applications, such as small interfering RNA (siRNA), incorporating 2'-fluoro substitutions at the 2' position of the sugar enhances nuclease resistance, allowing prolonged activity while maintaining efficacy. Similarly, 5mC in DNA reduces backbone flexibility, increasing helix rigidity and protecting against hydrolytic cleavage. In RNA, Ψ and Nm stabilize secondary structures by improving thermodynamic stability and shielding against endonucleases such as RNase A.

Detection of these modifications relies on specialized techniques that preserve and identify the altered bases. For 5mC in DNA, bisulfite sequencing is a cornerstone method: it converts unmethylated cytosines to uracils via sulfonation and deamination, while 5mC remains resistant, so the two can be distinguished by subsequent PCR amplification and sequencing. This approach provides genome-wide mapping but requires careful optimization to minimize DNA fragmentation. Base composition, particularly CpG density, influences the prevalence of modifiable sites such as those for 5mC.
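The logic of bisulfite-based detection can be sketched as a toy simulation. It assumes complete conversion, a single strand, and that uracil is read as T after PCR amplification; the function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate the sequence read after bisulfite treatment and PCR.

    Unmethylated C becomes U, which is read as T; 5mC is still read as C.
    """
    read = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            read.append("T")
        else:
            read.append(base)
    return "".join(read)


def call_methylation(reference, read):
    """Return positions where the reference C is still read as C (inferred 5mC)."""
    return [i for i, (ref, obs) in enumerate(zip(reference, read))
            if ref == "C" and obs == "C"]


reference = "ACGTCGAC"                    # hypothetical fragment
read = bisulfite_convert(reference, {1})  # only the C at index 1 is methylated
print(read)                               # ACGTTGAT
print(call_methylation(reference, read))  # [1]
```

Comparing the converted read with the untreated reference is what lets sequencing distinguish methylated from unmethylated cytosines.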

Secondary structure

Base pairing rules and hydrogen bonding

In nucleic acids, base pairing refers to the specific association between nucleobases that stabilizes secondary structures through hydrogen bonding. In DNA, the canonical Watson-Crick base-pairing rules dictate that adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). These pairings occur between a purine on one strand and a pyrimidine on the complementary strand, ensuring geometric uniformity in the double helix. The specificity arises from complementary donor and acceptor sites on the bases, which form precise interactions: the A-T pair involves two hydrogen bonds, while the G-C pair forms three, contributing to greater stability in G-C rich regions.

The hydrogen bonds in these pairs are typically of the N-H···O or N-H···N type and involve the Watson-Crick faces of the bases. For A-T, one bond forms between N3-H of thymine (donor) and N1 of adenine (acceptor), and the second between the amino group at C6 of adenine (donor) and the carbonyl at C4 of thymine (acceptor). In G-C pairing, the three bonds are: N1-H of guanine (donor) to N3 of cytosine (acceptor), the amino group at C2 of guanine (donor) to the carbonyl at C2 of cytosine (acceptor), and the amino group at C4 of cytosine (donor) to the carbonyl at C6 of guanine (acceptor). These interactions not only dictate pairing fidelity but also influence melting temperatures, with the additional hydrogen bond in a G-C pair increasing duplex stability by approximately 1-2 kcal/mol relative to an A-T pair.

In RNA, the base-pairing rules are analogous but substitute uracil (U) for thymine, forming A-U pairs with two hydrogen bonds (N1 of adenine to N3-H of uracil, and the amino group at C6 of adenine to the carbonyl at C4 of uracil) and retaining G-C pairs with three. RNA often adopts single-stranded conformations with intramolecular base pairing to form stems in hairpin loops or other motifs, where hydrogen-bonding patterns remain similar but allow for greater flexibility. The G-C pair's stronger bonding (due to the third hydrogen bond) promotes more stable RNA duplexes, as evidenced by higher thermal denaturation temperatures in GC-rich sequences.

Beyond strict Watson-Crick pairing, non-canonical interactions such as wobble base pairs occur, particularly in RNA. Proposed by Francis Crick, the wobble hypothesis describes relaxed base pairing at the third position of codons during translation, allowing a single tRNA to recognize multiple synonymous codons. For instance, guanine in the anticodon can pair with either cytosine or uracil in the mRNA; the G-U pair forms through two hydrogen bonds, shifting the geometry to accommodate the "wobble" without disrupting overall specificity. G-U wobble pairs, common in RNA structures, feature two hydrogen bonds (N1 of G to O2 of U, and O6 of G to N3 of U) and introduce functional diversity, influencing decoding accuracy and structural dynamics. These wobble interactions are weaker than canonical pairs but essential for the degeneracy of the genetic code, reducing the required number of tRNAs from 61 to about 40.
Base Pair | Molecule | Hydrogen Bonds | Key Donors/Acceptors
A-T | DNA | 2 | A(N1)-T(N3); A(N6)-T(O4)
G-C | DNA/RNA | 3 | G(N1)-C(N3); G(N2)-C(O2); G(O6)-C(N4)
A-U | RNA | 2 | A(N1)-U(N3); A(N6)-U(O4)
G-U (wobble) | RNA | 2 | G(N1)-U(O2); G(O6)-U(N3)

This table summarizes the primary base pairs, highlighting the hydrogen bonding patterns that underpin nucleic acid structure.
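The hydrogen bond counts in the table can be applied to a whole duplex with a short sketch. The function name is illustrative, and the code assumes the two strands are given already aligned, with the first strand written 5' to 3' and its partner written 3' to 5'.

```python
# Hydrogen bonds per pair, taken from the table above.
H_BONDS = {
    frozenset("AT"): 2,
    frozenset("AU"): 2,
    frozenset("GC"): 3,
    frozenset("GU"): 2,  # wobble pair
}


def duplex_h_bonds(strand, partner):
    """Count hydrogen bonds between two aligned strands of equal length.

    Positions that do not form one of the listed pairs contribute zero.
    """
    if len(strand) != len(partner):
        raise ValueError("strands must have the same length")
    return sum(H_BONDS.get(frozenset((a, b)), 0) for a, b in zip(strand, partner))


print(duplex_h_bonds("AGCT", "TCGA"))  # 10: two A-T pairs and two G-C pairs
```

The count is only a crude proxy for duplex stability, since base stacking also contributes substantially.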

DNA double helix configurations

The DNA double helix, formed through complementary base pairing between adenine-thymine and guanine-cytosine, occurs in several distinct configurations that influence its overall geometry and biological function. These variants arise primarily from differences in backbone conformation, base stacking, and hydration levels, with the B-form representing the predominant structure under physiological conditions.

B-DNA is a right-handed helix characterized by a smooth, elongated structure with approximately 10.5 base pairs per turn in solution (10 in the classic fiber model), a rise of 0.34 nm per base pair, a helical pitch of about 3.4 to 3.6 nm, and a twist of about 34° to 36° between adjacent base pairs. This configuration features distinct major and minor grooves, which facilitate interactions with proteins for processes such as replication and transcription. The structure was first proposed by Watson and Crick based on X-ray diffraction data, with refined parameters derived from later structural studies.

A-DNA, also right-handed but shorter and wider than B-DNA, adopts a more compact form with 11 base pairs per turn, a pitch of about 2.8 nm, a rise of 0.23 nm per base pair, and a twist of approximately 33°. In this conformation, the base pairs are tilted relative to the helix axis, resulting in a deep, narrow major groove and a wide, shallow minor groove. A-DNA is favored under low-humidity conditions, such as in dehydrated fibers, and is commonly observed in DNA-RNA hybrids.

Z-DNA is a left-handed helix with a zig-zag backbone, accommodating 12 base pairs per turn, a pitch of 4.5 nm, a rise of 0.37 nm per base pair, and a twist of -30°. Unlike the right-handed forms, Z-DNA has a single deep groove and no distinct major/minor distinction, with syn glycosidic conformations for purines and anti for pyrimidines. This form is stabilized in sequences rich in alternating purine-pyrimidine tracts, particularly GC repeats, and was first identified through crystallographic analysis of synthetic DNA oligomers.

The prevalence of these helical forms is modulated by environmental factors, including hydration, ionic strength, and sequence composition. B-DNA predominates in aqueous, physiological environments with moderate salt concentrations (e.g., ~150 mM NaCl), while A-DNA emerges at relative humidities below 75% or in the presence of alcohols that reduce water activity. Z-DNA formation is promoted by high salt concentrations (e.g., >2 M NaCl) or multivalent cations such as Mg²⁺, which screen electrostatic repulsions, and is further enhanced in negatively supercoiled contexts or by specific protein binding, though the latter influences are secondary to ionic effects. Sequence also plays a key role: AT-rich regions favor B-DNA stability through optimal stacking, and GC-rich segments predispose to Z-DNA via favorable syn-anti alternations. pH variations can induce transitions, with acidic conditions occasionally stabilizing A-like forms by protonating bases and altering hydrogen bonding.
| Helix type | Handedness | Base pairs per turn | Pitch (nm) | Rise per base pair (nm) | Twist angle (°) | Key features |
|---|---|---|---|---|---|---|
| B-DNA | Right | 10.5 | 3.4 | 0.34 | 36 | Major/minor grooves; physiological form |
| A-DNA | Right | 11 | 2.8 | 0.23 | 33 | Tilted bases; low humidity |
| Z-DNA | Left | 12 | 4.5 | 0.37 | -30 | Zig-zag backbone; high salt/GC-rich |
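
The helical parameters listed above are geometrically linked: for an idealized regular helix, the pitch is the number of base pairs per turn multiplied by the rise, and the average twist is 360° divided by the number of base pairs per turn. The following minimal Python sketch recomputes pitch and twist from the tabulated base pairs per turn and rise. The function and dictionary names are illustrative assumptions, and the idealized relations are a simplification. The tabulated values are rounded nominal figures, so the derived numbers differ slightly from them (for example, a twist of 36° corresponds to exactly 10 base pairs per turn, while 10.5 base pairs per turn gives about 34.3°).

```python
# Nominal helix parameters copied from the table above.
# handedness: +1 for right-handed, -1 for left-handed (sign convention assumed here).
HELICES = {
    "B-DNA": {"bp_per_turn": 10.5, "rise_nm": 0.34, "handedness": +1},
    "A-DNA": {"bp_per_turn": 11.0, "rise_nm": 0.23, "handedness": +1},
    "Z-DNA": {"bp_per_turn": 12.0, "rise_nm": 0.37, "handedness": -1},
}


def derived_geometry(bp_per_turn, rise_nm, handedness):
    """Return (pitch in nm, average twist in degrees) for an idealized regular helix."""
    pitch_nm = bp_per_turn * rise_nm              # axial length of one full turn
    twist_deg = handedness * 360.0 / bp_per_turn  # average rotation per base-pair step
    return pitch_nm, twist_deg


def contour_length_nm(n_bp, rise_nm):
    """Approximate axial length of a duplex of n_bp base pairs."""
    return n_bp * rise_nm


if __name__ == "__main__":
    for name, p in HELICES.items():
        pitch, twist = derived_geometry(p["bp_per_turn"], p["rise_nm"], p["handedness"])
        print(f"{name}: pitch = {pitch:.2f} nm, twist = {twist:.1f} deg")
```

Comparing the derived and tabulated values is a quick consistency check when helix parameters are collected from different sources.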

RNA folding motifs

RNA folding motifs are local secondary structural elements that arise from base pairing within single-stranded RNA molecules, enabling diverse functions such as catalysis, regulation, and molecular recognition. Unlike the continuous double helix of DNA, these motifs feature discontinuous helical regions interrupted by unpaired nucleotides, forming compact architectures stabilized by hydrogen bonding and stacking interactions.

The most fundamental motif is the stem-loop, consisting of a double-stranded helical stem formed by complementary base pairing and an unpaired loop of 4-7 nucleotides at the apex. Stem-loops can be further diversified by bulges and internal loops, where unpaired nucleotides protrude from one or both strands, respectively, disrupting the continuity of the helix and introducing flexibility or binding sites. For instance, bulge loops with a single unpaired nucleotide on one strand facilitate sharp turns in the backbone, while internal loops with unpaired residues on both strands allow for asymmetric expansions that accommodate tertiary contacts. Hairpins represent a specific class of stem-loops in which the loop size and sequence confer exceptional stability, particularly tetraloops (four-nucleotide loops) with consensus sequences like GNRA, which exhibit enhanced thermodynamic stability due to non-canonical base interactions and stacking. These tetraloops are among the most stable loop configurations, with free energy contributions up to 4-6 kcal/mol more favorable than larger loops, as determined from optical melting studies. Magnesium ions (Mg²⁺) further stabilize these motifs by bridging negatively charged phosphate groups in loops and stems, reducing electrostatic repulsion and promoting compact folding, especially in bulged or internal regions.

Beyond simple stem-loops, pseudoknots form when a single-stranded region base-pairs with a complementary sequence outside an existing stem, creating interleaved helices that cross over like a knot and often enhance mechanical rigidity or signaling. Kissing loops occur when the apical loops of two separate hairpins interact via complementary base pairing, forming transient or stable contacts that mediate dimerization or regulatory switching. In transfer RNA (tRNA), the cloverleaf secondary structure exemplifies the integration of multiple stem-loops, including the acceptor stem, D-arm, anticodon arm, and T-arm, which collectively form four helical arms connected by loops for amino acid attachment and codon recognition. Riboswitches, regulatory RNA elements in bacterial mRNAs, frequently incorporate pseudoknots and kissing loops alongside stems to sense metabolites such as thiamine pyrophosphate or guanine, undergoing conformational changes that control gene expression.

Computational tools such as mfold and the ViennaRNA package enable prediction and identification of these motifs by minimizing free energy using dynamic programming algorithms that account for base-pairing rules, loop penalties, and stacking energies. Mfold, developed by Zuker, computes optimal and suboptimal foldings for sequences up to several hundred nucleotides, while ViennaRNA extends this with features such as partition-function calculations and covariation-aware consensus structure prediction from alignments.
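
To illustrate the dynamic programming idea behind such predictors, the sketch below implements the classic Nussinov algorithm, which maximizes the number of nested base pairs. This is a deliberately simplified stand-in and not the energy model used by mfold or ViennaRNA: it ignores stacking energies and loop penalties, cannot represent pseudoknots, and the function names, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum loop size of three unpaired nucleotides are assumptions made for illustration. The output uses dot-bracket notation, in which matched parentheses mark paired positions and dots mark unpaired ones.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in PAIRS


def nussinov(seq, min_loop=3):
    """Return a dot-bracket structure with the maximum number of nested pairs."""
    n = len(seq)
    if n == 0:
        return ""
    # dp[i][j] = maximum number of pairs in seq[i..j]; short spans stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal set of pairs from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAACCC"))  # (((....)))
```

For the toy sequence GGGAAAACCC the result is a three-base-pair stem closed by a four-nucleotide loop, the simplest instance of the stem-loop motif described above. Production tools replace the pair count with experimentally derived free-energy parameters.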

Tertiary structure

DNA supercoiling and topology

DNA supercoiling is a key aspect of the tertiary structure of closed circular DNA molecules, in which the double helix undergoes additional coiling beyond its intrinsic helical twist to achieve compaction and facilitate biological processes. In covalently closed circular DNA, such as bacterial plasmids or viral genomes, the topology is invariant unless a strand is broken and resealed, leading to superhelical tension that influences DNA accessibility and function. The topological state of supercoiled DNA is quantified by the linking number (Lk), defined as the sum of the twist (Tw), which measures the helical turns of the two strands around each other, and the writhe (Wr), which captures the coiling of the helical axis in space:
$$\mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr}$$
This relationship, established through mathematical analysis of ribbon topology, holds for any closed DNA duplex. The relaxed linking number (Lk₀) corresponds to the state without supercoiling; for B-form DNA with about 10.5 base pairs per turn, Lk₀ is approximately the number of base pairs divided by 10.5. Supercoiling arises when Lk deviates from Lk₀, quantified by ΔLk = Lk - Lk₀; negative ΔLk indicates underwinding (negative supercoiling), while positive ΔLk indicates overwinding (positive supercoiling). Negative supercoiling predominates in most cells, promoting DNA unwinding for processes like replication and transcription, whereas positive supercoiling can accumulate ahead of progressing polymerases.
To manage superhelical tension, cells employ DNA topoisomerases, enzymes that transiently break and rejoin DNA strands to alter Lk. Type I topoisomerases, such as the Escherichia coli ω protein (topoisomerase I) discovered in 1971, relax supercoils by nicking one strand, changing Lk in steps of ±1 without requiring ATP; the bacterial enzymes preferentially relieve negative supercoils. Type II topoisomerases, including bacterial DNA gyrase, act on both strands, altering Lk in steps of ±2 and generally requiring ATP; while most type II enzymes relax supercoils of either sign, gyrase uniquely introduces negative supercoils using ATP hydrolysis, counteracting the positive supercoils generated during transcription. In bacteria, gyrase maintains an overall negative superhelical density (σ ≈ -0.06) essential for chromosomal compaction within the confined space of the cell.

Supercoiled DNA adopts distinct three-dimensional configurations to partition the writhe component. Plectonemic supercoils form right-handed interwound structures in which the DNA axis coils around itself; they are typical of unconstrained bacterial DNA and allow dynamic partitioning between twist and writhe. In contrast, toroidal (or solenoidal) supercoils involve the DNA wrapping left-handedly around a core, as seen in eukaryotic nucleosomes, where approximately 147 base pairs wrap 1.65–1.7 turns around the histone octamer, contributing about -1 supercoil per nucleosome. This wrapping constrains negative supercoils, reducing free writhe and aiding chromatin compaction. In bacteria, negative supercoiling driven by gyrase compacts the genome by favoring plectonemic structures and branched domains, and is also linked to replication, by relieving torsional stress at forks, and to transcription, by enhancing promoter opening and RNA polymerase progression.
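
The quantities above can be worked through numerically. The sketch below assumes the standard definition of superhelical density, σ = ΔLk / Lk₀, which the text uses but does not spell out, and takes Lk₀ as the number of base pairs divided by 10.5. The 4,200 bp plasmid and all function names are hypothetical, chosen only so that the arithmetic comes out in round numbers.

```python
import math


def relaxed_linking_number(n_bp, bp_per_turn=10.5):
    """Lk0 for relaxed B-form DNA of n_bp base pairs."""
    return n_bp / bp_per_turn


def delta_lk(lk, n_bp, bp_per_turn=10.5):
    """Linking difference: negative means underwound, positive means overwound."""
    return lk - relaxed_linking_number(n_bp, bp_per_turn)


def superhelical_density(lk, n_bp, bp_per_turn=10.5):
    """sigma = (Lk - Lk0) / Lk0 (standard definition, assumed here)."""
    lk0 = relaxed_linking_number(n_bp, bp_per_turn)
    return (lk - lk0) / lk0


def writhe(lk, tw):
    """Writhe implied by Lk = Tw + Wr for a given twist."""
    return lk - tw


def catalytic_steps(d_lk, step):
    """Minimum number of topoisomerase events needed to change Lk by d_lk.

    step is 1 for type I enzymes and 2 for type II enzymes such as gyrase.
    """
    return math.ceil(abs(d_lk) / step)


if __name__ == "__main__":
    n_bp = 4200  # hypothetical plasmid size
    lk = 376     # hypothetical linking number of the supercoiled plasmid
    print(relaxed_linking_number(n_bp))                # 400.0
    print(delta_lk(lk, n_bp))                          # -24.0
    print(round(superhelical_density(lk, n_bp), 3))    # -0.06
    print(catalytic_steps(delta_lk(lk, n_bp), step=2)) # 12
```

In this example a density of -0.06 corresponds to 24 missing helical turns, which gyrase could introduce in no fewer than 12 strand-passage cycles, whereas a type I enzyme would need at least 24 events to remove them.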

RNA three-dimensional domains

RNA molecules fold into compact three-dimensional structures that integrate secondary structural motifs, such as helices and loops, to form functional tertiary domains essential for biological roles like catalysis and molecular recognition. These tertiary folds are stabilized by long-range interactions, including base triples, coaxial stacking of helices, and coordination with metal ions like Mg²⁺, which neutralize phosphate repulsions and position catalytic groups. Unlike the more rigid DNA double helix, RNA's single-stranded nature allows diverse architectures, often involving non-canonical base pairs and pseudoknots that pack secondary elements into globular shapes.

A quintessential example of RNA tertiary structure is transfer RNA (tRNA), which adopts a conserved L-shaped fold with each arm approximately 7 nm long and about 2 nm thick. This architecture arises from the coaxial stacking of helices into two roughly perpendicular domains: the acceptor stem and T-arm form one arm of the L, while the D-arm and anticodon arm stack to form the other, with the anticodon loop at one end and the 3' CCA acceptor site at the opposite tip. The fold is further stabilized by tertiary interactions, such as hydrogen bonds between the D-loop and T-loop, and by Mg²⁺ ions that bridge phosphates in the core, enabling tRNA's role in translation by positioning the anticodon for mRNA decoding and the acceptor end for aminoacylation. The crystal structure of yeast tRNA^Phe, determined at 1.93 Å resolution, revealed these features, highlighting the universal tertiary organization across tRNAs despite sequence variability.

Ribozymes exemplify how RNA tertiary domains create active sites for catalysis, mimicking protein enzymes. The hammerhead ribozyme, a small self-cleaving motif found in viroids and satellite RNAs, folds into a Y-shaped structure with three helices flanking a conserved core of 13-15 nucleotides that positions the scissile phosphate for in-line attack by the adjacent 2'-OH. Crystal structures of the hammerhead in pre-cleavage and post-cleavage states show a compact geometry in which a sheared G-A pair and a uridine turn organize catalytic residues, enabling site-specific cleavage with rate enhancements up to 10^6-fold over uncatalyzed cleavage. Similarly, group I introns, larger ribozymes (250-500 nucleotides) found in organelles and bacteria, fold into a barrel-like tertiary domain with two stacked helical domains and peripheral elements that form a guanosine-binding site adjacent to the 5' splice site. This architecture facilitates two transesterification steps for self-splicing, with the active-site geometry involving a U-G wobble pair and coordinated Mg²⁺ ions that activate the nucleophile; crystallographic studies of the Azoarcus group I intron at 3.1 Å resolution confirmed these interactions, underscoring evolutionary conservation.

Tertiary packing in RNA often relies on modular interactions between secondary motifs, such as tetraloop-receptor pairings and kissing loops, which enhance stability and specificity. GNRA-type tetraloops (e.g., GAGA) bind to internal receptor loops via non-canonical base pairs and hydrogen bonds, as seen in the P4-P6 domain of the group I intron, where the tetraloop docks into its receptor to rigidify the fold and boost folding rates by orders of magnitude. Kissing loops, formed by Watson-Crick pairing between complementary hairpin loops, mediate intermolecular or intramolecular contacts; for instance, in ribosomal RNAs these loops pack helices coaxially, contributing to domain assembly, while in viral RNAs such as the HIV-1 dimerization initiation site (DIS) they initiate dimerization with dissociation constants in the nanomolar range. These interactions, first structurally characterized at 2.8 Å resolution, exemplify how RNA tertiary domains achieve architectural modularity.

Predicting RNA tertiary structures remains challenging because of the rugged energy landscape, with multiple folding minima, and the complexity of the electrostatics, but computational tools have advanced significantly. Early methods like Rosetta's FARFAR protocol and molecular dynamics (MD) simulations achieved median RMSDs of 4-6 Å for blind predictions of systems up to 100 nucleotides. More recent AI-based approaches, such as AlphaFold3 (2024), have been reported to improve accuracy for RNA tertiary structures, often achieving RMSDs below 3 Å for diverse domains by integrating multimodal data, as shown in benchmarks through 2025. These advances facilitate de novo modeling of complex RNA architectures, though challenges in modeling long-range interactions persist.
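
The accuracy figures quoted above are root-mean-square deviations (RMSD) between predicted and experimental atomic coordinates. As a minimal sketch of the metric itself, the function below computes RMSD for two equal-length lists of (x, y, z) coordinates in Å. It assumes the two structures have already been optimally superimposed and that atoms are listed in matching order; real benchmarks first perform a least-squares superposition (for example with the Kabsch algorithm), which is omitted here. The function name and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superimposed coordinate sets with matching atom order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom of the "model" is displaced by 3 A along x.
    reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
    model = [(3.0, 0.0, 0.0), (8.0, 0.0, 0.0), (8.0, 5.0, 0.0)]
    print(rmsd(reference, model))  # 3.0
```

Because RMSD averages squared deviations over all atoms, a few badly placed long-range contacts can dominate the value even when local helices are modeled well.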

Quaternary structure

Nucleic acid multimers and assemblies

Nucleic acid multimers and assemblies are quaternary structures in which multiple strands of DNA or RNA interact to form complex, non-covalent architectures beyond simple duplexes, often playing roles in genetic recombination, gene regulation, and viral packaging. These structures exhibit higher-order symmetry and branching and are stabilized by base stacking, hydrogen bonding, and metal ion coordination, without the involvement of proteins. Examples include four-stranded DNA junctions and stacked guanine tetrads, which can adopt dynamic conformations influenced by sequence and environmental factors.

In DNA, Holliday junctions represent a canonical four-stranded multimer formed during homologous recombination, consisting of two homologous duplexes connected at a crossover point to create a branched, X-shaped topology. This structure arises from reciprocal strand exchange between aligned DNA molecules, enabling branch migration and resolution to facilitate genetic exchange or repair. Crystal structures reveal that Holliday junctions adopt a stacked conformation with two continuous helices, in which the junction angle is approximately 60 degrees, promoting stability through coaxial stacking of base pairs. Beyond recombination, similar four-way junctions occur in branched DNA intermediates, such as those resolved by endonucleases during DNA processing.

G-quadruplexes exemplify another DNA multimer, formed in guanine-rich sequences by the stacking of multiple G-tetrads, planar quartets of four guanine bases linked via Hoogsteen hydrogen bonding. Each G-tetrad is stabilized by eight hydrogen bonds (four N1–H···O6 and four N2–H···N7 interactions) and by a central monovalent cation, typically K⁺, which coordinates the electronegative O6 atoms to enhance folding and thermal stability. These structures are prevalent in telomeric repeats, where they protect chromosome ends from degradation, and in promoter regions of oncogenes like c-MYC, where they regulate transcription by impeding polymerase progression. Parallel or antiparallel strand topologies yield propeller-like or basket-like folds, with stability increasing with additional stacked tetrads (up to four in a quadruplex).

Branched DNA structures, including Y-junctions and cruciforms, further illustrate multimeric assemblies. Y-junctions feature three duplex arms meeting at a central branch point, mimicking replication forks or repair intermediates, with the junction stabilized by continuous base stacking across arms. Four-way junctions, akin to Holliday structures, can form transiently in synthetic or palindromic sequences, exhibiting open or stacked states depending on ion conditions. Cruciforms emerge from inverted-repeat sequences under negative supercoiling, extruding two hairpin arms from a central four-way junction, which facilitates site-specific recognition in regulatory contexts.

In RNA, multimers often involve symmetric dimerization critical for viral genomes. Kissing loops mediate dimerization in retroviral RNAs, such as that of HIV-1, where complementary loop sequences from two stem-loops base-pair over 6–8 nucleotides, forming an initial symmetric complex that converts into an extended duplex for packaging. This kissing interaction is highly specific, with stability enhanced by coaxial stacking of adjacent stems, and is conserved across retroviruses to ensure dimer-selective encapsidation. Similarly, small interfering RNAs (siRNAs) form stable duplexes of 19–21 base pairs with 2-nucleotide 3' overhangs, adopting A-form helices that guide RNA interference; their thermodynamic asymmetry, with weaker pairing at one 5' end, facilitates loading into Argonaute proteins, though the duplex itself is a nucleic acid-only assembly prior to complexation.

The stability of these multimers is profoundly influenced by cations. Monovalent ions like K⁺ are essential for G-quadruplexes, occupying the central channel and increasing melting temperatures by up to 20–30°C compared to Na⁺, owing to a better-matched ionic radius. In contrast, divalent Mg²⁺ ions preferentially stabilize Holliday junctions and branched structures by binding the phosphate backbone near the branch point, screening electrostatic repulsion and promoting the stacked X-conformation, with dissociation constants in the micromolar range. For kissing loops and siRNA duplexes, Mg²⁺ further enhances loop-loop interactions by reducing charge repulsion, though monovalent ions suffice for basic duplex stability.
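
Guanine-rich tracts that might fold into G-quadruplexes are commonly located with a simple sequence heuristic: four runs of at least three guanines separated by loops of one to seven nucleotides. This pattern is not stated in the text above; it is a widely used screening rule adopted here as an assumption, and a match indicates only a candidate, since actual folding depends on cations, strand context, and competing structures. The sketch below applies the rule with Python's standard `re` module; the function and constant names are hypothetical.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 nucleotides.
# This is a screening heuristic, not a prediction of folding.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_putative_g4(seq):
    """Return (start, end, matched_sequence) for non-overlapping candidate motifs.

    Coordinates are 0-based with an exclusive end, following Python slicing.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    telomeric = "TTAGGG" * 4  # four copies of the human telomeric repeat
    for start, end, motif in find_putative_g4(telomeric):
        print(start, end, motif)  # 3 24 GGGTTAGGGTTAGGGTTAGGG
```

Only the given strand is scanned; a fuller search would also scan the reverse complement, where the same motif appears as runs of cytosine.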

Interactions with proteins and other molecules

Nucleic acids frequently form quaternary complexes with proteins, enabling essential cellular processes such as DNA packaging, RNA processing, and gene regulation. In DNA, the nucleosome serves as the primary unit of chromatin organization: approximately 147 base pairs of DNA wrap around a histone octamer composed of two copies each of H2A, H2B, H3, and H4, forming about 1.65 left-handed superhelical turns that compact the genome while still allowing access for regulatory proteins. This wrapping introduces negative supercoiling, which is modulated by histone interactions to influence chromatin dynamics. Beyond individual nucleosomes, chromatin adopts higher-order folding structures, such as the 30 nm fiber, in which arrays of nucleosomes stack via histone H1-mediated interactions, further condensing DNA into looped domains that partition the genome into functional compartments affecting transcription and replication. These protein-DNA assemblies ensure genomic stability while allowing dynamic remodeling by chromatin-modifying enzymes.

In RNA, protein interactions are exemplified by ribonucleoprotein complexes like the ribosome, a large assembly critical for protein synthesis. The bacterial ribosome consists of a small 30S subunit (containing 16S rRNA and 21 proteins) and a large 50S subunit (with 23S and 5S rRNAs and 34 proteins), which together form the 70S ribosome; the rRNAs provide the catalytic core for peptide bond formation, while proteins stabilize the structure and facilitate mRNA decoding and tRNA binding. Similarly, the spliceosome, responsible for removing introns from pre-mRNA, is a dynamic ribonucleoprotein machine composed of five small nuclear RNPs (snRNPs: U1, U2, U4, U5, U6), each harboring an snRNA bound to Sm and other proteins, totaling over 300 protein components that assemble stepwise on the substrate to execute splicing catalysis. These RNA-protein structures highlight how proteins scaffold RNA folding and enhance functional specificity.

Additional interactions involve transcription factors binding to specific DNA motifs, often recognizing sequences through insertion of alpha-helices into the major groove for sequence-specific contacts with base edges and the backbone, as seen in the helix-turn-helix motifs of homeodomain proteins. In RNA, non-protein ligands can bind structured domains, as in riboswitches, where the thiamine pyrophosphate (TPP) riboswitch folds into a three-helix junction that clamps the ligand via hydrogen bonding and base stacking, regulating gene expression allosterically without protein intermediaries. Binding modes differ between DNA and RNA: DNA-protein recognition primarily exploits the wider major groove for direct readout of base pairs, enabling high specificity, whereas RNA-protein interfaces rely more on shape complementarity, with proteins molding to RNA's irregular three-dimensional surfaces via electrostatic and van der Waals interactions to accommodate diverse folds.
