Origin of replication
from Wikipedia
Models for bacterial (A) and eukaryotic (B) DNA replication initiation. A) Circular bacterial chromosomes contain a cis-acting element, the replicator, that is located at or near replication origins. i) The replicator recruits initiator proteins in a DNA sequence-specific manner, which results in melting of the DNA helix and loading of the replicative helicase onto each of the single DNA strands (ii). iii) Assembled replisomes bidirectionally replicate DNA to yield two copies of the bacterial chromosome. B) Linear eukaryotic chromosomes contain many replication origins. Initiator binding (i) facilitates replicative helicase loading (ii) onto duplex DNA to license origins. iii) A subset of loaded helicases is activated for replisome assembly. Replication proceeds bidirectionally from origins and terminates when replication forks from adjacent active origins meet (iv).

The origin of replication (also called the replication origin) is a particular sequence in a genome at which replication is initiated.[1] Propagation of the genetic material between generations requires timely and accurate duplication of DNA by semiconservative replication prior to cell division to ensure each daughter cell receives the full complement of chromosomes.[2] This can involve either the replication of DNA in living organisms such as prokaryotes and eukaryotes, or that of DNA or RNA in viruses, such as double-stranded RNA viruses.[3] Synthesis of daughter strands starts at discrete sites, termed replication origins, and proceeds in a bidirectional manner until all genomic DNA is replicated. Despite the fundamental nature of these events, organisms have evolved surprisingly divergent strategies that control replication onset.[2] Although the organization and recognition of replication origins vary from species to species, some common characteristics are shared.

Features

A key prerequisite for DNA replication is that it must occur with extremely high fidelity and efficiency exactly once per cell cycle to prevent the accumulation of genetic alterations with potentially deleterious consequences for cell survival and organismal viability.[4] Incomplete, erroneous, or untimely DNA replication events can give rise to mutations, chromosomal polyploidy or aneuploidy, and gene copy number variations, each of which in turn can lead to diseases, including cancer.[5][6] To ensure complete and accurate duplication of the entire genome and the correct flow of genetic information to progeny cells, all DNA replication events are not only tightly regulated with cell cycle cues but are also coordinated with other cellular events such as transcription and DNA repair.[2][7][8][9] Additionally, origin sequences in all kingdoms commonly have a high AT content, because stretches of adenine and thymine are easier to separate: their base-stacking interactions are weaker than those of guanine and cytosine.[10]
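As a rough illustration of why AT-rich unwinding regions stand out in sequence, the sketch below scans a DNA string for its most AT-rich window. The function names and the window length are hypothetical choices for illustration; real origin prediction combines AT content with other signals (such as initiator binding sites), so this is a toy, not an origin finder.

```python
from __future__ import annotations


def at_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or T (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def most_at_rich_window(seq: str, window: int) -> tuple[int, float]:
    """Return (start, AT fraction) of the first most AT-rich window of the given length."""
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # Count A/T in the first window, then slide one base at a time,
    # adding the base that enters and removing the base that leaves.
    count = sum(base in "AT" for base in seq[:window])
    best_start, best_count = 0, count
    for start in range(1, len(seq) - window + 1):
        count += (seq[start + window - 1] in "AT") - (seq[start - 1] in "AT")
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count / window
```

For example, `most_at_rich_window("GCGCGCATATATTAGCGC", 6)` returns `(6, 1.0)`, pointing at the all-A/T stretch that begins at index 6.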

DNA replication is divided into different stages. During initiation, the replication machineries – termed replisomes – are assembled on DNA in a bidirectional fashion. These assembly loci constitute the start sites of DNA replication or replication origins. In the elongation phase, replisomes travel in opposite directions with the replication forks, unwinding the DNA helix and synthesizing complementary daughter DNA strands using both parental strands as templates. Once replication is complete, specific termination events lead to the disassembly of replisomes. As long as the entire genome is duplicated before cell division, one might assume that the location of replication start sites does not matter; yet, it has been shown that many organisms use preferred genomic regions as origins.[11][12] The necessity to regulate origin location likely arises from the need to coordinate DNA replication with other processes that act on the shared chromatin template to avoid DNA strand breaks and DNA damage.[2][6][9][13][14][15][16][17]

Replicon model

More than five decades ago, Jacob, Brenner, and Cuzin proposed the replicon hypothesis to explain the regulation of chromosomal DNA synthesis in E. coli.[18] The model postulates that a diffusible, trans-acting factor, a so-called initiator, interacts with a cis-acting DNA element, the replicator, to promote replication onset at a nearby origin. Once bound to replicators, initiators (often with the help of co-loader proteins) deposit replicative helicases onto DNA, which subsequently drive the recruitment of additional replisome components and the assembly of the entire replication machinery. The replicator thereby specifies the location of replication initiation events, and the chromosome region that is replicated from a single origin or initiation event is defined as the replicon.[2]
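The replicon concept can be made concrete for a linear chromosome carrying several origins. The sketch below assumes, purely for illustration, that all origins fire at the same time and that every fork moves at the same constant speed, so forks from neighboring origins meet halfway between them; real replication programs violate both assumptions. The function name `replicons` is hypothetical.

```python
from __future__ import annotations


def replicons(origins: list[int], length: int) -> list[tuple[int, float, float]]:
    """Assign each origin the stretch of a linear chromosome it replicates.

    Assumes simultaneous firing and equal, constant fork speeds, so forks from
    adjacent origins meet at the midpoint between them.
    Returns (origin, left_boundary, right_boundary) tuples sorted by position.
    """
    ori = sorted(origins)
    if not ori or ori[0] < 0 or ori[-1] > length:
        raise ValueError("need at least one origin inside [0, length]")
    result = []
    for i, pos in enumerate(ori):
        left = 0.0 if i == 0 else (ori[i - 1] + pos) / 2
        right = float(length) if i == len(ori) - 1 else (pos + ori[i + 1]) / 2
        result.append((pos, left, right))
    return result
```

`replicons([100, 400, 900], 1000)` gives `[(100, 0.0, 250.0), (400, 250.0, 650.0), (900, 650.0, 1000.0)]`. With a single origin the whole molecule is one replicon, as for a bacterial chromosome; on a circular molecule the forks would instead meet opposite the origin, which this linear sketch does not model.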

A fundamental feature of the replicon hypothesis is that it relies on positive regulation to control DNA replication onset, which can explain many experimental observations in bacterial and phage systems.[18] For example, it accounts for the failure of extrachromosomal DNAs without origins to replicate when introduced into host cells. It further rationalizes plasmid incompatibilities in E. coli, where certain plasmids destabilize each other's inheritance due to competition for the same molecular initiation machinery.[19] By contrast, a model of negative regulation (analogous to the repressor-operator model for transcription) fails to explain the above findings.[18] Nonetheless, research subsequent to the proposal of the replicon model by Jacob, Brenner, and Cuzin has discovered many additional layers of replication control in bacteria and eukaryotes that comprise both positive and negative regulatory elements, highlighting both the complexity and the importance of restricting DNA replication temporally and spatially.[2][20][21][22]

The concept of the replicator as a genetic entity has proven very useful in the quest to identify replicator DNA sequences and initiator proteins in prokaryotes, and to some extent also in eukaryotes, although the organization and complexity of replicators differ considerably between the domains of life.[23][24] While bacterial genomes typically contain a single replicator that is specified by consensus DNA sequence elements and that controls replication of the entire chromosome, most eukaryotic replicators – with the exception of budding yeast – are not defined at the level of DNA sequence; instead, they appear to be specified combinatorially by local DNA structural and chromatin cues.[25][26][27][28][29][30][31][32][33][34] Eukaryotic chromosomes are also much larger than their bacterial counterparts, raising the need for initiating DNA synthesis from many origins simultaneously to ensure timely replication of the entire genome. Additionally, many more replicative helicases are loaded than activated to initiate replication in a given cell cycle. The context-driven definition of replicators and selection of origins suggests a relaxed replicon model in eukaryotic systems that allows for flexibility in the DNA replication program.[23] Although replicators and origins can be spaced physically apart on chromosomes, they often co-localize or are located in close proximity; for simplicity, this article therefore refers to both elements as 'origins'. Taken together, the discovery and isolation of origin sequences in various organisms represent a significant milestone towards gaining mechanistic understanding of replication initiation. In addition, these accomplishments had profound biotechnological implications for the development of shuttle vectors that can be propagated in bacterial, yeast and mammalian cells.[2][35][36][37]

Bacterial

Origin organization and recognition in bacteria. A) Schematic of the architecture of E. coli origin oriC, Thermotoga maritima oriC, and the bipartite origin in Helicobacter pylori. The DUE is flanked on one side by several high- and weak-affinity DnaA-boxes as indicated for E. coli oriC. B) Domain organization of the E. coli initiator DnaA. Magenta circle indicates the single-strand DNA binding site. C) Models for origin recognition and melting by DnaA. In the two-state model (left panel), the DnaA protomers transition from a dsDNA binding mode (mediated by the HTH-domains recognizing DnaA-boxes) to an ssDNA binding mode (mediated by the AAA+ domains). In the loop-back model, the DNA is sharply bent backwards onto the DnaA filament (facilitated by the regulatory protein IHF)[38] so that a single protomer binds both duplex and single-stranded regions. In either instance, the DnaA filament melts the DNA duplex and stabilizes the initiation bubble prior to loading of the replicative helicase (DnaB in E. coli). HTH – helix-turn-helix domain, DUE – DNA unwinding element, IHF – integration host factor.

Most bacterial chromosomes are circular and contain a single origin of chromosomal replication (oriC). Bacterial oriC regions are surprisingly diverse in size (ranging from 250 bp to 2 kbp), sequence, and organization;[39][40] nonetheless, their ability to drive replication onset typically depends on sequence-specific readout of consensus DNA elements by the bacterial initiator, a protein called DnaA.[41][42][43][44] Origins in bacteria are either continuous or bipartite and contain three functional elements that control origin activity: conserved DNA repeats that are specifically recognized by DnaA (called DnaA-boxes), an AT-rich DNA unwinding element (DUE), and binding sites for proteins that help regulate replication initiation.[11][45][46] Interactions of DnaA both with the double-stranded (ds) DnaA-box regions and with single-stranded (ss) DNA in the DUE are important for origin activation and are mediated by different domains in the initiator protein: a helix-turn-helix (HTH) DNA binding element and an ATPase associated with various cellular activities (AAA+) domain, respectively.[47][48][49][50][51][52][53] While the sequence, number, and arrangement of origin-associated DnaA-boxes vary throughout the bacterial kingdom, their specific positioning and spacing in a given species are critical for oriC function and for productive initiation complex formation.[2][39][40][54][55][56][57][58]

Among bacteria, E. coli is a particularly powerful model system to study the organization, recognition, and activation mechanism of replication origins. E. coli oriC comprises an approximately 260 bp region containing four types of initiator binding elements that differ in their affinities for DnaA and their dependencies on the co-factor ATP. DnaA-boxes R1, R2, and R4 constitute high-affinity sites that are bound by the HTH domain of DnaA irrespective of the nucleotide-binding state of the initiator.[41][59][60][61][62][63] By contrast, the I, τ, and C-sites, which are interspersed between the R-sites, are low-affinity DnaA-boxes and associate preferentially with ATP-bound DnaA, although ADP-DnaA can substitute for ATP-DnaA under certain conditions.[64][65][66][57] Binding of the HTH domains to the high- and low-affinity DnaA recognition elements promotes ATP-dependent higher-order oligomerization of DnaA's AAA+ modules into a right-handed filament that wraps duplex DNA around its outer surface, thereby generating superhelical torsion that facilitates melting of the adjacent AT-rich DUE.[47][67][68][69] DNA strand separation is additionally aided by direct interactions of DnaA's AAA+ ATPase domain with triplet repeats, so-called DnaA-trios, in the proximal DUE region.[70] The engagement of single-stranded trinucleotide segments by the initiator filament stretches DNA and stabilizes the initiation bubble by preventing reannealing.[51] The DnaA-trio origin element is conserved in many bacterial species, indicating it is a key element for origin function.[70] After melting, the DUE provides an entry site for the E. coli replicative helicase DnaB, which is deposited onto each of the single DNA strands by its loader protein DnaC.[2]
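Sequence-specific readout of DnaA-boxes can be imitated, very crudely, by a motif scan on both DNA strands. The sketch assumes the 9-mer consensus TTWTNCACA that is commonly cited for E. coli DnaA-boxes (W = A or T, N = any base); that consensus is not given in this article and is an assumption here. A strict consensus match cannot distinguish high-affinity R-sites from the low-affinity I, τ, and C-sites, many of which deviate from the consensus, so hits are only candidates. The function names are hypothetical.

```python
from __future__ import annotations

import re

# IUPAC codes needed for the motifs used here (W = A/T, N = any base, etc.).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (0-based top-strand start, strand) pairs for every match.

    Bottom-strand matches are reported at the top-strand coordinate where
    the reverse-complemented site begins.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping sites be reported.
    pattern = re.compile(f"(?={motif_regex(motif)})")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    n, k = len(seq), len(motif)
    # Position p on the reverse complement corresponds to top-strand start n - p - k.
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)
```

For instance, `find_motif("GGTTATCCACAGG", "TTWTNCACA")` reports one top-strand candidate at index 2, and the reverse complement of that sequence yields the same site on the `"-"` strand.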

Although the different DNA binding activities of DnaA have been extensively studied biochemically and various apo, ssDNA-, or dsDNA-bound structures have been determined,[50][51][52][68] the exact architecture of the higher-order DnaA-oriC initiation assembly remains unclear. Two models have been proposed to explain the organization of essential origin elements and DnaA-mediated oriC melting. The two-state model assumes a continuous DnaA filament that switches from a dsDNA binding mode (the organizing complex) to an ssDNA binding mode in the DUE (the melting complex).[68][71] By contrast, in the loop-back model, the DNA is sharply bent in oriC and folds back onto the initiator filament so that DnaA protomers simultaneously engage double- and single-stranded DNA regions.[72] Elucidating how exactly oriC DNA is organized by DnaA thus remains an important task for future studies. Insights into initiation complex architecture will help explain not only how origin DNA is melted, but also how a replicative helicase is loaded directionally onto each of the exposed single DNA strands in the unwound DUE, and how these events are aided by interactions of the helicase with the initiator and specific loader proteins.[2]

Archaeal

Origin organization and recognition in archaea. A) The circular chromosome of Sulfolobus solfataricus contains three different origins. B) Arrangement of initiator binding sites at two S. solfataricus origins, oriC1 and oriC2. Orc1-1 association with ORB elements is shown for oriC1. Recognition elements for additional Orc1/Cdc6 paralogs are also indicated, while WhiP binding sites have been omitted. C) Domain architecture of archaeal Orc1/Cdc6 paralogs. The orientation of ORB elements at origins leads to directional binding of Orc1/Cdc6 and MCM loading in between opposing ORBs (in B). (m)ORB – (mini-)origin recognition box, DUE – DNA unwinding element, WH – winged-helix domain.

Archaeal replication origins share some but not all of the organizational features of bacterial oriC. Unlike bacteria, Archaea often initiate replication from multiple origins per chromosome (one to four have been reported);[73][74][75][76][77][78][79][80][40] yet, archaeal origins also bear specialized sequence regions that control origin function.[81][82][83] These elements include both DNA sequence-specific origin recognition boxes (ORBs or miniORBs) and an AT-rich DUE that is flanked by one or several ORB regions.[79][84] ORB elements display a considerable degree of diversity in terms of their number, arrangement, and sequence, both among different archaeal species and among different origins in a single species.[74][79][85] An additional degree of complexity is introduced by the initiator, Orc1/Cdc6 in archaea, which binds to ORB regions. Archaeal genomes typically encode multiple paralogs of Orc1/Cdc6 that vary substantially in their affinities for distinct ORB elements and that differentially contribute to origin activities.[79][86][87][88] In Sulfolobus solfataricus, for example, three chromosomal origins have been mapped (oriC1, oriC2, and oriC3), and biochemical studies have revealed complex binding patterns of initiators at these sites.[79][80][89][90] The cognate initiator for oriC1 is Orc1-1, which associates with several ORBs at this origin.[79][87] OriC2 and oriC3 are bound by both Orc1-1 and Orc1-3.[79][87][90] Conversely, a third paralog, Orc1-2, footprints at all three origins but has been postulated to negatively regulate replication initiation.[79][90] Additionally, the WhiP protein, an initiator unrelated to Orc1/Cdc6, has been shown to bind all origins as well and to drive origin activity of oriC3 in the closely related Sulfolobus islandicus.[87][89] Because archaeal origins often contain several adjacent ORB elements, multiple Orc1/Cdc6 paralogs can be simultaneously recruited to an origin and oligomerize in some instances;[88][91] however, in contrast to bacterial DnaA, formation of a higher-order initiator assembly does not appear to be a general prerequisite for origin function in the archaeal domain.[2]

Structural studies have provided insights into how archaeal Orc1/Cdc6 recognizes ORB elements and remodels origin DNA.[91][92] Orc1/Cdc6 paralogs are two-domain proteins composed of an AAA+ ATPase module fused to a C-terminal winged-helix fold.[93][94][95] DNA-complexed structures of Orc1/Cdc6 revealed that ORBs are bound by an Orc1/Cdc6 monomer despite the presence of inverted repeat sequences within ORB elements.[91][92] Both the ATPase and winged-helix regions interact with the DNA duplex but contact the palindromic ORB repeat sequence asymmetrically, which orients Orc1/Cdc6 in a specific direction on the repeat.[91][92] Interestingly, the DUE-flanking ORB or miniORB elements often have opposite polarities,[74][79][88][96][97] which predicts that the AAA+ lid subdomains and the winged-helix domains of Orc1/Cdc6 are positioned on either side of the DUE in a manner where they face each other.[91][92] Since both regions of Orc1/Cdc6 associate with a minichromosome maintenance (MCM) replicative helicase,[98][99] this specific arrangement of ORB elements and Orc1/Cdc6 is likely important for loading two MCM complexes symmetrically onto the DUE.[79] Surprisingly, while the ORB DNA sequence determines the directionality of Orc1/Cdc6 binding, the initiator makes relatively few sequence-specific contacts with DNA.[91][92] However, Orc1/Cdc6 severely underwinds and bends DNA, suggesting that it relies on a mix of both DNA sequence and context-dependent DNA structural features to recognize origins.[91][92][100] Notably, base pairing is maintained in the distorted DNA duplex upon Orc1/Cdc6 binding in the crystal structures,[91][92] whereas biochemical studies have yielded contradictory findings as to whether archaeal initiators can melt DNA similarly to bacterial DnaA.[87][88][101] Although the evolutionary kinship of archaeal and eukaryotic initiators and replicative helicases indicates that archaeal MCM is likely loaded onto duplex DNA (see next section), the temporal order of origin melting and helicase loading, as well as the mechanism for origin DNA melting, in archaeal systems therefore remains to be clearly established. Likewise, how exactly the MCM helicase is loaded onto DNA needs to be addressed in future studies.[2]
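The predicted geometry, in which DUE-flanking ORBs of opposite polarity orient Orc1/Cdc6 toward each other, can be restated as a simple scan over annotated ORB elements. In this sketch the `ORB` class, its coordinates, and the `"+"`/`"-"` encoding (pointing toward higher or lower coordinates) are hypothetical; the code only expresses the geometric prediction and says nothing about how MCM is actually loaded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ORB:
    start: int
    end: int
    faces: str  # hypothetical encoding: "+" points toward higher coordinates, "-" toward lower


def facing_pairs(orbs: list[ORB]) -> list[tuple[ORB, ORB]]:
    """Adjacent ORB pairs whose orientations point toward each other."""
    ordered = sorted(orbs, key=lambda orb: orb.start)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if a.faces == "+" and b.faces == "-"]


def predicted_loading_regions(orbs: list[ORB]) -> list[tuple[int, int]]:
    """Gap between each inward-facing pair, where a DUE and MCM loading would be expected."""
    return [(a.end, b.start) for a, b in facing_pairs(orbs)]
```

With ORBs at 0-35 (`"+"`), 120-155 (`"-"`), and 300-335 (`"+"`), only the first two face each other, so the predicted loading region is the stretch from 35 to 120.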

Eukaryotic

Origin organization and recognition in eukaryotes. Specific DNA elements and epigenetic features involved in ORC recruitment and origin function are summarized for S. cerevisiae, S. pombe, and metazoan origins. A schematic of the ORC architecture is also shown, highlighting the arrangement of the AAA+ and winged-helix domains into a pentameric ring that encircles origin DNA. Ancillary domains of several ORC subunits involved in targeting ORC to origins are included. Other regions in ORC subunits may also be involved in initiator recruitment, either by directly or indirectly associating with partner proteins. A few examples are listed. Note that the BAH domain in S. cerevisiae Orc1 binds nucleosomes[102] but does not recognize H4K20me2.[103]
BAH – bromo-adjacent homology domain, WH – winged-helix domain, TFIIB – transcription factor II B-like domain in Orc6, G4 – G quadruplex, OGRE – origin G-rich repeated element. ORC gene names are indicated by a single number; e.g. 3 refers to ORC3.

Origin organization, specification, and activation in eukaryotes are more complex than in bacterial or archaeal domains and significantly deviate from the paradigm established for prokaryotic replication initiation. The large genome sizes of eukaryotic cells, which range from 12 Mbp in S. cerevisiae to more than 100 Gbp in some plants, necessitate that DNA replication start at several hundred (in budding yeast) to tens of thousands (in humans) of origins to complete DNA replication of all chromosomes during each cell cycle.[21][30] With the exception of S. cerevisiae and related Saccharomycotina species, eukaryotic origins do not contain consensus DNA sequence elements; instead, their location is influenced by contextual cues such as local DNA topology, DNA structural features, and chromatin environment.[23][29][31]
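A back-of-the-envelope calculation shows why large genomes need many origins. The sketch assumes a haploid human genome of about 3.1 Gbp, a constant fork speed of 1.5 kb per minute (an illustrative value not taken from this article), evenly spaced origins that all fire at once, and two forks per origin; the result is therefore an idealized lower bound, not a prediction of S-phase length. The function name is hypothetical.

```python
def replication_time_hours(genome_bp: float, n_origins: int,
                           fork_speed_bp_per_min: float) -> float:
    """Idealized lower bound on the time (in hours) needed to copy a genome.

    Assumes every origin fires at once, origins are evenly spaced, each origin
    starts two forks, and all forks move at the same constant speed, so each
    fork copies genome_bp / (2 * n_origins) base pairs.
    """
    bp_per_fork = genome_bp / (2 * n_origins)
    return bp_per_fork / fork_speed_bp_per_min / 60


HUMAN_GENOME_BP = 3.1e9  # assumed haploid human genome size
FORK_SPEED = 1500        # assumed fork speed in bp per minute (illustrative)

one_origin = replication_time_hours(HUMAN_GENOME_BP, 1, FORK_SPEED)
many_origins = replication_time_hours(HUMAN_GENOME_BP, 30_000, FORK_SPEED)
print(f"1 origin: {one_origin:,.0f} h; 30,000 origins: {many_origins * 60:.0f} min")
```

Under these assumptions a single origin would need about 17,000 hours (roughly two years), whereas 30,000 origins could in principle finish in about 34 minutes. Real S phases take considerably longer than this bound, one reason being that only a subset of licensed origins fires and firing is spread out over time.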

Eukaryotic origin function relies on a conserved initiator protein complex to load replicative helicases onto DNA during the late M and G1 phases of the cell cycle, a step known as origin licensing.[104] In contrast to their bacterial counterparts, replicative helicases in eukaryotes are loaded onto origin duplex DNA in an inactive, double-hexameric form and only a subset of them (10-20% in mammalian cells) is activated during any given S phase, events that are referred to as origin firing.[105][106][107]

The location of active eukaryotic origins is therefore determined on at least two different levels, origin licensing to mark all potential origins, and origin firing to select a subset that permits assembly of the replication machinery and initiation of DNA synthesis. The extra licensed origins serve as backup and are activated only upon slowing or stalling of nearby replication forks, ensuring that DNA replication can be completed when cells encounter replication stress.[108][109] In the absence of stress, firing of extra origins is suppressed by a replication-associated signaling mechanism.[110][111] Together, the excess of licensed origins and the tight cell cycle control of origin licensing and firing embody two important strategies to prevent under- and overreplication and to maintain the integrity of eukaryotic genomes.[2]
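The two-level selection can be sketched as a list of licensed positions from which a subset fires. Choosing that subset uniformly at random, with a fraction in the 10-20% range given above, is a simplifying assumption; real origin choice is neither uniform nor independent. The helper names are hypothetical. `dormant_in_gap` lists the licensed-but-unfired origins between two fired ones, i.e., the backups that could be activated if a fork in that interval stalls.

```python
from __future__ import annotations

import random


def fire_subset(licensed: list[int], fraction: float, rng: random.Random) -> list[int]:
    """Choose a uniformly random subset of licensed origins to fire (a simplification)."""
    k = max(1, round(fraction * len(licensed)))
    return sorted(rng.sample(licensed, k))


def dormant_in_gap(licensed: list[int], fired: list[int], left: int, right: int) -> list[int]:
    """Licensed but unfired origins strictly between two fired origins.

    These are the backups that could fire if a fork in this interval stalls.
    """
    fired_set = set(fired)
    return [o for o in sorted(licensed) if left < o < right and o not in fired_set]


def largest_gap(origins: list[int]) -> int:
    """Largest distance between neighboring origins (needs at least two origins)."""
    ordered = sorted(origins)
    return max(b - a for a, b in zip(ordered, ordered[1:]))
```

With origins licensed every 50 units along a 1,000-unit stretch and only positions 0, 300, and 950 fired, the largest gap between fired origins is 650, while the dormant origins at 50, 100, 150, 200, and 250 remain available inside the first gap.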

Early studies in S. cerevisiae indicated that replication origins in eukaryotes might be recognized in a DNA-sequence-specific manner analogously to those in prokaryotes. In budding yeast, the search for genetic replicators led to the identification of autonomously replicating sequences (ARS) that support efficient DNA replication initiation of extrachromosomal DNA.[112][113][114] These ARS regions are approximately 100-200 bp long and exhibit a multipartite organization, containing A, B1, B2, and sometimes B3 elements that together are essential for origin function.[115][116] The A element encompasses the conserved 11 bp ARS consensus sequence (ACS),[117][118] which, in conjunction with the B1 element, constitutes the primary binding site for the heterohexameric origin recognition complex (ORC), the eukaryotic replication initiator.[119][120][121][122] Within ORC, five subunits are built from conserved AAA+ ATPase and winged-helix folds and co-assemble into a pentameric ring that encircles DNA.[122][123][124] In budding yeast ORC, DNA binding elements in the ATPase and winged-helix domains, as well as adjacent basic patch regions in some of the ORC subunits, are positioned in the central pore of the ORC ring such that they aid the DNA-sequence-specific recognition of the ACS in an ATP-dependent manner.[122][125] By contrast, the roles of the B2 and B3 elements are less clear. The B2 region is similar to the ACS in sequence and has been suggested to function as a second ORC binding site under certain conditions, or as a binding site for the replicative helicase core.[126][127][128][129][130] Finally, the B3 element recruits the transcription factor Abf1, although B3 is not found at all budding yeast origins and Abf1 binding does not appear to be strictly essential for origin function.[2][115][131][132]

Origin recognition in eukaryotes other than S. cerevisiae or its close relatives does not conform to the sequence-specific readout of conserved origin DNA elements. Attempts to isolate specific chromosomal replicator sequences more generally in eukaryotic species, either genetically or by genome-wide mapping of initiator binding or replication start sites, have failed to identify clear consensus sequences at origins.[133][134][135][136][137][138][139][140][141][142][143][144] Thus, sequence-specific DNA-initiator interactions in budding yeast signify a specialized mode for origin recognition in this system rather than an archetypal mode for origin specification across the eukaryotic domain. Nonetheless, DNA replication does initiate at discrete sites that are not randomly distributed across eukaryotic genomes, arguing that alternative means determine the chromosomal location of origins in these systems. These mechanisms involve a complex interplay between DNA accessibility, nucleotide sequence skew (both AT-richness and CpG islands have been linked to origins), nucleosome positioning, epigenetic features, DNA topology and certain DNA structural features (e.g., G4 motifs), as well as regulatory proteins and transcriptional interference.[11][12][28][29][31][145][146][138][147] Importantly, origin properties not only vary between different origins in an organism and among species, but some can also change during development and cell differentiation. The chorion locus in Drosophila follicle cells constitutes a well-established example of spatial and developmental control of initiation events. This region undergoes DNA-replication-dependent gene amplification at a defined stage during oogenesis and relies on the timely and specific activation of chorion origins, which in turn is regulated by origin-specific cis-elements and several protein factors, including the Myb complex, E2F1, and E2F2.[148][149][150][151][152] This combinatorial specification and multifactorial regulation of metazoan origins has complicated the identification of unifying features that determine the location of replication start sites across eukaryotes more generally.[2]

To facilitate replication initiation and origin recognition, ORC assemblies from various species have evolved specialized auxiliary domains that are thought to aid initiator targeting to chromosomal origins or chromosomes in general. For example, the Orc4 subunit in S. pombe ORC contains several AT-hooks that preferentially bind AT-rich DNA,[153] while in metazoan (animal) ORC the TFIIB-like domain of Orc6 is thought to perform a similar function.[154] Metazoan Orc1 proteins also harbor a bromo-adjacent homology (BAH) domain that interacts with H4K20me2-nucleosomes.[103] Particularly in mammalian cells, H4K20 methylation has been reported to be required for efficient replication initiation, and Orc1's BAH domain facilitates ORC association with chromosomes and Epstein-Barr virus origin-dependent replication.[155][156][157][158][159] It is therefore tempting to speculate that both observations are mechanistically linked, at least in a subset of metazoa, but this possibility needs to be further explored in future studies. In addition to the recognition of certain DNA or epigenetic features, ORC also associates directly or indirectly with several partner proteins that could aid initiator recruitment, including LRWD1, PHIP (or DCAF14), and HMGA1a, among others.[27][160][161][162][163][164][165][166] Interestingly, Drosophila ORC, like its budding yeast counterpart, bends DNA, and negative supercoiling has been reported to enhance DNA binding of this complex, suggesting that DNA shape and malleability might influence the location of ORC binding sites across metazoan genomes.[25][122][167][168][169] A molecular understanding of how ORC's DNA binding regions might support the readout of structural properties of the DNA duplex in metazoans, rather than of specific DNA sequences as in S. cerevisiae, awaits high-resolution structural information of DNA-bound metazoan initiator assemblies. Likewise, whether and how different epigenetic factors contribute to initiator recruitment in metazoan systems is poorly defined and is an important question that needs to be addressed in more detail.[2]

Once recruited to origins, ORC and its co-factors Cdc6 and Cdt1 drive the deposition of the minichromosome maintenance 2-7 (Mcm2-7) complex onto DNA.[104][170] Like the archaeal replicative helicase core, Mcm2-7 is loaded as a head-to-head double hexamer onto DNA to license origins.[105][106][107] In S phase, Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK) phosphorylate several Mcm2-7 subunits and additional initiation factors to promote the recruitment of the helicase co-activators Cdc45 and GINS, DNA melting, and ultimately bidirectional replisome assembly at a subset of the licensed origins.[22][171] In both yeast and metazoans, origins are free of or depleted of nucleosomes, a property that is crucial for Mcm2-7 loading, indicating that chromatin state at origins regulates not only initiator recruitment but also helicase loading.[139][172][173][174][175][176] A permissive chromatin environment is further important for origin activation and has been implicated in regulating both origin efficiency and the timing of origin firing. Euchromatic origins typically contain active chromatin marks, replicate early, and are more efficient than late-replicating, heterochromatic origins, which conversely are characterized by repressive marks.[21][174][177] Not surprisingly, several chromatin remodelers and chromatin-modifying enzymes have been found to associate with origins and certain initiation factors,[178][179] but how their activities impact different replication initiation events remains largely obscure. Remarkably, cis-acting "early replication control elements" (ECREs) have also recently been identified that help regulate replication timing and influence 3D genome architecture in mammalian cells.[180] Understanding the molecular and biochemical mechanisms that orchestrate this complex interplay between 3D genome organization, local and higher-order chromatin structure, and replication initiation is an exciting topic for further studies.[2]
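The separation of licensing and firing into different cell-cycle phases can be summarized as a small state machine. The sketch below is a toy model: the class names, the three-phase cell cycle, and the rule that licensing is allowed only in the late M/G1 window are simplifications chosen for illustration, and the many regulatory proteins that enforce these rules in cells are not modeled.

```python
from enum import Enum, auto


class Phase(Enum):
    LATE_M_G1 = auto()   # licensing window
    S = auto()           # firing window
    G2_EARLY_M = auto()  # neither licensing nor firing in this toy model


class OriginState(Enum):
    UNLICENSED = auto()  # no Mcm2-7 loaded (ORC may or may not be bound)
    LICENSED = auto()    # inactive Mcm2-7 double hexamer loaded
    FIRED = auto()       # helicase activated, replisomes assembled


class Origin:
    """Toy model of once-per-cycle origin use."""

    def __init__(self) -> None:
        self.state = OriginState.UNLICENSED

    def license(self, phase: Phase) -> bool:
        # ORC, Cdc6 and Cdt1 load Mcm2-7 only in late M/G1 in this model.
        if phase is Phase.LATE_M_G1 and self.state is OriginState.UNLICENSED:
            self.state = OriginState.LICENSED
            return True
        return False

    def fire(self, phase: Phase) -> bool:
        # In S phase, DDK/CDK phosphorylation recruits Cdc45 and GINS to the loaded helicase.
        if phase is Phase.S and self.state is OriginState.LICENSED:
            self.state = OriginState.FIRED
            return True
        return False

    def reset_for_next_cycle(self) -> None:
        # After division the origin may be licensed again in the next late M/G1.
        self.state = OriginState.UNLICENSED
```

Calling `license` during S phase, or calling `fire` a second time, returns `False`, which mirrors the once-per-cell-cycle rule: a fired origin cannot be relicensed until the next cycle begins.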

Why have metazoan replication origins diverged from the DNA sequence-specific recognition paradigm that determines replication start sites in prokaryotes and budding yeast? Observations that metazoan origins often co-localize with promoter regions in Drosophila and mammalian cells and that replication-transcription conflicts due to collisions of the underlying molecular machineries can lead to DNA damage suggest that proper coordination of transcription and replication is important for maintaining genome stability.[134][136][138][141][181][14][15][182] Recent findings also point to a more direct role of transcription in influencing the location of origins, either by inhibiting Mcm2-7 loading or by repositioning loaded Mcm2-7 on chromosomes.[183][147] Sequence-independent (but not necessarily random) initiator binding to DNA additionally allows for flexibility in specifying helicase loading sites and, together with transcriptional interference and the variability in activation efficiencies of licensed origins, likely determines origin location and contributes to the co-regulation of DNA replication and transcriptional programs during development and cell fate transitions. Computational modeling of initiation events in S. pombe, as well as the identification of cell-type specific and developmentally-regulated origins in metazoans, are in agreement with this notion.[135][143][184][185][186][187][188][147] However, a large degree of flexibility in origin choice also exists among different cells within a single population,[138][144][185] although the molecular mechanisms that lead to the heterogeneity in origin usage remain ill-defined. Mapping origins in single cells in metazoan systems and correlating these initiation events with single-cell gene expression and chromatin status will be important to elucidate whether origin choice is purely stochastic or controlled in a defined manner.[2]

Viral

HHV-6 genome
Genome of human herpesvirus-6, a member of the Herpesviridae family. The origin of replication is labeled as "OOR."

Viruses often possess a single origin of replication.

A variety of proteins have been described as being involved in viral replication. For instance, polyomaviruses use host-cell DNA polymerases, which attach to the viral origin of replication if the T antigen is present.

Variations


Although DNA replication is essential for genetic inheritance, defined, site-specific replication origins are technically not a requirement for genome duplication as long as all chromosomes are copied in their entirety to maintain gene copy numbers. Certain bacteriophages and viruses, for example, can initiate DNA replication by homologous recombination independent of dedicated origins.[189] Likewise, the archaeon Haloferax volcanii uses recombination-dependent initiation to duplicate its genome when its endogenous origins are deleted.[75] Similar non-canonical initiation events through break-induced or transcription-initiated replication have been reported in E. coli and S. cerevisiae.[190][191][192][193][194] Nonetheless, despite the ability of cells to sustain viability under these exceptional circumstances, origin-dependent initiation is a common strategy universally adopted across different domains of life.[2]

In addition, detailed studies of replication initiation have focused on a limited number of model systems. The extensively studied fungi and metazoa are both members of the opisthokont supergroup and represent only a small fraction of the evolutionary landscape of the eukaryotic domain.[195] Comparatively few efforts have been directed at other eukaryotic model systems, such as kinetoplastids or Tetrahymena.[196][197][198][199][200][201][202] Surprisingly, these studies have revealed notable differences in both origin properties and initiator composition compared with yeast and metazoans.[2]

See also


References


Further reading

from Grokipedia
The origin of replication is a specific DNA sequence that serves as the starting point for DNA replication, where initiator proteins bind to recruit the replication machinery and unwind the double helix, enabling bidirectional synthesis of daughter strands to duplicate the genome prior to cell division.[1] This process ensures the accurate and timely copying of genetic information, coordinated with cell cycle progression, transcription, and DNA repair mechanisms.[2] In prokaryotes, replication typically initiates from a single origin per circular chromosome, such as oriC in Escherichia coli, a ~245 base pair sequence containing multiple DnaA boxes and an AT-rich duplex unwinding element (DUE).[2] The initiator protein DnaA binds cooperatively to these high-affinity sites, oligomerizes to melt the DNA at the DUE, and facilitates the loading of the DnaB helicase and other replisome components, ensuring once-per-cell-cycle replication.[1] This tightly regulated, sequence-specific mechanism supports the rapid replication of relatively small bacterial genomes. In contrast, eukaryotic genomes employ thousands of origins (approximately 1,600 as of 2024 in budding yeast and 20,000–50,000 in humans) distributed across linear chromosomes to accommodate their larger size and complexity.[2][3][4] These origins often lack strict sequence consensus, except in certain model organisms such as Saccharomyces cerevisiae, and their specification is influenced by chromatin accessibility, nucleosome positioning, DNA topology, and epigenetic marks rather than by fixed motifs alone.[5] The origin recognition complex (ORC), a heterohexameric protein complex, binds to origins during G1 phase to license replication by loading the MCM2-7 helicase, with activation occurring later, in S phase, under cyclin-dependent kinase control.[1] This distributed and flexible system allows eukaryotes to replicate vast genomes efficiently while preventing re-replication within a single cell cycle.[2]

Fundamental Concepts

Definition and Role

The origin of replication is a discrete genomic locus where DNA unwinding initiates, marking the starting point for the assembly of replication forks that proceed either bidirectionally or unidirectionally to duplicate the genome.[1] This site enables the precise coordination of DNA synthesis, ensuring that parental strands serve as templates for the formation of complementary daughter strands by replicative polymerases.[2] In the cell cycle, origins play a central role by synchronizing genome duplication with S-phase entry, allowing each chromosome to be copied exactly once per cycle.[2] This regulation is achieved through licensing: origins are licensed during G1 phase by the formation of pre-replicative complexes containing initiator proteins, these complexes are activated in S phase, and new licensing is blocked until the next G1, which prevents re-initiation and over-replication within the same cycle.[2] The basic initiation process begins with the specific recognition and binding of initiator proteins to the origin, followed by localized destabilization of the DNA helix (often facilitated by AT-rich regions) and the subsequent recruitment of helicases and polymerase machinery to establish active replication forks.[1] The proper functioning of origins is crucial for maintaining genomic stability, as disruptions can trigger replication stress, leading to DNA damage, mutations, chromosomal aberrations, or cell death.[1] Errors in origin activity have been implicated in diseases such as cancer, underscoring their role in preventing genomic instability.[1] The concept traces back to 1963, when Jacob, Brenner, and Cuzin proposed the replicon model linking origins to the regulated initiation of DNA replication.[6]

Replicon Model

The replicon model was proposed in 1963 by François Jacob, Sydney Brenner, and François Cuzin, drawing from genetic studies on DNA replication in Escherichia coli. This framework emerged from observations of bacterial chromosome behavior during conjugation and plasmid maintenance, positing that DNA replication is organized into discrete, independently controlled units. The model integrated concepts from earlier work on operons, adapting them to explain how replication initiates and is regulated at specific chromosomal sites. At its core, a replicon is defined as a chromosomal or extrachromosomal unit of DNA capable of autonomous replication, controlled by an independent initiation site. It consists of two primary components: the replicator, a cis-acting DNA sequence that functions as the origin where replication begins, and the initiator, a trans-acting diffusible factor (typically a protein) that recognizes the replicator and activates the replication machinery. Replicons also encompass associated control elements, such as partition systems, which ensure the stable segregation of replicated daughter molecules to progeny cells during division. In organisms with multiple origins per chromosome, such as eukaryotes, the length of a replicon is determined by the distance between adjacent origins and is typically 100–200 kb. In bacteria like E. coli with a single chromosomal origin, the replicon spans the entire genome of approximately 4.6 Mb.[7][8][9][10] Experimental evidence for replicon autonomy stemmed from conjugation studies in E. coli, where the F plasmid integrates into the chromosome to form Hfr strains, allowing transfer of chromosomal segments during mating. Upon excision, these hybrid molecules demonstrated independent replication, confirming that both plasmid and chromosomal segments function as self-sufficient replicons when separated. 
Such transfers revealed that replication control is localized to specific initiation sites, independent of the broader chromosomal context.[11] The replicon model casts the origin as the key control point in replication: regulating initiation at the origin determines when, and how often, a replicon is copied, so that the genome is duplicated once per cell cycle. This has profound implications for understanding replication fidelity and coordination in prokaryotes, influencing subsequent research on replication control across domains of life.[12]
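The statement that replicon length is set by the distance between adjacent origins can be made concrete with a toy calculation. The sketch below assumes that all origins on a linear chromosome fire at the same moment and that every fork moves at the same speed, so each replicon runs from the midpoint with its left neighbour to the midpoint with its right neighbour. The function name and the origin positions are hypothetical.

```python
def replicon_bounds(origins, chrom_length):
    """Return a (start, end) interval, in bp, for the replicon of each origin.

    Toy model (assumption): all origins fire simultaneously and all forks move
    at equal speed, so neighbouring replicons meet halfway between origins.
    """
    origins = sorted(origins)
    bounds = []
    for i, ori in enumerate(origins):
        # Left edge: chromosome end, or midpoint with the previous origin.
        start = 0 if i == 0 else (origins[i - 1] + ori) / 2
        # Right edge: chromosome end, or midpoint with the next origin.
        end = chrom_length if i == len(origins) - 1 else (ori + origins[i + 1]) / 2
        bounds.append((start, end))
    return bounds


# Three hypothetical origins on a 400 kb linear chromosome.
for start, end in replicon_bounds([50_000, 200_000, 350_000], 400_000):
    print(f"{start:>9.0f} {end:>9.0f}  length = {end - start:,.0f} bp")
```

In real cells origins fire at different times and with different efficiencies, so actual replicon sizes deviate from this midpoint rule; it is only the simplest baseline.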

Structural Features

Sequence Motifs

Origins of replication contain conserved DNA sequence motifs that serve as recognition sites for the replication machinery, enabling the initial steps of DNA unwinding and assembly of the pre-replication complex. These motifs are modular elements that collectively define the origin's functionality, with variations in arrangement contributing to efficiency and specificity across different organisms.[13] AT-rich regions, often referred to as DNA unwinding elements (DUEs), are a hallmark of replication origins and typically exhibit 50-70% AT content, which lowers the melting temperature of the DNA duplex to facilitate initial strand separation. These regions, spanning 20-50 base pairs, melt readily under physiological conditions because of the weaker hydrogen bonding of AT pairs (two hydrogen bonds, versus three in GC pairs), exposing single-stranded DNA for subsequent binding events. This structural feature is conserved in origins from bacteria, archaea, and eukaryotes, underscoring its essential role in replication initiation.[1][14] Consensus sequences are short, highly conserved nucleotide motifs within origins that provide high-affinity binding sites, typically 9-17 base pairs in length. In bacterial origins these include the 9-bp DnaA boxes, while eukaryotic examples such as budding yeast feature 11-17 bp ARS consensus sequences (ACS); binding strength and specificity are modulated by the precise orientation and spacing of these motifs relative to one another. Such arrangements ensure selective recognition and activation, with mismatches or altered spacing reducing origin efficiency by orders of magnitude.[15][16] Bending sites contribute to the architectural flexibility of origins through intrinsically curved DNA segments, often created by phased A-tracts: runs of 4-6 adenine residues spaced every 10-11 base pairs so that they align on one face of the helix. 
These elements induce a bend of 40-90 degrees, promoting the compaction and distortion of DNA necessary for the assembly of multi-protein complexes at the origin. By facilitating DNA looping or wrapping, bending sites enhance the local accessibility of adjacent motifs during initiation.[17] The overall length of replication origins varies from approximately 100 to 500 base pairs, accommodating a modular arrangement of the aforementioned motifs in a non-random fashion. This variability allows for evolutionary adaptation while maintaining core functionality, with shorter origins often relying on tightly packed elements and longer ones incorporating auxiliary sequences for regulation. Because of this modularity, disrupting an individual motif can impair origin firing without abolishing it entirely, highlighting the interdependent roles of the motifs.[2][18]
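The AT-content criterion for DUEs can be expressed as a sliding-window scan. The sketch below flags every window whose A+T fraction meets a threshold; the window size and threshold are illustrative values picked from the 20-50 bp and 50-70% AT ranges quoted above, and the function name is hypothetical. Base composition alone is only a rough proxy for how easily a region melts.

```python
def at_rich_windows(seq, window=30, min_at=0.65):
    """Return (start, AT fraction) for every window with A+T fraction >= min_at.

    window and min_at are illustrative defaults (assumptions), not fixed rules.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= min_at:
            hits.append((start, at_fraction))
    return hits


# Synthetic sequence: a 30 bp AT block flanked by GC-rich DNA.
demo = "GC" * 20 + "AT" * 15 + "GC" * 20
print(at_rich_windows(demo, window=30, min_at=1.0))  # [(40, 1.0)]
```

Lowering `min_at` widens the flagged region, because windows that only partly overlap the AT block start to pass the threshold.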

Associated Proteins

Initiator proteins play a central role in recognizing and activating replication origins across domains of life. In prokaryotes, the DnaA protein binds ATP and assembles into oligomeric complexes on origin DNA, forming a right-handed helical filament that wraps and distorts the double helix to promote unwinding. In eukaryotes, the origin recognition complex (ORC), a heterohexameric assembly of Orc1-6 subunits, similarly exhibits ATP-dependent DNA binding, with Orc1's ATPase activity facilitating stable association and oligomerization into ring-like structures that encircle the origin.[19] These nucleoprotein complexes, formed through ATP-driven oligomerization, use the underlying DNA sequence motifs as primary binding targets and serve as platforms for recruiting the subsequent replication machinery.[20] Helicase recruitment follows initiator binding, enabling the initial separation of DNA strands. In both prokaryotes and eukaryotes, initiator proteins coordinate the loading of replicative helicases: in bacteria DnaA recruits DnaB, which is delivered by the DnaC loader, while in eukaryotes ORC, in conjunction with Cdc6 and Cdt1, loads the MCM2-7 complex as head-to-head double hexamers that encircle duplex DNA without initial unwinding.[21] Upon activation (in eukaryotes at the G1/S transition, when each double hexamer gives rise to two separate helicases), the helicases encircle and translocate along single-stranded DNA, unwinding the duplex processively at rates of several hundred base pairs per second in prokaryotes and tens of base pairs per second in eukaryotes to establish bidirectional replication forks.[22][2] Accessory factors support origin activation by stabilizing unwound regions and managing topological constraints. 
Single-strand binding proteins (SSBs), such as SSB in prokaryotes and replication protein A (RPA) in eukaryotes, coat the exposed single-stranded DNA to prevent reannealing, secondary structure formation, and nucleolytic degradation, thereby facilitating polymerase access.[23] Concurrently, topoisomerases I and II alleviate torsional stress generated by helicase unwinding; type IA topoisomerases relax negative supercoils behind the fork, while type II enzymes decatenate intertwined daughter DNA molecules and relieve positive supercoils ahead of the fork.[24] Regulation of initiator and accessory proteins ensures precise control over origin licensing and firing. Phosphorylation by cyclin-dependent kinases (CDKs) in eukaryotes targets components such as ORC subunits and Cdc6, promoting their dissociation from origins or their inactivation to prevent re-replication.[25] Ubiquitination further modulates stability; for instance, CDK-phosphorylated Cdc6 is marked by SCF ubiquitin ligases for proteasomal degradation in S phase.[26] Additionally, Cdc6's intrinsic ATPase activity, stimulated upon MCM loading, disengages Cdc6 from the ORC-MCM complex, enforcing unidirectional licensing and inhibiting premature re-assembly.[27] These modifications collectively synchronize replication with the cell cycle, with analogous ATP hydrolysis mechanisms regulating DnaA activity in prokaryotes.[28]
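The once-per-cycle logic described above (licensing only in G1, firing only in S, and no re-licensing while CDK activity is high) can be sketched as a small state machine. This is a deliberately simplified toy model: the cell-cycle phase stands in for CDK activity, a boolean stands in for a loaded MCM2-7 double hexamer, and the class and method names are hypothetical.

```python
from enum import Enum, auto


class Phase(Enum):
    G1 = auto()
    S = auto()
    G2M = auto()


class Origin:
    """Toy origin: licensed only in G1, fired only in S, at most once per cycle."""

    def __init__(self):
        self.licensed = False  # stands in for a loaded MCM2-7 double hexamer
        self.fired = False     # True once the origin has initiated this cycle

    def license(self, phase):
        # Loading is allowed only in G1 (low CDK) and only before firing.
        if phase is Phase.G1 and not self.fired:
            self.licensed = True
        return self.licensed

    def fire(self, phase):
        # Activation needs S phase (high CDK) plus an existing licence;
        # firing consumes the licence, so a second call cannot re-initiate.
        if phase is Phase.S and self.licensed:
            self.licensed = False
            self.fired = True
            return True
        return False

    def start_new_cycle(self):
        # After mitosis CDK activity drops and the origin may be licensed again.
        self.fired = False


ori = Origin()
ori.license(Phase.G1)
print(ori.fire(Phase.S), ori.fire(Phase.S), ori.license(Phase.S))  # True False False
```

Separating the two steps in time is what makes re-replication impossible in this model: the only phase that can create a licence is not the phase that can consume one.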

Prokaryotic Origins

Bacterial Origins

In bacteria, chromosomal replication typically initiates at a single origin; in Escherichia coli this origin, oriC, serves as the paradigm for prokaryotic origins and exemplifies the replicon model, in which a single origin controls replication of the entire chromosome.[29] The oriC locus spans approximately 245 base pairs and contains 11 DnaA binding sites, termed DnaA boxes, including three high-affinity sites (R1, R2, and R4) that preferentially bind the initiator protein DnaA and several low-affinity sites (such as the I and τ sites) that contribute to complex assembly under specific conditions. Adjacent to these boxes lies an AT-rich region with three tandem 13-bp repeats, which acts as the duplex unwinding element (DUE) to facilitate initial DNA strand separation during initiation.[30] This compact structure ensures precise recognition and activation once per cell cycle, with bidirectional replication forks proceeding from oriC to the terminus.[31] Initiation at oriC is orchestrated by the DnaA protein in its ATP-bound form (DnaA-ATP), which first occupies the high-affinity R1, R2, and R4 boxes to form a nucleoprotein complex, then recruits additional DnaA molecules to the low-affinity sites and DUE.[29] The integration host factor (IHF) binds nearby, inducing significant DNA bending that wraps the origin around the DnaA complex, thereby promoting torsional stress and melting of the AT-rich repeats within the DUE to expose single-stranded DNA.[31] Two hexameric DnaB helicases, delivered by the DnaC loader protein, are then loaded onto this unwound region in opposite orientations; each encircles a single strand and unwinds the duplex ahead of its advancing fork, establishing the replisome. 
Insights into DnaA's DNA recognition have been advanced by the 2003 crystal structure of its domain IV (the DNA-binding domain) bound to a DnaA box, revealing how a helix-turn-helix motif inserts into the major groove for sequence-specific binding.[32] To prevent over-replication, initiation is tightly regulated through multiple mechanisms, including sequestration of the newly replicated, hemimethylated oriC by the SeqA protein, which binds GATC sites and blocks DnaA access for about one-third of the cell cycle. Additional control occurs via titration of excess DnaA at the datA locus, a chromosomal site ~0.47 Mb from oriC whose multiple DnaA boxes sequester the initiator and promote its conversion from the active ATP-bound form to the inactive ADP-bound form through ATP hydrolysis.[30] The critical role of DnaA was established in the 1970s through isolation of temperature-sensitive dnaA mutants (dnaAts), which cease initiation at non-permissive temperatures while allowing ongoing rounds of elongation to finish, demonstrating DnaA's specific function in origin activation. While most bacteria, like E. coli, rely on a single oriC per chromosome, variations occur in species with multiple chromosomes; for example, Vibrio cholerae has two origins, oriC1 on the large chromosome I and oriC2 on the small chromosome II, enabling staggered replication timing that facilitates resolution of chromosome dimers via site-specific recombination during segregation. This dual-origin system ensures coordinated replication and proper partitioning in a bacterium with a naturally bipartite genome, in contrast to the single-origin control of species with one chromosome.
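Because DnaA boxes can lie on either strand and low-affinity sites deviate from the high-affinity consensus, a simple way to locate candidate boxes is to scan both strands while tolerating a few mismatches. In the sketch below, the 9-bp query TTATCCACA (a consensus commonly given for E. coli DnaA boxes) and the one-mismatch threshold are assumptions, the function names are hypothetical, and the mismatch count is only a rough stand-in for binding affinity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_boxes(seq, motif="TTATCCACA", max_mismatches=1):
    """Return (start, strand, mismatches) for near-matches of motif on either strand.

    A '-' hit means the motif reads 5'->3' on the opposite strand at that position.
    """
    seq = seq.upper()
    k = len(motif)
    targets = {"+": motif, "-": revcomp(motif)}
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, target in targets.items():
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Synthetic test: an exact box, an exact box on the other strand, a 1-mismatch box.
demo = "GGGG" + "TTATCCACA" + "GGGG" + "TGTGGATAA" + "GGGG" + "TTATACACA" + "GGGG"
print(find_boxes(demo))  # [(4, '+', 0), (17, '-', 0), (30, '+', 1)]
```

Raising `max_mismatches` recovers more divergent, lower-affinity candidates at the cost of more false positives in a random sequence.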

Archaeal Origins

Archaeal genomes typically contain multiple origins of replication, ranging from 1 to 5 per chromosome, which contrasts with the single origin found in most bacteria.[33] For instance, species in the genus Sulfolobus, such as S. islandicus and S. solfataricus, possess three active origins.[34] Each origin spans approximately 500 base pairs and features conserved 17-base pair sequences known as origin recognition boxes (ORBs), which serve as binding sites for initiator proteins.[35] These ORBs are AT-rich and facilitate the initial recognition step in replication initiation, with AT-rich DNA unwinding elements (DUEs) commonly present across archaeal origins to promote strand separation.[36] Initiation at archaeal origins is mediated by proteins homologous to eukaryotic Cdc6 and Orc1, often encoded by multiple genes adjacent to the origins themselves. These Cdc6/Orc1 homologs bind to ORBs either as monomers or dimers, with each ORB typically accommodating one monomer in species like Sulfolobus.[37] Structural studies have revealed that ATP binding induces conformational remodeling in these proteins, enabling DNA distortion and helicase recruitment in a manner analogous to the eukaryotic origin recognition complex (ORC).[38] In Sulfolobus, for example, Orc1-1 forms a complex with the origin DNA upon ATP hydrolysis, stabilizing the binding and preparing the site for further assembly. 
Studies published in 2025 have identified nucleoid-associated proteins that bind essential motifs within archaeal origins, further refining models of initiation specificity.[37][39] The replication mechanism proceeds with the loading of the MCM helicase, facilitated by the WhiP protein (a homolog of eukaryotic Cdt1), which ensures proper encircling of the DNA duplex.[40] Once loaded, the MCM helicases establish bidirectional replication forks that progress from each origin, coordinating with the cell cycle to complete genome duplication.[41] Archaeal origins are frequently integrated with transcription units, as many are located near or overlap with promoters of replication-related genes, allowing coordinated regulation of replication and transcription to minimize conflicts in these compact genomes.[42] Diversity in archaeal replication origins is evident across phyla, with Crenarchaeota (e.g., Sulfolobus and Pyrobaculum) generally featuring multiple, well-defined origins rich in ORBs, while Euryarchaeota (e.g., Haloferax and Methanothermobacter) exhibit greater variability, including cases with fewer origins or reliance on different initiator combinations.[36] A 2024 review highlights spatiotemporal control mechanisms in hyperthermophilic archaea, such as temporally staggered firing of origins to manage replication timing under extreme conditions, ensuring efficient progression despite thermal stress.[41]

Eukaryotic Origins

Model Organisms

In the budding yeast Saccharomyces cerevisiae, autonomously replicating sequence (ARS) elements serve as well-defined origins of replication, first identified in the late 1970s through assays demonstrating their ability to maintain plasmids independently of the chromosome. These compact elements, typically 100-150 base pairs in length, contain an essential ARS consensus sequence (ACS) with the motif 5'-TTTATYRTTTYA-3', where Y denotes C or T and R denotes A or G.[43] The S. cerevisiae genome contains approximately 400-500 such origins, which activate stochastically during S phase to ensure timely and complete DNA duplication without over-replication.[44] The fruit fly Drosophila melanogaster provides another key eukaryotic model, with genome-wide studies identifying roughly 5,000 replication origins distributed across its chromosomes.[45] Many of these origins are associated with CG-rich regions, which exhibit open chromatin and facilitate efficient initiation, similar to CpG islands in vertebrates.[46] In early Drosophila embryos, where cell cycles are abbreviated to under 10 minutes, origins are closely spaced at intervals of 5-10 kilobases to support the extraordinarily rapid genome replication required for syncytial divisions.[47] Replication initiation in these model organisms follows a conserved mechanism: the origin recognition complex (ORC) binds the ACS in budding yeast, or sites defined largely by chromatin context rather than a strict consensus in Drosophila, and recruits Cdc6 and Cdt1 to load double hexamers of the MCM helicase onto origin DNA during G1 phase. Cyclin-dependent kinase (CDK) phosphorylation then regulates the process by inhibiting re-loading of MCM after G1 and promoting helicase activation in S phase through targeted modifications of ORC, Cdc6, and accessory factors.[48] This licensing strategy is broadly shared among eukaryotes. 
Key experimental approaches for mapping origins in yeast and Drosophila include two-dimensional gel electrophoresis, which visualizes replication bubble and fork structures in genomic DNA, and chromatin immunoprecipitation coupled with sequencing (ChIP-seq), which profiles binding of ORC and MCM proteins at high resolution across the genome.[49] A 2025 study in budding yeast elucidated the precise timing of MCM double hexamer assembly at origins like ARS1, demonstrating how CDK-mediated constraints on this step have evolutionarily shaped origin structure and firing efficiency.[50]
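The ACS is written with IUPAC ambiguity codes (Y, R), so a straightforward way to search a sequence for it is to translate each code into a regular-expression character class. The sketch below does this with Python's standard `re` module, using a zero-width lookahead so that overlapping matches are reported; it scans only the strand it is given (scan the reverse complement separately for the other strand), the demo sequence is synthetic, and the function names are hypothetical. The same function accepts other IUPAC patterns, such as the 11-bp ACS core WTTTAYRTTTW often reported in the literature.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as TTTATYRTTTYA into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif):
    """Return start positions of all matches on the given strand, overlaps included."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


ACS = "TTTATYRTTTYA"  # motif as given in the text above
demo = "GGG" + "TTTATCATTTCA" + "GG" + "TTTATTGTTTTA"
print(find_motif(demo, ACS))  # [3, 17]
```

A motif match is necessary but far from sufficient: most ACS-like matches in a genome are not functional origins, which is why the mapping methods above remain essential.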

Mammalian Origins

In mammalian cells, including humans, origins of replication lack the consensus sequences characteristic of simpler eukaryotes like yeast, exhibiting instead a high degree of flexibility and sequence independence that complicates their identification and characterization.[51] This variability arises from contextual factors such as chromatin structure and epigenetic marks, allowing origins to form dynamically without fixed motifs.[52] The human genome contains an estimated 50,000 active origins per cell cycle, though the total potential number, including dormant ones, may reach 100,000, with inter-origin spacing typically ranging from 50 to 300 kb.[53] Many potential origins remain dormant during normal replication but can fire under replicative stress to ensure complete genome duplication and maintain stability.[54] Identification of mammalian origins has relied on methods such as short nascent strand sequencing (SNS-seq), which quantifies short nascent DNA strands enriched at active origins to map their locations.[55] More recently, computational tools such as the 2023 deep learning model Ori-FinderH have improved prediction by analyzing Z-curve features of DNA sequences, achieving approximately 92% accuracy in identifying human origins of varying lengths.[56] The origin recognition complex (ORC), composed of subunits ORC1-6, binds origins in mammals but does so dynamically, with subunit associations fluctuating across the cell cycle rather than maintaining stable chromatin tethering.[57] A 2025 study using BrdU incorporation and single-molecule nanopore sequencing revealed that most replication initiation events are dispersed throughout gene bodies rather than confined to promoters, highlighting the stochastic nature of origin usage in human cells.[58] Regulation of mammalian origins involves tissue-specific timing programs, in which origin firing correlates with cell-type-specific chromatin landscapes and transcription patterns.[59] ORC1 is subject to ubiquitination and proteasomal degradation during the S-to-M transition, preventing re-licensing and ensuring once-per-cycle replication.[60] Dysregulated origin firing contributes to genomic instability in cancer, as seen in human papillomavirus (HPV) integrations at common fragile sites, where replication stress promotes breakage and viral genome insertion.[61]
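The Z-curve mentioned above is a cumulative three-dimensional representation of a DNA sequence: after each base, it records the running difference between purines and pyrimidines, between amino and keto bases, and between weak (A/T) and strong (G/C) base pairs. The sketch below computes only these standard coordinates; the windowing, feature extraction, and deep-learning classifier of Ori-FinderH are not reproduced, and the function name is hypothetical.

```python
def z_curve(seq):
    """Return cumulative Z-curve coordinates (x_n, y_n, z_n) for each prefix of seq.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak A/T minus strong G/C base pairs
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    coords = []
    for base in seq.upper():
        if base in counts:  # ambiguous symbols such as N leave the counts unchanged
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        coords.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return coords


print(z_curve("ATGC"))  # [(1, 1, 1), (0, 0, 2), (1, -1, 1), (0, 0, 0)]
```

A steadily rising z component, for example, marks a stretch richer in A/T than G/C, the kind of compositional signal that sequence-based origin predictors can draw on.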

Viral Origins

Prokaryotic Viruses

Prokaryotic viruses, particularly bacteriophages, have origins of replication that are compact and often exploit host bacterial machinery while incorporating specialized viral elements to ensure efficient propagation within infected cells. These origins enable rapid DNA synthesis tailored to the lytic or lysogenic cycles, with many phages initiating replication bidirectionally before transitioning to alternative modes for amplification. Such adaptations reflect the evolutionary fine-tuning of viral replication to bacterial hosts: phage origins loosely resemble chromosomal origins such as Escherichia coli oriC in their sequence motifs but are optimized for the demands of the viral life cycle.[62] A prominent example is the origin of replication (ori) of bacteriophage λ, a temperate phage that infects E. coli. The λ ori spans a region of approximately 200 bp containing four iterons (repeated 17- to 19-bp sequences with hyphenated dyad symmetry), to which the viral O protein binds as dimers, forming a nucleoprotein complex that recruits host DnaB helicase for unwinding.[63][64] Replication initiates bidirectionally in theta mode from this site early in infection, producing circular daughter molecules, before switching to a rolling-circle mechanism mediated by viral P protein and host factors to generate concatemers for packaging.[65][66] In contrast, the single-stranded DNA bacteriophage ΦX174 employs a distinct origin suited to its genome structure. Its 5,386-nucleotide circular genome, fully sequenced in the 1970s, features the origin at nucleotide 4308, characterized by hairpin loops that serve as recognition sites for the host E. coli Rep helicase.[67] The viral gene A protein nicks the replicative form at this site to initiate synthesis, with Rep helicase unwinding the duplex while binding to the hairpin structures, facilitating primer-independent leading-strand synthesis and reliance on host primase for the lagging strand.[68][69] This setup enables conversion of the single-stranded viral genome to a double-stranded replicative form, followed by asymmetric rolling-circle replication for progeny production.[70] Bacteriophage P1, which is maintained as a low-copy plasmid prophage, uses a plasmid-like origin with a dedicated partition module for stable segregation. The system includes parS centromere-like sites bound by ParB protein, which interacts with ParA ATPase to ensure equitable distribution during host division, independent of the host's DnaA for partitioning but requiring RepA for replication initiation.[71] RepA binds iterons at the origin to activate a secondary, DnaA-independent replicon, allowing controlled copy number maintenance in the lysogenic state before lytic replication shifts to host-dependent modes.[72] Host-virus interactions further refine these origins, as seen in phage T7, where the bifunctional gene 4 protein (gp4) acts as both helicase and primase. The primase domain recognizes specific hairpin sequences bearing 5'-GTC-3' and 5'-ATC-3' motifs on the lagging-strand template, synthesizing tetraribonucleotide primers every 40-50 nucleotides to support continuous replication fork progression.[73]
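Iterons such as those of the λ origin are short sequences repeated several times in a small region, so a first-pass search for them is to index every k-mer and keep those that recur. The sketch below finds exact, same-strand repeats only; k would be set to the expected iteron length (for example 19 for λ-like iterons), the demo sequence is synthetic rather than a real origin, and the function name is hypothetical. Natural iterons often differ at a few positions, which an exact-match index will miss.

```python
from collections import defaultdict


def find_direct_repeats(seq, k, min_copies=2):
    """Map each k-mer seen at least min_copies times (same strand) to its start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_copies}


# Synthetic example: an 8-bp unit repeated three times with short spacers.
demo = "GATTACAC" + "TT" + "GATTACAC" + "GG" + "GATTACAC"
print(find_direct_repeats(demo, k=8))  # {'GATTACAC': [0, 10, 20]}
```

A dictionary of positions keeps the scan linear in sequence length for a fixed k, which matters when scanning whole phage genomes rather than a single origin region.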

Eukaryotic Viruses

Eukaryotic viruses that infect mammalian and other eukaryotic hosts use origins of replication (oris) that are compact, autonomous elements capable of directing DNA synthesis using a mix of viral and host proteins. These oris typically feature sequence motifs for viral initiator protein binding and AT-rich regions prone to unwinding, mimicking aspects of host chromosomal origins to hijack cellular replication machinery. Unlike phage origins, these viral oris must operate within the eukaryotic nucleus, and their use is often linked to viral lifecycle stages such as latency or lytic growth.[74] A prominent example is the simian virus 40 (SV40) ori, whose core region comprises pentanucleotide T-antigen binding sites, an early palindrome, and an adjacent AT-rich DNA unwinding element (DUE). Auxiliary sequences flanking the core, including the upstream enhancer with its 72-bp repeats, increase replication efficiency, while the viral T-antigen assembles on the core as double hexamers that unwind the DNA.[75] Studies from the 1980s established that SV40 initiation relies on T-antigen for origin recognition and unwinding, independent of host origin recognition complex (ORC) binding to the viral ori; T-antigen itself acts as the replicative helicase, so the host MCM helicase is not required.[76][77] In Epstein-Barr virus (EBV), the latent origin oriP comprises two key elements: the family of repeats (FR) for plasmid segregation and the dyad symmetry (DS) element, a palindromic sequence bound by the viral EBNA1 protein to recruit host replication factors. 
EBNA1 binding to DS establishes the replication start site, while FR binding stabilizes the episome during cell division; this dual function supports persistent infection linked to diseases like Burkitt's lymphoma.[78] EBV also employs a distinct lytic origin, oriLyt, activated during viral reactivation for amplified genome production, contrasting oriP's role in latency.[78] Human papillomavirus (HPV) oris are regulated by the viral E2 protein, which binds upstream regulatory elements and recruits the E1 helicase to three specific binding sites near the replication start, including palindromic E1 sites that facilitate E1 oligomerization. E1 forms a hexameric complex at the ori to unwind DNA, initiating bidirectional replication dependent on host polymerases. Recent structural studies have revealed the architecture of the E1 hexamer and its interaction with E2, highlighting how E2 stabilizes E1 loading for efficient viral persistence in epithelial cells.[79] Adenoviruses initiate replication at origins within their inverted terminal repeats (ITRs), where the viral terminal protein (TP) binds and forms a covalent linkage with the 5' dCMP, priming strand-displacement synthesis without RNA primers. The minimal ori spans the terminal 18 bp of the ITRs, featuring inverted repeats that position TP and the viral DNA polymerase for initiation, enabling replication of the linear genome from both ends.[80] This protein-DNA covalent mechanism distinguishes adenoviral oris from those relying on host primases. These viral oris often co-opt mammalian cellular replication proteins like MCM and polymerases, adapting host factors for autonomous propagation.[74]
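Several of the elements described above (the SV40 early palindrome, the EBV dyad symmetry element, and the palindromic E1 binding sites of HPV) are defined by dyad symmetry: the site reads the same 5' to 3' on both strands. The sketch below finds only perfect, uninterrupted reverse-complement palindromes; real sites are often imperfect or interrupted by a spacer, and the function names, length limits, and demo sequence are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=8, max_len=20):
    """Return (start, length) for every site that equals its own reverse complement.

    Only even lengths are tested: an odd-length site would need a central base
    that is its own complement, which no A/C/G/T base is.
    """
    seq = seq.upper()
    hits = []
    first = min_len + (min_len % 2)  # round an odd minimum up to the next even length
    for length in range(first, max_len + 1, 2):
        for i in range(len(seq) - length + 1):
            site = seq[i:i + length]
            if site == revcomp(site):
                hits.append((i, length))
    return hits


print(find_palindromes("AAAGAATTCAAA", min_len=4, max_len=6))  # [(4, 4), (3, 6)]
```

Nested hits such as (4, 4) inside (3, 6) are expected, since every perfect palindrome contains shorter palindromes centred at the same point.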

Variations and Advances

Replication Directionality

Replication from an origin can proceed unidirectionally or bidirectionally; bidirectional replication is the predominant mode in both prokaryotes and eukaryotes. In bidirectional replication, two replication forks diverge in opposite directions from the origin, effectively doubling the rate of genome duplication compared with a single fork. The process begins with the loading of two helicase complexes at the origin, each unwinding the DNA helix to give polymerases access to both strands.[81] For instance, in Escherichia coli, bidirectional replication from oriC duplicates the 4.6 Mb chromosome in approximately 40 minutes under optimal conditions as the two forks progress toward the terminus.[82] Unidirectional replication, in contrast, involves a single fork moving away from the origin in one direction; it is less common and is typically observed in certain plasmids rather than in chromosomes. A representative example is the γ origin of plasmid R6K, where specific sequence elements and the initiator protein π direct fork progression in only one orientation; establishing this polarity often requires a nick or specialized protein interactions.[83] The speed of a replication fork can be quantified as $v = \frac{d}{t}$, where $d$ is the distance replicated and $t$ is the time taken; in bacteria, this rate averages around 600 base pairs per second.[84] Unidirectional modes require mechanisms that prevent bidirectional initiation, such as asymmetric binding sites or terminators that block the opposing fork.[85]

Some replication systems switch between modes, starting with bidirectional theta-form replication and later transitioning to unidirectional rolling-circle replication, particularly in response to cellular cues or copy-number control needs. Head-on collisions between replication forks and the transcription machinery, which are more frequent in unidirectional setups, can lead to replication stalling or genomic instability.[66] In bidirectionally replicated bacterial chromosomes, most highly expressed genes are oriented co-directionally with fork movement, minimizing such conflicts and reducing mutation rates.[86] Bidirectional replication offers evolutionary advantages: it halves the time required for genome duplication and lowers error accumulation, because shorter fork travel distances reduce exposure to replication stress. Recent studies in archaea, which also predominantly use multiple bidirectional origins, reinforce the dominance of this mode across domains.[87][88] Origin sequences, such as AT-rich regions and DnaA boxes in bacteria, facilitate the helicase loading that establishes the two diverging forks.[89]
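The fork-speed relation can be turned into a short worked calculation. This Python sketch assumes an idealized chromosome copied at constant speed from a single origin, with forks sharing the work evenly; the helper names `fork_speed` and `replication_time` are illustrative, and the 4.6 Mb genome, ~40-minute duration and ~600 bp/s average are the figures quoted in the text.

```python
# Toy calculation of replication fork speed and genome duplication time.
# Assumptions: constant fork speed, a single origin, and forks that share the
# chromosome evenly (in reality forks pause and need not meet exactly opposite
# the origin). Genome size, ~40 min and ~600 bp/s are the figures quoted above.

def fork_speed(distance_bp: float, time_s: float) -> float:
    """Fork speed v = d / t, in base pairs per second."""
    return distance_bp / time_s


def replication_time(genome_bp: float, speed_bp_per_s: float, n_forks: int) -> float:
    """Seconds for n_forks forks at speed_bp_per_s to copy genome_bp between them."""
    return genome_bp / (n_forks * speed_bp_per_s)


ECOLI_GENOME_BP = 4.6e6        # 4.6 Mb chromosome
ECOLI_DURATION_S = 40 * 60     # ~40 minutes under optimal conditions

# Bidirectional replication: each of the two forks copies about half the chromosome.
v_implied = fork_speed(ECOLI_GENOME_BP / 2, ECOLI_DURATION_S)
print(f"implied per-fork speed: {v_implied:.0f} bp/s")   # 958 bp/s

# At a fixed speed (the ~600 bp/s bacterial average), a second fork halves the time.
for label, forks in (("unidirectional", 1), ("bidirectional", 2)):
    minutes = replication_time(ECOLI_GENOME_BP, 600, forks) / 60
    print(f"{label}: {minutes:.1f} min")   # 127.8 min, then 63.9 min
```

Under these assumptions the E. coli figures imply roughly 958 bp/s per fork, somewhat above the ~600 bp/s bacterial average, since both inputs are approximate. At any fixed speed, two forks finish in half the time of one (about 128 versus 64 minutes at 600 bp/s).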

Dormant and Flexible Origins

In eukaryotic cells, a significant proportion of licensed replication origins remain dormant and do not fire during a normal S phase, serving as a reserve that helps ensure complete genome duplication. In budding yeast, approximately 50% of origins have low firing efficiency and function as dormant sites, while in mammals up to 90% of licensed origins are dormant under unperturbed conditions. Dormant origins are passively replicated by forks from nearby active origins, but they can be activated when forks stall under stress, for example after treatment with hydroxyurea (HU), which slows fork progression; their firing then rescues the stalled replication. This stress-induced activation is mediated by the ATR kinase, which promotes local origin firing in response to single-stranded DNA accumulating at stalled forks, thereby preventing replication gaps.[90][91][92][93]

Origin firing in eukaryotes is inherently stochastic and flexible: only a subset of origins is activated in each cell cycle, which keeps replication forks progressing evenly across the genome. This probabilistic selection leaves dormant origins interspersed at intervals of approximately 100 kb, providing redundancy without over-initiation. A 2025 study using BrdU incorporation and single-molecule nanopore sequencing in human cells found that, under normal conditions, most replication initiation events (~80%) occur at dispersed sites throughout the genome, including gene bodies, rather than being confined to traditional initiation zones, highlighting the high cell-to-cell variability and stochastic nature of origin usage. Mechanisms underlying this flexibility include the loading of an excess of MCM2-7 helicases during G1 phase, far more than are needed for firing (e.g., ~100,000 complexes in human cells versus ~30,000–50,000 active origins), and regulation by cyclin-dependent kinases (CDKs): low CDK activity in G1 permits licensing, while rising CDK activity in S phase limits firing to a subset of origins, balancing initiation to avoid conflicts with transcription or excessive fork density.[94][95][58][54]

The loss or dysfunction of dormant origins has profound consequences for genome integrity: stalled forks that are not rescued collapse under stress, producing DNA double-strand breaks. In cells depleted of excess MCM2-7, stalled forks cannot be efficiently rescued, resulting in increased genomic instability, improper chromosome segregation, and heightened sensitivity to replication inhibitors. This vulnerability contributes to pathological states, including accelerated cellular aging through chronic replication stress and inflammation, as observed in vivo in aging tissues that show dysregulated dormant origin activation and ATR-dependent responses. In cancer, impaired dormant origin function exacerbates oncogene-induced replication stress, promoting mutagenesis and tumor progression, which underscores the role of dormant origins in tumor suppression.[96][54][97][98]
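The division of labour between a firing subset and a dormant reserve can be illustrated with a deliberately simplified Python model. It is a toy sketch, not a published model: the 2 Mb segment, 20 kb origin spacing, firing probability of 0.2 (leaving roughly 80% of origins dormant, within the range quoted above) and fork speeds of 30 and 6 bp/s are hypothetical values chosen for illustration, and `license_and_fire` and `completion_time` are hypothetical helper names.

```python
import random

# Toy model of stochastic origin firing with dormant origins (illustrative only).
# Assumptions (all hypothetical): a linear 2 Mb segment, licensed origins every
# 20 kb, each firing independently with probability 0.2 at the start of S phase,
# constant fork speed, and replication stress modelled only as slower forks.


def license_and_fire(length_bp, spacing_bp, p_fire, rng):
    """License evenly spaced origins; each fires with probability p_fire.

    Returns (active, dormant) lists of origin positions in bp.
    """
    active, dormant = [], []
    for pos in range(spacing_bp // 2, length_bp, spacing_bp):
        (active if rng.random() < p_fire else dormant).append(pos)
    return active, dormant


def completion_time(fired, length_bp, speed_bp_per_s):
    """Seconds until [0, length_bp] is fully replicated.

    All fired origins start at t = 0 and send out two forks at constant speed;
    forks from neighbouring origins meet halfway, so the slowest region sets
    the finishing time.
    """
    if not fired:
        return float("inf")  # no origin fired: segment is never replicated
    pts = sorted(fired)
    # The segment ends are reached by one fork from the outermost origins.
    longest = max(pts[0], length_bp - pts[-1])
    # Interior gaps are closed by two converging forks, each covering half.
    for left, right in zip(pts, pts[1:]):
        longest = max(longest, (right - left) / 2)
    return longest / speed_bp_per_s


LENGTH_BP = 2_000_000
SPACING_BP = 20_000
rng = random.Random(1)
active, dormant = license_and_fire(LENGTH_BP, SPACING_BP, p_fire=0.2, rng=rng)
print(f"{len(active)} active, {len(dormant)} dormant origins")

normal = completion_time(active, LENGTH_BP, speed_bp_per_s=30)
stressed = completion_time(active, LENGTH_BP, speed_bp_per_s=6)  # 5-fold slower forks
# Simplification: under stress every dormant origin fires, not just those near stalled forks.
rescued = completion_time(active + dormant, LENGTH_BP, speed_bp_per_s=6)

for label, t in (("normal", normal), ("stressed", stressed), ("stressed + dormant", rescued)):
    print(f"{label}: {t / 60:.0f} min")
```

Completion time here is governed by the longest distance any single fork must cover, so slower forks lengthen it proportionally, while letting dormant origins fire shortens every gap; in this model the rescued run can never finish later than the stressed run. Real cells fire dormant origins locally near stalled forks rather than all at once.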

Recent Developments

In 2023, researchers introduced Ori-FinderH, a deep learning-based computational tool that combines a Z-curve representation of DNA sequences with convolutional neural networks (CNNs) to predict human origins of replication (ORIs) of varying lengths, outperforming previous methods with up to 92% sensitivity and specificity on benchmark datasets.[99] In 2025, OriGen, an AI-driven sequence generation model, marked a further advance in synthetic biology by designing de novo plasmid origins of replication that retain essential functional elements such as AT-rich regions and DnaA-binding sites; experimental validation showed that the designs replicated in bacterial hosts while diverging from natural sequences by up to 50%.[100]

Advances in structural biology have illuminated how the minichromosome maintenance (MCM) double hexamer, the core of the replicative helicase, is activated. In 2024, cryo-electron microscopy (cryo-EM) studies of human proteins captured the dynamic loading of the MCM double hexamer onto DNA, including intermediate states in which the origin recognition complex (ORC) and CDC6 facilitate head-to-head hexamer assembly; resolutions down to 3.2 Å revealed conformational changes necessary for bidirectional helicase activation.[21] Complementing this, 2025 investigations in budding yeast showed how cyclin-dependent kinase (CDK) regulation remodels the MCM double hexamer during the cell cycle, shaping origin firing timing by restricting loading to G1 and inhibiting re-licensing, and thereby influencing evolutionary patterns of origin efficiency across yeast species.[50]

In mammalian systems, recent findings have challenged traditional views of replication initiation sites. A 2025 study using BrdU incorporation coupled with single-molecule nanopore sequencing found that most human DNA replication initiates in a dispersed manner across gene bodies, often independently of promoter regions, with over 70% of events occurring in non-canonical intergenic or intronic loci rather than at discrete ORIs.[58] Concurrently, integrative ChIP-exo mapping in 2024 showed overlapping binding profiles of ORC and MCM2-7 at human origins, revealing a self-limiting licensing mechanism in which MCM loading displaces ORC; this spreads licensing evenly across the genome, with densities that correlate with replication timing domains.[101]

Engineered origins have also found therapeutic applications, particularly in the design of viral vectors for gene therapy. In extremophile biotechnology, models of archaeal replication timing, derived from species such as Haloferax volcanii that can replicate without fixed origins, have informed the engineering of robust replication systems for industrial enzymes, enhancing production yields under harsh conditions such as high salinity or temperature, as reviewed in comparative archaeal studies.[41] Dormant origins, which can fire under stress, can likewise provide adaptive flexibility in synthetic constructs.
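Ori-FinderH is described as pairing a Z-curve representation with CNNs; how it encodes its inputs is not specified here, so the Python sketch below shows only the standard Z-curve transform, in which position n maps to cumulative differences between purine and pyrimidine, amino and keto, and weak and strong hydrogen-bonding base counts. The `z_slope_windows` helper is a hypothetical addition that reads AT richness, a feature of many origins, off the slope of the z component; neither function is taken from Ori-FinderH.

```python
def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) for n = 1..len(seq).

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bond bases (A, T) minus strong H-bond bases (G, C)
    """
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in "ACGT":
            raise ValueError(f"unexpected base {base!r}")
        x += 1 if base in "AG" else -1
        y += 1 if base in "AC" else -1
        z += 1 if base in "AT" else -1
        points.append((x, y, z))
    return points


def z_slope_windows(seq: str, window: int) -> list[float]:
    """Mean slope of the z component over non-overlapping windows.

    The slope equals (AT count - GC count) / window, so values near +1 flag
    AT-rich windows. Hypothetical helper for illustration only.
    """
    zs = [0] + [point[2] for point in z_curve(seq)]
    return [
        (zs[i + window] - zs[i]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


print(z_curve("AATTGC")[-1])           # (0, 0, 2)
print(z_slope_windows("AAAAGGGG", 4))  # [1.0, -1.0]
```

In a pipeline like the one described, such coordinates (or features derived from them) could serve as numeric CNN input in place of raw letters; window size and normalization are modelling choices the source does not specify.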

References
