Hubbry Logo
logo
CpG site
Community hub

CpG site

logo
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something to knowledge base
Hub AI

CpG site AI simulator

(@CpG site_simulator)

CpG site

The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands. CpGs are the common unit of methylation because their information is double stranded (the GpC complement is also methylated). This allows methylation information to be preserved during cell division.

Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression, a mechanism that is part of a larger field of science studying gene regulation that is called epigenetics. Methylated cytosines often mutate to thymines.

In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island.

CpG is shorthand for 5'—C—phosphate—G—3' , that is, cytosine and guanine separated by only one phosphate group; phosphate links any two nucleosides together in DNA. The CpG notation is used to distinguish this single-stranded linear sequence from the CG base-pairing of cytosine and guanine for double-stranded sequences. The CpG notation is therefore to be interpreted as the cytosine being 5 prime to the guanine base. CpG should not be confused with GpC, the latter meaning that a guanine is followed by a cytosine in the 5' → 3' direction of a single-stranded sequence.

CpG dinucleotides have long been observed to occur with a much lower frequency in the sequence of vertebrate genomes than would be expected due to random chance. For example, in the human genome, which has a 42% GC content, a pair of nucleotides consisting of cytosine followed by guanine would be expected to occur of the time. The frequency of CpG dinucleotides in human genomes is less than one-fifth of the expected frequency.

This underrepresentation is a consequence of the high mutation rate of methylated CpG sites: the spontaneously occurring deamination of a methylated cytosine results in a thymine, and the resulting G:T mismatched bases are often improperly resolved to A:T; whereas the deamination of unmethylated cytosine results in a uracil, which as a foreign base is quickly replaced by a cytosine by the base excision repair mechanism. The C to T transition rate at methylated CpG sites is ~10 fold higher than at unmethylated sites.

CpG dinucleotides frequently occur in CpG islands (see definition of CpG islands, below). There are 28,890 CpG islands in the human genome, (50,267 if one includes CpG islands in repeat sequences). This is in agreement with the 28,519 CpG islands found by Venter et al. since the Venter et al. genome sequence did not include the interiors of highly similar repetitive elements and the extremely dense repeat regions near the centromeres. Since CpG islands contain multiple CpG dinucleotide sequences, there appear to be more than 20 million CpG dinucleotides in the human genome.

CpG islands (or CG islands) are regions with a high frequency of CpG sites. Though objective definitions for CpG islands are limited, the usual formal definition is a region with at least 200 bp, a GC percentage greater than 50%, and an observed-to-expected CpG ratio greater than 60%. The "observed-to-expected CpG ratio" can be derived where the observed is calculated as: and the expected as or .

See all
bacterial DNA fragments
User Avatar
No comments yet.