Coding region

The coding region of a gene, also known as the coding DNA sequence (CDS), is the portion of a gene's DNA or RNA that codes for a protein. Comparing the length, composition, regulation, splicing, structure, and function of coding regions with those of non-coding regions, across species and over time, can reveal a great deal about gene organization and the evolution of prokaryotes and eukaryotes. This can in turn assist in mapping the human genome and developing gene therapy.

Although the term is sometimes used interchangeably with exon, the two are not the same: an exon can contain part of the coding region as well as the 5' and 3' untranslated regions (UTRs) of the RNA, so an exon may be only partially made up of coding region. The 5' and 3' untranslated regions, which do not code for protein, are termed non-coding regions and are not discussed on this page.
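The relationship between a processed mRNA, its untranslated regions, and its coding region can be illustrated with a minimal sketch. It assumes a toy transcript invented for this example, takes the first AUG as the start codon, and counts the stop codon as part of the coding region; real annotation pipelines use more evidence than this. The names `find_cds`, `five_utr`, `cds`, and `three_utr` are hypothetical.

```python
# Stop codons as they appear in RNA.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_cds(mrna):
    """Return (start, end) of the first AUG-initiated reading frame, or None.

    `end` is exclusive and, by the convention assumed here, includes the
    stop codon. Only the first AUG is considered.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None  # no start codon at all
    # Walk codon by codon in the frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None  # no in-frame stop codon found


# Hypothetical toy transcript: 5' UTR + coding region + 3' UTR.
five_utr = "GGCACC"
cds = "AUGGCUUGGUAA"
three_utr = "CCGAUU"
mrna = five_utr + cds + three_utr

span = find_cds(mrna)
if span is not None:
    start, end = span
    print("5' UTR:", mrna[:start])
    print("CDS:   ", mrna[start:end])
    print("3' UTR:", mrna[end:])
```

Everything outside the returned span is untranslated sequence, even though it belongs to the exons of the processed transcript.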

Coding regions are also often confused with exomes, but the terms are distinct. The exome refers to all exons within a genome, whereas the coding region refers to the sections of DNA (or of the primary transcript), or the single continuous section of processed mRNA, that code for a specific protein.

In 1978, Walter Gilbert published "Why genes in pieces?", which first explored the idea that the gene is a mosaic: each nucleic acid strand is not coded continuously but is interrupted by "silent" non-coding regions. This was the first indication that a distinction was needed between the parts of the genome that code for protein, now called coding regions, and those that do not.

The evidence suggests a general interdependence between base composition patterns and the presence of coding regions. Coding regions are thought to have a higher GC-content than non-coding regions. Further research found that the longer the coding sequence, the higher its GC-content. Short coding sequences are comparatively GC-poor, in line with the low GC-content of the translational stop codons TAG, TAA, and TGA.
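GC-content is simply the fraction of bases in a sequence that are guanine or cytosine. The following is a minimal sketch of that calculation, applied to the three stop codons; the function name `gc_content` is chosen for illustration, and ambiguous bases such as N are counted in the length but not as G or C.

```python
def gc_content(seq):
    """Return the fraction of bases in `seq` that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("cannot compute GC-content of an empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# GC-content of each DNA stop codon.
stop_codons = ["TAG", "TAA", "TGA"]
stop_gc = {codon: gc_content(codon) for codon in stop_codons}

for codon, gc in stop_gc.items():
    print(f"{codon}: {gc:.2f}")
```

None of the stop codons has more than one G or C among its three bases, which is the sense in which they are GC-poor.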

GC-rich areas are also where the ratio of point mutation types is altered slightly: there are more transitions, which are changes from purine to purine or pyrimidine to pyrimidine, than transversions, which are changes from purine to pyrimidine or pyrimidine to purine. Transitions are less likely to change the encoded amino acid and more likely to remain silent mutations (especially when they occur in the third nucleotide of a codon), which is usually beneficial to the organism because the resulting protein is unchanged.
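The transition/transversion distinction follows directly from the chemical class of the two bases involved. A minimal sketch, assuming DNA bases only and using hypothetical function names, classifies single-base substitutions and counts each type:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA change as 'transition' or 'transversion'.

    Returns 'none' when the two bases are identical.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"unsupported base in substitution {ref}>{alt}")
    if ref == alt:
        return "none"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(substitutions):
    """Return (transitions, transversions) for an iterable of (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in substitutions:
        kind = classify_substitution(ref, alt)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


# Made-up substitutions, for illustration only.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(count_ts_tv(example))
```

Of the twelve possible single-base substitutions, four are transitions and eight are transversions, so an excess of transitions in observed data is not what uniform random substitution would produce.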

This indicates that essential (gene-rich) coding regions are higher in GC-content and are more stable and resistant to mutation than accessory and non-essential (gene-poor) regions. However, it is still unclear whether this arose through neutral, random mutation or through a pattern of selection. There is also debate over whether the methods used to ascertain the relationship between GC-content and coding regions, such as gene windows, are accurate and unbiased.

In DNA, the coding region is flanked by the promoter sequence upstream, at the 5' end of the coding strand, and by the termination sequence downstream, at the 3' end. During transcription, RNA polymerase (RNAP) binds to the promoter sequence and moves along the template strand toward the coding region. RNAP then adds RNA nucleotides complementary to the template strand to form the mRNA, which therefore carries the sequence of the coding strand with uracil in place of thymine. This continues until RNAP reaches the termination sequence.
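The base-pairing step of transcription can be sketched in a few lines. This is a simplified illustration, not a model of RNAP itself: it assumes the template strand is written 3' to 5' (the direction in which it is read), ignores promoter recognition and termination, and uses a made-up toy sequence and hypothetical function names.

```python
# DNA base on the template strand -> RNA base added to the transcript.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# DNA base -> its DNA partner on the opposite strand.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3):
    """Return the template strand, written 3'->5', for a 5'->3' coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3.upper())


def transcribe(template_3to5):
    """Return the mRNA (5'->3') produced by reading the template strand 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


# Toy coding strand, invented for this example.
coding = "ATGGCTTGGTAA"
template = template_from_coding(coding)
mrna = transcribe(template)

print("coding   5'", coding, "3'")
print("template 3'", template, "5'")
print("mRNA     5'", mrna, "3'")
```

The transcript pairs base by base with the template strand, so its sequence matches the coding strand with every T replaced by U.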
