CAAT box

In molecular biology, a CCAAT box (also sometimes abbreviated as CAAT box or CAT box) is a distinct pattern of nucleotides with the consensus sequence GGCCAATCT that occurs 60–100 bases upstream of the transcription start site. The CAAT box marks the binding site for a transcription factor and is typically accompanied by a conserved consensus sequence. It is an invariant DNA sequence at about minus 70 base pairs from the origin of transcription in many eukaryotic promoters. Genes that have this element seem to require it in order to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box, along with the GC box, is known for binding general transcription factors. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Activation of the CCAAT box requires binding by specific proteins, known as CCAAT box binding proteins or CCAAT box binding factors.

A CCAAT box is a feature frequently found before eukaryote coding regions, but is not found in prokaryotes.

In the direction of transcription of the template strand, the consensus sequence (the calculated order of the most frequent residues) for the CAAT box was 3'-TG ATTGG (T/C)(T/C)(A/G)-5'. The parentheses denote that either base may be present, without specifying their relative frequencies. For example, "(T/C)" means that either thymine or cytosine is preferentially selected. Within metazoa (the animal kingdom), the core binding factor (CBF)-DNA complex retains a high degree of conservation within the CCAAT binding motif, as well as in the sequences flanking this pentameric motif. The CCAAT motif in plants (spinach was used in one experiment) differs from that of metazoa in two ways. First, it is actually a CAAT binding motif: the promoter lacks one of the two C residues of the pentameric motif, and the artificial addition of the second C has no significant effect on binding activity. Some sequences lack the CAAT box completely. Second, the surrounding nucleotides in plants do not match the consensus sequence above, which was determined by Bi et al.
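The parenthesized notation can be made concrete with a short sketch. The Python below is illustrative only: it treats the motif as a plain string read left to right exactly as printed (strand orientation labels are not modeled), converts each "(X/Y)" alternative into a regular-expression character class, and checks that ATTGG and CCAAT are reverse complements of each other. The helper names and the example sequence are hypothetical and do not come from the cited work.

```python
import re

# Watson-Crick pairing used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> str:
    """Rewrite '(T/C)'-style alternatives as regex character classes."""
    compact = consensus.replace(" ", "")
    return re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", compact)


# Consensus as printed in the text, without the 3'/5' end labels.
CONSENSUS = "TG ATTGG (T/C)(T/C)(A/G)"
PATTERN = re.compile(consensus_to_regex(CONSENSUS))


def find_sites(seq: str) -> list[int]:
    """Return 0-based start indices of every consensus match in seq."""
    return [m.start() for m in PATTERN.finditer(seq)]


if __name__ == "__main__":
    example = "AAACTGATTGGCTAGGG"  # invented sequence, for illustration only
    print(consensus_to_regex(CONSENSUS))
    print(reverse_complement("ATTGG"))
    print(find_sites(example))
```

A character class such as `[TC]` accepts either base with equal weight, which mirrors the statement that the notation does not encode relative frequencies.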

The CAAT box is considered part of the core promoter. The core promoter, also known as the basal promoter or simply the promoter, is the region of DNA that initiates transcription of a particular gene and in which a complex of general transcription factors binds with RNA polymerase II prior to the initiation of transcription. The CAAT box is located about 60–100 bases upstream (towards the 5' end) of the transcription start site of a eukaryotic gene, and no less than 27 base pairs away from it. It is essential to transcription that core binding factors (CBF, also referred to as nuclear factor Y or NF-Y) are able to bind to the CCAAT motif. Experiments in many laboratories have shown that mutations to the CCAAT motif that cause a loss of CBF binding also decrease transcriptional activity in these promoters, suggesting that CBF-CCAAT complexes are essential for optimum transcriptional activity.
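The positional description can be illustrated with a small sketch that scans a sequence for CCAAT starting 60–100 bases upstream of a known transcription start site. This is a simplified illustration, not a promoter-prediction method: the function name, the parameters, and the convention that a motif's position is measured from its first base are all assumptions made for the example.

```python
def find_ccaat_upstream(seq, tss_index, nearest=60, farthest=100, motif="CCAAT"):
    """Return positions (relative to the start site) of motif occurrences.

    seq       -- DNA of the coding strand, written 5' to 3'
    tss_index -- 0-based index in seq of the first transcribed base
    A hit starting N bases before the start site is reported as -N.
    Only hits whose first base lies between `farthest` and `nearest`
    bases upstream are returned.
    """
    hits = []
    window_start = max(0, tss_index - farthest)
    last_allowed_start = tss_index - nearest
    i = seq.find(motif, window_start)
    while i != -1 and i <= last_allowed_start:
        hits.append(i - tss_index)
        i = seq.find(motif, i + 1)
    return hits


if __name__ == "__main__":
    # Invented sequence: a CCAAT placed 70 bases before the start site.
    promoter = "G" * 30 + "CCAAT" + "G" * 65 + "A" + "G" * 20
    print(find_ccaat_upstream(promoter, tss_index=100))
```

A plain substring search is enough here because the core motif is fixed; matching the flanking consensus as well would require a degenerate pattern.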

In an experiment with core binding factor (CBF)-DNA complexes, researchers determined the preferred promoter sequences in the region over and immediately adjacent to the CAAT box, and in two regions on either side of it. Using a PCR-mediated random binding selection process, they showed that the sequence "3' - (T/C)G ATTGG (T/C)(T/C)(A/G) - 5'", in which the listed residues immediately flank the ATTGG region (CCAAT in the complementary strand), was preferentially selected on the coding strand (opposite the template strand). This was shown using an oligonucleotide (R1) that contained 27 random nucleotides flanked by a defined 20-nucleotide sequence on each side. While no single nucleotide was selected in every clone on either side of the ATTGG motif, several positions showed nucleotides selected with high frequency. Most notable in the sequence above was the G residue on the 5' side of ATTGG. The other listed residues were also notable, but each of those positions is split between two residues. The same experiment yielded the same sequence with a different oligonucleotide (R2) that contained an ATTGG core flanked by 12 random nucleotides on the 5' side and 10 random nucleotides on the 3' side. The sequences obtained with the two oligonucleotides are very similar and were confirmed in multiple experiments. Sequences that flanked the ATTGG motif with two adenine residues (AA) on its 5' end and G(A/G) on its 3' end seem to have inhibited formation of the CBF-DNA complex, and occurred in only 1% of the promoter sequences.

Another experiment, performed with the major late promoter (MLP) of adenoviruses from a variety of host species, examined mutation of the CAAT box (the CCAAT sequence), which is thought to play a pivotal role in the MLP of subgroup C human adenoviruses, in species with a deficient CAAT sequence. Transcription initiation at mutant MLP species was significantly reduced compared with that of the wild type or of species in which there was a CAAT mutant. The failure to restore the normal function conferred by a CAAT box is consistent with the idea that the CAAT box plays a vital role in the adenovirus MLP and is preferred over other transcriptional elements.
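The way a consensus emerges from a binding-selection experiment, as in the CBF study described earlier, can be sketched by tallying nucleotide frequencies at each aligned position across the selected clones. The code below is a minimal illustration under stated assumptions: the clone sequences are invented, the 75% cutoff for reporting a single base is arbitrary, and the function names are hypothetical. It does not reproduce the published analysis.

```python
from collections import Counter


def position_frequencies(aligned):
    """Count the bases seen at each column of equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [Counter(column) for column in zip(*aligned)]


def consensus(aligned, single_base_cutoff=0.75):
    """Report one base per position, or '(X/Y)' when the top base is weaker."""
    total = len(aligned)
    parts = []
    for counts in position_frequencies(aligned):
        ranked = counts.most_common(2)
        top_base, top_count = ranked[0]
        if len(ranked) == 1 or top_count / total >= single_base_cutoff:
            parts.append(top_base)
        else:
            parts.append(f"({top_base}/{ranked[1][0]})")
    return "".join(parts)


if __name__ == "__main__":
    # Invented clones sharing an ATTGG core, for illustration only.
    clones = [
        "TGATTGGCTA",
        "TGATTGGTCG",
        "CGATTGGCTA",
        "TGATTGGTCA",
        "TGATTGGCCG",
    ]
    print(consensus(clones))
```

Positions where one base dominates are written as a single letter, while positions split between two bases are written in the parenthesized form used in the text.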

These core binding factors, or nuclear factors Y (NF-Y), are composed of three subunits: NF-YA, NF-YB, and NF-YC. Whereas in animals each NF-Y subunit is encoded by a single gene, the subunits have diversified in plants in both structure and function, and plant NF-Y families consist of between eight and 39 members per subunit. This diversification is largely due to gene duplications and tandem duplications, which have contributed to the larger family sizes of NF-Y in plants compared with the singly encoded animal subunits. Each subunit contains an evolutionarily conserved part: the C-terminal region of NF-YA, the central part of NF-YB, and the N-terminal region of NF-YC. More than 70% of each of these regions remains conserved across species. Neighboring regions, however, are generally not conserved.

The NF-YA family encodes transcription factors that are variable in length (between 207 and 347 amino acids in M. truncatula). NF-YA proteins are generally characterized by two domains that are strongly conserved in all higher eukaryotes investigated to date. The first domain (A1) contains 20 amino acids that form an alpha helix, which appears significant in interactions with NF-YB and NF-YC. The second domain (A2), joined to A1 by a conserved linker sequence, is a sequence of 21 amino acids vital for specific DNA binding at the CCAAT box. The A1 and A2 domains lie towards the C-terminus in mammals, but occupy a more central region in plant NF-YA subunits. In plants, the NF-YA subunit has evolved to regulate the development of a facultative root organ present only in leguminous plants, and has been shown to be expressed in root tissue. It also appears to contribute to drought resistance, becoming upregulated during drought stress in the roots and leaves of Arabidopsis. NF-YA mutants have shown a loss of function and hypersensitivity to drought-like conditions; in contrast, overexpression of NF-YA has resulted in drought resistance.

The NF-YB family is, like the NF-YA family, variable in length, but its members are on average much smaller (90–240 amino acids in M. truncatula). NF-YB proteins have a structure and amino acid composition similar to the histone fold motif (HFM), which is composed of three alpha helices separated by two beta strand-loop domains. Like NF-YA, NF-YB has been shown to improve drought resistance when overexpressed, and also to promote flowering in Arabidopsis.
