Allele frequency spectrum
In population genetics, the allele frequency spectrum, sometimes called the site frequency spectrum, is the distribution of the allele frequencies of a given set of loci (often SNPs) in a population or sample. Because an allele frequency spectrum is often a summary of, or compared to, sequenced samples of the whole population, it is a histogram whose size depends on the number of sequenced individual chromosomes. Each entry in the frequency spectrum records the total number of loci with the corresponding derived allele frequency. Loci contributing to the frequency spectrum are assumed to change in frequency independently of one another. Furthermore, loci are assumed to be biallelic (that is, with exactly two alleles present), although extensions for multiallelic frequency spectra exist.
Many summary statistics of observed genetic variation are themselves summaries of the allele frequency spectrum, including estimates of the population-scaled mutation rate, $\theta$, such as Watterson's $\theta_W$ and Tajima's $\theta_\pi$, as well as Tajima's D, Fay and Wu's H, and the fixation index $F_{ST}$.
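As an illustration of how such statistics reduce to functions of the spectrum, the sketch below computes Watterson's and Tajima's estimators of $\theta$ from an unfolded spectrum using their standard definitions, $\theta_W = S / \sum_{i=1}^{n-1} 1/i$ (with $S$ the number of segregating sites) and $\theta_\pi = \binom{n}{2}^{-1} \sum_i i(n-i)\,\xi_i$. The function names and the example spectrum are hypothetical, chosen only for illustration.

```python
from math import comb


def watterson_theta(sfs):
    """Watterson's estimator from an unfolded spectrum.

    sfs[i - 1] holds the number of sites with i derived copies, i = 1..n-1.
    """
    n = len(sfs) + 1                      # number of sampled chromosomes
    segregating_sites = sum(sfs)          # S, the total number of variable sites
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / a_n


def tajima_theta_pi(sfs):
    """Tajima's estimator (mean pairwise differences) from an unfolded spectrum."""
    n = len(sfs) + 1
    # A site with i derived copies differs in i * (n - i) of the C(n, 2) pairs.
    pairwise = sum(i * (n - i) * count for i, count in enumerate(sfs, start=1))
    return pairwise / comb(n, 2)


# Hypothetical spectrum for n = 6 chromosomes (illustrative values only).
example_sfs = [4, 2, 1, 0, 1]
theta_w = watterson_theta(example_sfs)
theta_pi = tajima_theta_pi(example_sfs)
print(theta_w, theta_pi)
```

Tajima's D is built from the difference between these two estimates, which is why it is sensitive to the shape of the spectrum.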
The allele frequency spectrum from a sample of $n$ chromosomes is calculated by counting the number of sites with derived allele frequency $i/n$, for $i = 1, \ldots, n-1$. For example, consider a sample of individuals with eight observed variable sites. In a table of such data, a 1 indicates that the derived allele is observed at that site, while a 0 indicates that the ancestral allele was observed.
The allele frequency spectrum can be written as the vector $\boldsymbol{\xi} = (\xi_1, \xi_2, \ldots, \xi_{n-1})$, where $\xi_i$ is the number of observed sites with derived allele frequency $i/n$. In this example, the observed allele frequency spectrum has $\xi_1 = 4$ and $\xi_2 = 2$, due to four instances of a single observed derived allele at a particular SNP locus, two instances of two derived alleles, and so on.
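The counting procedure can be sketched in a few lines of Python. The 0/1 matrix below is hypothetical: it was constructed only to match what the text states (eight variable sites, four with one derived copy, two with two derived copies), and the remaining entries, the sample size of six, and the function name are assumptions made for illustration.

```python
def site_frequency_spectrum(genotypes):
    """Unfolded spectrum from a 0/1 matrix (rows: chromosomes, columns: sites).

    Returns a list whose entry i - 1 is the number of sites at which exactly
    i sampled chromosomes carry the derived allele, for i = 1..n-1.
    """
    n = len(genotypes)
    num_sites = len(genotypes[0])
    sfs = [0] * (n - 1)
    for site in range(num_sites):
        derived_count = sum(row[site] for row in genotypes)
        # Sites with 0 or n derived copies are not variable in the sample.
        if 0 < derived_count < n:
            sfs[derived_count - 1] += 1
    return sfs


# Hypothetical data: 6 sampled chromosomes, 8 variable sites.
genotypes = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 1, 0],
]
observed_sfs = site_frequency_spectrum(genotypes)
print(observed_sfs)  # [4, 2, 1, 0, 1]
```

The entries of the result sum to eight, the number of variable sites in the sample.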
The expected allele frequency spectrum may be calculated using either a coalescent or diffusion approach. The demographic history of a population and natural selection affect allele frequency dynamics, and these effects are reflected in the shape of the allele frequency spectrum. For the simple case of selectively neutral alleles segregating in a population that has reached demographic equilibrium (that is, without recent population size changes or gene flow), the expected allele frequency spectrum for a sample of size $n$ is given by
$$E[\xi_i] = \frac{\theta}{i}, \qquad i = 1, \ldots, n-1,$$
where $\theta = 4N\mu$ is the population-scaled mutation rate ($N$ is the population size and $\mu$ is the site mutation rate). Deviations from demographic equilibrium or neutrality will change the shape of the expected frequency spectrum.
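A minimal sketch of the neutral equilibrium expectation, in which the expected number of sites with $i$ derived copies is proportional to $1/i$. The function name and the value of $\theta$ are arbitrary choices for illustration.

```python
def expected_neutral_sfs(theta, n):
    """Expected unfolded spectrum under neutrality and demographic equilibrium.

    Entry i - 1 is the expected number of sites with i derived copies in a
    sample of n chromosomes, i.e. theta / i for i = 1..n-1.
    """
    return [theta / i for i in range(1, n)]


# Illustrative values only: theta = 3.0 for a sample of n = 6 chromosomes.
expected = expected_neutral_sfs(theta=3.0, n=6)
print(expected)
```

Summing the entries gives $\theta \sum_{i=1}^{n-1} 1/i$, the expected number of segregating sites, so rare derived alleles are expected to make up the largest classes.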
Calculating the frequency spectrum from observed sequence data requires one to be able to distinguish the ancestral and derived (mutant) alleles, often by comparing to an outgroup sequence. For example, in human population genetic studies, the homologous chimpanzee reference sequence is typically used to estimate the ancestral allele. However, sometimes the ancestral allele cannot be determined, in which case the folded allele frequency spectrum may be calculated instead. The folded frequency spectrum stores the observed counts of the minor (less common) allele frequencies. The folded spectrum can be calculated by binning together the $i$th and $(n-i)$th entries from the unfolded spectrum, where $n$ is the number of sampled chromosomes.
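Folding can be sketched as follows, assuming the unfolded spectrum is stored as a list whose entry $i-1$ counts sites with $i$ derived copies. The function name and the input spectrum are hypothetical.

```python
def fold_sfs(unfolded):
    """Fold an unfolded spectrum by merging classes i and n - i.

    Entry i - 1 of the result is the number of sites whose minor allele is
    carried by i chromosomes, for i = 1..floor(n / 2).
    """
    n = len(unfolded) + 1                 # number of sampled chromosomes
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        if i == j:
            # For even n the middle class pairs with itself; count it once.
            folded.append(unfolded[i - 1])
        else:
            folded.append(unfolded[i - 1] + unfolded[j - 1])
    return folded


# Hypothetical unfolded spectrum for n = 6 chromosomes.
folded = fold_sfs([4, 2, 1, 0, 1])
print(folded)  # [5, 2, 1]
```

The folded spectrum has the same total as the unfolded one, since each variable site is still counted exactly once.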
The joint allele frequency spectrum (JAFS) is the joint distribution of allele frequencies across two or more related populations. The JAFS for $d$ populations, with $n_j$ sampled chromosomes in the $j$th population, is a $d$-dimensional histogram, in which each entry stores the total number of segregating sites in which the derived allele is observed with the corresponding frequency in each population. Each axis of the histogram corresponds to a population, and indices run from $0$ to $n_j$ for the $j$th population.
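For two populations the JAFS is a two-dimensional table, which can be sketched as below. The data are hypothetical (two populations of three chromosomes each, typed at the same eight sites), and the function name is an assumption for illustration.

```python
def joint_sfs(pop1, pop2):
    """Joint spectrum for two populations from 0/1 matrices over the same sites.

    Returns an (n1 + 1) x (n2 + 1) table; entry [c1][c2] is the number of
    segregating sites with c1 derived copies in pop1 and c2 in pop2.
    """
    n1, n2 = len(pop1), len(pop2)
    num_sites = len(pop1[0])
    jafs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for site in range(num_sites):
        c1 = sum(row[site] for row in pop1)
        c2 = sum(row[site] for row in pop2)
        # Skip sites that are not segregating in the combined sample.
        if c1 + c2 == 0 or c1 + c2 == n1 + n2:
            continue
        jafs[c1][c2] += 1
    return jafs


# Hypothetical data: 3 chromosomes per population, 8 shared sites.
pop1 = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
]
pop2 = [
    [0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 1, 0],
]
jafs = joint_sfs(pop1, pop2)
for row in jafs:
    print(row)
```

Entries along the first row and first column correspond to derived alleles that are absent from one population's sample, which is why each axis runs from 0 rather than 1.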