Human genetic variation


Human genetic variation is the genetic differences in and among populations. There may be multiple variants of any given gene in the human population (alleles), a situation called polymorphism.
No two humans are genetically identical. Even monozygotic twins (who develop from one zygote) have infrequent genetic differences due to mutations occurring during development and gene copy-number variation.[1] Differences between individuals, even closely related individuals, are the key to techniques such as genetic fingerprinting.
The human genome has a total length of approximately 3.2 billion base pairs (bp) in 46 chromosomes of DNA as well as slightly under 17,000 bp DNA in cellular mitochondria. In 2015, the typical difference between an individual's genome and the reference genome was estimated at 20 million base pairs (or 0.6% of the total).[2] As of 2017, there were a total of 324 million known variants from sequenced human genomes.[3]
Comparatively speaking, humans are a genetically homogeneous species. Although a small number of genetic variants are found more frequently in certain geographic regions or in people with ancestry from those regions, this variation accounts for a small portion (~15%) of human genome variability. The majority of variation exists within the members of each human population. For comparison, rhesus macaques exhibit 2.5-fold greater DNA sequence diversity compared to humans.[4] These rates differ depending on which macromolecules are analyzed. Chimpanzees have more genetic variance than humans in nuclear DNA, but humans have more genetic variance at the level of proteins.[5]
The lack of discontinuities in genetic distances between human populations, absence of discrete branches in the human species, and striking homogeneity of human beings globally, imply that there is no scientific basis for inferring races or subspecies in humans, and for most traits, there is much more variation within populations than between them.[6][7][8][9][10][11][12][13] Despite this, modern genetic studies have found substantial average genetic differences across human populations in traits such as skin colour, bodily dimensions, lactose and starch digestion, high altitude adaptions, drug response, taste receptors, and predisposition to developing particular diseases.[14][12] The greatest diversity is found within and among populations in Africa,[15] and gradually declines with increasing distance from the African continent, consistent with the Out of Africa theory of human origins.[15]
The study of human genetic variation has evolutionary significance and medical applications. It can help scientists reconstruct and understand patterns of past human migration. In medicine, study of human genetic variation may be important because some disease-causing alleles occur more often in certain population groups. For instance, the mutation for sickle-cell anemia is more often found in people with ancestry from certain sub-Saharan African, south European, Arabian, and Indian populations, due to the evolutionary pressure from mosquitos carrying malaria in these regions.
New findings show that each human has on average 60 new mutations compared to their parents.[16][17]
Causes of variation
Causes of differences between individuals include independent assortment, the exchange of genes (crossing over and recombination) during reproduction (through meiosis) and various mutational events.
There are at least three reasons why genetic variation exists between populations. Natural selection may confer an adaptive advantage to individuals in a specific environment if an allele provides a competitive advantage. Alleles under selection are likely to occur only in those geographic regions where they confer an advantage. A second important process is genetic drift, which is the effect of random changes in the gene pool, under conditions where most mutations are neutral (that is, they do not appear to have any positive or negative selective effect on the organism). Finally, small migrant populations have statistical differences – called the founder effect – from the overall populations where they originated; when these migrants settle new areas, their descendant population typically differs from their population of origin: different genes predominate and it is less genetically diverse.
In humans, the main cause is genetic drift.[18] Serial founder effects and past small population size (increasing the likelihood of genetic drift) may have had an important influence in neutral differences between populations.[citation needed] The second main cause of genetic variation is due to the high degree of neutrality of most mutations. A small, but significant number of genes appear to have undergone recent natural selection, and these selective pressures are sometimes specific to one region.[19][20]
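The link between small population size and genetic drift can be illustrated with a minimal Wright-Fisher style simulation. This is only a sketch under simplifying assumptions that are not taken from the cited sources: a single neutral allele, a constant population size, random mating, and no mutation or migration. The function names and parameter values are hypothetical.

```python
import random


def wright_fisher(p0, n_individuals, generations, rng):
    """Follow one neutral allele's frequency under drift alone (no selection, no mutation)."""
    n_copies = 2 * n_individuals  # diploid: two allele copies per individual
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # Every copy in the next generation is sampled at random from the current pool.
        count = sum(1 for _copy in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


def fraction_fixed_or_lost(p0, n_individuals, generations, replicates, seed=0):
    """Share of replicate populations in which the allele reached frequency 0 or 1."""
    rng = random.Random(seed)
    finished = 0
    for _ in range(replicates):
        final = wright_fisher(p0, n_individuals, generations, rng)[-1]
        if final == 0.0 or final == 1.0:
            finished += 1
    return finished / replicates


# Illustrative sizes only: a founder-sized group versus a larger population.
small_population = fraction_fixed_or_lost(0.5, n_individuals=10, generations=50, replicates=100)
large_population = fraction_fixed_or_lost(0.5, n_individuals=200, generations=50, replicates=100)
print(small_population, large_population)
```

In runs of this kind the small population loses or fixes the allele far more often than the large one over the same number of generations, which is the sense in which founder events and small past population sizes increase the influence of drift.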
Measures of variation
Genetic variation among humans occurs on many scales, from gross alterations in the human karyotype to single nucleotide changes.[21] Chromosome abnormalities are detected in 1 of 160 live human births. Apart from sex chromosome disorders, most cases of aneuploidy result in death of the developing fetus (miscarriage); the most common extra autosomal chromosomes among live births are 21, 18 and 13.[22]
Nucleotide diversity is the average proportion of nucleotides that differ between two individuals. As of 2004, the human nucleotide diversity was estimated to be 0.1%[23] to 0.4% of base pairs.[24] In 2015, the 1000 Genomes Project, which sequenced 2,504 individuals from 26 human populations, found that "a typical [individual] genome differs from the reference human genome at 4.1 million to 5.0 million sites … affecting 20 million bases of sequence"; the latter figure corresponds to 0.6% of the total number of base pairs.[2] Nearly all (>99.9%) of these sites are small differences, either single nucleotide polymorphisms or brief insertions or deletions (indels) in the genetic sequence, but structural variations account for a greater number of base-pairs than the SNPs and indels.[2][25]
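As a rough illustration of the definition above, nucleotide diversity can be computed as the mean fraction of differing sites over all pairs of aligned sequences. The sketch below uses three invented toy sequences (not real data) and a hypothetical function name; it also reproduces the 0.6% figure from the quoted 20 million bases and a genome length of about 3.2 billion base pairs.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average proportion of sites that differ between two sequences in the sample."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pair_fractions = []
    for a, b in combinations(sequences, 2):
        differences = sum(1 for x, y in zip(a, b) if x != y)
        pair_fractions.append(differences / length)
    return sum(pair_fractions) / len(pair_fractions)


# Toy aligned sequences, invented for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)

# Share of the genome affected in a typical individual, from the figures quoted above.
affected_fraction = 20_000_000 / 3_200_000_000
print(round(pi, 4), affected_fraction)
```

The toy value is far higher than any human estimate because the sequences are only ten bases long; the point is the pairwise averaging, not the number.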
As of 2017, the Single Nucleotide Polymorphism Database (dbSNP), which lists SNPs and other variants, listed 324 million variants found in sequenced human genomes.[3]
Single nucleotide polymorphisms
A single nucleotide polymorphism (SNP) is a difference in a single nucleotide between members of one species that occurs in at least 1% of the population. The 2,504 individuals characterized by the 1000 Genomes Project had 84.7 million SNPs among them.[2] SNPs are the most common type of sequence variation, estimated in 1998 to account for 90% of all sequence variants.[26] Other sequence variations are single base exchanges, deletions and insertions.[27] SNPs occur on average about every 100 to 300 bases[28] and so are the major source of heterogeneity.
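The 1% frequency criterion in the definition above can be expressed as a check on the minor allele frequency at a site. The following is a minimal sketch; the function names and the sample allele counts are hypothetical, and the minor allele is taken here to be the second most common allele observed at the site.

```python
def allele_frequencies(alleles):
    """Map each observed allele to its frequency among the sampled chromosome copies."""
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    total = len(alleles)
    return {allele: count / total for allele, count in counts.items()}


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele (0.0 if the site does not vary)."""
    freqs = sorted(allele_frequencies(alleles).values(), reverse=True)
    return freqs[1] if len(freqs) > 1 else 0.0


def meets_snp_threshold(alleles, threshold=0.01):
    """True if the variant is present in at least `threshold` of sampled copies."""
    return minor_allele_frequency(alleles) >= threshold


# Invented samples: 3 of 200 copies carry G (1.5%), versus 1 of 1000 (0.1%).
common_site = ["A"] * 197 + ["G"] * 3
rare_site = ["A"] * 999 + ["G"]
print(meets_snp_threshold(common_site), meets_snp_threshold(rare_site))
```

Under this definition the first site counts as a polymorphism and the second is a rare variant, even though both are single-nucleotide differences.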
A functional, or non-synonymous, SNP is one that affects some factor such as gene splicing or messenger RNA, and so causes a phenotypic difference between members of the species. About 3% to 5% of human SNPs are functional (see International HapMap Project). Neutral, or synonymous SNPs are still useful as genetic markers in genome-wide association studies, because of their sheer number and the stable inheritance over generations.[26]
A coding SNP is one that occurs inside a gene. There are 105 Human Reference SNPs that result in premature stop codons in 103 genes. This corresponds to 0.5% of coding SNPs. They occur due to segmental duplication in the genome. These SNPs result in loss of protein, yet all these SNP alleles are common and are not purified in negative selection.[29]
Structural variation
Structural variation is the variation in structure of an organism's chromosome. Structural variations, such as copy-number variation and deletions, inversions, insertions and duplications, account for much more human genetic variation than single nucleotide diversity. This was concluded in 2007 from analysis of the diploid full sequences of the genomes of two humans: Craig Venter and James D. Watson. This added to the two haploid sequences which were amalgamations of sequences from many individuals, published by the Human Genome Project and Celera Genomics respectively.[30]
According to the 1000 Genomes Project, a typical human has 2,100 to 2,500 structural variations, which include approximately 1,000 large deletions, 160 copy-number variants, 915 Alu insertions, 128 L1 insertions, 51 SVA insertions, 4 NUMTs, and 10 inversions.[2]
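As a quick consistency check, the per-category counts listed above can be summed and compared with the quoted range of 2,100 to 2,500 structural variations. The dictionary below simply restates the figures from the paragraph; the variable names are illustrative.

```python
# Approximate counts per typical genome, as listed in the paragraph above.
typical_counts = {
    "large deletions": 1000,
    "copy-number variants": 160,
    "Alu insertions": 915,
    "L1 insertions": 128,
    "SVA insertions": 51,
    "NUMTs": 4,
    "inversions": 10,
}

total = sum(typical_counts.values())
in_quoted_range = 2100 <= total <= 2500
print(total, in_quoted_range)  # 2268 True
```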
Copy number variation
A copy-number variation (CNV) is a difference in the genome due to deleting or duplicating large regions of DNA on some chromosome. It is estimated that 0.4% of the genomes of unrelated humans differ with respect to copy number. When copy number variation is included, human-to-human genetic variation is estimated to be at least 0.5% (99.5% similarity).[31][32][33][34] Copy number variations are inherited but can also arise during development.[35][36][37][38]
A visual map of the regions of high genomic variation in the modern-human reference assembly relative to a Neanderthal of 50k[39] has been built by Pratas et al.[40]
Epigenetics
Epigenetic variation is variation in the chemical tags that attach to DNA and affect how genes get read. The tags, "called epigenetic markings, act as switches that control how genes can be read."[41] At some alleles, the epigenetic state of the DNA, and associated phenotype, can be inherited across generations of individuals.[42]
Genetic variability
Genetic variability is a measure of the tendency of individual genotypes in a population to vary (become different) from one another. Variability is different from genetic diversity, which is the amount of variation seen in a particular population. The variability of a trait is how much that trait tends to vary in response to environmental and genetic influences.
Clines
In biology, a cline is a continuum of species, populations, varieties, or forms of organisms that exhibit gradual phenotypic and/or genetic differences over a geographical area, typically as a result of environmental heterogeneity.[43][44][45] In the scientific study of human genetic variation, a gene cline can be rigorously defined and subjected to quantitative metrics.
Haplogroups
In the study of molecular evolution, a haplogroup is a group of similar haplotypes that share a common ancestor with a single nucleotide polymorphism (SNP) mutation. The study of haplogroups provides information about ancestral origins dating back thousands of years.[46]
The most commonly studied human haplogroups are Y-chromosome (Y-DNA) haplogroups and mitochondrial DNA (mtDNA) haplogroups, both of which can be used to define genetic populations. Y-DNA is passed solely along the patrilineal line, from father to son, while mtDNA is passed down the matrilineal line, from mother to both daughters and sons. The Y-DNA and mtDNA may change by chance mutation at each generation.
Variable number tandem repeats
A variable number tandem repeat (VNTR) is the variation of length of a tandem repeat. A tandem repeat is the adjacent repetition of a short nucleotide sequence. Tandem repeats exist on many chromosomes, and their length varies between individuals. Each variant acts as an inherited allele, so they are used for personal or parental identification. Their analysis is useful in genetics and biology research, forensics, and DNA fingerprinting.
Short tandem repeats (about 5 base pairs) are called microsatellites, while longer ones are called minisatellites.
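How repeat length acts as an allele can be sketched by counting back-to-back copies of a motif in a sequence. The motif, the flanking bases and the two alleles below are invented for illustration and do not correspond to any real locus; the function name is hypothetical.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of adjacent, uninterrupted copies of `motif`."""
    if not motif:
        raise ValueError("motif must not be empty")
    step = len(motif)
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Count copies of the motif that follow one another without a gap.
        while sequence.startswith(motif, position):
            run += 1
            position += step
        best = max(best, run)
    return best


# Two invented alleles of the same locus that differ only in repeat number.
allele_a = "CCT" + "GATA" * 7 + "GGA"
allele_b = "CCT" + "GATA" * 11 + "GGA"
genotype = (longest_tandem_run(allele_a, "GATA"), longest_tandem_run(allele_b, "GATA"))
print(genotype)  # (7, 11)
```

A profile built from repeat counts at several such loci is what makes tandem repeats useful for identification, since two unrelated individuals are unlikely to share the same counts everywhere.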
History and geographic distribution

Recent African origin of modern humans
The recent African origin of modern humans paradigm assumes the dispersal of non-African populations of anatomically modern humans after 70,000 years ago. Dispersal within Africa occurred significantly earlier, at least 130,000 years ago. The "out of Africa" theory originates in the 19th century, as a tentative suggestion in Charles Darwin's Descent of Man,[47] but remained speculative until the 1980s when it was supported by the study of present-day mitochondrial DNA, combined with evidence from physical anthropology of archaic specimens.
According to a 2000 study of Y-chromosome sequence variation,[48] human Y-chromosomes trace ancestry to Africa, and the descendants of the derived lineage left Africa and eventually replaced archaic human Y-chromosomes in Eurasia. The study also shows that a minority of contemporary populations in East Africa and the Khoisan are the descendants of the most ancestral patrilineages of anatomically modern humans that left Africa 35,000 to 89,000 years ago.[48] Other evidence supporting the theory is that variations in skull measurements decrease with distance from Africa at the same rate as the decrease in genetic diversity. Human genetic diversity decreases in native populations with migratory distance from Africa, and this is thought to be due to bottlenecks during human migration, which are events that temporarily reduce population size.[49][50]
A 2009 genetic clustering study, which genotyped 1327 polymorphic markers in various African populations, identified six ancestral clusters. The clustering corresponded closely with ethnicity, culture and language.[51] A 2018 whole genome sequencing study of the world's populations observed similar clusters among the populations in Africa. At K=9, distinct ancestral components defined the Afroasiatic-speaking populations inhabiting North Africa and Northeast Africa; the Nilo-Saharan-speaking populations in Northeast Africa and East Africa; the Ari populations in Northeast Africa; the Niger-Congo-speaking populations in West-Central Africa, West Africa, East Africa and Southern Africa; the Pygmy populations in Central Africa; and the Khoisan populations in Southern Africa.[52]
In May 2023, scientists reported, based on genetic studies, a more complicated pathway of human evolution than previously understood. According to the studies, humans evolved from different places and times in Africa, instead of from a single location and period of time.[53][54]
Population genetics
Because of the common ancestry of all humans, only a small number of variants have large differences in frequency between populations. However, some rare variants in the world's human population are much more frequent in at least one population (more than 5%).[55]


It is commonly assumed that early humans left Africa, and thus must have passed through a population bottleneck before their African-Eurasian divergence around 100,000 years ago (ca. 3,000 generations). The rapid expansion of a previously small population has two important effects on the distribution of genetic variation. First, the so-called founder effect occurs when founder populations bring only a subset of the genetic variation from their ancestral population. Second, as founders become more geographically separated, the probability that two individuals from different founder populations will mate becomes smaller. The effect of this assortative mating is to reduce gene flow between geographical groups and to increase the genetic distance between groups.[citation needed]
The expansion of humans from Africa affected the distribution of genetic variation in two other ways. First, smaller (founder) populations experience greater genetic drift because of increased fluctuations in neutral polymorphisms. Second, new polymorphisms that arose in one group were less likely to be transmitted to other groups as gene flow was restricted.[citation needed]
Populations in Africa tend to have lower amounts of linkage disequilibrium than do populations outside Africa, partly because of the larger size of human populations in Africa over the course of human history and partly because the number of modern humans who left Africa to colonize the rest of the world appears to have been relatively low.[57] In contrast, populations that have undergone dramatic size reductions or rapid expansions in the past and populations formed by the mixture of previously separate ancestral groups can have unusually high levels of linkage disequilibrium.[57]
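Linkage disequilibrium is the non-random association of alleles at two loci. A common way to quantify it, sketched below, uses the coefficient D (the observed haplotype frequency minus the frequency expected if the loci were independent) and the squared correlation r². The haplotype lists are invented, each haplotype is written as a two-character string (one character per locus), and the function name is hypothetical.

```python
def ld_stats(haplotypes, allele_1, allele_2):
    """Return (D, r_squared) for allele_1 at locus 1 and allele_2 at locus 2."""
    n = len(haplotypes)
    p1 = sum(1 for h in haplotypes if h[0] == allele_1) / n
    p2 = sum(1 for h in haplotypes if h[1] == allele_2) / n
    p12 = sum(1 for h in haplotypes if h[0] == allele_1 and h[1] == allele_2) / n
    d = p12 - p1 * p2  # zero when the two loci are statistically independent
    denominator = p1 * (1 - p1) * p2 * (1 - p2)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


# Invented samples: alleles always travel together, versus all combinations equally often.
linked = ["AB"] * 5 + ["ab"] * 5
unlinked = ["AB", "Ab", "aB", "ab"] * 3
print(ld_stats(linked, "A", "B"))    # (0.25, 1.0)
print(ld_stats(unlinked, "A", "B"))  # (0.0, 0.0)
```

Recombination over many generations in a large population pushes r² toward the second case, while bottlenecks and recent admixture tend to leave values closer to the first.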
Distribution of variation
The distribution of genetic variants within and among human populations is impossible to describe succinctly because of the difficulty of defining a "population," the clinal nature of variation, and heterogeneity across the genome (Long and Kittles 2003). In general, however, an average of 85% of genetic variation exists within local populations, ~7% is between local populations within the same continent, and ~8% of variation occurs between large groups living on different continents.[58][59] The recent African origin theory for humans would predict that in Africa there exists a great deal more diversity than elsewhere and that diversity should decrease the further from Africa a population is sampled.
Phenotypic variation
Sub-Saharan Africa has the most human genetic diversity and the same has been shown to hold true for phenotypic variation in skull form.[49][60] Phenotype is connected to genotype through gene expression. Genetic diversity decreases smoothly with migratory distance from that region, which many scientists believe to be the origin of modern humans, and that decrease is mirrored by a decrease in phenotypic variation. Skull measurements are an example of a physical attribute whose within-population variation decreases with distance from Africa.
The distribution of many physical traits resembles the distribution of genetic variation within and between human populations (American Association of Physical Anthropologists 1996; Keita and Kittles 1997). For example, ~90% of the variation in human head shapes occurs within continental groups, and ~10% separates groups, with a greater variability of head shape among individuals with recent African ancestors (Relethford 2002).
A prominent exception to the common distribution of physical characteristics within and among groups is skin color. Approximately 10% of the variance in skin color occurs within groups, and ~90% occurs between groups (Relethford 2002). This distribution of skin color and its geographic patterning – with people whose ancestors lived predominantly near the equator having darker skin than those with ancestors who lived predominantly in higher latitudes – indicate that this attribute has been under strong selective pressure. Darker skin appears to be strongly selected for in equatorial regions to prevent sunburn, skin cancer, the photolysis of folate, and damage to sweat glands.[61]
Understanding how genetic diversity in the human population impacts various levels of gene expression is an active area of research. While earlier studies focused on the relationship between DNA variation and RNA expression, more recent efforts are characterizing the genetic control of various aspects of gene expression including chromatin states,[62] translation,[63] and protein levels.[64] A study published in 2007 found that 25% of genes showed different levels of gene expression between populations of European and Asian descent.[65][66][67][68][69] The primary cause of this difference in gene expression was thought to be SNPs in gene regulatory regions of DNA. Another study published in 2007 found that approximately 83% of genes were expressed at different levels among individuals and about 17% between populations of European and African descent.[70][71]
Wright's fixation index as measure of variation
The population geneticist Sewall Wright developed the fixation index (often abbreviated to FST) as a way of measuring genetic differences between populations. This statistic is often used in taxonomy to compare differences between any two given populations by measuring the genetic differences among and between populations for individual genes, or for many genes simultaneously.[72] It is often stated that the fixation index for humans is about 0.15. This means that an estimated 85% of the variation measured in the overall human population is found among individuals of the same population, and about 15% of the variation occurs between populations. These estimates imply that any two individuals from different populations may be more similar to each other than either is to a member of their own group.[73][74] "The shared evolutionary history of living humans has resulted in a high relatedness among all living people, as indicated for example by the very low fixation index (FST) among living human populations." Richard Lewontin, who affirmed these ratios, thus concluded neither "race" nor "subspecies" were appropriate or useful ways to describe human populations.[58]
Wright himself believed that values >0.25 represent very great genetic variation and that an FST of 0.15–0.25 represented great variation. However, about 5% of human variation occurs between populations within continents, therefore FST values between continental groups of humans (or races) of as low as 0.1 (or possibly lower) have been found in some studies, suggesting more moderate levels of genetic variation.[72] Graves (1996) has countered that FST should not be used as a marker of subspecies status, as the statistic is used to measure the degree of differentiation between populations,[72] although see also Wright (1978).[75]
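One common way to compute the fixation index for a single biallelic locus is FST = (HT − HS) / HT, where HT is the expected heterozygosity of the pooled population and HS is the average expected heterozygosity within subpopulations. The sketch below assumes equally weighted subpopulations by default and uses invented allele frequencies; the function names are hypothetical, and the labels only restate the two ranges attributed to Wright above.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fixation_index(subpop_freqs, weights=None):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus."""
    k = len(subpop_freqs)
    if weights is None:
        weights = [1 / k] * k  # assumption: subpopulations weighted equally
    p_bar = sum(w * p for w, p in zip(weights, subpop_freqs))
    h_total = expected_heterozygosity(p_bar)
    h_within = sum(w * expected_heterozygosity(p) for w, p in zip(weights, subpop_freqs))
    if h_total == 0:
        return 0.0  # locus does not vary at all, so there is nothing to partition
    return (h_total - h_within) / h_total


def wright_label(fst):
    """Label only the two ranges given in the text; lower values are left unnamed."""
    if fst > 0.25:
        return "very great"
    if fst >= 0.15:
        return "great"
    return "below 0.15"


# Invented frequencies of one allele in two populations.
fst = fixation_index([0.3, 0.7])
print(round(fst, 2), round(1 - fst, 2), wright_label(fst))  # 0.16 0.84 great
```

Here about 84% of the expected heterozygosity is found within populations and about 16% between them, which is close to the split usually quoted for humans. Real estimates average over many loci and often apply sample-size corrections that this sketch omits.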
Jeffrey Long and Rick Kittles give a long critique of the application of FST to human populations in their 2003 paper "Human Genetic Diversity and the Nonexistence of Biological Races". They find that the figure of 85% is misleading because it implies that all human populations contain on average 85% of all genetic diversity. They argue the underlying statistical model incorrectly assumes equal and independent histories of variation for each large human population. A more realistic approach is to understand that some human groups are parental to other groups and that these groups represent paraphyletic groups to their descent groups. For example, under the recent African origin theory the human population in Africa is paraphyletic to all other human groups because it represents the ancestral group from which all non-African populations derive, but more than that, non-African groups only derive from a small non-representative sample of this African population. This means that all non-African groups are more closely related to each other and to some African groups (probably east Africans) than they are to others, and further that the migration out of Africa represented a genetic bottleneck, with much of the diversity that existed in Africa not being carried out of Africa by the emigrating groups. Under this scenario, human populations do not have equal amounts of local variability, but rather diminished amounts of diversity the further from Africa any population lives. Long and Kittles find that rather than 85% of human genetic diversity existing in all human populations, about 100% of human diversity exists in a single African population, whereas only about 70% of human genetic diversity exists in a population derived from New Guinea. Long and Kittles argued that this still produces a global human population that is genetically homogeneous compared to other mammalian populations.[76]
Archaic admixture
Anatomically modern humans interbred with Neanderthals during the Middle Paleolithic. In May 2010, the Neanderthal Genome Project presented genetic evidence that interbreeding took place and that a small but significant portion, around 2–4%, of Neanderthal admixture is present in the DNA of modern Eurasians and Oceanians, and nearly absent in sub-Saharan African populations.[77][78]
Between 4% and 6% of the genome of Melanesians (represented by the Papua New Guinean and Bougainville Islander) appears to derive from Denisovans – a previously unknown hominin which is more closely related to Neanderthals than to Sapiens. It was possibly introduced during the early migration of the ancestors of Melanesians into Southeast Asia. This history of interaction suggests that Denisovans once ranged widely over eastern Asia.[79]
Thus, Melanesians emerge as one of the most archaic-admixed populations, having Denisovan/Neanderthal-related admixture of ~8%.[79]
In a study published in 2013, Jeffrey Wall from the University of California studied whole-genome sequence data and found higher rates of introgression in Asians compared to Europeans.[80] Hammer et al. tested the hypothesis that contemporary African genomes have signatures of gene flow with archaic human ancestors and found evidence of archaic admixture in the genomes of some African groups, suggesting that modest amounts of gene flow were widespread throughout time and space during the evolution of anatomically modern humans.[81]
A study published in 2020 found that the Yoruba and Mende populations of West Africa derive between 2% and 19% of their genome from an as-yet unidentified archaic hominin population that likely diverged before the split of modern humans and the ancestors of Neanderthals and Denisovans,[82] potentially making these groups the most archaic-admixed human populations identified yet.
Categorization of the world population

New data on human genetic variation has reignited the debate about a possible biological basis for categorization of humans into races. Most of the controversy surrounds the question of how to interpret the genetic data and whether conclusions based on it are sound. Some researchers argue that self-identified race can be used as an indicator of geographic ancestry for certain health risks and medications.
Although the genetic differences among human groups are relatively small, differences in certain genes such as Duffy, ABCC11, and SLC24A5, called ancestry-informative markers (AIMs), can nevertheless be used to reliably situate many individuals within broad, geographically based groupings. For example, computer analyses of hundreds of polymorphic loci sampled in globally distributed populations have revealed the existence of genetic clustering that roughly is associated with groups that historically have occupied large continental and subcontinental regions (Rosenberg et al. 2002; Bamshad et al. 2003).
Some commentators have argued that these patterns of variation provide a biological justification for the use of traditional racial categories. They argue that the continental clusterings correspond roughly with the division of human beings into sub-Saharan Africans; Europeans, Western Asians, Central Asians, Southern Asians and Northern Africans; Eastern Asians, Southeast Asians, Polynesians and Native Americans; and other inhabitants of Oceania (Melanesians, Micronesians & Australian Aborigines) (Risch et al. 2002). Other observers disagree, saying that the same data undercut traditional notions of racial groups (King and Motulsky 2002; Calafell 2003; Tishkoff and Kidd 2004[24]). They point out, for example, that major populations considered races or subgroups within races do not necessarily form their own clusters.
Racial categories are also undermined by findings that genetic variants which are limited to one region tend to be rare within that region, variants that are common within a region tend to be shared across the globe, and most differences between individuals, whether they come from the same region or different regions, are due to global variants.[85] No genetic variants have been found which are fixed within a continent or major region and found nowhere else.[86]
Furthermore, because human genetic variation is clinal, many individuals affiliate with two or more continental groups. Thus, the genetically based "biogeographical ancestry" assigned to any given person generally will be broadly distributed and will be accompanied by sizable uncertainties (Pfaff et al. 2004).
In many parts of the world, groups have mixed in such a way that many individuals have relatively recent ancestors from widely separated regions. Although genetic analyses of large numbers of loci can produce estimates of the percentage of a person's ancestors coming from various continental populations (Shriver et al. 2003; Bamshad et al. 2004), these estimates may assume a false distinctiveness of the parental populations, since human groups have exchanged mates from local to continental scales throughout history (Cavalli-Sforza et al. 1994; Hoerder 2002). Even with large numbers of markers, information for estimating admixture proportions of individuals or groups is limited, and estimates typically will have wide confidence intervals (Pfaff et al. 2004).
Genetic clustering
Genetic data can be used to infer population structure and assign individuals to groups that often correspond with their self-identified geographical ancestry. Jorde and Wooding (2004) argued that "Analysis of many loci now yields reasonably accurate estimates of genetic similarity among individuals, rather than populations. Clustering of individuals is correlated with geographic origin or ancestry."[23] However, identification by geographic origin may quickly break down when considering historical ancestry shared between individuals back in time.[87]
An analysis of autosomal SNP data from the International HapMap Project (Phase II) and CEPH Human Genome Diversity Panel samples was published in 2009. The study of 53 populations taken from the HapMap and CEPH data (1138 unrelated individuals) suggested that natural selection may shape the human genome much more slowly than previously thought, with factors such as migration within and among continents more heavily influencing the distribution of genetic variations.[88] A similar study published in 2010 found strong genome-wide evidence for selection due to changes in ecoregion, diet, and subsistence particularly in connection with polar ecoregions, with foraging, and with a diet rich in roots and tubers.[89] In a 2016 study, principal component analysis of genome-wide data was capable of recovering previously-known targets for positive selection (without prior definition of populations) as well as a number of new candidate genes.[90]
Forensic anthropology
Forensic anthropologists can assess the ancestry of skeletal remains by analyzing skeletal morphology as well as using genetic and chemical markers, when possible.[91] While these assessments are never certain, the accuracy of skeletal morphology analyses in determining true ancestry has been estimated at 90%.[92]

Gene flow and admixture
Gene flow between two populations reduces the average genetic distance between the populations. Only totally isolated human populations experience no gene flow; most populations have continuous gene flow with neighboring populations, which creates the clinal distribution observed for most genetic variation. When gene flow takes place between well-differentiated genetic populations the result is referred to as "genetic admixture".
Admixture mapping is a technique used to study how genetic variants cause differences in disease rates between populations.[93] Recently admixed populations that trace their ancestry to multiple continents are well suited for identifying genes for traits and diseases that differ in prevalence between parental populations. African-American populations have been the focus of numerous population genetic and admixture mapping studies, including studies of complex genetic traits such as white cell count, body-mass index, prostate cancer and renal disease.[94]
An analysis of phenotypic and genetic variation including skin color and socio-economic status was carried out in the population of Cape Verde, which has a well-documented history of contact between Europeans and Africans. The studies showed that the pattern of admixture in this population has been sex-biased (involving mostly matings between European men and African women) and that there is a significant interaction between socioeconomic status and skin color, independent of ancestry.[95] Another study shows an increased risk of graft-versus-host disease complications after transplantation due to genetic variants in human leukocyte antigen (HLA) and non-HLA proteins.[96]
Impact on gene function and health
Given that each individual has millions of genetic variants (compared to the reference genome), an important question is what impact these variants have on human health or gene function. Most genetic variants have only small to moderate effects, if any. Frequently cited examples of conditions whose prevalence differs between populations include hypertension (Douglas et al. 1996), diabetes,[97] obesity (Fernandez et al. 2003), and prostate cancer (Platz et al. 2000). However, the role of genetic factors in generating these differences remains uncertain.[98]
Effect on protein function
The human genome encodes about 20,000 protein-coding genes with about 550 amino acids each.[99] Hence, human proteins span about 11 million amino acids (22 million per diploid genome). The median number of missense mutations in individual human genomes is about 8,600, that is, two individuals differ by 1 in about 2,600 amino acids or in about 20% of their proteins. The average individual has about 137 (predicted) loss-of-function mutations, including 71 frameshift and 148 in-frame deletions or insertions.[100] Mutations at 32.2% and 9.5% of all possible genomic positions, respectively, can lead to missense and stop-gained variants (i.e., truncated proteins).[100] In a sample of almost 1 million people, almost 5,000 genes were identified that had loss-of-function variants in both alleles of the same individual. That is, if these 5,000 genes can tolerate homozygous loss-of-function mutations, they are unlikely to be essential.[100]
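The arithmetic behind these estimates can be reproduced in a few lines. This is only a sketch using the rounded figures quoted above. The last step is an assumption about how a value near 20% could arise (each missense variant falling in a different protein copy), not a derivation taken from the cited source, and the variable names are illustrative.

```python
GENES = 20_000                # protein-coding genes (rounded figure from the text)
AA_PER_PROTEIN = 550          # average protein length in amino acids
MISSENSE_PER_GENOME = 8_600   # median missense variants per individual

# Total amino acids encoded by one haploid set of genes, then by both copies.
haploid_aa = GENES * AA_PER_PROTEIN
diploid_aa = 2 * haploid_aa

# Average spacing between missense differences, in amino acids.
aa_per_missense = diploid_aa / MISSENSE_PER_GENOME

# Assumption: at most one missense variant per protein copy, so the share of
# affected protein copies is simply variants divided by gene copies.
fraction_of_proteins = MISSENSE_PER_GENOME / (2 * GENES)

print(f"{haploid_aa:,} amino acids per haploid genome")
print(f"{diploid_aa:,} amino acids per diploid genome")
print(f"1 missense difference per ~{aa_per_missense:.0f} amino acids")
print(f"~{fraction_of_proteins:.1%} of protein copies affected")
```

The spacing comes out slightly below 2,600 and the affected share slightly above 20%, in line with the rounded values in the text.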
Monogenic diseases
Differences in allele frequencies contribute to group differences in the incidence of some monogenic diseases, and they may contribute to differences in the incidence of some common diseases.[101] For the monogenic diseases, the frequency of causative alleles usually correlates best with ancestry, whether familial (for example, Ellis–Van Creveld syndrome among the Pennsylvania Amish), ethnic (Tay–Sachs disease among Ashkenazi Jewish populations), or geographical (hemoglobinopathies among people with ancestors who lived in malarial regions). To the extent that ancestry corresponds with racial or ethnic groups or subgroups, the incidence of monogenic diseases can differ between groups categorized by race or ethnicity, and health-care professionals typically take these patterns into account in making diagnoses.[102]
Beneficial variants
Other variants, by contrast, are beneficial to humans, as they protect against certain diseases or improve the chance of adapting to the environment. For example, a mutation in the CCR5 gene protects against AIDS. Because of the mutation, the CCR5 protein is absent from the cell surface. Without CCR5 on the surface, HIV has nothing to grab on to and bind. The mutation therefore decreases an individual's risk of AIDS. It is also quite common in certain areas: more than 14% of the population in Europe and about 6–10% in Asia and North Africa carry it.[103]

Many genetic variants may have aided humans in ancient times but plague us today. For example, genes that allow humans to more efficiently process food also make people susceptible to obesity and diabetes today.[104]
Genome projects and organizations
Human genome projects are scientific endeavors that determine or study the structure of the human genome. The Human Genome Project was a landmark genome project.
There are numerous related projects that deal with genetic variation (or variation in the encoded proteins), e.g. organized by the following organizations:
- HUman Genome Organisation (HUGO) -- organizes activities around human genome sequencing, including variants
- Human Genome Variation Society (HGVS) -- develops nomenclatural standards for human genetic variants
- HGVS Variant Nomenclature Committee (HVNC) -- maps and organizes variant nomenclature
See also
- Archaeogenetics
- Chimera (genetics)
- Genealogical DNA test
- Human evolutionary genetics
- Isolation by distance
- Multiregional hypothesis
- Neurodiversity
- Race and genetics
- Recent single origin hypothesis
- Y-chromosome haplogroups in populations of the world
Regional
- 1000 Genomes Project
- African admixture in Europe
- Genetic history of Europe
- Genetic history of indigenous peoples of the Americas
- Genetic history of South Asia
- Genetic history of the British Isles
Projects
References
- ^ Bruder CE, Piotrowski A, Gijsbers AA, Andersson R, Erickson S, Diaz de Ståhl T, et al. (March 2008). "Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles". American Journal of Human Genetics. 82 (3): 763–71. doi:10.1016/j.ajhg.2007.12.011. PMC 2427204. PMID 18304490.
- ^ a b c d e Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245.
- ^ a b NCBI (8 May 2017). "dbSNP's human build 150 has doubled the amount of RefSNP records!". NCBI Insights. Retrieved 16 May 2017.
- ^ Xue, Cheng; Raveendran, Muthuswamy; Harris, R. Alan; Fawcett, Gloria L.; Liu, Xiaoming; White, Simon; Dahdouli, Mahmoud; Deiros, David Rio; Below, Jennifer E.; Salerno, William; Cox, Laura (1 December 2016). "The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences". Genome Research. 26 (12): 1651–1662. doi:10.1101/gr.204255.116. ISSN 1088-9051. PMC 5131817. PMID 27934697.
- ^ Curnoe, Darren (2003). "Number of ancestral human species: a molecular perspective". Homo. 53 (3): 208–209. doi:10.1078/0018-442x-00051. PMID 12733395.
- ^ Reich, David (23 March 2018). "Opinion | How Genetics Is Changing Our Understanding of 'Race'". The New York Times. ISSN 0362-4331. Retrieved 15 August 2022.
- ^ Williams, David R. (1 July 1997). "Race and health: Basic questions, emerging directions". Annals of Epidemiology. Special Issue: Interface Between Molecular and Behavioral Epidemiology. 7 (5): 322–333. doi:10.1016/S1047-2797(97)00051-3. ISSN 1047-2797. PMID 9250627.
- ^ "1". Race and racism in theory and practice. Berel Lang. Lanham, Md.: Rowman & Littlefield. 2000. ISBN 0-8476-9692-8. OCLC 42389561.
- ^ Lee, Jun-Ki; Aini, Rahmi Qurota; Sya'bandari, Yustika; Rusmana, Ai Nurlaelasari; Ha, Minsu; Shin, Sein (1 April 2021). "Biological Conceptualization of Race". Science & Education. 30 (2): 293–316. Bibcode:2021Sc&Ed..30..293L. doi:10.1007/s11191-020-00178-8. ISSN 1573-1901. S2CID 231598896.
- ^ Kolbert, Elizabeth (4 April 2018). "There's No Scientific Basis for Race—It's a Made-Up Label". National Geographic. Retrieved 15 August 2022.
- ^ Templeton, Alan Robert (2018). Human Population Genetics and Genomics. London. pp. 445–446. ISBN 978-0-12-386026-2. OCLC 1062418886.
- ^ a b Reich, David (2018). Who we are and how we got here: ancient DNA and the new science of the human past (First ed.). Oxford, United Kingdom. p. 255. ISBN 978-0-19-882125-0. OCLC 1006478846.
- ^ Witherspoon, D. J.; Wooding, S.; Rogers, A. R.; Marchani, E. E.; Watkins, W. S.; Batzer, M. A.; Jorde, L. B. (2007). "Genetic Similarities Within and Between Human Populations". Genetics. 176 (1): 351–359. doi:10.1534/genetics.106.067355. ISSN 0016-6731. PMC 1893020. PMID 17339205.
- ^ Campbell, Michael (2008). "African Genetic Diversity: Implications for Human Demographic History, Modern Human Origins, and Complex Disease Mapping". Annual Review of Genomics and Human Genetics. 9: 403–433. doi:10.1146/annurev.genom.9.081307.164258. PMC 2953791. PMID 18593304.
- ^ a b Campbell, Michael C.; Tishkoff, Sarah A. (2008). "AFRICAN GENETIC DIVERSITY: Implications for Human Demographic History, Modern Human Origins, and Complex Disease Mapping". Annual Review of Genomics and Human Genetics. 9: 403–433. doi:10.1146/annurev.genom.9.081307.164258. ISSN 1527-8204. PMC 2953791. PMID 18593304.
- ^ "We are all mutants: First direct whole-genome measure of human mutation predicts 60 new mutations in each of us". Science Daily. 13 June 2011. Retrieved 5 September 2011.
- ^ Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, et al. (June 2011). "Variation in genome-wide mutation rates within and between human families". Nature Genetics. 43 (7): 712–4. doi:10.1038/ng.862. PMC 3322360. PMID 21666693.
- ^ Ackermann, R. R.; Cheverud, J. M. (16 December 2004). "Detecting genetic drift versus selection in human evolution". Proceedings of the National Academy of Sciences. 101 (52): 17946–17951. Bibcode:2004PNAS..10117946A. doi:10.1073/pnas.0405919102. ISSN 0027-8424. PMC 539739. PMID 15604148.
- ^ Guo J, Wu Y, Zhu Z, Zheng Z, Trzaskowski M, Zeng J, Robinson MR, Visscher PM, Yang J (May 2018). "Global genetic differentiation of complex traits shaped by natural selection in humans". Nature Communications. 9 (1) 1865. Bibcode:2018NatCo...9.1865G. doi:10.1038/s41467-018-04191-y. PMC 5951811. PMID 29760457.
- ^ Wang ET, Kodama G, Baldi P, Moyzis RK (January 2006). "Global landscape of recent inferred Darwinian selection for Homo sapiens". Proceedings of the National Academy of Sciences of the United States of America. 103 (1): 135–40. Bibcode:2006PNAS..103..135W. doi:10.1073/pnas.0509691102. PMC 1317879. PMID 16371466.
By these criteria, 1.6% of Perlegen SNPs were found to exhibit the genetic architecture of selection.
- ^ Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, et al. (May 2008). "Mapping and sequencing of structural variation from eight human genomes". Nature. 453 (7191): 56–64. Bibcode:2008Natur.453...56K. doi:10.1038/nature06862. PMC 2424287. PMID 18451855.
- ^ Driscoll DA, Gross S (June 2009). "Clinical practice. Prenatal screening for aneuploidy". The New England Journal of Medicine. 360 (24): 2556–62. doi:10.1056/NEJMcp0900134. PMID 19516035.
- ^ a b Jorde LB, Wooding SP (November 2004). "Genetic variation, classification and 'race'". Nature Genetics. 36 (11 Suppl): S28–33. doi:10.1038/ng1435. PMID 15508000.
- ^ a b Tishkoff SA, Kidd KK (November 2004). "Implications of biogeography of human populations for 'race' and medicine". Nature Genetics. 36 (11 Suppl): S21–7. doi:10.1038/ng1438. PMID 15507999.
- ^ Mullaney JM, Mills RE, Pittard WS, Devine SE (October 2010). "Small insertions and deletions (INDELs) in human genomes". Human Molecular Genetics. 19 (R2): R131–6. doi:10.1093/hmg/ddq400. PMC 2953750. PMID 20858594.
- ^ a b Collins FS, Brooks LD, Chakravarti A (December 1998). "A DNA polymorphism discovery resource for research on human genetic variation". Genome Research. 8 (12): 1229–31. doi:10.1101/gr.8.12.1229. PMID 9872978.
- ^ Thomas PE, Klinger R, Furlong LI, Hofmann-Apitius M, Friedrich CM (2011). "Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers". BMC Bioinformatics. 12 (Suppl 4) S4. doi:10.1186/1471-2105-12-S4-S4. PMC 3194196. PMID 21992066.
- ^ Ke X, Taylor MS, Cardon LR (April 2008). "Singleton SNPs in the human genome and implications for genome-wide association studies". European Journal of Human Genetics. 16 (4): 506–15. doi:10.1038/sj.ejhg.5201987. PMID 18197193.
- ^ Ng PC, Levy S, Huang J, Stockwell TB, Walenz BP, Li K, et al. (August 2008). Schork NJ (ed.). "Genetic variation in an individual human exome". PLOS Genetics. 4 (8) e1000160. doi:10.1371/journal.pgen.1000160. PMC 2493042. PMID 18704161.
- ^ Gross L (October 2007). "A new human genome sequence paves the way for individualized genomics". PLOS Biology. 5 (10) e266. doi:10.1371/journal.pbio.0050266. PMC 1964778. PMID 20076646.
- ^ "First Individual Diploid Human Genome Published By Researchers at J. Craig Venter Institute". J. Craig Venter Institute. 3 September 2007. Archived from the original on 16 July 2011. Retrieved 5 September 2011.
- ^ Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, et al. (September 2007). "The diploid genome sequence of an individual human". PLOS Biology. 5 (10) e254. doi:10.1371/journal.pbio.0050254. PMC 1964779. PMID 17803354.
- ^ "Understanding Genetics: Human Health and the Genome". The Tech Museum of Innovation. 24 January 2008. Archived from the original on 29 April 2012. Retrieved 5 September 2011.
- ^ "First Diploid Human Genome Sequence Shows We're Surprisingly Different". Science Daily. 4 September 2007. Retrieved 5 September 2011.
- ^ "Copy number variation may stem from replication misstep". EurekAlert!. 27 December 2007. Archived from the original on 7 June 2011. Retrieved 5 September 2011.
- ^ Lee JA, Carvalho CM, Lupski JR (December 2007). "A DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders". Cell. 131 (7): 1235–47. doi:10.1016/j.cell.2007.11.037. PMID 18160035. S2CID 9263608.
- ^ Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. (November 2006). "Global variation in copy number in the human genome". Nature. 444 (7118): 444–54. Bibcode:2006Natur.444..444R. doi:10.1038/nature05329. PMC 2669898. PMID 17122850.
- ^ Dumas L, Kim YH, Karimpour-Fard A, Cox M, Hopkins J, Pollack JR, et al. (September 2007). "Gene copy number variation spanning 60 million years of human and primate evolution". Genome Research. 17 (9): 1266–77. doi:10.1101/gr.6557307. PMC 1950895. PMID 17666543.
- ^ Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, et al. (January 2014). "The complete genome sequence of a Neanderthal from the Altai Mountains". Nature. 505 (7481): 43–9. Bibcode:2014Natur.505...43P. doi:10.1038/nature12886. PMC 4031459. PMID 24352235.
- ^ Pratas D, Hosseini M, Silva R, Pinho A, Ferreira P (20–23 June 2017). "Visualization of Distinct DNA Regions of the Modern Human Relatively to a Neanderthal Genome". Pattern Recognition and Image Analysis. Lecture Notes in Computer Science. Vol. 10255. pp. 235–242. doi:10.1007/978-3-319-58838-4_26. ISBN 978-3-319-58837-7.
- ^ "Human Genetic Variation Fact Sheet". National Institute of General Medical Sciences. 19 August 2011. Archived from the original on 16 September 2008. Retrieved 5 September 2011.
- ^ Rakyan V, Whitelaw E (January 2003). "Transgenerational epigenetic inheritance". Current Biology. 13 (1): R6. Bibcode:2003CBio...13...R6R. doi:10.1016/S0960-9822(02)01377-5. PMID 12526754.
- ^ "Cline". Microsoft Encarta Premium. 2009.
- ^ King RC, Stansfield WD, Mulligan PK (2006). "Cline". A dictionary of genetics (7th ed.). Oxford University Press. ISBN 978-0-19-530761-0.
- ^ Begon M, Townsend CR, Harper JL (2006). Ecology: From individuals to ecosystems (4th ed.). Wiley-Blackwell. p. 10. ISBN 978-1-4051-1117-1.
- ^ "Haplogroup". DNA-Newbie Glossary. International Society of Genetic Genealogy. Retrieved 5 September 2012.
- ^ "The descent of man Chapter 6 – On the Affinities and Genealogy of Man". Darwin-online.org.uk. Retrieved 11 January 2011.
In each great region of the world the living mammals are closely related to the extinct species of the same region. It is, therefore, probable that Africa was formerly inhabited by extinct apes closely allied to the gorilla and chimpanzee; and as these two species are now man's nearest allies, it is somewhat more probable that our early progenitors lived on the African continent than elsewhere. But it is useless to speculate on this subject, for an ape nearly as large as a man, namely the Dryopithecus of Lartet, which was closely allied to the anthropomorphous Hylobates, existed in Europe during the Upper Miocene period; and since so remote a period the earth has certainly undergone many great revolutions, and there has been ample time for migration on the largest scale.
- ^ a b Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, et al. (November 2000). "Y chromosome sequence variation and the history of human populations". Nature Genetics. 26 (3): 358–61. doi:10.1038/81685. PMID 11062480. S2CID 12893406.
- ^ a b "New Research Proves Single Origin of Humans in Africa". Science Daily. 19 July 2007. Retrieved 5 September 2011.
- ^ Manica A, Amos W, Balloux F, Hanihara T (July 2007). "The effect of ancient population bottlenecks on human phenotypic variation". Nature. 448 (7151): 346–8. Bibcode:2007Natur.448..346M. doi:10.1038/nature05951. PMC 1978547. PMID 17637668.
- ^ Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. (May 2009). "The genetic structure and history of Africans and African Americans" (PDF). Science. 324 (5930): 1035–44. Bibcode:2009Sci...324.1035T. doi:10.1126/science.1172257. PMC 2947357. PMID 19407144.
We incorporated geographic data into a Bayesian clustering analysis, assuming no admixture (TESS software) (25) and distinguished six clusters within continental Africa (Fig. 5A). The most geographically widespread cluster (orange) extends from far Western Africa (the Mandinka) through central Africa to the Bantu speakers of South Africa (the Venda and Xhosa) and corresponds to the distribution of the Niger-Kordofanian language family, possibly reflecting the spread of Bantu-speaking populations from near the Nigerian/Cameroon highlands across eastern and southern Africa within the past 5000 to 3000 years (26,27). Another inferred cluster includes the Pygmy and SAK populations (green), with a noncontiguous geographic distribution in central and southeastern Africa, consistent with the STRUCTURE (Fig. 3) and phylogenetic analyses (Fig. 1). Another geographically contiguous cluster extends across northern Africa (blue) into Mali (the Dogon), Ethiopia, and northern Kenya. With the exception of the Dogon, these populations speak an Afroasiatic language. Chadic-speaking and Nilo-Saharan–speaking populations from Nigeria, Cameroon, and central Chad, as well as several Nilo-Saharan–speaking populations from southern Sudan, constitute another cluster (red). Nilo-Saharan and Cushitic speakers from the Sudan, Kenya, and Tanzania, as well as some of the Bantu speakers from Kenya, Tanzania, and Rwanda (Hutu/Tutsi), constitute another cluster (purple), reflecting linguistic evidence for gene flow among these populations over the past ~5000 years (28,29). Finally, the Hadza are the sole constituents of a sixth cluster (yellow), consistent with their distinctive genetic structure identified by PCA and STRUCTURE.
- ^ Schlebusch CM, Jakobsson M (August 2018). "Tales of Human Migration, Admixture, and Selection in Africa". Annual Review of Genomics and Human Genetics. 19: 405–428. doi:10.1146/annurev-genom-083117-021759. PMID 29727585. S2CID 19155657. Retrieved 28 May 2018.
- ^ Zimmer, Carl (17 May 2023). "Study Offers New Twist in How the First Humans Evolved – A new genetic analysis of 290 people suggests that humans emerged at various times and places in Africa". The New York Times. Archived from the original on 17 May 2023. Retrieved 18 May 2023.
- ^ Ragsdale, Aaron P.; et al. (17 May 2023). "A weakly structured stem for human origins in Africa". Nature. 617 (7962): 755–763. Bibcode:2023Natur.617..755R. doi:10.1038/s41586-023-06055-y. PMC 10208968. PMID 37198480.
- ^ Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. (1000 Genomes Project Consortium) (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245.
- ^ Li, Hui; Cho, Kelly; Kidd, J.; Kidd, K. (2009). "Genetic landscape of Eurasia and "admixture" in Uyghurs". American Journal of Human Genetics. 85 (6): 934–937. doi:10.1016/j.ajhg.2009.10.024. PMC 2790568. PMID 20004770. S2CID 37591388.
- ^ a b Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. (June 2002). "The structure of haplotype blocks in the human genome". Science. 296 (5576): 2225–9. Bibcode:2002Sci...296.2225G. doi:10.1126/science.1069424. PMID 12029063. S2CID 10069634.
- ^ a b Lewontin RC (1972). "The Apportionment of Human Diversity". In Theodosius Dobzhansky, Max K. Hecht, William C. Steere (eds.). Evolutionary Biology. Vol. 6. New York: Appleton–Century–Crofts. pp. 381–97. doi:10.1007/978-1-4684-9063-3_14. ISBN 978-1-4684-9065-7. S2CID 21095796.
- ^ Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB (March 2003). "Human population genetic structure and inference of group membership". American Journal of Human Genetics. 72 (3): 578–89. doi:10.1086/368061. PMC 1180234. PMID 12557124.
- ^ Manica, Andrea, William Amos, François Balloux, and Tsunehiko Hanihara. "The Effect of Ancient Population Bottlenecks on Human Phenotypic Variation". Nature 448, no. 7151 (July 2007): 346–48. doi:10.1038/nature05951.
- ^ Jablonski NG (10 January 2014). "The Biological and Social Meaning of Skin Color". Living Color: The Biological and Social Meaning of Skin Color. University of California Press. ISBN 978-0-520-28386-2. JSTOR 10.1525/j.ctt1pn64b.
- ^ Grubert F, Zaugg JB, Kasowski M, Ursu O, Spacek DV, Martin AR, et al. (August 2015). "Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions". Cell. 162 (5): 1051–65. doi:10.1016/j.cell.2015.07.048. PMC 4556133. PMID 26300125.
- ^ Cenik C, Cenik ES, Byeon GW, Grubert F, Candille SI, Spacek D, et al. (November 2015). "Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans". Genome Research. 25 (11): 1610–21. doi:10.1101/gr.193342.115. PMC 4617958. PMID 26297486.
- ^ Wu L, Candille SI, Choi Y, Xie D, Jiang L, Li-Pook-Than J, Tang H, Snyder M (July 2013). "Variation and genetic control of protein abundance in humans". Nature. 499 (7456): 79–82. Bibcode:2013Natur.499...79W. doi:10.1038/nature12223. PMC 3789121. PMID 23676674.
- ^ Phillips ML (9 January 2007). "Ethnicity tied to gene expression". The Scientist. Archived from the original on 8 May 2015. Retrieved 5 September 2011.
- ^ Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG (February 2007). "Common genetic variants account for differences in gene expression among ethnic groups". Nature Genetics. 39 (2): 226–31. doi:10.1038/ng1955. PMC 3005333. PMID 17206142.
- ^ Swaminathan N (9 January 2007). "Ethnic Differences Traced to Variable Gene Expression". Scientific American. Retrieved 5 September 2011.
- ^ Check E (2007). "Genetic expression speaks as loudly as gene type". Nature News. doi:10.1038/news070101-8. S2CID 84380725.
- ^ Bell L (15 January 2007). "Variable gene expression seen in different ethnic groups". BioNews.org. Archived from the original on 26 March 2016. Retrieved 5 September 2011.
- ^ Kamrani K (28 February 2008). "Differences of gene expression between human populations". Anthropology.net. Archived from the original on 30 September 2011. Retrieved 5 September 2011.
- ^ Storey JD, Madeoy J, Strout JL, Wurfel M, Ronald J, Akey JM (March 2007). "Gene-expression variation within and among human populations". American Journal of Human Genetics. 80 (3): 502–9. doi:10.1086/512017. PMC 1821107. PMID 17273971.
- ^ a b c Graves JL (2006). "What We Know and What We Don't Know: Human Genetic Variation and the Social Construction of Race". Is Race "Real"?. Social Science Research Council. Archived from the original on 3 June 2019. Retrieved 22 January 2011.
- ^ Keita SO, Kittles RA, Royal CD, Bonney GE, Furbert-Harris P, Dunston GM, Rotimi CN (November 2004). "Conceptualizing human variation". Nature Genetics. 36 (11 Suppl): S17–20. doi:10.1038/ng1455. PMID 15507998.
- ^ Hawks J (2013). Significance of Neandertal and Denisovan Genomes in Human Evolution. Vol. 42. Annual Reviews. pp. 433–49. doi:10.1146/annurev-anthro-092412-155548. ISBN 978-0-8243-1942-7.
- ^ Wright S (1978). Evolution and the Genetics of Populations. Vol. 4, Variability Within and Among Natural Populations. Chicago, Illinois: Univ. Chicago Press. p. 438.
- ^ Long JC, Kittles RA (August 2003). "Human genetic diversity and the nonexistence of biological races". Human Biology. 75 (4): 449–71. doi:10.1353/hub.2003.0058. PMID 14655871. S2CID 26108602.
- ^ Harris, Kelley; Nielsen, Rasmus (June 2016). "The Genetic Cost of Neanderthal Introgression". Genetics. 203 (2): 881–891. doi:10.1534/genetics.116.186890. ISSN 0016-6731. PMC 4896200. PMID 27038113.
- ^ Wall, Jeffrey D.; Yang, Melinda A.; Jay, Flora; Kim, Sung K.; Durand, Eric Y.; Stevison, Laurie S.; Gignoux, Christopher; Woerner, August; Hammer, Michael F.; Slatkin, Montgomery (May 2013). "Higher Levels of Neanderthal Ancestry in East Asians than in Europeans". Genetics. 194 (1): 199–209. doi:10.1534/genetics.112.148213. ISSN 0016-6731. PMC 3632468. PMID 23410836.
- ^ a b Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, et al. (December 2010). "Genetic history of an archaic hominin group from Denisova Cave in Siberia". Nature. 468 (7327): 1053–60. Bibcode:2010Natur.468.1053R. doi:10.1038/nature09710. PMC 4306417. PMID 21179161.
- ^ Wall JD, Yang MA, Jay F, Kim SK, Durand EY, Stevison LS, et al. (May 2013). "Higher levels of neanderthal ancestry in East Asians than in Europeans". Genetics. 194 (1): 199–209. doi:10.1534/genetics.112.148213. PMC 3632468. PMID 23410836.
- ^ Hammer MF, Woerner AE, Mendez FL, Watkins JC, Wall JD (September 2011). "Genetic evidence for archaic admixture in Africa". Proceedings of the National Academy of Sciences of the United States of America. 108 (37): 15123–8. Bibcode:2011PNAS..10815123H. doi:10.1073/pnas.1109300108. PMC 3174671. PMID 21896735.
- ^ Durvasula A, Sankararaman S (February 2020). "Recovering signals of ghost archaic introgression in African populations". Science Advances. 6 (7) eaax5097. Bibcode:2020SciA....6.5097D. doi:10.1126/sciadv.aax5097. PMC 7015685. PMID 32095519.
- ^ Kim, Byung-Ju; Choi, Jaejin; Kim, Sung-Hou (2023). "On whole-genome demography of world's ethnic groups and individual genomic identity". Scientific Reports. 13 (1): 6316. Bibcode:2023NatSR..13.6316K. doi:10.1038/s41598-023-32325-w. PMC 10113208. PMID 37072456.
- ^ Wohns, Anthony Wilder; Wong, Yan; Jeffery, Ben; Akbari, Ali; Mallick, Swapan; Pinhasi, Ron; Patterson, Nick; Reich, David; Kelleher, Jerome; McVean, Gil (15 April 2021). "A unified genealogy of modern and ancient genomes". bioRxiv 10.1101/2021.02.16.431497.
- ^ Biddanda A, Rice DP, Novembre J (2020). "A variant-centric perspective on geographic patterns of human allele frequency variation". eLife. 9. doi:10.7554/eLife.60107. PMC 7755386. PMID 33350384.
- ^ Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, et al. (2020). "Insights into human genetic variation and population history from 929 diverse genomes". Science. 367 (6484). doi:10.1126/science.aay5012. PMC 7115999. PMID 32193295.
- ^ Albers, Patrick K.; McVean, Gil (13 September 2018). "Dating genomic variants and shared ancestry in population-scale sequencing data". bioRxiv. 18 (1) 416610. doi:10.1101/416610. PMC 6992231. PMID 31951611.
- ^ Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, et al. (June 2009). Schierup MH (ed.). "The role of geography in human adaptation". PLOS Genetics. 5 (6) e1000500. doi:10.1371/journal.pgen.1000500. PMC 2685456. PMID 19503611. See also: Brown D (22 June 2009). "Among Many Peoples, Little Genomic Variety". The Washington Post. Retrieved 25 June 2009. "Geography And History Shape Genetic Differences in Humans". Science Daily. 7 June 2009. Retrieved 25 June 2009.
- ^ Hancock AM, Witonsky DB, Ehler E, Alkorta-Aranburu G, Beall C, Gebremedhin A, et al. (May 2010). "Colloquium paper: human adaptations to diet, subsistence, and ecoregion are due to subtle shifts in allele frequency". Proceedings of the National Academy of Sciences of the United States of America. 107 (Suppl 2): 8924–30. Bibcode:2010PNAS..107.8924H. doi:10.1073/pnas.0914625107. PMC 3024024. PMID 20445095.
- ^ Duforet-Frebourg N, Luu K, Laval G, Bazin E, Blum MG (April 2016). "Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data". Molecular Biology and Evolution. 33 (4): 1082–93. arXiv:1504.04543. doi:10.1093/molbev/msv334. PMC 4776707. PMID 26715629.
- ^ Cunha, Eugénia; Ubelaker, Douglas H. (23 December 2019). "Evaluation of ancestry from human skeletal remains: a concise review". Forensic Sciences Research. 5 (2): 89–97. doi:10.1080/20961790.2019.1697060. ISSN 2096-1790. PMC 7476619. PMID 32939424.
- ^ Thomas, Richard M.; Parks, Connie L.; Richard, Adam H. (July 2017). "Accuracy Rates of Ancestry Estimation by Forensic Anthropologists Using Identified Forensic Cases". Journal of Forensic Sciences. 62 (4): 971–974. doi:10.1111/1556-4029.13361. ISSN 1556-4029. PMID 28133721. S2CID 3453064.
- ^ Winkler CA, Nelson GW, Smith MW (2010). "Admixture mapping comes of age". Annual Review of Genomics and Human Genetics. 11: 65–89. doi:10.1146/annurev-genom-082509-141523. PMC 7454031. PMID 20594047.
- ^ Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, et al. (January 2010). "Genome-wide patterns of population structure and admixture in West Africans and African Americans". Proceedings of the National Academy of Sciences of the United States of America. 107 (2): 786–91. Bibcode:2010PNAS..107..786B. doi:10.1073/pnas.0909559107. PMC 2818934. PMID 20080753.
- ^ Beleza S, Campos J, Lopes J, Araújo II, Hoppfer Almada A, Correia e Silva A, et al. (2012). "The admixture structure and genetic variation of the archipelago of Cape Verde and its implications for admixture mapping studies". PLOS ONE. 7 (11) e51103. Bibcode:2012PLoSO...751103B. doi:10.1371/journal.pone.0051103. PMC 3511383. PMID 23226471.
- ^ Arrieta-Bolaños E, Madrigal JA, Shaw BE (2012). "Human leukocyte antigen profiles of Latin American populations: differential admixture and its potential impact on hematopoietic stem cell transplantation". Bone Marrow Research. 2012: 1–13. doi:10.1155/2012/136087. PMC 3506882. PMID 23213535.
- ^ Gower, Barbara A.; Fernández, José R.; Beasley, T. Mark; Shriver, Mark D.; Goran, Michael I. (April 2003). "Using genetic admixture to explain racial differences in insulin-related phenotypes". Diabetes. 52 (4): 1047–1051. doi:10.2337/diabetes.52.4.1047. ISSN 0012-1797. PMID 12663479.
- ^ Mountain, Joanna L.; Risch, Neil (November 2004). "Assessing genetic contributions to phenotypic differences among 'racial' and 'ethnic' groups". Nature Genetics. 36 (11 Suppl): S48–53. doi:10.1038/ng1456. ISSN 1061-4036. PMID 15508003.
- ^ "UniProt". www.uniprot.org. Retrieved 18 February 2025.
- ^ a b c Sun, Kathie Y.; Bai, Xiaodong; Chen, Siying; Bao, Suying; Zhang, Chuanyi; Kapoor, Manav; Backman, Joshua; Joseph, Tyler; Maxwell, Evan; Mitra, George; Gorovits, Alexander; Mansfield, Adam; Boutkov, Boris; Gokhale, Sujit; Habegger, Lukas (July 2024). "A deep catalogue of protein-coding variation in 983,578 individuals". Nature. 631 (8021): 583–592. Bibcode:2024Natur.631..583S. doi:10.1038/s41586-024-07556-0. ISSN 1476-4687. PMC 11254753. PMID 38768635.
- ^ Risch N, Burchard E, Ziv E, Tang H (July 2002). "Categorization of humans in biomedical research: genes, race and disease". Genome Biology. 3 (7) comment2007. doi:10.1186/gb-2002-3-7-comment2007. PMC 139378. PMID 12184798.
- ^ Lu YF, Goldstein DB, Angrist M, Cavalleri G (July 2014). "Personalized medicine and human genetic diversity". Cold Spring Harbor Perspectives in Medicine. 4 (9) a008581. doi:10.1101/cshperspect.a008581. PMC 4143101. PMID 25059740.
- ^ Limborska SA, Balanovsky OP, Balanovskaya EV, Slominsky PA, Schadrina MI, Livshits LA, et al. (2002). "Analysis of CCR5Delta32 geographic distribution and its correlation with some climatic and geographic factors". Human Heredity. 53 (1): 49–54. doi:10.1159/000048605. PMID 11901272. S2CID 1538974.
- ^ Tishkoff SA, Verrelli BC (2003). "Patterns of human genetic diversity: implications for human evolutionary history and disease". Annual Review of Genomics and Human Genetics. 4 (1): 293–340. doi:10.1146/annurev.genom.4.070802.110226. PMID 14527305.
Further reading
- Race, Ethnicity (October 2005). "The use of racial, ethnic, and ancestral categories in human genetics research". American Journal of Human Genetics. 77 (4): 519–32. doi:10.1086/491747. PMC 1275602. PMID 16175499.
- Altmüller J, Palmer LJ, Fischer G, Scherb H, Wjst M (November 2001). "Genomewide scans of complex human diseases: true linkage is hard to find". American Journal of Human Genetics. 69 (5): 936–50. doi:10.1086/324069. PMC 1274370. PMID 11565063.
- Aoki K (2002). "Sexual selection as a cause of human skin colour variation: Darwin's hypothesis revisited". Annals of Human Biology. 29 (6): 589–608. doi:10.1080/0301446021000019144. PMID 12573076. S2CID 22703861.
- Bamshad M, Wooding S, Salisbury BA, Stephens JC (August 2004). "Deconstructing the relationship between genetics and race". Nature Reviews. Genetics. 5 (8): 598–609. doi:10.1038/nrg1401. PMID 15266342. S2CID 12378279.
- Bamshad M, Wooding SP (February 2003). "Signatures of natural selection in the human genome". Nature Reviews. Genetics. 4 (2): 99–111. doi:10.1038/nrg999. PMID 12560807. S2CID 13722452.
- Cann RL, Stoneking M, Wilson AC (1987). "Mitochondrial DNA and human evolution". Nature. 325 (6099): 31–36. Bibcode:1987Natur.325...31C. doi:10.1038/325031a0. PMID 3025745. S2CID 4285418.
- Cardon LR, Abecasis GR (March 2003). "Using haplotype blocks to map human complex trait loci" (PDF). Trends in Genetics. 19 (3): 135–40. doi:10.1016/S0168-9525(03)00022-2. PMID 12615007.
- Cavalli-Sforza LL, Feldman MW (March 2003). "The application of molecular genetic approaches to the study of human evolution". Nature Genetics. 33 Suppl (3s): 266–75. doi:10.1038/ng1113. PMID 12610536. S2CID 8314161.
- Collins FS (November 2004). "What we do and don't know about 'race', 'ethnicity', genetics and health at the dawn of the genome era". Nature Genetics. 36 (11 Suppl): S13–15. doi:10.1038/ng1436. PMID 15507997. S2CID 26968169.
- Collins FS, Green ED, Guttmacher AE, Guyer MS (April 2003). "A vision for the future of genomics research". Nature. 422 (6934): 835–47. Bibcode:2003Natur.422..835C. doi:10.1038/nature01626. PMID 12695777. S2CID 205209730.
- Ebersberger I, Metzler D, Schwarz C, Pääbo S (June 2002). "Genomewide comparison of DNA sequences between humans and chimpanzees". American Journal of Human Genetics. 70 (6): 1490–97. doi:10.1086/340787. PMC 379137. PMID 11992255.
- Edwards AW (August 2003). "Human genetic diversity: Lewontin's fallacy". BioEssays. 25 (8): 798–801. doi:10.1002/bies.10315. PMID 12879450.
- Foster MW, Sharp RR (October 2004). "Beyond race: towards a whole-genome perspective on human populations and genetic variation". Nature Reviews. Genetics. 5 (10): 790–96. doi:10.1038/nrg1452. PMID 15510170. S2CID 25764082.
- Foster MW, Sharp RR, Freeman WL, Chino M, Bernsten D, Carter TH (June 1999). "The role of community review in evaluating the risks of human genetic variation research". American Journal of Human Genetics. 64 (6): 1719–27. doi:10.1086/302415. PMC 1377916. PMID 10330360.
- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (June 2002). "The structure of haplotype blocks in the human genome". Science. 296 (5576): 2225–29. Bibcode:2002Sci...296.2225G. doi:10.1126/science.1069424. PMID 12029063. S2CID 10069634.
- Harding RM, Healy E, Ray AJ, Ellis NS, Flanagan N, Todd C, Dixon C, Sajantila A, Jackson IJ, Birch-Machin MA, Rees JL (April 2000). "Evidence for variable selective pressures at MC1R". American Journal of Human Genetics. 66 (4): 1351–61. doi:10.1086/302863. PMC 1288200. PMID 10733465.
- Ingman M, Kaessmann H, Pääbo S, Gyllensten U (December 2000). "Mitochondrial genome variation and the origin of modern humans". Nature. 408 (6813): 708–13. Bibcode:2000Natur.408..708I. doi:10.1038/35047064. PMID 11130070. S2CID 52850476.
- The International Hapmap Consortium (December 2003). "The International HapMap Project". Nature. 426 (6968): 789–96. Bibcode:2003Natur.426..789G. doi:10.1038/nature02168. hdl:2027.42/62838. PMID 14685227. S2CID 4387110.
- The International Hapmap Consortium (June 2004). "Integrating ethics and science in the International HapMap Project". Nature Reviews. Genetics. 5 (6): 467–75. doi:10.1038/nrg1351. PMC 2271136. PMID 15153999.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. (February 2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. Bibcode:2001Natur.409..860L. doi:10.1038/35057062. hdl:2027.42/62798. PMID 11237011.
- Jorde LB, Bamshad M, Rogers AR (February 1998). "Using mitochondrial and nuclear DNA markers to reconstruct human evolution" (PDF). BioEssays. 20 (2): 126–36. doi:10.1002/(SICI)1521-1878(199802)20:2<126::AID-BIES5>3.0.CO;2-R. PMID 9631658. S2CID 17203268. Archived from the original (PDF) on 28 November 2007. Retrieved 28 October 2007.
- Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT, Batzer MA (March 2000). "The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data". American Journal of Human Genetics. 66 (3): 979–88. doi:10.1086/302825. PMC 1288178. PMID 10712212.
- Jorde LB, Watkins WS, Kere J, Nyman D, Eriksson AW (2000). "Gene mapping in isolated populations: new roles for old friends?". Human Heredity. 50 (1): 57–65. doi:10.1159/000022891. PMID 10545758. S2CID 26960216.
- Kaessmann H, Heissig F, von Haeseler A, Pääbo S (May 1999). "DNA sequence variation in a non-coding region of low recombination on the human X chromosome". Nature Genetics. 22 (1): 78–81. doi:10.1038/8785. PMID 10319866. S2CID 9153915.
- Kaessmann H, Wiebe V, Weiss G, Pääbo S (February 2001). "Great ape DNA sequences reveal a reduced diversity and an expansion in humans". Nature Genetics. 27 (2): 155–56. doi:10.1038/84773. PMID 11175781. S2CID 19384784.
- Keita SO, Kittles RA (1997). "The Persistence of Racial Thinking and the Myth of Racial Divergence". American Anthropologist. 99 (3): 534–44. doi:10.1525/aa.1997.99.3.534.
- Marks J (1995). Human Biodiversity: Genes, Race, and History. Aldine Transaction. ISBN 978-0-202-02033-4.
- Mountain JL, Risch N (November 2004). "Assessing genetic contributions to phenotypic differences among 'racial' and 'ethnic' groups". Nature Genetics. 36 (11 Suppl): S48–53. doi:10.1038/ng1456. PMID 15508003.
- Pääbo S (January 2003). "The mosaic that is our genome". Nature. 421 (6921): 409–12. Bibcode:2003Natur.421..409P. doi:10.1038/nature01400. PMID 12540910.
- Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL (November 2005). "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa". Proceedings of the National Academy of Sciences of the United States of America. 102 (44): 15942–47. Bibcode:2005PNAS..10215942R. doi:10.1073/pnas.0507611102. PMC 1276087. PMID 16243969.
- Relethford JH (August 2002). "Apportionment of global human genetic diversity based on craniometrics and skin color". American Journal of Physical Anthropology. 118 (4): 393–98. CiteSeerX 10.1.1.473.5972. doi:10.1002/ajpa.10079. PMID 12124919. S2CID 8717358.
- Sankar P, Cho MK (November 2002). "Genetics. Toward a new vocabulary of human genetic variation". Science. 298 (5597): 1337–38. doi:10.1126/science.1074447. PMC 2271140. PMID 12434037.
- Sankar P, Cho MK, Condit CM, Hunt LM, Koenig B, Marshall P, Lee SS, Spicer P (June 2004). "Genetic research and health disparities". JAMA. 291 (24): 2985–89. doi:10.1001/jama.291.24.2985. PMC 2271142. PMID 15213210.
- Serre D, Pääbo S (September 2004). "Evidence for gradients of human genetic diversity within and among continents". Genome Research. 14 (9): 1679–85. doi:10.1101/gr.2529604. PMC 515312. PMID 15342553.
- Templeton AR (1998). "Human Races: A Genetic and Evolutionary Perspective". American Anthropologist. 100 (3): 632–50. doi:10.1525/aa.1998.100.3.632.
- Weiss KM (1998). "Coming to Terms with Human Variation". Annual Review of Anthropology. 27: 273–300. doi:10.1146/annurev.anthro.27.1.273.
- Weiss KM, Terwilliger JD (October 2000). "How many diseases does it take to map a gene with SNPs?". Nature Genetics. 26 (2): 151–57. doi:10.1038/79866. PMID 11017069. S2CID 685795.
- Yu N, Jensen-Seaman MI, Chemnick L, Kidd JR, Deinard AS, Ryder O, Kidd KK, Li WH (August 2003). "Low nucleotide diversity in chimpanzees and bonobos". Genetics. 164 (4): 1511–18. doi:10.1093/genetics/164.4.1511. PMC 1462640. PMID 12930756.
- Zietkiewicz E, Yotova V, Gehl D, Wambach T, Arrieta I, Batzer M, Cole DE, Hechtman P, Kaplan F, Modiano D, Moisan JP, Michalski R, Labuda D (November 2003). "Haplotypes in the dystrophin DNA segment point to a mosaic origin of modern human diversity". American Journal of Human Genetics. 73 (5): 994–1015. doi:10.1086/378777. PMC 1180505. PMID 14513410.
- Pennisi E (December 2007). "Breakthrough of the year. Human genetic variation". Science. 318 (5858): 1842–43. doi:10.1126/science.318.5858.1842. PMID 18096770.
- Ramachandran S, Tang H, Gutenkunst RN, Bustamante CD (2010). "Genetics and Genomics of Human Population Structure". In Speicher MR, Antonarakis SE, Motulsky AG (eds.). Vogel and Motulsky's Human Genetics: Problems and Approaches (4th ed.). Springer. ISBN 978-3-540-37653-8.
Human genetic variation
Fundamentals
Definition and Scope
Human genetic variation refers to the differences in nucleotide sequences, chromosomal arrangements, and gene regulatory elements among individuals within the species Homo sapiens. These differences, primarily inherited, result from mutations, recombination, and selection pressures acting over evolutionary time, and they underlie phenotypic diversity in traits such as disease susceptibility, physical characteristics, and responses to environmental factors.[2] The study of such variation focuses on heritable genomic differences rather than somatic mutations or epigenetic modifications, though the latter can interact with genetic factors.[5]
The scope of human genetic variation extends beyond simple base-pair substitutions to include a hierarchy of variant types. Single-nucleotide polymorphisms (SNPs), substitutions at individual bases occurring in at least 1% of the population, represent the most abundant class, with catalogs identifying tens of millions across global samples. Insertions and deletions (indels) of small segments (typically 1-50 base pairs) add further diversity, while copy number variations (CNVs) involve duplications or deletions of larger genomic regions (often thousands of base pairs). Collectively these classes account for a substantial portion of inter-individual differences, estimated at up to 12% of genomic sequence when structural variants are included. Larger structural variants, such as inversions, translocations, and segmental duplications, also contribute, with recent sequencing efforts revealing over 40,000 CNVs in diverse cohorts.
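The size-based hierarchy described above can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not a variant caller: the `Variant` class, the `classify_variant` function, and the `INDEL_MAX_BP` constant are hypothetical names chosen for this example, and the cutoff simply mirrors the 1-50 base pair indel range quoted in the text. It labels a sequence-resolved variant from its reference and alternate alleles. It cannot tell an SNV from an SNP, which would additionally require a population frequency of at least 1%.

```python
from dataclasses import dataclass

# Assumption: indels are taken to be length changes of 1-50 bp, as in the text.
# Real pipelines may use different cutoffs.
INDEL_MAX_BP = 50


@dataclass(frozen=True)
class Variant:
    ref: str  # reference allele sequence
    alt: str  # alternate allele sequence


def classify_variant(v: Variant) -> str:
    """Return a coarse, size-based class for a sequence-resolved variant."""
    ref_len, alt_len = len(v.ref), len(v.alt)
    if ref_len == 1 and alt_len == 1:
        # Single-base substitution (or identical alleles, i.e. no change).
        return "SNV" if v.ref != v.alt else "no_change"
    size = abs(ref_len - alt_len)
    if size == 0:
        # Same length but more than one base long: a block substitution.
        return "multi_nucleotide_substitution"
    kind = "insertion" if alt_len > ref_len else "deletion"
    if size <= INDEL_MAX_BP:
        return f"indel_{kind}"
    # Anything longer is grouped with structural variants (including CNVs).
    return f"structural_{kind}"


if __name__ == "__main__":
    print(classify_variant(Variant("A", "G")))     # SNV
    print(classify_variant(Variant("CTTT", "C")))  # indel_deletion
```

Balanced rearrangements such as inversions and translocations change sequence order rather than length, so a length-based rule like this one would not detect them.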
Mitochondrial DNA and sex chromosome variants further expand this scope, reflecting uniparental inheritance patterns.[7][5][8]
Quantitatively, the extent of variation is modest relative to genome size. Average pairwise nucleotide diversity (π), the probability that two randomly chosen copies of the genome differ at a given site, is estimated at approximately 0.00088 (or 0.088%), equivalent to about 6-8 million differences per diploid genome of 6 billion base pairs. This low diversity reflects a history of population bottlenecks and expansions, yet it suffices to explain significant functional impacts, as rare and common variants together influence thousands of genes. Comprehensive projects like the 1000 Genomes Project have documented over 88 million variants (including 84 million SNPs and indels) across more than 2,500 individuals from 26 populations, underscoring that while most variation (~85-90%) occurs within continental groups, structured differences between groups enable population-specific inferences.[9][1] The integration of whole-genome sequencing has refined these estimates, revealing that structural variants alone can account for differences of several percent of genome length between individuals, amplifying the functional scope beyond nucleotide-level metrics.[10][8]
Types of Variants
Human genetic variants are classified by their molecular nature and scale, encompassing small-scale changes such as single-nucleotide variants (SNVs) and insertions/deletions (indels), as well as larger structural variants including copy number variants (CNVs) and other rearrangements.[7] These variants arise primarily from errors in DNA replication, repair, or recombination and collectively account for approximately 0.4% sequence divergence from a reference genome in a typical individual.[7]
SNVs, the most abundant type, involve substitution of one nucleotide for another at a specific position and occur at an average frequency of about 3.5 to 5 million per diploid genome.[7][11] When present in at least 1% of a population, SNVs are designated single-nucleotide polymorphisms (SNPs), which approximate one difference every 1,000 base pairs when comparing two haploid genomes.[2] SNPs comprise roughly 90% of known human polymorphisms and can influence traits through effects on protein coding, gene regulation, or splicing.[12]
Indels represent insertions or deletions of nucleotides, typically ranging from 1 to 50 base pairs, with an average of around 500,000 to 600,000 such events per genome, collectively spanning about 2 million nucleotides.[7][11] These variants can disrupt reading frames or otherwise alter protein function, as seen in cystic fibrosis, where the most common causal variant is an in-frame 3-base-pair deletion in the CFTR gene.[12]
CNVs entail duplications or deletions that modify the copy number of genomic segments, usually 1 kilobase to several megabases in length, and can overlap protein-coding regions in ways that contribute to complex diseases such as autism and schizophrenia.[12] Larger structural variants, including inversions, translocations, and balanced rearrangements exceeding 50 base pairs, number approximately 25,000 per genome and affect over 20 million nucleotides; nearly half involve tandem repeats.[7] These structural changes can alter gene dosage, disrupt regulatory elements, or promote genomic instability.[7]
Mechanisms of Origin
Human genetic variation primarily originates from mutations, which introduce changes in the DNA sequence during replication, repair, or due to external factors.[2] The de novo mutation rate in humans is approximately 1.2 × 10^{-8} per nucleotide per generation, resulting in about 60-100 new mutations per diploid genome.[13] These include single nucleotide variants (SNVs), the most common form, which differ between individuals at roughly one site per 1,000 base pairs.[2]
Point mutations, such as transitions and transversions, arise from errors in DNA polymerase activity, spontaneous chemical changes like deamination, or unrepaired damage from ionizing radiation and chemicals.[2] Insertions and deletions (indels) typically result from slippage during replication in repetitive sequences or errors in double-strand break (DSB) repair.[14] Structural variants (SVs), encompassing copy number variations (CNVs) and larger rearrangements, often stem from DSBs (occurring at rates of up to 50 per cell cycle) repaired via error-prone pathways like non-homologous end joining (NHEJ) or microhomology-mediated end joining (MMEJ).[14] Non-allelic homologous recombination (NAHR), a form of unequal recombination between misaligned repetitive sequences, generates recurrent CNVs such as deletions and duplications, particularly in regions with low-copy repeats or segmental duplications.[15][16]
Recombination hotspots can also elevate local diversity through biased gene conversion, favoring GC alleles and contributing to subtle sequence changes beyond mere reshuffling of existing variants.[17] While sexual recombination primarily combines existing alleles into novel haplotypes during meiosis, it indirectly fosters variation by exposing mutants to selection.[2] De novo mutations increase with advanced paternal age due to higher numbers of cell divisions in spermatogenesis, accounting for a significant portion of heritable variation.[18]
Measurement and Analysis
Molecular Markers
Molecular markers are specific DNA sequence variations that serve as identifiable landmarks for studying genetic differences among individuals and populations. In human genetics, these markers enable the quantification of variation through genotyping and sequencing technologies, facilitating analyses of ancestry, migration, and disease susceptibility. Common markers include single nucleotide polymorphisms (SNPs), insertions/deletions (indels), short tandem repeats (STRs), and copy number variations (CNVs), each differing in mutation rate, abundance, and utility for population-level studies.[7]
SNPs represent the most prevalent type of molecular marker, consisting of single-base substitutions that occur at frequencies greater than 1% in a population, the threshold for being considered polymorphic. The human genome harbors approximately 10 million common SNPs (minor allele frequency >1%), with over 88 million total variants, predominantly SNPs, identified across diverse global samples in the 1000 Genomes Project Phase 3, which sequenced 2,504 individuals from 26 populations. SNPs are biallelic, stable, and amenable to high-throughput genotyping via arrays, making them ideal for genome-wide association studies (GWAS) and principal component analysis (PCA) of population structure. Their low mutation rate (~10^{-8} per site per generation) ensures reliability for inferring historical relationships over evolutionary timescales.[7][1]
Indels, encompassing small insertions or deletions of nucleotides (typically 1-50 bp), constitute the second most common variant class and contribute significantly to protein-coding changes. In the 1000 Genomes dataset, short indels numbered around 3.6 million (roughly 4% of the 88 million cataloged variants), often co-occurring with SNPs in non-coding regions. These markers are detected primarily through whole-genome sequencing and provide complementary resolution to SNPs, particularly in regions of high indel density like microsatellites, though their ascertainment can be biased in array-based methods.[1]
STRs, or microsatellites, are tandem repeats of 1-6 bp motifs that exhibit high polymorphism due to replication slippage, with mutation rates 10^3 to 10^5 times higher than SNPs. In humans, thousands of such loci exist, used historically in linkage mapping and forensics (e.g., CODIS panel of 20 STRs), but their homoplasy limits utility in deep phylogenetic studies compared to SNPs. Recent sequencing efforts have cataloged over 1 million microsatellite variants, revealing population-specific allele distributions.[12]
CNVs and related structural variants involve larger-scale duplications, deletions, or inversions (>50 bp), impacting 12-18% of the genome by base coverage despite comprising fewer events per individual (typically 1,000-2,500 per diploid genome). Databases like dbVar annotate over 1.5 million CNVs from structural variant consortia, with recent whole-genome sequencing uncovering rare CNVs influencing complex traits. Unlike SNPs, CNVs often span genes, contributing disproportionately to phenotypic variation, as evidenced by their enrichment in disease-associated regions, though detection requires specialized algorithms to distinguish true events from sequencing noise.[19][1]
Population Genetic Metrics
The fixation index (FST), introduced by Sewall Wright, measures the proportion of total genetic variation attributable to differences between subpopulations. It is calculated as FST = (HT - HS) / HT, where HT is the expected heterozygosity of the total (pooled) population and HS is the average expected heterozygosity within subpopulations.[20] In human populations, genome-wide FST values between continental groups typically range from 0.05 to 0.15, with an overall estimate of approximately 0.11, indicating modest differentiation despite substantial within-population variation.[21] These values derive primarily from single nucleotide polymorphism (SNP) data and are lower than in many other species, reflecting recent common ancestry and ongoing gene flow, though rare variants can inflate estimates if not accounted for.[22]
Nucleotide diversity (π), the average number of pairwise nucleotide differences per site, quantifies within-population variation and has an expected value of 4Neμ under neutrality, where Ne is effective population size and μ is the per-site mutation rate per generation.[23] Human genome-wide π averages 7.5 × 10^{-4}, with African populations showing higher values (around 8.5 × 10^{-4}) than non-Africans (around 6.8 × 10^{-4}), as sequenced in noncoding regions across diverse samples.[9] This low diversity, lower than in chimpanzees, stems in part from historical bottlenecks during the out-of-Africa migration, which reduced standing variation outside Africa.[24]
Expected heterozygosity (He), the probability that two alleles drawn at random at a locus differ, serves as a SNP-based analog to π and is estimated as 2p(1-p) averaged over loci, where p is the frequency of one allele at a biallelic locus. In humans, autosomal He averages approximately 0.001 across common SNPs, with regional variation mirroring π patterns: higher in Africans due to deeper coalescence times.[25] Genome-wide scans reveal He gradients correlating with distance from East Africa, underscoring serial founder effects in non-African expansion.
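The three summary statistics above can be computed directly from their definitions. The sketch below assumes biallelic loci, equally weighted subpopulations, and aligned sequences of equal length. The function names are illustrative, and published estimators (such as Weir and Cockerham's) add sample-size corrections that are omitted here.

```python
from itertools import combinations
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """He = 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs: Sequence[float]) -> float:
    """Single-locus FST = (HT - HS) / HT.

    Assumption: subpopulations are weighted equally, so the pooled
    frequency is the plain mean of the subpopulation frequencies.
    """
    if not subpop_freqs:
        raise ValueError("at least one subpopulation is required")
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_t = expected_heterozygosity(p_bar)  # total heterozygosity
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / n
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no variation to apportion
    return (h_t - h_s) / h_t


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """pi: mean number of pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    n_sites = len(seqs[0])
    if n_sites == 0 or any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * n_sites)


if __name__ == "__main__":
    print(round(fst([0.4, 0.6]), 4))                      # 0.04
    print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
```

For example, two subpopulations with allele frequencies 0.4 and 0.6 give HT = 0.5 and HS = 0.48, so FST = 0.04. Genome-wide values are usually obtained by combining many loci rather than from a single site.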
Effective population size (Ne), the size of an idealized population that would show the observed amount of genetic drift, is inferred from linkage disequilibrium decay or polymorphism levels. Long-term human Ne is estimated at 10,000–20,000, far below census sizes, due to bottlenecks around 70,000 years ago reducing it to ~1,000–10,000 transiently.[26] Recent Ne has increased to ~4,000 in the last 10,000 years according to LD analyses, reflecting population growth post-agriculture.[26] These metrics collectively inform admixture detection and drift quantification, with tools like PSMC estimating temporal Ne trajectories from individual genomes.[27]
Statistical and Computational Tools
Principal component analysis (PCA) serves as a fundamental statistical tool for visualizing and summarizing patterns of genetic variation in human populations. This dimensionality reduction method transforms high-dimensional single nucleotide polymorphism (SNP) data into principal components that capture the largest variances, enabling the detection of population structure and ancestry-related clustering without assuming predefined groups.[28] In human genomics, PCA applied to whole-genome data often reveals continental-scale gradients aligning with geographic origins, as demonstrated in analyses of thousands of individuals from diverse ancestries.[29] Specialized implementations, such as those optimized for large-scale genotyping arrays, address computational demands by incorporating best practices for preprocessing and outlier detection to minimize artifacts from linkage disequilibrium or sample relatedness.[29]
The fixation index (FST) quantifies population differentiation by measuring the proportion of total genetic variance explained by differences between subpopulations, typically estimated from allele frequency divergences across loci. In human genetic studies, FST values between continental groups average around 0.10-0.15, reflecting moderate differentiation shaped by historical migration and drift, with estimators adjusted for rare variants to avoid inflation in low-frequency SNP-heavy datasets.[20] Computational pipelines compute genome-wide FST scans to identify outlier regions potentially under selection, using formulas like Weir and Cockerham's unbiased estimator on phased haplotypes or unphased genotypes from sequencing projects such as the 1000 Genomes.[30]
Bayesian model-based approaches, exemplified by the STRUCTURE software, infer discrete population clusters and individual admixture proportions from multilocus genotype data under assumptions of Hardy-Weinberg equilibrium within clusters and linkage equilibrium between loci.
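The two assumptions named above translate into a simple likelihood. Under Hardy-Weinberg equilibrium, a genotype carrying 0, 1, or 2 copies of an allele with frequency p has probability (1-p)^2, 2p(1-p), or p^2. Under linkage equilibrium, these probabilities multiply across loci. The sketch below illustrates that likelihood only, assuming known cluster allele frequencies and no admixture. It is not the STRUCTURE algorithm, which estimates allele frequencies and ancestry jointly. All function names and the example frequencies are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def hwe_genotype_probs(p: float) -> Tuple[float, float, float]:
    """Probabilities of carrying 0, 1, or 2 copies of an allele with
    frequency p, assuming Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def log_likelihood(genotypes: Sequence[int], freqs: Sequence[float]) -> float:
    """Log-probability of one individual's genotypes (allele counts 0/1/2)
    given one cluster's allele frequencies.

    Summing logs across loci is the same as multiplying probabilities,
    which is valid only under the linkage-equilibrium assumption.
    """
    if len(genotypes) != len(freqs):
        raise ValueError("one allele frequency is needed per locus")
    total = 0.0
    for g, p in zip(genotypes, freqs):
        prob = hwe_genotype_probs(p)[g]
        if prob == 0.0:
            return -math.inf  # genotype is impossible under this cluster
        total += math.log(prob)
    return total


def most_likely_cluster(
    genotypes: Sequence[int], clusters: Dict[str, Sequence[float]]
) -> str:
    """Name of the cluster under which the genotypes are most probable."""
    return max(clusters, key=lambda name: log_likelihood(genotypes, clusters[name]))


if __name__ == "__main__":
    # Hypothetical allele frequencies at three loci for two clusters.
    clusters = {"A": [0.9, 0.8, 0.1], "B": [0.1, 0.2, 0.9]}
    print(most_likely_cluster([2, 2, 0], clusters))  # A
```

Model-based clustering tools go further by treating the cluster frequencies and each individual's ancestry proportions as unknowns to be estimated from the data.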
Originally developed for investigating substructure in simulated and empirical human datasets, STRUCTURE employs Markov chain Monte Carlo sampling to estimate the number of ancestral populations (K) and has been applied to detect fine-scale structure in global human samples.[31] For larger datasets, ADMIXTURE fits a similar clustering model by maximum likelihood, in either supervised or unsupervised mode, accelerating inference on millions of SNPs while producing ancestry proportions comparable to STRUCTURE but with reduced runtime.[32]
Admixture models computationally deconvolve ancestry contributions in hybrid populations by modeling linkage disequilibrium decay from admixture events, with tools like RFMix enabling local ancestry inference at the haplotype level for downstream association studies. These methods integrate probabilistic frameworks to trace segment lengths informative of admixture timing, as validated in simulations and real admixed cohorts such as African Americans or Latin Americans.[33] Recent advances incorporate machine learning to refine global ancestry predictions, enhancing accuracy over traditional PCA in complex demographic scenarios while maintaining interpretability.[34]
Evolutionary History
Out-of-Africa Expansion
The Out-of-Africa expansion refers to the dispersal of anatomically modern humans, Homo sapiens, from Africa to populate the rest of the world, occurring primarily between 70,000 and 50,000 years ago.[35] This model posits that modern human populations outside Africa descend from a small subset of African ancestors who underwent a significant population bottleneck during migration, resulting in reduced genetic diversity in non-African groups compared to those remaining in Africa.[36] Genetic data from mitochondrial DNA (mtDNA), Y-chromosome, and autosomal markers consistently support this framework, with coalescence ages for non-African lineages tracing back to African origins within this timeframe.[37]
African populations exhibit the highest levels of genetic variation among humans, reflecting a longer history of habitation and larger effective population sizes on the continent.[38] For instance, sub-Saharan African groups harbor nearly a million more genetic variants per genome than non-Africans on average, underscoring Africa's role as the cradle of human genetic diversity.[39] In contrast, non-African populations show a subset of this diversity, consistent with a serial founder effect where successive migratory groups carried progressively smaller samples of genetic variation away from the origin point.[40] This pattern manifests as a decline in heterozygosity with increasing geographic distance from East Africa, observable in both neutral markers and linkage disequilibrium decay.[36]
Uniparental inheritance markers provide direct evidence for the expansion's timing and route. Mitochondrial DNA haplogroups outside Africa derive from African L3 lineages that emerged around 70,000 years ago, with non-African M and N clades appearing post-dispersal.[41] Similarly, Y-chromosome haplogroups in Eurasians and beyond coalesce to African ancestors dated to approximately 50,000–60,000 years ago, with markers like those in haplogroup CT supporting a single major exodus rather than multiple independent waves.[42] These uniparental systems reveal star-like phylogenies in non-Africans indicative of rapid expansion from small founding groups, while African lineages display deeper branching and greater basal diversity.[43]
Autosomal genome-wide studies reinforce the bottleneck's severity, estimating the non-African ancestral population at 1,000–10,000 individuals during the out-of-Africa event, leading to elevated mutational loads and reduced allelic richness in descendant populations.[44] Ancient DNA from early Eurasian sites confirms continuity with modern non-African genomes, showing minimal archaic admixture at this stage and primary ancestry from the African emigrants.[45] Climatic and archaeological correlates, such as favorable migration windows through the Arabian Peninsula around 60,000 years ago, align with genetic signals of adaptation and isolation in the founding groups.[46] Despite debates over minor earlier dispersals, the dominant genetic signature points to the Late Pleistocene expansion as the source of global human variation outside Africa.[47]
Archaic Human Admixture
Genetic evidence indicates that modern human populations outside sub-Saharan Africa carry approximately 1-2% Neanderthal-derived DNA on average, resulting from interbreeding events between Homo sapiens and Neanderthals following the out-of-Africa migration.[48] This admixture is detected through methods such as identifying long haplotype segments matching Neanderthal genomes and statistical tests for excess archaic ancestry in non-African populations.[49] Sequencing of high-coverage Neanderthal genomes from sites like Vindija Cave has confirmed that the introgressed material is not uniformly distributed, with some regions depleted due to purifying selection against deleterious variants.[50] Recent analyses of early modern human genomes from Europe, dated to over 45,000 years ago, constrain the primary Neanderthal admixture pulse to roughly 47,000-65,000 years ago, though multiple episodes may have occurred over several thousand years.[51]
Denisovan admixture, first identified from a finger bone found in Denisova Cave, Siberia, contributes more variably to modern genomes, with the highest proportions (up to 4-6%) found in Melanesian and some Oceanian populations, reflecting interbreeding after the divergence of East Asian and Oceanian lineages.[52] Evidence supports at least two distinct Denisovan introgression events: one closely related to the Altai Denisovan specimen, affecting East Asians and Native Americans, and another more divergent pulse influencing island Southeast Asians and Oceanians.[53] These signals are inferred from shared archaic haplotypes and admixture graph modeling, with Denisovan-derived alleles often linked to high-altitude adaptation in Tibetans via the EPAS1 gene.[54] Unlike Neanderthal admixture, Denisovan contributions show geographic structure: they are absent or minimal in most mainland Eurasians but detectable at up to 0.1-0.2% across broader Asian groups.[55]
Sub-Saharan African populations exhibit signals of admixture with unidentified "ghost" archaic hominins, distinct from Neanderthals or Denisovans, based on excess archaic-like divergence in haplotype scans using statistics like S*.[56] These events likely occurred independently within Africa, with estimates suggesting 2-19% archaic contribution in some West African groups like Yoruba, though the exact proportions remain debated due to methodological challenges in distinguishing ancient structure from introgression.[57] Southern African Khoesan and Pygmy populations show additional archaic signals, potentially from multiple ghost lineages diverging before the Neanderthal-modern human split around 600,000-800,000 years ago.[58] Such admixture complicates models of human origins, indicating recurrent gene flow with diverse archaic groups rather than a single out-of-Africa bottleneck devoid of back-mixing.[59] Overall, archaic introgression has introduced adaptive alleles, such as those for immunity and skin pigmentation, while contributing to modern genetic diversity, with negative selection removing much maladaptive material over time.[60]
Insights from Ancient DNA
Ancient DNA (aDNA) sequencing has enabled direct examination of genetic variation in prehistoric human populations, revealing dynamic changes in allele frequencies, population structures, and admixture events that shaped modern human diversity beyond what modern genomes alone can infer. By analyzing thousands of ancient genomes spanning from the Upper Paleolithic to the medieval period, studies have identified distinct ancestral components and turnover events, such as the replacement of up to 90% of Neolithic farmer ancestry in parts of Europe by incoming steppe pastoralists around 5,000–4,000 years ago. These findings underscore how migrations and cultural transitions, like the spread of farming and pastoralism, drove genetic discontinuities rather than gradual isolation by distance in many regions.[61][62] In Eurasia, aDNA documents multiple waves of population movement and replacement; for instance, Early Neolithic farmers from Anatolia contributed ancestry to modern Europeans, but subsequent Bronze Age incursions from the Pontic-Caspian steppe introduced Indo-European languages alongside Y-chromosome haplogroups like R1b and R1a, which dominate today in Western and Eastern Europe, respectively. In East Asia, genomes from the Neolithic period indicate a southward migration and admixture around 6,000–4,000 years ago, blending northern and southern ancestries to form the genetic basis of diverse modern groups, with evidence of endogamy and local adaptations in island populations like those in the Aegean. 
African aDNA further reveals deep substructure, with ancient North African genomes from 15,000–7,500 years ago showing isolation and continuity in some lineages, while sub-Saharan samples highlight early divergences predating Out-of-Africa expansions.[63][64][65][66]
aDNA also illuminates natural selection acting on genetic variants post-migration; for example, alleles for lactase persistence (LCT gene) rose rapidly in Europe and pastoralist groups after dairy farming's advent around 7,000 years ago, while immune-related loci like HLA show frequency shifts driven by pathogen exposure, with Neanderthal-derived variants maintained under balancing selection in some ancient cohorts. In the Americas and Oceania, limited but growing aDNA datasets confirm serial founder effects reducing diversity during Holocene expansions, with admixture from archaic sources varying regionally. These temporal snapshots demonstrate that human genetic variation reflects episodic admixture and selection rather than equilibrium models, challenging prior assumptions of static population boundaries.[67][5][61]
Recent analyses of over 900 ancient Eurasian genomes have uncovered thousands of variants absent or rare in modern populations, indicating loss of diversity through bottlenecks and drift, with effective population sizes fluctuating from lows of ~1,000–2,000 during glacial maxima to expansions post-Last Glacial Maximum. Such data refute notions of uniform genetic continuity, instead evidencing causal links between environmental pressures, mobility, and variant fixation, as seen in pigmentation genes (e.g., SLC45A2) selected for lighter skin in northern latitudes among ancient Europeans.[5][68]
Population Structure
Genetic Clustering
Genetic clustering in human populations refers to the grouping of individuals based on shared patterns of genetic variation, typically identified through statistical methods that reveal discrete or semi-discrete ancestral components despite continuous geographic gradients.[69] These clusters emerge from differences in allele frequencies across loci, reflecting historical isolation, migration, and admixture.[70] Methods such as principal component analysis (PCA) and model-based clustering using software like STRUCTURE analyze multilocus genotype data to infer population structure.[31]
A seminal study by Rosenberg et al. (2002) genotyped 1,056 individuals from 52 populations at 377 autosomal microsatellite loci, finding that 93-95% of genetic variation occurs within populations, while 3-5% differentiates major continental groups.[69] Using STRUCTURE, the analysis inferred five to six primary clusters, depending on the assumed number of populations (K), corresponding approximately to sub-Saharan Africans, Europeans (including Middle Easterners), East Asians, Pacific Islanders (Melanesians), Native Americans, and Central/South Asians.[70] This clustering was robust across different numbers of loci and populations sampled, though admixture blurred boundaries, with many individuals showing mixed ancestry.[69]
Principal component analysis of single nucleotide polymorphisms (SNPs) from large-scale datasets, such as the Human Genome Diversity Project or 1000 Genomes Project, consistently reproduces continental-scale clusters, with the first few principal components capturing 0.1-1% of total variation but aligning strongly with geography.[71] For instance, PCA plots separate Africans, Europeans, East Asians, and South Asians along PC1 and PC2 axes, with Oceanic and Native American groups forming distinct branches.[72] These patterns arise because, although most variation is within groups (per Lewontin's 1972 observation of ~85% within populations), the remaining inter-group differences involve correlated alleles across many loci, enabling accurate ancestry assignment even at low differentiation levels (FST ~0.10-0.15 between continents).[69]
Substructure within continents is also evident; for example, STRUCTURE at higher K values resolves finer clusters like Northern vs. Southern Europeans or Bantu vs. Pygmy Africans.[73] Recent whole-genome sequencing of diverse cohorts, including 929 high-coverage genomes from 54 populations, confirms these hierarchies, with admixture proportions traceable to source clusters via tools like ADMIXTURE.[5]
While clinal variation exists due to gene flow, clustering persists because isolation by distance and founder effects concentrate specific variants, allowing forensic and medical applications to predict biogeographic ancestry with >99% accuracy for major groups using hundreds of ancestry informative markers.[71] Critics arguing against biological race often emphasize within-group variance, but empirical clustering data demonstrate that human genetic diversity organizes into hierarchically nested groups mirroring migration history, independent of social constructs.[72]
Geographic Patterns
Human genetic variation exhibits pronounced geographic patterns, with genetic dissimilarity increasing as a function of physical distance between populations, a phenomenon known as isolation by distance. This results from restricted gene flow due to geographic barriers and limited migration, allowing genetic drift and local selection to accumulate differences over time. Studies using genome-wide single nucleotide polymorphisms (SNPs) confirm that genetic correlations decay exponentially with geographic separation on continental scales, though sharper discontinuities occur across major barriers like oceans.[74][75][76]
Continental-scale differentiation is evident in fixation index (FST) values, which quantify the proportion of genetic variance attributable to differences between groups. For example, pairwise FST between African, European, and East Asian populations typically ranges from 0.10 to 0.15, indicating that approximately 10-15% of total human genetic variation occurs between these broad continental clusters, with the remaining 85-90% found within them. These values derive from large-scale genotyping of hundreds of thousands of SNPs across diverse cohorts, underscoring the role of historical migrations and isolation in shaping inter-population divergence.[77][22]
Principal component analysis (PCA) of whole-genome data further illustrates these patterns, revealing that the primary axes of variation align closely with geographic coordinates. The first principal component often separates sub-Saharan Africans from non-Africans, while subsequent components distinguish Europeans from East Asians and other groups, with clusters forming along latitudinal and longitudinal gradients.
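
The projection described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical genotype matrix (rows are individuals, columns are SNPs, entries are allele counts of 0, 1, or 2) and uses plain power iteration on the individual-by-individual Gram matrix instead of the optimized eigen-solvers that real analyses rely on. The function names and toy data are illustrative only and are not taken from any cited study.

```python
import math


def center_columns(genotypes):
    """Subtract each SNP's mean allele count so every column sums to zero."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200):
    """Return unit-length PC1 scores, one per individual (sign is arbitrary)."""
    x = center_columns(genotypes)
    n = len(x)
    # Gram matrix of individuals: entry (i, k) is the dot product of two
    # centered genotype vectors. Its leading eigenvector gives the PC1
    # scores up to a scale factor.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    # Power iteration from a non-constant start vector (a constant vector
    # lies in the null space of the centered Gram matrix).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("genotypes show no variation between individuals")
        v = [c / norm for c in w]
    return v


# Hypothetical toy data: two groups with different allele frequencies
# at the first four SNPs and no difference at the fifth.
toy_genotypes = [
    [2, 2, 0, 0, 1],  # group A
    [2, 1, 0, 0, 1],  # group A
    [2, 2, 0, 1, 1],  # group A
    [0, 0, 2, 2, 1],  # group B
    [0, 1, 2, 2, 1],  # group B
    [0, 0, 2, 1, 1],  # group B
]

scores = first_pc_scores(toy_genotypes)
for label, score in zip("AAABBB", scores):
    print(label, round(score, 3))
```

In this toy case the two groups fall on opposite sides of zero along PC1, which is the same qualitative behavior as continental groups separating along the leading components. Real pipelines usually also scale each SNP by its expected variance and prune markers in linkage disequilibrium, steps omitted here for brevity.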
This geographic structuring persists even after accounting for admixture, as demonstrated in analyses of over 1,000 individuals from global populations, where Euclidean genetic distances mirror great-circle geographic distances.[28][78][79]
Genetic diversity metrics, such as heterozygosity and allelic richness, peak in African populations and decline progressively with distance from East Africa, consistent with serial founder effects during the out-of-Africa expansion around 60,000-70,000 years ago. Within continents, clinal variation predominates, but inter-continental comparisons show steeper gradients; for instance, variant-centric analyses of allele frequency spectra highlight continent-specific patterns, with rare variants more localized to their regions of origin. These patterns hold across datasets like the 1000 Genomes Project, which sampled 2,504 individuals from 26 populations, affirming geography's dominant influence on neutral and functional variation alike.[80][81][82]
Gene Flow and Barriers
Gene flow, the transfer of alleles between human populations through migration and interbreeding, counteracts genetic divergence driven by drift and local selection, thereby shaping patterns of human genetic variation.[78] Historical gene flow in humans occurred via episodic migrations, such as the Out-of-Africa expansion and subsequent dispersals, but remained limited by barriers that preserved differentiation, as evidenced by elevated FST values across geographic divides.[83]
Geographic features have imposed strong barriers to gene flow throughout human history. The Sahara Desert, aridified approximately 5,000 years ago, has restricted exchange between North and sub-Saharan African populations, yielding distinct autosomal and uniparental genetic signatures on either side, with minimal shared ancestry post-aridification except via trans-Saharan routes.[84][85] Similarly, the Tibetan Plateau functions as a barrier in East Asia, with genomic data showing northern populations with higher Tibetan ancestry and southern ones with greater East Asian components, alongside reduced effective migration rates across the high-elevation divide.[86] Oceans and mountain ranges, such as the Himalayas and Andes, further isolated continental populations for millennia, limiting interbreeding until maritime expansions around 500 years ago.
Isolation by distance manifests as a clinal decrease in genetic similarity with geographic separation, a pattern confirmed in global datasets where genetic differentiation rises predictably with distance under limited long-range dispersal.[76][83] This reflects step-wise migration and local mate choice, with effective gene flow decaying exponentially beyond tens of kilometers, as modeled in human SNP and ancient DNA analyses spanning Eurasia to the Americas.
Cultural and social practices have reinforced barriers through endogamy, curtailing gene flow even within admixed regions.
In India, caste endogamy, established over 2,000-3,000 years ago, has produced marked genetic stratification; despite a shared history of mixture between Ancestral North Indian (related to West Eurasians) and Ancestral South Indian ancestries around 1,900-4,200 years ago, castes exhibit differential admixture proportions and elevated differentiation (FST up to 0.05-0.1 between groups).[87][88] Upper castes show reduced Ancestral South Indian ancestry due to enforced isolation, amplifying founder effects and disease allele frequencies. Religious endogamy in groups like Samaritans or certain Indo-European isolates similarly sustains distinct haplotypes, with inbreeding coefficients (F) exceeding 0.01 in some cases. Isolated populations, such as those on remote islands or endogamous groups like the Amish, further illustrate how founder effects, genetic drift, and natural selection shape adaptation; small founding populations reduce overall genetic variation, amplify random allele frequency changes, and elevate the prevalence of specific alleles, including advantageous ones for local environments and deleterious variants leading to higher disorder rates.[89]
Ancient DNA corroborates barrier effects, revealing isolation-by-distance zones in Mesolithic Eurasia from Central Europe to Siberia, interrupted by admixture events but sustained by topographic and climatic constraints.[90] Modern globalization has eroded many barriers, elevating admixture rates, as seen in increased intermediate ancestries in urban populations, but residual social endogamy and geographic isolation in remote areas continue to influence local variation.[91]
Ancestry Categorization
Ancestry Informative Markers
Ancestry informative markers (AIMs) are genetic variants, typically single nucleotide polymorphisms (SNPs), characterized by substantial allele frequency differences between human populations, often quantified using the fixation index (FST), where values exceeding 0.15 indicate high informativeness.[92][93] These markers enable probabilistic inference of an individual's biogeographic ancestry by leveraging population-specific allele distributions, distinguishing continental origins with panels as small as 24 SNPs achieving over 99% accuracy for broad categorizations in diverse datasets.[94]
Selection of AIMs involves screening genome-wide data for loci with maximal frequency divergence, such as Δ allele frequency thresholds or high FST, prioritizing autosomal biallelic SNPs to minimize linkage disequilibrium effects and ensure portability across studies.[95][96] Specialized panels have been developed for targeted applications, including a 446-marker set optimized for Latin American admixed populations to estimate European, African, and Native American contributions, and African-focused AIMs for fine-scale sub-Saharan structure.[95][97] While SNPs dominate due to their abundance and genotyping ease, insertion-deletion variants (INDELs) and microhaplotypes serve as complementary AIMs for enhanced resolution in forensics.[98]
In ancestry categorization, AIMs facilitate admixture mapping and self-reported ancestry validation by modeling individual genomes as mixtures of reference population allele frequencies, often via maximum likelihood or Bayesian methods.[99] Applications extend to forensics, where AIM panels predict biogeographic ancestry from trace DNA to aid suspect prioritization, with machine learning integrations improving accuracy for multi-ancestry inference.[100][101] In medicine, they correct for population stratification in genome-wide association studies (GWAS) by adjusting for cryptic ancestry, reducing false positives in polygenic risk assessments across diverse cohorts.[102]
Despite high utility for continental-level assignments, AIMs exhibit limitations in resolving fine-scale or highly admixed ancestries due to gene flow and shared drift, necessitating integration with dense genomic data for precision.[103]
Principal Component Analysis
Principal component analysis (PCA) is a multivariate statistical technique that transforms high-dimensional genetic data, such as genotypes at hundreds of thousands of single nucleotide polymorphisms (SNPs), into a lower-dimensional space by identifying orthogonal axes of maximum variance. In human population genetics, PCA processes genotype matrices to project individuals onto principal components (PCs), enabling visualization of genetic similarities and differences without assuming predefined population labels.[28]
Applied to genome-wide SNP datasets from projects like the 1000 Genomes Project, PCA consistently reveals distinct clusters corresponding to major continental ancestries, with sub-Saharan Africans separated along PC1 from non-Africans due to reduced genetic diversity outside Africa, and Europeans differentiated from East Asians along PC2.[104][105][106] These patterns reflect historical demographic events, including the out-of-Africa expansion and subsequent regional divergences, with the leading components accounting for approximately 1-2% of total genomic variation.[104]
Higher-order PCs uncover subcontinental structure, such as east-west or north-south gradients within Eurasia, correlating with isolation-by-distance and admixture events; for example, in European datasets, PC1 often aligns with a latitudinal cline from southern to northern populations.[107][108] PCA thus serves as a foundational tool for inferring individual ancestry proportions, detecting cryptic relatedness, and correcting for stratification in genome-wide association studies (GWAS).[109]
Despite its utility, PCA outcomes depend on factors like sample size imbalances, SNP selection, and linkage disequilibrium pruning, which can distort clusters and genetic distance estimates; analyses show that uneven sampling may artifactually position populations like South Asians variably between European and East Asian groups.[28] Only about 12% of human genetic variation occurs between continental populations, emphasizing that while PCA highlights broad structure, it does not capture the full spectrum of local adaptation or rare variants.[28] Advanced implementations, such as fast PCA algorithms, enhance computational efficiency for large-scale datasets, revealing signals of selection along PC axes.[110]
Applications in Forensics and Medicine
Human genetic variation, especially polymorphisms like short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs), enables DNA profiling in forensics for individual identification, familial relationships, and linking suspects to crime scenes. Autosomal STR markers, analyzed via polymerase chain reaction (PCR) amplification, produce unique profiles with random match probabilities often below 1 in 10^18 for unrelated individuals, forming the basis of systems like CODIS in the United States.[111] SNPs complement STRs in challenging samples, such as degraded DNA, due to their shorter amplicon requirements and utility in massively parallel sequencing, with applications in forensic investigative genetic genealogy (FIGG) for identifying unknown remains or perpetrators via kinship matching.[112][113]
Ancestry informative markers (AIMs), subsets of SNPs showing large allele frequency differences across populations, aid forensic investigations by estimating biogeographical ancestry, which narrows suspect pools or aids in victim identification when direct matches fail. Panels of 128 to over 1,000 AIMs can predict continental origins with accuracies above 99% for broad categories like African, European, or East Asian, though finer subcontinental resolution remains probabilistic due to admixture.[94][114] Y-chromosomal STRs and SNPs further trace paternal lineages, useful in cases involving male-specific evidence.[115]
In medicine, genetic variation underpins pharmacogenomics, guiding drug selection and dosing to optimize efficacy and minimize adverse reactions based on variants in genes encoding drug-metabolizing enzymes, transporters, and targets. For instance, the HLA-B*57:01 allele, present in about 5-8% of Europeans but rarer in other groups, predicts hypersensitivity to abacavir, an antiretroviral, prompting pre-treatment screening that reduces severe reactions by over 50%.[116] CYP2C19 variants influence clopidogrel metabolism, with poor metabolizers (e.g., carrying *2 or *3 alleles) facing 2-3 fold higher cardiovascular event risks, leading to alternative therapies like prasugrel.[117] TPMT and NUDT15 polymorphisms affect thiopurine dosing in leukemia treatment, where deficient variants necessitate 10- to 100-fold reductions to avoid myelotoxicity.[117]
Population-specific variant frequencies inform ancestry-adjusted risk models in personalized medicine, such as elevated APOL1 variants in African ancestry conferring kidney disease susceptibility, or PCSK9 loss-of-function mutations more common in certain European groups enhancing statin response.[12] Whole-genome sequencing increasingly integrates such variants for polygenic risk scores in disease prediction, though clinical utility varies by trait heritability and environmental confounders.[118] Forensic and medical applications converge in biobanking and identification of disaster victims, where genetic profiles ensure accurate linkage to health records.[119]
Phenotypic and Functional Effects
Impacts on Protein Function
Human genetic variation in protein-coding sequences primarily affects function through nonsynonymous single nucleotide variants (nsSNVs), insertions, deletions, and copy number changes that alter the amino acid composition or length of polypeptides. Missense variants, the most common nsSNVs, substitute one amino acid for another, potentially disrupting secondary and tertiary structures, thermodynamic stability, catalytic sites, ligand-binding interfaces, or protein-protein interactions.[120] Nonsense variants introduce premature termination codons, typically triggering nonsense-mediated mRNA decay or yielding truncated, nonfunctional proteins that may aggregate.[121] Frameshift indels shift the reading frame, often producing aberrant polypeptides with loss-of-function (LoF) consequences.[122]
Whole-genome sequencing reveals that each individual harbors approximately 100-250 predicted LoF variants across protein-coding genes, with tolerance enabled by diploidy, genetic redundancy, and compensatory mechanisms rather than inherent neutrality.[122] Deleterious missense variants, comprising about 20% of coding variants per genome, frequently impair folding kinetics or stability, as quantified by changes in Gibbs free energy (ΔΔG > 1 kcal/mol often indicating destabilization).[123] Computational predictors like AlphaMissense, trained on human and primate variation alongside structural data, classify ~57% of possible missense changes as likely benign, ~32% as likely pathogenic, and the rest as ambiguous, with pathogenic ones enriched at evolutionarily constrained residues.[124]
Beyond simple LoF, variants can elicit gain-of-function (GoF) effects by enhancing activity, altering specificity, or enabling ectopic expression, or dominant-negative interference where mutants sequester wild-type subunits in nonproductive complexes.[123] For instance, structural analyses show nsSNVs at protein interfaces reduce binding affinity by up to 10-fold in ~30% of cases, disrupting quaternary assemblies essential for complexes like hemoglobin or ion channels.[125] High-throughput saturation mutagenesis across 500 human protein domains confirms that tolerated variants cluster in flexible loops, while disruptive ones hit cores or functional motifs, aligning predictions with fitness costs in cellular assays.[126]
Population-level variation modulates these impacts, with rare alleles (<1% frequency) disproportionately deleterious due to purifying selection, whereas common nsSNVs often reflect neutral drift or historical adaptation, as evidenced by higher LoF burdens in out-of-Africa populations from serial founder effects.[121] Functional assays underscore that ~10-20% of common missense variants subtly alter enzyme kinetics or allosteric regulation without overt pathology, contributing to quantitative trait variation.[127] These effects aggregate across the proteome, where even mild per-protein perturbations can yield emergent phenotypes under environmental stressors.[120]
Complex Traits and Heritability
Complex traits encompass phenotypes such as height, body mass index (BMI), and cognitive abilities, which arise from the combined effects of multiple genetic loci and environmental influences. Heritability (h²) measures the fraction of phenotypic variance in a population explained by genetic variance; it does not describe how strongly genes determine the trait in any one individual. In twin studies, monozygotic twins, sharing nearly 100% of their genetic material, exhibit greater similarity for these traits than dizygotic twins, who share about 50% on average, yielding broad-sense heritability estimates that include additive, dominance, and epistatic effects.[128] Narrow-sense heritability, focusing on additive genetic variance, is estimated via methods like genomic restricted maximum likelihood (GREML) or linkage disequilibrium score regression applied to GWAS data.[129]
For height, twin and family studies consistently report heritability around 0.80 to 0.90, indicating that genetic factors account for most variation in well-nourished populations.[130] GWAS have identified over 12,000 variants explaining approximately 40% of height variance as of 2023, with the gap attributed partly to rare variants and incomplete linkage disequilibrium capture.[130] BMI heritability varies from 0.40 to 0.70 across studies, influenced by age, sex, and population; for instance, estimates are higher in adults (around 0.70) than children, reflecting gene-environment interactions like dietary availability.[131] Cognitive traits, including intelligence quotient (IQ), show heritability of 0.50 to 0.80 in adulthood from twin studies, rising with age as environmental influences equalize.[132]
The "missing heritability" refers to the discrepancy where SNP-based estimates from early GWAS captured only 20-30% of twin-derived h² for many traits, now partially resolved for height and BMI through larger sample sizes and polygenic scoring.[133] Remaining gaps stem from rare and structural variants, epistatic interactions, and imperfect tagging of causal loci by common SNPs.[134] Heritability estimates are population-specific and context-dependent; for example, they decline under strong selection or bottlenecks, as genetic drift reduces variance.[135] These findings underscore that while genetics substantially shapes complex trait variation, environmental modulation and non-additive effects must be considered in causal models.[136]
Adaptation and Selection Pressures
Human populations have encountered diverse environmental challenges since migrating out of Africa approximately 60,000–100,000 years ago, resulting in localized genetic adaptations driven by natural selection.[137] These adaptations are evident in genomic signatures of recent positive selection, such as allele frequency sweeps, elevated population differentiation (FST), and reduced nucleotide diversity around selected loci.[138] Selection pressures include climate, diet, altitude, and pathogens, with evidence from genome-wide scans identifying hundreds of loci under selection in the past 10,000–50,000 years.[139]
Dietary adaptations exemplify rapid evolutionary responses to cultural practices. Lactase persistence, enabling adult digestion of milk, arose independently in pastoralist populations through mutations in the MCM6 enhancer of the LCT gene, with the European variant dated to around 7,500 years ago, coinciding with the spread of dairy farming.[140] Similar alleles occur in East African and Middle Eastern groups, reflecting convergent selection for caloric exploitation in herding societies.[137]
Climatic pressures have shaped pigmentation and thermoregulation. Lighter skin in northern latitudes, facilitated by alleles in SLC24A5 and SLC45A2, enhances vitamin D synthesis under low UV exposure, with selection signals strongest in Europeans dated to 10,000–20,000 years ago.[138] In East Asians, variants in EDAR influence straight hair, shovel-shaped incisors, and increased sweat glands, likely adapting to cold, dry environments via altered ectodermal development.[137]
High-altitude hypoxia selected for specialized oxygen-handling genes. Tibetans carry an EPAS1 haplotype inherited from Denisovans, reducing hemoglobin overproduction and dated to 3,000–5,000 years ago, while Andeans evolved distinct EGLN1 mutations for similar physiological benefits.[141] These independent adaptations show that populations responded to low-oxygen stress through different genetic routes rather than through the same variants.[140]
Pathogen exposure drove resistance alleles via balancing or positive selection. The HBB sickle-cell variant (rs334) persists at 10–20% frequency in malaria-endemic African regions, conferring heterozygote protection against Plasmodium falciparum, with selection estimates of 1–15% fitness advantage.[137] Duffy-null FY*0 alleles in West Africans block Plasmodium vivax entry, nearly fixing under selection, while G6PD deficiencies provide broad malaria resistance across Africa, the Mediterranean, and Asia.[138] The European CCR5-Δ32 deletion, at 5–15% frequency, likely selected by smallpox or plague, incidentally confers HIV resistance.[139]
Ongoing selection persists despite modern interventions, with scans detecting signals for height, immune response, and reproduction in contemporary populations, though weakened by medicine and migration.[142] These adaptations underscore how genetic variation enables survival in varied niches, with incomplete sweeps reflecting standing variation and gene flow.[143]
Health and Disease Implications
Monogenic Disorders
Monogenic disorders arise from pathogenic variants in a single gene, leading to disrupted protein function or expression with typically high penetrance and adherence to Mendelian inheritance patterns.[144] Unlike polygenic conditions, these disorders often manifest predictably based on the variant's dominance and zygosity, with prevalence generally low but varying by population due to allele frequency differences shaped by historical bottlenecks and founder effects.[145] Over 10,000 such disorders have been identified, many cataloged in resources like OMIM, though only a subset exceed a population frequency of 1:20,000.[145]
Inheritance modes include autosomal dominant, where a single heterozygous variant suffices (e.g., Huntington's disease via HTT CAG repeat expansion, affecting ~5-10 per 100,000 in Western populations); autosomal recessive, requiring biallelic variants (e.g., cystic fibrosis from CFTR mutations, with carrier rates up to 1:25 in Europeans); and X-linked, often recessive in males (e.g., Duchenne muscular dystrophy via DMD deletions).[146] Sickle cell anemia, caused by a homozygous HBB Glu6Val substitution, exemplifies recessive inheritance with heterozygote advantage against malaria, yielding carrier frequencies of 10-40% in sub-Saharan African-descended groups.[147] Tay-Sachs disease, due to HEXA variants impairing ganglioside degradation, shows elevated incidence (1:3,600 births) among Ashkenazi Jews from founder mutations like the 1278insTATC, tracing to medieval population bottlenecks.[148] These patterns underscore how genetic variation, particularly rare loss-of-function alleles, concentrates in isolated groups, amplifying disease risk without invoking selection for heterozygote benefits in all cases.[149]
Diagnosis relies on sequencing the candidate gene or exome, with newborn screening programs detecting conditions like phenylketonuria (PAH variants) in all 50 U.S. states since the 1960s, preventing intellectual disability via dietary intervention.[150] Treatments have historically managed symptoms, but advances in gene therapy target root causes: ex vivo editing of the BCL11A erythroid enhancer in hematopoietic stem cells reactivated fetal hemoglobin in sickle cell trials, yielding FDA-approved Casgevy (exagamglogene autotemcel) in December 2023 for severe cases.[151] CRISPR-based base editing shows promise for precise correction in disorders like cystic fibrosis, with preclinical models restoring CFTR function in airway epithelia as of 2024, though delivery challenges and off-target risks persist.[152] Population-specific variant spectra necessitate tailored screening, as European-biased databases may underrepresent non-European alleles, affecting global equity in precision medicine.[153]

| Disorder | Gene | Inheritance | Key Variant Example | Population Prevalence Notes |
|---|---|---|---|---|
| Cystic Fibrosis | CFTR | Autosomal Recessive | ΔF508 deletion | 1:2,500-3,500 in Europeans; lower elsewhere[148] |
| Sickle Cell Anemia | HBB | Autosomal Recessive | Glu6Val (rs334) | 1:365 births in African Americans; heterozygote advantage in malaria zones[147] |
| Tay-Sachs Disease | HEXA | Autosomal Recessive | 1278insTATC | 1:3,600 in Ashkenazi Jews due to founder effect[149] |
| Huntington's Disease | HTT | Autosomal Dominant | CAG repeat >36 | 5-10:100,000 in Western populations[150] |
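
The carrier rates and birth incidences quoted for the recessive disorders above can be related through Hardy-Weinberg proportions. The sketch below is a rough consistency check only: it assumes random mating, a single pooled disease allele of frequency q, and full penetrance, none of which hold exactly in real populations. The function names are illustrative.

```python
import math


def recessive_incidence_from_carrier_freq(carrier_freq):
    """Expected affected fraction (q^2) given the carrier fraction 2q(1 - q)."""
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("carrier frequency must lie between 0 and 0.5")
    # Solve 2q(1 - q) = carrier_freq for the smaller root q.
    q = (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0
    return q * q


def carrier_freq_from_recessive_incidence(incidence):
    """Expected carrier fraction 2q(1 - q) given the affected fraction q^2."""
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must lie between 0 and 1")
    q = math.sqrt(incidence)
    return 2.0 * q * (1.0 - q)


# Cystic fibrosis: carrier rate of about 1 in 25 (figure from the text).
cf_incidence = recessive_incidence_from_carrier_freq(1 / 25)
print(f"Expected CF incidence: 1 in {1 / cf_incidence:.0f}")

# Tay-Sachs: incidence of about 1 in 3,600 births (figure from the text).
tay_sachs_carriers = carrier_freq_from_recessive_incidence(1 / 3600)
print(f"Expected Tay-Sachs carrier rate: 1 in {1 / tay_sachs_carriers:.1f}")
```

Under these assumptions, a 1:25 carrier rate implies roughly one affected birth in 2,400, close to the upper end of the cystic fibrosis range in the table, and a 1:3,600 incidence implies roughly one carrier in 30. Deviations from reported figures are expected where populations are admixed or mating is non-random.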
Polygenic Risks and GWAS
Genome-wide association studies (GWAS) systematically scan the genomes of large cohorts to identify single nucleotide polymorphisms (SNPs) associated with complex traits and diseases by comparing allele frequencies between cases and controls.[154] These studies have identified thousands of loci contributing to polygenic traits, explaining a portion of heritability for conditions such as type 2 diabetes, coronary artery disease, and schizophrenia, with effect sizes typically small per variant.[155] Since the first major GWAS in 2007, sample sizes have expanded to millions, enhancing statistical power and enabling discovery of variants with subtler effects, as demonstrated in meta-analyses aggregating data across biobanks like UK Biobank.[156] However, associations reflect correlation rather than direct causation, necessitating functional validation through methods like colocalization with expression quantitative trait loci.[157]
Polygenic risk scores (PRS) aggregate the weighted effects of GWAS-identified variants to estimate an individual's genetic liability for a trait, often improving risk prediction beyond single loci.[158] For instance, PRS for breast cancer, derived from over 300 loci, can stratify lifetime risk, with high-score individuals facing up to threefold elevated odds compared to low-score counterparts in validation cohorts.[159] In clinical contexts, PRS augment traditional factors like family history for diseases including prostate cancer and atrial fibrillation, though standalone discriminative accuracy remains modest, with area under the curve values around 0.6-0.7 for many traits.[154] Recent advances, such as multi-ancestry GWAS incorporating diverse populations, aim to mitigate biases from European-centric training data, which currently limit PRS portability.[160]
Despite progress, PRS face challenges including population stratification, where unaccounted ancestry differences inflate false associations, and linkage disequilibrium heterogeneity across groups, reducing predictive accuracy in non-European ancestries by 20-50% for traits like height or educational attainment.[161][162] For example, European-derived PRS explain a markedly smaller share of schizophrenia liability variance in African-ancestry samples than the roughly 7-10% they explain in Europeans, highlighting the need for ancestry-specific models.[163] Environmental interactions and missing heritability from rare variants or structural changes further constrain utility, with GWAS capturing less than half of estimated SNP heritability for most complex diseases.[164] Ongoing efforts, including deep learning integrations and pathway-enriched scores, seek to enhance resolution and generalizability as of 2024.[165][166]
Population-Specific Medical Outcomes
Human genetic variation manifests in population-specific medical outcomes through differences in allele frequencies for disease-associated variants and pharmacogenomic loci, influencing disease susceptibility, severity, and therapeutic responses. For instance, certain monogenic disorders exhibit elevated prevalence in discrete ancestral groups due to founder effects and historical selection pressures, such as the hemoglobin S mutation underlying sickle cell disease, which confers heterozygote advantage against malaria and reaches carrier frequencies of approximately 1 in 13 among African Americans, resulting in disease incidence of about 1 in 365 births in this group.[167][168] Similarly, Tay-Sachs disease, caused by mutations in the HEXA gene, has a carrier rate of 1 in 27 among Ashkenazi Jews, far exceeding rates in other populations, attributable to historical bottlenecks in this group.[169][170] Isolated populations like the Amish illustrate founder effects from small founding groups, which lead to reduced genetic variation and elevated prevalence of certain monogenic disorders such as Ellis-van Creveld syndrome; sustained endogamy in such groups also makes them informative for studying how these genetic patterns affect health.[171] Genomic studies of Native Hawaiians reveal population-specific susceptibilities to conditions like type 2 diabetes and obesity, influenced by genetic factors shaped by historical isolation and environmental interactions.[172]
In pharmacogenomics, allele frequency disparities lead to varied drug efficacy and adverse event risks. The HLA-B*15:02 allele, prevalent in Han Chinese (up to 8-12%) and other Southeast Asian populations but rare in Europeans and Africans, strongly predicts carbamazepine-induced Stevens-Johnson syndrome/toxic epidermal necrolysis, with odds ratios exceeding 100 in affected cohorts; prospective screening in at-risk groups has reduced incidence by avoiding the drug in carriers.[173][174] Variants in CYP2D6, whose enzyme metabolizes drugs like codeine and tamoxifen, produce poor-metabolizer phenotypes in 5-10% of Europeans but only 0.4-1% of East Asians, potentially leading to undertreatment or toxicity; ultra-rapid metabolizers, conversely, are more common in some Ethiopian and Middle Eastern groups, risking overdose from standard doses.[175][176]
For complex traits, APOL1 high-risk variants (G1 and G2) are nearly exclusive to individuals of recent African ancestry, with two-copy carriers comprising 13-15% of African Americans and explaining up to 70% of the excess risk for nondiabetic chronic kidney disease in this population compared to Europeans, via mechanisms like podocyte toxicity and inflammatory dysregulation.[177][178] Understanding how founder effects and drift have shaped variation in isolated populations, such as the Amish or Native Hawaiians, informs personalized medicine by enabling targeted screening and interventions for prevalent conditions, enhancing public health strategies in these communities.[179]
These patterns underscore that while genetic variation is predominantly within-population (over 85-90%), systematic between-group differences in actionable variants necessitate ancestry-informed clinical strategies, as evidenced by guidelines from bodies like the Clinical Pharmacogenetics Implementation Consortium recommending preemptive genotyping for high-risk ancestries.[180] Such approaches have improved outcomes, though implementation lags because access to testing remains unequal.[181]
Intergroup Differences
Between-Population Variation
Between-population genetic variation in humans reflects historical demographic processes including serial founder effects during migrations out of Africa, genetic drift in isolated groups, local adaptation to environmental pressures, and subsequent gene flow through admixture. Major continental populations (such as those of sub-Saharan Africa, Europe, East Asia, and the Americas) display differentiated allele frequency distributions at thousands of loci across the genome.[69] These differences accumulate due to reduced gene flow across geographic barriers, with sub-Saharan African populations retaining the highest overall diversity as the source of modern human ancestry.[182]
The fixation index (FST), which quantifies the proportion of genetic variance attributable to differences between populations, averages 0.10 to 0.15 for pairwise comparisons between continental groups.[77] For instance, FST between East Asians and Europeans is approximately 0.10, while values involving sub-Saharan Africans are higher, around 0.15 to 0.19, consistent with greater divergence times and the out-of-Africa bottleneck that reduced non-African diversity.[77] Human maximum pairwise FST values reach approximately 0.25 (e.g., between African and Oceanian populations), lower than in some mammal subspecies like tigers (FST ~0.20–0.30) or chimpanzees (FST ~0.20–0.40). This is attributed to the recent divergence time (roughly 200,000–300,000 years), ongoing gene flow, and clinal rather than discrete population splits during the Out-of-Africa expansion and subsequent migrations; this overlapping variation precludes formal subspecies recognition in humans.[184] Approximately 12% of total human genetic variation occurs between continental populations, with the remainder partitioned within populations or subpopulations.[77]
Analyses of genome-wide data, such as principal component analysis (PCA), reveal clusters aligning with continental ancestries, where the first few principal components capture over 90% of between-group variance and separate individuals by geographic origin with minimal misclassification.[28] Genetic ancestry inference using ancestry-informative markers achieves accuracies exceeding 95% for assigning individuals to broad continental categories, enabling forensic and medical applications.[185] These patterns follow isolation by distance, with genetic dissimilarity increasing predictably with physical separation, though punctuated by admixture events like those introducing Neanderthal DNA to non-Africans or Denisovan DNA to Oceanians.[186]
Despite comprising only 10-15% of total variation, between-population differences are structured and functionally significant, influencing allele frequencies for traits under selection, such as lactase persistence in Europeans or high-altitude adaptation in Tibetans.[187] Population-specific variant frequencies also contribute to differential disease risks, underscoring the biological reality of group-level genetic distinctions.[188]
Evidence for Genetic Contributions to Traits
Twin studies and meta-analyses provide robust evidence that genetic factors substantially influence variation in human traits. A 2015 meta-analysis aggregating data from 2,748 twin studies encompassing 14.5 million twin pairs estimated the broad-sense heritability—the proportion of phenotypic variance attributable to genetic differences—at 49% across 17,804 traits, ranging from physical characteristics to behavioral and cognitive measures.[189] Narrow-sense heritability, reflecting additive genetic effects, is similarly high for many traits; for example, height exhibits heritability estimates of 80% or more in adulthood, corroborated by family and adoption designs that control for shared environments.[190] These estimates hold across diverse populations and increase with age for cognitive traits, from 20% in infancy to 80% in later adulthood, indicating developmental stabilization of genetic influences.[191] Genome-wide association studies (GWAS) offer molecular corroboration by linking specific single-nucleotide polymorphisms (SNPs) to trait variance. 
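The twin-based estimates above come from comparing how similar identical (monozygotic, MZ) and fraternal (dizygotic, DZ) twins are for a trait. A minimal sketch of the classical Falconer approximation follows. It assumes additive genetic effects and equal shared environments for both twin types. The function name and the correlations are hypothetical illustrations, not values from the cited meta-analysis, which may rely on more elaborate variance-component models.

```python
def falconer_components(r_mz: float, r_dz: float) -> dict:
    """Approximate variance components from twin correlations (Falconer's formula).

    r_mz: trait correlation between monozygotic twins
    r_dz: trait correlation between dizygotic twins
    """
    heritability = 2 * (r_mz - r_dz)   # additive genetic share (h^2)
    shared_env = 2 * r_dz - r_mz       # shared (family) environment share (c^2)
    unique_env = 1 - r_mz              # non-shared environment plus measurement error (e^2)
    return {"h2": heritability, "c2": shared_env, "e2": unique_env}


# Hypothetical correlations chosen only to illustrate the arithmetic.
example = falconer_components(r_mz=0.75, r_dz=0.5)
print(example)
```

With these made-up inputs, half of the trait variance is attributed to additive genetic effects and a quarter each to shared and non-shared environment. The three components sum to one by construction, and the estimate is only as good as the equal-environments assumption behind it.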
For complex traits, GWAS have identified thousands of loci; in height, over 700 variants explain approximately 40% of heritability, demonstrating polygenicity where many small-effect alleles cumulatively drive differences.[192] Polygenic scores (PGS), aggregating these variants' effects, predict 10-20% of variance in traits like body mass index and educational attainment in independent cohorts, with predictive power validated in within-family designs that minimize confounding from population stratification or assortative mating.[193] For intelligence, recent GWAS meta-analyses of over 3 million individuals have pinpointed loci explaining up to 10% of variance via PGS, aligning with twin-based heritability and rejecting purely environmental causation.[194]
Evidence extends to between-population comparisons, where PGS derived from European-ancestry GWAS predict trait differences in non-European groups, albeit with reduced accuracy due to allele frequency variation and linkage disequilibrium differences.[195] For educational attainment, PGS account for systematic mean differences across continental ancestries that exceed what shared environmental models predict, as seen in admixture studies where genetic ancestry correlates with cognitive outcomes independent of socioeconomic status.[196] Such patterns, observed in traits under historical selection like lactase persistence or pigmentation, underscore causal genetic roles in intergroup phenotypic disparities, though environmental interactions and ascertainment biases in GWAS warrant ongoing scrutiny.[197] Adoption and transnational studies further support this by showing persistent trait gaps tied to biological origins rather than rearing environments.[198]
Intelligence, Behavior, and Physical Differences
Human genetic variation contributes substantially to individual differences in intelligence, with twin studies and meta-analyses estimating heritability at around 50% in adulthood, increasing from lower values in childhood due to gene-environment interactions.[194][193] Genome-wide association studies (GWAS) have identified over 1,000 genetic loci associated with intelligence, confirming its polygenic architecture where thousands of common variants each exert small effects, collectively accounting for 10-20% of variance in polygenic scores within European-ancestry populations.[191][199] These scores predict cognitive performance across independent samples, with differences between deciles corresponding to 10-15 IQ points, underscoring causal genetic influences beyond shared environment.[196]
Evidence for genetic contributions to between-population differences in intelligence remains contentious but supported by polygenic scores for cognitive traits, which vary systematically across continental ancestries and correlate with observed mean IQ disparities after controlling for socioeconomic factors in some analyses.[200] For instance, higher average polygenic scores for educational attainment, a proxy for intelligence, align with elevated performance in East Asian and Ashkenazi Jewish groups relative to others, though direct causation is complicated by historical selection pressures and potential GWAS biases toward European samples.[201] Critics argue environmental confounders dominate, yet the persistence of gaps despite convergence in living standards in adopted or immigrant cohorts suggests a partial genetic role.[202]
Behavioral traits, including personality, exhibit moderate to high heritability, with meta-analyses of twin studies placing estimates for the Big Five traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) at 40-60%, reflecting additive genetic effects on temperament and impulsivity.[203][204] GWAS for these traits have pinpointed loci influencing neurotransmitter pathways, such as serotonin and dopamine systems, explaining up to 10% of variance in polygenic predictions, with overlaps to psychiatric risks like anxiety.[205] Population-level variations, such as higher conscientiousness-linked alleles in certain groups, may underlie cultural differences in achievement orientation, though direct cross-group comparisons are limited by sampling.
Physical differences mediated by genetics include height, where heritability reaches 80% in well-nourished populations, with GWAS identifying over 700 variants explaining 40% of variance and contributing to 10-15 cm mean disparities between Northern Europeans and other groups due to selection on growth-related genes.[206] Body composition and athletic aptitude also show genetic clustering: West African-descended populations have higher frequencies of ACTN3 R-allele variants promoting fast-twitch muscle fibers, correlating with dominance in sprint events (e.g., 100m Olympic medals since 1968 disproportionately from this ancestry), while East African highlanders exhibit enrichments in endurance genes like those for oxygen efficiency.[207] These patterns persist across environments, indicating adaptation to ancestral ecologies rather than solely training or culture.[208]
Controversies and Debates
Race as a Biological Proxy
Population genetic analyses reveal structured variation in human genomes that aligns with continental-scale ancestry groups, for which traditional racial categories serve as imperfect but informative proxies. Using methods like principal component analysis (PCA) on thousands of single nucleotide polymorphisms (SNPs), individuals cluster into distinct groups corresponding to African, European, East Asian, Native American, and Oceanian ancestries, reflecting historical isolation and migration patterns.[69][209] These clusters capture 3-5% of total genetic variation between major groups, with the remainder within populations, yet the structured differences enable reliable inference of biogeographical origins from genomic data.[210]
Model-based clustering algorithms, such as STRUCTURE, applied to microsatellite loci across 52 global populations, infer K=5-6 ancestry components that match continental divisions, with admixture in intermediate regions but clear modal assignments for most individuals.[69] Ancestry informative markers (AIMs), which are SNPs with highly differentiated allele frequencies between groups, allow prediction of continental ancestry with over 99% accuracy using as few as 200-300 markers, demonstrating race's utility as a proxy despite clinal gradients and recent admixture.[209] For example, panels of AIMs distinguish African from non-African ancestry effectively, aiding forensic identification and population stratification correction in genome-wide association studies (GWAS).[211]
In medicine, self-reported race correlates sufficiently with genetic ancestry to proxy differences in allele frequencies relevant to drug metabolism and disease risk. Variants in genes like CYP2C9 and VKORC1, which influence warfarin dosing, show frequency gradients aligning with racial groups, justifying race-based guidelines while finer ancestry estimates refine predictions.[212] Similarly, pharmacogenomic responses to drugs like codeine vary by ancestry due to CYP2D6 polymorphisms, where European-ancestry individuals have higher poor-metabolizer rates (5-10%) compared to African-ancestry individuals (1-2%).[212] Although critics highlight admixture's imprecision, empirical validation shows self-reported race predicts AIM-inferred ancestry with 80-95% concordance for broad categories, outperforming null models in clinical utility.[213]
The fixation index (FST), quantifying differentiation, yields values of 0.10-0.15 between continental populations, indicating moderate genetic divergence comparable to recognized subspecies in other species, though human variation's continuity tempers strict taxonomic application.[21] This structure arises from serial founder effects during Out-of-Africa migrations, amplifying drift and selection differences, as evidenced by higher FST for non-neutral loci under local adaptation.[21] While institutional sources often emphasize within-group variation to downplay racial differences, potentially influenced by ideological priors against hierarchy, the data substantiate race as a biologically grounded heuristic for accessing between-population genetic realities in applied contexts like epidemiology and personalized medicine.[214]
Environmental vs. Genetic Explanations
Twin and family studies consistently estimate the heritability of intelligence within populations at 50-80%, indicating substantial genetic influence on individual differences, with meta-analyses of over 14 million twin pairs across thousands of traits confirming broad heritability for cognitive abilities around 50%.[189][199] These estimates rise with age, from approximately 20-40% in childhood to 70-80% in adulthood, as environmental influences equalize while genetic effects amplify through gene-environment correlations. High within-group heritability implies that between-group differences cannot be dismissed as purely environmental without evidence of systematically divergent causal pathways, yet post-World War II scholarship often prioritized nurture-based explanations to counter eugenics associations, sometimes overlooking data favoring partial genetic causation.[200]
Transracial adoption studies provide direct tests by isolating children from disparate genetic backgrounds in similar rearing environments. The Minnesota Transracial Adoption Study (1976-1992 follow-ups) found black children adopted by upper-middle-class white families had average IQs of 89 at age 17, compared to 106 for white adoptees and 99 for mixed-race adoptees, with gaps persisting or widening despite equivalent socioeconomic advantages and no evidence of prenatal or early-life deprivation explaining the disparity.[215][216] Similar patterns emerge in other datasets, such as Korean adoptees in the U.S. achieving IQs near population norms but black adoptees lagging, a contrast cited against claims that cultural or nutritional deficits alone account for the 15-point black-white gaps observed globally.[217] Environmental interventions like improved nutrition and education have narrowed some gaps historically via the Flynn effect (3-5 IQ points per decade in developing populations), but residual differences endure after controlling for socioeconomic status, parenting quality, and lead exposure, undermining purely nurture-based models.[218]
Genome-wide association studies (GWAS) and polygenic scores (PGS) offer molecular evidence, predicting 7-11% of intelligence variance within Europeans and correlating with cognitive traits across ancestries, with allele frequencies for educational attainment PGS aligning with observed national IQ differences (e.g., higher scores in East Asians vs. Europeans vs. Africans).[196][219] Admixture studies, examining individuals with varying ancestral proportions, show IQ correlating with European genetic ancestry in African Americans (0.2-0.3 standard deviation per 10% increase), independent of skin color or self-identification proxies for discrimination.[220] Comprehensive reviews, synthesizing adoption, regression, and genetic data, estimate 50-80% of U.S. black-white IQ variance as heritable, with environmental factors like stereotype threat or test bias failing replication in controlled designs.[221][220]
Critiques of environmental primacy highlight its reliance on post-hoc correlations rather than causal mechanisms; for instance, equalizing school quality or income explains less than 10% of gaps, and cross-national data show sub-Saharan African IQs averaging 70-80 despite aid-driven development since the 1960s.[218] Anonymous surveys of intelligence researchers indicate 50% or more attribute half or greater of group differences to genetics, though public acknowledgment is rare due to institutional pressures favoring egalitarian priors over empirical patterns.[200] For physical traits, genetic-environmental partitioning is clearer (e.g., East Asian lactose intolerance stems from LP allele absence rather than dairy access), but complex behaviors like impulsivity or educational outcomes follow similar logic, with PGS and heritability converging on multifactorial causation where genes predominate in stable environments.[222] This evidence supports causal realism: environments modulate expression, but population-level genetic variation, shaped by migration and selection, underpins enduring trait disparities absent uniform global conditions.
Suppression of Research and Ideological Biases
In the field of human genetic variation, investigations into potential genetic contributions to between-population differences in complex traits such as intelligence have encountered substantial resistance, often framed as a necessary safeguard against historical abuses like eugenics but resulting in self-censorship and institutional disincentives. Surveys of U.S. psychology professors reveal high levels of self-censorship on topics involving genetic or evolutionary explanations for group differences, with the strongest taboos surrounding research on genetic factors in IQ disparities across racial or ethnic groups.[223] This reluctance stems from ideological commitments prioritizing environmental explanations and egalitarian outcomes over empirical exploration, despite genomic data indicating that 10-15% of human genetic variation occurs between continental populations.[200]
Prominent cases illustrate the consequences of challenging these norms. In 2007, Nobel laureate James Watson, co-discoverer of DNA's structure, suggested in interviews that genetic factors might underlie observed IQ differences between sub-Saharan Africans and Europeans, prompting the cancellation of speaking engagements, professional ostracism, and, in 2019, the revocation of his honorary titles by Cold Spring Harbor Laboratory.[224][225] Watson's remarks, while speculative, highlighted a broader pattern where hypotheses of genetic group differences trigger sanctions disproportionate to scientific merit, as evidenced by peer commentary noting the "inconvenient truth" of average IQ gaps persisting across environments.[226]
The 1994 publication of The Bell Curve by Richard Herrnstein and Charles Murray further exemplifies backlash, with the book's analysis of IQ heritability and racial patterns eliciting widespread condemnation, media campaigns against its authors, and a chilling effect on subsequent funding and publication for similar inquiries.[227] Critics, often from ideologically aligned academic circles, emphasized environmental causation without refuting the data on within-group heritability (around 50-80% for IQ), yet the controversy reinforced norms against pursuing genetic hypotheses for between-group outcomes.[228]
More recently, geneticist David Reich's 2018 New York Times op-ed argued that ancient DNA studies reveal biological ancestry clusters aligning with traditional racial categories and that ignoring average genetic differences risks obscuring medically relevant variation. This prompted rebukes from colleagues, including statements from genetics associations decrying race as a biological construct, despite Reich's evidence from admixture and population structure analyses showing structured genetic divergence.[229][230] Such responses underscore an institutional bias where acknowledging heritable group differences is equated with endorsing inequality, potentially hindering advances in precision medicine and trait polygenics.[231] Proponents of open inquiry contend that this suppression, while motivated by anti-racist intent, distorts scientific priorities away from causal realism toward ideological conformity, as seen in the rarity of grants exploring polygenic scores across ancestries despite their predictive power within groups.[200]
Recent Developments
Pangenome Initiatives
The Human Pangenome Reference Consortium (HPRC), established in 2019 with funding from the National Human Genome Research Institute, coordinates efforts to produce at least 350 high-quality, phased diploid genome assemblies representing diverse human ancestries. It uses a graph-based structure to model genomic variation more comprehensively than the linear GRCh38 reference, which draws most of its sequence from a small number of donors.[232][233] This approach accommodates insertions, deletions, and structural variants that single-reference mappings often miss or misalign, particularly in non-European populations where alignment error rates can exceed 20% for certain loci.[234]
In May 2023, the HPRC released its first draft pangenome, comprising 47 phased diploid assemblies (94 haplotypes) from individuals of African, admixed American, East Asian, South Asian, and other ancestries, which added about 119 million base pairs of euchromatic polymorphic sequence and 1,115 gene duplications relative to GRCh38.[235][234] These additions revealed previously undetected alleles at medically relevant sites, such as those influencing immune response genes, and using the draft pangenome to analyze short-read data reduced small-variant discovery errors by 34% compared with GRCh38-based workflows.[234][236] The graph format preserves haplotype diversity, enabling better detection of rare variants and reducing ascertainment bias in variant calling, which had historically underrepresented structural variation comprising up to 20% of human genomic differences.[234][237]
Subsequent expansions target completion of the 350-genome set by incorporating telomere-to-telomere assemblies, with interim releases enhancing tools like the HPRC Data Explorer for querying variation across populations.[238] These initiatives underscore pangenomes' utility in capturing the full spectrum of human genetic diversity, including population-specific structural elements that influence trait heritability and disease risk, thereby facilitating more equitable genomic medicine applications.[239][236] Complementary projects, such as the SEN-GENOME effort in Africa, integrate local data governance to address continental underrepresentation, aligning with global aims to model causal genetic contributions without overreliance on Eurocentric baselines.[240]
Advances in Sequencing and Assembly
The advent of long-read sequencing technologies has markedly improved the resolution of human genetic variation by enabling the traversal of repetitive genomic regions that short-read methods, dominant since the early 2010s, often failed to resolve. Technologies such as Pacific Biosciences' (PacBio) Single Molecule Real-Time (SMRT) sequencing with High-Fidelity (HiFi) reads and Oxford Nanopore Technologies' (ONT) ultra-long reads, which emerged prominently around 2019, produce reads ranging from about 10 to more than 100 kilobases, facilitating the detection of structural variants (SVs) like insertions, deletions, and inversions that constitute a substantial portion of human interindividual differences. These approaches address limitations in next-generation short-read sequencing, where read lengths under 300 base pairs fragmented assemblies and obscured complex variants, thereby underestimating variation by up to 20-30% in repetitive loci.[241][242][243]
Advancements in de novo genome assembly pipelines have paralleled these sequencing innovations, with tools like hifiasm (designed for PacBio HiFi reads) and the Shasta assembler (designed for ONT reads) achieving contig N50 lengths over 50 megabases in human genomes, compared to under 1 megabase in short-read assemblies. Hybrid strategies combining long reads for scaffolding with short reads for error correction further enhance accuracy, reducing indel error rates to below 0.1% in HiFi-based assemblies.
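Contig N50, the continuity metric quoted above, is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation follows; the function name and both toy assemblies are hypothetical and are not drawn from any real dataset.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in base pairs)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig pushes the cumulative length past half the assembly.
            return length


# Hypothetical toy assemblies: many short contigs versus a few long ones.
fragmented = [900_000, 800_000, 700_000, 600_000, 500_000, 400_000, 100_000]
contiguous = [60_000_000, 55_000_000, 20_000_000, 5_000_000]

print(n50(fragmented))  # 700000
print(n50(contiguous))  # 55000000
```

N50 rewards long contigs but says nothing about correctness, so in practice it is read alongside error-rate and completeness measures.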
A pivotal milestone occurred in 2022 with the Telomere-to-Telomere (T2T) Consortium's assembly of the CHM13 human cell line, yielding the first gapless, end-to-end reference genome spanning 3.055 billion base pairs, which incorporated centromeric and telomeric sequences previously unresolvable.[242][244]
By 2025, these technologies enabled haplotype-resolved assemblies of 130 haplotypes from 65 diverse human genomes, achieving median continuity of 130 megabases and closing 92% of gaps in prior references, thus revealing novel SVs and copy-number variants contributing to population-specific variation. Such assemblies improve variant calling precision, particularly for phased haplotypes that distinguish cis-trans effects in polygenic traits, and support the identification of rare variants missed in array-based or short-read genotyping, which typically capture only 80-90% of common SNPs. These developments underscore long-read sequencing's superiority for capturing the full spectrum of human genetic diversity, including non-SNP variation estimated at 10-20% of total differences between individuals.[243][241]
Synthetic Genomics and Editing
Synthetic genomics encompasses the design and chemical synthesis of entire genomes or large chromosomal segments, enabling the recreation or modification of genetic sequences to probe biological function. In the context of human genetic variation, this approach allows for the construction of synthetic chromosomes incorporating specific allelic variants, facilitating causal inference about their phenotypic effects beyond correlative studies. A landmark effort, the Synthetic Human Genome (SynHG) project, launched in June 2025 with £10 million funding from Wellcome, aims to develop scalable DNA synthesis tools capable of assembling human-scale genomes, potentially revolutionizing the study of variant-driven traits by enabling de novo creation of diverse genomic backgrounds.[245][246] This builds on earlier synthetic biology milestones, such as the 2010 creation of a synthetic bacterial genome by J. Craig Venter's team, which demonstrated viability of chemically synthesized DNA in living cells, though human applications remain preclinical due to ethical and technical barriers.[247]
Genome editing technologies, particularly CRISPR-Cas systems, complement synthetic genomics by enabling precise alterations to existing human genetic variants in cellular models, organoids, or in vivo. CRISPR-Cas9, adapted from bacterial immune mechanisms, was first demonstrated as a programmable nuclease in 2012 and applied to eukaryotic cells in 2013. It targets specific DNA sequences for cleavage and repair, allowing introduction or correction of single-nucleotide variants (SNVs) or insertions/deletions (indels) that underlie much of human variation. By 2025, over 50 clinical trials had tested CRISPR-based therapies, including editing of the BCL11A gene for sickle cell disease, which achieved durable fetal hemoglobin production in patients without severe off-target effects in initial cohorts. Advanced variants like prime editing, which avoids double-strand breaks to minimize unintended structural variations, entered first-in-human trials in May 2025 for personalized correction of rare mutations, demonstrating feasibility for tailoring edits to individual genetic profiles.[151][248][249]
These tools have elucidated causal roles of variants in human traits by editing isogenic cell lines differing only at loci of interest, revealing, for instance, how regulatory variants influence gene expression levels across populations. In functional genomics assays, multiplexed CRISPR screens have quantified the effects of thousands of SNVs simultaneously, identifying those with strong causal impacts on cellular phenotypes like immune response or metabolism, which correlate with population-level variation. However, challenges persist: off-target edits and unintended genomic rearrangements, observed in up to 10-20% of CRISPR applications in human cells, underscore the need for improved specificity, as evidenced by structural variation risks in long-read sequencing analyses of edited genomes. Synthetic approaches also raise concerns about scalability, with current synthesis limited to megabase-scale segments, far short of the 3-gigabase human genome. Despite these hurdles, integration with pangenome references enhances variant prioritization for editing, promising deeper insights into adaptive versus deleterious variation.[250][251][252]
References
- https://www.biorxiv.org/content/10.1101/2020.09.28.317552v1.full.pdf