Haplogroup
from Wikipedia

A haplotype is a group of alleles in an organism that are inherited together from a single parent,[1][2] and a haplogroup (haploid from the Greek: ἁπλοῦς, haploûs, "onefold, simple" and English: group) is a group of similar haplotypes that share a common ancestor, identified by a single-nucleotide polymorphism mutation.[3] More specifically, a haplotype is a combination of alleles at different chromosomal regions that are closely linked and tend to be inherited together. Because a haplogroup consists of similar haplotypes, it is usually possible to predict a haplogroup from haplotypes. Haplogroups pertain to a single line of descent, so an individual's membership of a haplogroup depends on only a relatively small proportion of that individual's genetic material.

Y-DNA haplogroups map of the world

Each haplogroup originates from, and remains part of, a single preceding haplogroup (or paragroup). As such, any related group of haplogroups may be precisely modelled as a nested hierarchy, in which each set (haplogroup) is also a subset of a single broader set (as opposed to biparental models, such as human family trees). Haplogroups can be further divided into subclades.

Haplogroups are normally identified by an initial capital letter, and refinements add further number and letter combinations, for example A → A1 → A1a. The alphabetical nomenclature was published in 2002 by the Y Chromosome Consortium.[4]
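Because each refinement simply appends a digit run or a letter, a label can be unpacked into the chain of broader groups it belongs to. The following is a minimal sketch, assuming labels consist of a capital-letter root followed by alternating digit runs and single lowercase letters (it deliberately rejects forms such as L2a1a2'3'4 or L3b(16124!)); the helper name ancestor_chain is hypothetical.

```python
import re


def ancestor_chain(name: str) -> list[str]:
    """List the clade names implied by a haplogroup label, root first.

    Assumes a capital-letter root followed by alternating runs of digits and
    single lowercase letters, e.g. "A1a" -> ["A", "A1", "A1a"].
    """
    core = name.rstrip("*")  # a trailing "*" marks a paragroup; ignore it
    match = re.fullmatch(r"([A-Z]+)((?:\d+|[a-z])*)", core)
    if match is None:
        raise ValueError(f"unsupported haplogroup label: {name!r}")
    root, rest = match.groups()
    chain = [root]
    # Each digit run or lowercase letter adds one level of refinement.
    for token in re.findall(r"\d+|[a-z]", rest):
        chain.append(chain[-1] + token)
    return chain


print(ancestor_chain("A1a"))    # ['A', 'A1', 'A1a']
print(ancestor_chain("R1b1a"))  # ['R', 'R1', 'R1b', 'R1b1', 'R1b1a']
```

The chain reflects the label only, not necessarily the current phylogeny: in mtDNA, for example, the letter H does not encode that H sits under HV and R0.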

In human genetics, the haplogroups most commonly studied are Y-chromosome (Y-DNA) haplogroups and mitochondrial DNA (mtDNA) haplogroups, each of which can be used to define genetic populations. Y-DNA is passed solely along the patrilineal line, from father to son, while mtDNA is passed down the matrilineal line, from mother to offspring of both sexes. Mitochondrial DNA does not recombine, and the Y chromosome recombines only at its tips, so both change only by chance mutation from generation to generation, with no intermixture between the parents' genetic material.

Haplogroup formation

Diagram legend: ancestral haplogroup; haplogroup A (Hg A); haplogroup B (Hg B).
All of these molecules are part of the ancestral haplogroup, but at some point in the past a mutation occurred in the ancestral molecule, mutation A, which produced a new lineage; this is haplogroup A and is defined by mutation A. At some more recent point in the past, a new mutation, mutation B, happened in a person carrying haplogroup A; mutation B defined haplogroup B. Haplogroup B is a subgroup, or subclade of haplogroup A; both haplogroups A and B are subclades of the ancestral haplogroup.
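The relationship in the diagram can be expressed as a small data structure: each haplogroup records its parent and the mutation that founded it, and a sample is assigned to the most derived haplogroup whose full chain of founding mutations it carries. This is a minimal sketch; the TREE dictionary and the function names are hypothetical, and real classifiers must also cope with missing calls and recurrent mutations.

```python
# Hypothetical toy tree mirroring the diagram: haplogroup -> (parent, founding mutation).
TREE = {
    "ancestral": (None, None),  # root: no defining mutation
    "A": ("ancestral", "mutation A"),
    "B": ("A", "mutation B"),
}


def defining_path(haplogroup):
    """Collect every founding mutation on the path from the root to `haplogroup`."""
    mutations = set()
    node = haplogroup
    while node is not None:
        parent, mutation = TREE[node]
        if mutation is not None:
            mutations.add(mutation)
        node = parent
    return mutations


def assign(observed):
    """Return the most derived haplogroup whose whole mutation path is observed."""
    best, best_depth = "ancestral", 0
    for group in TREE:
        path = defining_path(group)
        if path <= observed and len(path) > best_depth:
            best, best_depth = group, len(path)
    return best


print(assign({"mutation A"}))                # A
print(assign({"mutation A", "mutation B"}))  # B
```

A sample carrying mutation B without mutation A would fall back to the ancestral group here, which is one simple way of refusing to trust a path that is only partly observed.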

Mitochondria are small organelles that lie in the cytoplasm of eukaryotic cells, such as those of humans. Their primary function is to provide energy to the cell. Mitochondria are thought to be reduced descendants of symbiotic bacteria that were once free-living. One indication that mitochondria were once free-living is that each contains a circular DNA, called mitochondrial DNA (mtDNA), whose structure is more similar to that of bacteria than to that of eukaryotic organisms (see endosymbiotic theory). The overwhelming majority of a human's DNA is contained in the chromosomes in the cell nucleus, but mtDNA is an exception. An individual inherits their cytoplasm, and the organelles it contains, exclusively from the maternal ovum (egg cell); sperm pass on only chromosomal DNA, and any paternal mitochondria are digested in the oocyte. When a mutation arises in an mtDNA molecule, it is therefore passed down in a direct female line of descent. Mutations are changes in the nitrogen bases of the DNA sequence. Single changes from the original sequence are called single nucleotide polymorphisms (SNPs).[dubious – discuss]

Human Y chromosomes are male-specific sex chromosomes; nearly all humans that possess a Y chromosome will be morphologically male. Although Y chromosomes are situated in the cell nucleus and paired with X chromosomes, they only recombine with the X chromosome at the ends of the Y chromosome; the remaining 95% of the Y chromosome does not recombine. Therefore, the Y chromosome and any mutations that arise in it are passed down in a direct male line of descent.

Other chromosomes (the autosomes, and X chromosomes when a second X chromosome is available to pair with) cross over and exchange genetic material during meiosis, the cell division that produces gametes. In effect, the genetic material from these chromosomes is reshuffled in every generation, so new mutations are passed from parents to offspring at random. Genetic linkage still applies, however: mutations that lie close together tend to be inherited together, because crossovers rarely fall between them. As a result, haplotypes can still be inferred over a few generations, but stable groupings over many generations cannot be defined.
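The article describes linkage only qualitatively. One standard way to quantify it, assumed here rather than taken from the article, is Haldane's map function, which converts a map distance d (in Morgans) into a recombination fraction r = ½(1 − e^(−2d)) under the assumption that crossovers occur independently. The sketch below also estimates how often two alleles stay together over several generations; both function names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Probability that two loci `distance_morgans` apart end up on different
    parental chromosomes in a gamete (Haldane: no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def prob_linked_after(distance_morgans: float, generations: int) -> float:
    """Chance that the allele at the second locus travels with the allele at the
    first locus for `generations` transmissions, given that the first allele is
    passed on each time and treating each meiosis as independent."""
    r = haldane_recombination_fraction(distance_morgans)
    return (1.0 - r) ** generations


for d in (0.01, 0.5):  # closely linked vs. far apart on the same chromosome
    print(d, haldane_recombination_fraction(d), prob_linked_after(d, 10))
```

Closely spaced loci (0.01 Morgans) stay together in about 90% of lineages after ten generations, whereas distant ones almost never do, which is why haplotypes can be followed over a few generations but not over many.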

The special feature that both Y chromosomes and mtDNA display is that mutations can accrue along a certain segment of both molecules and these mutations remain fixed in place on the DNA. Furthermore, the historical sequence of these mutations can also be inferred. For example, if a set of ten Y chromosomes (derived from ten different individuals) contains a mutation, A, but only five of these chromosomes contain a second mutation, B, then it is overwhelmingly likely that mutation B occurred after mutation A.
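The ten-chromosome example rests on a subset test: if every carrier of B also carries A, but not vice versa, B is the younger mutation. A minimal sketch of that reasoning, with a hypothetical function name and sample IDs; it reports every older/younger pair, not only direct parent and child.

```python
def infer_order(carriers):
    """Given {mutation: set of sample IDs carrying it}, return (older, younger)
    pairs where the younger mutation's carriers are a strict subset of the older's."""
    pairs = []
    for a, carriers_a in carriers.items():
        for b, carriers_b in carriers.items():
            if a != b and carriers_b < carriers_a:  # strict subset
                pairs.append((a, b))
    return sorted(pairs)


# Ten hypothetical Y chromosomes: all carry A, the first five also carry B.
samples = {"A": set(range(10)), "B": set(range(5))}
print(infer_order(samples))  # [('A', 'B')]
```

If two carrier sets overlapped without either containing the other, the nesting assumption would be violated, which usually points to a recurrent mutation or a genotyping error.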

Furthermore, all ten individuals who carry the chromosome with mutation A are direct male-line descendants of the same man, the first person to carry this mutation. The first man to carry mutation B was also a direct male-line descendant of this man, and is in turn the direct male-line ancestor of all men carrying mutation B. Series of mutations such as this form molecular lineages, and each mutation defines a set of specific Y chromosomes called a haplogroup.

All humans carrying mutation A form a single haplogroup, and all humans carrying mutation B are part of this haplogroup, but mutation B also defines a more recent haplogroup (which is a subgroup or subclade) of its own to which humans carrying only mutation A do not belong. Both mtDNA and Y chromosomes are grouped into lineages and haplogroups; these are often presented as tree-like diagrams.

Human Y-chromosome DNA haplogroups


Human Y chromosome DNA (Y-DNA) haplogroups are named from A to T, and are further subdivided using numbers and lower case letters. Y chromosome haplogroup designations are established by the Y Chromosome Consortium.[5]

Y-chromosomal Adam is the name given by researchers to the male who is the most recent common patrilineal (male-lineage) ancestor of all living humans.

Major Y-chromosome haplogroups, and their geographical regions of occurrence (prior to the recent European colonization), include:

Groups without mutation M168


Groups with mutation M168


(mutation M168 occurred ~50,000 years before present)

  • Haplogroup C (M130) (Oceania, North/Central/East Asia, North America and a minor presence in South America, Southeast Asia, South Asia, West Asia, and Europe)
  • YAP+ haplogroups
    • Haplogroup DE (M1, M145, M203)
      • Haplogroup D (CTS3946) (Tibet, Nepal, Japan, the Andaman Islands, Central Asia, and a sporadic presence in Nigeria, Syria, and Saudi Arabia)
      • Haplogroup E (M96)
        • Haplogroup E1b1a (V38) West Africa and surrounding regions; formerly known as E3a
        • Haplogroup E1b1b (M215) Associated with the spread of Afroasiatic languages; now concentrated in North Africa and the Horn of Africa, as well as parts of the Middle East, the Mediterranean, and the Balkans; formerly known as E3b

Groups with mutation M89


(mutation M89 occurred ~45,000 years before present)

  • Haplogroup F (M89) Oceania, Europe, Asia, North and South America
  • Haplogroup G (M201) (present among many ethnic groups in Eurasia, usually at low frequency; most common in the Caucasus, the Iranian plateau, and Anatolia; in Europe mainly in Greece, Italy, Iberia, the Tyrol, Bohemia; rare in Northern Europe)
  • Haplogroup H (L901/M2939)
    • H1'3 (Z4221/M2826, Z13960)
      • H1 (L902/M3061)
        • H1a (M69/Page45) India, Sri Lanka, Nepal, Pakistan, Iran, Central Asia
        • H1b (B108) Found in a Burmese individual in Myanmar.[7]
      • H3 (Z5857) India, Sri Lanka, Pakistan, Bahrain, Qatar
    • H2 (P96) Formerly known as haplogroup F3. Found with low frequency in Europe and western Asia.
  • Haplogroup IJK (L15, L16)

Groups with mutations L15 & L16


Groups with mutation M9


(mutation M9 occurred ~40,000 years before present)

  • Haplogroup K
    • Haplogroup LT (L298/P326)
      • Haplogroup L (M11, M20, M22, M61, M185, M295) (South Asia, Central Asia, Southwestern Asia, the Mediterranean)
      • Haplogroup T (M70, M184/USP9Y+3178, M193, M272) (North Africa, Horn of Africa, Southwest Asia, the Mediterranean, South Asia); formerly known as Haplogroup K2
    • Haplogroup K(xLT) (rs2033003/M526)
Groups with mutation M526

Human mitochondrial DNA haplogroups

Human migrations and mitochondrial haplogroups

Human mtDNA haplogroups are lettered: A, B, C, CZ, D, E, F, G, H, HV, I, J, pre-JT, JT, K, L0, L1, L2, L3, L4, L5, L6, M, N, O, P, Q, R, R0, S, T, U, V, W, X, Y, and Z.

The mtDNA tree was maintained by Mannis van Oven on the PhyloTree website until 2016.[9] As the number of new mtDNA tests increased sharply, other companies began developing their own mtDNA haplotrees. The company YFull was first, introducing its MTree.[10] In 2025, FamilyTreeDNA introduced its MitoTree (Beta).[11]

Phylogenetic tree of human mitochondrial DNA (mtDNA) haplogroups

Mitochondrial Eve (L)
  • L0
  • L1–6
    • L1
    • L2
    • L3
      • M
        • CZ
          • C
          • Z
        • D
        • E
        • G
        • Q
      • N
        • O
        • A
        • S
        • R
          • B
          • F
          • R0
            • HV
              • H
              • V
          • pre-JT
            • JT
              • J
              • T
          • P
          • U
            • K
        • I
        • W
        • X
        • Y
    • L4
    • L5
    • L6
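The simplified phylogenetic tree of major mtDNA haplogroups in this section can be stored as a parent map, which makes it easy to list a haplogroup's ancestry or find where two haplogroups meet. A minimal sketch: the PARENT dictionary reproduces only the major nodes of that tree (with L1–6 written using an ASCII hyphen), and the function names are hypothetical.

```python
# Parent of each node in the simplified mtDNA tree; "L" is Mitochondrial Eve's lineage.
PARENT = {
    "L0": "L", "L1-6": "L",
    "L1": "L1-6", "L2": "L1-6", "L3": "L1-6", "L4": "L1-6", "L5": "L1-6", "L6": "L1-6",
    "M": "L3", "N": "L3",
    "CZ": "M", "D": "M", "E": "M", "G": "M", "Q": "M",
    "C": "CZ", "Z": "CZ",
    "O": "N", "A": "N", "S": "N", "R": "N", "I": "N", "W": "N", "X": "N", "Y": "N",
    "B": "R", "F": "R", "R0": "R", "pre-JT": "R", "P": "R", "U": "R",
    "HV": "R0", "JT": "pre-JT", "K": "U",
    "H": "HV", "V": "HV", "J": "JT", "T": "JT",
}


def lineage(haplogroup):
    """Walk parent links back to the root and return the path root-first."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def most_recent_common_ancestor(a, b):
    """Deepest node shared by the lineages of `a` and `b`."""
    shared = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        shared = x
    return shared


print(lineage("H"))
print(most_recent_common_ancestor("H", "J"))
```

Here lineage("H") runs L, L1-6, L3, N, R, R0, HV, H, and H and J meet at R.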

Mitochondrial Eve is the name given by researchers to the woman who is the most recent common matrilineal (female-lineage) ancestor of all living humans.

Defining populations

Map of human haplotype migration, according to mitochondrial DNA, with key (coloured) indicating periods in numbered thousands of years before the present.

Haplogroups can be used to define genetic populations and are often geographically oriented. For example, the following are common divisions for mtDNA haplogroups:

The mitochondrial haplogroups are divided into three main groups, which are designated by the sequential letters L, M, N. Humanity first split within the L group between L0 and L1-6. L1-6 gave rise to other L groups, one of which, L3, split into the M and N group.

The M group comprises the first wave of human migration which is thought to have evolved outside of Africa, following an eastward route along southern coastal areas. Descendant lineages of haplogroup M are now found throughout Asia, the Americas, and Melanesia, as well as in parts of the Horn of Africa and North Africa; almost none have been found in Europe. The N haplogroup may represent another macrolineage that evolved outside of Africa, heading northward instead of eastward. Shortly after the migration, the large R group split off from the N.

Haplogroup R consists of two subgroups defined on the basis of their geographical distributions, one found in southeastern Asia and Oceania and the other containing almost all of the modern European populations. Haplogroup N(xR), i.e. mtDNA that belongs to the N group but not to its R subgroup, is typical of Australian aboriginal populations, while also being present at low frequencies among many populations of Eurasia and the Americas.

The L type consists of nearly all Africans.

The M type consists of:

M1 – Ethiopian, Somali and Indian populations. This is likely due to extensive gene flow between the Horn of Africa and the Arabian Peninsula (Saudi Arabia, Yemen, Oman), which are separated only by a narrow strait between the Red Sea and the Gulf of Aden.

CZ – Many Siberians; branch C – Some Amerindians; branch Z – Many Saami, some Korean, some North Chinese, some Central Asian populations.

D – Some Amerindians, many Siberians and northern East Asians

E – Malay, Borneo, Philippines, Taiwanese aborigines, Papua New Guinea

G – Many Northeast Siberians, northern East Asians, and Central Asians

Q – Melanesian, Polynesian, New Guinean populations

The N type consists of:

A – Found in many Amerindians and some East Asians and Siberians

I – 10% frequency in Northern and Eastern Europe

S – Some Indigenous Australians (First Nations peoples of Australia)

W – Some Eastern Europeans, South Asians, and southern East Asians

X – Some Amerindians, Southern Siberians, Southwest Asians, and Southern Europeans

Y – Most Nivkhs and people of Nias; many Ainus, Tungusic people, and Austronesians; also found with low frequency in some other populations of Siberia, East Asia, and Central Asia

R – Large group found within the N type. Populations contained therein can be divided geographically into West Eurasia and East Eurasia. Almost all European populations and a large number of Middle Eastern populations today are contained within this branch. A smaller percentage is contained in other N type groups (see above). Below are subclades of R:

B – Some Chinese, Tibetans, Mongolians, Central Asians, Koreans, Amerindians, South Siberians, Japanese, Austronesians

F – Mainly found in southeastern Asia, especially Vietnam; 8.3% in Hvar Island in Croatia.[13]

R0 – Found in Arabia and among Ethiopians and Somalis; branch HV (branch H; branch V) – Europe, Western Asia, North Africa.

Pre-JT – Arose in the Levant (modern Lebanon area), found at 25% frequency in Bedouin populations; branch JT (branch J; branch T) – Northern and Eastern Europe, Indus, Mediterranean

U – High frequency in West Eurasia, Indian sub-continent, and Algeria, found from India to the Mediterranean and to the rest of Europe; U5 in particular shows high frequency in Scandinavia and Baltic countries with the highest frequency in the Sami people.

Y-chromosome and MtDNA geographic haplogroup assignment


Here is a list of Y-chromosome and MtDNA geographic haplogroup assignment proposed by Bekada et al. 2013.[14]

Y-chromosome


According to SNP-based estimates, haplogroups dating to the first extinction event tend to be around 45–50 kya (thousand years ago). Haplogroups of the second extinction event appear to have diverged 32–35 kya, according to the Mal'ta ancient DNA sample. The ground zero extinction event appears to be the Toba eruption, during which haplogroup CDEF* appears to have diverged into C, DE and F. C and F have almost nothing in common, while D and E have plenty in common. According to current estimates, extinction event #1 occurred after Toba, although older ancient DNA could push the ground zero extinction event to long before Toba and push the first extinction event back to Toba. Haplogroups marked with extinction-event notes have a dubious origin because extinction events lead to severe bottlenecks, so all notes attached to these groups are only guesses. Note that SNP counting in ancient DNA can be highly variable, so even though all these groups diverged around the same time, no one knows exactly when.[15][16]

Origin Haplogroup Marker
Europe (Second Extinction Event?) I M170, M253, P259, M227, M507
Europe I1b P215, M438, P37.2, M359, P41.2
Europe I1b2 M26
Europe I1c M223, M284, P78, P95
Europe J2a1 M47
Europe J2a2 M67, M166
Europe J2a2a M92
Europe J2b M12, M102, M280, M241
Europe R1b1b1a M412, P310
Europe R1b1b1a1 L11
Europe R1b1b1a1a U106
Europe R1b1b1a1b U198, P312, S116
Europe R1b1b1a1b1 U152
Europe R1b1b1a1b2 M529
Europe R1b1b1a1b3,4 M65, M153
Europe R1b1b1a1b5 SRY2627
South Asia or Melanesia C1(formerly known as CxC3) Z1426
North Asia C2 (formerly known as C3) M217+
Indonesia or South Asia F M89, M282
Europe (Caucasus) G M201, M285, P15, P16, M406
South Asia H M69, M52, M82, M197, M370
Europe or Middle East J1 M304, M267, P58, M365, M368, M369
Europe or Middle East J2 M172, M410, M158, M319, DYS445=6, M339, M340
West of Burma in Eurasia (First Extinction Event?)[17]
Indonesia (First Extinction Event?) [17] K2 (NOPS) M526
South Asia L M11, M20, M27, M76, M317, M274, M349, M357
East Asia, South East Asia N M231, M214, LLY22g, Tat, M178
East Asia, South East Asia, South Asia O M175, M119
Indonesia, Philippines P (xQR) 92R7, M207, M173, M45
South Asia, Siberia R and Q (QR) split [17] MEH2, M242, P36.2, M25, M346
Middle East, Europe, Siberia, South Asia R1a1 M420, M17, M198, M204, M458
Anatolia, South East Europe ? R1b M173, M343, P25, M73
Europe R1b1b M269
Europe R1b1b1 L23
Pakistan, India R2 M479, M124
Middle East T M70
North Africa E1b1b1 M35
North Africa E1b1b1a M78
West Asia E1b1b1a2 V13
North Africa E1b1b1a1 V12
North Africa E1b1b1a1b V32
North Africa E1b1b1a3 V22
North Africa E1b1b1a4 V65
North Africa E1b1b1b M81
North Africa E1b1b1c M123, M34
West Africa, North Africa A M91, M13
East Africa B M60, M181, SRY10831.1, M150, M109, M112
Asia, Africa DE M1, YAP, M174, M40, M96, M75, M98
East Asia, Nepal D M174
West Africa (First Extinction Event?) E1a M33
East Africa (First Extinction Event is the split between E1b1 and E1a, second extinction event is the split between E1b1b and E1b1a) E1b1 P2, M2, U175, M191
Middle East J1 P58
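Because subclade names nest, a table like the Bekada et al. 2013 Y-chromosome list above can be queried by walking from a sample's haplogroup up to the most specific listed ancestor. A minimal sketch, assuming the 2013-era labels used in that table (many have since been renamed, so modern labels will not match) and copying only a handful of its rows; proposed_origin and the ORIGIN dictionary are hypothetical.

```python
import re

# A few rows copied from the Y-chromosome table above (2013-era labels).
ORIGIN = {
    "R1b1b": "Europe",
    "R1b1b1": "Europe",
    "R1b1b1a1a": "Europe",
    "E1b1b1": "North Africa",
    "E1b1b1a2": "West Asia",
    "T": "Middle East",
}


def ancestor_chain(name):
    """Root-first clade names implied by a label such as "E1b1b1a2"."""
    core = name.rstrip("*")
    match = re.fullmatch(r"([A-Z]+)((?:\d+|[a-z])*)", core)
    if match is None:
        raise ValueError(f"unsupported haplogroup label: {name!r}")
    root, rest = match.groups()
    chain = [root]
    for token in re.findall(r"\d+|[a-z]", rest):
        chain.append(chain[-1] + token)
    return chain


def proposed_origin(haplogroup):
    """Return (listed clade, origin) for the most specific listed ancestor, or None."""
    for clade in reversed(ancestor_chain(haplogroup)):
        if clade in ORIGIN:
            return clade, ORIGIN[clade]
    return None


print(proposed_origin("E1b1b1a2a"))  # ('E1b1b1a2', 'West Asia')
```

The lookup inherits the table's uncertainty: a proposed origin applies to the listed clade, not necessarily to every descendant lineage.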

mtDNA

Origin Haplogroup
Europe H1
Europe H11a
Europe H1a
Europe H1b
Europe H2a
Europe H3
Europe H5a
Europe H6a
Europe H7
Europe HV0/HV0a/V
Europe I4
Europe J1c7
Europe J2b1
Europe T2b*
Europe T2b4
Europe T2e
Europe U4c1
Europe U5*
Europe U5a
Europe U5a1b1
Europe U5b*
Europe U5b1b*
Europe U5b1c
Europe U5b3
Europe X2c'e
Middle East I
Middle East A
Middle East B
Middle East C/Z
Middle East D/G/M9/E
India F
Middle East H*
Middle East H13a1
Middle East H14a
Middle East H20
Middle East H2a1
Middle East H4
Middle East H6b
Middle East H8
Middle East HV1
Middle East I1
Middle East J / J1c / J2
Middle East J1a'b'e
Middle East J1b1a1
Middle East J1b2a
Middle East J1d / J2b
Middle East J1d1
Middle East J2a
Middle East J2a2a1
Middle East K*
Middle East K1a*
Middle East K1b1*
Middle East N1a*
Middle East N1b
Middle East N1c
Middle East N2
Middle East N9
Middle East R*
Middle East R0a
Middle East T
Middle East T1*
West Asia T1a
Middle East T2
Middle East T2c
Middle East T2i
Middle East U1*
Middle East U2*
Middle East U2e
Eurasia U3*
Middle East U4
Middle East U4a*
Middle East U7
Middle East U8*
Middle East U9a
Middle East X
Middle East X1a
Middle East X2b1
North Africa L3e5
North Africa M1
North Africa M1a1
North Africa U6a
North Africa U6a1'2'3
North Africa U6b'c'd
East Africa L0*
East Africa L0a1
East Africa L0a1b
East Africa L0a2*
East Africa L3c/L4/M
East Africa L3d1a1
East Africa L3d1d
East Africa L3e1*
East Africa L3f*
East Africa L3h1b*
East Africa L3i*
East Africa L3x*
East Africa L4a'b*
East Africa L5*
East Africa L6
East Africa N* / M* / L3*
West Africa L1b*
West Africa L1b3
West Africa L1c*
West Africa L1c2
West Africa L2*
West Africa L2a
West Africa L2a1*
West Africa L2a1a2'3'4
West Africa L2a1b
West Africa L2a1b'f
West Africa L2a1c1'2
West Africa L2a1(16189)
West Africa L2a2
West Africa L2b*
West Africa L2c1'2
West Africa L2d
West Africa L2e
West Africa L3b
West Africa L3b1a3
West Africa L3b(16124!)
West Africa L3b2a
West Africa L3d*
West Africa L3e2'3'4
West Africa L3f1b*

See also


References

from Grokipedia
A haplogroup is a genetic group of individuals who share a common ancestor along either the maternal (matrilineal) or paternal (patrilineal) line, defined by specific genetic markers such as single nucleotide polymorphisms (SNPs) in mitochondrial DNA (mtDNA) or the non-recombining portion of the Y chromosome. These markers represent branches on the phylogenetic trees of ancestry, formed through mutations accumulated over tens of thousands of years. Haplogroups are fundamental in population genetics because they do not undergo recombination, allowing direct tracing of uniparental inheritance patterns across generations. Mitochondrial DNA haplogroups, inherited solely from the mother, encompass major clades such as L (originating in Africa), M, and N, which diverged during early human migrations out of Africa around 60,000–70,000 years ago. Y-chromosome haplogroups, passed from father to son, include prominent groups like A and B (African origins) and C and F (Eurasian expansions), and are similarly structured to reflect the evolution of paternal lineages. Both types of haplogroups are cataloged in standardized phylogenies, such as the Y-DNA haplotrees maintained by FamilyTreeDNA and YFull and the mtDNA haplotree maintained by FamilyTreeDNA, which are regularly updated with new sequencing data to refine branch definitions (as of 2025). Haplogroups play a crucial role in reconstructing human population history, including major migrations and the peopling of new regions, by correlating genetic distributions with archaeological and linguistic evidence. They also inform studies of admixture events and even of disease susceptibility, as certain haplogroup-specific variants influence metabolic or immune functions. Advances in next-generation sequencing have further refined haplogroup resolution, enabling detailed tracing of recent population movements (as of 2025). In modern applications, commercial genetic testing often assigns individuals to haplogroups to provide insights into deep ancestry and ethnic origins.

Fundamentals of Haplogroups

Definition and Characteristics

A haplogroup is a monophyletic group of haplotypes that share a common ancestor, defined by the presence of specific derived genetic variants, such as single nucleotide polymorphisms (SNPs), that arose in that ancestor and are inherited by all descendants. These groups form clades in phylogenetic trees, allowing researchers to reconstruct evolutionary relationships among populations based on shared mutational histories. In human genetics, haplogroups are most commonly studied in non-recombining regions of the genome, providing insights into ancient migrations and ancestry. Key characteristics of haplogroups stem from their identification in uniparental genetic markers: the Y chromosome, which traces patrilineal descent exclusively through males, and mitochondrial DNA (mtDNA), which traces matrilineal descent through females. Unlike autosomal DNA, these markers do not undergo recombination during meiosis, preserving the integrity of ancestral haplotypes and enabling the reconstruction of lineages spanning tens of thousands of years. A haplotype, by contrast, refers to an individual's specific combination of alleles at multiple linked loci inherited from a single parent, whereas a haplogroup encompasses a broader cluster of such haplotypes united by descent from a common founder. The concept of haplogroups emerged amid advances in molecular genetics, with the term "haplogroup" (short for "haplotype group") first formally introduced to describe clusters of related mtDNA sequences sharing diagnostic mutations. Earlier foundational mtDNA studies that revealed distinct lineages laid the groundwork, such as the initial classification of Native American maternal ancestries into haplogroups A, B, C, and D based on restriction fragment length polymorphisms.
Haplogroups are organized in a hierarchical, tree-like phylogeny, with major clades designated by capital letters (e.g., H for a prevalent European mtDNA haplogroup) and subclades denoted by alternating numbers and lowercase letters (e.g., H1b), each defined by additional unique mutations that mark successive branches from the common ancestor. This system facilitates precise mapping of lineages and their evolutionary divergence.

Types and Classification

Haplogroups are categorized primarily by the genomic region they represent, chiefly Y-chromosome DNA (Y-DNA) and mitochondrial DNA (mtDNA), each with a distinct inheritance pattern and both prized for tracing deep ancestry because they lack recombination. Y-DNA haplogroups are male-specific, following strict patrilineal inheritance as the Y chromosome passes unchanged from father to son, enabling reconstruction of deep paternal lineages. In contrast, mtDNA haplogroups are present in all individuals but follow matrilineal inheritance, being transmitted solely from mother to all offspring, which facilitates tracking of maternal ancestry over long timescales. Autosomal DNA, derived from the 22 pairs of non-sex chromosomes, receives contributions from both parents and is subject to frequent recombination during meiosis, making it more suitable for analyzing recent admixture and population structure using haplotype blocks rather than stable haplogroups. Classification systems focus on uniparental markers (Y-DNA and mtDNA) to delineate deep evolutionary history, while autosomal variants provide insights into more recent genetic mixing across populations. This framework extends beyond humans to non-human species, where haplogroups help elucidate phylogenetic relationships; for instance, mtDNA haplogroups in animals can reveal past population events and migration patterns. In humans, Y-DNA haplogroups are denoted by major clades A through T, forming the backbone of the paternal phylogeny as established by standardized nomenclature. Similarly, human mtDNA haplogroups are often grouped into the African macrohaplogroups L0 through L6, representing basal branches of the global maternal tree. Non-human examples include Neanderthal mtDNA lineages, such as those identified in ancient Siberian specimens, which diverged early from modern human lineages and inform interspecies evolutionary dynamics.
A key limitation of autosomal analysis for deep ancestry tracing lies in recombination, which shuffles genetic material across generations, rapidly eroding long-range haplotype blocks and complicating the identification of ancient common ancestors compared with uniparental systems. This recombination-driven fragmentation contrasts with the non-recombining nature of Y-DNA and mtDNA, which preserve lineage information over millennia, though autosomal data remain invaluable for quantifying admixture proportions in diverse populations.

Mechanisms of Formation

Genetic Mutations and Inheritance

Haplogroups arise primarily through single nucleotide polymorphisms (SNPs), point mutations that substitute one nucleotide for another in the DNA sequence of either the non-recombining portion of the Y chromosome or the mitochondrial genome. These SNPs serve as stable markers that define the boundaries of a haplogroup, as they accumulate over generations without recombination, allowing researchers to trace lineages back to a common ancestor. While SNPs form the foundational identifiers, other genetic variants such as insertions/deletions (indels) and short tandem repeats (STRs) contribute to higher-resolution subclade distinctions within haplogroups, particularly for recent divergences where SNP density is low. Inheritance of haplogroups follows strict uniparental patterns due to the biology of these genomic regions. Y-chromosomal haplogroups are transmitted exclusively from father to son, as the Y chromosome is present only in males and its non-recombining portion does not recombine with other chromosomes. In contrast, mitochondrial DNA (mtDNA) haplogroups are passed from mother to all offspring, regardless of sex, because mtDNA is located in the cytoplasm of the egg and not contributed by sperm. This uniparental, non-recombining transmission results in clonal inheritance, producing star-like phylogenetic structures in which all descendants of a mutated ancestor share the identical marker set until further mutations occur. The formation of a new haplogroup begins when a defining mutation, typically an SNP, arises de novo in a single individual and is subsequently inherited by their descendants, establishing a novel lineage branch. Over time, additional mutations in this lineage create nested subclades, forming a hierarchical structure that reflects the cumulative mutational history. For Y-DNA haplogroups, these mutations occur at a rate of approximately one SNP every 100-150 years, providing a molecular clock for estimating divergence times.
mtDNA mutations, however, accumulate more rapidly, roughly 10-20 times faster than those in Y-DNA or nuclear DNA, owing to the more error-prone replication of the mitochondrial genome and its exposure to reactive oxygen species. Mutation rates can vary with factors such as generation length, environmental influences, and demographic events like population bottlenecks, which reduce genetic diversity and can accelerate apparent divergence by fixing rare variants in small populations. These dynamics underscore the utility of haplogroups in reconstructing evolutionary history while highlighting the need for calibrated rates in phylogenetic analyses.
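The molecular-clock idea reduces to simple arithmetic. This is a rough sketch, assuming the Y-DNA figure of one SNP per 100-150 years quoted in this section and hypothetical SNP counts; the function name tmrca_years is illustrative, and published estimators add corrections (for example, for the fraction of the chromosome actually sequenced and for rate uncertainty) that this sketch omits.

```python
from statistics import mean


def tmrca_years(branch_snp_counts, years_per_snp):
    """Rough time to the most recent common ancestor: the average number of SNPs
    accumulated on each descendant branch since the split, times years per SNP."""
    if not branch_snp_counts:
        raise ValueError("need at least one branch")
    return mean(branch_snp_counts) * years_per_snp


# Hypothetical private-SNP counts on three men's Y chromosomes since their shared ancestor.
counts = [20, 24, 22]
for rate in (100, 150):  # bounds of the ~1 SNP per 100-150 years range
    print(rate, tmrca_years(counts, rate))
```

With an average of 22 SNPs per branch, the two bounds give about 2,200 and 3,300 years, showing how strongly the estimate depends on the assumed rate.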

Phylogenetic Relationships

Haplogroups are organized into rooted phylogenetic trees that depict their evolutionary relationships, with the most ancient basal haplogroup at the root and successive subclades branching outward from specific mutational events. For Y-chromosome DNA (Y-DNA), the tree, often called the Y-tree, begins with haplogroup A00 as the root, representing the earliest known divergence in paternal lineages. Similarly, mitochondrial DNA (mtDNA) haplogroups form an mt-tree, structured as a hierarchy of nested clades defined by shared mutations. These trees illustrate uniparental inheritance patterns, where branches correspond to the spread of particular genetic variants through populations over time. Phylogenetic trees for haplogroups are constructed primarily using maximum parsimony, which seeks the simplest evolutionary explanation by minimizing the number of mutational changes required to explain the observed sequences, or maximum likelihood, which incorporates probabilistic models to estimate tree topologies from single nucleotide polymorphism (SNP) data. These approaches rely on high-throughput sequencing of entire genomes or targeted regions, compiling data from thousands of samples to resolve branching patterns. Specialized resources like PhyloTree for mtDNA, which integrates over 24,000 full mitogenome sequences into a tree with more than 5,400 nodes, and the International Society of Genetic Genealogy (ISOGG) Y-DNA tree, which curates SNP-based clades from global testing, maintain and update these structures through community-driven validation and software tools for alignment and tree building. Interpreting these trees involves estimating the time to the most recent common ancestor (TMRCA) for each branch, calculated by applying calibrated mutation rates (typically derived from pedigree studies or other calibrations) to the number of SNPs accumulated along lineages, often within the coalescent framework, which probabilistically models lineage mergers backward in time.
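Maximum parsimony can be made concrete with Fitch's small-parsimony algorithm, which counts the minimum number of mutations a fixed rooted binary tree needs to explain the alleles observed at its tips. This is a textbook sketch, not the procedure used by PhyloTree or ISOGG; the sample names, alleles, and function names are hypothetical.

```python
def fitch(tree, states):
    """Fitch's small-parsimony algorithm for one SNP on a rooted binary tree.

    `tree` is a leaf name (str) or a (left, right) tuple; `states` maps leaf names
    to an allele. Returns (candidate root alleles, minimum number of mutations).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint children: one extra mutation is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum Fitch costs over all SNP sites; `alignment` maps leaf -> allele string."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"s1": "AC", "s2": "AC", "s3": "GT", "s4": "GT"}
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s3"), ("s2", "s4"))
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))  # 2 4
```

The tree grouping s1 with s2 needs two mutations (one per site) versus four for the alternative, so parsimony prefers it. Real tree building searches over many candidate topologies rather than comparing two by hand.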
Coalescent theory provides a foundation for understanding branching points as reflections of bottlenecks, expansions, or migrations, allowing researchers to infer demographic histories without direct evidence. These estimates help place such events in time, for example by dating major divergences to tens of thousands of years ago. Haplogroup trees are inherently dynamic, undergoing frequent revisions as advances in sequencing technology uncover novel variants; for instance, testing in the 2020s has identified thousands of private SNPs, refining the Y-tree by adding hundreds of new subclades annually and improving resolution for recent branches. Such updates keep the trees accurate representations of human lineages, incorporating data from diverse populations to avoid biases in earlier models. Geographic patterns emerge from these refined structures, linking clades to migration routes observed in archaeological records.

Y-Chromosome Haplogroups in Humans

Major Clades and Mutations

Human Y-chromosome DNA (Y-DNA) haplogroups are organized into a phylogenetic tree rooted in Africa, with the most basal clades comprising haplogroup A, which includes subclades A0 through A3. These African-specific lineages represent the deepest branches of the human Y-DNA phylogeny, with A0-T (defined by P108) being among the oldest, estimated to have originated approximately 200,000 to 300,000 years ago in Africa. Haplogroup A is characterized by ancient subclades such as A1a (M31) and A1b (P82), which exhibit high genetic diversity and are defined by early mutations in the non-recombining region. Subsequent basal clades, including B (M60), diversified within Africa over tens of thousands of years, reflecting regional expansions. The transition to non-African haplogroups occurred through haplogroup BT (M91), which arose around 100,000 years ago and served as the progenitor for all lineages outside the most basal A branches. From BT descended, via CT (M168), the macrohaplogroup C (M130), marking early expansions into Eurasia and Oceania, while DE (M145/M203) split into D (M174, associated with Asian populations such as the Ainu and Tibetans) and E (M96, prominent in Africa and Europe). Key defining mutations for BT include M91, alongside other SNPs that distinguish it from the A clades. Within the F (M89) macrohaplogroup, downstream from BT, major clades diversified into G (M201), H (M69), I (M170), J (M304), and K (M9). Haplogroup R (M207), a subclade of K via P (M45), further diversified into R1 (M173) and R2, with R1b (M343) defined by additional mutations like M269, associated with Western European expansions. Representative subclades include I1 (M253) and I2 (P215) under I, linked to Northern and Southern European lineages, and Q (M242) under P, associated with Siberian and Native American populations.
Recent advances in whole Y-chromosome sequencing have refined the resolution of these clades, revealing finer structure within E (e.g., E1b1b-M35 and its subclade E-V22) and R (e.g., R1a-Z93), with some lineages tracing back to migrations around 20,000 years ago. These studies highlight increased subclade diversity in sequencing analyses worldwide. The evolution of Y-DNA haplogroups is shaped by a mutation rate estimated at approximately 0.76 × 10⁻⁹ mutations per base pair per year in the non-recombining region. This rate is slower than that of mtDNA and facilitates tracking of deep paternal lineages over long timescales. Historical paternal bottlenecks, such as those during out-of-Africa migrations, reduced effective population sizes and amplified genetic drift, leading to lower diversity in non-African lineages compared to African ones.
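The rate quoted above can be turned into a rough molecular-clock calculation. This is a minimal sketch, not a method from any particular study. It uses the 0.76 × 10⁻⁹ per base pair per year figure and assumes a callable region of about 10 Mb. That length is an assumption: the usable fraction of the ~23 Mb non-recombining region depends on the sequencing technology and filtering. Under these assumptions, a single lineage gains roughly one SNP every 130 years. Two men who differ at d SNPs share an ancestor about d / (2μL) years ago, because both lineages accumulate mutations independently after the split. The function names are illustrative.

```python
def years_per_snp(rate_per_bp_year: float, callable_bp: float) -> float:
    """Expected years between new SNPs on a single paternal lineage."""
    return 1.0 / (rate_per_bp_year * callable_bp)


def pairwise_tmrca_years(differences: int, rate_per_bp_year: float, callable_bp: float) -> float:
    """Rough TMRCA for two Y chromosomes that differ at `differences` SNPs.

    Both lineages mutate independently after the split, so observed differences
    accumulate at twice the single-lineage rate (a strict clock is assumed).
    """
    return differences / (2.0 * rate_per_bp_year * callable_bp)


RATE = 0.76e-9            # mutations per bp per year (figure quoted in the text)
CALLABLE_BP = 10_000_000  # ASSUMPTION: ~10 Mb of reliably callable sequence

if __name__ == "__main__":
    print(f"years per SNP: {years_per_snp(RATE, CALLABLE_BP):.0f}")
    print(f"TMRCA for 30 differences: {pairwise_tmrca_years(30, RATE, CALLABLE_BP):.0f} years")
```

The callable length matters as much as the rate. Halving it doubles every age estimate, which is one reason published Y-DNA ages from different pipelines are not directly comparable.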

Nomenclature and Phylogeny

The nomenclature for human Y-chromosome DNA (Y-DNA) haplogroups evolved significantly from the 2000s onward. Initial classifications relied on a small number of binary markers, such as the YAP insertion (DYS287), to define broad lineages such as A through I. This approach was limited by the genotyping technology of the time. It focused on a few polymorphic sites and produced coarse groupings, as seen in early Y Chromosome Consortium (YCC) studies compiling global datasets. By the 2010s, next-generation sequencing enabled analysis of the full non-recombining Y region (~23 Mb). This allowed precise definitions incorporating thousands of single nucleotide polymorphisms (SNPs), which refined haplogroups and resolved subclades. The standardized naming system, maintained by the International Society of Genetic Genealogy (ISOGG), uses an alphanumeric scheme. Major clades are designated by capital letters (e.g., A, B, R) and subclades by alternating numbers and lowercase letters (e.g., R1b1a1b), reflecting phylogenetic branching defined by diagnostic SNPs. This system was formalized by the YCC in 2002 and is updated annually by ISOGG. It maintains hierarchical consistency by assigning names to monophyletic groups that share derived mutations. The ISOGG Y-DNA Haplogroup Tree (version 2024) incorporates over 300,000 SNPs and defines thousands of haplogroups. Ongoing revisions are driven by data from commercial testing and ancient genomes. As of 2025, community efforts such as YFull and FamilyTreeDNA have extended the tree using datasets exceeding 1 million Y-chromosomes. Phylogenetic updates to the Y-DNA tree integrate new sequences through maximum likelihood methods, often using tools like RAxML or IQ-TREE under substitution models that account for rate variation, supplemented by Bayesian approaches for branch support. Recent incorporations include ancient Y-DNA from studies revealing admixture signals in non-African lineages, which have prompted reevaluations of basal branches such as F.
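The nested longhand naming described above can be expanded mechanically. This is because each name is its parent's name plus one more number or letter block. The following is a minimal sketch that handles only simple longhand names such as R1b1a1b or H1a1. Mutation-based labels such as R-M269 and hyphenated names such as A0-T are rejected. Longhand names can also shift when the tree is revised. The function names `ancestor_chain` and `is_ancestor` are illustrative and do not come from ISOGG or any library.

```python
import re

# One capital-letter clade code, then alternating runs of digits and lowercase letters.
_TOKEN = re.compile(r"[A-Z]+|\d+|[a-z]+")


def ancestor_chain(name: str) -> list[str]:
    """Expand a longhand haplogroup name into its nested ancestors, root first."""
    tokens = _TOKEN.findall(name)
    # Reject names containing other characters (e.g. "R-M269") or not starting with a clade letter.
    if not tokens or "".join(tokens) != name or not tokens[0].isupper():
        raise ValueError(f"not a simple longhand haplogroup name: {name!r}")
    chain, prefix = [], ""
    for token in tokens:
        prefix += token
        chain.append(prefix)
    return chain


def is_ancestor(candidate: str, name: str) -> bool:
    """True if `candidate` lies on the path from the root clade to `name` (inclusive)."""
    return candidate in ancestor_chain(name)


if __name__ == "__main__":
    print(ancestor_chain("R1b1a1b"))
    print(is_ancestor("H1", "H10"))  # tokens differ ("1" vs "10"), so not an ancestor
```

Working on tokens rather than raw string prefixes is the key design choice. A naive `startswith` check would wrongly treat H1 as an ancestor of H10.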
Time to most recent common ancestor (TMRCA) estimates for clades are derived using the rho statistic or Bayesian skyline plots, calibrated against pedigree-based rates (e.g., ~130 years per SNP). These give age approximations under the assumption of a constant mutation rate, and calibration improves accuracy for recent events. Compared to mtDNA haplogroups, Y-DNA analysis covers a much larger target region (~23 Mb vs. 16.5 kb), requiring more SNPs for resolution but benefiting from uniparental inheritance without recombination. However, Y-DNA faces challenges from structural variants and copy-number differences in ampliconic regions. These can complicate assignment in low-coverage samples, so high-depth sequencing (>30x) is needed for reliable calls in ancient or diverse datasets.
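The rho statistic mentioned above is the mean number of mutations separating each sampled haplotype from the inferred founder haplotype of a clade. Multiplying it by a calibrated years-per-mutation constant gives an age estimate. This minimal sketch represents each haplotype as a set of mutation labels, which is an assumption made for illustration. It counts the symmetric difference from the founder as the mutational distance. The labels "m1" to "m5" are hypothetical, and the 130-year constant simply echoes the pedigree-based figure quoted above. This sketch omits the standard error of rho, which requires the tree topology.

```python
from typing import Iterable


def rho(samples: Iterable[frozenset[str]], founder: frozenset[str]) -> float:
    """Mean number of mutations separating each sampled haplotype from the founder."""
    distances = [len(sample ^ founder) for sample in samples]
    if not distances:
        raise ValueError("need at least one sample")
    return sum(distances) / len(distances)


def rho_age(samples: Iterable[frozenset[str]], founder: frozenset[str],
            years_per_mutation: float) -> float:
    """Clade age estimate: rho scaled by a calibrated years-per-mutation constant."""
    return rho(samples, founder) * years_per_mutation


# Hypothetical clade: the founder carries no private mutations; descendants carry 0-3.
EXAMPLE_FOUNDER: frozenset[str] = frozenset()
EXAMPLE_SAMPLES = [
    frozenset({"m1"}),
    frozenset({"m1", "m2"}),
    frozenset(),
    frozenset({"m3", "m4", "m5"}),
]

if __name__ == "__main__":
    print(rho(EXAMPLE_SAMPLES, EXAMPLE_FOUNDER))
    print(rho_age(EXAMPLE_SAMPLES, EXAMPLE_FOUNDER, 130))  # ASSUMED ~130 years per SNP
```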

Mitochondrial DNA Haplogroups in Humans

Major Clades and Mutations

Human mitochondrial DNA (mtDNA) haplogroups are organized into a phylogenetic tree rooted in Africa, with the most basal clades comprising macrohaplogroup L, which includes L0 through L6. These African-specific lineages represent the deepest branches of the human mtDNA phylogeny. L0 is the oldest, estimated to have originated approximately 150,000 to 200,000 years ago in eastern or southern Africa. L0 is characterized by ancient subclades such as L0a, L0d, and L0k, which exhibit high genetic diversity and are defined by early mutations such as those at positions 195 and 2472. Subsequent basal clades, including L1, L2, L3, L4, L5, and L6, diversified within Africa over tens of thousands of years, reflecting regional population expansions and adaptations. The transition to non-African haplogroups occurred through haplogroup L3, which arose around 70,000 years ago in eastern Africa and is the progenitor of all Eurasian and American lineages. From L3, the two primary macrohaplogroups M and N emerged, marking the foundational split that accompanied the out-of-Africa dispersal. Macrohaplogroup M includes subclades such as M1 (primarily African) and derivatives such as M8. N encompasses diverse branches, including haplogroup R, which is analogous in its basal role to Y-chromosome haplogroup R-M207 but traces matrilineal inheritance. Key defining mutations for L3 include the HVR-I transition at position 16311, alongside coding-region mutations at 769 and 1018, which distinguish it from earlier L clades. Within macrohaplogroup N, haplogroup R further diversified into major European and Asian clades, giving rise to H and U. Haplogroup H, defined by mutations at 2706 and 7028, is a prominent branch associated with post-glacial expansions in Europe. Haplogroup U, characterized by transitions at 11467 and 12308, has a broad distribution and includes subclades such as U5 and U8. Representative subclades include H1 and H3 under H, with H1 carrying an additional mutation at 3010. Another is B4 under R, which belongs to the lineage marked by the 9-bp deletion at 8281-8289 and is linked to coastal migrations across Asia and into the Pacific.
Recent advances in full mtDNA genome sequencing during the 2020s have refined the resolution of Native American clades derived from M and N. These advances revealed finer structure within A2 (with subclade A2o defined by 5154 and 9773) and C1 (including C1b with 3552), which trace back to Beringian founders around 20,000 years ago. These studies highlight increased subclade diversity from ancient DNA analyses in the Americas. The evolution of mtDNA haplogroups is influenced by a relatively high mutation rate, estimated at approximately 2.87 × 10⁻⁶ mutations per base pair per generation in the coding region. This rate is about ten times faster than that of nuclear DNA and facilitates rapid lineage divergence. Historical maternal bottlenecks, such as those during the out-of-Africa migration, reduced effective population sizes and amplified genetic drift, leading to the fixation of specific mutations and lower diversity in non-African lineages compared to African ones.

Nomenclature and Phylogeny

The nomenclature for human mitochondrial DNA (mtDNA) haplogroups evolved significantly from the 1990s onward. Initial classifications relied primarily on sequencing hypervariable region 1 (HVR1) of the control region to define broad lineages such as L0 through L3. This approach was limited by the sequencing technology of the time. It focused on polymorphic sites in the non-coding control region (approximately 1.1 kb) and produced coarse groupings based on shared motifs, as seen in early studies compiling databases of HVR1 sequences from global populations. After 2000, advances in sequencing enabled full mitogenome analysis (16,569 bp). This allowed more precise definitions incorporating single nucleotide polymorphisms (SNPs), which refined haplogroups and resolved subclades that were previously indistinguishable. The standardized naming system, maintained by PhyloTree.org, uses an alphanumeric scheme. Major clades are designated by capital letters (e.g., H, U) and subclades by alternating numbers and lowercase letters (e.g., H1a1), reflecting phylogenetic branching based on diagnostic SNPs in both the control region and the coding sequence. This system was introduced in the mid-1990s and formalized through successive revisions. It maintains hierarchical consistency by assigning names to tree nodes that represent monophyletic groups sharing derived mutations. PhyloTree Build 17, released in 2016, incorporated over 24,000 sequences and defined more than 5,400 haplogroups, with periodic revisions driven by accumulating full-sequence data. Official updates ceased after 2016, but community efforts and commercial databases have extended refinements into the 2020s using datasets exceeding 100,000 mitogenomes. These include FamilyTreeDNA's February 2025 update, which added 35,000 new branches from over 250,000 sequences via the Million Mito Project.
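Because each named node is defined by its derived mutations, a sample can be placed by walking down the tree from the root. At each level, the sketch follows a child node only if all of that child's defining mutations were observed. This is a minimal sketch under strong simplifying assumptions. The toy tree covers only a handful of nodes. Mutations are represented by position alone, and alleles, back-mutations, recurrent sites, and missing data are ignored. Dedicated classifiers instead use the full PhyloTree with weighted scoring. The positions for L3, H, H1, and U come from the text above; those for M, N, and R are standard branch definitions included for illustration and should be checked against PhyloTree before any real use. `Node`, `make`, and `assign` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    defining: frozenset[int]                       # positions of this node's defining mutations
    children: list["Node"] = field(default_factory=list)


def make(name: str, *positions: int, children: tuple = ()) -> Node:
    return Node(name, frozenset(positions), list(children))


def assign(root: Node, observed: set[int]) -> str:
    """Descend from the root while some child's defining mutations are all observed."""
    node = root
    while True:
        matches = [child for child in node.children if child.defining <= observed]
        if not matches:
            return node.name
        # If several children qualify, follow the one supported by the most mutations.
        node = max(matches, key=lambda child: len(child.defining))


# Toy tree (simplified, illustrative only; not a complete PhyloTree excerpt).
TOY_TREE = make("root", children=(
    make("L3", 769, 1018, 16311, children=(
        make("M", 489, 10400, 14783, 15043),
        make("N", 8701, 9540, 10398, 10873, 15301, children=(
            make("R", 12705, 16223, children=(
                make("H", 2706, 7028, children=(make("H1", 3010),)),
                make("U", 11467, 12308, 12372),
            )),
        )),
    )),
))

if __name__ == "__main__":
    sample = {769, 1018, 16311, 8701, 9540, 10398, 10873, 15301, 12705, 16223, 2706, 7028, 3010}
    print(assign(TOY_TREE, sample))
```

The walk stops at the deepest node it can fully support. A sample lacking 3010 is therefore reported as H rather than H1, which mirrors how an assignment stays at a coarser level when a subclade's marker is absent.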
Phylogenetic updates to the mtDNA tree integrate new sequences through automated maximum likelihood reconstruction, often using tools like RAxML under the GTR+Γ model. In specialized studies, this is supplemented by Bayesian methods that account for substitution rate heterogeneity and improve branch support. Recent incorporations include ancient mtDNA from 2020s analyses, such as those revealing admixture signals in Eurasian lineages. These have prompted reevaluations of basal branches by aligning archaic sequences to modern trees. Time to most recent common ancestor (TMRCA) estimates for clades are commonly derived using the ρ (rho) statistic. This statistic averages the number of mutations separating each sampled sequence from the clade founder and is calibrated against established rates (e.g., one mutation per 3,624 years across the complete mitogenome). The method provides reasonable age approximations. However, it assumes a constant mutation rate and can underestimate deep-time events without ancient DNA calibration. Compared with Y-chromosome haplogroups, mtDNA benefits from its much smaller genome (16.5 kb versus the ~23 Mb non-recombining Y region). This permits comprehensive sequencing and high resolution with fewer variants. Together with uniparental inheritance and lack of recombination, it contributes to greater phylogenetic stability. However, mtDNA faces a distinctive challenge from heteroplasmy, the coexistence of wild-type and mutant mtDNA molecules within cells. Heteroplasmy can complicate haplogroup assignment when variants are present at low levels, particularly in ancient or degraded samples, so analyses apply frequency thresholds (e.g., >70%) for reliable calls.
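The frequency-threshold step described above can be sketched as a per-site classifier over read counts. It uses the >70% threshold from the text for calling a variant. The 10% minimum for reporting heteroplasmy and the 20-read minimum depth are illustrative assumptions, not values from the source. Only sites classified as variants are passed on for haplogroup placement. The function names are hypothetical.

```python
def classify_site(ref_reads: int, alt_reads: int,
                  hom_threshold: float = 0.70,     # from the text: >70% for a reliable call
                  min_het_fraction: float = 0.10,  # ASSUMPTION: reporting floor for heteroplasmy
                  min_depth: int = 20) -> str:     # ASSUMPTION: minimum coverage to call at all
    """Classify one mtDNA position from reference/alternate read counts."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "no_call"
    alt_fraction = alt_reads / depth
    if alt_fraction > hom_threshold:
        return "variant"          # used for haplogroup assignment
    if alt_fraction >= min_het_fraction:
        return "heteroplasmic"    # reported, but not used to place the sample
    return "reference"


def haplogroup_input(pileup: dict[int, tuple[int, int]]) -> set[int]:
    """Positions whose alternate allele clears the variant threshold."""
    return {pos for pos, (ref, alt) in pileup.items() if classify_site(ref, alt) == "variant"}


if __name__ == "__main__":
    # Hypothetical read counts: position -> (reference reads, alternate reads).
    pileup = {3010: (2, 48), 16519: (30, 20), 7028: (50, 0)}
    print(haplogroup_input(pileup))
```

Excluding heteroplasmic sites, rather than forcing them into one state, keeps an ambiguous low-level variant from pushing a sample into the wrong subclade.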

Geographic and Population Distribution

Y-DNA Patterns Worldwide

Y-DNA haplogroups exhibit distinct geographic distributions that reflect ancient human migrations and population expansions. In Africa, the continent of origin of modern humans, basal haplogroups such as A and B predominate among groups such as the Khoisan, comprising approximately 80% of their paternal lineages. Haplogroup E1b1a reaches frequencies of around 60% in West African populations and is associated with the spread of Bantu-speaking peoples. Outside these core areas, Y-DNA diversity is relatively low, with limited penetration of non-African clades until recent historical admixture. In the Americas, haplogroup Q dominates paternal lineages among Indigenous populations, with frequencies often exceeding 80% in groups such as the Maya and Amazonian peoples. This pattern traces back to Paleolithic migrations across the Beringian land bridge around 15,000–20,000 years ago. In Europe, patterns shift markedly. R1b dominates Western Europe, with frequencies ranging from 50% to over 90% in regions such as the Basque Country, linked to post-Ice Age expansions and later movements. In contrast, R1a prevails in Eastern Europe and parts of South Asia, occurring at 20-50% in populations such as Poles and northern Indians, reflecting eastern influences. Haplogroup J is prominent in the Middle East, at 20-40% among Arab and Levantine groups, and is tied to the spread of farming. Across Asia and Oceania, haplogroup distributions highlight coastal and inland migrations. Haplogroup C reaches about 60% among Aboriginal Australians, evidencing early arrivals via the southern route around 50,000 years ago. In East Asia, haplogroup O is the most common, at 50-70% in East Asian populations such as the Japanese, and derives from Southeast Asian expansions. Recent genomic studies from the 2020s in Siberia reveal a mix of Q and N haplogroups. Q reaches up to 90% among the Kets and N is dominant among Uralic speakers, indicating connections to Native American ancestors and later Eurasian admixture. These patterns inform inferences about major human dispersals.
The Out-of-Africa migration around 60,000-70,000 years ago is marked by the emergence and spread of macro-haplogroup CT beyond Africa, carrying precursors to most non-African Y-DNA lineages via a southern coastal route. Later, Indo-European expansions from the Pontic-Caspian steppe approximately 4,000-5,000 years ago are evidenced by the rapid dissemination of R1a and R1b subclades into Europe and South Asia, correlating with linguistic and archaeological shifts.
| Region | Major haplogroup(s) | Approximate frequency | Associated migration/event |
| --- | --- | --- | --- |
| Africa (Khoisan) | A, B | ~80% | Basal human origins |
| West Africa | E1b1a | ~60% | Bantu expansion |
| Western Europe | R1b | 50-90% | Bronze Age steppe influx |
| Eastern Europe/India | R1a | 20-50% | Indo-European dispersal |
| Middle East | J | 20-40% | Farming spread |
| Australia | C | ~60% | Initial Out-of-Africa wave |
| East Asia | O | 50-70% | Southeast Asian expansions |
| Siberia | Q, N | Variable (Q up to 90% in some groups) | Beringian links |
| Americas | Q | >80% | Beringian migration |

mtDNA Patterns Worldwide

In sub-Saharan Africa, mitochondrial DNA (mtDNA) haplogroups belonging to macrohaplogroup L dominate maternal lineages, reflecting deep-rooted African ancestry. Haplogroups L2 and L3 together comprise approximately 70% of mtDNA variation in many populations across this region, with L2 often exceeding 30% and L3 around 20-30% in West and Central African groups. One of the most ancient L branches reaches notably high frequencies in Pygmy populations of Central Africa, where it can account for 50% or more of lineages, underscoring their distinct genetic isolation and ancient divergence. Across Eurasia, mtDNA patterns show marked regional specificity tied to post-African dispersals. In Europe, haplogroup H is the most common lineage, averaging about 40% among modern populations, a distribution linked to Neolithic expansions. In contrast, Asia shows a predominance of macrohaplogroup M, which reaches about 50% in some East Asian groups, highlighting early coastal migrations along southern routes. Ancient European samples frequently carry haplogroup U5, which once constituted 20-30% of mtDNA but has declined to around 7% today owing to subsequent admixture. Among Native American populations, haplogroups A, B, C, D, and X occur at moderate to high frequencies. B (particularly sublineages such as B2) often exceeds 20% in South American Indigenous groups, tracing back to Asian founders. In Siberia and Japan, haplogroup distributions further illustrate adaptive dispersals into diverse environments. Haplogroups A and D are prevalent in Siberian populations, reaching 20-30% in Indigenous groups such as the Evenks and Yakuts, and are direct precursors of lineages in the Americas via ancient crossings of Beringia. In Japan, haplogroup N9 stands out, comprising approximately 7-10% of modern mtDNA and reflecting Jomon-period continuity with minimal external admixture.
Recent studies of Polynesian populations have revealed the rapid expansion of subhaplogroup B4a1a1, known as the "Polynesian motif." It accounts for over 90% of maternal lineages on remote islands such as the Marquesas, originated around 5,000 years ago, and spread eastward at rates exceeding 1,000 km per century. Taken together, global mtDNA patterns support inferences about key historical migrations, with frequency gradients showing a decline in L lineages with distance from Africa. The out-of-Africa dispersal around 60,000-70,000 years ago likely followed coastal routes, along which L3 gave rise to the non-African macrohaplogroups M and N. This is consistent with their higher frequencies along southern Eurasian peripheries than in inland areas. Later, the Beringian land bridge enabled entry into the Americas approximately 15,000-20,000 years ago by carriers of haplogroups A, B, C, and D, whose Siberian peaks form a gradient decreasing southward across the continents. Such gradients, combined with star-like phylogenies in peripheral regions, point to serial founder effects during maternal lineage expansions.

Applications in Research

Population Genetics and Migration Studies

Haplogroups have been instrumental in reconstructing human migration patterns by correlating specific lineages with archaeological and historical events. For instance, Y-chromosome haplogroup G2a is strongly associated with the spread of farming from Anatolia into Europe around 8,000–6,000 years ago. This is evidenced by ancient DNA from early agricultural sites showing high frequencies of G2a among farming populations. Similarly, haplogroup E1b1a correlates with the Bantu expansion, a series of migrations originating in West-Central Africa approximately 3,000–5,000 years ago. These migrations spread Bantu languages and farming practices across much of sub-Saharan Africa, with genetic diversity patterns indicating an eastern dispersal route. Mitochondrial DNA haplogroups, which trace exclusively maternal lineages, have illuminated female-mediated migration routes. One example is the southern coastal dispersal out of Africa via haplogroup M derivatives, which spread eastward along this route around 60,000 years ago. Admixture events are revealed through discrepancies between uniparental markers, highlighting sex-biased gene flow. In Native American populations, Y-chromosome haplogroups such as Q (of Siberian origin) dominate paternal lineages, while mtDNA haplogroups A, B, C, D, and X reflect maternal ancestry from the same Beringian source. However, post-contact admixture has introduced European Y-haplogroups at higher rates than European mtDNA, indicating male-biased European immigration. Such discordances aid in estimating effective population sizes (Ne). Lower Y-chromosome diversity relative to mtDNA suggests a smaller male Ne owing to social structures or bottlenecks. For example, simulations using haplogroup data estimate global male Ne at around 2,000–4,000 individuals in recent millennia, in contrast to a larger female Ne. Population bottlenecks and expansions are evident in haplogroup phylogenies and diversity metrics.
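One common way to quantify the diversity comparison above is Nei's unbiased gene diversity. It is computed as h = n/(n − 1) × (1 − Σpᵢ²), where pᵢ is the frequency of haplogroup i among n sampled individuals. The text does not specify which metric the cited studies used, so this is one standard choice among several. The two sample lists below are hypothetical and chosen only to show a low-diversity paternal pool beside a high-diversity maternal one.

```python
from collections import Counter
from typing import Sequence


def haplogroup_diversity(labels: Sequence[str]) -> float:
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared haplogroup frequencies)."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples from one population (illustrative, not real data).
Y_SAMPLE = ["R1b"] * 8 + ["I1"] * 2
MT_SAMPLE = ["H", "H", "U5", "U5", "J", "T2", "K", "W", "X", "I"]

if __name__ == "__main__":
    print(f"Y-DNA h  = {haplogroup_diversity(Y_SAMPLE):.3f}")
    print(f"mtDNA h  = {haplogroup_diversity(MT_SAMPLE):.3f}")
```

The n/(n − 1) factor corrects the downward bias of the plain 1 − Σpᵢ² estimate in small samples. This matters when comparing populations sampled at different sizes.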
A pronounced Y-chromosome bottleneck occurred 5,000–7,000 years ago across Africa and Eurasia, reducing male effective population size to about one-seventeenth of the female level. It was likely caused by competition between patrilineal kin groups and associated cultural practices rather than disease or climate. Recent studies from the 2020s confirm steppe migrations around 5,000 years ago. In these migrations, R1b lineages expanded from Yamnaya pastoralists into Europe, contributing up to 50% of modern Western European male ancestry and facilitating Indo-European language dispersal. Methodological advances in population genetics use haplogroups for hypothesis testing. Coalescent simulations model lineage branching under demographic scenarios, simulating haplogroup trees to infer migration timings and rates. An example is msprime-based approaches that integrate Y and mtDNA data with fossil-calibrated phylogenies. Approximate Bayesian computation (ABC) further refines these by comparing observed haplogroup frequencies against simulated datasets, enabling evaluation of complex models such as serial founder effects in human dispersals. Increasingly, haplogroup insights are integrated with autosomal DNA for holistic admixture mapping, improving resolution of effective population trajectories beyond what uniparental markers alone provide.
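The coalescent simulations described above can be illustrated with a minimal standard (Kingman) coalescent. This version is written from scratch in plain Python rather than with msprime. It models a single non-recombining haploid locus, such as Y-DNA or mtDNA, in a population of constant effective size Ne, which is a simplifying assumption. With k lineages remaining, each of the k(k − 1)/2 pairs coalesces at rate 1/Ne per generation. Summing the expected waiting times therefore gives a mean TMRCA of 2Ne(1 − 1/n) generations. Converting to years would require an assumed generation time. The function names are illustrative.

```python
import random


def simulate_tmrca(n: int, ne: float, rng: random.Random) -> float:
    """One draw of the TMRCA (in generations) of n haploid lineages under constant Ne."""
    if n < 2:
        raise ValueError("need at least two lineages")
    t = 0.0
    for k in range(n, 1, -1):
        pair_count = k * (k - 1) / 2
        # Waiting time until the next merger is exponential with rate (pairs / Ne).
        t += rng.expovariate(pair_count / ne)
    return t


def expected_tmrca(n: int, ne: float) -> float:
    """Analytical mean under the same model: 2 * Ne * (1 - 1/n) generations."""
    return 2.0 * ne * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [simulate_tmrca(10, 1_000, rng) for _ in range(20_000)]
    print(f"simulated mean: {sum(draws) / len(draws):.0f} generations")
    print(f"expected:       {expected_tmrca(10, 1_000):.0f} generations")
```

Because the expected TMRCA scales linearly with Ne, a smaller male Ne produces systematically shallower Y-DNA trees than mtDNA trees from the same population. This is the pattern the diversity comparisons above rely on.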

Genealogy and Forensic Analysis

Commercial genetic testing companies such as 23andMe and FamilyTreeDNA use haplogroup assignments to help individuals trace their ancestry. These services analyze mitochondrial DNA (mtDNA) for maternal lineages and Y-chromosome DNA for paternal lineages. They use single nucleotide polymorphisms (SNPs) to determine deep ancestral haplogroups that trace back thousands of years. Y-chromosome short tandem repeats (Y-STRs), by contrast, are used to identify more recent matches, particularly in surname projects. In these projects, participants compare haplotypes to find common male ancestors within the last few centuries. In forensic applications, mtDNA haplogroups play a key role in identifying unidentified remains because of their maternal inheritance and their persistence in degraded samples. For instance, remains of the missing Romanov children were found in 2007. In their identification, mtDNA analysis recovered the haplotype 16111T, 16357C, 16519C, 263G, 315.1C, 524.1A, 524.2C. This matched the haplotype of Tsarina Alexandra and linked the remains to known maternal relatives such as HRH Prince Philip, thereby verifying the maternal lineage. Y-haplogroups, often inferred from Y-STR profiles, assist in sexual assault investigations by isolating male perpetrator DNA in mixtures dominated by female victim profiles. Y-STR kits enable haplotype matching against databases such as the U.S. Y-STR Database, which contains over 10,000 profiles for frequency estimation. The Combined DNA Index System (CODIS) primarily uses autosomal STRs. Forensic labs extend analysis with Y-STR and SNP panels to predict haplogroups and strengthen identifications in challenging cases. Certain mtDNA backgrounds are associated with increased disease risk. For example, in Leber hereditary optic neuropathy (LHON), which is caused by mutations such as m.11778G>A, the haplogroup background influences penetrance and clinical expression. In pharmacogenomics, mtDNA haplogroup variation may affect drug responses. Haplogroup H carriers, for example, have been reported to show altered responses to some therapies, possibly because of differences in mitochondrial function.
These insights suggest potential clinical applications but require further validation. Despite these applications, haplogroup analysis faces limitations and ethical challenges. These include incomplete phylogenetic trees for recent subclades, which are being discovered rapidly and may not yet be fully resolved in testing databases. Privacy concerns are paramount, as tests expose sensitive genetic data to risks such as unauthorized sharing or sale. The 2025 bankruptcy of 23andMe led to the transfer of data from over 15 million customers to new entities with varying safeguards. In response, regulations such as Utah's Genetic Information Privacy Act (enacted 2021) mandate clear disclosures and restrict data sales without consent, aiming to strengthen protections amid growing commercial pressures.
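For the Y-STR database frequency estimation mentioned in the forensic discussion above, one approach is the counting method with an upper confidence bound. The haplotype is counted in the reference database, and a conservative upper limit on its population frequency is reported. The text does not state which method the U.S. Y-STR Database applies, so this should be read as an assumed, illustrative procedure. The sketch computes a one-sided Clopper-Pearson upper bound by bisection on the binomial CDF. For a haplotype never observed in n profiles, the bound reduces to 1 − α^(1/n). The function names are hypothetical.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), summed in log space to avoid overflow; needs 0 < p < 1."""
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(x + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def haplotype_frequency_upper_bound(x: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) Clopper-Pearson upper bound for a haplotype seen x times in n profiles."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    if x == n:
        return 1.0
    lo, hi = x / n, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        # The binomial CDF decreases as p grows; keep the root between lo and hi.
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical query: a haplotype absent from a database of 10,000 profiles.
    print(f"{haplotype_frequency_upper_bound(0, 10_000):.6f}")
```

Reporting the upper bound rather than the raw count x/n avoids assigning a frequency of zero to haplotypes that are simply absent from a finite database.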

References
