Missing heritability problem

In genetics, the missing heritability problem^[1]^[2]^[3]^[4]^[5]^[6] refers to a difference between heritability estimates obtained from early genome-wide association studies (GWAS) and heritability estimates from twin and family data across many physical and mental traits, including diseases, behaviors, and other phenotypes.

An influential review article^[7] in 2009 noted that the phenotypic variance explained by significant loci in GWAS up to that point was usually far less than expected from family studies. This gap was referred to as "missing heritability". Using height as a model trait, a 2010 paper showed that most of the missing heritability can be explained by large numbers of common variants whose effect sizes were too small to detect at the sample sizes then available.^[8] This conclusion has subsequently been confirmed with much larger samples, including a study of 5.4 million individuals that identified around 12,000 independent variants affecting human height.^[9] While studies of height have particularly high statistical power because of their very large sample sizes, other complex traits likely have a similar genetic architecture. Thus, the missing heritability problem is largely resolved by the presence of tens of thousands of small-effect variants that could not be detected in early GWAS.

Discovery

The missing heritability problem was named as such in 2008. The Human Genome Project led to optimistic forecasts that the large genetic contributions to many traits and diseases (identified by quantitative genetics, and behavioral genetics in particular) would soon be mapped to specific genes and their genetic variants. Early efforts relied on candidate-gene studies, which used small samples with limited genetic sequencing to examine single-nucleotide polymorphisms (SNPs) in specific genes believed to be involved. While many hits were found, they often failed to replicate in other studies. The exponential fall in genotyping costs led to the use of genome-wide association studies (GWASes), which could simultaneously examine variants across the whole genome, including all candidate genes, in larger samples than the earlier candidate-gene studies. For the first time these produced replicable signals; however, by 2008 investigators were surprised to find that the detected signals explained only a small fraction of the expected genetic variance.

Dilemma

Standard genetics methods have long estimated large heritabilities, such as 80%, for traits such as height or intelligence, yet none of the genes had been found despite sample sizes that, while small, should have been able to detect variants of reasonable effect size, such as 1 inch or 5 IQ points. If genes have such strong cumulative effects, where were they? Several resolutions have been proposed, holding that the missing heritability is some combination of the following:

Twin studies and other methods were grossly biased by issues long raised by their critics; there was little genetic influence to be found. On this view, the genes that supposedly underlie behavior-genetic estimates of heritability simply do not exist.^[10] For instance, twin studies may have neglected, by design, to measure cross-cultural environmental variation.^[11]
Genetic effects are actually epigenetic. Among many proposals, a model has been introduced that accounts for the effect of epigenetic inheritance on the risk and recurrence risk of a complex disease.^[4]
Genetic effects are generally non-additive and due to complex interactions. In the limiting pathway (LP) model, a trait depends on the values of k inputs, each of which can be rate-limiting because of stoichiometric ratios, reactants required in a biochemical pathway, or proteins required for transcription of a gene. Each of these k inputs is a strictly additive trait that depends on a set of common or rare variants. When k = 1, the LP model is simply a standard additive trait.^[2]
Genetic effects are not due to the common SNPs examined in the candidate-gene studies and GWASes, but to very rare mutations, copy-number variations, and other exotic kinds of genetic variants. These variants tend to be harmful and are kept at low frequencies by natural selection. Whole-genome sequencing would be required to track down specific rare variants.
Traits are all misdiagnoses: one person's 'schizophrenia' has entirely different causes from another person's, so while a gene may be involved in one case, it will not be involved in another, rendering GWASes futile.
GWASes are unable to detect genes with moderate effects on phenotypes when those genes segregate at high frequencies.^[12]
Traits are genuine but inconsistently diagnosed or genetically influenced from country to country and time to time, leading to measurement error which, combined with genetic heterogeneity due to either race or environment, will bias meta-analyzed GWAS and GCTA results towards zero.^[13]^[14]^[15]^[16]^[17]^[18]
Genetic effects do act through common SNPs acting additively, but are highly polygenic: dispersed over hundreds or thousands of variants, each of small effect (such as a fraction of an inch or a fifth of an IQ point) and each with low prior probability. Such variants are unexpected enough that a candidate-gene study is unlikely to select the right SNP out of hundreds of thousands of known SNPs, and GWASes up to 2010, with n < 20,000, would be unable to find hits reaching genome-wide statistical significance. Much larger GWAS sample sizes, often n > 100,000, would be required to find any hits at all, and the number of hits would increase steadily after that.
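
The limiting pathway (LP) model listed above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the published model: it assumes the trait equals the smallest (rate-limiting) of the k inputs, treats each input as a standard normal value with no environmental noise, and uses the hypothetical function name additive_r2. Because the inputs are interchangeable, the best additive predictor weights them equally, so the sketch reports the squared correlation between the trait and the sum of the inputs.

```python
import random


def additive_r2(k, n_people=20000, seed=1):
    """Share of trait variance captured by an additive (linear) predictor.

    Assumption for illustration: the trait is the minimum of k inputs,
    each drawn as an independent standard normal value.
    """
    rng = random.Random(seed)
    sums, traits = [], []
    for _ in range(n_people):
        inputs = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sums.append(sum(inputs))    # additive predictor (equal weights)
        traits.append(min(inputs))  # limiting input sets the trait
    mean_s = sum(sums) / n_people
    mean_t = sum(traits) / n_people
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(sums, traits))
    var_s = sum((s - mean_s) ** 2 for s in sums)
    var_t = sum((t - mean_t) ** 2 for t in traits)
    # Squared correlation between the trait and the additive predictor.
    return cov * cov / (var_s * var_t)


for k in (1, 2, 4):
    print(k, round(additive_r2(k), 3))
```

With k = 1 the additive predictor captures essentially all of the variance, matching the statement that the LP model then reduces to a standard additive trait; with more inputs a growing share of the variance is non-additive. This sketch concerns only the non-additive proposal and is separate from the highly polygenic, additive explanation in the last item of the list.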

This highly polygenic resolution was supported by the introduction of genome-wide complex trait analysis (GCTA) in 2010, which demonstrated that trait similarity could be predicted from the genetic similarity of unrelated strangers on common SNPs treated additively; for many traits the SNP heritability was indeed a substantial fraction of the overall heritability. The GCTA results were further supported by findings that a small percentage of trait variance could be predicted, even in GWASes without any genome-wide statistically significant hits, by a linear model including all SNPs regardless of p-value. If there were no SNP contribution this would be unlikely, but it is what one would expect from SNPs whose effects were very imprecisely estimated by a too-small sample. Combined with the upper bound on maximum effect sizes set by the GWASes up to then, this strongly implied that the highly polygenic theory was correct. Complex traits for which increasingly large GWASes yielded the first hits, and then growing numbers of hits as sample sizes rose from n < 20,000 to n > 100,000 or n > 300,000, include height,^[19] educational attainment,^[20] and schizophrenia.
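
The dependence of hits on sample size can be made concrete with a rough power calculation. The following is a minimal sketch, not a method taken from the cited studies: it assumes a single variant tested with a two-sided z-test under a normal approximation, the conventional genome-wide significance threshold of 5e-8, and a hypothetical variant explaining 0.01% of trait variance (q2 = 1e-4). The function name gwas_power is illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gwas_power(n, q2, alpha=5e-8):
    """Approximate power to detect one variant in a GWAS of n people.

    q2 is the fraction of trait variance the variant explains (assumed);
    alpha is the genome-wide significance threshold (conventional 5e-8).
    """
    # Critical |z| for a two-sided test at level alpha.
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)
    # Expected z-score of the variant (square root of the non-centrality).
    shift = sqrt(n * q2 / (1.0 - q2))
    # Probability that the observed z falls beyond either critical value.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (20_000, 100_000, 300_000, 1_000_000):
    print(f"n={n:>9,}  power={gwas_power(n, q2=1e-4):.4f}")
```

Under these assumptions, power is negligible at n = 20,000, still small at n = 100,000, roughly one half at n = 300,000, and close to one at n = 1,000,000. That is the qualitative pattern described above: no hits in early GWASes, then steadily more as samples grew.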

References

[1] Manolio, T. A.; Collins, F. S.; Cox, N. J.; Goldstein, D. B.; Hindorff, L. A.; Hunter, D. J.; McCarthy, M. I.; Ramos, E. M.; Cardon, L. R.; Chakravarti, A.; Cho, J. H.; Guttmacher, A. E.; Kong, A.; Kruglyak, L.; Mardis, E.; Rotimi, C. N.; Slatkin, M.; Valle, D.; Whittemore, A. S.; Boehnke, M.; Clark, A. G.; Eichler, E. E.; Gibson, G.; Haines, J. L.; MacKay, T. F. C.; McCarroll, S. A.; Visscher, P. M. (2009). "Finding the missing heritability of complex diseases". Nature. 461 (7265): 747–753. Bibcode:2009Natur.461..747M. doi:10.1038/nature08494. PMC 2831613. PMID 19812666.
[2] Zuk, O.; Hechter, E.; Sunyaev, S. R.; Lander, E. S. (2012). "The mystery of missing heritability: Genetic interactions create phantom heritability". Proceedings of the National Academy of Sciences. 109 (4): 1193–1198. Bibcode:2012PNAS..109.1193Z. doi:10.1073/pnas.1119675109. PMC 3268279. PMID 22223662.
[3] Lee, S. H.; Wray, N. R.; Goddard, M. E.; Visscher, P. M. (2011). "Estimating Missing Heritability for Disease from Genome-wide Association Studies". American Journal of Human Genetics. 88 (3): 294–305. doi:10.1016/j.ajhg.2011.02.002. PMC 3059431. PMID 21376301.
[4] Slatkin, M. (2009). "Epigenetic Inheritance and the Missing Heritability Problem". Genetics. 182 (3): 845–850. doi:10.1534/genetics.109.102798. PMC 2710163. PMID 19416939.
[5] Eichler, E. E.; Flint, J.; Gibson, G.; Kong, A.; Leal, S. M.; Moore, J. H.; Nadeau, J. H. (2010). "Missing heritability and strategies for finding the underlying causes of complex disease". Nature Reviews Genetics. 11 (6): 446–450. doi:10.1038/nrg2809. PMC 2942068. PMID 20479774.
[6] Maher, Brendan (2008). "Personal genomes: The case of the missing heritability". Nature. 456 (7218): 18–21. doi:10.1038/456018a. PMID 18987709.
[7] Manolio et al. (2009), as in [1]. PMID 19812666.
[8] PMID 20562875.
[9] PMID 36224396.
[10] Chaufan, Claudia; Joseph, Jay (April 2013). "The 'Missing Heritability' of Common Disorders: Should Health Researchers Care?". International Journal of Health Services. 43 (2): 281–303. doi:10.2190/hs.43.2.f. ISSN 0020-7314. PMID 23821906. S2CID 25092977.
[11] Gillett, George (April 2024). "The problem with genetic heritability estimates in psychiatry: 'missing heritability' or missed cross-cultural environmental variation?". Psychiatry Research. 336 115916. doi:10.1016/j.psychres.2024.115916. PMID 38640570.
[12] Caballero, Armando; Tenesa, Albert; Keightley, Peter D. (December 2015). "The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses". Genetics. 201 (4): 1601–1613. doi:10.1534/genetics.115.177220. ISSN 1943-2631. PMC 4676519. PMID 26482794.
[13] De Vlaming, Ronald; Okbay, Aysu; Rietveld, Cornelius A.; Johannesson, Magnus; Magnusson, Patrik K.E.; Uitterlinden, André G.; Van Rooij, Frank J.A.; Hofman, Albert; Groenen, Patrick J.F.; Thurik, A. Roy; Koellinger, Philipp D. (2016). "Meta-GWAS Accuracy and Power (MetaGAP) calculator shows that hiding heritability is partially due to imperfect genetic correlations across studies". bioRxiv 10.1101/048322.
[14] Wray, Naomi R.; Maier, Robert (2014). "Genetic Basis of Complex Genetic Disease: The Contribution of Disease Heterogeneity to Missing Heritability". Current Epidemiology Reports. 1 (4): 220–227. doi:10.1007/s40471-014-0023-3.
[15] Wray, Naomi R.; Lee, Sang Hong; Kendler, Kenneth S. (2012). "Impact of diagnostic misclassification on estimation of genetic correlations using genome-wide genotypes". European Journal of Human Genetics. 20 (6): 668–674. doi:10.1038/ejhg.2011.257. PMC 3355255. PMID 22258521.
[16] Lee et al. (2013a). "Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs".
[17] Lee et al. (2013b). "General framework for meta-analysis of rare variants in sequencing association studies".
[18] Sham & Purcell (2014). "Statistical power and significance testing in large-scale genetic studies".
[19] Wood et al. (2014). "Defining the role of common variation in the genomic and biological architecture of adult human height".
[20] Chabris et al. (2012) reported only 1 possible hit using a few thousand individuals; Rietveld et al. (2013), "GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment", reported 3 hits with n = 100k; Okbay et al. (2016), "Genome-wide association study identifies 74 loci associated with educational attainment", reported 74 hits using n = 293k and ~160 when extended to n = 404k.



