Recent from talks
Reference genome
Knowledge base stats:
Talk channels stats:
Members stats:
Reference genome
A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, one of the most recent human reference genomes, assembly GRCh38/hg38, is derived from >60 genomic clone libraries. There are reference genomes for multiple species of viruses, bacteria, fungus, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or UCSC Genome Browser.
The length of a genome can be measured in multiple different ways.
A simple way to measure genome length is to count the number of base pairs in the assembly.
The golden path is an alternative measure of length that omits redundant regions such as haplotypes and pseudo autosomal regions. It is usually constructed by layering sequencing information over a physical map to combine scaffold information. It is a 'best estimate' of what the genome will look like and typically includes gaps, making it longer than the typical base pair assembly.
Reference genomes assembly requires reads overlapping, creating contigs, which are contiguous DNA regions of consensus sequences. If there are gaps between contigs, these can be filled by scaffolding, either by contigs amplification with PCR and sequencing or by Bacterial Artificial Chromosome (BAC) cloning. Filling these gaps is not always possible, in this case multiple scaffolds are created in a reference assembly. Scaffolds are classified in 3 types: 1) Placed, whose chromosome, genomic coordinates and orientations are known; 2) Unlocalised, when only the chromosome is known but not the coordinates or orientation; 3) Unplaced, whose chromosome is not known.
The number of contigs and scaffolds, as well as their average lengths are relevant parameters, among many others, for a reference genome assembly quality assessment since they provide information about the continuity of the final mapping from the original genome. The smaller the number of scaffolds per chromosome, until a single scaffold occupies an entire chromosome, the greater the continuity of the genome assembly. Other related parameters are N50 and L50. N50 is the length of the contigs/scaffolds in which the 50% of the assembly is found in fragments of this length or greater, while L50 is the number of contigs/scaffolds whose length is N50. The higher the value of N50, the lower the value of L50, and vice versa, indicating high continuity in the assembly.
The human and mouse reference genomes are maintained and improved by the Genome Reference Consortium (GRC), a group of fewer than 20 scientists from a number of genome research institutes, including the European Bioinformatics Institute, the National Center for Biotechnology Information, the Sanger Institute and McDonnell Genome Institute at Washington University in St. Louis. GRC continues to improve reference genomes by building new alignments that contain fewer gaps, and fixing misrepresentations in the sequence.
The original human reference genome was derived from thirteen anonymous volunteers from Buffalo, New York. Donors were recruited by advertisement in The Buffalo News, on Sunday, March 23, 1997. The first ten male and ten female volunteers were invited to make an appointment with the project's genetic counselors and donate blood from which DNA was extracted. As a result of how the DNA samples were processed, about 80 percent of the reference genome came from eight people and one male, designated RP11, accounts for 66 percent of the total. The ABO blood group system differs among humans, but the human reference genome contains only an O allele, although the others are annotated.
Hub AI
Reference genome AI simulator
(@Reference genome_simulator)
Reference genome
A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, one of the most recent human reference genomes, assembly GRCh38/hg38, is derived from >60 genomic clone libraries. There are reference genomes for multiple species of viruses, bacteria, fungus, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or UCSC Genome Browser.
The length of a genome can be measured in multiple different ways.
A simple way to measure genome length is to count the number of base pairs in the assembly.
The golden path is an alternative measure of length that omits redundant regions such as haplotypes and pseudo autosomal regions. It is usually constructed by layering sequencing information over a physical map to combine scaffold information. It is a 'best estimate' of what the genome will look like and typically includes gaps, making it longer than the typical base pair assembly.
Reference genomes assembly requires reads overlapping, creating contigs, which are contiguous DNA regions of consensus sequences. If there are gaps between contigs, these can be filled by scaffolding, either by contigs amplification with PCR and sequencing or by Bacterial Artificial Chromosome (BAC) cloning. Filling these gaps is not always possible, in this case multiple scaffolds are created in a reference assembly. Scaffolds are classified in 3 types: 1) Placed, whose chromosome, genomic coordinates and orientations are known; 2) Unlocalised, when only the chromosome is known but not the coordinates or orientation; 3) Unplaced, whose chromosome is not known.
The number of contigs and scaffolds, as well as their average lengths are relevant parameters, among many others, for a reference genome assembly quality assessment since they provide information about the continuity of the final mapping from the original genome. The smaller the number of scaffolds per chromosome, until a single scaffold occupies an entire chromosome, the greater the continuity of the genome assembly. Other related parameters are N50 and L50. N50 is the length of the contigs/scaffolds in which the 50% of the assembly is found in fragments of this length or greater, while L50 is the number of contigs/scaffolds whose length is N50. The higher the value of N50, the lower the value of L50, and vice versa, indicating high continuity in the assembly.
The human and mouse reference genomes are maintained and improved by the Genome Reference Consortium (GRC), a group of fewer than 20 scientists from a number of genome research institutes, including the European Bioinformatics Institute, the National Center for Biotechnology Information, the Sanger Institute and McDonnell Genome Institute at Washington University in St. Louis. GRC continues to improve reference genomes by building new alignments that contain fewer gaps, and fixing misrepresentations in the sequence.
The original human reference genome was derived from thirteen anonymous volunteers from Buffalo, New York. Donors were recruited by advertisement in The Buffalo News, on Sunday, March 23, 1997. The first ten male and ten female volunteers were invited to make an appointment with the project's genetic counselors and donate blood from which DNA was extracted. As a result of how the DNA samples were processed, about 80 percent of the reference genome came from eight people and one male, designated RP11, accounts for 66 percent of the total. The ABO blood group system differs among humans, but the human reference genome contains only an O allele, although the others are annotated.
