Twenty years ago this month, the International Human Genome Sequencing Consortium announced the first draft of the human genome reference sequence. The Human Genome Project, as it was called, required 11 years of work and involved more than 1,000 scientists from 40 countries. This reference, however, did not represent a single individual. It was instead a composite of multiple people, and it could not accurately capture the complexity of human genetic variation.
Building on this, scientists have conducted several sequencing projects over the last 20 years to identify and catalog genetic differences between an individual and the reference genome. Those projects usually focused on small, single-base changes and missed larger genetic alterations. Current technologies are now beginning to detect and characterize larger differences – called structural variants – such as insertions of new genetic material. Structural variants are more likely than smaller genetic differences to interfere with gene function.
The new study in Science, "Haplotype-resolved diverse human genomes and integrated analysis of structural variation," announced a new and significantly more comprehensive reference dataset that was obtained using a combination of advanced sequencing and mapping technologies. The new reference dataset comprises 64 assembled human genomes, representing 25 different human populations from across the globe. Importantly, each of the genomes was assembled without guidance from the first human genome composite. As a result, the new dataset better captures genetic differences among human populations.