This article is a preprint and has not been certified by peer review
Posted May 27, 2021
doi: https://doi.org/10.1101/2021.05.26.445798
Sergey Nurk1,*, Sergey Koren1,*, Arang Rhie1,*, Mikko Rautiainen1,*, Andrey V. Bzikadze2, Alla Mikheenko3, Mitchell R. Vollger4, Nicolas Altemose5, Lev Uralsky6,7, Ariel Gershman8, Sergey Aganezov9, Savannah J. Hoyt10, Mark Diekhans11, Glennis A. Logsdon4, Michael Alonge9, Stylianos E. Antonarakis12, Matthew Borchers13, Gerard G. Bouffard14, Shelise Y. Brooks14, Gina V. Caldas15, Haoyu Cheng16,17, Chen-Shan Chin18, William Chow19, Leonardo G. de Lima13, Philip C. Dishuck4, Richard Durbin21, Tatiana Dvorkina3, Ian T. Fiddes22, Giulio Formenti23,24, Robert S. Fulton25, Arkarachai Fungtammasan18, Erik Garrison11,26, Patrick G.S. Grady10, Tina A. Graves-Lindsay27, Ira M. Hall28, Nancy F. Hansen29, Gabrielle A. Hartley10, Marina Haukness11, Kerstin Howe19, Michael W. Hunkapiller30, Chirag Jain1,31, Miten Jain11, Erich D. Jarvis23,24, Peter Kerpedjiev32, Melanie Kirsche9, Mikhail Kolmogorov33, Jonas Korlach30, Milinn Kremitzki27, Heng Li16,17, Valerie V. Maduro34, Tobias Marschall35, Ann M. McCartney1, Jennifer McDaniel36, Danny E. Miller4,37, James C. Mullikin14,29, Eugene W. Myers38, Nathan D. Olson36, Benedict Paten11, Paul Peluso30, Pavel A. Pevzner33, David Porubsky4, Tamara Potapova13, Evgeny I. Rogaev6,7,39,40, Jeffrey A. Rosenfeld41, Steven L. Salzberg9,42, Valerie A. Schneider43, Fritz J. Sedlazeck44, Kishwar Shafin11, Colin J. Shew20, Alaina Shumate42, Yumi Sims19, Arian F. A. Smit45, Daniela C. Soto20, Ivan Sović30,46, Jessica M. Storer45, Aaron Streets5,47, Beth A. Sullivan48, Françoise Thibaud-Nissen43, James Torrance19, Justin Wagner36, Brian P. Walenz1, Aaron Wenger30, Jonathan M. D. Wood19, Chunlin Xiao43, Stephanie M. Yan49, Alice C. Young14, Samantha Zarate9, Urvashi Surti50, Rajiv C. McCoy49, Megan Y. Dennis20, Ivan A. Alexandrov3,7,51, Jennifer L. Gerton13, Rachel J. O'Neill10, Winston Timp8,42, Justin M. Zook36, Michael C. Schatz9,49, Evan E. Eichler4,24,†, Karen H. Miga11,†, Adam M. Phillippy1,†
Abstract
In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to the human reference genome since its initial release. The new T2T-CHM13 reference includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding. The newly completed regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies for the first time.