PgmNr 1582: The portrait of fully phased assembled diploid human genome.
Authors: A. Fungtammasan; N. Thangaraj; B. Hannigan; N. Hill; C.S. Chin
Affiliation: DNAnexus, Mountain View, CA.
Personalized de novo genome assembly offers potential benefits over standard whole-genome resequencing: it can reveal complex structural variants and unique sequences with less reference bias. However, a key challenge for assembly-based variant calling lies in heterozygous regions, which obscure variant calls in an unphased assembly. Beyond variant calling, genomic phasing places genetic variation in context, which is critical for studying allele-specific regulation, compound heterozygous mutations, and population genomics. For these reasons, de novo assembly and phasing have become increasingly important in personalized genomics. Nevertheless, phasing an assembled human genome remains challenging because of the low heterozygosity of human populations.
Here we present a fully phased diploid human assembly built from Oxford Nanopore Technologies (ONT) single-molecule sequencing data using a trio-binning approach. The ultra-long ONT data for the HG002 individual were provided by the Genome in a Bottle (GIAB) Consortium and UCSC. These raw long reads were fully segregated into paternal and maternal sets using parent-specific k-mers derived from the parents' Illumina data. The paternal and maternal read sets were then assembled into two haploid genomes of about 3 Gbp each, which were polished to improve their overall accuracy. We then explore the benefits of a fully phased assembly for variant discovery, comparing diploid-assembly-based variant calling with mapping-based variant calling against GRCh38 and using the GIAB truth set as the ground truth. The fully phased assembled human genome is publicly available on the DNAnexus platform and the GIAB Consortium FTP site.
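The read-binning idea can be illustrated with a toy sketch: k-mers present in one parent's short reads but absent from the other's serve as haplotype markers, and each long read from the child is assigned to the parent whose markers it contains more often. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes exact matching of canonical k-mers (so reads from either strand are handled), a tiny k chosen only for readability, a hypothetical `min_count` filter standing in for the error-k-mer filtering real tools perform, and a simple majority vote in which reads with no clear majority are labeled "unassigned". All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield canonical k-mers of seq, skipping any that contain non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            yield canonical(kmer)


def kmer_set(reads: Iterable[str], k: int, min_count: int = 1) -> Set[str]:
    """Collect k-mers seen at least min_count times (hypothetical error filter)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def parent_specific(maternal_reads: Iterable[str], paternal_reads: Iterable[str],
                    k: int, min_count: int = 1) -> Tuple[Set[str], Set[str]]:
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = kmer_set(maternal_reads, k, min_count)
    pat = kmer_set(paternal_reads, k, min_count)
    return mat - pat, pat - mat


def classify(read: str, mat_only: Set[str], pat_only: Set[str], k: int) -> str:
    """Assign a child long read by majority vote over parent-specific k-mer hits."""
    mat_hits = pat_hits = 0
    for km in kmers(read, k):
        if km in mat_only:
            mat_hits += 1
        elif km in pat_only:
            pat_hits += 1
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "unassigned"  # tie or no informative k-mers


def bin_reads(reads: Iterable[str], mat_only: Set[str], pat_only: Set[str],
              k: int) -> Dict[str, List[str]]:
    """Split child reads into maternal, paternal, and unassigned bins."""
    bins: Dict[str, List[str]] = {"maternal": [], "paternal": [], "unassigned": []}
    for read in reads:
        bins[classify(read, mat_only, pat_only, k)].append(read)
    return bins
```

For example, with `k=5`, maternal reads `["AAAAACCCCC"]` and paternal reads `["AAAAAGGGGG"]`, the shared k-mers (`AAAAA`, and `CCCCC`, which is the reverse complement of `GGGGG`) are discarded, and a child read `"CCCCCTTTTT"` (the reverse complement of the paternal read) is binned as paternal. Each bin would then be assembled separately, yielding one haplotype per parent.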