Keywords: Computational tools; Variant calling; Bioinformatics; Copy number/structural variation

J. Didion; S. Osazuwa; A. Zalcman; N. Thangaraj

Affiliation: DNAnexus, Mountain View, CA.

As genomic technologies move from the research laboratory to the clinic, the scale of genomic data is growing at an increasingly rapid pace. It is critically important for researchers, clinicians, and patients that data analysis pipelines do not become a bottleneck to diagnosis and treatment. Accelerating next-generation sequencing (NGS) pipelines requires creative approaches to optimization that consider biology, informatics, and systems architecture together, especially in cloud-based computing environments. Common index formats, such as BAM Index (BAI) and Tabix (TBI), contain coarse-grained information on the density of NGS reads along the genome that may be leveraged for rapid approximation of read depth-based metrics. We present IndexTools, an open-source toolkit for extremely fast NGS analysis based on index files. We demonstrate that IndexTools 1) substantially accelerates parallel processing of BAM and VCF files by optimizing file splitting based on estimated read density; 2) provides reasonably accurate estimates of genome coverage (similar to indexcov, Pedersen et al., 2017); 3) infers sex chromosome and mitochondrial genome copy number relative to the autosomes; and 4) accurately calls large deletions and copy-number expansions in minutes (compared to hours for traditional CNV pipelines). We further show that small variant and CNV calling pipelines implemented using IndexTools on the DNAnexus platform save considerable time and cost compared to equivalent non-accelerated pipelines. We expect that IndexTools will substantially reduce the turn-around time for deriving insight from NGS data.
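
To illustrate the idea of density-based file splitting, the following is a minimal Python sketch and not the IndexTools implementation. It assumes that a per-window weight (for example, the compressed bytes spanned by each 16,384 bp window of a BAI or TBI linear index) has already been extracted from the index file; the function name `split_by_density` and the greedy cut rule are hypothetical choices made for this example. Contiguous windows are grouped so that each partition carries a similar share of the total weight, which is one way parallel workers could receive comparable amounts of data.

```python
from itertools import accumulate
from typing import List, Tuple

WINDOW = 16_384  # window size (bp) of the BAI/TBI linear index


def split_by_density(weights: List[float], n_parts: int) -> List[Tuple[int, int]]:
    """Group consecutive index windows into at most n_parts contiguous,
    half-open (start_bp, end_bp) ranges of roughly equal total weight.

    `weights` is an assumed input: one estimated data volume per window.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    if not weights:
        return []
    total = sum(weights)
    cuts = []
    if total > 0:
        k = 1  # index of the next cumulative-weight target
        for i, cum in enumerate(accumulate(weights)):
            # Cut after window i once the running total reaches k/n_parts
            # of the overall weight; a very dense window may satisfy
            # several targets at once.
            while k < n_parts and cum >= total * k / n_parts:
                cuts.append(i + 1)
                k += 1
    # Deduplicate cut points so that no empty partition is emitted.
    edges = sorted({0, len(weights), *cuts})
    return [(a * WINDOW, b * WINDOW) for a, b in zip(edges, edges[1:])]


# Two dense windows followed by eight sparse ones, split into two parts:
# each resulting range holds half of the total weight.
print(split_by_density([4, 4, 1, 1, 1, 1, 1, 1, 1, 1], 2))
# → [(0, 32768), (32768, 163840)]
```

Splitting on cumulative weight rather than on genomic length means that a short, densely covered region and a long, sparsely covered region can end up as separate partitions of similar size. When a single window outweighs a whole target share, the sketch returns fewer than `n_parts` ranges instead of emitting empty ones.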