PgmNr 297: The unbiased length spectrum of human de novo mutations in 4,330 children.
Authors:
A. Farrell 1; W. Richards 1; A. Docherty 2; H. Coon 3, 4; G. Marth 1
1) Biology, University of Utah, Salt Lake City, Utah; 2) University of Utah, Department of Psychiatry and Human Genetics, Salt Lake City, UT; 3) University of Utah, Department of Psychiatry, Salt Lake City, UT; 4) University of Utah, Department of Biomedical Informatics, Salt Lake City, UT
De novo mutations introduce new genetic variation into the population and can cause genetic disease. Despite their importance, the true spectrum of human de novo mutations has remained elusive, primarily for technical reasons: accurately detecting the roughly 70 de novo mutations among 6 billion base pairs is a formidable challenge, compounded by the biases of mapping-based variant detection methods. Moreover, distinct classes of tools are currently used to reliably detect single nucleotide variations (SNVs) and short insertions-deletions (INDELs) of up to ~50bp on the one hand, and structural variation events (SVs) of 500bp or larger on the other, leaving a “detection gap” for medium-sized (50-500bp) events.
We developed RUFUS, a k-mer based, reference-free detection method for de novo mutations of all types and sizes. With RUFUS, mutations are identified and assembled before reads are compared to the reference, which removes all reference bias and variant size limitations and results in completely even sensitivity and specificity across variations of all sizes. Experimental validations of RUFUS mutation calls in multiple datasets indicate extremely high accuracy. We applied this method to detect de novo mutations in 4,330 children in the Simons Foundation Simons Simplex Collection dataset, the largest dataset to date in which to study de novo genetic variation. Here we present the first comprehensive de novo somatic mutation dataset in which variants of all sizes were detected by a single algorithm. Across both autistic probands and siblings, we saw an average of 72.6 de novo mutations per sample; 86.14% of these are SNVs, 13.58% are 1-50bp INDELs, 0.20% are medium-sized (50-500bp) INDELs, and 0.08% are >500bp SV events (i.e. on average 62.54 SNVs, 9.86 short INDELs, 0.145 medium INDELs, and 0.058 SVs per individual). When the size of the events is taken into account, de novo SVs alter the genome of the child by far the most, by 3,804bp on average, and are present in approximately 1 of 17 births. Medium-sized INDELs, the rate of which could not be fully ascertained until this study, alter on average 20.69bp per birth, an effect comparable to that of SNVs (62.54bp per birth), and occur in 1 of 7 births. The overall distribution of de novo event size is indistinguishable between probands and siblings. However, de novo variation within autism-associated genes, as detected with RUFUS, is markedly higher (more than double) in the probands than in the siblings, a phenomenon that is under ongoing investigation.
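The per-individual counts and the "1 of N births" figures follow arithmetically from the reported mean and percentages. The short Python sketch below reproduces that arithmetic as a consistency check; it is not part of RUFUS. All function and variable names are hypothetical, and the Poisson variant is an added assumption that the abstract does not state.

```python
import math

MEAN_DNMS = 72.6  # average de novo mutations per sample, as reported

# Reported share of each size class, in percent
SHARE_PCT = {
    "SNV": 86.14,
    "short INDEL (1-50bp)": 13.58,
    "medium INDEL (50-500bp)": 0.20,
    "SV (>500bp)": 0.08,
}


def per_individual_counts(mean_total, share_pct):
    """Convert percentage shares into mean event counts per individual."""
    return {name: mean_total * pct / 100.0 for name, pct in share_pct.items()}


def births_per_event(mean_count):
    """Naive reading of '1 of N births': one event per 1/mean births."""
    return 1.0 / mean_count


def births_per_carrier_poisson(mean_count):
    """Births per child carrying at least one event.

    Assumption (not from the abstract): event counts per child are
    Poisson distributed, so P(at least one) = 1 - exp(-mean).
    """
    return 1.0 / (1.0 - math.exp(-mean_count))


counts = per_individual_counts(MEAN_DNMS, SHARE_PCT)
for name, count in counts.items():
    print(f"{name}: {count:.3f} per individual, "
          f"about 1 in {births_per_event(count):.1f} births")
```

For rare classes such as SVs, the naive and Poisson estimates are close, which is why the mean count can be read roughly as the fraction of births affected; for SNVs and short INDELs, where the mean exceeds one, the "1 in N" reading is not meaningful.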