PgmNr 98: Public platform with 42,291 exome control samples enables association studies without genotype sharing.
Authors:
M. Artomov 1,2; A.A. Loboda 2,3; M.N. Artyomov 4; M.J. Daly 1,2,5
1) Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; 2) Broad Institute, Cambridge, MA; 3) ITMO University, St. Petersburg, Russia; 4) Department of Immunology and Pathology, Washington University in St. Louis, St. Louis, MO; 5) Institute for Molecular Medicine Finland, Helsinki, Finland
Introduction: Acquiring a sufficiently powered cohort of control samples can be time-consuming or, sometimes, impossible. Accordingly, the ability to leverage control samples that were already collected and sequenced elsewhere could dramatically improve power in genetic association studies. However, because the majority of human DNA samples genotyped and sequenced to date are subject to strict data-sharing regulations, large-scale sharing of samples, control samples in particular, is extremely challenging. We developed a method that selects the best-matching controls from an external pool of samples while complying with restrictions on personal genotype data.
Materials and Methods: We provide a web platform that stores 42,291 exome sequencing samples available for control selection and a complementary R package, run on the user side, that generates anonymous data from case genotypes for upload to the web platform.
Results: Our approach uses singular value decomposition of the matrix of case genotypes to rank external controls by similarity to cases without disclosing any individual-level data. We demonstrate that this recovers accurate case-control association results for both ultra-rare and common variants, independently of the sequencing platform. We implemented a framework for meta-analysis of multiple ancestries and sequencing platforms as a single step on the user side. Finally, we provide free access to a database of 42,291 exomes to be used as external controls, which enables association studies for case cohorts lacking control subjects and facilitates data sharing among projects with strict regulations on individual-level data access.
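The idea of ranking controls from a decomposition of the case genotype matrix can be illustrated with a minimal sketch. It is not the platform's or the R package's actual implementation. It assumes, purely for illustration, that the user shares only the leading right singular vectors of the case matrix (per-variant loadings, not per-sample genotypes) and that controls are ranked by their residual distance to the subspace those vectors span. The function names `case_basis` and `rank_controls`, the number of components, and the toy genotypes are all hypothetical.

```python
import numpy as np


def case_basis(case_genotypes, n_components=2):
    """User side (assumed): summarize cases as leading right singular vectors.

    Input is a samples x variants matrix of allele counts. The returned
    basis has shape (n_components, n_variants) and contains no per-sample rows.
    """
    g = np.asarray(case_genotypes, dtype=float)
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    return vt[:n_components]


def rank_controls(basis, control_genotypes):
    """Server side (assumed): rank controls by distance to the case subspace."""
    c = np.asarray(control_genotypes, dtype=float)
    projected = c @ basis.T @ basis          # projection onto the case subspace
    residual = np.linalg.norm(c - projected, axis=1)
    order = np.argsort(residual, kind="stable")  # best-matching controls first
    return order, residual


# Toy data: four cases that vary only at the first two variants.
cases = [[2, 2, 0, 0],
         [1, 1, 0, 0],
         [2, 1, 0, 0],
         [1, 2, 0, 0]]

# Three candidate controls with differing similarity to the cases.
controls = [[0, 0, 2, 2],   # carries alleles never seen in cases
            [2, 1, 0, 0],   # lies within the case subspace
            [1, 1, 1, 0]]   # partly outside the case subspace

basis = case_basis(cases, n_components=2)
order, residual = rank_controls(basis, controls)
print(order.tolist())  # control indices, closest to the case subspace first
```

In this toy setting the control that lies within the case subspace ranks first and the control carrying alleles absent from all cases ranks last. A real analysis would also need variant filtering, a principled choice of the number of components, and checks on the resulting test statistics, none of which are shown here.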
Discussion: We present a freely accessible resource with a large-scale control database enabling association studies for “case-only” sequencing data with carefully selected controls and controllable error rates.