EXOME SEQUENCING IN 30,000 CASES DEFINES NOVEL RISK FACTORS FOR CROHN'S DISEASE
Presentation Number: 778
Authors: Christine Stevens1, Kai Yuan1, Aleksejs Sazonovs3, Guhan Ram Venkataraman4, Manuel Antonio Rivas4, John D. Rioux5, Dermot P.B. McGovern6, Ramnik Xavier1, Hailiang Huang1, Carl Anderson3, Mark J. Daly1,2, on behalf of the International IBD Genetics Consortium3
1Analytic and Translational Genetics Unit, Massachusetts General Hospital & Broad Institute, Boston, Massachusetts, United States; 2Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland; 3Wellcome Sanger Institute, Cambridge, Cambridgeshire, United Kingdom; 4Stanford University, Stanford, California, United States; 5Université de Montréal, Montréal, Quebec, Canada; 6Cedars-Sinai Medical Center, Los Angeles, California, United States.
Genome-wide association studies (GWASs) have identified more than 200 genomic regions associated with IBD. These signals are driven mainly by common variants in non-coding regions, which makes it difficult to extract biological insight. Targeted sequencing of genes in these regions has identified putative causal coding variants, often rarer than, and independent of, the initial GWAS hit. These coding variants have enabled more direct functional experiments that demonstrate causal mechanisms for at least ten genes.
To advance rare-variant discovery, we performed large-scale exome sequencing of more than 30,000 Crohn's disease (CD) cases, along with population controls, from more than 20 centers in the International IBD Genetics Consortium. Sequencing was performed at the Broad Institute using Nextera (11,125 CD cases) and TWIST (7,442 CD cases) exome captures, and at the Sanger Institute using whole-genome sequencing (6,404 CD cases) and an Agilent exome capture (3,848 CD cases). For further replication, 4,359 cases from Kiel were sequenced at Regeneron. For each sequencing experiment, a comparable or larger number of sequenced controls was available for genetic association analysis.
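As a quick consistency check, the per-platform case counts above can be tallied. The sketch below simply sums the reported numbers; the dataset labels and the `total_cases` helper are chosen for illustration, and controls are left out because the abstract gives no per-platform control counts.

```python
# CD case counts per sequencing effort, as reported in the abstract.
# The (center, platform) labels are descriptive strings chosen for this sketch.
CASE_COUNTS = {
    ("Broad", "Nextera exome"): 11_125,
    ("Broad", "TWIST exome"): 7_442,
    ("Sanger", "whole genome"): 6_404,
    ("Sanger", "Agilent exome"): 3_848,
    ("Regeneron", "Kiel replication"): 4_359,
}


def total_cases(counts, centers=None):
    """Sum CD cases, optionally restricted to a set of sequencing centers."""
    return sum(
        n
        for (center, _platform), n in counts.items()
        if centers is None or center in centers
    )


broad = total_cases(CASE_COUNTS, {"Broad"})
broad_sanger = total_cases(CASE_COUNTS, {"Broad", "Sanger"})
everything = total_cases(CASE_COUNTS)
print(broad, broad_sanger, everything)  # 18567 28819 33178
```

On these figures, the Broad and Sanger data sets together contain roughly 28,800 cases, and adding the Kiel replication sample brings the total above 33,000.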
We meta-analyzed the two Broad Institute exome captures. To extend beyond GWAS, we focused on rare and low-frequency coding variants with allele frequencies between 0.01% and 10%, and estimated that roughly 85% of all protein-coding variants in this frequency range are reliably analyzed in both exome captures. In total, 119 variants (figure below) showed association with CD at p < 2×10⁻⁴ and were advanced for further analysis in the additional data sets. Twenty-eight of these already reached exome-wide significance (p < 3×10⁻⁷) in this first analysis, including variants in genes identified by prior GWAS and sequencing efforts: NOD2, IL23R, LRRK2, TYK2, SLC39A8, HGFAC, IRGM and CARD9.
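The abstract does not state how the two Broad captures were combined. The sketch below assumes a fixed-effect inverse-variance meta-analysis of per-capture log odds ratios, one common choice, and then applies the allele-frequency window and the two p-value thresholds quoted above. The `CaptureResult` schema, the helper names, and all input numbers are hypothetical.

```python
import math
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MIN_AF, MAX_AF = 1e-4, 0.10   # 0.01% to 10% allele frequency
ADVANCE_P = 2e-4              # p < .0002: advance to additional data sets
EXOME_WIDE_P = 3e-7           # exome-wide significance


@dataclass
class CaptureResult:
    """Per-capture association statistics for one variant (hypothetical schema)."""
    log_or: float  # log odds ratio for CD
    se: float      # standard error of log_or


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)), computed via erfc."""
    return math.erfc(abs(z) / math.sqrt(2))


def inverse_variance_meta(results):
    """Fixed-effect inverse-variance meta-analysis of log odds ratios."""
    weights = [1.0 / r.se ** 2 for r in results]
    beta = sum(w * r.log_or for w, r in zip(weights, results)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, two_sided_p(beta / se)


def triage(allele_freq, results):
    """Classify a variant by the frequency window and p-value thresholds."""
    if not (MIN_AF <= allele_freq <= MAX_AF):
        return "out of frequency range", None
    _, _, p = inverse_variance_meta(results)
    if p < EXOME_WIDE_P:
        return "exome-wide significant", p
    if p < ADVANCE_P:
        return "advance to replication", p
    return "not advanced", p


# Hypothetical variants, each with (Nextera, TWIST) results.
print(triage(0.02, [CaptureResult(0.40, 0.06), CaptureResult(0.35, 0.07)])[0])
# exome-wide significant
print(triage(0.005, [CaptureResult(0.30, 0.09), CaptureResult(0.28, 0.10)])[0])
# advance to replication
```

Inverse-variance weighting lets the better-powered capture (smaller standard error) dominate the pooled estimate; a sample-size-weighted Z-score meta-analysis is an alternative when only p-values and effect directions are shared between sites.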
After replication, the coding variants in novel genes that exceed study-wise significance include several in genes related to pathogenic processes in IBD: DOK2:P274L (downstream of tyrosine kinase 2; myeloid cell development and negative regulation of TLR2), TAGAP:E147K (Th17 differentiation and antifungal signaling), PTAFR:N114S (regulates the NLRP3 inflammasome), CCR7:M7V (homing of T cells and dendritic cells, lymphocyte egress, and regulatory and memory T cell function), IL10RA:P295L (a gene implicated in very-early-onset IBD, VEOIBD; regulator of innate and adaptive immune responses) and RELA:D288N (Th17 regulator; implicated in autosomal dominant chronic mucocutaneous ulceration), as well as variants in PDLIM5, INSC and SDF2L1.
These findings provide new starting points for mechanistic studies that will further our understanding of disease pathogenesis. For example, the SDF2L1 variant R161H implicates a gene not previously linked to IBD; SDF2L1 has been reported to regulate feeding-induced endoplasmic reticulum (ER) stress, and a series of CRISPR screens identified it as an essential regulator of ER homeostasis.