PgmNr 263: Identification of functionally essential sites using 3D single protein and multiprotein complexes across 73 neurodevelopmental disorder-associated genes.

T. Brünger 1; S. Iqbal 2,3; E. Perez-Palma 1; M.J. Daly 2,3,4; A.J. Campbell 2; P. May 5; D. Lal 1,2,6,7

1) Cologne Center for Genomics (CCG), University of Cologne, Cologne, 50931, Germany; 2) Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; 3) Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; 4) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, FI-00014, Finland; 5) Luxembourg Centre for Systems Biomedicine, University Luxembourg, Luxembourg; 6) Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44106, USA; 7) Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44106, USA

Interpretation of missense variants is challenging in the phenotypically and genetically heterogeneous group of neurodevelopmental disorders (NDD). Recent large-scale genomic screens have identified >100 missense-variant-intolerant NDD-genes. Mapping variants onto protein structure represents an opportunity to gain mechanistic insights into the molecular pathology. However, only a few genetic variants in NDD-genes have been mapped onto protein structure so far. For most NDD-genes, human protein structures are not available; consequently, a systematic screen to identify essential sites in NDD-proteins and multiprotein complexes has not been performed.

Here we systematically collected experimentally solved, modeled and predicted protein structures for 73 NDD-associated genes, all of which are intolerant to missense variants. To explore the utility of these structures and to identify essential sites within proteins, we normalized the linear sequence-based missense variant constraint score (MTR) and an amino-acid-level paralog conservation score using spatial windows of 12 Å distance for each of the 73 protein structures. The corresponding 3D constrained and 3D paralog-conserved protein sites show a higher burden of pathogenic variants ascertained from ClinVar and HGMD than the conserved sites defined by the original linear scores. Next, we mapped pathogenic variants (ClinVar and HGMD databases) and population variants (gnomAD database) onto the 3D structures and identified pathogenic-variant-enriched amino acids (3D-hotspots) for which no patient variant has yet been reported. Across the 73 genes, we identified 192 3D-hotspots for pathogenic variants (mean: 2.7 ± 9.3 per NDD-gene) without a reported variant. For a subset of NDD-proteins, we were able to perform the spatial enrichment analyses for both single proteins and multiprotein complexes, and we observed an increased number of identified 3D-hotspots in the complexes.
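The spatial-window step can be illustrated with a minimal sketch. The abstract does not specify how scores are normalized within a window, so the version below simply assumes a plain mean over all residues within 12 Å of each residue. The function name `spatial_window_scores`, the use of one representative coordinate per residue (for example the C-alpha atom), and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import math


def spatial_window_scores(coords, scores, radius=12.0):
    """Return, for each residue, the mean score of all residues whose
    coordinates lie within `radius` angstroms (the residue itself included).

    Assumption: a plain mean stands in for the unspecified normalization.
    coords: list of (x, y, z) tuples, one representative atom per residue.
    scores: list of per-residue linear scores, same order as coords.
    """
    if len(coords) != len(scores):
        raise ValueError("coords and scores must have the same length")
    smoothed = []
    for center in coords:
        # Collect the scores of every residue inside the spatial window.
        window = [
            score
            for xyz, score in zip(coords, scores)
            if math.dist(center, xyz) <= radius
        ]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Toy example: residues 1 and 2 are 5 angstroms apart, residue 3 is isolated.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_scores = [0.25, 0.75, 1.0]
print(spatial_window_scores(toy_coords, toy_scores))  # [0.5, 0.5, 1.0]
```

In this toy case the first two residues share one window and receive their joint mean, while the distant third residue keeps its own score. Residues that are far apart in sequence but close in the folded structure would be pooled in the same way, which is what a linear sliding window cannot capture.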

In summary, we present the first large-scale human protein structure-based analysis of missense variants in NDD-genes. We show that incorporating solved and predicted human protein structures, as well as multiprotein complexes, is a useful tool for variant interpretation.