S. Shi1, S. Rubinacci2, S. Hu3, L. Moutsianas4, A. Stuckey4, C. Cabrera5,6, V. Cipriani5,6, D. P. Smedley5,6, M. J. Caulfield7, S. R. Myers1, J. L. Marchini8;
1University of Oxford, Oxford, United Kingdom, 2University of Lausanne, Ecublens, Switzerland, 3Novo Nordisk Research Centre, Oxford, United Kingdom, 4Genomics England, London, United Kingdom, 5William Harvey Research Institute, London, United Kingdom, 6Queen Mary University of London, London, United Kingdom, 7Genomics England, Queen Mary, London, United Kingdom, 8Regeneron Genetics Center, Tarrytown, NY

The Genomics England (GEL) 100,000 Genomes Project has sequenced over 85,000 genomes across England. Because it uses high-coverage whole-genome sequencing (WGS), it constitutes the largest human genetic variation resource ever collected in the UK and represents a near-complete characterization of genetic variation in the population. We generated a GEL haplotype reference panel comprising 341 million autosomal variants and 156,390 haplotypes from diverse ancestries. To achieve high-precision haplotype phasing, we exploit both the sample size and the relatedness structure among individuals, 61.3% of whom have at least one sequenced first-degree relative.
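How a sequenced relative can sharpen phasing is easiest to see at a single biallelic site: when a parent's genotype is known, Mendelian transmission often fixes which of a child's two alleles is maternal and which is paternal. The sketch below is a hypothetical, minimal illustration that assumes genotypes are coded as ALT-allele counts (0/1/2) and that the relatives are parent and offspring. It is not the phasing method used for the GEL panel, which the abstract does not describe, and the function names are illustrative only.

```python
from typing import Optional, Tuple


def can_transmit(parent_genotype: Optional[int], allele: int) -> bool:
    """True if a parent with this genotype (ALT count 0/1/2) carries `allele`.

    An unobserved parent (None) is assumed able to transmit either allele.
    """
    if parent_genotype is None:
        return True
    if allele == 1:
        return parent_genotype >= 1
    return parent_genotype <= 1


def phase_trio_site(
    child: int, mother: Optional[int], father: Optional[int]
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (maternal_allele, paternal_allele) for the child.

    Returns None when the site cannot be phased: either more than one
    assignment is Mendelian-consistent (e.g. child and both parents
    heterozygous) or none is (a Mendelian error).
    """
    candidates = []
    for maternal in (0, 1):
        paternal = child - maternal
        if paternal not in (0, 1):
            continue
        if can_transmit(mother, maternal) and can_transmit(father, paternal):
            candidates.append((maternal, paternal))
    # Exactly one consistent assignment means the child's phase is resolved.
    return candidates[0] if len(candidates) == 1 else None
```

For example, a heterozygous child whose mother is homozygous REF (`phase_trio_site(1, 0, None)`) can be phased from a single parent, which is why even one sequenced first-degree relative helps. Sibling pairs also carry phase information, but exploiting them requires identity-by-descent inference, which this sketch does not attempt.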
We used 1000 Genomes WGS data to assess imputation performance across ancestries and observed improvements in some populations. In samples of British origin, the mean imputation r² at 0.01% allele frequency is 0.45, 0.67 and 0.74 when using the HRC, TOPMed and GEL reference panels, respectively. In samples of South Asian origin, the corresponding values are 0.04, 0.24 and 0.61.
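Imputation r² compares imputed dosages with the genotypes observed directly in the WGS data. A minimal sketch, assuming r² is the squared Pearson correlation computed per variant and then averaged within allele-frequency bins (the abstract does not state the exact aggregation or bin edges), could look like the following; `per_variant_r2` and `mean_r2_by_bin` are hypothetical names.

```python
import numpy as np


def per_variant_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed
    dosages, one value per variant (rows = variants, columns = samples).

    Monomorphic variants have undefined correlation and are returned as NaN.
    """
    g = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(dosages, dtype=float)
    g_c = g - g.mean(axis=1, keepdims=True)
    d_c = d - d.mean(axis=1, keepdims=True)
    num = (g_c * d_c).sum(axis=1)
    den = np.sqrt((g_c ** 2).sum(axis=1) * (d_c ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = num / den
    return r ** 2


def mean_r2_by_bin(r2, allele_freq, bin_edges):
    """Average per-variant r² within allele-frequency bins [edge_i, edge_i+1).

    Variants with NaN r² are skipped; empty bins are reported as NaN.
    """
    r2 = np.asarray(r2, dtype=float)
    af = np.asarray(allele_freq, dtype=float)
    idx = np.digitize(af, bin_edges)  # idx == i means edges[i-1] <= af < edges[i]
    out = {}
    for i in range(1, len(bin_edges)):
        mask = (idx == i) & ~np.isnan(r2)
        key = (float(bin_edges[i - 1]), float(bin_edges[i]))
        out[key] = float(r2[mask].mean()) if mask.any() else float("nan")
    return out
```

Averaging per-variant values is one common convention; some evaluations instead pool all genotypes in a bin and compute a single correlation, which can give different numbers for rare variants.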
We used the GEL reference panel to impute the UK Biobank dataset, which was previously imputed at 39 million autosomal variants using an HRC+UK10K reference panel. This results in a ~6-fold increase in the number of imputed variants. Mean information scores at imputed SNPs were 0.65 with the GEL panel and 0.61 with the HRC+UK10K panel. The differences were larger at low allele frequencies: for SNPs with allele frequency between 0.01% and 0.1%, mean information scores were 0.88 and 0.66 for the GEL and HRC+UK10K panels, respectively. This translates into an appreciable boost in power to detect associations. The GEL-imputed UK Biobank dataset is being made available to all approved UK Biobank researchers.
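The abstract does not define the information score. A widely used choice for imputed data is the IMPUTE-style info measure: 1 minus the ratio of the average per-sample posterior genotype variance to the variance 2θ(1−θ) expected at allele frequency θ under Hardy-Weinberg equilibrium. The sketch below assumes that definition and per-sample genotype probabilities for a single biallelic variant; `impute_info` is a hypothetical name, not a specific tool's API.

```python
import numpy as np


def impute_info(genotype_probs):
    """IMPUTE-style info measure for one biallelic variant (assumed definition).

    genotype_probs: array of shape (N, 3) holding P(G=0), P(G=1), P(G=2)
    for each of N samples.
    """
    p = np.asarray(genotype_probs, dtype=float)
    n = p.shape[0]
    e = p[:, 1] + 2.0 * p[:, 2]   # expected dosage E[G] per sample
    f = p[:, 1] + 4.0 * p[:, 2]   # second moment E[G^2] per sample
    theta = e.sum() / (2.0 * n)   # allele frequency estimated from dosages
    if theta <= 0.0 or theta >= 1.0:
        return 1.0                # convention when the estimate is monomorphic
    # Sum of posterior variances (f - e^2) relative to N * 2*theta*(1 - theta).
    return float(1.0 - (f - e ** 2).sum() / (2.0 * n * theta * (1.0 - theta)))
```

A score of 1 means every genotype is imputed with certainty, while 0 means the imputed probabilities carry no more information than the allele frequency alone. A common rule of thumb treats info × N as an approximate effective sample size for association testing, which is one way to see why higher scores at rare variants translate into more power.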
We will also report results of experiments that examine the implications for fine mapping and burden association tests in the context of imputed GWAS of blood pressure and other traits.
Session Type
Poster Talk
Reviewers’ Choice
Statistical Genetics and Genetic Epidemiology