PgmNr 84: Recent population history inferred from more than 5,000 high-coverage South Asian genomes.

J. Wall 1; J. Robinson 1; S. Belsare 1; A. Bhaskar 2; R. Gupta 3; J. Tom 4; T. Bhangale 4; R.K. Rai 5; A. Butterworth 6; J. Danesh 6; V. Mohan 7; A. Ghosh 8; A. Barik 5; A. Chowdhury 5; D. Saleheen 9; S. Kathiresan 10; E. Stawiski 3; A. Peterson 3

1) Inst Human Gen, Univ California San Francisco, San Francisco, CA; 2) News Feed Division, Facebook, Menlo Park, CA; 3) Genomic Medicines Division, MedGenome, Foster City, CA; 4) Dept of Bioinformatics, Genentech, South San Francisco, CA; 5) Society for Health and Demographic Surveillance, Birbhum, West Bengal, India; 6) Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; 7) Madras Diabetes Research Foundation, Chennai, Tamil Nadu, India; 8) GROW Research Laboratory, Narayana Nethralaya Foundation, Bengaluru, India; 9) Dept of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA; 10) Preventative Cardiology, Massachusetts General Hospital, Boston, MA

South Asia contains hundreds of ethnic or caste groups, many of which are thought to be mostly or completely endogamous. However, the age of this extreme population structure (and of the underlying caste system) is unknown, with estimates ranging from hundreds to thousands of years. We analyzed high-coverage whole-genome sequence data from 6,610 individuals, including 1,812 from Pakistan, 500 from Bangladesh, 1,356 from urban South India and 1,180 from the Birbhum district of West Bengal. We used these data to estimate recent changes in population size and split times between caste groups. We do not observe the huge excess of extremely rare variants that has been reported in multiple studies of European and African-American populations. This observation cannot be fully explained by recent inbreeding: simulations suggest that the estimated levels of consanguinity (7.7% of individuals are the offspring of 1st-cousin marriages, and 27.8% are the offspring of 2nd-cousin marriages) would have only a modest effect on the site frequency spectrum. Recent inbreeding combined with longstanding endogamy, however, may explain most of our results.
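As a rough illustration of why these consanguinity levels are modest, the reported fractions can be turned into a population-average pedigree inbreeding coefficient. The sketch below is a back-of-the-envelope calculation, not the simulation referred to above. It uses the standard coefficients for offspring of 1st-cousin (1/16) and 2nd-cousin (1/64) marriages and assumes, as a simplification, that all other individuals have a coefficient of zero, so it ignores any background endogamy. The function name `mean_inbreeding` is hypothetical.

```python
# Standard pedigree inbreeding coefficients for offspring of cousin marriages.
F_FIRST_COUSIN = 1 / 16
F_SECOND_COUSIN = 1 / 64


def mean_inbreeding(frac_first: float, frac_second: float) -> float:
    """Population-average inbreeding coefficient.

    Assumption (a simplification): individuals who are not offspring of
    1st- or 2nd-cousin marriages contribute F = 0, so longstanding
    endogamy is not accounted for.
    """
    return frac_first * F_FIRST_COUSIN + frac_second * F_SECOND_COUSIN


# Fractions reported in the abstract: 7.7% and 27.8%.
mean_f = mean_inbreeding(0.077, 0.278)
print(f"Mean pedigree inbreeding coefficient: {mean_f:.4f}")
```

Under these assumptions the average coefficient is just under 1%, which is consistent with recent consanguinity having a limited effect on the site frequency spectrum.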
Next, we developed a novel method for estimating the genome-wide average divergence time between a single individual and a focal group. This method focuses on extremely rare variants, which should be the most informative about very recent demographic events, and is robust to demographic events affecting the particular individual studied. We focused this work on samples from Birbhum district, West Bengal, because additional metadata on caste and religion were available for them. We used 704 general-caste individuals from Birbhum as the focal group and estimated divergence times for all other individuals. Mean divergence times ranged from ~2,600 years for the Santal, a tribal group speaking an Austro-Asiatic language, to 850 years for “scheduled castes” (i.e., Dalits), 625 years for Bangladeshis and 225 years for “Other Backward Castes” (OBC) individuals. The recent divergence times for OBC individuals confirm that this category is more of a political construct than a long-lived social grouping, while the other divergence times suggest a substantial amount of gene flow between groups. Finally, we extended our approach to thousands of other genomes from around the world. We show how patterns of rare variation can be used to detect asymmetrical migration, and we document evidence for more migration from East Asia into Bengal than in the opposite direction.