phasing

Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes

Recent work highlights the advantages of low-coverage whole-genome sequencing (lcWGS), followed by genotype imputation, as a cost-effective genotyping technology for statistical and population genetics. The release of whole-genome sequencing data for …

Efficient phasing and imputation of low-coverage sequencing data using large reference panels

In this work, we address the challenge of genotype imputation and haplotype phasing of low-coverage sequencing datasets using a reference panel of haplotypes. To this end, we propose a novel method, GLIMPSE (Genotype Likelihoods Imputation and PhaSing mEthod), designed for large-scale studies and reference panels, typically comprising thousands of genomes. We show the remarkable performance of GLIMPSE using low-coverage whole-genome sequencing data for both European and African American populations, and we demonstrate that low-coverage sequencing can be used confidently in downstream analyses. We provide GLIMPSE as part of an open-source software suite that makes imputation for low-coverage sequencing data as convenient as for traditional SNP array platforms.