TY - JOUR
T1 - Evaluation of consensus strategies for haplotype phasing
AU - Al Bkhetan, Ziad
AU - Chana, Gursharan
AU - Ramamohanarao, Kotagiri
AU - Verspoor, Karin
AU - Goudey, Benjamin
N1 - Publisher Copyright:
© 2020 The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces SE by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
AB - Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces SE by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
KW - consensus estimator
KW - genotype imputation
KW - haplotype phasing
UR - http://www.scopus.com/inward/record.url?scp=85112131773&partnerID=8YFLogxK
U2 - 10.1093/bib/bbaa280
DO - 10.1093/bib/bbaa280
M3 - Review article
C2 - 33236761
AN - SCOPUS:85112131773
SN - 1467-5463
VL - 22
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 4
M1 - bbaa280
ER -