Journal of Integrative Agriculture  2022, Vol. 21 Issue (2): 486-495    DOI: 10.1016/S2095-3119(21)63695-X
Special Issue: Animal Science Collection
Animal Science · Veterinary Medicine
A comprehensive evaluation of factors affecting the accuracy of pig genotype imputation using a single or multi-breed reference population
ZHANG Kai-li, PENG Xia, ZHANG Sai-xian, ZHAN Hui-wen, LU Jia-hui, XIE Sheng-song, ZHAO Shu-hong, LI Xin-yun, MA Yun-long
Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education/Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture and Rural Affairs/College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, P.R.China
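Abstract (Chinese version)

Genotype imputation has become a key preprocessing step in genomic analysis, and its accuracy directly affects downstream analyses. Many factors influence imputation accuracy; among them, imputation with a mixed reference population has attracted particular attention. This study aimed to evaluate the relationship between imputation and its influencing factors so as to ensure higher imputation accuracy, to explore whether including multiple breeds (lines) in the reference population benefits imputation accuracy in pigs, and to identify imputation software that performs well. Using 50K chip data and reference populations consisting of either a single line (Large White line A) or multiple breeds (Large White line A, Large White line B, Duroc and Landrace), we evaluated how imputation accuracy varied with four factors: validation population marker density, reference population sample size, minor allele frequency (MAF) and reference population composition. We also compared the imputation accuracy and running time of four imputation programs: Beagle 4.1, FImpute, IMPUTE2 and MaCH-Admix. Imputation accuracy was measured as the genotype concordance rate and the Pearson correlation between imputed and true SNP genotypes. First, to study the effect of marker density, we simulated low-density chips by randomly masking 20, 45, 70, 95 and 99% of the SNPs in the validation population. Then, to study the effect of reference population sample size, we randomly sampled 8, 86, 173, 434 and 868 pigs from the original reference population to form new reference populations. For MAF, SNPs were divided into seven classes by allele frequency, and imputation accuracy was calculated separately for each class. The results showed that imputation accuracy increased as validation population marker density, reference population sample size and MAF increased. Accuracy was higher when the reference population was a single line matching the line of the validation population, and adding other breeds (lines) led to relatively poorer imputation results. In addition, accuracy improved as the sample size of the main line in the reference population increased. Across all imputation scenarios, considering both accuracy and running time, Beagle 4.1 and FImpute outperformed IMPUTE2 and MaCH-Admix. This work gives researchers a more intuitive understanding of how these factors affect imputation and provides practical guidance for implementing imputation strategies in actual pig breeding.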
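The two accuracy measures and the marker-masking step described in this abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0, 1, 2); the function names, the seed and the toy data are hypothetical and are not taken from the authors' pipeline, and the imputation step itself (run with Beagle 4.1, FImpute, IMPUTE2 or MaCH-Admix in the study) is not reproduced here.

```python
import math
import random


def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of genotype calls (coded 0/1/2) that match exactly."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def pearson_correlation(true_genotypes, imputed_genotypes):
    """Pearson correlation between true and imputed allele counts.

    Returns NaN when either vector has zero variance (for example a
    monomorphic SNP), because the correlation is undefined in that case.
    """
    n = len(true_genotypes)
    if n != len(imputed_genotypes) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_genotypes) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_genotypes))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_genotypes)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov / math.sqrt(var_t * var_i)


def mask_markers(n_markers, missing_fraction, seed=0):
    """Choose marker indices to hide, mimicking a lower-density chip."""
    rng = random.Random(seed)  # fixed seed so the masking is repeatable
    n_masked = round(n_markers * missing_fraction)
    return sorted(rng.sample(range(n_markers), n_masked))


# Toy example: one SNP genotyped in six animals (hypothetical data).
true_calls = [0, 1, 2, 1, 0, 2]
imputed_calls = [0, 1, 2, 2, 0, 2]
print(concordance_rate(true_calls, imputed_calls))
print(pearson_correlation(true_calls, imputed_calls))

# Hide 95% of 1000 markers, one of the masking levels named in the abstract.
hidden = mask_markers(1000, 0.95)
print(len(hidden))
```

Reporting both measures is useful because concordance can look high for low-MAF SNPs simply by calling the major homozygote everywhere, whereas the correlation penalizes that behavior.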



Abstract  Genotype imputation has become an indispensable part of genomic data analysis.  In recent years, imputation based on a multi-breed reference population has received more attention, but relevant studies in pigs are scarce.  In this study, we used data from the Illumina PorcineSNP50 BeadChip to investigate how imputation accuracy varies with several influencing factors, and we compared the imputation performance of four commonly used imputation software programs.  The results indicated that imputation accuracy increased as the validation population marker density, the reference population sample size, or the minor allele frequency (MAF) increased.  However, imputation accuracy decreased to some extent when the pig reference population was a mixed group of multiple breeds or lines.  Considering both imputation accuracy and running time, Beagle 4.1 and FImpute were the best choices among the four software packages tested.  This work visually presents the impacts of these influencing factors on imputation and provides a reference for formulating reasonable imputation strategies in actual pig breeding.
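The MAF comparison can be sketched as follows: compute each SNP's MAF, assign it to a frequency class, and average the accuracy within each class. The study used seven MAF classes, but the class boundaries are not stated on this page, so the edges below are purely hypothetical placeholders, as are the function names and toy values.

```python
from bisect import bisect_right

# Hypothetical boundaries: six edges split the range [0, 0.5] into seven
# classes. The boundaries actually used in the study are not given here.
HYPOTHETICAL_EDGES = (0.01, 0.05, 0.1, 0.2, 0.3, 0.4)


def minor_allele_frequency(genotypes):
    """MAF of one SNP from allele counts coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def maf_class(maf, edges=HYPOTHETICAL_EDGES):
    """Index of the MAF class (0 = rarest) that a SNP falls into."""
    return bisect_right(edges, maf)


def mean_accuracy_by_class(mafs, accuracies, edges=HYPOTHETICAL_EDGES):
    """Average per-SNP accuracy within each MAF class that has SNPs."""
    totals, counts = {}, {}
    for maf, acc in zip(mafs, accuracies):
        k = maf_class(maf, edges)
        totals[k] = totals.get(k, 0.0) + acc
        counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in sorted(totals)}


# Toy example with three SNPs (hypothetical MAFs and accuracies).
print(mean_accuracy_by_class([0.005, 0.375, 0.38], [0.5, 0.9, 1.0]))
```

Passing `edges` as a parameter keeps the sketch usable with whatever class boundaries a given analysis defines.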
Keywords:  genotype imputation, multi-breed reference population, imputation accuracy
Received: 19 September 2020   Accepted: 15 March 2021
Fund: This work was supported by the China Agriculture Research System of MOF and MARA (CARS-35), the National Natural Science Foundation of China (32072696, 31790414 and 31601916) and the Fundamental Research Funds for the Central Universities (2662019PY011).
About author:  Correspondence LI Xin-yun, E-mail: xyli@mail.hzau.edu.cn; MA Yun-long, Tel: +86-27-87282091, E-mail: Yunlong.Ma@mail.hzau.edu.cn

Cite this article: 

ZHANG Kai-li, PENG Xia, ZHANG Sai-xian, ZHAN Hui-wen, LU Jia-hui, XIE Sheng-song, ZHAO Shu-hong, LI Xin-yun, MA Yun-long. 2022. A comprehensive evaluation of factors affecting the accuracy of pig genotype imputation using a single or multi-breed reference population. Journal of Integrative Agriculture, 21(2): 486-495.
