Journal of Integrative Agriculture
Benchmarking 24 combinations of genotype pre-phasing and imputation software for SNP arrays in pigs

Haonan Zeng, Kaixuan Guo, Zhanming Zhong, Jinyan Teng, Zhiting Xu, Chen Wei, Shaolei Shi, Zhe Zhang, Yahui Gao#

State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China

 Highlights 

Evaluated 24 combinations of imputation software for pigs.

Beagle-Minimac combination provided the best imputation accuracy.

Beagle-Beagle combination stands out for convenience.

Eagle-pbwt combination showed excellent performance in resource efficiency.

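Abstract (translated from the Chinese version)

Genotype imputation increases marker density without raising genotyping costs, which helps make the most of existing SNP array data for dissecting the mechanisms of complex traits and for genetic evaluation. In animal breeding, accurate genomic data are essential for genomic selection, association studies, and breeding prediction. Although many genotype imputation tools are available, a comprehensive benchmark is still lacking in pig genomics. Using whole-genome sequencing data from the 1,602-pig multibreed pig genomics reference panel (PGRP) of the PigGTEx project, this study paired six phasing tools (fastPHASE, MaCH, BIMBAM, Eagle, SHAPEIT, Beagle) with four imputation tools (pbwt, Minimac, IMPUTE, Beagle) and compared the performance of the resulting 24 combinations on pig SNP arrays. The results showed that phasing with Beagle and imputing with Minimac achieved the highest imputation accuracy, with a mean genotype concordance of 0.983, and performed particularly well for low-frequency SNPs (MAF < 0.05). In terms of resource efficiency, pbwt performed best among the four imputation tools, with the shortest run time and the lowest memory usage. Based on this evaluation, the study proposes three combination strategies: (1) Beagle with Minimac, which gives the highest imputation accuracy; (2) Beagle with Beagle, which requires more memory for pre-phasing and imputation but is widely recognized for being simple to run while keeping relatively high accuracy; (3) Eagle with pbwt, which has the lowest computational cost with relatively high accuracy and suits settings with limited computing resources. In summary, the study provides an important basis for imputing pig SNP array data to the whole-genome level and theoretical support for precision breeding of livestock and poultry.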



Abstract  

Genotype imputation is essential for increasing marker density and maximizing the utility of existing SNP array data in animal breeding. Although a wide range of genotype imputation software is available, a comprehensive benchmark in pigs is still lacking. In this study, we benchmarked 24 combinations of genotype imputation software for SNP arrays in pigs, pairing six pre-phasing tools (fastPHASE, MaCH, BIMBAM, Eagle, SHAPEIT, Beagle) with four imputation tools (pbwt, Minimac, IMPUTE, Beagle), using whole-genome sequencing (WGS) data from 1,602 pigs in the multibreed pig genomics reference panel (PGRP) of PigGTEx. Our results indicated that the combination of Beagle for pre-phasing and Minimac for imputation achieved the highest imputation accuracy, with a concordance of 0.983, and performed especially well for low-frequency SNPs (MAF < 0.05). Finally, we propose three recommended strategies: i) the combination of Beagle and Minimac is optimal for achieving the highest accuracy; ii) the combination of Beagle and Beagle is recognized for its convenience and relatively high accuracy, although it is memory-intensive; iii) the combination of Eagle and pbwt is attractive for its minimal computational cost with relatively high accuracy. This study provides valuable insights for imputing pig SNP array data to sequence level and offers a basis for applications in livestock and poultry breeding.
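The benchmark design and accuracy metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that concordance means the fraction of imputed genotype calls that match the true calls, that genotypes are coded as alternate-allele counts (0, 1, 2), and that MAF is computed from the true genotypes. The function names (`pipelines`, `concordance`, `minor_allele_frequency`, `concordance_by_maf`) and the toy data are hypothetical.

```python
from itertools import product

# Tool names as listed in the abstract.
PHASERS = ["fastPHASE", "MaCH", "BIMBAM", "Eagle", "SHAPEIT", "Beagle"]
IMPUTERS = ["pbwt", "Minimac", "IMPUTE", "Beagle"]


def pipelines():
    """Return every (pre-phasing tool, imputation tool) pair: 6 x 4 = 24."""
    return list(product(PHASERS, IMPUTERS))


def concordance(true_gt, imputed_gt):
    """Fraction of genotype calls that agree (assumed definition)."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def concordance_by_maf(true_by_snp, imputed_by_snp, threshold=0.05):
    """Mean per-SNP concordance for low-frequency and common SNPs."""
    bins = {"low": [], "common": []}
    for snp, truth in true_by_snp.items():
        key = "low" if minor_allele_frequency(truth) < threshold else "common"
        bins[key].append(concordance(truth, imputed_by_snp[snp]))
    # A bin with no SNPs has no defined mean, so report None for it.
    return {k: (sum(v) / len(v) if v else None) for k, v in bins.items()}


if __name__ == "__main__":
    # Toy data (hypothetical): one rare SNP and one common SNP.
    truth = {"snp1": [0] * 19 + [1], "snp2": [0, 1, 2, 1]}
    imputed = {"snp1": [0] * 20, "snp2": [0, 1, 2, 2]}
    print(len(pipelines()))
    print(concordance_by_maf(truth, imputed))
```

Stratifying by MAF matters because a rare variant can score high overall concordance even when the minor allele itself is imputed poorly, as `snp1` in the toy data shows.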

Keywords: pig, genotype imputation, benchmark
Online: 10 December 2024  
Fund: 

This study was supported by the National Key R&D Program of China (2022YFF1000900), the earmarked fund for the China Agriculture Research System (CARS-35), the Guangxi Science and Technology Program Project (GuikeJB23023003), the Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), the Dedicated Funds for the Construction of Key Disciplines in Targeted Universities (2023B10564001), and the Young Scientists Fund of the National Natural Science Foundation of China (32402714). We thank the National Supercomputer Center in Guangzhou, China, for providing computing resources.

About author: Haonan Zeng, E-mail: hnzeric@hotmail.com; Correspondence: Yahui Gao, E-mail: yahui.gao@scau.edu.cn

Cite this article: 

Haonan Zeng, Kaixuan Guo, Zhanming Zhong, Jinyan Teng, Zhiting Xu, Chen Wei, Shaolei Shi, Zhe Zhang, Yahui Gao. 2024. Benchmarking 24 combinations of genotype pre-phasing and imputation software for SNP arrays in pigs. Journal of Integrative Agriculture, Doi:10.1016/j.jia.2024.12.009


 
