Journal of Integrative Agriculture


Application of 24 genotype imputation software combinations to pig SNP arrays

  

  • Revised: 2024-12-10

Benchmarking 24 combinations of genotype pre-phasing and imputation software for SNP arrays in pigs

Haonan Zeng, Kaixuan Guo, Zhanming Zhong, Jinyan Teng, Zhiting Xu, Chen Wei, Shaolei Shi, Zhe Zhang, Yahui Gao#   

  1. State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China

  • About author: Haonan Zeng, E-mail: hnzeric@hotmail.com; Correspondence: Yahui Gao, E-mail: yahui.gao@scau.edu.cn
  • Supported by:

    This study was supported by the National Key R&D Program of China (2022YFF1000900), the earmarked fund for China Agriculture Research System (CARS-35), the Guangxi Science and Technology Program Project (GuikeJB23023003), the Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), the Dedicated Funds for the Construction of Key Disciplines in Targeted Universities (2023B10564001), and the Young Scientists Fund of the National Natural Science Foundation of China (32402714). We thank the National Supercomputer Center in Guangzhou, China, for providing computing resources.

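Summary:

Genotype imputation increases marker density without raising genotyping costs, which helps make full use of existing SNP array data for dissecting the mechanisms of complex traits and for genetic evaluation. In animal breeding, accurate genomic data are essential for genomic selection, association studies, and breeding prediction. Although many genotype imputation tools are available, a comprehensive benchmark in pig genomics is still lacking. Using whole-genome sequencing data from the 1,602-animal multibreed pig genomics reference panel (PGRP) of the PigGTEx project, this study paired six phasing tools (fastPHASE, MaCH, BIMBAM, Eagle, SHAPEIT, Beagle) with four imputation tools (pbwt, Minimac, IMPUTE, Beagle) and compared how the resulting 24 combinations perform on pig SNP arrays. The results show that phasing with Beagle and imputing with Minimac gives the highest imputation accuracy, with a mean genotype concordance of 0.983, and performs particularly well for low-frequency SNPs (MAF < 0.05). In terms of resource efficiency, pbwt performed best among the four imputation tools, with the shortest run time and the lowest memory usage. Based on these evaluations, the study proposes three combination strategies: (1) Beagle with Minimac, which achieves the highest imputation accuracy; (2) Beagle with Beagle, which requires more memory for pre-phasing and imputation but is widely recognized for its ease of use while still maintaining high accuracy; (3) Eagle with pbwt, which has the lowest computational cost with relatively high accuracy and suits settings with limited computing resources. In summary, the study provides an important basis for imputing pig SNP array data to the whole-genome level and theoretical support for precision breeding of livestock and poultry.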

Abstract:

Genotype imputation is essential for increasing marker density and maximizing the utility of existing SNP array data in animal breeding. Although a wide range of software is available for genotype imputation, a comprehensive benchmark in pigs is still lacking. In this study, we benchmarked 24 combinations of genotype imputation software for SNP arrays in pigs, pairing six pre-phasing tools (fastPHASE, MaCH, BIMBAM, Eagle, SHAPEIT, Beagle) with four imputation tools (pbwt, Minimac, IMPUTE, Beagle), using whole-genome sequencing (WGS) data from 1,602 pigs in the multibreed pig genomics reference panel (PGRP) of PigGTEx. Our results indicated that Beagle for pre-phasing combined with Minimac for imputation achieved the highest imputation accuracy, with a concordance of 0.983, and performed especially well for low-frequency SNPs (MAF < 0.05). Finally, we propose three recommended strategies: i) Beagle with Minimac, for the highest accuracy; ii) Beagle with Beagle, for convenience and relatively high accuracy, although it is memory-intensive; iii) Eagle with pbwt, for minimal computational cost with relatively high accuracy. This study provides valuable insights for imputing pig SNP array data to sequence level and offers a basis for applications in livestock and poultry breeding.

Key words: pig, genotype imputation, benchmark
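
The benchmark design and the accuracy metric named in the abstract can be illustrated with a minimal Python sketch. It enumerates the 6 × 4 = 24 pre-phasing and imputation pairings and computes genotype concordance split at MAF 0.05. The abstract does not give the exact formula, so several things here are assumptions made for illustration: concordance is taken as the fraction of matching genotype calls, genotypes are coded as alternate-allele counts (0/1/2), MAF is computed from the true genotypes, and all function names and toy data are hypothetical.

```python
from itertools import product

# Tool names as listed in the abstract.
PHASERS = ["fastPHASE", "MaCH", "BIMBAM", "Eagle", "SHAPEIT", "Beagle"]
IMPUTERS = ["pbwt", "Minimac", "IMPUTE", "Beagle"]

# Every pre-phasing tool paired with every imputation tool: 6 x 4 = 24.
COMBINATIONS = list(product(PHASERS, IMPUTERS))


def minor_allele_frequency(genotypes):
    """MAF for one SNP from alternate-allele counts (0, 1 or 2) in diploids."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def concordance(true_genotypes, imputed_genotypes):
    """Fraction of animals whose imputed genotype equals the true genotype.

    Assumed definition; the study may compute concordance differently.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def concordance_by_maf(snps, threshold=0.05):
    """Mean per-SNP concordance for low-frequency and common SNPs.

    snps is an iterable of (true_genotypes, imputed_genotypes) pairs.
    A bin with no SNPs is reported as None.
    """
    bins = {"low_frequency": [], "common": []}
    for true_genotypes, imputed_genotypes in snps:
        maf = minor_allele_frequency(true_genotypes)
        key = "low_frequency" if maf < threshold else "common"
        bins[key].append(concordance(true_genotypes, imputed_genotypes))
    return {k: (sum(v) / len(v) if v else None) for k, v in bins.items()}


# Hypothetical toy data: one rare SNP and one common SNP in 20 and 10 animals.
rare_true = [0] * 19 + [1]
rare_imputed = [0] * 20
common_true = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
common_imputed = [0, 1, 2, 1, 0, 2, 1, 1, 0, 1]

summary = concordance_by_maf(
    [(rare_true, rare_imputed), (common_true, common_imputed)]
)
print(len(COMBINATIONS), summary)
```

In a real comparison, the true genotypes would come from sequence data masked down to array density, and the same summary would be computed once per pairing in `COMBINATIONS`.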