Journal of Integrative Agriculture, 2022, Vol. 21, Issue 2: 486-495. DOI: 10.1016/S2095-3119(21)63695-X

Special collection: Animal Science


Evaluating factors that affect the accuracy of pig genotype imputation based on single- or multi-breed reference populations

  


A comprehensive evaluation of factors affecting the accuracy of pig genotype imputation using a single or multi-breed reference population

ZHANG Kai-li, PENG Xia, ZHANG Sai-xian, ZHAN Hui-wen, LU Jia-hui, XIE Sheng-song, ZHAO Shu-hong, LI Xin-yun, MA Yun-long   

  1. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education/Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture and Rural Affairs/College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, P.R.China
  • Received: 2020-09-19 Accepted: 2021-03-15 Online: 2022-01-02 Published: 2022-01-02
  • About author: Correspondence LI Xin-yun, E-mail: xyli@mail.hzau.edu.cn; MA Yun-long, Tel: +86-27-87282091, E-mail: Yunlong.Ma@mail.hzau.edu.cn
  • Supported by:
    This work was supported by the China Agriculture Research System of MOF and MARA (CARS-35), the National Natural Science Foundation of China (32072696, 31790414 and 31601916) and the Fundamental Research Funds for the Central Universities (2662019PY011).

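Abstract (translated from the Chinese version):

Genotype imputation has become a key preprocessing step in genomic analysis, and its accuracy directly affects downstream analyses. Many factors affect imputation accuracy, and imputation with a mixed reference population has attracted particular attention. This study aimed to: evaluate the relationship between imputation and its influencing factors in order to achieve higher imputation accuracy; explore whether including multiple breeds (lines) in the reference population benefits imputation accuracy in pigs; and identify imputation software that performs well. Using 50K chip data, we evaluated how imputation accuracy varied with four factors (validation population marker density, reference population sample size, minor allele frequency and reference population composition) based on a single-line (Large White line A) and a multi-breed (Large White line A, Large White line B, Duroc and Landrace) reference population, and compared the imputation accuracy and running time of four imputation programs: Beagle 4.1, FImpute, IMPUTE2 and MaCH-Admix. Imputation accuracy was obtained by calculating the genotype concordance rate and the Pearson correlation between imputed and true SNPs. First, we simulated low-density chips by randomly masking 20, 45, 70, 95 and 99% of the SNPs in the validation population to study the effect of marker density. Then, we randomly sampled 8, 86, 173, 434 and 868 pigs from the original reference population as new reference populations to study the effect of reference population sample size on imputation accuracy. For minor allele frequency, SNPs were divided into seven classes by allele frequency, and imputation accuracy was calculated separately for each class. The results showed that imputation accuracy increased as validation population marker density, reference population sample size and minor allele frequency increased. Imputation accuracy was higher when the reference population was a single-line population of the same line as the validation population; adding other breeds (lines) led to relatively poorer imputation results. In addition, imputation accuracy improved as the sample size of the main line in the reference population increased. Across all imputation scenarios, considering both imputation accuracy and running time, Beagle 4.1 and FImpute outperformed IMPUTE2 and MaCH-Admix. This work enables researchers in related fields to understand more intuitively how these factors affect imputation, and provides practical guidance for implementing imputation strategies in pig breeding.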
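The two accuracy measures named in the abstract (genotype concordance rate and Pearson correlation between imputed and true genotypes), together with the random masking of SNPs used to simulate a low-density chip, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as 0/1/2 allele counts, and the function names and toy data are hypothetical.

```python
import math
import random


def mask_snp_indices(n_snps, fraction, seed=0):
    """Choose which SNP indices to hide, simulating a lower-density chip.

    Hypothetical helper: the masked SNPs are the ones to be imputed and
    later compared against their true genotypes.
    """
    rng = random.Random(seed)
    n_masked = round(n_snps * fraction)
    return sorted(rng.sample(range(n_snps), n_masked))


def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of genotype calls that are identical (0/1/2 coding assumed)."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def pearson_correlation(x, y):
    """Pearson correlation between true and imputed genotype codes."""
    if len(x) != len(y) or not x:
        raise ValueError("genotype lists must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Undefined when either vector has no variation (e.g. a monomorphic SNP).
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Toy data: true and imputed calls for one masked SNP across eight animals.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 2, 0, 2, 1, 0]

masked = mask_snp_indices(n_snps=100, fraction=0.95)  # hide 95% of 100 SNPs
rate = concordance_rate(true_calls, imputed_calls)     # 6 of 8 calls match
corr = pearson_correlation(true_calls, imputed_calls)
```

Concordance counts only exact matches, so it tends to look favourable for low-MAF SNPs where most animals share the same genotype; the correlation is less affected by this, which is one reason the two measures are commonly reported side by side.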

Abstract: Genotype imputation has become an indispensable part of genomic data analysis. In recent years, imputation based on a multi-breed reference population has received increasing attention, but relevant studies in pigs are scarce. In this study, we used Illumina PorcineSNP50 BeadChip data to investigate how imputation accuracy varies with several influencing factors, and compared the imputation performance of four commonly used imputation software programs. The results indicated that imputation accuracy increased as the validation population marker density, the reference population sample size, or the minor allele frequency (MAF) increased. However, imputation accuracy decreased to some extent when the pig reference population was a mixed group of multiple breeds or lines. Considering both imputation accuracy and running time, Beagle 4.1 and FImpute were the best choices among the four software packages tested. This work visually presents the impacts of these influencing factors on imputation and provides a reference for formulating reasonable imputation strategies in actual pig breeding.

Key words: genotype imputation, multi-breed reference population, imputation accuracy