中国农业科学


不同筛选方法的低密度SNP集合填充准确性比较

林雨浓1,2,王泽昭2,陈燕2,朱波2,高雪2,张路培2,高会江2,徐凌洋2,蔡文涛2,李英豪3,李俊雅2*,高树新1*
  

  1. 1内蒙古民族大学动物科技学院,内蒙古通辽 028042;2中国农业科学院北京畜牧兽医研究所,北京 100193;3通辽京缘种牛繁育有限责任公司,内蒙古通辽 028006
  • Published: 2022-04-12

Comparison of Imputation Accuracy for Different Low-Density SNP Selection Strategies

LIN YuNong1,2, WANG ZeZhao2, CHEN Yan2, ZHU Bo2, GAO Xue2, ZHANG LuPei2, GAO HuiJiang2, XU LingYang2, CAI WenTao2, LI YingHao3, LI JunYa2*, GAO ShuXin1*

  1. 1College of Animal Science and Technology, Inner Mongolia University for the Nationalities, Tongliao 028042, Inner Mongolia; 2Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193; 3Tongliao Jingyuan Breeding Cattle Breeding LLC, Tongliao 028006, Inner Mongolia
  • Online:2022-04-12

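摘要 (Abstract): 【Objective】This study used two marker selection methods to pick representative SNP sets of graded density from the high-density chip loci of the Huaxi cattle reference population, then imputed the low-density data to high density under identical imputation parameters for downstream genomic studies, with the aim of reducing genotyping costs in Huaxi cattle. Imputation accuracy and imputation concordance were compared across marker sets, and the effects of four factors (marker selection method, marker density, minor allele frequency (MAF), and reference population size) on imputation results were described, to inform the design of a low-density SNP imputation chip for Huaxi cattle. 【Method】After quality control, the remaining 1,233 Huaxi cattle were randomly divided into a reference population (986 animals) and a validation population (247 animals). Two marker selection methods, equidistance (EQ) and high MAF (HM), were each used to select 16 SNP sets of different densities from the Illumina Bovine HD chip loci of the reference population, giving 32 SNP sets in total. Each low-density set was then imputed to the 770K density level in the validation population using Beagle (v5.1); imputation accuracy and concordance were calculated, and the factors affecting imputation performance were analyzed. 【Result】The 32 low-density SNP sets contained between 100 and 16,000 markers, with window sizes ranging from 151 kb to 24,176 kb. As marker density increased, the imputation concordance and accuracy of both EQ and HM rose steadily, but the gains became progressively smaller, and both measures leveled off once set density exceeded 12k. Both methods reached their highest imputation accuracy at a density of 16k (r² = 0.8801 for EQ; r² = 0.8696 for HM). When marker density was below 11k, HM gave higher concordance than EQ at every density gradient; above 11k, EQ held the advantage. Accuracy followed a similar trend: HM was more accurate below 10k and EQ was more accurate above 10k, with EQ accuracy stabilizing beyond 12k. High-MAF loci were imputed more accurately than low-MAF loci. Concordance and accuracy also increased with reference population size, and both were relatively high when the reference population comprised 600 to 800 animals. 【Conclusion】In the Huaxi cattle population, imputation concordance and accuracy rise with marker density, and a marker density of 10k to 12k gives good imputation results. HM is preferable when marker density is below 10k, and EQ is better at higher densities. High-MAF loci are imputed more accurately. When imputing from low-density markers, a reference population of at least 400 animals is needed for satisfactory results.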
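The two marker selection strategies named in the Method section, equidistance (EQ) and high MAF (HM), can be sketched as follows. The abstract does not give the exact selection rules, so this is only an illustration under stated assumptions: one SNP is chosen per fixed-width window on a single chromosome, EQ takes the SNP closest to the window midpoint, and HM takes the SNP with the highest MAF in the window. The function name `select_snps` and its parameters are hypothetical.

```python
import numpy as np


def select_snps(pos, maf, window_bp, method="EQ"):
    """Pick one SNP per fixed-width window on one chromosome.

    ASSUMPTION: the windowing rule below is illustrative, not the
    authors' documented procedure.

    pos       : base-pair positions of the high-density SNPs
    maf       : minor allele frequency of each SNP (same order as pos)
    window_bp : window width in base pairs
    method    : "EQ" (closest to window midpoint) or "HM" (highest MAF)

    Returns the indices (into pos) of the selected SNPs, one per
    non-empty window, in window order.
    """
    pos = np.asarray(pos)
    maf = np.asarray(maf)
    start = pos.min()
    # Assign every SNP to a window counted from the first SNP.
    window_id = (pos - start) // window_bp

    chosen = []
    for w in np.unique(window_id):
        idx = np.flatnonzero(window_id == w)
        if method == "EQ":
            midpoint = start + (w + 0.5) * window_bp
            best = idx[np.argmin(np.abs(pos[idx] - midpoint))]
        elif method == "HM":
            best = idx[np.argmax(maf[idx])]
        else:
            raise ValueError("method must be 'EQ' or 'HM'")
        chosen.append(int(best))
    return chosen


# Toy data (made up for illustration): six SNPs, two 100-bp windows.
toy_pos = [0, 45, 60, 100, 150, 190]
toy_maf = [0.10, 0.20, 0.45, 0.05, 0.30, 0.50]
eq_set = select_snps(toy_pos, toy_maf, 100, method="EQ")
hm_set = select_snps(toy_pos, toy_maf, 100, method="HM")
```

Widening `window_bp` lowers the density of the selected set, which is one way a gradient of set sizes between 100 and 16,000 markers could be produced.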


关键词 (Keywords): imputation accuracy, low-density SNP chip, Huaxi cattle, linkage disequilibrium, minor allele frequency

Abstract: 【Objective】To facilitate low-cost genomic selection in Huaxi cattle, the present study represents a first attempt to design a low-density genotyping chip that supports imputation to high-density genotypes. Representative SNP sets of graded density were selected from the high-density SNP chip loci of the Huaxi cattle reference population using two SNP selection methods, and these low-density sets were then imputed to high density under identical imputation parameters for subsequent genomic studies. The study compared imputation accuracy and concordance among the SNP sets and examined the effects of four factors on imputation results: marker selection method, marker density, minor allele frequency (MAF), and reference population size. The results provide guidance on selecting low-density SNP markers for imputation in this population, and the representative SNPs will aid the design of a low-density SNP chip for Huaxi cattle. 【Method】After genotype quality control, the remaining 1,233 Huaxi cattle were randomly divided into a reference population (986 animals) and a validation population (247 animals). Two SNP selection strategies, equidistance (EQ) and high MAF (HM), were each used to build 16 SNP sets of different densities from the Illumina Bovine HD chip loci of the reference population, giving 32 sets in total. Each of the 32 low-density sets was then imputed to the 770K density level in the validation population using Beagle (v5.1). Imputation accuracy (the mean correlation between true and imputed genotypes) and imputation concordance were calculated, and the factors influencing imputation performance were analyzed. 【Result】The number of markers in the 32 low-density SNP sets ranged from 100 to 16,000, with a maximum window of 24,176 kb and a minimum window of 151 kb. The imputation accuracy and concordance of both EQ and HM increased with marker density, but the gains became progressively smaller, and both measures leveled off once set density exceeded 12k. The imputation accuracy of both methods was highest at a SNP density of 16k (r² = 0.8801 for EQ; r² = 0.8696 for HM). When marker density was below 11k, the imputation concordance of HM was higher than that of EQ at every density gradient. When SNP density exceeded 11k, however, EQ outperformed HM. Imputation accuracy followed a similar trend: HM had higher accuracy when SNP density was below 10k, EQ had higher accuracy when SNP set density was above 10k, and EQ accuracy stabilized once SNP density exceeded 12k. Loci with high MAF were imputed more accurately than loci with low MAF. Imputation accuracy and concordance also increased with the size of the reference population, and both were relatively high when the reference population comprised 600 to 800 animals. 【Conclusion】In the Huaxi cattle population, imputation accuracy and concordance increased with marker density, and good imputation performance was obtained at marker densities of 10k to 12k. HM is preferable when marker density is below 10k, and EQ is better at higher marker densities. High-MAF loci were imputed more accurately. When imputing from low-density markers, the reference population should contain at least 400 animals for satisfactory results.
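The two evaluation measures used throughout the abstract can be illustrated with a short sketch. It assumes genotypes are coded as allele counts (0, 1, 2), that concordance is the fraction of imputed genotypes that match the true genotypes, and that accuracy is the squared Pearson correlation (r²) between true and imputed genotypes, computed per SNP and then averaged. The abstract does not say whether averaging is per SNP or per animal, so that choice, along with the function name `imputation_metrics`, is an assumption.

```python
import numpy as np


def imputation_metrics(true_geno, imputed_geno):
    """Return (mean concordance, mean r^2) over SNPs.

    true_geno, imputed_geno : arrays of shape (animals, SNPs) with
    genotypes coded 0/1/2. ASSUMPTION: both metrics are computed per
    SNP and averaged; SNPs with no variation in either array have an
    undefined correlation and are skipped in the r^2 mean.
    """
    t = np.asarray(true_geno, dtype=float)
    m = np.asarray(imputed_geno, dtype=float)

    # Concordance: share of animals whose imputed genotype is correct.
    concordance = (t == m).mean(axis=0)

    # Pearson correlation per SNP from centered genotype columns.
    tc = t - t.mean(axis=0)
    mc = m - m.mean(axis=0)
    numer = (tc * mc).sum(axis=0)
    denom = np.sqrt((tc ** 2).sum(axis=0) * (mc ** 2).sum(axis=0))

    r = np.full(numer.shape, np.nan)
    valid = denom > 0
    r[valid] = numer[valid] / denom[valid]

    return float(concordance.mean()), float(np.nanmean(r ** 2))


# Toy data (made up): 4 animals x 2 SNPs, one wrong call in SNP 1.
toy_true = [[0, 1], [1, 1], [2, 0], [1, 2]]
toy_imputed = [[0, 1], [1, 1], [2, 0], [2, 2]]
toy_concordance, toy_r2 = imputation_metrics(toy_true, toy_imputed)
```

In a masking design like the one described in the Method section, `true_geno` would hold the validation animals' observed 770K genotypes and `imputed_geno` the Beagle output for the same animals and loci.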

Key words: imputation accuracy, low-density SNP array, Huaxi cattle, linkage disequilibrium, minor allele frequency (MAF)