Journal of Integrative Agriculture ›› 2024, Vol. 23 ›› Issue (5): 1634-1643.DOI: 10.1016/j.jia.2023.11.048


Multi-strategy prescreening of large-effect loci improves the accuracy of genomic prediction

  

  • Received: 2022-11-23 Accepted: 2023-11-03 Published: 2024-05-20 Released online: 2024-04-23

Prescreening of large-effect markers with multiple strategies improves the accuracy of genomic prediction

Keanning Li1*, Bingxing An1*, Mang Liang1, Tianpeng Chang1, Tianyu Deng1, 2, Lili Du1, Sheng Cao1, 3, Yueying Du1, 4, Hongyan Li5, Lingyang Xu1, Lupei Zhang1, Xue Gao1, Junya Li1, Huijiang Gao1#

    1 Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China

    2 Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China

    3 College of Computer and Information Engineering, Tianjin Agricultural University, Tianjin 300384, China

    4 College of Animal Science and Technology, Qingdao Agricultural University, Qingdao 266109, China

    5 Tongliao Animal Agriculture Development Service Center, Tongliao 028000, China

  • Received:2022-11-23 Accepted:2023-11-03 Online:2024-05-20 Published:2024-04-23
  • About author: Keanning Li, E-mail: likeanning@163.com; Bingxing An, E-mail: anbingxing@caas.cn; #Correspondence: Huijiang Gao, Tel: +86-10-62818176, Fax: +86-10-62817806, E-mail: gaohuijiang@caas.com. * These authors contributed equally to this study.
  • Supported by:
    This research was supported by the National Natural Science Foundation of China (31872975), the Science and Technology Project of Inner Mongolia Autonomous Region, China (2020GG0210), and the Program of National Beef Cattle and Yak Industrial Technology System, China (CARS-37).

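Abstract (translated from the Chinese):

Genomic selection uses phenotypes and dense genome-wide single nucleotide polymorphisms (SNPs) to estimate the genomic estimated breeding values (GEBVs) of individuals. At present, genomic best linear unbiased prediction (GBLUP) is the most widely used tool for predicting complex traits; as technology advances and application scenarios keep changing, researchers are demanding higher prediction accuracy. Studies that focus on extending the model itself repartition the random-effect component of the conventional model into multiple weighted subsets to improve prediction accuracy, whereas studies that draw on multi-omics data capture variation beyond the genome sequence level to help improve it. In this study, to improve the accuracy of genomic selection in cattle, a large livestock species, we used a population with both genomic and transcriptomic data as the training population. The phenotypes were the weight of the longissimus dorsi muscle (LDM), a growth trait, and water holding capacity (WHC), shear force (SF), and pH, which are meat quality traits, in Huaxi cattle. We used the Bayesian sparse linear mixed model (BSLMM), transcriptome-wide association study (TWAS), and expression quantitative trait locus (eQTL) mapping to prescreen genomic features according to the criteria $|\hat{\beta}_b| > 0$, the top 1% of phenotypic variation explained (PVE), and expression-associated single nucleotide polymorphisms (eSNPs) and genes (egenes) with a false discovery rate (FDR) < 0.01. These significant prescreened loci were then set as extra fixed effects (GBLUP-Fix) or extra random effects (GFBLUP) to improve the model, which was evaluated in the validation population. For comparison, we also ran the conventional GBLUP method and GFBLUP and GBLUP-Fix models built on randomly selected loci. The results showed that, under the GFBLUP and GBLUP-Fix models, adding loci prescreened with the different strategies improved the prediction accuracy for LDM, WHC, SF, and pH by 2.14 to 8.69% on average. In particular, GFBLUP-TWAS improved the prediction accuracy for SF by 13.66% relative to GBLUP. The results also indicated that these methods capture more genetic variance than GBLUP. Our study verifies and highlights the feasibility of multi-omics-assisted prescreening of large-effect loci for improving genomic prediction accuracy in large livestock, and lays a foundation for selecting loci for the design of a low-density SNP chip for Huaxi cattle.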
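The FDR < 0.01 criterion used to prescreen eSNPs and egenes can be illustrated with a small sketch. It assumes the Benjamini-Hochberg procedure, which the abstract does not name, and the function name `bh_prescreen` and the example p-values are hypothetical.

```python
import numpy as np

def bh_prescreen(p_values, fdr=0.01):
    """Return sorted indices of features whose Benjamini-Hochberg
    adjusted p-value is below `fdr` (assumed procedure, see lead-in)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ascending p-values
    ranked = p[order]
    # Raw BH adjustment: p_(i) * m / i for rank i = 1..m
    adjusted = ranked * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest rank downwards
    adjusted = np.minimum.accumulate(adjusted[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    return np.sort(order[adjusted < fdr])

# Hypothetical association p-values for six SNP-gene pairs
p_vals = [1e-6, 0.5, 2e-5, 0.03, 0.9, 1e-4]
kept = bh_prescreen(p_vals, fdr=0.01)
print(kept)  # indices of the features retained as prescreened loci
```

The retained indices would then define the marker subset passed on to the GBLUP-Fix or GFBLUP model.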

Abstract:

Integrating multi-omics information into prediction models has become a strategy for improving the accuracy of genomic prediction in genomic selection. Here, we used individuals with both genomic and transcriptomic data as the training population and applied the Bayesian sparse linear mixed model (BSLMM), transcriptome-wide association study (TWAS), and expression quantitative trait locus (eQTL) mapping to prescreen features according to $|\hat{\beta}_b| > 0$, the top 1% of phenotypic variation explained (PVE), and expression-associated single nucleotide polymorphisms (eSNPs) and egenes (false discovery rate (FDR) < 0.01). These loci were then fitted as extra fixed effects (GBLUP-Fix) or extra random effects (GFBLUP) to improve prediction accuracy in the validation population. Both the GBLUP-Fix and GFBLUP models improved prediction accuracy for longissimus dorsi muscle (LDM), water holding capacity (WHC), shear force (SF), and pH in Huaxi cattle by 2.14 to 8.69% on average; in particular, GFBLUP-TWAS improved accuracy for SF by 13.66% over GBLUP. These methods also captured more genetic variance than GBLUP. Our study confirmed that multi-omics-assisted prescreening of large-effect loci can improve the accuracy of genomic prediction.
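The difference between GBLUP and GBLUP-Fix can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the authors' pipeline: the relationship matrix follows the VanRaden formula, the heritability `h2` is taken as known rather than estimated by REML, the relationship matrix is built from all markers, accuracy is taken as the correlation between predictions and phenotypes in the validation set, and the genotypes, effect sizes, and prescreened indices in `big` are invented for the example.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an (n x m) 0/1/2 genotype matrix."""
    p = M.mean(axis=0) / 2.0                    # allele frequencies
    Z = M - 2.0 * p                             # centre each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(y_train, G, train, valid, h2, X=None):
    """Predict validation phenotypes with y = Xb + g + e, g ~ N(0, G*var_g).
    X holds optional extra fixed-effect covariates (prescreened markers);
    an intercept is always added. h2 is assumed known."""
    n = G.shape[0]
    ones = np.ones((n, 1))
    X = ones if X is None else np.hstack([ones, X])
    lam = (1.0 - h2) / h2                       # var_e / var_g
    V = G[np.ix_(train, train)] + lam * np.eye(len(train))
    Xt = X[train]
    Vinv_X = np.linalg.solve(V, Xt)
    Vinv_y = np.linalg.solve(V, y_train)
    # Generalised least squares estimate of the fixed effects
    b = np.linalg.lstsq(Xt.T @ Vinv_X, Xt.T @ Vinv_y, rcond=None)[0]
    resid = y_train - Xt @ b
    g_valid = G[np.ix_(valid, train)] @ np.linalg.solve(V, resid)
    return X[valid] @ b + g_valid

# Simulated data (all values hypothetical)
rng = np.random.default_rng(0)
n, m = 300, 500
freqs = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freqs, size=(n, m)).astype(float)
effects = rng.normal(0.0, 0.05, size=m)         # small polygenic effects
big = np.array([10, 200, 400])                  # stand-in for prescreened loci
effects[big] = 1.0                              # large-effect markers
g = M @ effects
y = g + rng.normal(0.0, g.std(), size=n)        # heritability of about 0.5

train, valid = np.arange(200), np.arange(200, 300)
G = vanraden_grm(M)
pred_gblup = gblup_predict(y[train], G, train, valid, h2=0.5)
pred_fix = gblup_predict(y[train], G, train, valid, h2=0.5, X=M[:, big])

acc_gblup = np.corrcoef(pred_gblup, y[valid])[0, 1]
acc_fix = np.corrcoef(pred_fix, y[valid])[0, 1]
print(f"GBLUP accuracy:     {acc_gblup:.3f}")
print(f"GBLUP-Fix accuracy: {acc_fix:.3f}")
```

GFBLUP instead models the prescreened loci as a separate random effect, which this sketch does not cover.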

Key words: multi-omics data, feature prescreening, eQTL mapping, Huaxi cattle, genomic selection