Journal of Integrative Agriculture, 2026, Vol. 25, Issue (2): 775-787. DOI: 10.1016/j.jia.2024.03.083

Mixed kernel support vector machine improves the prediction accuracy of genome selection

  

  • Received: 2023-11-08 Revised: 2024-04-26 Accepted: 2024-02-28 Publication date: 2026-02-20 Release date: 2026-01-06

Using mixed kernel support vector machine to improve the predictive accuracy of genome selection

Jinbu Wang1, Wencheng Zong1, Liangyu Shi2, Mianyan Li1, Jia Li1, 3, Deming Ren1, 4, Fuping Zhao1, Lixian Wang1#, Ligang Wang1#   

    1 Key Laboratory of Animal Genetics, Breeding and Reproduction (poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China

    2 School of Animal Science and Nutritional Engineering, Wuhan Polytechnic University, Wuhan 430023, China

    3 College of Animal Science and Technology, Beijing University of Agriculture, Beijing 102206, China

    4 College of Animal Science and Technology, Qingdao Agricultural University, Qingdao 266109, China

  • Received: 2023-11-08 Revised: 2024-04-26 Accepted: 2024-02-28 Online: 2026-02-20 Published: 2026-01-06
  • About author: Jinbu Wang, E-mail: w18439393365@163.com; #Correspondence Lixian Wang, E-mail: iaswlx@263.net; Ligang Wang, E-mail: wangligang01@caas.cn
  • Supported by:
    This work was supported by the China Agriculture Research System of MOF and MARA, the National Natural Science Foundation of China (31872337 and 31501919), and the Agricultural Science and Technology Innovation Project, China (ASTIP-IAS02).  

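Summary: Traditional parametric models for genome selection cannot adequately fit increasingly large sequencing datasets or accurately capture the complex effects within them, whereas machine learning models show great potential for these problems. This study introduced the concept of mixed kernel functions and, for the first time, used the Laplacian kernel function (SVR_L) and the cosine kernel function (SVR_C) in support vector machine regression (SVR) to explore the performance of SVR in genome selection. First, we optimized the weight parameter. When a global kernel function (Gaussian or Laplacian) was mixed with the Sigmoid kernel function, a weight parameter of 0.9 gave the highest accuracy in most cases. When a global kernel function was mixed with the polynomial kernel function, the best weight parameter was 0.1. Second, using prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) as evaluation metrics, we compared six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L), four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP), two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR) on their performance in predicting genomic breeding values. The results show that, in most cases, the mixed kernel function models outperformed GBLUP, BayesB and the single kernel functions. For example, for trait 1 (T1, heritability 0.07) in the pig dataset, the prediction accuracy of SVR_GS was 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than that of SVR_G and SVR_S, respectively. For environment 1 (E1) in the wheat dataset, the prediction accuracy of SVR_GS was 13.3% higher than that of GBLUP. Among the single kernel functions, the Laplacian and Gaussian kernel functions produced similar results, with the Gaussian kernel function performing better. Compared with the single kernel functions, the mixed kernel functions markedly reduced the prediction MSE and MAE. In addition, in terms of runtime, SVR_GS and SVR_GP ran about three times faster than GBLUP in the pig dataset, with only a slight increase in runtime compared with the single kernel function models. In summary, the mixed kernel function models of SVR show advantages in both speed and accuracy, and the SVR_GS model in particular has important application potential for genome selection. This study provides an important reference for parameter optimization of the SVR algorithm and for the application of mixed kernels.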
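The mixed kernel idea described above can be sketched as a weighted sum of two kernels. This is a minimal illustration, not the authors' implementation: it assumes the mix is the convex combination weight × K_Gaussian + (1 − weight) × K_Sigmoid, that the weight parameter applies to the global (Gaussian) kernel, and it uses hypothetical hyperparameters (`gamma`, `coef0`) and toy genotype vectors.

```python
import math

def gaussian_kernel(x, y, gamma=0.1):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2). gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0). gamma and coef0 are assumed values."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)

def mixed_kernel(x, y, weight=0.9):
    """Assumed mixing rule: weight * Gaussian + (1 - weight) * Sigmoid."""
    return weight * gaussian_kernel(x, y) + (1.0 - weight) * sigmoid_kernel(x, y)

def gram_matrix(samples, kernel):
    """Pairwise kernel matrix over a list of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]

# Hypothetical toy data: three individuals, four SNPs coded 0/1/2.
genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
K = gram_matrix(genotypes, mixed_kernel)
```

A matrix built this way could be handed to an SVR implementation that accepts a precomputed kernel, for example scikit-learn's `SVR(kernel="precomputed")`. Setting `weight=1.0` recovers the pure Gaussian kernel and `weight=0.0` the pure Sigmoid kernel.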

Abstract:

The advantages of genome selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) were used as evaluation indicators, and the SVR models were compared with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that, in most cases, the mixed kernel function models perform significantly better than GBLUP, BayesB and the single kernel function models. For instance, for T1 in the pig dataset, the prediction accuracy of SVR_GS is 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than that of SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among the single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better. The mixed kernel functions notably reduce the MSE and MAE compared with all single kernel functions. Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP in the pig dataset, with only a slight increase in runtime compared with the single kernel function models. In summary, the mixed kernel function models of SVR are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
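The three evaluation indicators named in the abstract can be computed as in the sketch below. The abstract does not define prediction accuracy; this example assumes it is the Pearson correlation between observed and predicted values, which is a common choice in GS studies. The data are hypothetical toy values.

```python
import math

def pearson_correlation(observed, predicted):
    """Pearson correlation, used here as an assumed definition of prediction accuracy."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def mean_squared_error(observed, predicted):
    """Average of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def mean_absolute_error(observed, predicted):
    """Average of absolute prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical phenotypes and model predictions for four validation individuals.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

accuracy = pearson_correlation(observed, predicted)
mse = mean_squared_error(observed, predicted)
mae = mean_absolute_error(observed, predicted)
```

Higher correlation and lower MSE and MAE indicate better predictions, so the three indicators are read in opposite directions when comparing models.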

Key words: genome selection, machine learning, support vector machine, kernel function, mixed kernel function