Journal of Integrative Agriculture  2026, Vol. 25 Issue (2): 775-787    DOI: 10.1016/j.jia.2024.03.083
Animal Science · Veterinary Medicine
Using mixed kernel support vector machine to improve the predictive accuracy of genome selection

Jinbu Wang1, Wencheng Zong1, Liangyu Shi2, Mianyan Li1, Jia Li1, 3, Deming Ren1, 4, Fuping Zhao1, Lixian Wang1#, Ligang Wang1#

1 Key Laboratory of Animal Genetics, Breeding and Reproduction (poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China

2 School of Animal Science and Nutritional Engineering, Wuhan Polytechnic University, Wuhan 430023, China

3 College of Animal Science and Technology, Beijing University of Agriculture, Beijing 102206, China

4 College of Animal Science and Technology, Qingdao Agricultural University, Qingdao 266109, China

 Highlights 
We introduced mixed kernel functions and associated theories into genomic selection for the first time.  
We applied the Laplacian kernel and Cosine kernel functions to the support vector machine algorithm in the field of genomic selection for the first time.  
We proposed optimal weights for the mixed kernel support vector machine applied to traits with varying heritabilities.  
The SVR_GS algorithm exhibited robust performance in genomic selection.

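Abstract (translated from the Chinese)  
Traditional parametric models for genomic selection struggle to fit increasingly large sequencing datasets and to capture the complex effects they contain, whereas machine learning models show great potential for these problems. This study introduces the concept of mixed kernel functions and, for the first time, uses the Laplacian kernel (SVR_L) and the Cosine kernel (SVR_C) in support vector machine regression (SVR) to explore the performance of SVR in genomic selection. First, we optimized the weight parameter. When a global kernel function (Gaussian or Laplacian) was mixed with the Sigmoid kernel, a weight of 0.9 gave the highest accuracy in most cases. When a global kernel function was mixed with the polynomial kernel, the best weight was 0.1. Second, using prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) as evaluation metrics, we compared the genomic breeding value prediction performance of six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L), four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP), two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results show that, in most cases, the mixed kernel models outperform GBLUP, BayesB and the single kernel functions. For example, for trait 1 in the pig dataset (T1, heritability 0.07), the prediction accuracy of SVR_GS is 10% higher than that of GBLUP, and about 4.4% and 18.6% higher than that of SVR_G and SVR_S, respectively. For environment 1 (E1) in the wheat dataset, the prediction accuracy of SVR_GS is 13.3% higher than that of GBLUP. Among the single kernel functions, the Laplacian and Gaussian kernels produce similar results, with the Gaussian kernel performing better. Compared with the single kernel functions, the mixed kernel functions give clearly lower MSE and MAE. In terms of runtime, SVR_GS and SVR_GP are about three times faster than GBLUP in the pig dataset, and only slightly slower than the single kernel models. In summary, the mixed kernel SVR models offer advantages in both speed and accuracy, and the SVR_GS model in particular has important application potential for genomic selection. This study provides an important reference for parameter optimization of the SVR algorithm and for the application of mixed kernels.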

Abstract  
The advantages of genome selection (GS) in animal and plant breeding are self-evident.  Traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately.  Machine learning models have demonstrated remarkable potential in addressing these challenges.  In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS.  Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values.  Prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) were used as evaluation metrics, and the SVR models were compared with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR).  The results indicate that, in most cases, the mixed kernel function models significantly outperform GBLUP, BayesB and the single kernel functions.  For instance, for T1 in the pig dataset, the prediction accuracy of SVR_GS is 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than that of SVR_G and SVR_S, respectively.  For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP.  Among the single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better.  The mixed kernel functions notably reduce the MSE and MAE compared with all single kernel functions.  Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP in the pig dataset, with only a slight increase in runtime compared with the single kernel function models.  In summary, the mixed kernel function models of SVR are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
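
To make the idea of a mixed kernel concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the mixed kernel is a convex combination of a Gaussian (RBF) kernel and a Sigmoid kernel, that the weight multiplies the Gaussian term, that prediction accuracy is the Pearson correlation between predicted and observed values, and that scikit-learn is used with a precomputed kernel matrix. None of these details is stated in the abstract. The helper names `mixed_kernel` and `fit_and_score` and the simulated genotypes are hypothetical, and the default weight of 0.9 only echoes the value the Chinese-language abstract reports as best in most cases for Gaussian and Sigmoid mixtures.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.metrics.pairwise import rbf_kernel, sigmoid_kernel
from sklearn.svm import SVR


def mixed_kernel(X, Y, weight=0.9, gamma=None):
    """Weighted sum of a Gaussian (RBF) kernel and a Sigmoid kernel.

    Assumption: `weight` scales the Gaussian term and (1 - weight) the
    Sigmoid term; the source does not say which kernel gets which weight.
    """
    k_global = rbf_kernel(X, Y, gamma=gamma)
    k_sigmoid = sigmoid_kernel(X, Y, gamma=gamma)
    return weight * k_global + (1.0 - weight) * k_sigmoid


def fit_and_score(X_train, y_train, X_test, y_test, weight=0.9):
    """Fit SVR on a precomputed mixed kernel and report three metrics."""
    K_train = mixed_kernel(X_train, X_train, weight)  # (n_train, n_train)
    K_test = mixed_kernel(X_test, X_train, weight)    # (n_test, n_train)
    model = SVR(kernel="precomputed").fit(K_train, y_train)
    pred = model.predict(K_test)
    return {
        # Assumed definition of accuracy: Pearson correlation.
        "accuracy": float(np.corrcoef(pred, y_test)[0, 1]),
        "mse": float(mean_squared_error(y_test, pred)),
        "mae": float(mean_absolute_error(y_test, pred)),
    }


if __name__ == "__main__":
    # Simulated data for illustration only: SNP genotypes coded 0/1/2.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 500)).astype(float)
    y = X @ rng.normal(0.0, 0.05, size=500) + rng.normal(0.0, 1.0, size=200)
    print(fit_and_score(X[:150], y[:150], X[150:], y[150:], weight=0.9))
```

In a sketch like this, the weight would normally be chosen by cross-validation over a grid (for example 0.1 to 0.9), which is one way to read the weight-parameter optimization the abstract describes.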

Keywords: genome selection, machine learning, support vector machine, kernel function, mixed kernel function
Received: 08 November 2023   Accepted: 28 February 2024 Online: 26 April 2024  
Fund: This work was supported by the China Agriculture Research System of MOF and MARA, the National Natural Science Foundation of China (31872337 and 31501919), and the Agricultural Science and Technology Innovation Project, China (ASTIP-IAS02).  
About author:  Jinbu Wang, E-mail: w18439393365@163.com; #Correspondence Lixian Wang, E-mail: iaswlx@263.net; Ligang Wang, E-mail: wangligang01@caas.cn

Cite this article: 

Jinbu Wang, Wencheng Zong, Liangyu Shi, Mianyan Li, Jia Li, Deming Ren, Fuping Zhao, Lixian Wang, Ligang Wang. 2026. Using mixed kernel support vector machine to improve the predictive accuracy of genome selection. Journal of Integrative Agriculture, 25(2): 775-787.

