Journal of Integrative Agriculture
Using mixed kernel support vector machine to improve the predictive accuracy of genome selection
Jinbu Wang1, Wencheng Zong1, Liangyu Shi2, Mianyan Li1, Jia Li1, 3, Deming Ren1, 4, Fuping Zhao1, Lixian Wang1#, Ligang Wang1#

1Key Laboratory of Animal Genetics, Breeding and Reproduction (poultry) of Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China

2School of Animal Science and Nutritional Engineering, Wuhan Polytechnic University, Wuhan 430023, China

3College of Animal Science and Technology, Beijing University of Agriculture, Beijing 102206, China

4College of Animal Science and Technology, Qingdao Agricultural University, Qingdao 266109, China

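Abstract (translated from Chinese)  Traditional parametric models for genome selection cannot adequately fit increasingly large sequencing datasets or accurately capture the complex effects within them, whereas machine learning models show great potential for such problems. This study introduces the concept of mixed kernel functions and, for the first time, uses the Laplacian kernel (SVR_L) and the cosine kernel (SVR_C) in support vector machine regression (SVR) to explore the performance of SVR in genome selection. First, we optimized the weight parameter. The results show that when a global kernel function (Gaussian or Laplacian) is mixed with the Sigmoid kernel, a weight parameter of 0.9 gives the highest accuracy in most cases. When a global kernel function is mixed with the polynomial kernel, the best weight parameter is 0.1. Second, using prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) as evaluation metrics, we compared the genomic breeding value prediction performance of six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L), four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP), two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results show that, in most cases, the mixed kernel models outperform GBLUP, BayesB and the single kernel functions. For example, for trait 1 (T1, heritability 0.07) in the pig dataset, the prediction accuracy of SVR_GS is 10% higher than that of GBLUP, and about 4.4% and 18.6% higher than those of SVR_G and SVR_S, respectively. For environment 1 (E1) in the wheat dataset, the prediction accuracy of SVR_GS is 13.3% higher than that of GBLUP. Among the single kernel functions, the Laplacian and Gaussian kernels yield similar results, but the Gaussian kernel performs better. Compared with single kernel functions, mixed kernel functions clearly reduce the prediction MSE and MAE. In terms of runtime, SVR_GS and SVR_GP are about three times faster than GBLUP on the pig dataset, with only a slight increase in runtime compared with the single kernel models. In summary, mixed kernel SVR models show advantages in both speed and accuracy, and the SVR_GS model in particular has important application potential for genome selection. This study provides an important reference for parameter optimization and the application of mixed kernels in SVR.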

Abstract  The advantages of genome selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) were used as evaluation indicators, and the SVR models were compared with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that, in most cases, the mixed kernel function models significantly outperform GBLUP, BayesB and the single kernel functions. For instance, for T1 in the pig dataset, the predictive accuracy of SVR_GS is 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than those of SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among the single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better. The mixed kernel functions notably reduce the MSE and MAE compared with all single kernel functions. Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP on the pig dataset, with only a slight increase in runtime compared with the single kernel function models. In summary, the mixed kernel function models of SVR are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
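The mixed kernel idea summarized in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes that a mixed kernel is a weighted sum of two kernels, that the weight applies to the first (Gaussian or Laplacian) kernel, and that prediction accuracy is the Pearson correlation between observed and predicted values. The function names (`mixed_kernel`, `fit_predict`, `evaluate`, `demo`), the simulated genotypes and the weight grid are all hypothetical; the kernels and `SVR` come from scikit-learn with default hyperparameters.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.metrics.pairwise import laplacian_kernel, rbf_kernel, sigmoid_kernel
from sklearn.svm import SVR


def mixed_kernel(X, Y, weight=0.9, base="gaussian"):
    """Weighted sum of a Gaussian or Laplacian kernel and a Sigmoid kernel.

    Assumption: `weight` multiplies the Gaussian/Laplacian part and
    (1 - weight) multiplies the Sigmoid part.
    """
    base_fn = rbf_kernel if base == "gaussian" else laplacian_kernel
    return weight * base_fn(X, Y) + (1.0 - weight) * sigmoid_kernel(X, Y)


def fit_predict(X_train, y_train, X_test, weight=0.9, base="gaussian"):
    """Fit an SVR on a precomputed mixed kernel and predict the test rows."""
    model = SVR(kernel="precomputed")
    model.fit(mixed_kernel(X_train, X_train, weight, base), y_train)
    # Rows are test individuals, columns are training individuals.
    return model.predict(mixed_kernel(X_test, X_train, weight, base))


def evaluate(y_true, y_pred):
    """Accuracy (assumed here to be Pearson correlation), MSE and MAE."""
    return {
        "accuracy": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "mse": float(mean_squared_error(y_true, y_pred)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
    }


def demo(seed=0):
    """Compare a few illustrative weights on simulated 0/1/2 genotypes."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(300, 200)).astype(float)
    X -= X.mean(axis=0)  # center each marker
    effects = rng.normal(size=200)
    y = X @ effects + rng.normal(scale=5.0, size=300)
    y = (y - y.mean()) / y.std()
    train, test = slice(0, 240), slice(240, 300)
    return {
        w: evaluate(y[test], fit_predict(X[train], y[train], X[test], weight=w))
        for w in (0.1, 0.5, 0.9)
    }


if __name__ == "__main__":
    for w, metrics in demo().items():
        print(w, metrics)
```

Passing `base="laplacian"` swaps in the Laplacian kernel. In practice the weight would be chosen by cross-validation on real data rather than on a single simulated split as in `demo`.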
Keywords:  genome selection, machine learning, support vector machine, kernel function, mixed kernel function
Online: 26 April 2024  
Fund: This work was supported by the China Agriculture Research System of MOF and MARA, National Natural Science Foundation of China (31872337, 31501919), and Agricultural Science and Technology Innovation Project (ASTIP-IAS02).
About author:  Jinbu Wang, E-mail: w18439393365@163.com; #Correspondence Lixian Wang, E-mail: iaswlx@263.net; Ligang Wang, E-mail: wangligang01@caas.cn

Cite this article: 

Jinbu Wang, Wencheng Zong, Liangyu Shi, Mianyan Li, Jia Li, Deming Ren, Fuping Zhao, Lixian Wang, Ligang Wang. 2024. Using mixed kernel support vector machine to improve the predictive accuracy of genome selection. Journal of Integrative Agriculture, Doi:10.1016/j.jia.2024.03.083
