Journal of Integrative Agriculture, 2025, Vol. 24, Issue (9): 3574-3582. DOI: 10.1016/j.jia.2024.03.071
Animal Science · Veterinary Medicine
Enhancing the genomic prediction accuracy of swine agricultural economic traits using an expanded one-hot encoding in CNN models

Zishuai Wang (1, 2)*, Wangchang Li (3, 4)*, Zhonglin Tang (1, 2, 4)#

1 Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

2 Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

3 College of Animal Science & Technology, Guangxi University, Nanning 530004, China

4 Guangxi Engineering Centre for Resource Development of Bama Xiang Pig, Hechi 547500, China

 Highlights 
● The CNN model achieved the highest genomic prediction accuracy for swine traits when using SNP sets comprising 1,000 markers.
● A novel one-hot encoding strategy representing 16 genotypes with eight binary variables significantly outperformed traditional encoding methods in CNN-based prediction.
● The improved CNN framework offers a powerful tool for enhancing genomic prediction accuracy, providing valuable support for data-driven swine breeding programs.
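The second highlight says 16 genotypes are represented with eight binary variables, but this page does not give the exact mapping. One plausible reading, assumed here and not confirmed by the source, is that each genotype is an ordered (phased) pair of nucleotides from {A, C, G, T} (4 × 4 = 16 combinations) and that each allele gets its own 4-bit one-hot vector, the two vectors being concatenated into 8 bits. The names `one_hot_allele`, `expanded_one_hot`, `encode_animal` and the A/C/G/T channel order are hypothetical; this is a sketch, not the authors' implementation.

```python
from __future__ import annotations

from itertools import product

# ASSUMPTION: a genotype is an ordered (phased) pair of nucleotides, so there
# are 4 x 4 = 16 possible genotypes; each allele gets its own 4-bit one-hot code.
NUCLEOTIDES = ("A", "C", "G", "T")  # hypothetical channel order


def one_hot_allele(base: str) -> list[int]:
    """Return a 4-element one-hot vector for a single nucleotide."""
    if base not in NUCLEOTIDES:
        raise ValueError(f"unknown nucleotide: {base!r}")
    return [1 if base == n else 0 for n in NUCLEOTIDES]


def expanded_one_hot(genotype: str) -> list[int]:
    """Encode a two-letter genotype such as 'AG' as 8 binary variables."""
    if len(genotype) != 2:
        raise ValueError("genotype must contain exactly two alleles")
    first, second = genotype
    return one_hot_allele(first) + one_hot_allele(second)


def encode_animal(genotypes: list[str]) -> list[list[int]]:
    """Encode one animal's SNP genotypes as an (n_snps x 8) input matrix."""
    return [expanded_one_hot(g) for g in genotypes]


if __name__ == "__main__":
    all_genotypes = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]
    distinct_codes = {tuple(expanded_one_hot(g)) for g in all_genotypes}
    print(len(all_genotypes), len(distinct_codes))  # 16 16
    print(expanded_one_hot("AG"))  # [1, 0, 0, 0, 0, 0, 1, 0]
```

Under this assumption, a 1,000-SNP set becomes a 1,000 × 8 matrix per animal, and the heterozygotes "AG" and "GA" receive distinct codes. A common "traditional" alternative one-hot encodes the 0/1/2 allele dosage of a biallelic SNP as three binary variables; whether that is the baseline used in the study is not stated on this page.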
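Abstract (Chinese version)

In animal and plant breeding, deep learning (DL) methods such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) have been widely applied to predict key agricultural economic traits. However, improving the accuracy of DL models in genomic prediction remains challenging. In this study, we first collected previously published genotype data and phenotype data for six economic traits from 2,797 Duroc pigs. Using GWAS, we obtained the association between each single nucleotide polymorphism (SNP) and the phenotypes, and used this information to define SNP datasets of different sizes (0.5k, 1k, 5k, 10k, 20k and 30k SNPs). We used the mean squared error (MSE) to evaluate the predictive performance of the CNN model on each SNP dataset. The CNN model performed best (lowest MSE) with the dataset containing 1,000 SNPs. In addition, we developed a new genotype encoding: unlike traditional one-hot encoding of genotypes, it encodes the 16 different genotypes as eight binary variables that serve as input to the CNN model. Compared with traditional one-hot encoding, this new encoding significantly improved the CNN model's prediction accuracy for important economic traits in pigs.

Genomic selection plays a crucial role in improving breeding strategies. Unlike traditional genomic selection methods, DL models such as CNNs can capture complex interactions among genomic loci, including epistatic and additive effects. Despite this advantage, their prediction accuracy is in many cases slightly lower than that of linear models. In this study, we developed a one-hot encoding method that significantly improves the prediction accuracy of CNN models for important economic traits in pigs, offering a new direction for applying DL models in genomic selection of pigs. Future research could introduce advanced data pre-processing techniques to further improve the performance of DL methods in this field, making fuller and more precise use of genomic information and advancing innovation in breeding.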
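The abstract above describes ranking SNPs by their GWAS association with a phenotype, building SNP sets of 0.5k to 30k markers, and comparing CNN predictions by MSE. The following is a minimal sketch of those two bookkeeping steps only, assuming the association is summarized as one p-value per SNP (smaller means stronger) and that each set keeps the top-k SNPs; the function names and toy p-values are hypothetical and invented for illustration, not taken from the study.

```python
from __future__ import annotations

# SNP-set sizes listed in the abstract (0.5k, 1k, 5k, 10k, 20k, 30k).
SET_SIZES = (500, 1_000, 5_000, 10_000, 20_000, 30_000)


def top_k_snp_sets(p_values: dict[str, float],
                   sizes: tuple[int, ...] = SET_SIZES) -> dict[int, list[str]]:
    """Rank SNP IDs by GWAS p-value (smallest first) and keep the top k for each size.

    ASSUMPTION: the SNP-phenotype association is summarized as one p-value per SNP.
    Sizes larger than the number of available SNPs are skipped.
    """
    # Sort by p-value, breaking ties by SNP ID so the ranking is deterministic.
    ranked = sorted(p_values, key=lambda snp: (p_values[snp], snp))
    return {k: ranked[:k] for k in sizes if k <= len(ranked)}


def mean_squared_error(observed: list[float], predicted: list[float]) -> float:
    """MSE = (1/n) * sum((y_i - y_hat_i)^2); lower is better."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Toy, made-up p-values for illustration only.
    toy = {"snp1": 0.5, "snp2": 1e-8, "snp3": 0.01, "snp4": 1e-3}
    print(top_k_snp_sets(toy, sizes=(2, 3)))
    # {2: ['snp2', 'snp4'], 3: ['snp2', 'snp4', 'snp3']}
    print(mean_squared_error([2.0, 4.0], [1.0, 5.0]))  # 1.0
```

Because each set is a prefix of the same ranking, the smaller sets are nested inside the larger ones. If the ranking is computed on the same animals later used for testing, the resulting MSE may be optimistic; this sketch does not address that.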


Abstract  

Deep learning (DL) methods such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) have been applied to predict complex traits in animal and plant breeding. However, improving their genomic prediction accuracy remains a significant challenge. In this study, we applied CNNs to predict swine traits using previously published data. Specifically, we extensively evaluated the CNN model's performance using single nucleotide polymorphism (SNP) sets of different sizes and found that it performed best with sets of 1,000 SNPs. We also adopted an expanded one-hot encoding that transforms the 16 different genotypes into sets of eight binary variables. This encoding significantly improved the CNN's prediction accuracy for swine traits compared with traditional one-hot encoding. Our findings suggest that the expanded one-hot encoding can improve the accuracy of DL methods in the genomic prediction of swine agricultural economic traits, which is relevant to swine breeding programs, where genomic prediction is pivotal in improving breeding strategies. Future research could further enhance DL methods by incorporating advanced data pre-processing techniques.
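The abstract applies CNNs to encoded SNP sets but does not describe the network architecture. As a minimal sketch of what a single convolutional filter computes over an encoded genotype matrix of shape (n_snps, n_channels), for example 8 channels per SNP under an expanded encoding, the hypothetical function `conv1d_relu` below slides one filter along adjacent SNPs with stride 1, no padding and a ReLU activation. It is a generic illustration of 1D convolution, not the authors' model.

```python
from __future__ import annotations


def conv1d_relu(x: list[list[float]],
                kernel: list[list[float]],
                bias: float = 0.0) -> list[float]:
    """One 1D convolution filter (stride 1, no padding) followed by ReLU.

    x      -- encoded genotypes, shape (n_snps, n_channels), e.g. 8 channels per SNP
    kernel -- filter weights, shape (width, n_channels)
    Returns a feature map of length n_snps - width + 1.
    """
    width = len(kernel)
    if width == 0 or width > len(x):
        raise ValueError("kernel width must be between 1 and the number of SNPs")
    n_channels = len(kernel[0])
    if any(len(row) != n_channels for row in kernel + x):
        raise ValueError("every kernel and input row needs the same channel count")
    feature_map = []
    for start in range(len(x) - width + 1):
        # Weighted sum over `width` adjacent SNPs and all channels
        # (cross-correlation, as in most deep-learning libraries).
        total = bias
        for offset in range(width):
            for c in range(n_channels):
                total += kernel[offset][c] * x[start + offset][c]
        feature_map.append(max(0.0, total))  # ReLU activation
    return feature_map


if __name__ == "__main__":
    x = [[1, 0], [0, 1], [1, 1]]  # toy input: 3 SNPs x 2 channels
    k = [[1, 0], [0, 2]]          # toy filter: width 2
    print(conv1d_relu(x, k))             # [3.0, 2.0]
    print(conv1d_relu(x, k, bias=-2.5))  # [0.5, 0.0]
```

A real CNN learns many such filters and stacks further layers; the point here is only that each filter combines information from neighbouring SNPs across all encoding channels, which is why the encoding choice changes what the network sees.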

Keywords: swine; agricultural economic traits; genomic prediction; deep learning; one-hot encoding; convolutional neural networks (CNNs)
Received: 04 July 2023; Accepted: 21 February 2024; Online: 27 March 2024
Fund: This work was supported by the National Natural Science Foundation of China (32102513), the National Key Scientific Research Project (2023YFF1001100), the Shenzhen Innovation and Entrepreneurship Plan-Major Special Project of Science and Technology, China (KJZD20230923115003006) and the Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ZDRW202006).

About the authors: Zishuai Wang, E-mail: wangzishuai@caas.cn; Wangchang Li, E-mail: liwangchang1019@163.com. #Correspondence: Zhonglin Tang, E-mail: tangzhonglin@caas.cn. *These authors contributed equally to this study.

Cite this article: 

Zishuai Wang, Wangchang Li, Zhonglin Tang. 2025. Enhancing the genomic prediction accuracy of swine agricultural economic traits using an expanded one-hot encoding in CNN models. Journal of Integrative Agriculture, 24(9): 3574-3582.


Related articles
[1] Yufan Gao, Fei Yin, Chen Hong, Xiangfu Chen, Hang Deng, Yongjian Liu, Zhenyu Li, Qing Yao. Intelligent field monitoring system for cruciferous vegetable pests using yellow sticky board images and an improved Cascade R-CNN[J]. Journal of Integrative Agriculture, 2025, 24(1): 220-234.
[2] Wan Wang, Zhenjiang Zhang, Weldu Tesfagaber, Jiwen Zhang, Fang Li, Encheng Sun, Lijie Tang, Zhigao Bu, Yuanmao Zhu, Dongming Zhao. Establishment of an indirect immunofluorescence assay for the detection of African swine fever virus antibodies[J]. Journal of Integrative Agriculture, 2024, 23(1): 228-238.
[3] SONG Xiang-peng, XIA Ying-ju, XU Lu, ZHAO Jun-jie, WANG Zhen, ZHAO Qi-zu, LIU Ye-bing, ZHANG Qian-yi, WANG Qin. A multiplex real-time PCR assay for simultaneous detection of classical swine fever virus, African swine fever virus and atypical porcine pestivirus[J]. Journal of Integrative Agriculture, 2023, 22(2): 559-567.
[4] GUO Shi-juan, LÜ Xin-ye, HU Xiang-dong. Optimal design of culling compensation policy under the African swine fever — Based on simulations of typical pig farms in China[J]. Journal of Integrative Agriculture, 2023, 22(2): 611-622.
[5] PAN Shuai-qun, QIAO Jing-fen, WANG Rui, YU Hui-lin, WANG Cheng, Kerry TAYLOR, PAN Hong-yu. Intelligent diagnosis of northern corn leaf blight with deep learning model[J]. Journal of Integrative Agriculture, 2022, 21(4): 1094-1105.
[6] WANG Peng-fei, WANG Ming, SHI Zhi-bin, SUN Zhen-zhao, WEI Li-li, LIU Zai-si, WANG Shi-da, HE Xi-jun, WANG Jing-fei. Development of a recombinant pB602L-based indirect ELISA assay for detecting antibodies against African swine fever virus in pigs[J]. Journal of Integrative Agriculture, 2022, 21(3): 819-825.
[7] XIE Xing, HAO Fei, WANG Hai-yan, PANG Mao-da, GAN Yuan, LIU Bei-bei, ZHANG Lei, WEI Yan-na, CHEN Rong, ZHANG Zhen-zhen, BAO Wen-bin, BAI Yun, SHAO Guo-qing, XIONG Qi-yan, FENG Zhi-xin. Construction of a telomerase-immortalized porcine tracheal epithelial cell model for swine-origin mycoplasma infection[J]. Journal of Integrative Agriculture, 2022, 21(2): 504-520.
[8] LI Hui-shang, HU Chen-pei, LÜ Zheng, LI Mei-qi, GUO Xin-zhu. African swine fever and meat prices fluctuation: An empirical study in China based on TVP-VAR model[J]. Journal of Integrative Agriculture, 2021, 20(8): 2289-2301.
[9] WANG Zi-lin, FENG Ke-ying, GE Xiu-feng, MAI Jia-cheng, WANG Han-chuan, LIU Wen-zi, ZHANG Jia-hui, SHEN Xiang-guang. Effects of 105 traditional Chinese medicines on the detection of β-agonists in medicine extracts and swine urine based on colloidal gold immunochromatographic assay[J]. Journal of Integrative Agriculture, 2021, 20(6): 1626-1635.
[10] JIANG Cheng-gang, SUN Ying, ZHANG Fan, AI Xin, FENG Xiao-ning, HU Wei, ZHANG Xian-feng, ZHAO Dong-ming, BU Zhi-gao, HE Xi-jun. Viricidal activity of several disinfectants against African swine fever virus[J]. Journal of Integrative Agriculture, 2021, 20(11): 3084-3088.
[11] WANG Jun, SHI Xin-jin, SUN Hai-wei, CHEN Hong-jun. Insights into African swine fever virus immunoevasion strategies[J]. Journal of Integrative Agriculture, 2020, 19(1): 11-22.
[12] Xiao Wu, Jun Zhu, Hongjian Lin. In-depth observations of fermentative hydrogen production from liquid swine manure using an anaerobic sequencing batch reactor[J]. Journal of Integrative Agriculture, 2017, 16(6): 1276-1285.
[13] ZHANG Zhe, ZHANG Hao, PAN Rong-yang, WU Long, LI Ya-lan, CHEN Zan-mou, CAI Geng-yuan, LI Jia-qi, WU Zhen-fang. Genetic parameters and trends for production and reproduction traits of a Landrace herd in China[J]. Journal of Integrative Agriculture, 2016, 15(5): 1069-1075.
[14] WEI Yan-di, PEI Xing-yao, ZHANG Yuan, YU Chen-fang, SUN Hong-lei, LIU Jin-hua, PU Juan. Nested RT-PCR method for the detection of European avian-like H1 swine influenza A virus[J]. Journal of Integrative Agriculture, 2016, 15(5): 1095-1102.
[15] DENG Xian-bai, DIN Huan-zhong, HUANG Xian-hui, MA Yong-jiang, FAN Xiao-long, YAN Hai-kuo, LU Pei-cheng, LI Wei-cheng, ZENG Zhen-ling. Tissue distribution of deoxynivalenol in piglets following intravenous administration[J]. Journal of Integrative Agriculture, 2015, 14(10): 2058-2064.