Journal of Integrative Agriculture
Enhancing the genomic prediction accuracy of swine agricultural economic traits using an expanded one-hot encoding in CNN models
Zishuai Wang 1,2,*, Wangchang Li 3,4,*, Zhonglin Tang 1,2,4

1. Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

2. Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

3. College of Animal Science & Technology, Guangxi University, Nanning 530004, China

4. GuangXi Engineering Centre for Resource Development of Bama Xiang Pig, Bama 547500, China

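Summary  In animal and plant breeding, deep learning (DL) methods such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) have been widely applied to predict key agricultural economic traits. However, improving the accuracy of DL models in genomic prediction remains challenging. In this study, we first collected previously published genotype data and phenotype data for six economic traits from 2,797 Duroc pigs. Through GWAS analysis, we obtained the association between each single nucleotide polymorphism (SNP) and the phenotypes, and used this association information to define SNP datasets of different sizes (0.5k, 1k, 5k, 10k, 20k, and 30k SNPs). We used the mean square error (MSE) to evaluate the prediction performance of the CNN model on the SNP datasets of each size. The results show that the CNN model achieved its best prediction performance (lowest MSE) with the dataset containing 1,000 SNPs. In addition, we developed a new genotype encoding scheme: unlike the traditional one-hot encoding of genotypes, the new method encodes 16 different genotypes as eight-bit binary variables that serve as input to the CNN model. Compared with the traditional one-hot encoding, this new encoding significantly improved the CNN model's prediction accuracy for important economic traits in pigs. Genomic selection plays a crucial role in improving breeding strategies. Unlike traditional genomic selection methods, DL models such as CNNs can reveal complex interactions among loci in the genome, including epistatic and additive effects. Despite this advantage, in many cases their prediction accuracy for traits falls slightly short of that of linear models. For important economic traits in pigs, this study developed a one-hot encoding method that significantly improves the prediction accuracy of CNN models. The method offers a new approach for applying DL models to genomic selection in pig breeding. Future research could introduce advanced data pre-processing techniques to further enhance the performance of DL methods in this field, which would help exploit genomic information more comprehensively and precisely and promote innovation and progress in breeding.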
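The summary above describes building SNP subsets of different sizes from GWAS association results and comparing models by MSE. The following is a minimal sketch of those two steps, not the authors' pipeline. It assumes that SNPs are ranked by ascending GWAS p-value, which the text does not state, and the function names `top_k_snps` and `mean_squared_error` are hypothetical.

```python
def top_k_snps(p_values, k):
    """Return the indices of the k SNPs with the smallest GWAS p-values.

    Assumption: "association" is measured by the p-value, smaller meaning
    stronger. The source does not specify the ranking statistic.
    """
    if k < 0 or k > len(p_values):
        raise ValueError("k must be between 0 and the number of SNPs")
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return order[:k]


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented for illustration only): four SNPs, keep the best two.
selected = top_k_snps([0.5, 0.001, 0.2, 0.0001], k=2)
error = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In a comparison like the one summarized above, the same selection would be repeated for each subset size (500, 1,000, 5,000, and so on) and the subset giving the lowest MSE on held-out animals would be preferred.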

Abstract  Deep learning (DL) methods such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) have been applied to predict complex traits in animal and plant breeding. However, improving genomic prediction accuracy remains a significant challenge. In this study, we applied CNNs to predict swine traits using previously published data. Specifically, we evaluated the CNN model's performance on Single Nucleotide Polymorphism (SNP) sets of various sizes and found that the model performed best with sets of 1,000 SNPs. Furthermore, we adopted a novel one-hot encoding approach that transforms the 16 different genotypes into sets of eight binary variables. This encoding significantly enhanced the CNN's prediction accuracy for swine traits, outperforming the traditional one-hot encoding. Our findings suggest that the expanded one-hot encoding method can improve the accuracy of DL methods in the genomic prediction of swine agricultural economic traits. This has significant implications for swine breeding programs, where genomic prediction is pivotal in improving breeding strategies. Future research can explore further enhancements to DL methods by incorporating advanced data pre-processing techniques.
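The abstract states that 16 genotypes are mapped to eight binary variables but does not give the mapping. One reading consistent with those counts, offered here only as an assumption, is that each genotype is an ordered pair of alleles over A, C, G, T (4 x 4 = 16 combinations) and that each allele is one-hot encoded with four bits, giving 4 + 4 = 8 bits. The names `BASES`, `encode_genotype`, and `encode_individual` are hypothetical.

```python
BASES = "ACGT"  # assumed allele alphabet; not listed in the abstract


def encode_genotype(genotype):
    """Encode an ordered two-allele genotype such as "AG" as 8 binary values.

    Bits 0-3 one-hot encode the first allele and bits 4-7 the second.
    This layout is an illustrative assumption, not the published scheme.
    """
    if len(genotype) != 2:
        raise ValueError("genotype must consist of exactly two alleles")
    bits = [0] * 8
    for slot, allele in enumerate(genotype):
        # str.index raises ValueError for an allele outside BASES
        bits[4 * slot + BASES.index(allele)] = 1
    return bits


def encode_individual(genotypes):
    """Encode a list of genotypes into a (number of SNPs) x 8 matrix."""
    return [encode_genotype(g) for g in genotypes]


# Toy input (invented for illustration): one animal, three SNPs.
matrix = encode_individual(["AG", "CC", "TA"])
```

Under this layout every one of the 16 ordered genotypes receives a distinct code with exactly two bits set, so "AG" and "GA" stay distinguishable, which a three-level 0/1/2 allele-count coding cannot do. The resulting matrix has one row per SNP and eight channels, a shape that a one-dimensional convolution can take as input.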
Keywords:  Swine; Agricultural economic traits; Genomic prediction; Deep learning; One-hot encoding; Convolutional Neural Networks
Online: 24 April 2024  
Fund: This work was supported by the National Natural Science Foundation of China (32102513), the National Key Scientific Research Project (2023YFF1001100), Shenzhen Innovation and Entrepreneurship Plan--Major special project of science and technology (KJZD20230923115003006) and Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ZDRW202006).
About author:  Zishuai Wang, E-mail: wangzishuai@caas.cn; Wangchang Li, E-mail: liwangchang1019@163.com; #Correspondence: Zhonglin Tang, E-mail: tangzhonglin@caas.cn. *These authors contributed equally to this study.

Cite this article: 

Zishuai Wang, Wangchang Li, Zhonglin Tang. 2024. Enhancing the genomic prediction accuracy of swine agricultural economic traits using an expanded one-hot encoding in CNN models. Journal of Integrative Agriculture, Doi:10.1016/j.jia.2024.03.071
