Journal of Integrative Agriculture, 2025, Vol. 24, Issue 9: 3574-3582. DOI: 10.1016/j.jia.2024.03.071


Using an expanded one-hot encoding to improve the genomic prediction accuracy of CNN models for important economic traits in pigs

  

  • Received: 2023-07-04 Revised: 2024-03-27 Accepted: 2024-02-21 Online: 2025-09-20 Published: 2025-08-11

Enhancing the genomic prediction accuracy of swine agricultural economic traits using an expanded one-hot encoding in CNN models

Zishuai Wang1, 2*, Wangchang Li3, 4*, Zhonglin Tang1, 2, 4#   

    1 Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

    2 Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

    3 College of Animal Science & Technology, Guangxi University, Nanning 530004, China

    4 Guangxi Engineering Centre for Resource Development of Bama Xiang Pig, Hechi 547500, China

  • Received: 2023-07-04 Revised: 2024-03-27 Accepted: 2024-02-21 Online: 2025-09-20 Published: 2025-08-11
  • About author: Zishuai Wang, E-mail: wangzishuai@caas.cn; Wangchang Li, E-mail: liwangchang1019@163.com; # Correspondence: Zhonglin Tang, E-mail: tangzhonglin@caas.cn. * These authors contributed equally to this study.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (32102513), the National Key Scientific Research Project (2023YFF1001100), the Shenzhen Innovation and Entrepreneurship Plan-Major Special Project of Science and Technology, China (KJZD20230923115003006) and the Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ZDRW202006).

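Abstract (translated from the Chinese version):

In animal and plant breeding, deep learning (DL) methods such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) have been widely applied to predict key agricultural economic traits. However, improving the accuracy of DL models in genomic prediction remains challenging. In this study, we first collected previously published genotype data for 2,797 Duroc pigs together with phenotype data for six economic traits. Using genome-wide association study (GWAS) analysis, we obtained the association between each single nucleotide polymorphism (SNP) and the phenotypes, and used this information to define SNP sets of different sizes (0.5k, 1k, 5k, 10k, 20k, and 30k SNPs). We used the mean squared error (MSE) to evaluate the CNN model's predictive performance on each SNP set. The CNN model performed best (lowest MSE) with the set of 1,000 SNPs. In addition, we developed a new genotype encoding: unlike the traditional one-hot genotype encoding, it encodes the 16 different genotypes as eight binary variables that serve as input to the CNN model. Compared with the traditional one-hot encoding, the new encoding significantly improved the CNN model's prediction accuracy for important economic traits in pigs. Genomic selection plays a vital role in improving breeding strategies. Unlike traditional genomic selection methods, DL models such as CNNs can capture complex interactions among loci in the genome, including epistatic and additive effects. Despite this advantage, in many cases their prediction accuracy for traits falls slightly short of that of linear models. This study developed a one-hot encoding method that significantly improves the prediction accuracy of CNN models for important economic traits in pigs. It offers a new approach to applying DL models in genomic selection for pig breeding. Future research could introduce advanced data pre-processing techniques to further improve the performance of DL methods in this field, which would help exploit genomic information more fully and precisely and advance innovation in breeding.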
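The SNP-set construction and MSE evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that SNPs are ranked by GWAS p-value and that the smallest p-values are kept; the abstract only says the sets were defined from GWAS association information, so the ranking criterion, the function names `top_snp_indices` and `mean_squared_error`, and the toy numbers are all hypothetical.

```python
import numpy as np


def top_snp_indices(p_values, n_snps):
    """Return the column indices of the n_snps most strongly associated SNPs.

    Assumption: association strength is measured by the GWAS p-value,
    and a smaller p-value means a stronger association.
    """
    p = np.asarray(p_values, dtype=float)
    if not 0 < n_snps <= p.size:
        raise ValueError("n_snps must be between 1 and the number of SNPs")
    order = np.argsort(p, kind="stable")  # ascending p-value
    return np.sort(order[:n_snps])        # keep original genome order


def mean_squared_error(y_true, y_pred):
    """Mean squared error between observed and predicted phenotypes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))


# Toy illustration with made-up p-values for four SNPs.
selected = top_snp_indices([0.5, 0.001, 0.2, 0.0001], n_snps=2)
toy_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In a workflow of this kind, the same selection would be repeated for each set size (0.5k, 1k, 5k, 10k, 20k, 30k) and the set giving the lowest held-out MSE would be retained.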

Abstract:

Deep learning (DL) methods such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) have been applied to predict complex traits in animal and plant breeding. However, improving genomic prediction accuracy remains a significant challenge. In this study, we applied CNNs to predict swine traits using previously published data. We evaluated the CNN model's performance on SNP (single nucleotide polymorphism) sets of various sizes and found that it performed best with a set of 1,000 SNPs. We also adopted an expanded one-hot encoding that represents each of the 16 different genotypes with eight binary variables. This encoding significantly improved the CNN's prediction accuracy for swine traits compared with traditional one-hot encoding. Our findings suggest that the expanded one-hot encoding can improve the accuracy of DL methods in the genomic prediction of swine agricultural economic traits. This has implications for swine breeding programs, where genomic prediction is pivotal to improving breeding strategies. Future research could explore further improvements to DL methods by incorporating advanced data pre-processing techniques.
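One way to read "16 genotypes encoded as eight binary variables" is sketched below. It assumes that the 16 genotypes are the ordered pairs of the four nucleotides (A, C, G, T) and that each allele gets its own 4-bit one-hot block; the abstract does not spell out the mapping, so this layout and the names `encode_genotype` and `encode_individual` are illustrative assumptions, not necessarily the authors' scheme.

```python
import numpy as np

ALLELES = "ACGT"  # assumed allele alphabet; 4 x 4 = 16 ordered genotypes


def encode_genotype(genotype: str) -> np.ndarray:
    """Encode a two-letter genotype (e.g. "AG") as 8 binary variables.

    Bits 0-3 one-hot encode the first allele, bits 4-7 the second.
    This layout is an assumption made for illustration.
    """
    if len(genotype) != 2 or any(a not in ALLELES for a in genotype):
        raise ValueError(f"unsupported genotype: {genotype!r}")
    vec = np.zeros(8, dtype=np.uint8)
    vec[ALLELES.index(genotype[0])] = 1
    vec[4 + ALLELES.index(genotype[1])] = 1
    return vec


def encode_individual(genotypes) -> np.ndarray:
    """Stack per-SNP encodings into an (n_snps, 8) array for one animal."""
    return np.stack([encode_genotype(g) for g in genotypes])


# Toy example: one animal genotyped at three SNPs.
x = encode_individual(["AA", "CT", "GG"])
```

Under this reading, heterozygotes such as "AG" and "GA" receive distinct codes, and an array of shape (n_snps, 8) per animal could be passed to a 1D convolution with the eight variables treated as channels.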

Key words: swine, agricultural economic traits, genomic prediction, deep learning, one-hot encoding, convolutional neural networks (CNNs)