Scientia Agricultura Sinica ›› 2025, Vol. 58 ›› Issue (15): 2960-2979. doi: 10.3864/j.issn.0578-1752.2025.15.003

• CROP GENETICS & BREEDING · GERMPLASM RESOURCES · MOLECULAR GENETICS •

Genomic Selection Method Based on G2PSE Stacking Ensemble

ZHUANG RunJie1,2, LIU HuiMing1,2, WANG ShiYu1,2, LÜ WanPing1,2, WEN YongXian1,2,*

  1 College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002
    2 Institute of Statistics and Applications, Fujian Agriculture and Forestry University, Fuzhou 350002
  • Received: 2025-02-07  Accepted: 2025-05-16  Online: 2025-08-01  Published: 2025-07-30
  • Contact: WEN YongXian

Abstract:

【Objective】 Genomic selection (GS) is a core technology for predicting individual phenotypes or genetic values from genome-wide marker information, and it has important theoretical value and practical significance in agricultural breeding and genetic research. However, high-dimensional feature redundancy and the modeling of nonlinear relationships remain key challenges in genomic selection. A genotype-to-phenotype stacking ensemble (G2PSE) model is proposed to improve prediction accuracy and generalization ability and to provide an efficient solution for high-dimensional genomic data analysis. 【Method】 The G2PSE stacking ensemble framework was constructed by integrating ten-fold cross-validation, ensemble learning, feature selection (the least angle regression, LAR, algorithm), and feature enhancement strategies. The model employed random forest (RF), support vector regression (SVR), and gradient boosting regression (GBR) as base learners, with ordinary least squares regression (OLSR) as the meta-learner. The impact of alternative meta-learners, such as random forest, support vector regression, and neural networks, on model performance was also evaluated. The G2PSE model consisted of three core submodels: (1) the all-feature stacking ensemble (AFSE), which fully utilized all SNP features; (2) the LAR-feature stacking ensemble (LFSE), which reduced redundant information through feature selection to improve generalization; and (3) the LAR-feature enhanced stacking ensemble (LFESE), which combined feature selection with enhancement strategies to optimize prediction in high-dimensional settings. The performance of three feature enhancement variants (AFESE, HFESEⅠ, and HFESEⅡ) was also explored. Finally, the model was evaluated on multi-trait datasets of three species (wheat, soybean, and tilapia) and further assessed on an independent test set from the Pepper203 dataset to validate its robustness.
【Result】 The G2PSE model significantly outperformed traditional methods and single machine learning models on both metrics, the Pearson correlation coefficient (PCC) and the mean absolute error (MAE). Among the three core submodels, LFESE performed best by combining feature selection with enhancement strategies, LFSE reduced redundant information and enhanced generalization through feature selection, and AFSE had a clear advantage in comprehensively capturing global genotypic information. The three feature enhancement variants further confirmed that feature quality matters more than feature quantity for prediction performance. The experiments also showed that linear regression performed best among the candidate meta-learners, while the LFESE and LFSE submodels achieved a better balance of accuracy and computational efficiency. A reasonable feature selection threshold was also crucial for model performance: the optimal threshold was 10%-20% for low-dimensional datasets and 1% for high-dimensional datasets. Finally, the evaluation on the independent test set showed that the LFESE submodel had the best generalization ability. 【Conclusion】 The G2PSE model significantly improves genomic selection prediction performance through ensemble learning, feature selection, and feature enhancement strategies.

Key words: genomic selection, stacking ensemble, feature selection, feature enhancement, agricultural breeding
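The AFSE workflow described in the abstract (RF, SVR, and GBR base learners stacked under an OLSR meta-learner, with ten-fold cross-validation producing the meta-features) can be sketched with scikit-learn. The data, dimensions, and hyperparameters below are illustrative placeholders, not the authors' settings.

```python
# A minimal sketch of the AFSE base configuration: RF, SVR, and GBR base
# learners stacked under an OLS meta-learner, with ten-fold cross-validation
# producing the meta-features. Data and hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                               # stand-in SNP matrix
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=200)   # toy phenotype

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("svr", SVR(kernel="rbf")),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),  # OLSR meta-learner
    cv=10,                               # ten-fold CV for the meta-features
)
stack.fit(X, y)
pred = stack.predict(X)
```

scikit-learn's `StackingRegressor` fits the base learners on cross-validation folds, collects their out-of-fold predictions as new features, and trains the final estimator on those features, which mirrors the training-and-prediction workflow shown in Fig. 1A.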

Table 1

Relevant information of the experimental datasets

Datasets  Traits  Individuals  SNPs  Heritability (h2)  Data source
Wheat599 E1-GY 599 1279 0.832 https://github.com/AIBreeding/DNNGP?tab=readme-ov-file
E2-GY 599 1279 0.729
E3-GY 599 1279 0.689
E4-GY 599 1279 0.711
Wheat2000 TKW 2000 33709 0.833 https://github.com/cma2015/DeepGS
TW 2000 33709 0.754
GL 2000 33709 0.881
GW 2000 33709 0.848
GH 2000 33709 0.839
GP 2000 33709 0.625
SDS 2000 33709 0.681
PHT 2000 33709 0.434
Soy5014 HT 5014 4234 0.449 https://doi.org/10.1534/g3.116.032268
R8 5014 4234 0.558
YLD 5014 4234 0.485
Tilapia1125 HW 1125 32306 0.304 https://figshare.com/s/9b265a22b7e138c5a839
Pepper203 PHT 203 14922 0.610 https://bmcgenomdata.biomedcentral.com/articles/10.1186/s12863-023-01179-6
FT 203 14922 0.730

Fig. 1

Overall structure and submodel design of the G2PSE model. A: Training and prediction workflow of the G2PSE base model (AFSE), where ten-fold cross-validation is used to train three base learners (RF, SVR, and GBR), and their predictions serve as input features for the meta-learner (OLSR) to produce the final prediction; B: Frameworks of the three core submodels of G2PSE, namely AFSE, LFSE, and LFESE; C: Frameworks of the three G2PSE model variants based on different feature enhancement strategies: AFESE, HFESEⅠ, and HFESEⅡ. Xori: original SNP features; XLAR: key feature subset selected via the LAR method; X′: features input into the meta-learner (new features); Y: phenotypic values; D: original dataset; D′: meta dataset; Predict′: prediction outputs from the base learners; Final prediction: final prediction output

Fig. 2

Prediction performance of various models with full features

Fig. 3

Prediction performance of different models after feature selection

Fig. 4

Comparison of prediction performance among six submodels of the G2PSE model

Table 2

Multicollinearity check of the G2PSE model on the Soy5014 dataset

Traits  Model  Condition number
HT AFSE 14.823
LFSE 16.598
LFESE 490.901
AFESE 5.436×1016
HFESEⅠ 298.676
HFESEⅡ 5.281×1016
R8 AFSE 7.709
LFSE 8.837
LFESE 495.610
AFESE 5.506×1016
HFESEⅠ 372.634
HFESEⅡ 5.132×1016
YLD AFSE 8.353
LFSE 8.817
LFESE 286.771
AFESE 5.291×1016
HFESEⅠ 286.676
HFESEⅡ 5.489×1016
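Table 2's condition numbers diagnose multicollinearity among the features passed to the OLSR meta-learner: values near 10 indicate a well-conditioned matrix, while values around 10^16 indicate near-singular, almost identical columns. A minimal numpy sketch on synthetic matrices (not the paper's actual meta-features) illustrates the computation:

```python
# Illustrative multicollinearity check via the condition number: a matrix of
# nearly independent columns is well-conditioned, while three near-duplicate
# columns produce a huge condition number. Data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(1)
meta_ok = rng.normal(size=(500, 3))                       # nearly independent columns
base = rng.normal(size=(500, 1))                          # one shared signal
meta_bad = base + rng.normal(scale=1e-8, size=(500, 3))   # three near-duplicate columns

print(np.linalg.cond(meta_ok))    # small: little collinearity
print(np.linalg.cond(meta_bad))   # huge: severe collinearity
```

`np.linalg.cond` defaults to the 2-norm (ratio of largest to smallest singular value), which also works for rectangular matrices like these.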

Fig. 5

Comparison of computational efficiency between the G2PSE model and other prediction models

Fig. 6

Impact of different meta-learners on the prediction performance of the AFSE model. AFSE-1 to AFSE-5 correspond to the meta-learners OLSR, RF, SVR, GBR, and FNN, respectively
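The meta-learner comparison behind Fig. 6 amounts to re-stacking the same base learners under different final estimators. The sketch below shows three of the five candidates with scikit-learn (data, scoring, and settings are illustrative placeholders, not the paper's protocol):

```python
# Sketch of the meta-learner swap in Fig. 6: identical base learners, varying
# final estimators (OLSR, RF, and a small feed-forward network via
# MLPRegressor). Data and hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 30))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=150)

base = [("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("svr", SVR()),
        ("gbr", GradientBoostingRegressor(random_state=0))]
metas = {"OLSR": LinearRegression(),
         "RF": RandomForestRegressor(random_state=0),
         "FNN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)}

results = {}
for name, meta in metas.items():
    stack = StackingRegressor(estimators=base, final_estimator=meta, cv=5)
    results[name] = cross_val_score(stack, X, y, cv=3, scoring="r2").mean()
    print(name, round(results[name], 3))
```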

Table 3

Impact of different LAR screening thresholds on the prediction performance of the G2PSE model on the Wheat599 dataset

Environment  Number of selected SNPs  LFSE model  LFESE model  HFESEⅠ model  HFESEⅡ model
Cell values: PCC (MAE)
E1-GY 13 0.511(0.691) 0.489 (0.701) 0.581 (0.641) 0.297 (1.132)
64 0.592 (0.634) 0.612 (0.616) 0.563 (0.655) 0.305 (1.112)
128 0.604 (0.631) 0.636 (0.606) 0.543 (0.683) 0.321 (1.095)
256 0.604 (0.625) 0.542 (0.670) 0.454 (1.251) 0.330 (1.074)
512 0.597 (0.629) 0.407 (1.631) 0.333(1.728) 0.354 (1.030)
767 0.596 (0.630) 0.129 (3.452) 0.079 (3.518) 0.353 (1.033)
E2-GY 13 0.429 (0.701) 0.467 (0.691) 0.518 (0.658) 0.339 (1.095)
64 0.592 (0.624) 0.621 (0.622) 0.614 (0.625) 0.360 (1.033)
128 0.607 (0.612) 0.698 (0.588) 0.681 (0.592) 0.373 (1.003)
256 0.598 (0.622) 0.662 (0.634) 0.630 (0.662) 0.388 (0.968)
512 0.564 (0.638) 0.521 (1.014) 0.409 (1.243) 0.431 (0.909)
767 0.536 (0.648) 0.287 (2.026) 0.204 (2.484) 0.489 (0.838)
E3-GY 13 0.512 (0.678) 0.493 (0.681) 0.455 (0.692) 0.276 (1.070)
64 0.512 (0.677) 0.498 (0.679) 0.501 (0.689) 0.274 (1.072)
128 0.555 (0.655) 0.594 (0.628) 0.551 (0.672) 0.287 (1.059)
256 0.511 (0.678) 0.496 (0.682) 0.523 (0.743) 0.280 (1.061)
512 0.510 (0.681) 0.499 (0.679) 0.285 (1.394) 0.282 (1.061)
767 0.510 (0.680) 0.496 (0.680) 0.132 (3.048) 0.279 (1.065)
E4-GY 13 0.457 (0.711) 0.477 (0.686) 0.514 (0.668) 0.291 (1.079)
64 0.604 (0.630) 0.602 (0.633) 0.592 (0.644) 0.311 (1.047)
128 0.639 (0.607) 0.653 (0.598) 0.648 (0.614) 0.319 (1.037)
256 0.628 (0.607) 0.708 (0.594) 0.668 (0.637) 0.313 (1.044)
512 0.586 (0.628) 0.434 (1.123) 0.403 (1.157) 0.323 (1.029)
767 0.560 (0.641) 0.243 (2.231) 0.230 (2.387) 0.340 (1.011)

Table 4

Impact of different LAR screening thresholds on the prediction performance of the G2PSE model on the Wheat2000 dataset

Traits  Number of selected SNPs  LFSE model  LFESE model  HFESEⅠ model  HFESEⅡ model
Cell values: PCC (MAE)
TKW 337 0.713 (0.562) 0.802 (0.467) 0.710 (0.572) 0.669 (0.589)
1685 0.710 (0.560) 0.750 (0.643) 0.522 (0.974) 0.667 (0.590)
3371 0.704 (0.567) 0.694 (0.648) 0.404 (1.099) 0.666 (0.591)
6742 0.704 (0.568) 0.689 (0.662) 0.373 (1.163) 0.666 (0.591)
13483 0.702 (0.569) 0.691 (0.660) 0.368 (1.171) 0.666 (0.591)
20225 0.700 (0.571) 0.724 (0.612) 0.365 (1.176) 0.666(0.591)
TW 337 0.674 (0.564) 0.778 (0.476) 0.709 (0.573) 0.602 (0.613)
1685 0.665 (0.572) 0.655 (0.822) 0.528 (0.966) 0.600 (0.614)
3371 0.661 (0.572) 0.635 (0.737) 0.374 (1.170) 0.599 (0.615)
6742 0.657 (0.577) 0.627 (0.731) 0.373 (1.162) 0.599 (0.615)
13483 0.658 (0.576) 0.627 (0.733) 0.368 (1.170) 0.599 (0.615)
20225 0.659 (0.576) 0.628 (0.731) 0.366 (1.175) 0.600 (0.615)
GL 337 0.772 (0.478) 0.839 (0.410) 0.754 (0.510) 0.741 (0.502)
1685 0.765 (0.486) 0.738 (0.648) 0.522 (0.947) 0.740 (0.504)
3371 0.763 (0.489) 0.699 (0.654) 0.478 (1.007) 0.739 (0.504)
6742 0.762 (0.490) 0.703 (0.643) 0.445 (1.031) 0.739 (0.504)
13483 0.764 (0.488) 0.704 (0.642) 0.442 (1.036) 0.739 (0.504)
20225 0.761 (0.490) 0.702 (0.644) 0.440 (1.039) 0.739 (0.504)
GW 337 0.750 (0.514) 0.812 (0.451) 0.712 (0.557) 0.732 (0.526)
1685 0.745 (0.518) 0.743 (0.654) 0.497 (1.010) 0.731 (0.526)
3371 0.740 (0.524) 0.746 (0.582) 0.420 (1.086) 0.731 (0.527)
6742 0.741 (0.523) 0.716 (0.634) 0.402 (1.136) 0.731 (0.527)
13483 0.741 (0.523) 0.714 (0.636) 0.401 (1.137) 0.731 (0.527)
20225 0.742 (0.523) 0.716 (0.634) 0.400 (1.137) 0.731 (0.527)
GH 337 0.687 (0.567) 0.787 (0.483) 0.677 (0.586) 0.682 (0.567)
1685 0.686 (0.566) 0.707 (0.705) 0.463 (1.043) 0.681 (0.568)
3371 0.691 (0.561) 0.695 (0.654) 0.370 (1.170) 0.680 (0.569)
6742 0.684 (0.566) 0.698 (0.645) 0.395 (1.125) 0.680 (0.569)
13483 0.686 (0.565) 0.698 (0.645) 0.392 (1.129) 0.680 (0.569)
20225 0.689 (0.563) 0.699 (0.644) 0.391 (1.132) 0.680 (0.569)
GP 337 0.626 (0.604) 0.746 (0.512) 0.627 (0.622) 0.515 (0.667)
1685 0.609 (0.618) 0.662 (0.800) 0.414 (1.175) 0.513 (0.668)
3371 0.604 (0.619) 0.616 (0.759) 0.358 (1.167) 0.512 (0.668)
6742 0.603 (0.619) 0.617 (0.757) 0.365 (1.165) 0.512 (0.668)
13483 0.603 (0.619) 0.618 (0.756) 0.357 (1.177) 0.512 (0.668)
20225 0.602 (0.621) 0.615 (0.760) 0.352 (1.185) 0.512 (0.668)
SDS 337 0.663 (0.599) 0.796 (0.477) 0.694 (0.575) 0.525 (0.694)
1685 0.656 (0.607) 0.763 (0.606) 0.580 (0.862) 0.523 (0.695)
3371 0.644 (0.621) 0.714 (0.626) 0.468 (0.979) 0.520 (0.697)
6742 0.644 (0.623) 0.713 (0.629) 0.488 (0.948) 0.518 (0.698)
13483 0.643 (0.622) 0.712 (0.631) 0.487 (0.949) 0.521 (0.696)
20225 0.644 (0.621) 0.707 (0.636) 0.440 (1.025) 0.520 (0.697)
PHT 337 0.501 (0.664) 0.697 (0.566) 0.553 (0.673) 0.279 (0.755)
1685 0.477 (0.672) 0.626 (0.844) 0.422 (1.131) 0.278 (0.755)
3371 0.449 (0.682) 0.520 (0.879) 0.334 (1.167) 0.275 (0.756)
6742 0.452 (0.682) 0.512 (0.894) 0.294 (1.230) 0.276 (0.756)
13483 0.449 (0.681) 0.515 (0.890) 0.293 (1.229) 0.275 (0.756)
20225 0.451 (0.683) 0.517 (0.883) 0.311 (1.186) 0.275 (0.756)
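Tables 3 and 4 vary the number of SNPs retained by LAR screening. The sketch below illustrates that screening step with scikit-learn's `Lars`, keeping the markers that enter the least-angle regression path first; the 10% fraction mirrors the threshold reported as optimal for low-dimensional datasets, and all data and dimensions are synthetic stand-ins.

```python
# Sketch of LAR-based SNP screening as used by the LFSE/LFESE submodels:
# retain the SNPs that enter the least-angle regression path first.
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 1000))                  # 1000 SNP stand-ins
y = X[:, :10].sum(axis=1) + rng.normal(size=300)  # toy phenotype

n_keep = int(0.10 * X.shape[1])                   # 10% screening threshold
lars = Lars(n_nonzero_coefs=n_keep).fit(X, y)
selected = np.flatnonzero(lars.coef_)             # indices of retained SNPs
X_lar = X[:, selected]                            # reduced genotype matrix
print(X_lar.shape)
```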

Table 5

Comparison of the generalization ability of G2PSE submodels on the PHT trait of the Pepper203 dataset

Model  Validation set (PCC, MAE)  Test set (PCC, MAE)  Absolute difference (PCC, MAE)
AFSE 0.639 0.566 0.520 0.580 0.119 0.014
LFSE 0.735 0.502 0.645 0.563 0.090 0.061
LFESE 0.693 0.544 0.734 0.517 0.041 0.027
AFESE 0.659 0.553 -0.023 0.690 0.682 0.137
HFESEⅠ 0.669 0.553 0.415 0.745 0.254 0.192
HFESEⅡ 0.659 0.554 0.527 0.625 0.132 0.071

Table 6

Comparison of the generalization ability of G2PSE submodels on the FT trait of the Pepper203 dataset

Model  Validation set (PCC, MAE)  Test set (PCC, MAE)  Absolute difference (PCC, MAE)
AFSE 0.785 0.468 0.705 0.508 0.080 0.040
LFSE 0.806 0.444 0.756 0.501 0.050 0.057
LFESE 0.753 0.502 0.718 0.561 0.035 0.059
AFESE 0.769 0.467 0.434 0.701 0.335 0.234
HFESEⅠ 0.736 0.564 0.657 0.624 0.079 0.060
HFESEⅡ 0.768 0.469 0.712 0.521 0.056 0.052
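The generalization comparison in Tables 5 and 6 rests on computing PCC and MAE on the validation and independent test splits and taking the absolute differences. A minimal numpy sketch with synthetic predictions:

```python
# Sketch of the generalization check behind Tables 5-6: PCC and MAE on a
# validation split and an independent test split, plus their absolute
# differences. All "predictions" here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
y_val = rng.normal(size=100)
pred_val = y_val + rng.normal(scale=0.5, size=100)    # good validation fit
y_test = rng.normal(size=100)
pred_test = y_test + rng.normal(scale=0.8, size=100)  # weaker test fit

pcc_val = np.corrcoef(y_val, pred_val)[0, 1]
pcc_test = np.corrcoef(y_test, pred_test)[0, 1]
mae_val = np.abs(y_val - pred_val).mean()
mae_test = np.abs(y_test - pred_test).mean()
print(abs(pcc_val - pcc_test), abs(mae_val - mae_test))  # smaller gap = better generalization
```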