Scientia Agricultura Sinica ›› 2025, Vol. 58 ›› Issue (2): 203-213. doi: 10.3864/j.issn.0578-1752.2025.02.001

• CROP GENETICS & BREEDING·GERMPLASM RESOURCES·MOLECULAR GENETICS •

Heterosis Groups Research in Maize Inbred Lines Based on Machine Learning

CAO ShiLiang1, ZHANG JianGuo1, YU Tao1, YANG GengBin1, LI WenYue1, MA XueNa1, SUN YanJie4, HAN WeiBo2, TANG Gui3, SHAN DaPeng4

  1 Maize Institute, Heilongjiang Academy of Agricultural Sciences/National Engineering Laboratory for Maize (Harbin), Harbin 150086
  2 Institute of Forage and Grassland Sciences, Heilongjiang Academy of Agricultural Sciences, Harbin 150086
  3 Rural Revitalization Institute, Heilongjiang Academy of Agricultural Sciences, Harbin 150086
  4 Suihua Branch of Heilongjiang Academy of Agricultural Sciences, Suihua 152052, Heilongjiang
  • Received: 2024-06-18 Accepted: 2024-09-14 Online: 2025-01-21 Published: 2025-01-21

Abstract:

【Objective】This study aims to optimize methods for classifying and discriminating maize heterotic groups and to provide guidance for maize breeding practice.
【Method】Sixty waxy maize inbred lines were genotyped with solid-phase chips, and high-quality SNP marker sets of two different densities were obtained through quality control. The lines were assigned to groups by population structure analysis and by genetic distance clustering, and the two sets of assignments were compared. Random forest (RF) and support vector machine (SVM) models were then used to discriminate the groups produced by each classification method. Five-fold cross-validation was used for sampling, and prediction accuracy was compared across classification methods.
【Result】Two quality control standards yielded 11 431 and 4 022 molecular markers, respectively. At these two marker densities, the 60 lines were divided into 5 and 4 clusters, respectively. With 11 431 SNP markers, population structure analysis and genetic distance clustering placed 63.33% of the lines in the same cluster; with 4 022 SNP markers, this consistency rose to 90.00%. The average prediction accuracy of RF and SVM was higher with 4 022 markers (91.43%) than with 11 431 markers (86.25%). The highest prediction accuracy, 94.17%, was achieved by RF using 4 022 markers.
【Conclusion】Clustering analysis ultimately divided the 60 waxy maize inbred lines into 4 clusters. Sampling and cross-validation showed that RF achieved higher prediction accuracy than SVM for cluster classification.
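
The five-fold cross-validation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the genotype data are synthetic, the group sizes (18, 7, 11, 24) merely mirror the four-group population structure result in Table 2, a nearest-centroid classifier stands in for RF and SVM so that the example needs only the Python standard library, and the coefficient of variation is assumed to be the standard deviation of the fold accuracies divided by their mean. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in model (not RF or SVM): the mean genotype vector of each group."""
    by_group = {}
    for row, label in zip(X, y):
        by_group.setdefault(label, []).append(row)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in by_group.items()}

def predict(centroids, x):
    """Assign x to the group whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda g: sum((a - b) ** 2 for a, b in zip(x, centroids[g])))

def cross_validate(X, y, k=5, seed=0):
    """Return one accuracy per fold: train on k-1 folds, test on the held-out fold."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies

# Synthetic genotypes (hypothetical): 60 lines in 4 groups, 50 markers coded 0/2.
rng = random.Random(1)
group_sizes = {"A": 18, "B": 7, "C": 11, "D": 24}
X, y = [], []
for group, size in group_sizes.items():
    profile = [rng.choice([0, 2]) for _ in range(50)]
    for _ in range(size):
        # Each line copies its group profile, with about 10% of markers flipped.
        X.append([2 - v if rng.random() < 0.1 else v for v in profile])
        y.append(group)

fold_acc = cross_validate(X, y)
cv = pstdev(fold_acc) / mean(fold_acc)  # assumed definition of the coefficient of variation
print(f"mean accuracy = {mean(fold_acc):.2%}, coefficient of variation = {cv:.2f}")
```

With 60 lines and five folds, each model is trained on 48 lines and tested on the remaining 12. Replacing `fit_centroids` and `predict` with an RF or SVM implementation would leave the sampling logic unchanged.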

Key words: maize, machine learning, cross-validation, clustering, discrimination analysis

Fig. 1 Quantitative distribution of molecular markers of two different densities on different chromosomes

Fig. 2 Population structure based on 11 431 SNPs. a: Line chart of cross-validation error; b: Population structure diagram

Fig. 3 Cluster analysis graph based on genetic distance. A: Group A; B: Group B; C: Group C; D: Group D; E: Group E. The same applies below

Table 1 Comparison of grouping results between structure and cluster analysis

Clustering method: population structure grouping (STR)
  Group A: N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N15, N16, N32
  Group B: N38, N39, N40, N41, N42, N43, N44
  Group C: N14, N18, N19, N20, N21, N22, N23, N24, N25, N26
  Group D: N17, N28, N29, N30, N31, N33, N34, N35, N36, N37, N48, N50, N52, N53, N54, N55, N56, N57, N58, N60
  Group E: N27, N45, N46, N47, N49, N51, N59
Clustering method: genetic distance grouping (GD)
  Group A: N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N14, N16, N17, N32, N34
  Group B: N38, N39, N40, N41, N42, N43, N44, N48, N50
  Group C: N18, N19, N20, N21, N22, N23, N24, N25, N26, N27, N29, N31, N35, N37, N49, N51, N54, N59, N60
  Group D: N13, N15, N28, N30, N33, N36, N45, N46, N47, N52, N53, N55, N56
  Group E: N57, N58
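
The consistency between the two groupings in Table 1 can be checked with a short calculation. The sketch below assumes that "consistency" means the share of lines that both methods place under the same group label, and that the group labels A to E correspond between methods as laid out in the table. The helper names are hypothetical; the membership lists are transcribed from Table 1.

```python
def lines(*specs):
    """Expand specs such as (1, 13) or 32 into a set of line names like 'N1'."""
    out = set()
    for spec in specs:
        lo, hi = spec if isinstance(spec, tuple) else (spec, spec)
        out.update(f"N{i}" for i in range(lo, hi + 1))
    return out

# Population structure grouping (STR), transcribed from Table 1.
STR = {
    "A": lines((1, 13), 15, 16, 32),
    "B": lines((38, 44)),
    "C": lines(14, (18, 26)),
    "D": lines(17, (28, 31), (33, 37), 48, 50, (52, 58), 60),
    "E": lines(27, (45, 47), 49, 51, 59),
}

# Genetic distance grouping (GD), transcribed from Table 1.
GD = {
    "A": lines((1, 12), 14, 16, 17, 32, 34),
    "B": lines((38, 44), 48, 50),
    "C": lines((18, 27), 29, 31, 35, 37, 49, 51, 54, 59, 60),
    "D": lines(13, 15, 28, 30, 33, 36, (45, 47), 52, 53, 55, 56),
    "E": lines(57, 58),
}

def agreement(a, b):
    """Count lines that both methods place under the same group label."""
    same = sum(len(a[g] & b[g]) for g in a)
    total = sum(len(members) for members in a.values())
    return same, total

same, total = agreement(STR, GD)
print(f"{same}/{total} = {same / total:.2%}")  # prints "38/60 = 63.33%"
```

Under these assumptions the result matches the 63.33% consistency reported in the abstract for the 11 431-marker set.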

Fig. 4 Population structure based on 4 022 SNPs. a: Line chart of cross-validation error; b: Population structure diagram

Fig. 5 Cluster analysis graph based on genetic distance

Table 2 Comparison of grouping results between structure and cluster analysis

Clustering method: population structure grouping (STR)
  Group A: N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N16, N32, N13, N15, N14, N17
  Group B: N38, N39, N40, N41, N42, N43, N44
  Group C: N18, N19, N20, N21, N22, N23, N24, N25, N26, N29, N34
  Group D: N45, N46, N47, N59, N51, N53, N27, N28, N30, N49, N56, N37, N31, N48, N54, N55, N52, N33, N50, N35, N36, N60, N57, N58
Clustering method: genetic distance grouping (GD)
  Group A: N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N16, N32, N13, N15, N14, N17
  Group B: N38, N39, N40, N41, N42, N43, N44, N48, N50, N55, N57, N58
  Group C: N18, N19, N20, N21, N22, N23, N24, N25, N29, N34
  Group D: N26, N27, N28, N30, N31, N33, N35, N36, N37, N45, N46, N47, N49, N51, N52, N53, N54, N56, N59, N60
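
To see which inbred lines account for the disagreement between the two groupings in Table 2, the memberships can be turned into per-line labels and compared. This is an illustrative sketch with hypothetical helper names; it assumes the group labels A to D correspond between the two methods as laid out in the table.

```python
def lines(*specs):
    """Expand specs such as (1, 17) or 32 into a set of line names like 'N1'."""
    out = set()
    for spec in specs:
        lo, hi = spec if isinstance(spec, tuple) else (spec, spec)
        out.update(f"N{i}" for i in range(lo, hi + 1))
    return out

# Group membership transcribed from Table 2.
STR = {
    "A": lines((1, 17), 32),
    "B": lines((38, 44)),
    "C": lines((18, 26), 29, 34),
    "D": lines(27, 28, 30, 31, 33, (35, 37), (45, 60)),
}
GD = {
    "A": lines((1, 17), 32),
    "B": lines((38, 44), 48, 50, 55, 57, 58),
    "C": lines((18, 25), 29, 34),
    "D": lines((26, 28), 30, 31, 33, (35, 37), (45, 47), 49, (51, 54), 56, 59, 60),
}

def label_of(groups):
    """Invert {group: members} into {line: group}."""
    return {line: g for g, members in groups.items() for line in members}

str_label, gd_label = label_of(STR), label_of(GD)

# Lines whose group label differs between the two methods, in numeric order.
moved = sorted((l for l in str_label if str_label[l] != gd_label[l]),
               key=lambda name: int(name[1:]))
for line in moved:
    print(f"{line}: STR {str_label[line]} -> GD {gd_label[line]}")

consistency = 1 - len(moved) / len(str_label)
print(f"consistency = {consistency:.2%}")  # prints "consistency = 90.00%"
```

Six lines change label: N26 moves from Group C to Group D, and N48, N50, N55, N57 and N58 move from Group D to Group B. The remaining 54 of 60 lines agree, which matches the 90.00% consistency reported in the abstract for the 4 022-marker set.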

Table 3 Comparison of prediction accuracy of inbred line group discrimination (prediction accuracy/coefficient of variation)

Grouping method  Prediction method  11 431 markers  4 022 markers  Mean of prediction method (%)  Mean of grouping method (%)
STR              RF                 87.50%/0.11     94.17%/0.07    90.84                          89.83
STR              SVM                87.22%/0.11     90.42%/0.10    88.82
GD               RF                 86.25%/0.11     94.17%/0.06    90.21                          87.40
GD               SVM                84.03%/0.13     86.94%/0.10    85.49
Mean (%)                            86.25           91.43

STR: Population structure; GD: Genetic distance; RF: Random forest; SVM: Support vector machine
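
The marker-density means quoted in the abstract (86.25% and 91.43%) and the per-method means in Table 3 can be recomputed from the accuracy cells. The sketch below is an illustration only: it uses the rounded accuracies as printed in the table, assumes simple arithmetic means rounded half-up to two decimals, and uses `Decimal` to avoid binary floating-point rounding at the half-way cases. The variable and function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Accuracy (%) cells transcribed from Table 3:
# (grouping method, prediction method) -> {marker count: accuracy}
ACC = {
    ("STR", "RF"):  {11431: Decimal("87.50"), 4022: Decimal("94.17")},
    ("STR", "SVM"): {11431: Decimal("87.22"), 4022: Decimal("90.42")},
    ("GD", "RF"):   {11431: Decimal("86.25"), 4022: Decimal("94.17")},
    ("GD", "SVM"):  {11431: Decimal("84.03"), 4022: Decimal("86.94")},
}

def mean2(values):
    """Arithmetic mean, rounded half-up to two decimal places."""
    values = list(values)
    return (sum(values) / len(values)).quantize(Decimal("0.01"),
                                                rounding=ROUND_HALF_UP)

# Column means: one per marker density, averaged over all four rows.
by_density = {m: mean2(cell[m] for cell in ACC.values()) for m in (11431, 4022)}

# Row means: one per grouping/prediction method pair, averaged over both densities.
by_model = {key: mean2(cell.values()) for key, cell in ACC.items()}

print(by_density)  # 11431 -> 86.25, 4022 -> 91.43
for (grouping, model), value in by_model.items():
    print(grouping, model, value)  # 90.84, 88.82, 90.21, 85.49
```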

Fig. 6 Comparison of prediction accuracy of machine learning models under different classification methods. a: Comparison of prediction accuracy between different molecular marker densities; b: Cluster analysis based on genetic distance
