Scientia Agricultura Sinica ›› 2025, Vol. 58 ›› Issue (2): 203-213. doi: 10.3864/j.issn.0578-1752.2025.02.001

• CROP GENETICS & BREEDING·GERMPLASM RESOURCES·MOLECULAR GENETICS •

Heterosis Groups Research in Maize Inbred Lines Based on Machine Learning

CAO ShiLiang1, ZHANG JianGuo1, YU Tao1, YANG GengBin1, LI WenYue1, MA XueNa1, SUN YanJie4, HAN WeiBo2, TANG Gui3, SHAN DaPeng4

  1 Maize Institute, Heilongjiang Academy of Agricultural Sciences/National Engineering Laboratory for Maize (Harbin), Harbin 150086
  2 Institute of Forage and Grassland Sciences, Heilongjiang Academy of Agricultural Sciences, Harbin 150086
  3 Rural Revitalization Institute, Heilongjiang Academy of Agricultural Sciences, Harbin 150086
  4 Suihua Branch of Heilongjiang Academy of Agricultural Sciences, Suihua 152052, Heilongjiang
  • Received: 2024-06-18 Accepted: 2024-09-14 Online: 2025-01-21 Published: 2025-01-21

Abstract:

【Objective】This study aimed to optimize the methods used to classify and discriminate maize heterotic groups, and to provide guidance and reference for maize breeding practice.【Method】Sixty waxy maize inbred lines were genotyped with solid-phase chips, and high-quality SNP marker sets of two different densities were obtained through quality control. Population structure analysis and genetic distance clustering were each used to assign the 60 lines to groups, and the groupings produced by the two methods were compared. On this basis, random forest and support vector machine models were used to discriminate the groups defined by each classification method. Five-fold cross-validation was used for sampling, and the prediction accuracy of group discrimination was compared across classification methods.【Result】Two quality control standards yielded 11 431 and 4 022 molecular markers, respectively. At these two marker densities, the 60 lines were divided into 5 and 4 clusters, respectively. With 11 431 SNP markers, the intra-cluster sample consistency between population structure analysis and genetic distance clustering was 63.33%; with 4 022 SNP markers, it was 90.00%. For discriminating inbred line clusters, the average prediction accuracy of random forest and support vector machine was higher with 4 022 markers (91.43%) than with 11 431 markers (86.25%). The highest prediction accuracy, 94.17%, was achieved by random forest with 4 022 markers.【Conclusion】Cluster analysis ultimately divided the 60 waxy maize inbred lines into 4 clusters. Cross-validated sampling with random forest and support vector machine for cluster discrimination showed that random forest achieved higher prediction accuracy than support vector machine.
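The five-fold sampling and the accuracy/coefficient-of-variation summary described in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names, the fixed seed, the unstratified shuffle, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices once and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed seed, for reproducibility only
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def summarize(fold_accuracies):
    """Return (mean accuracy, coefficient of variation) across folds."""
    mean = statistics.mean(fold_accuracies)
    # Assumption: sample standard deviation; the source does not state which form it used.
    return mean, statistics.stdev(fold_accuracies) / mean


# 60 inbred lines, five folds: each round trains on 48 lines and tests on 12.
splits = list(train_test_splits(60, k=5))
```

A classifier would be fitted on each `train` list and scored on the matching `test` list; passing the five resulting accuracies to `summarize` gives a mean and a coefficient of variation in the same form as the "prediction accuracy/coefficient of variation" entries reported in Table 3.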

Key words: maize, machine learning, cross-validation, clustering, discriminant analysis

Fig. 1

Quantitative distribution of molecular markers at two different densities on different chromosomes

Fig. 2

Population structure based on 11 431 SNPs. a: Line chart of cross-validation error; b: Population structure diagram

Fig. 3

Cluster analysis graph based on genetic distance. A: Group A; B: Group B; C: Group C; D: Group D; E: Group E. The same as below

Table 1

Comparison of grouping results between population structure analysis and genetic distance clustering (11 431 SNPs)

| Group | Population structure (STR) | Genetic distance (GD) |
| --- | --- | --- |
| Group A | N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N15, N16, N32 | N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N14, N16, N17, N32, N34 |
| Group B | N38, N39, N40, N41, N42, N43, N44 | N38, N39, N40, N41, N42, N43, N44, N48, N50 |
| Group C | N14, N18, N19, N20, N21, N22, N23, N24, N25, N26 | N18, N19, N20, N21, N22, N23, N24, N25, N26, N27, N29, N31, N35, N37, N49, N51, N54, N59, N60 |
| Group D | N17, N28, N29, N30, N31, N33, N34, N35, N36, N37, N48, N50, N52, N53, N54, N55, N56, N57, N58, N60 | N13, N15, N28, N30, N33, N36, N45, N46, N47, N52, N53, N55, N56 |
| Group E | N27, N45, N46, N47, N49, N51, N59 | N57, N58 |

Fig. 4

Population structure based on 4 022 SNPs. a: Line chart of cross-validation error; b: Population structure diagram

Fig. 5

Cluster analysis graph based on genetic distance

Table 2

Comparison of grouping results between population structure analysis and genetic distance clustering (4 022 SNPs)

| Group | Population structure (STR) | Genetic distance (GD) |
| --- | --- | --- |
| Group A | N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N16, N32, N13, N15, N14, N17 | N1, N2, N3, N4, N5, N6, N7, N8, N9, N10, N11, N12, N16, N32, N13, N15, N14, N17 |
| Group B | N38, N39, N40, N41, N42, N43, N44 | N38, N39, N40, N41, N42, N43, N44, N48, N50, N55, N57, N58 |
| Group C | N18, N19, N20, N21, N22, N23, N24, N25, N26, N29, N34 | N18, N19, N20, N21, N22, N23, N24, N25, N29, N34 |
| Group D | N45, N46, N47, N59, N51, N53, N27, N28, N30, N49, N56, N37, N31, N48, N54, N55, N52, N33, N50, N35, N36, N60, N57, N58 | N26, N27, N28, N30, N31, N33, N35, N36, N37, N45, N46, N47, N49, N51, N52, N53, N54, N56, N59, N60 |
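The consistency figures quoted in the abstract can be checked against the group memberships in Tables 1 and 2. The sketch below takes one reading of "intra-cluster sample consistency": the share of lines that both methods place in the group with the same letter. That reading is an assumption, as are the helper names `lines` and `consistency`; it does reproduce both reported percentages.

```python
def lines(*ids):
    """Build a set of line names such as {'N1', 'N2'} from integer IDs."""
    return {f"N{i}" for i in ids}


def consistency(str_groups, gd_groups):
    """Percentage of lines placed in the same-lettered group by both methods."""
    shared = sum(len(str_groups[g] & gd_groups[g]) for g in str_groups)
    total = sum(len(members) for members in str_groups.values())
    return round(100 * shared / total, 2)


# Table 1 (11 431 SNPs): population structure (STR) vs genetic distance (GD).
T1_STR = {
    "A": lines(*range(1, 14), 15, 16, 32),
    "B": lines(*range(38, 45)),
    "C": lines(14, *range(18, 27)),
    "D": lines(17, 28, 29, 30, 31, 33, 34, 35, 36, 37, 48, 50, *range(52, 59), 60),
    "E": lines(27, 45, 46, 47, 49, 51, 59),
}
T1_GD = {
    "A": lines(*range(1, 13), 14, 16, 17, 32, 34),
    "B": lines(*range(38, 45), 48, 50),
    "C": lines(*range(18, 28), 29, 31, 35, 37, 49, 51, 54, 59, 60),
    "D": lines(13, 15, 28, 30, 33, 36, 45, 46, 47, 52, 53, 55, 56),
    "E": lines(57, 58),
}

# Table 2 (4 022 SNPs).
T2_STR = {
    "A": lines(*range(1, 18), 32),
    "B": lines(*range(38, 45)),
    "C": lines(*range(18, 27), 29, 34),
    "D": lines(27, 28, 30, 31, 33, 35, 36, 37, *range(45, 61)),
}
T2_GD = {
    "A": lines(*range(1, 18), 32),
    "B": lines(*range(38, 45), 48, 50, 55, 57, 58),
    "C": lines(*range(18, 26), 29, 34),
    "D": lines(26, 27, 28, 30, 31, 33, 35, 36, 37, 45, 46, 47, 49, 51, 52, 53, 54, 56, 59, 60),
}

print(consistency(T1_STR, T1_GD))  # 63.33 (38 of 60 lines agree)
print(consistency(T2_STR, T2_GD))  # 90.0 (54 of 60 lines agree)
```

Under this reading, most of the disagreement at 11 431 markers comes from Groups D and E, where the two methods share only 8 lines and none, respectively.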

Table 3

Comparison of prediction accuracy of inbred line group discrimination (prediction accuracy/coefficient of variation)

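| Grouping method | Prediction method | 11 431 markers | 4 022 markers | Mean of prediction method (%) | Mean of grouping method (%) |
| --- | --- | --- | --- | --- | --- |
| Population structure (STR) | Random forest (RF) | 87.50%/0.11 | 94.17%/0.07 | 90.84 | 89.83 |
| Population structure (STR) | Support vector machine (SVM) | 87.22%/0.11 | 90.42%/0.10 | 88.82 | |
| Genetic distance (GD) | Random forest (RF) | 86.25%/0.11 | 94.17%/0.06 | 90.21 | 87.40 |
| Genetic distance (GD) | Support vector machine (SVM) | 84.03%/0.13 | 86.94%/0.10 | 85.49 | |
| Mean (%) | | 86.25 | 91.43 | | |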
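The marker-number means and prediction-method means in Table 3 follow directly from the eight accuracy cells. A minimal sketch of that arithmetic is given below; the dictionary layout and the function name `mean_accuracy` are illustrative assumptions, and half-up rounding to two decimals is assumed because it matches the printed values.

```python
from decimal import Decimal, ROUND_HALF_UP

# Prediction accuracies (%) transcribed from Table 3, keyed by
# (grouping method, prediction method, marker number).
ACCURACY = {
    ("STR", "RF", 11431): "87.50",
    ("STR", "RF", 4022): "94.17",
    ("STR", "SVM", 11431): "87.22",
    ("STR", "SVM", 4022): "90.42",
    ("GD", "RF", 11431): "86.25",
    ("GD", "RF", 4022): "94.17",
    ("GD", "SVM", 11431): "84.03",
    ("GD", "SVM", 4022): "86.94",
}


def mean_accuracy(grouping=None, model=None, markers=None):
    """Average the cells matching the given filters, rounded half-up to 2 decimals."""
    picked = [
        Decimal(acc)
        for (g, m, n), acc in ACCURACY.items()
        if grouping in (None, g) and model in (None, m) and markers in (None, n)
    ]
    mean = sum(picked) / len(picked)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(mean_accuracy(markers=11431))               # 86.25
print(mean_accuracy(markers=4022))                # 91.43
print(mean_accuracy(grouping="STR", model="RF"))  # 90.84
print(mean_accuracy(grouping="GD", model="SVM"))  # 85.49
```

Exact decimal arithmetic is used instead of floats so that values ending in 5 at the third decimal, such as 91.425, round the same way as in the table.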

Fig. 6

Comparison of prediction accuracy of machine learning models under different classification methods. a: Comparison of prediction accuracy between different marker densities; b: Cluster analysis based on genetic distance
