Scientia Agricultura Sinica ›› 2025, Vol. 58 ›› Issue (2): 203-213. doi: 10.3864/j.issn.0578-1752.2025.02.001

• Crop Genetics and Breeding · Germplasm Resources · Molecular Genetics •

基于机器学习的玉米自交系杂种优势类群研究

曹士亮1, 张建国1, 于滔1, 杨耿斌1, 李文跃1, 马雪娜1, 孙艳杰4, 韩微波2, 唐贵3, 单大鹏4

    1 黑龙江省农业科学院玉米研究所/玉米国家工程实验室(哈尔滨),哈尔滨 150086
    2 黑龙江省农业科学院草业研究所,哈尔滨 150086
    3 黑龙江省农业科学院乡村振兴研究所,哈尔滨 150086
    4 黑龙江省农业科学院绥化分院,黑龙江绥化 152052
  • 收稿日期:2024-06-18 接受日期:2024-09-14 出版日期:2025-01-21 发布日期:2025-01-21
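  • Contact: CAO ShiLiang, Tel: 13946013375; E-mail: caoshiliang2003@126.com
  • Supported by:
    Scientific Research Business Expenses of Heilongjiang Provincial Research Institutes (CZKYF2023-1-A003); Heilongjiang Province Seed Industry Innovation and Development Project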

Heterosis Groups Research in Maize Inbred Lines Based on Machine Learning

CAO ShiLiang1, ZHANG JianGuo1, YU Tao1, YANG GengBin1, LI WenYue1, MA XueNa1, SUN YanJie4, HAN WeiBo2, TANG Gui3, SHAN DaPeng4

    1 Maize Institute, Heilongjiang Academy of Agricultural Sciences/National Engineering Laboratory for Maize (Harbin), Harbin 150086
    2 Institute of Forage and Grassland Sciences, Heilongjiang Academy of Agricultural Sciences, Harbin 150086
    3 Rural Revitalization Institute, Heilongjiang Academy of Agricultural Sciences, Harbin 150086
    4 Suihua Branch of Heilongjiang Academy of Agricultural Sciences, Suihua 152052, Heilongjiang
  • Received:2024-06-18 Accepted:2024-09-14 Published:2025-01-21 Online:2025-01-21

摘要:

【目的】优化玉米杂种优势类群划分与判别分析方法,为玉米育种提供指导和参考。【方法】采用固相芯片对60份糯玉米自交系进行基因分型,通过质量控制获得不同密度的SNP标记,采用群体结构分析和遗传距离聚类的方法对60份糯玉米进行类群划分,比较不同密度分子标记和分群方法的差异。在此基础上,分别采用随机森林和支持向量机对类群划分结果进行抽样和交叉验证,比较玉米自交系类群判别的预测精度。【结果】通过不同的质量控制标准分别获得11 431和4 022个分子标记,基于2种分子标记密度,分别将60份材料分成5个类群和4个类群,其中,以11 431个SNP标记为基础,通过群体结构分析和遗传距离聚类结果发现,类群内样本一致性为63.33%,以4 022个SNP标记进行分群,发现2种类群划分方法的群内样本一致性为90.00%;比较玉米自交系类群判别的预测精度结果为:基于4 022个标记,随机森林和支持向量机预测精度的平均值(91.43%)高于11 431个标记的预测精度的平均值(86.25%),其中,预测精度最高的是采用4 022个标记的随机森林预测,预测精度为94.17%。【结论】聚类分析法最终将60份玉米糯自交系分为4个类群,运用随机森林和支持向量机对类群划分结果进行抽样和交叉验证,发现随机森林法比支持向量机法能获得更高的预测精度。

关键词: 玉米, 机器学习, 交叉验证, 类群划分, 判别分析

Abstract:

【Objective】This study aimed to optimize the methods for classifying and discriminating maize heterotic groups, and to provide guidance and reference for maize breeding.【Method】Solid-phase chips were used to genotype 60 waxy maize inbred lines, and SNP marker sets of different densities were obtained through quality control. Population structure analysis and genetic distance clustering were used to divide the 60 lines into groups, and the differences between marker densities and between grouping methods were compared. On this basis, random forest and support vector machine were each used to discriminate the resulting groups, with samples drawn by five-fold cross-validation, and the prediction accuracy of group discrimination was compared.【Result】Two quality control standards yielded 11 431 and 4 022 SNP markers, respectively. Based on these two marker densities, the 60 lines were divided into 5 and 4 groups, respectively. With 11 431 SNP markers, the consistency of within-group samples between population structure analysis and genetic distance clustering was 63.33%; with 4 022 SNP markers, the consistency between the two grouping methods was 90.00%. For group discrimination, the average prediction accuracy of random forest and support vector machine with 4 022 markers (91.43%) was higher than that with 11 431 markers (86.25%). The highest prediction accuracy, 94.17%, was achieved by random forest with 4 022 markers.【Conclusion】Cluster analysis ultimately divided the 60 waxy maize inbred lines into 4 groups. Sampling and cross-validation of the grouping results with random forest and support vector machine showed that random forest achieved higher prediction accuracy than support vector machine.
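
The consistency figures in the abstract compare the group assignments produced by two methods. The abstract does not state how that consistency was computed, so the following is only a minimal sketch under one assumption: consistency is the proportion of lines that both methods place in the same group once group labels are matched one-to-one. The function name `grouping_agreement` and the toy labels are hypothetical.

```python
from itertools import permutations


def grouping_agreement(labels_a, labels_b):
    """Return the fraction of samples that two groupings place in matching groups.

    Group labels are arbitrary, so every one-to-one relabelling of the groups
    in labels_b is tried and the best match is kept. This brute-force search
    is only practical for a handful of groups (4 or 5 in the abstract).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both groupings must cover the same non-empty sample set")
    groups_a = sorted(set(labels_a))
    groups_b = sorted(set(labels_b))
    # Pad with None so that a one-to-one mapping exists even when the
    # second grouping has more groups than the first.
    width = max(len(groups_a), len(groups_b))
    targets = groups_a + [None] * (width - len(groups_a))
    best = 0
    for perm in permutations(targets, len(groups_b)):
        mapping = dict(zip(groups_b, perm))
        matches = sum(a == mapping[b] for a, b in zip(labels_a, labels_b))
        best = max(best, matches)
    return best / len(labels_a)


# Toy example with six lines (not data from the study).
structure_groups = ["G1", "G1", "G2", "G2", "G3", "G3"]
distance_groups = ["B", "B", "A", "A", "A", "C"]
agreement = grouping_agreement(structure_groups, distance_groups)
```

If consistency is indeed a simple proportion of the 60 lines, the reported 63.33% and 90.00% would correspond to 38 and 54 concordant lines, respectively.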

Key words: maize, machine learning, cross-validation, clustering, discriminant analysis
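
As an illustration of how five-fold cross-validation yields a prediction accuracy for group discrimination, the sketch below splits the lines into five folds and averages the held-out accuracy. It is not the authors' pipeline: a 1-nearest-neighbour rule on Hamming distance stands in for random forest and support vector machine so that the example needs only the standard library, and all function names, the genotype coding, and the toy data are assumptions.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds nearly equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::n_folds] for i in range(n_folds)]


def hamming(u, v):
    """Count the marker positions at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def nearest_neighbour_predict(train_x, train_y, sample):
    """Stand-in classifier: copy the group of the closest training line."""
    distances = [hamming(sample, row) for row in train_x]
    return train_y[distances.index(min(distances))]


def cross_validated_accuracy(genotypes, groups, n_folds=5, seed=0):
    """Average the held-out prediction accuracy over all folds."""
    folds = five_fold_indices(len(genotypes), n_folds, seed)
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(genotypes)) if i not in held]
        train_x = [genotypes[i] for i in train_idx]
        train_y = [groups[i] for i in train_idx]
        hits = sum(
            nearest_neighbour_predict(train_x, train_y, genotypes[i]) == groups[i]
            for i in held_out
        )
        fold_scores.append(hits / len(held_out))
    return sum(fold_scores) / len(fold_scores)


# Toy data: 60 lines in 4 perfectly separated groups, 8 markers each
# (hypothetical genotype codes, not data from the study).
toy_groups = [g for g in range(4) for _ in range(15)]
toy_genotypes = [[g] * 8 for g in toy_groups]
toy_accuracy = cross_validated_accuracy(toy_genotypes, toy_groups)
```

With 60 lines, each fold holds out 12 lines and trains on the remaining 48. Swapping in a different classifier only requires replacing `nearest_neighbour_predict`.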