Scientia Agricultura Sinica, 2024, Vol. 57, Issue (4): 679-697. doi: 10.3864/j.issn.0578-1752.2024.04.005

• Tillage & Cultivation · Physiology & Biochemistry · Agricultural Information Technology •

Stacking Ensemble Learning Modeling and Forecasting of Maize Yield Based on Meteorological Factors

LI QianChuan, XU ShiWei, ZHANG YongEn, ZHUANG JiaYu, LI DengHua, LIU BaoHua, ZHU ZhiXun, LIU Hao

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
  • Received: 2023-06-12 Accepted: 2023-08-02 Published: 2024-02-16 Online: 2024-02-20
  • Corresponding author: XU ShiWei
  • Contact: LI QianChuan, E-mail: 82101211326@caas.cn
  • Funding:
    Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII)

Abstract:

【Objective】As global climate change intensifies and meteorological disasters become more frequent, quantifying the importance of meteorological factors to maize yield and predicting maize yield accurately are important for agricultural production and field management. This study aims to quantify the importance of meteorological factors at each maize growth stage to yield, and to build a high-accuracy, high-reliability stacking ensemble learning model of maize meteorological yield for yield prediction.【Method】The HP filter and the moving average method were used to determine the trend yield model for each county and to separate out each county's meteorological yield. Three ensemble learning methods, light gradient boosting machine (LightGBM), Bagging, and Stacking, were applied to 34 years of daily meteorological data and maize yield data from 596 county-level administrative regions and meteorological observation stations in 12 provinces of China, giving three maize meteorological yield prediction models, one per ensemble framework.【Result】Counties for which the HP filter was the suitable trend yield model were concentrated in Shaanxi, Henan, Jiangsu, and Anhui. More counties were suited to the moving average method than to the HP filter, and most counties had R² above 0.8. Under a 5-year sliding forecast and the model accuracy evaluation indicators, the mean absolute percentage error (MAPE) of maize yield predictions was below 6% for all three models. The Stacking model reached a MAPE of 4.60%, indicating high prediction accuracy and strong generalization. The results show that the stacking ensemble learning model of maize meteorological yield (Stacking) is more accurate and more robust, and that it makes effective use of the characteristics and strengths of each base learner to improve prediction accuracy, making it the best-performing of the three models for predicting maize yield from meteorological factors. In addition, the quantitative analysis of maize yield based on random forest feature importance scores for 27 meteorological factors across maize growth stages in 12 provinces offers a reference for crop monitoring and field management.【Conclusion】The predictions of the three ensemble learning methods, especially the stacking ensemble learning model (Stacking), reflect in detail the spatiotemporal distribution of changes in maize yield. The stacking ensemble learning model of maize yield based on meteorological factors offers a new approach to field management and accurate maize yield prediction.
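
For readers unfamiliar with the quantities named in the abstract, the following block writes them out in standard textbook form. It is a sketch under assumptions, not the paper's own notation: the symbols $Y_t$ (observed yield in year $t$), $\tau_t$ (trend yield), $W_t$ (meteorological yield), the window length $k$, and the smoothing parameter $\lambda$ are introduced here for illustration, and the additive decomposition is the common convention rather than something the abstract states.

```latex
\begin{align}
  % Additive decomposition (assumed): meteorological yield is what remains after removing the trend
  W_t &= Y_t - \tau_t \\
  % Moving-average trend with an odd window of k years
  \tau_t^{\mathrm{MA}} &= \frac{1}{k} \sum_{j=-(k-1)/2}^{(k-1)/2} Y_{t+j} \\
  % HP filter trend: squared fit error plus a curvature penalty weighted by \lambda
  \{\tau_t^{\mathrm{HP}}\} &= \arg\min_{\tau_1,\dots,\tau_T}
      \sum_{t=1}^{T} (Y_t - \tau_t)^2
      + \lambda \sum_{t=2}^{T-1} (\tau_{t+1} - 2\tau_t + \tau_{t-1})^2 \\
  % Mean absolute percentage error over n predictions \hat{y}_i of observed yields y_i
  \mathrm{MAPE} &= \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
\end{align}
```

A larger $\lambda$ gives a smoother HP trend. The abstract does not report the values of $\lambda$ or $k$ used, so none are assumed here.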

Key words: maize meteorological yield, ensemble learning, yield estimation, county-level data, feature importance
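
The yield separation and the MAPE metric mentioned in the abstract can be illustrated with a small sketch. It assumes an additive decomposition (meteorological yield = observed yield minus trend yield) and a centered moving average as the trend. The function names, the window length, and the sample series are hypothetical and are not taken from the paper.

```python
def moving_average_trend(yields, window=5):
    """Centered moving average as a simple trend-yield estimate.

    An odd window is assumed. Near the ends of the series the window
    shrinks so that every year still receives a trend value.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    half = window // 2
    n = len(yields)
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(sum(yields[lo:hi]) / (hi - lo))
    return trend


def meteorological_yield(yields, trend):
    """Residual after removing the trend (additive decomposition, assumed)."""
    return [y - t for y, t in zip(yields, trend)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up yield series for one county (t/ha), purely illustrative.
county_yields = [5.0, 5.2, 5.1, 5.6, 5.4, 5.9, 5.7]
trend = moving_average_trend(county_yields, window=3)
weather_part = meteorological_yield(county_yields, trend)
```

In a pipeline like the one the abstract describes, a model would predict the meteorological component, the trend would be added back, and `mape` would compare the reconstructed yield with the observed yield.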