Journal of Integrative Agriculture  2023, Vol. 22 Issue (6): 1909-1927    DOI: 10.1016/j.jia.2023.02.011
Agricultural Economics and Management
Ensemble learning prediction of soybean yields in China based on meteorological data
LI Qian-chuan1, XU Shi-wei1, 2, 5#, ZHUANG Jia-yu1, 5, LIU Jia-jia2, ZHOU Yi3, ZHANG Ze-xi4

1 Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, P.R.China

2 Beijing Engineering Research Center for Agricultural Monitoring and Early Warning, Beijing 100081, P.R.China

3 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, P.R.China

4 The Department of Mathematics, Columbia University, NY 10027, USA

5 Key Laboratory of Agricultural Monitoring and Early Warning Technology, Ministry of Agriculture and Rural Affairs, Beijing 100081, P.R.China



Accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield from meteorological data, it remains unclear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, integrating the strengths of multiple machine learning algorithms through ensemble learning to improve prediction accuracy has not been studied in depth. This study analyzed daily meteorological data and soybean yield data covering 34 years from 173 county-level administrative regions and meteorological stations in the two principal soybean planting areas of China (Northeast China and the Huang–Huai region). Three machine learning algorithms (K-nearest neighbors, random forest, and support vector regression) were adopted as base models to establish a high-precision and highly reliable prediction model of soybean meteorological yield based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized with principal component analysis for dimensionality reduction and with hyperparameter tuning. Model accuracy was evaluated using 5-year sliding predictions and four regression metrics across the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness than the alternatives considered. The 5-year sliding estimations of soybean yield from the stacking model in the 173 counties showed that the estimates reflect the spatiotemporal distribution of soybean yield in detail, with a mean absolute percentage error (MAPE) below 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
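
The pipeline described in the abstract (dimensionality reduction, three base models, stacking with 5-fold cross-validation, MAPE evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, a Ridge meta-learner (the abstract does not name one), arbitrary hyperparameter values in place of the tuned ones, and synthetic data standing in for the county-level meteorological features and yields. All function names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_stacking_model(n_components=3, random_state=0):
    """Standardize -> PCA -> stack of KNN, RF and SVR base models."""
    # Hyperparameters are illustrative assumptions, not the paper's tuned values.
    base_models = [
        ("knn", KNeighborsRegressor(n_neighbors=5)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=random_state)),
        ("svr", SVR(kernel="rbf", C=100.0)),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base-model predictions.
    # The Ridge meta-learner is an assumption made for this sketch.
    stack = StackingRegressor(
        estimators=base_models, final_estimator=Ridge(alpha=1.0), cv=5
    )
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; y_true must be nonzero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def make_synthetic_data(n_samples=300, n_features=12, seed=0):
    """Synthetic stand-in: correlated 'weather' features driven by 3 latent factors."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 3))
    loadings = rng.normal(size=(3, n_features))
    X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))
    # Yield-like target (arbitrary units around 2000) plus noise.
    y = (2000.0 + 150.0 * latent[:, 0] - 80.0 * latent[:, 1]
         + rng.normal(scale=30.0, size=n_samples))
    return X, y


def run_demo():
    """Fit the stacking pipeline on a train split and score it on a hold-out split."""
    X, y = make_synthetic_data()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = build_stacking_model()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_test, y_pred, mape(y_test, y_pred)


if __name__ == "__main__":
    _, _, score = run_demo()
    print(f"Hold-out MAPE: {score:.2%}")
```

The random hold-out split here is only for brevity. A sliding evaluation like the one in the abstract would instead train on earlier years and predict later ones, and the resulting score on synthetic data says nothing about the paper's reported accuracy.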

Keywords: meteorological factors; ensemble learning; crop yield prediction; machine learning; county-level
Received: 27 September 2022   Accepted: 16 November 2022   Online: 10 February 2023
Fund: The research was supported by the Science and Technology Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).
About author:  LI Qian-chuan, E-mail:; #Correspondence XU Shi-wei, Email:

Cite this article: 

LI Qian-chuan, XU Shi-wei, ZHUANG Jia-yu, LIU Jia-jia, ZHOU Yi, ZHANG Ze-xi. 2023. Ensemble learning prediction of soybean yields in China based on meteorological data. Journal of Integrative Agriculture, 22(6): 1909-1927.
