Scientia Agricultura Sinica ›› 2018, Vol. 51 ›› Issue (24): 4659-4676. doi: 10.3864/j.issn.0578-1752.2018.24.007

• Soil & Fertilizer · Water-Saving Irrigation · Agroecology & Environment •

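Effect of Sub-Region Priority Modeling Based on the RF and SGT Algorithms on the Accuracy of Oasis-Scale Soil Salinity Prediction

WANG Fei1,2,3, YANG ShengTian2, WEI Yang2, YANG XiaoDong2,3, DING JianLi1,2,3

  1 Xinjiang Common University Key Laboratory of Smart City and Environmental Modeling, Xinjiang University, Urumqi 830046
  2 College of Resource and Environmental Sciences, Xinjiang University, Urumqi 830046
  3 Laboratory for Oasis Ecosystem, Ministry of Education, Urumqi 830046
  • Received: 2018-05-14 Accepted: 2018-07-20 Online: 2018-12-16 Published: 2018-12-16
  • Funding:
    National Natural Science Foundation of China (U1603241, 41661046, 41771470, 41261090, U1303381); Xinjiang University Doctoral Research Fund (BS150246)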

Influence of Sub-Region Priority Modeling Constructed by Random Forest and Stochastic Gradient Treeboost on the Accuracy of Soil Salinity Prediction in Oasis Scale

WANG Fei1,2,3, YANG ShengTian2, WEI Yang2, YANG XiaoDong2,3, DING JianLi1,2,3

  1 Xinjiang Common University Key Laboratory of Smart City and Environmental Modeling, Xinjiang University, Urumqi 830046
  2 College of Resource and Environmental Sciences, Xinjiang University, Urumqi 830046
  3 Laboratory for Oasis Ecosystem, Ministry of Education, Urumqi 830046
  • Received: 2018-05-14 Accepted: 2018-07-20 Online: 2018-12-16 Published: 2018-12-16

Abstract (Chinese-language version):

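【Objective】 This study aims to improve the prediction accuracy of soil salinity across an entire arid-region oasis by building models first in sub-regions of the oasis, and to quantify the differences and uncertainty in accuracy between the global model and the sub-region models. 【Method】 Random Forest (RF) and Stochastic Gradient Treeboost (SGT) were used to quantify this uncertainty and to assess how modeling multiple local-scale scenarios (landscapes) first and then merging their predictions affects the accuracy of simulated oasis-wide soil salinity. Based on driving factors (land use and landform) and response factors (Normalized Difference Vegetation Index, NDVI, and soil electrical conductivity, EC), 27 environmental scenarios were designed to cover, as far as possible, the range of soil salinity variability in a typical oasis. 【Result】 In 70.37% (19/27) of the scenarios, SGT predicted more accurately than RF. The 10 scenarios modeled individually were more accurate than the corresponding 10 reclassified scenarios under the global model (global-model predictions reclassified according to the scenario rules). In particular, the two scenarios EC ≤ 4 dS·m⁻¹ and 2 dS·m⁻¹ < EC < 16 dS·m⁻¹ should be modeled and predicted separately. For the 4 scenarios formed by pairwise merging, the merged predictions were more accurate than the reclassified global-model predictions. Notably, the preferred splitting variable for constructing sub-region scenarios at the oasis scale was EC, followed by landform and land use. 【Conclusion】 The study recommends modeling first with SGT at the different landscape scales within the oasis and then merging the predictions from each landscape scale, in order to improve the inference accuracy of oasis soil salinity.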
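The scenario design described above amounts to rule-based partitioning of the samples. The following is a minimal sketch, not the authors' code: the two EC thresholds are the ones named in the abstract, while the sample records, the `land_use` class name, and the helper `assign_scenarios` are hypothetical and exist only for illustration.

```python
# Scenario rules: each rule decides whether a sample belongs to a scenario.
# The EC thresholds (in dS/m) come from the abstract; "cropland" is a
# hypothetical land-use class standing in for a driving-factor scenario.
scenario_rules = {
    "EC<=4": lambda s: s["ec"] <= 4.0,
    "2<EC<16": lambda s: 2.0 < s["ec"] < 16.0,
    "cropland": lambda s: s["land_use"] == "cropland",
}

# Hypothetical field samples (id, measured EC in dS/m, land-use class).
samples = [
    {"id": 1, "ec": 1.2, "land_use": "cropland"},
    {"id": 2, "ec": 3.5, "land_use": "cropland"},
    {"id": 3, "ec": 9.0, "land_use": "shrubland"},
    {"id": 4, "ec": 25.0, "land_use": "bare"},
]


def assign_scenarios(samples, rules):
    """Return {scenario name: [sample ids matching that scenario's rule]}."""
    return {
        name: [s["id"] for s in samples if rule(s)]
        for name, rule in rules.items()
    }


scenarios = assign_scenarios(samples, scenario_rules)
for name, ids in scenarios.items():
    print(name, ids)
```

Scenarios defined this way may overlap: a sample with 2 < EC ≤ 4 dS·m⁻¹ satisfies both EC rules, so each scenario is best treated as its own modeling subset rather than as one cell of a strict partition.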

Keywords: soil salinity, machine learning, arid region, Landsat OLI, spatial heterogeneity, Random Forest algorithm, Stochastic Gradient Treeboost algorithm

Abstract:

【Objective】 This study aims to improve the prediction accuracy of soil salinity in an arid oasis by building models preferentially in sub-regions of the oasis, and to quantify the differences and uncertainty in accuracy between the global model and the sub-region models. 【Method】 Two machine learning methods, Random Forest (RF) and Stochastic Gradient Treeboost (SGT), were used to quantify these effects and to test whether modeling in sub-regions improves simulation precision relative to the full-sample model under the complex background of an arid region. Twenty-seven environmental scenarios (twelve original and fifteen derived) were designed from driving factors (land use and landform) and response factors (Normalized Difference Vegetation Index, NDVI, and electrical conductivity, EC) to reflect a range of soil salinity variability. 【Result】 In 70.37% (19/27) of the scenarios, soil salinity predicted by SGT was closer to the observed values than that predicted by RF. Ten original sub-regions were modeled individually and compared with the full-sample model at the oasis scale, whose simulated values were reclassified according to the 10 partition rules; in 70% of these scenarios, the individually modeled sub-regions were more accurate than the full-sample model. In particular, the regions with EC ≤ 4 dS·m⁻¹ and 2 dS·m⁻¹ < EC < 16 dS·m⁻¹ should be modeled separately to predict the spatial variability of regional salinity. When the predictions of sub-regions were combined and compared with the predicted values of the full-sample model, all four combination modes showed higher prediction accuracy than the full-sample model. In addition, the preferred variable for partitioning the sub-regions was soil electrical conductivity, followed by landform and land use. 【Conclusion】 The study proposes building soil salinity models with SGT first at the different landscape scales within the oasis, and then combining the predicted values from each landscape scale to improve the prediction accuracy of oasis soil salinity.
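The comparison between sub-region priority modeling and the full-sample model can be illustrated with a small workflow sketch. It rests on loud assumptions: the data are invented, the two sub-region names are hypothetical, and an ordinary least-squares line stands in for RF and SGT so that the example needs no external library. Only the workflow (fit one global model, fit one model per sub-region, merge the sub-region predictions, compare errors) mirrors what the abstract describes.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (a simple stand-in for RF/SGT)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Invented samples: (sub-region, NDVI, EC in dS/m). The NDVI-EC relation
# differs between the two hypothetical sub-regions on purpose.
samples = [
    ("cropland", 0.60, 2.0), ("cropland", 0.70, 1.5),
    ("cropland", 0.80, 1.0), ("cropland", 0.50, 2.5),
    ("desert", 0.05, 30.0), ("desert", 0.10, 20.0),
    ("desert", 0.15, 10.0), ("desert", 0.08, 24.0),
]
obs = [ec for _, _, ec in samples]

# Full-sample (global) model: one fit over all samples.
global_model = fit_line([x for _, x, _ in samples], obs)
global_pred = [global_model(x) for _, x, _ in samples]

# Sub-region priority modeling: one fit per sub-region.
sub_models = {}
for region in sorted({r for r, _, _ in samples}):
    xs = [x for r, x, _ in samples if r == region]
    ys = [y for r, _, y in samples if r == region]
    sub_models[region] = fit_line(xs, ys)

# Merge: every sample is predicted by the model of its own sub-region.
merged_pred = [sub_models[r](x) for r, x, _ in samples]

# In-sample errors only; a real study would use held-out validation data.
global_rmse = rmse(obs, global_pred)
merged_rmse = rmse(obs, merged_pred)
print(f"global RMSE: {global_rmse:.3f}")
print(f"merged sub-region RMSE: {merged_rmse:.3f}")
```

Because the invented data follow a different linear relation in each sub-region, the merged sub-region predictions fit far better than the single global line. Real oasis data would show a smaller and less certain gap, which is the difference the study sets out to quantify.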

Key words: soil salinity, machine learning, arid regions, Landsat OLI, spatial heterogeneity, Random Forest, Stochastic Gradient Treeboost