Scientia Agricultura Sinica ›› 2020, Vol. 53 ›› Issue (3): 563-573. doi: 10.3864/j.issn.0578-1752.2020.03.009

• Soil & Fertilizer · Water-Saving Irrigation · Agroecology & Environment •

集成土壤-环境关系与机器学习的干旱区土壤属性数字制图

张振华,丁建丽,王敬哲,葛翔宇,王瑾杰,田美玲,赵启东

  1. 新疆大学资源与环境科学学院/新疆大学绿洲生态教育部重点实验室/新疆大学智慧城市与环境建模自治区普通高校重点实验室,乌鲁木齐 830046
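  • Received: 2019-05-06 Accepted: 2019-09-18 Online: 2020-02-01 Published: 2020-02-13
  • Corresponding author: DING JianLi
  • About the author: ZHANG ZhenHua, E-mail: 15099577874@163.com.
  • Supported by:
    National Key Research and Development Program of China (2016YFC0402409-03); National Natural Science Foundation of China (41961059); National Natural Science Foundation of China (41771470); Youth Fund of the Natural Science Foundation of Xinjiang Uyghur Autonomous Region (2018D01C067)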

Digital Soil Properties Mapping by Ensembling Soil-Environment Relationship and Machine Learning in Arid Regions

ZHANG ZhenHua, DING JianLi, WANG JingZhe, GE XiangYu, WANG JinJie, TIAN MeiLing, ZHAO QiDong

  1. College of Resources and Environmental Science, Xinjiang University/Ministry of Education Key Laboratory of Oasis Ecology, Xinjiang University/Key Laboratory of Smart City and Environment Modelling of Higher Education Institute, Xinjiang University, Urumqi 830046
  • Received: 2019-05-06 Accepted: 2019-09-18 Online: 2020-02-01 Published: 2020-02-13
  • Contact: JianLi DING

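Abstract (Chinese-language version):

【Objective】The spatial distribution of soil properties is an important factor affecting agricultural productivity, land management and ecological security. Using the coupled soil-environment relationship within a machine learning framework, this study quantitatively predicted the spatial distribution of three soil properties in an arid region: soil pH, soil salt content (SSC) and soil organic matter (SOM). The aim was to provide a scientific basis for agricultural production and ecological security in arid regions. 【Method】In July 2017, 82 representative topsoil (0-20 cm) samples were collected in the arid Ugan-Kuqa River oasis. Based on soil-environment relationships, 32 environmental covariates were extracted from DEM data and Landsat 8 data, resampled to a spatial resolution of 90 m, and converted to Grid format for modeling. A Gradient Boosting Decision Tree (GBDT) model was used to rank the importance of the 32 environmental covariates for each of the three soil properties in turn, and the root mean square error (RMSE) was used to define an importance threshold, which determined the covariates used to map each property. Three non-linear models, Random Forest (RF), Bagging and Cubist, were then fitted, and a Multiple Linear Regression (MLR) model was introduced for comparison. The best model was selected and used to produce 90 m resolution maps of pH, SSC and SOM for the Ugan-Kuqa River oasis in Xinjiang. 【Result】GBDT effectively screened out the important covariates. Seven environmental variables took part in modeling all three soil properties: Elevation, Profile Curvature, Difference Vegetation Index, Extended Normalized Difference Vegetation Index, Modified Soil Adjusted Vegetation Index, Salinity Index S1 and Salinity Index S6. Fifteen covariates were selected for SSC and 17 each for pH and SOM, and remote sensing indices played a strong role in predicting the soil property maps. All three machine learning algorithms outperformed MLR. Among the three non-linear models, random forest performed best for all three soil properties. For the random forest predictions, the validation set gave R²=0.6779, RMSE=0.2182 and ρc=0.6084 for soil pH; R²=0.7945, RMSE=3.1803 and ρc=0.8377 for SSC; and R²=0.7472, RMSE=3.5456 and ρc=0.7009 for SOM. 【Conclusion】The important factors screened by GBDT can be used with machine learning algorithms to map soil properties in arid regions, and the random forest model showed the best predictive ability for all three soil properties. The spatial distribution of the three mapped properties was clarified from the resulting soil property maps in combination with a soil classification map.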
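The covariate screening step described in this abstract (rank covariates by GBDT importance, then use RMSE to decide where to cut the ranking) can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption: the function `select_top_k`, the importance scores, and the RMSE values are hypothetical and are not taken from the paper. In practice the importances might come from a fitted GBDT (for example the `feature_importances_` attribute of scikit-learn's `GradientBoostingRegressor`), and `rmse_for_subset` would wrap a cross-validated model fit.

```python
def select_top_k(ranked_covariates, rmse_for_subset):
    """Return the prefix of the ranked covariates that gives the lowest RMSE.

    ranked_covariates: covariate names, most important first.
    rmse_for_subset:   callable taking a list of names and returning an RMSE
                       (hypothetical; would normally refit and validate a model).
    """
    best_k, best_rmse = 1, float("inf")
    for k in range(1, len(ranked_covariates) + 1):
        score = rmse_for_subset(ranked_covariates[:k])
        if score < best_rmse:  # strict '<' keeps the smaller subset on ties
            best_k, best_rmse = k, score
    return ranked_covariates[:best_k], best_rmse


# Made-up importance scores, for illustration only (not results from the paper).
importances = {"elevation": 0.30, "S1": 0.25, "DVI": 0.20, "slope": 0.15, "aspect": 0.10}
ranked = sorted(importances, key=importances.get, reverse=True)

# Made-up RMSE per subset size, standing in for a real cross-validation run.
toy_rmse_by_size = {1: 4.0, 2: 3.5, 3: 3.2, 4: 3.3, 5: 3.4}

selected, score = select_top_k(ranked, lambda subset: toy_rmse_by_size[len(subset)])
print(selected, score)
```

With these toy numbers the RMSE stops improving after the third covariate, so the threshold falls there. The same loop would apply separately to each soil property, which is one way different properties could end up with different numbers of covariates.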

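Keywords (Chinese-language version): soil property, environmental covariates, digital soil mapping, machine learning, Gradient Boosting Decision Tree model, Random Forest model, Bagging model, Cubist model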

Abstract:

【Objective】The spatial distribution of soil properties is an important factor affecting agricultural productivity, land management and ecological security. Using the coupled relationship between soil and environment within a machine learning framework, the spatial distributions of soil pH, soil salt content (SSC) and soil organic matter (SOM) were quantitatively predicted to provide a scientific basis for ecological security and agricultural production in the arid region. 【Method】A total of 82 topsoil (0-20 cm) samples were collected from the Ugan-Kuqa River basin oasis in Xinjiang Uyghur Autonomous Region in July 2017. Digital elevation model (DEM) data and Landsat 8 data were used to extract 32 environmental covariates according to the soil-environment relationship. The 32 covariates were resampled to a spatial resolution of 90 m and converted to Grid format for modeling. The Gradient Boosting Decision Tree (GBDT) algorithm was used to rank the environmental covariates by importance for each of the three soil properties. Three non-linear models (random forest, bagging and Cubist) were used to estimate the soil properties, and a classic multiple linear regression (MLR) model was included for comparison. On this basis, pH, SSC and SOM were each mapped at a resolution of 90 m for the Ugan-Kuqa River basin oasis. 【Result】GBDT screened out important covariates effectively. Elevation, Profile Curvature, Difference Vegetation Index, Extended Normalized Difference Vegetation Index, Modified Soil Adjusted Vegetation Index, Salinity Index S1 and Salinity Index S6 were important factors involved in modeling all three soil properties. Fifteen covariates were selected for modeling SSC, and 17 each for pH and SOM. Remote sensing indices played a significant role in predicting the soil property maps. The non-linear models were more accurate than the linear MLR model, and random forest performed best for all three soil properties. For the random forest predictions, the validation results were R²=0.6779, RMSE=0.2182 and ρc=0.6084 for soil pH; R²=0.7945, RMSE=3.1803 and ρc=0.8377 for SSC; and R²=0.7472, RMSE=3.5456 and ρc=0.7009 for SOM. 【Conclusion】The important factors selected by GBDT, combined with machine learning algorithms, can be used to map soil properties in arid areas. The random forest model showed the best predictive ability for all three soil properties. The spatial distribution of the three mapped properties was determined by combining the resulting maps with a soil classification map.
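The three validation statistics reported in the abstract can be computed as in the sketch below. Two assumptions are made, because the abstract does not define the metrics: R² is taken to be the coefficient of determination, and ρc is taken to be Lin's concordance correlation coefficient, a common choice in digital soil mapping. The function names are illustrative only, and the sample values are invented to show the arithmetic.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient (assumed meaning of rho_c).

    Uses population (divide-by-n) variances and covariance.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Invented example: predictions are perfectly correlated but biased by +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(concordance_cc(observed, predicted))
```

In the invented example the predictions track the observations exactly apart from a constant offset, so a plain correlation would be 1 while both R² and ρc are penalized for the bias. This is why a concordance measure is often reported alongside R² and RMSE.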

Key words: soil property, environmental covariates, digital soil mapping, machine learning, Gradient Boosting Decision Tree (GBDT), Random Forest (RF), Bagging model, Cubist model