(JIA-2019-0726) A case-based method of selecting covariates for digital soil mapping
LIANG Peng (1, 2), QIN Cheng-zhi (1, 2, 3), ZHU A-xing (1, 2, 3, 4, 5), HOU Zhi-wei (1, 2), FAN Nai-qing (1, 2), WANG Yi-jie (1, 2)
1 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, P.R.China
2 University of Chinese Academy of Sciences, Beijing 100049, P.R.China
3 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, School of Geography, Nanjing Normal University, Nanjing 210097, P.R.China
4 Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, P.R.China
5 Department of Geography, University of Wisconsin-Madison, Madison, WI 53706, USA
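Abstract (translated from the Chinese) In digital soil mapping (DSM), selecting suitable environmental covariates is one of the important factors affecting mapping accuracy. However, existing statistical or machine-learning methods for selecting DSM covariates are often unsuitable when few samples are available. To address this problem, this paper proposes a case-based method that formalizes the covariate-selection knowledge implicit in practical DSM applications and uses it to select covariates automatically. The method first extracts cases from practical DSM applications, then uses these cases to train one Random Forest classifier for each candidate covariate, and finally uses the trained classifiers to judge whether the corresponding candidate covariate is suitable for a new DSM application. The evaluation experiment took the selection of topographic covariates as an example: 191 DSM cases were extracted from 56 peer-reviewed journal articles, the performance of the new method was evaluated by leave-one-out cross-validation, and the method was compared with one that simulates how novice DSM users select covariates. The results show that, compared with the reference method, the proposed case-based method improved by more than 30% on all three quantitative evaluation indices (recall, precision, and F1-score). The new method can similarly be applied to automatic covariate selection in other geographic-variable mapping domains, such as landslide susceptibility mapping and species distribution mapping.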
Abstract Selecting a proper set of covariates is one of the most important factors influencing the accuracy of digital soil mapping (DSM). Statistical and machine learning methods for selecting DSM covariates are not applicable when only limited samples are available. To solve this problem, this paper proposes a case-based method that formalizes the covariate selection knowledge contained in practical DSM applications. The proposed method trains Random Forest (RF) classifiers on DSM cases extracted from practical DSM applications and then uses the trained classifiers to determine whether each potential covariate should be used in a new DSM application. In this study, we took topographic covariates as an example and extracted 191 DSM cases from 56 peer-reviewed journal articles to evaluate the performance of the proposed case-based method by leave-one-out cross-validation. Compared with a way of selecting DSM covariates commonly used by novices, the proposed case-based method improved each of three quantitative evaluation indices (recall, precision, and F1-score) by more than 30%. The proposed method could also be applied to selecting a proper set of covariates in other similar geographical modeling domains, such as landslide susceptibility mapping and species distribution modeling.
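The leave-one-out evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that the covariates actually used in a held-out case serve as the reference set against which a recommended set is scored; the abstract does not spell out how the indices were computed, so this is an assumption. The case records, the covariate names, and the `majority_recommender` function are hypothetical. The recommender is only a simple stand-in so the loop can run; it is neither the Random Forest classifiers of the proposed method nor the novice baseline used in the study.

```python
from collections import Counter


def selection_scores(recommended, used):
    """Return (precision, recall, F1) of a recommended covariate set
    measured against the covariates actually used in a case."""
    recommended, used = set(recommended), set(used)
    hits = len(recommended & used)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(used) if used else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def majority_recommender(training_cases, context):
    """Hypothetical stand-in: recommend covariates used in more than
    half of the training cases. The context argument is ignored here;
    a real recommender would use it to describe the new application."""
    counts = Counter(c for case in training_cases for c in case["covariates"])
    n = len(training_cases)
    return {cov for cov, k in counts.items() if 2 * k > n}


def leave_one_out(cases, recommend):
    """Hold out each case in turn, recommend covariates from the
    remaining cases, and average the three scores over all cases."""
    scores = []
    for i, held_out in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        recommended = recommend(training, held_out["context"])
        scores.append(selection_scores(recommended, held_out["covariates"]))
    n = len(scores)
    return tuple(sum(s[k] for s in scores) / n for k in range(3))


# Hypothetical toy cases; real cases would be extracted from articles.
cases = [
    {"context": {"property": "soil organic carbon"},
     "covariates": {"elevation", "slope", "twi"}},
    {"context": {"property": "clay content"},
     "covariates": {"elevation", "slope"}},
    {"context": {"property": "soil depth"},
     "covariates": {"elevation", "aspect"}},
]

mean_precision, mean_recall, mean_f1 = leave_one_out(cases, majority_recommender)
print(f"precision={mean_precision:.3f} recall={mean_recall:.3f} F1={mean_f1:.3f}")
```

Passing the recommender as a function keeps the evaluation loop independent of how covariates are chosen, so a classifier-based recommender and a baseline could be scored with the same code.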
Received: 11 June 2019
Online: 20 January 2020
Fund: This work was supported by grants from the National Natural Science Foundation of China (41431177 and 41871300), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China, the Innovation Project of State Key Laboratory of Resources and Environmental Information System (LREIS), China (O88RA20CYA), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China.
Corresponding author: QIN Cheng-zhi, Tel: +86-10-64888959, E-mail: qincz@lreis.ac.cn
About the author: LIANG Peng, E-mail: liangp@lreis.ac.cn