Automatic extraction and structuration of soil–environment relationship information from soil survey reports
Journal of Integrative Agriculture (JIA)
2019, Vol. 18, Issue 02: 328-339. DOI: 10.1016/S2095-3119(18)62071-4
Special focus: Digital mapping in agriculture and environment
WANG De-sheng (1, 2, 3), LIU Jun-zhi (1, 2, 3), ZHU A-xing (1, 2, 3, 4, 5), WANG Shu (1, 2, 3), ZENG Can-ying (1, 2, 3), MA Tian-wu (1, 2, 3)
1 Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Nanjing 210023, P.R.China
2 State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing 210023, P.R.China
3 Jiangsu Center for Collaborative Innovation in Geographic Information Resource Development and Application, Nanjing 210023, P.R.China
4 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, P.R.China
5 Department of Geography, University of Wisconsin-Madison, Madison, WI 53706, USA
Abstract 
In addition to soil samples, conventional soil maps, and experienced soil surveyors, text about soils (e.g., soil survey reports) is an important potential data source for extracting soil–environment relationships. Because the words describing soil–environment relationships are often mixed with unrelated words, the first step is to extract the relevant words and organize them in a structured form. This paper applies natural language processing (NLP) techniques to automatically extract and structure information on soil–environment relationships from soil survey reports. The method has two steps: (1) construction of a knowledge frame and (2) information extraction using either a rule-based method or a statistics-based method, depending on the type of information. For information written in a uniform style, the rule-based method was used; variables of this type are slope, elevation, accumulated temperature, annual mean temperature, annual precipitation, and frost-free period. For information written in diverse styles, the statistics-based method was adopted; variables of this type are landform and parent material. Descriptions of soil species in China's soil survey reports were used as the experimental dataset. Precision (P), recall (R), and F1-measure (F1) were used to evaluate the performance of the method. For the rule-based method, P was 100%, R was above 92%, and F1 was above 96% for all variables involved. For the method based on conditional random fields (CRFs), the P, R, and F1 values were 84.15, 83.13, and 83.64%, respectively, for parent material, and 88.33, 76.81, and 82.17%, respectively, for landform. To explore how text type affects the performance of the CRF-based method, CRF models were trained and validated separately on the descriptive texts of soil types and of typical profiles.
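As an illustration of the rule-based step, the sketch below fills a knowledge frame from a uniformly written passage using regular expressions. Everything here is a hypothetical stand-in, not the authors' implementation: the slot names, the English sample passage, and the trigger-word patterns are assumptions (the actual reports are in Chinese, and the abstract does not give the paper's rules or frame). Each numeric variable is stored as a (min, max) range; the landform and parent-material slots stay empty because the abstract assigns those to the CRF-based method.

```python
import re
from typing import Dict

# Hypothetical knowledge frame: slot names are illustrative, not the paper's.
# Numeric slots hold a (min, max) range; a single value v becomes (v, v).
SLOTS = [
    "slope", "elevation", "accumulated_temperature", "annual_mean_temperature",
    "annual_precipitation", "frost_free_period",
    "landform", "parent_material",  # left to the statistics-based (CRF) step
]

NUM = r"(\d+(?:\.\d+)?)"
RANGE = NUM + r"(?:\s*(?:-|~|to)\s*" + NUM + r")?"  # "200-500", "200 to 500" or "230"
OF = r"\s+(?:of\s+)?"

# Hypothetical English trigger words and units; real Chinese reports
# would need their own patterns.
PATTERNS = {
    "slope": re.compile(r"slope" + OF + RANGE + r"\s*(?:°|degrees?)", re.I),
    "elevation": re.compile(r"elevation" + OF + RANGE + r"\s*m\b", re.I),
    "accumulated_temperature": re.compile(r"accumulated temperature" + OF + RANGE + r"\s*°C", re.I),
    "annual_mean_temperature": re.compile(r"annual mean temperature" + OF + RANGE + r"\s*°C", re.I),
    "annual_precipitation": re.compile(r"annual precipitation" + OF + RANGE + r"\s*mm\b", re.I),
    "frost_free_period": re.compile(r"frost-free period" + OF + RANGE + r"\s*d(?:ays)?\b", re.I),
}


def extract_numeric_slots(text: str) -> Dict[str, object]:
    """Fill the frame's numeric slots from the first match of each rule."""
    frame: Dict[str, object] = {slot: None for slot in SLOTS}
    for slot, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            low = float(m.group(1))
            high = float(m.group(2)) if m.group(2) else low
            frame[slot] = (low, high)
    return frame


# Hypothetical sample passage, written only for this illustration.
SAMPLE = ("Distributed on low hills at an elevation of 200-500 m with a slope of 5-15°. "
          "Annual mean temperature 15.5 °C, accumulated temperature 4500-5000 °C, "
          "annual precipitation 1000-1200 mm, frost-free period 230 days.")

if __name__ == "__main__":
    for slot, value in extract_numeric_slots(SAMPLE).items():
        print(slot, value)
```

Rules like these work only because the wording and units of such variables are highly regular; that is the property the abstract cites for choosing the rule-based method for these six variables.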
For parent material, the maximum F1 value was 90.7% for the descriptive texts of soil types but only 75% for those of soil profiles. For landform, the maximum F1 value for the descriptive texts of soil types (85.33%) was similar to that for soil profiles (85.71%). These results suggest that NLP techniques are effective for extracting and structuring soil–environment relationship information from text data sources.
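The P, R and F1 figures above are the standard information-extraction measures: P is the share of extracted items that are correct, R is the share of gold-standard items that were extracted, and F1 = 2PR/(P + R). The sketch below makes two assumptions that the abstract does not state: the CRF labels tokens with a BIO scheme (the tag names `PM` for parent material and `LF` for landform are hypothetical), and a predicted span counts as correct only when its boundaries and label exactly match a gold span. It also checks that the reported CRF F1 values follow from the reported P and R.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into labelled spans."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current span keeps growing
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            # "B-x" opens a span; a stray "I-x" is treated as if it were "B-x"
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span-level P, R and F1 (assumed matching criterion)."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical tokens: "developed on granite residuum in low hills"
    gold = ["O", "O", "B-PM", "I-PM", "O", "B-LF", "I-LF"]
    pred = ["O", "O", "B-PM", "I-PM", "O", "O", "B-LF"]
    print(precision_recall_f1(bio_to_spans(gold), bio_to_spans(pred)))  # (0.5, 0.5, 0.5)
    # Reported CRF results (percent): F1 follows from P and R.
    print(round(f1_from(84.15, 83.13), 2))  # parent material: 83.64
    print(round(f1_from(88.33, 76.81), 2))  # landform: 82.17
```

Under exact matching, the truncated landform prediction ("hills" instead of "low hills") counts as both a false positive and a false negative, which is one way a CRF's recall can fall well below its precision, as with landform here.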
Keywords: soil–environment relationship; text; natural language processing; extraction; structuration
Received: 2018-01-02; Accepted: 2018-07-09
Fund: This study was supported by the National Natural Science Foundation of China (41431177 and 41601413), the National Basic Research Program of China (2015CB954102), the Natural Science Research Program of Jiangsu Province, China (BK20150975 and 14KJA170001), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China.
Corresponding authors: LIU Jun-zhi, E-mail: liujunzhi@njnu.edu.cn; ZHU A-xing, E-mail: azhu@wisc.edu
About author: WANG De-sheng, E-mail: desheng.5@163.com
Cite this article:   
WANG De-sheng, LIU Jun-zhi, ZHU A-xing, WANG Shu, ZENG Can-ying, MA Tian-wu. 2019. Automatic extraction and structuration of soil–environment relationship information from soil survey reports. Journal of Integrative Agriculture, 18(02): 328-339.
URL: http://www.chinaagrisci.com/Jwk_zgnykxen/EN/10.1016/S2095-3119(18)62071-4 or http://www.chinaagrisci.com/Jwk_zgnykxen/EN/Y2019/V18/I02/328
 
Copyright © 2015 ChinaAgriSci.com, All Rights Reserved
Chinese Academy of Agricultural Sciences (CAAS) No. 12 South Street, Zhongguancun, Beijing 100081, P. R. China
http://www.ChinaAgriSci.com   JIA E-mail: jia_journal@caas.cn