Scientia Agricultura Sinica, 2011, Vol. 44, Issue (23): 4833-4840. doi: 10.3864/j.issn.0578-1752.2011.23.009

• Soil and Fertilizer · Water-Saving Irrigation · Agricultural Ecology and Environment •

Applied Research of Combinatorial Algorithm of Clustering, Rough Set and Decision Tree Method in Productivity Evaluation

CHEN Gui-Fen, MA Li, DONG Wei, XIN Min-Gang

  1. College of Information Technology, Jilin Agricultural University, Changchun 130118
  2. Agricultural Technology Extension General Station of Nong'an County, Nong'an 130200, Jilin
  • Received: 2010-07-26 Online: 2011-12-01 Published: 2011-06-14
  • Corresponding author: CHEN Gui-Fen, Tel: 0431-84532775, E-mail: guifchen@163.com
  • Supported by: National "863" Program of China (2006AA10A309, 2006AA10Z271); National Spark Program of China (2008GA661003)

Abstract: 【Objective】 Most methods for evaluating soil productivity are subjective to some degree and seldom consider the dependencies among soil attributes. This paper uses data mining to seek a new method for grading soil productivity. 【Method】 Using cultivated land survey data from Nong'an County, productivity grades were evaluated with an optimized algorithm that combines K-means clustering, the Johnson rough set attribute reduction algorithm, and the C4.5 decision tree algorithm. 【Result】 K-means clustering was used to obtain the optimal number of learning samples. Combining rough set attribute reduction with the decision tree removed 7 redundant soil attributes. The resulting decision tree model has 317 nodes, of which 159 are leaf nodes, yields 159 rules, and reaches an accuracy of 82.08%. Compared with the approach without clustering and without reduction, the number of decision tree nodes decreased by 41.62%. 【Conclusion】 The combined algorithm reduces the time and space complexity of the algorithm and improves mining efficiency while preserving model accuracy.

Key words: clustering, rough set, decision tree, soil evaluation, productivity grade
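
The abstract names the Johnson rough set attribute reduction algorithm but gives no implementation details. As a rough illustration only, the sketch below shows the usual greedy form of that heuristic: build the discernibility sets for every pair of samples with different grades, then repeatedly pick the attribute that appears in the most remaining sets. The attribute names, the toy table, the grade labels, and the tie-breaking rule (lowest attribute index) are all hypothetical and are not taken from the paper or from the Nong'an survey data; attributes are assumed to be already discretized.

```python
from collections import Counter
from itertools import combinations


def discernibility_sets(rows, decisions):
    """For each pair of samples with different decisions, collect the set of
    condition-attribute indices on which the two samples differ."""
    sets = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue
        diff = frozenset(
            k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
        )
        if diff:  # identical samples with different grades are inconsistent; skip them
            sets.append(diff)
    return sets


def johnson_reduct(rows, decisions):
    """Greedy Johnson heuristic: repeatedly keep the attribute that occurs in
    the most remaining discernibility sets, then drop the sets it covers."""
    remaining = discernibility_sets(rows, decisions)
    reduct = []
    while remaining:
        counts = Counter(attr for s in remaining for attr in s)
        # Assumption: ties are broken by the lowest attribute index.
        best = min(counts, key=lambda attr: (-counts[attr], attr))
        reduct.append(best)
        remaining = [s for s in remaining if best not in s]
    return sorted(reduct)


# Hypothetical, already-discretized toy table (not data from the paper).
ATTRIBUTES = ["organic_matter", "ph", "available_p", "texture"]
ROWS = [
    ("high", "neutral", "high", "loam"),
    ("high", "neutral", "low", "loam"),
    ("low", "acid", "high", "loam"),
    ("low", "acid", "low", "loam"),
    ("high", "acid", "high", "loam"),
    ("low", "neutral", "low", "loam"),
]
GRADES = [1, 2, 2, 3, 1, 3]  # hypothetical productivity grades

reduct = johnson_reduct(ROWS, GRADES)
kept = [ATTRIBUTES[k] for k in reduct]
print("Attributes kept:", kept)
```

In this toy table the grade depends only on `organic_matter` and `available_p`, so the heuristic keeps those two and drops `ph` and `texture` as redundant. In a pipeline like the one the abstract describes, only the retained attributes would then be passed to the decision tree learner.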