Journal of Integrative Agriculture, 2012, Vol. 11, Issue (5): 752-759. DOI: 10.1016/S1671-2927(00)8596

Agricultural Ontology Based Feature Optimization for Agricultural Text Clustering

 SU Ya-ru, WANG Ru-jing, CHEN Peng, WEI Yuan-yuan, LI Chuan-xi   

  1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, P.R. China
  2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, P.R. China
  • Received: 2011-07-14 Online: 2012-05-01 Published: 2012-07-18
  • Contact: WANG Ru-jing, Tel: +86-551-5592968, E-mail: rjwang@iim.ac.cn; CHEN Peng, E-mail: pchen@iim.ac.cn
  • About author: SU Ya-ru, E-mail: smomo@mail.ustc.edu.cn
  • Supported by:

    This research was supported by the National Natural Science Foundation of China (60774096) and the National High-Tech R&D Program of China (2008BAK49B05).

Abstract: Feature optimization is important for agricultural text mining. Text documents are usually represented with the vector space model. However, this basic approach suffers from two drawbacks: the curse of dimensionality and the lack of semantic information. In this paper, a novel ontology-based feature optimization method for agricultural text is proposed. First, the terms of the vector space model are mapped to concepts of an agricultural ontology, and the concept frequency weights are computed statistically from the term frequency weights; second, concept similarity weights are assigned to the concept features according to the structure of the agricultural ontology. Combining the feature frequency weights and the feature similarity weights based on the agricultural ontology drastically reduces the dimensionality of the feature space and incorporates semantic information into the representation. The results show that the feature optimization yields a significant improvement in agricultural text clustering.

Key words: agricultural ontology, feature optimization, agricultural text clustering
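
The two steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything concrete here is an assumption made for illustration: the toy ontology in `PARENT`, the term-to-concept table `TERM_TO_CONCEPT`, the path-based similarity `1 / (1 + path length)`, the choice to drop unmapped terms, and the product used to combine the two weights. It should not be read as the authors' method.

```python
from collections import Counter

# Hypothetical toy ontology: concept -> parent concept (None marks the root).
PARENT = {
    "agriculture": None,
    "crop": "agriculture",
    "cereal": "crop",
    "rice": "cereal",
    "wheat": "cereal",
    "pest": "agriculture",
}

# Hypothetical mapping from vector-space terms to ontology concepts.
TERM_TO_CONCEPT = {
    "paddy": "rice",
    "rice": "rice",
    "wheat": "wheat",
    "aphid": "pest",
    "locust": "pest",
}


def concept_frequencies(tokens):
    """Step 1: aggregate term frequencies into concept frequencies.

    Terms with no concept in the mapping are dropped (an assumption).
    """
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)


def path_to_root(concept):
    """Return the list of concepts from `concept` up to the ontology root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def path_similarity(a, b):
    """Assumed similarity: 1 / (1 + number of edges between a and b)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    for i, node in enumerate(path_a):
        if node in path_b:  # first shared node is the lowest common ancestor
            return 1.0 / (1 + i + path_b.index(node))
    return 0.0


def similarity_weight(concept, concepts):
    """Step 2: mean similarity of `concept` to the other concepts present."""
    others = [c for c in concepts if c != concept]
    if not others:
        return 1.0
    return sum(path_similarity(concept, c) for c in others) / len(others)


def optimize_features(tokens):
    """Combine frequency and similarity weights (here, by multiplication)."""
    freq = concept_frequencies(tokens)
    return {c: f * similarity_weight(c, freq) for c, f in freq.items()}


tokens = ["paddy", "rice", "wheat", "aphid", "tractor"]
features = optimize_features(tokens)
```

In this toy document, five distinct terms collapse into three concept features, which is the kind of dimensionality reduction the abstract refers to; the similarity weight then favors concepts that sit close to the document's other concepts in the ontology.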
