Journal of Integrative Agriculture  2012, Vol. 11 Issue (5): 752-759    DOI: 10.1016/S1671-2927(00)8596
SECTION 2: Theory, Technology and Method
Agricultural Ontology Based Feature Optimization for Agricultural Text Clustering
 SU Ya-ru, WANG Ru-jing, CHEN Peng, WEI Yuan-yuan, LI Chuan-xi
1.Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, P.R.China
2.School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, P.R.China

Abstract  Feature optimization is important to agricultural text mining. Text documents are usually represented with the vector space model. However, this basic approach suffers from two drawbacks: the curse of dimensionality and the lack of semantic information. In this paper, a novel ontology-based feature optimization method for agricultural text is proposed. First, the terms of the vector space model are mapped to concepts of an agricultural ontology, and the concept frequency weights are computed statistically from the term frequency weights. Second, concept similarity weights are assigned to the concept features according to the structure of the agricultural ontology. By combining the feature frequency weights and the feature similarity weights derived from the agricultural ontology, the dimensionality of the feature space can be reduced drastically, and semantic information is incorporated into the feature representation. The results show that this feature optimization significantly improves agricultural text clustering.
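The abstract describes the two steps only at a high level. The following is a minimal Python sketch of the general idea, not the authors' implementation. Everything concrete in it is a hypothetical assumption: the toy ontology `PARENT`, the term-to-concept map `TERM_TO_CONCEPT`, summing term frequencies to obtain concept frequencies, Wu-Palmer similarity as the structure-based concept similarity, and the additive rule in `concept_weights` that combines frequency and similarity weights.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy agricultural ontology (child -> parent); "agriculture" is the root.
PARENT = {
    "crop": "agriculture",
    "cereal": "crop",
    "rice": "cereal",
    "wheat": "cereal",
    "pest": "agriculture",
    "insect_pest": "pest",
    "aphid": "insect_pest",
}

# Hypothetical term -> concept mapping; synonyms collapse onto one concept.
TERM_TO_CONCEPT = {
    "rice": "rice", "paddy": "rice",
    "wheat": "wheat",
    "cereal": "cereal", "grain": "cereal",
    "aphid": "aphid", "greenfly": "aphid",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth in the tree, with the root at depth 1."""
    return len(path_to_root(concept))


def wu_palmer(c1, c2):
    """Wu-Palmer similarity 2*depth(lcs) / (depth(c1) + depth(c2)).
    Assumes both concepts lie in the same tree, so a common ancestor exists."""
    ancestors = set(path_to_root(c1))
    lcs = next(c for c in path_to_root(c2) if c in ancestors)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def concept_frequencies(term_tf):
    """Step 1: map terms to concepts and sum their term-frequency weights.
    Terms with no concept in the ontology are dropped."""
    cf = defaultdict(float)
    for term, tf in term_tf.items():
        concept = TERM_TO_CONCEPT.get(term)
        if concept is not None:
            cf[concept] += tf
    return dict(cf)


def concept_weights(cf):
    """Step 2 (assumed combination rule): each concept keeps its own frequency
    weight plus the similarity-weighted frequency of the other concepts in the document."""
    return {
        c: cf[c] + sum(wu_palmer(c, o) * cf[o] for o in cf if o != c)
        for c in cf
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    doc_a = {"paddy": 2, "rice": 1, "greenfly": 1, "weather": 3}
    doc_b = {"wheat": 2, "grain": 1, "aphid": 1}
    cf_a, cf_b = concept_frequencies(doc_a), concept_frequencies(doc_b)
    print(cf_a)  # {'rice': 3.0, 'aphid': 1.0}
    print(cosine(doc_a, doc_b))  # 0.0: the raw term vectors share no term
    print(cosine(concept_weights(cf_a), concept_weights(cf_b)) > 0)  # True
```

In this sketch the dimensionality reduction comes from collapsing synonyms ("paddy" and "rice") onto one concept and dropping terms outside the ontology ("weather"). The similarity step lets related concepts reinforce each other. Wu-Palmer is only one common structure-based measure, and the paper's own similarity measure and weighting scheme may differ.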
Keywords: agricultural ontology; feature optimization; agricultural text clustering
Received: 14 July 2011   Accepted:
Fund: This research was supported by the National Natural Science Foundation of China (60774096) and the National High-Tech R&D Program of China (2008BAK49B05).

Corresponding Authors: WANG Ru-jing, Tel: +86-551-5592968, E-mail: rjwang@iim.ac.cn; CHEN Peng, E-mail: pchen@iim.ac.cn
About author:  SU Ya-ru, smomo@mail.ustc.edu.cn

Cite this article: 

SU Ya-ru, WANG Ru-jing, CHEN Peng, WEI Yuan-yuan, LI Chuan-xi. 2012. Agricultural Ontology Based Feature Optimization for Agricultural Text Clustering. Journal of Integrative Agriculture, 11(5): 752-759.

Related articles:
[1] XIAN Guo-jian, ZHAO Rui-xue. A Review and Prospects on Collaborative Ontology Editing Tools[J]. Journal of Integrative Agriculture, 2012, 11(5): 731-740.
[2] WEI Yuan-yuan, WANG Ru-jing, HU Yi-min, WANG Xue. From Web Resources to Agricultural Ontology: a Method for Semi-Automatic Construction[J]. Journal of Integrative Agriculture, 2012, 11(5): 775-783.
[3] LI Chuan-xi, SU Ya-ru, WANG Ru-jing, WEI Yuan-yuan, HUANG He. Structured AJAX Data Extraction Based on Agricultural Ontology[J]. Journal of Integrative Agriculture, 2012, 11(5): 784-791.