Journal of Integrative Agriculture, 2012, Vol. 11, Issue 5: 784-791. DOI: 10.1016/S1671-2927(00)8600

Structured AJAX Data Extraction Based on Agricultural Ontology

 LI Chuan-xi, SU Ya-ru, WANG Ru-jing, WEI Yuan-yuan, HUANG He   

  1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, P.R.China
  2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, P.R.China
  • Received: 2011-06-28 Online: 2012-05-01 Published: 2012-07-18
  • Corresponding author: WANG Ru-jing, Tel: +86-551-5592968, E-mail: rjwang@iim.ac.cn
  • About author: LI Chuan-xi, Mobile: 13515518016, E-mail: Ben04@mail.ustc.edu.cn
  • Supported by:

    This research was supported by the Knowledge Innovation Program of the Chinese Academy of Sciences and the National High-Tech R&D Program of China (2008BAK49B05).

Abstract: A growing number of web pages use AJAX (Asynchronous JavaScript and XML) because of its rich interactivity and incremental communication. AJAX content, which traditional crawlers cannot see, was observed to be generally well structured and to belong to a single specific domain. Extracting the structured data from AJAX content and annotating its semantics is therefore important for further applications. This paper proposes a structured AJAX data extraction method for the agricultural domain, based on an agricultural ontology. First, Crawljax, an open-source AJAX crawling tool, was extended to explore and retrieve AJAX content. Second, the retrieved content was partitioned into items, using HTML tags and punctuation to segment it into entity items, and the items were classified with the help of the agricultural ontology. Finally, the entity items were clustered and semantic annotations were assigned to the clustering results according to the agricultural ontology. Experimental evaluation showed the proposed approach to be effective in resource exploration, entity extraction, and semantic annotation.

Key words: information extraction, structured data, AJAX, agricultural ontology, semantic annotation
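
The segmentation and annotation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the function names `segment` and `annotate`, and the sample HTML fragment are all hypothetical, and a plain dictionary lookup stands in for the clustering and ontology-based annotation used in the paper.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology (concept -> known terms); not taken from the paper.
TOY_ONTOLOGY = {
    "Crop": {"rice", "wheat", "maize"},
    "Pest": {"aphid", "locust"},
}

TAG_RE = re.compile(r"<[^>]+>")


def segment(html):
    """Split an HTML fragment into candidate entity items at tags and punctuation."""
    text = TAG_RE.sub("\n", html)          # every tag becomes a boundary
    pieces = re.split(r"[\n,;|]+", text)   # punctuation is a boundary too
    return [p.strip() for p in pieces if p.strip()]


def annotate(items, ontology=TOY_ONTOLOGY):
    """Group items under the ontology concept whose term set contains them.

    Simplification: the paper clusters items first and then annotates the
    clusters; here each item is looked up directly in the toy ontology.
    """
    groups = defaultdict(list)
    for item in items:
        label = next(
            (concept for concept, terms in ontology.items() if item.lower() in terms),
            "Unknown",
        )
        groups[label].append(item)
    return dict(groups)


fragment = "<ul><li>Rice, Wheat</li><li>Aphid; Locust</li><li>Tractor</li></ul>"
items = segment(fragment)
print(items)
print(annotate(items))
```

Items that match no concept fall into an "Unknown" group here; a real system would need a similarity measure over the ontology rather than exact string matching.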
