Journal of Integrative Agriculture  2012, Vol. 11 Issue (5): 784-791    DOI: 10.1016/S1671-2927(00)8600
SECTION 2: Theory, Technology and Method
Structured AJAX Data Extraction Based on Agricultural Ontology
 LI Chuan-xi, SU Ya-ru, WANG Ru-jing, WEI Yuan-yuan, HUANG He
1.Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, P.R.China
2.School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, P.R.China
Abstract  Web pages increasingly adopt AJAX (Asynchronous JavaScript and XML) for its rich interactivity and incremental communication. We observed that AJAX content, which traditional crawlers cannot see, is generally well structured and belongs to a single specific domain. Extracting structured data from AJAX content and annotating its semantics are therefore important for further applications. This paper proposes a structured AJAX data extraction method for the agricultural domain based on an agricultural ontology. First, Crawljax, an open-source AJAX crawling tool, was extended to explore and retrieve AJAX content. Second, the retrieved content was partitioned into items, which were then classified with the help of the agricultural ontology; HTML tags and punctuation were used to segment the retrieved content into entity items. Finally, the entity items were clustered, and semantic annotations were assigned to the resulting clusters according to the agricultural ontology. Experimental evaluation showed that the proposed approach is effective in resource exploration, entity extraction, and semantic annotation.
Keywords:  information extraction      structured data      AJAX      agricultural ontology      semantic annotation  
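The segmentation and annotation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology `TOY_ONTOLOGY`, the separator set `SEPARATORS`, and the helper names `segment` and `annotate` are all hypothetical, the input is assumed to be an already retrieved English HTML fragment (the Crawljax crawling step is not shown), and grouping items by the ontology concept they match is a simple stand-in for the clustering step, whose algorithm the abstract does not specify.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical toy ontology: concept name -> known instance terms (lowercase).
TOY_ONTOLOGY = {
    "Crop": {"rice", "wheat", "maize"},
    "Pest": {"aphid", "locust"},
}

# Punctuation treated as item separators (an assumption for this sketch).
SEPARATORS = re.compile(r"[,;:|/]+")


class TextChunker(HTMLParser):
    """Collect text runs, treating every HTML tag as a chunk boundary."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def segment(html):
    """Split an HTML fragment into candidate entity items.

    Tags delimit text chunks; punctuation then splits each chunk further.
    """
    parser = TextChunker()
    parser.feed(html)
    parser.close()
    items = []
    for chunk in parser.chunks:
        for piece in SEPARATORS.split(chunk):
            piece = piece.strip()
            if piece:
                items.append(piece)
    return items


def annotate(items, ontology):
    """Group items under the first ontology concept whose terms contain them.

    Items that match no concept are collected under "Unknown".
    """
    groups = defaultdict(list)
    for item in items:
        label = next(
            (concept for concept, terms in ontology.items()
             if item.lower() in terms),
            "Unknown",
        )
        groups[label].append(item)
    return dict(groups)


fragment = "<ul><li>Rice, Wheat</li><li>Aphid; Locust</li><li>Tractor</li></ul>"
items = segment(fragment)
annotated = annotate(items, TOY_ONTOLOGY)
print(items)
print(annotated)
```

Exact dictionary lookup keeps the sketch short. A realistic system would need fuzzy or hierarchical matching against the ontology and, for Chinese pages, a word segmentation step before matching.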
Received: 28 June 2011   Accepted:
Fund: This research was supported by the Knowledge Innovation Program of the Chinese Academy of Sciences and the National High-Tech R&D Program of China (2008BAK49B05).

Corresponding Authors:  WANG Ru-jing, Tel: +86-551-5592968, E-mail: rjwang@iim.ac.cn
About author:  LI Chuan-xi, Mobile: 13515518016, E-mail: Ben04@mail.ustc.edu.cn

Cite this article: 

LI Chuan-xi, SU Ya-ru, WANG Ru-jing, WEI Yuan-yuan, HUANG He. 2012. Structured AJAX Data Extraction Based on Agricultural Ontology. Journal of Integrative Agriculture, 11(5): 784-791.
