The Proposed Algorithm for Semi-Structured Data Integration: Case Study of Setiu Wetland Data Set


  • Mustafa Man, School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia.
  • Ily Amalina Ahmad Sabri, School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia.


Document Object Model, JSON, Semi-Structured Data, WEIDJ


Recent advances in web technology and computer science give the environmental community expanding resources for data collection and analysis. At the same time, researchers face challenges in designing analysis methods and workflows and in interacting with data sets. Data integration is one of the older research fields in the database area. It deals with three types of data: structured, semi-structured, and unstructured. Web pages are a form of semi-structured data. In this paper, we briefly introduce the problem of data extraction from web pages, focusing on images. We also discuss how images are extracted from a semi-structured into a structured format using WEIDJ (Wrapper for Extraction of Images using the Document Object Model (DOM) and JavaScript Object Notation (JSON) approach). An experiment was conducted on the same website using the two approaches, JSON and DOM, to compare their time performance.
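
To make the idea of moving images from a semi-structured page into a structured format concrete, the following is a minimal sketch, not the WEIDJ implementation itself. It assumes a small hypothetical HTML fragment (`SAMPLE_PAGE`, with made-up file names) and uses Python's standard-library `HTMLParser`, an event-based tag parser standing in for full DOM traversal, to collect every `<img>` element and serialize the result as JSON. The names `ImageCollector` and `extract_images` are illustrative only.

```python
import json
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the attributes of every <img> element in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep them as one record.
        if tag == "img":
            self.images.append(dict(attrs))


def extract_images(html):
    """Return one dict of attributes per <img> element, in page order."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images


# Hypothetical page fragment; the file names are placeholders.
SAMPLE_PAGE = """
<html><body>
  <h1>Setiu Wetland</h1>
  <p>Field photographs:</p>
  <img src="mangrove.jpg" alt="Mangrove">
  <div><img src="lagoon.png" alt="Lagoon" width="640"></div>
</body></html>
"""

records = extract_images(SAMPLE_PAGE)

# Structured output: a JSON array with one object per image.
as_json = json.dumps(records, indent=2)
print(as_json)
```

Each image becomes one JSON object keyed by attribute name, so the output can be loaded directly into a table or database. A timing comparison such as the one described above would wrap the extraction call in a timer for each approach.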


