International Journal of Advance Computational Engineering and Networking (IJACEN)
  Journal Paper

Paper Title
Unsupervised Approach For Semi-Structured Data Record Extraction From Multiple Pages Using Tag Tree Similarities

Abstract
In this paper we present a novel unsupervised approach for extracting data records from multiple similar web pages using tag tree similarities. Extraction proceeds in the following steps. We first identify the related web pages from the web source. Next, we construct the DOM tree for each related web page using an HTML parser. We then compare two or more web pages to eliminate unwanted regions such as headers, menu bars, navigation bars, and advertisements, and to find the region containing the data records, also referred to as the data region. Finally, we traverse the subtrees of the data region to extract individual data records and store them in a required format such as XML. The main contribution of this paper is a fully unsupervised algorithm for extracting both structured and semi-structured data records from multiple related web pages. The proposed system can extract valuable data records from many commercial web sources more precisely. Hence it can serve as a tool for integrating information from various commercial websites. This integrated information can then be used to provide value-added services such as comparative shopping, market intelligence, meta-querying, and search.

Keywords - Data Record Detection, Information Extraction, Semi-Structured Data, Wrapper Generation.
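The steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes two related pages, replaces the paper's tag tree similarity measure with exact signature matching, and uses only Python's standard `html.parser`. All names (`Node`, `TreeBuilder`, `find_data_region`, `extract_records`) and the sample pages are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree (a stand-in for a DOM tree)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def shape(node):
    """Tag structure only: equal shapes mean structurally identical subtrees."""
    return (node.tag, tuple(shape(c) for c in node.children))


def content(node):
    """Tag structure plus text: equal content means an identical region."""
    return (node.tag, node.text, tuple(content(c) for c in node.children))


def find_data_region(root_a, root_b):
    # Subtrees that appear unchanged on the other page are treated as
    # template regions (header, menu, navigation) and skipped.
    shared = {content(n) for n in walk(root_b)}
    best, best_count = None, 1
    for node in walk(root_a):
        if content(node) in shared:
            continue
        counts = Counter(shape(c) for c in node.children)
        if counts and counts.most_common(1)[0][1] > best_count:
            best, best_count = node, counts.most_common(1)[0][1]
    return best


def extract_records(region):
    # Each child sharing the most common shape is taken as one data record.
    common = Counter(shape(c) for c in region.children).most_common(1)[0][0]
    return [[n.text for n in walk(c) if n.text]
            for c in region.children if shape(c) == common]


NAV = "<div><a>Home</a><a>About</a></div>"
page_a = f"<html><body>{NAV}<ul><li><b>Pen</b><i>$1</i></li><li><b>Ink</b><i>$2</i></li></ul></body></html>"
page_b = f"<html><body>{NAV}<ul><li><b>Cup</b><i>$3</i></li><li><b>Mug</b><i>$4</i></li><li><b>Jar</b><i>$5</i></li></ul></body></html>"

region = find_data_region(parse(page_a), parse(page_b))
records = extract_records(region)
print(region.tag, records)  # ul [['Pen', '$1'], ['Ink', '$2']]
```

In this sketch the navigation block is discarded because it is identical on both pages, and the list is chosen as the data region because its children repeat the same tag structure. A real system would need an approximate similarity measure to handle records whose structure varies, which is the semi-structured case the abstract targets.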


Author - Aleem Ansari, Hemalata Vasistha

Published on 2015-10-13
   
   