International Journal of Advance Computational Engineering and Networking (IJACEN)

Paper Title:
Unsupervised Approach For Semi-Structured Data Record Extraction From Multiple Pages Using Tag Tree Similarities

Authors: Aleem Ansari, Hemalata Vasistha

Article Citation: Aleem Ansari, Hemalata Vasistha (2015), "Unsupervised Approach For Semi-Structured Data Record Extraction From Multiple Pages Using Tag Tree Similarities", International Journal of Advance Computational Engineering and Networking (IJACEN), Volume-3, Issue-10, pp. 60-64.

Abstract: In this paper we present a novel unsupervised approach for extracting data records from multiple similar web pages using tag tree similarities. Extracting data records from multiple web pages consists of the following steps. We first identify the related web pages from the web source. Next, we construct the DOM tree for each related web page using an HTML parser. We then compare two or more web pages to eliminate unwanted regions, such as headers, menu bars, navigation bars, and advertisements, and to find the region containing the data records, also referred to as the data region. Finally, we traverse the subtrees of the data region to extract the individual data records and store them in the required format, such as XML. The main contribution of this paper is a fully unsupervised algorithm for extracting both structured and semi-structured data records from multiple related web pages. Our proposed system can extract valuable data records from many commercial web sources more precisely. Hence, it can serve as a tool for integrating information from various commercial websites. This integrated information can then be used to provide value-added services such as comparative shopping, market intelligence, meta-querying, and search.

Keywords: Data Record Detection, Information Extraction, Semi-Structured Data, Wrapper Generation.
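The pipeline described in the abstract (parse related pages into tag trees, compare the pages to discard shared template regions, locate the data region, and emit records as XML) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It compares only two pages, and the following are all illustrative assumptions: signature-based template removal (a subtree that appears verbatim on both pages is treated as header, menu, or advertisement content), tag-tree similarity measured with `difflib.SequenceMatcher` over pre-order tag sequences, the 0.8 similarity threshold, and the hypothetical helper names (`TreeBuilder`, `find_data_region`, `extract_records`).

```python
# Hypothetical sketch, not the authors' algorithm: two-page record extraction
# using template removal and tag-tree similarity (all helper names are illustrative).
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags that are never closed

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    text: str = ""

class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tag tree from an HTML string."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            node = self.stack[-1]
            node.text = (node.text + " " + data.strip()).strip()

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def full_sig(n):
    """Tags plus text: equal across pages only for shared template content."""
    return (n.tag, n.text, tuple(full_sig(c) for c in n.children))

def all_sigs(n):
    return {full_sig(n)}.union(*(all_sigs(c) for c in n.children))

def prune_template(n, shared):
    """Drop subtrees that also appear verbatim on the other page."""
    n.children = [prune_template(c, shared) for c in n.children if full_sig(c) not in shared]
    return n

def tag_seq(n):
    """Pre-order tag sequence, ignoring text: the 'shape' of a subtree."""
    return [n.tag] + [t for c in n.children for t in tag_seq(c)]

def similar(a, b, threshold=0.8):  # 0.8 is an assumed threshold
    return SequenceMatcher(None, tag_seq(a), tag_seq(b)).ratio() >= threshold

def group_size(ref, kids):
    return sum(similar(k, ref) for k in kids)

def find_data_region(root):
    """Node whose children hold the largest group (at least 2) of similar subtrees."""
    best, best_count, stack = None, 1, [root]
    while stack:
        node = stack.pop()
        if node.children:
            count = max(group_size(ref, node.children) for ref in node.children)
            if count > best_count:
                best, best_count = node, count
        stack.extend(node.children)
    return best

def texts(n):
    return ([n.text] if n.text else []) + [t for c in n.children for t in texts(c)]

def extract_records(page_a, page_b):
    """Extract the records of page_a, using page_b only to identify the template."""
    tree_a, tree_b = parse(page_a), parse(page_b)
    shared = all_sigs(tree_a) & all_sigs(tree_b)
    region = find_data_region(prune_template(tree_a, shared))
    out = ET.Element("records")
    if region is None:
        return out
    ref = max(region.children, key=lambda r: group_size(r, region.children))
    for rec in region.children:
        if similar(rec, ref):
            record = ET.SubElement(out, "record")
            for value in texts(rec):
                ET.SubElement(record, "field").text = value
    return out
```

For two listing pages that share the same header and footer but show different `<li>` items, `ET.tostring(extract_records(page_a, page_b), encoding="unicode")` yields one `<record>` per item of the first page, each holding its text fields. Comparing whole subtrees verbatim keeps the sketch short, but it also means a record that appears identically on both pages would be mistaken for template content. The full approach described in the paper may handle this differently.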

Type: Research paper

Published: Volume-3, Issue-10


DOIONLINE NO - IJACEN-IRAJ-DOIONLINE-3095

Copyright: © Institute of Research and Journals

Published on: 2015-10-13