International Journal of Advance Computational Engineering and Networking (IJACEN)

Paper Title
Modified N-Gram based Model for Identifying and Filtering Near-Duplicate Documents Detection

Abstract
Over the last three decades, the World Wide Web (WWW) has expanded exponentially, and a large share of its content is duplicate or near-duplicate. Documents served on the web come in different formats, such as PDF, HTML, Excel, and plain text. Our proposed solution is built on a publicly available dataset of files, in which some files are tagged as duplicates. The work in this paper addresses duplicate and near-duplicate document detection using an n-gram based approach with a low-dimensional representation (LSI-SVD), implemented in C#.NET.

Keywords - Duplicate document, N-gram, SVD (Singular Value Decomposition), LSI (Latent Semantic Indexing), Cosine similarity
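
The n-gram and cosine-similarity steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' C#.NET implementation: it is written in Python, omits the LSI-SVD dimensionality reduction entirely, and compares raw word n-gram count vectors instead. The function names (`word_ngrams`, `cosine_similarity`, `is_near_duplicate`), the tokenization rule, the n-gram size of 3, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=3):
    """Return a Counter of word n-grams (as tuples) found in the text.

    Tokenization (lowercase, alphanumeric runs only) is an assumption,
    not something specified in the abstract.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so unshared n-grams add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document too short to yield any n-gram matches nothing
    return dot / (norm_a * norm_b)


def is_near_duplicate(doc_a, doc_b, n=3, threshold=0.8):
    """Flag two documents as near-duplicates (threshold is hypothetical)."""
    score = cosine_similarity(word_ngrams(doc_a, n), word_ngrams(doc_b, n))
    return score >= threshold
```

Two documents that differ by a single appended word share most of their trigrams and would score close to 1, while unrelated documents share none and score 0. In an LSI-based pipeline, the same cosine comparison would instead be applied to document vectors after projecting them into a lower-dimensional space with a truncated SVD.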


Author - Farheen Naaz, Farheen Siddiqui

Published on 2017-12-30
   
   