Vol. 3 No. 2 (2019)

Analysis of Progressive Duplicate Data Detection

Veeramuthu P
Department of Computer Science, Besant Theosophical College, Madanapalle, Andhra Pradesh, India
Published December 30, 2019
Keywords
  • Sorted neighborhood method (SNM), progressive blocking (PB), progressive sorted neighborhood method (PSNM), data mining
How to Cite
P, V. (2019). Analysis of Progressive Duplicate Data Detection. Journal of Computational Mathematica, 3(2), 41-50. https://doi.org/10.26524/cm53


Duplicate detection is the process of identifying multiple representations of the same real-world entity. Today, duplicate detection methods must process ever larger datasets in ever shorter time, which makes maintaining the quality of a dataset increasingly difficult. We present two novel progressive duplicate detection algorithms that significantly increase the efficiency of finding duplicates when execution time is limited. They maximize the gain of the overall process within the time available by reporting most results much earlier than traditional approaches. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
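
The progressive idea can be illustrated with a minimal sketch in the spirit of the progressive sorted neighborhood method (PSNM). It is a simplified illustration, not the algorithm evaluated in the article: the function names, the string-similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, and the sorting key are all assumptions chosen for the example. Records are sorted once, and pairs are then compared in order of increasing rank distance, so the closest neighbors in the sort order, which are the most likely duplicates, are reported first.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterator, Sequence, Tuple


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity test: case-insensitive ratio against a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def progressive_snm(
    records: Sequence[str],
    key: Callable[[str], str] = str.lower,
    max_distance: int = 3,
    is_duplicate: Callable[[str, str], bool] = similar,
) -> Iterator[Tuple[int, int]]:
    """Yield index pairs of likely duplicates, nearest sorted neighbors first.

    Simplified sketch: the whole dataset is assumed to fit in memory, and
    pairs are compared at rank distance 1, then 2, up to max_distance.
    """
    # Sort record indices by the (assumed) sorting key.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    for distance in range(1, max_distance + 1):
        for pos in range(len(order) - distance):
            i, j = order[pos], order[pos + distance]
            if is_duplicate(records[i], records[j]):
                # Report the pair immediately instead of at the end of the run.
                yield (min(i, j), max(i, j))


if __name__ == "__main__":
    names = ["John Smith", "Jon Smith", "Alice Jones", "Alice Jonas", "Bob Brown"]
    for pair in progressive_snm(names, max_distance=2):
        print(pair, names[pair[0]], "|", names[pair[1]])
```

Because the function is a generator, a caller with a limited time budget can stop consuming results at any point and still keep every duplicate found so far, which is the behavior the abstract describes as reporting most results early.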

