IJRICIT-01-002 ENHANCED REPLICA DETECTION IN SHORT TIME FOR LARGE DATA SETS
Checking whether records refer to the same real-world entity, known as Data Replica Detection, is an essential task today. For large data sets, time is a critical factor: replicas must be detected quickly without compromising the quality of the dataset. In this paper we introduce two Data Replica Detection algorithms that find replicas within a limited execution time and improve on the running time of conventional techniques. The first is the progressive sorted neighborhood method (PSNM), which performs best on small and almost clean datasets. The second is progressive blocking (PB), which performs best on large and very dirty datasets. Both enhance the efficiency of duplicate detection even on very large datasets.
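The abstract names PSNM and PB but does not spell out their steps. The following is a minimal Python sketch of the general ideas behind both methods, assuming the usual formulation from the progressive duplicate detection literature. PSNM sorts the records by a key and compares neighbours at rank distance 1 first, then distance 2, and so on up to the window size. PB sorts the records, cuts them into fixed-size blocks, compares records inside each block, and then compares neighbouring blocks. The function names, the sorting key `key3`, the similarity test `is_dup` with its 0.85 threshold, and the sample records are all hypothetical and are not taken from this paper. The PB variant is also simplified: it only extends comparisons to directly adjacent blocks, ranked by how many duplicates those blocks already produced.

```python
import difflib
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Pair = Tuple[int, int]
KeyFn = Callable[[str], str]
DupFn = Callable[[str, str], bool]


def sorted_order(records: Sequence[str], key: KeyFn) -> List[int]:
    """Record indices ordered by the (hypothetical) sorting key."""
    return sorted(range(len(records)), key=lambda i: key(records[i]))


def psnm(records: Sequence[str], key: KeyFn, is_dup: DupFn, window: int) -> Iterator[Pair]:
    """Progressive sorted neighborhood (sketch): compare all records at rank
    distance 1 in the sorted order, then distance 2, ..., up to window - 1,
    so the closest neighbours (the most likely duplicates) are checked first."""
    order = sorted_order(records, key)
    for dist in range(1, window):
        for pos in range(len(order) - dist):
            a, b = sorted((order[pos], order[pos + dist]))
            if is_dup(records[a], records[b]):
                yield (a, b)


def progressive_blocking(records: Sequence[str], key: KeyFn, is_dup: DupFn,
                         block_size: int) -> Iterator[Pair]:
    """Simplified progressive blocking (sketch): compare inside each block,
    then compare adjacent blocks, most productive block pairs first."""
    order = sorted_order(records, key)
    blocks = [order[i:i + block_size] for i in range(0, len(order), block_size)]
    found_in: Dict[int, int] = {}
    # Phase 1: every pair of records inside the same block.
    for b, members in enumerate(blocks):
        found_in[b] = 0
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                a, c = sorted((members[i], members[j]))
                if is_dup(records[a], records[c]):
                    found_in[b] += 1
                    yield (a, c)
    # Phase 2: adjacent block pairs (b, b + 1), ranked by duplicates found so far.
    ranked = sorted(range(len(blocks) - 1),
                    key=lambda b: found_in[b] + found_in[b + 1], reverse=True)
    for b in ranked:
        for x in blocks[b]:
            for y in blocks[b + 1]:
                a, c = sorted((x, y))
                if is_dup(records[a], records[c]):
                    yield (a, c)


def is_dup(x: str, y: str) -> bool:
    # Hypothetical similarity test: difflib ratio above an arbitrary 0.85 threshold.
    return difflib.SequenceMatcher(None, x, y).ratio() > 0.85


def key3(r: str) -> str:
    # Hypothetical sorting key: the first three characters of the record.
    return r[:3]


records = ["jon smith", "john smith", "jane doe", "jane do", "bob lee"]
print(list(psnm(records, key3, is_dup, window=2)))           # [(2, 3), (0, 1)]
print(list(progressive_blocking(records, key3, is_dup, 2)))  # [(2, 3), (0, 1)]
```

Both functions are generators, so a caller with a fixed time budget can stop early, for example with `itertools.islice(psnm(...), k)`, and still receive the most promising pairs first. This is one way to read the abstract's goal of finding replicas within a limited execution time.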
International Journal of Research and Innovation on Science, Engineering and Technology (IJRISET)
AUTHORS
Pathan Firoze Khan,
Research Scholar,
Department of Computer Science and Engineering,
Chintalapudi Engineering College, Guntur, AP, India.
K Raj Kiran,
Assistant Professor,
Department of Computer Science and Engineering,
Chintalapudi Engineering College, Guntur, AP, India.