04.07.2016 Views

IJRICIT-01-002 ENHANCED REPLICA DETECTION IN SHORT TIME FOR LARGE DATA SETS

Checking whether records refer to the same real-world entity, known as Data Replica Detection (duplicate detection), is a necessary task today. Execution time is a critical factor when detecting replicas in large datasets, and it must be reduced without affecting the quality of the dataset. We propose two Data Replica Detection algorithms that find duplicates within a limited execution time and do so faster than conventional techniques: the progressive sorted neighborhood method (PSNM), which performs best on small and almost clean datasets, and progressive blocking (PB), which performs best on large and very dirty datasets. Both improve the efficiency of duplicate detection even on very large datasets.
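The abstract names PSNM but does not describe how it works. The following is a minimal Python sketch of the general idea behind a progressive sorted neighborhood pass, under stated assumptions: records are sorted by a key, and pairs are then compared in order of increasing rank distance, so the most promising candidates are reported first. The function names (`similarity`, `progressive_snm`), the parameters (`sort_key`, `max_window`, `threshold`), the `difflib`-based similarity measure, and the sample records are all illustrative assumptions, not the authors' implementation.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterator, List, Tuple


def similarity(a: str, b: str) -> float:
    """Illustrative string similarity in [0, 1]; an assumption, not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def progressive_snm(
    records: List[str],
    sort_key: Callable[[str], str],
    max_window: int,
    threshold: float,
) -> Iterator[Tuple[str, str]]:
    """Yield likely duplicate pairs, nearest sort-order neighbours first."""
    ordered = sorted(records, key=sort_key)
    # Rank distance 1 compares direct neighbours, distance 2 the next-nearest,
    # and so on up to max_window - 1, so early results come from close neighbours.
    for distance in range(1, max_window):
        for i in range(len(ordered) - distance):
            left, right = ordered[i], ordered[i + distance]
            if similarity(left, right) >= threshold:
                yield left, right


# Hypothetical sample data for demonstration only.
records = ["John Smith", "Mary Jones", "Jon Smith", "Alan Turing", "Marie Jones"]
pairs = list(progressive_snm(records, sort_key=str.lower, max_window=3, threshold=0.8))
print(pairs)
```

Because the function is a generator, a caller can stop consuming results once a time budget is exhausted and still keep the pairs found so far, which is the property that makes a progressive approach useful under limited execution time.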



International Journal of Research and Innovation on Science, Engineering and Technology (IJRISET)

AUTHORS

Pathan Firoze Khan,
Research Scholar,
Department of Computer Science and Engineering,
Chintalapudi Engineering College, Guntur, AP, India.

K Raj Kiran,
Assistant Professor,
Department of Computer Science and Engineering,
Chintalapudi Engineering College, Guntur, AP, India.

