30.12.2014 Views

PDF (1MB) - QUT ePrints

PDF (1MB) - QUT ePrints

PDF (1MB) - QUT ePrints

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Table II: Features of XML Data Clustering Approaches<br />

ACM Computing Surveys, Vol. , No. , 2009.<br />

Prototypes<br />

❵❵❵❵❵❵❵❵❵❵❵❵<br />

Criteria<br />

XClust XMine PCXSS SeqXClust XCLS XEdge PSim XProj SemXClust S-GRACE SumXClust VectXClust DFTXClust<br />

Clustered DTD DTD& XSD XSD XML XML XML XML XML XML XML XML XML<br />

data type schema XSD schema schema schema documents documents documents documents documents documents documents documents documents<br />

Data rooted unordered rooted ordered rooted ordered rooted ordered rooted ordered rooted ordered rooted rooted ordered rooted directed rooted ordered binary time<br />

representation labeled tree labeled tree labeled tree labeled tree labeled tree labeled tree labeled tree labeled tree labeled tree graph labeled tree branch vector series<br />

Exploited elements, attributes, elements, attributes,<br />

elements, attributes, level information<br />

subtrees whole XML tree whole XML tree whole XML whole XML<br />

node paths<br />

edge information paths substructures<br />

objects paths paths node context elements (tree tuple) (s-graph) (structural summaries) tree tree<br />

name, linguistic, name, name,<br />

Schema matching cardinality, hierarchical structure, data type, data type,<br />

— — — — — — — — —<br />

path context path similarity constraint node context<br />

Similarity<br />

level edge path frequent content & distance structural<br />

approach Tree matching — — — —<br />

similarity similarity similarity substructure structural similarity metric distance<br />

— —<br />

Vector<br />

binary branch DFT<br />

— — — — — — — — — — —<br />

based distance distance<br />

Similarity DTD similarity schema similarity<br />

schema similarity<br />

edge similarity path similarity<br />

distance fully connected<br />

—<br />

—<br />

— —<br />

—<br />

representation matrix matrix matrix matrix matrix array graph<br />

—<br />

Clustering<br />

hierarchical<br />

constrained<br />

graph partitioning partition-base partitional hierarchical single link<br />

hierarchical<br />

incremental<br />

incremental<br />

K-means<br />

algorithm agglomerative hierarchical agglomerative greedy algorithm approach XTrK-means algorithm ROCK approach hierarchical<br />

— —<br />

Cluster hierarchy of hierarchy of<br />

hierarchy of<br />

representation schema classes schema classes K-clusters<br />

schema classes<br />

K-clusters K-clusters K-clusters K-clusters K-clusters K-clusters K-clusters — —<br />

(dendrogram) (dendrogram) (dendrogram)<br />

quality of inter & intra-cluster inter & intra-cluster inter & intra-cluster external& internal measures<br />

Precision<br />

closeness<br />

precision<br />

Quality criteria<br />

integrated<br />

purity, entropy, FScore FScore —<br />

inter & intra-cluster standard deviation<br />

— —<br />

FScore FScore FScore<br />

recall<br />

recall<br />

schema inter & intra cluster outlier ratio<br />

Efficiency<br />

response<br />

time& space<br />

page I/O response<br />

response response response<br />

— —<br />

—<br />

—<br />

—<br />

evaluation time complexity time time time time<br />

—<br />

Application XML data general XML general XML general XML general XML general XML XML storage XML data XML data XML query hierarchical structural Query Storage & retrieval<br />

area integration management management management management management for query processing mining mining processing management processing of large documents<br />

XML Data Clustering: An Overview · 37

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!