Download - Academy Publisher


ISBN 978-952-5726-09-1 (Print)

Proceedings of the Second International Symposium on Networking and Network Security (ISNNS ’10)

Jinggangshan, P. R. China, 2-4 April 2010, pp. 054-057

Research for the Algorithm of Query to Compressed XML Data

Guojia Yu 1, Huizhong Qiu 2, and Lin Tian 3

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Email: yuguojia@foxmail.com

2 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Email: hzqiu@uestc.edu.cn, ruan052@126.com

Abstract—XML is increasingly becoming the standard common format for transmitting and distributing Internet and enterprise data. Efficient algorithms for compressing and querying XML data can directly reduce the cost of storing data and shorten query response time, so research in this area is promising. This article proposes an equivalence relation based on the characteristics of XML, proves the rationality of the resulting index and the feasibility of a query algorithm on it, and then puts forward a new query algorithm on the compressed index. Finally, the method is compared experimentally with XGrind, which supports queries on partially decompressed compressed XML data. On several data sets, queries on the compressed index were significantly more efficient than XGrind's.

Index Terms—XML data; compressed index; query; algorithm

I. INTRODUCTION

This paper builds a compressed index over XML data and queries that index efficiently. The work has three main parts.

First, the tags and attribute names of the XML data are encoded with a dictionary, and Huffman coding [1] is used to compress the element values and attribute values.

Second, the generic SAX events are expanded into a new kind of event, and the original XML tree structure is compressed to build a new compressed index, which greatly reduces the data redundancy inherent in the structured data itself.

Finally, data is queried efficiently on the compressed index.
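To make the first step more concrete, the following is a minimal sketch of character-level Huffman coding applied to a single element value. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_huffman_codes`, `encode` and `decode` are hypothetical, and the paper does not say whether codes are built per character, per value, or per document.

```python
import heapq
from collections import Counter


def build_huffman_codes(text):
    """Return a dict mapping each character of `text` to a Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the bit strings of all characters in `text`."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Invert `encode`; works because Huffman codes are prefix-free."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


value = "abracadabra"  # stands in for one element or attribute value
codes = build_huffman_codes(value)
bits = encode(value, codes)
print(len(bits), "bits instead of", 8 * len(value))
```

Frequent characters receive shorter codes, which is why element and attribute values with skewed character distributions compress well.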

II. BUILD THE COMPRESSED INDEX

A. Pre-Compression

The first step uses a pre-compression dictionary to encode the non-content nodes of the XML data: scan the DTD or Schema of the XML document to be compressed, store the tag names and attribute names in two dictionaries, and then replace those tag names and attribute names with their dictionary values. After that, the compressed index is built. Before doing so, this article introduces some terminology:
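The dictionary step described above can be sketched as follows. This is a minimal illustration rather than the authors' code, and it makes two assumptions: the names are collected by scanning the document itself instead of its DTD or Schema, and the dictionary values are small integers assigned in order of first appearance. The helper names `build_dictionaries` and `encode_names` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dictionaries(root):
    """Assign an integer code to every distinct tag name and attribute name."""
    tag_dict, attr_dict = {}, {}
    for elem in root.iter():  # document order, root included
        tag_dict.setdefault(elem.tag, len(tag_dict))
        for name in elem.attrib:
            attr_dict.setdefault(name, len(attr_dict))
    return tag_dict, attr_dict


def encode_names(elem, tag_dict, attr_dict):
    """Return a nested tuple (tag_code, {attr_code: value}, text, children)."""
    attrs = {attr_dict[name]: value for name, value in elem.attrib.items()}
    children = [encode_names(child, tag_dict, attr_dict) for child in elem]
    return (tag_dict[elem.tag], attrs, (elem.text or "").strip(), children)


SAMPLE = """<library>
  <book id="1"><title>XML</title></book>
  <book id="2"><title>SAX</title></book>
</library>"""

root = ET.fromstring(SAMPLE)
tag_dict, attr_dict = build_dictionaries(root)
encoded = encode_names(root, tag_dict, attr_dict)
print(tag_dict)   # tag names mapped to codes
print(attr_dict)  # attribute names mapped to codes
```

Keeping two separate dictionaries, as the text describes, lets tag codes and attribute codes reuse the same small integer range.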

1 Same name item: tags with the same name under the same parent node compose a same name item.

2 Different chain: the first elements of all the same name items under the same parent node compose the different chain.

3 Repetition rate of XML data: (the number of label elements of the XML - the number of same name items) / the number of label elements of the XML.

4 Judgement event: two adjacent events (that is, generic events) of SAX are combined, expanding the API of the event-driven SAX parser with a judgement event.
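The first three terms can be illustrated with a short sketch. It is one reading of the definitions, not the authors' code: the root element is assumed to form a same name item of its own, since the text does not say how the root is counted, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def same_name_items(root):
    """Group elements by (parent, tag); each group is one same name item."""
    items = [[root]]  # assumption: the root counts as an item of its own
    for parent in root.iter():
        groups = {}
        for child in parent:
            groups.setdefault(child.tag, []).append(child)
        items.extend(groups.values())
    return items


def different_chain(parent):
    """First element of every same name item under `parent`, in document order."""
    seen, chain = set(), []
    for child in parent:
        if child.tag not in seen:
            seen.add(child.tag)
            chain.append(child)
    return chain


def repetition_rate(root):
    """(number of elements - number of same name items) / number of elements."""
    total = sum(1 for _ in root.iter())
    return (total - len(same_name_items(root))) / total


SAMPLE = (
    "<library>"
    "<book><title/><author/><author/></book>"
    "<book><title/></book>"
    "<journal/>"
    "</library>"
)

root = ET.fromstring(SAMPLE)
print([e.tag for e in different_chain(root)])  # ['book', 'journal']
print(repetition_rate(root))                   # 0.25
```

In this sample there are 8 elements and 6 same name items, so the repetition rate is (8 - 6) / 8 = 0.25. A higher rate means more siblings share a name, which is the redundancy the compressed index aims to remove.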

B. TP Equivalence Relations

Inspired by the indexes of APEX [2], Fabric [3], XQueC [4], XBZip [5] and other methods, this article converts the tree structure of XML into another index that can guarantee support for efficient queries. To that end it introduces a TP equivalence relation (tree to tree-graph), under which the two structures are interchangeable, that is, isomorphic, as follows.

Figure 1. XML document tree structure<br />

[Definition 1] TP equivalence relation: given a tree $G(V, E)$, where $V$ is the set of nodes in $G$ and $E$ is the set of edges in $G$, converting $G$ yields another form, the tree-graph $G'(V', E')$. Based on $G$ and $G'$, we can define a binary relation $R$.

If $R$ satisfies the following conditions, then $R$ is a TP equivalence relation of $G$ and $G'$:

1) Any node $u$ in $G$ has a unique corresponding node $u'$ in $G'$.

2) If there is a node in $G$ whose child pointer $p$ points to $p_1$, its first child on the left, so that they have the relationship $p \xrightarrow{\alpha} p_1$, then in $G'$ there must also exist a corresponding element $q$, whose child pointer points to

© 2010 ACADEMY PUBLISHER<br />

AP-PROC-CS-10CN006<br />

54
