Download - Academy Publisher
Download - Academy Publisher
Download - Academy Publisher
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
ISBN 978-952-5726-09-1 (Print)<br />
Proceedings of the Second International Symposium on Networking and Network Security (ISNNS ’10)<br />
Jinggangshan, P. R. China, 2-4, April. 2010, pp. 054-057<br />
Research for the Algorithm of Query to<br />
Compressed XML Data<br />
Guojia Yu 1 , Huizhong Qiu 2 , and Lin Tian 3<br />
1<br />
Scholl of Computer Science and Engineerin<br />
University of Electronic Science and Technology of China, ChengDU, China<br />
Email:yuguojia@foxmail.com<br />
2<br />
Scholl of Computer Science and Engineering<br />
University of Electronic Science and Technology of China, ChengDU, China<br />
Email: hzqiu@ uestc.edu.cn, ruan052@126.com<br />
Abstract—Because XML data is increasingly becoming the<br />
standard of transmission and distribution of Internet and<br />
enterprise's data in a common format. Efficient algorithms<br />
of compression and query in XML data can directly reduce<br />
the cost of storage of data and shorten response time of<br />
query. Studying in this aspect is widely promising. This<br />
article proposed an equivalence relation on the basis of<br />
characters of XML, and proved the rationality of the index<br />
and the feasibility of query algorithm on this method, then<br />
put forward a new query algorithm on the compressed<br />
index. Finally, compared with XGrind that supports query<br />
on the partial decompression of compressed XML data in<br />
experiment. The efficiency of query on the compressed<br />
index was significantly higher than Xgrind's in several sets<br />
of data .<br />
Index Terms—XML date; compressed index; query;<br />
algorithm<br />
I. INTRODUCTION<br />
In this paper,build XML compressed index, and query<br />
efficiently on this index. Complete it in three parts in<br />
main:<br />
First, code XML data of tags and attribute names with<br />
dictionary, then use Huffman[1] coding to compress the<br />
element values and attribute values.<br />
Secondly, expand SAX generic events into another<br />
events. And compress the original XML tree structure to<br />
build a new compressed index, reduce greatly data<br />
redundancy by the structured data itself.<br />
Finally, query efficiently some data on the compressed<br />
index.<br />
II. BUILD THE COMPRESSED INDEX<br />
A. Pre-Compression<br />
The first step: use the dictionary of pre-compression to<br />
encode XML data in the non-content nodes, scan the<br />
DTD or Schema whose XML document would be<br />
compressed, store the labels’ name and attributes’ name<br />
into two dictionaries, and then the values of the<br />
dictionary instead of these labels’ name and attributes’<br />
name. After that,build the compressed indexed. In the<br />
previous,article shows some concepts of terminology:<br />
1 The same name item: the same name of tags basing<br />
on the same parent node compose the same name item.<br />
2 Different chain: the first element in all of the same<br />
items based on the same parent node composed of the<br />
different chain.<br />
3Repetition rate of XML data: (the number of lable<br />
elements of XML ― the number of the same name item)<br />
/ the number of lable elements of XML.<br />
4 The judgement event: combine two adjacent events<br />
(that’s geniric events)of SAX,then expand API of the<br />
event-driven SAX parser to a judgement event<br />
B. TP Equivalence Relations<br />
Illuminationed by the indexs of APEX[2], Fabric[3],<br />
XQueC[4], XBZip[5] and other methods, this article will<br />
convert the tree structure of XML to another index who<br />
could guarantee to support efficient query. So introduce a<br />
TP equivalence relations (tree to tree-graph)with two<br />
structures can be interchangeable, that is isomorphic, as<br />
follows.<br />
Figure 1. XML document tree structure<br />
[Definition 1] TP equivalence relation: given a tree G:<br />
G ( V , E)<br />
, where V is the set of nodes in G, E is the set of<br />
edges in G. Convert G, you can get another form of treegraph<br />
G '(<br />
V ', E'<br />
) . Based on G and G',we can define a<br />
binary relation R.<br />
If R satisfies the following conditions, that R is a TP<br />
equivalence relations of G and G':<br />
1) any node u in G has exactly the same and unique<br />
corresponding node u' in G'.<br />
2) If there is a node in G, whose child pointer p point<br />
to the p 1<br />
-the first child on the left, that they have a<br />
p→<br />
α p<br />
relationship 1, then, in G' there must also exist a<br />
corresponding element of q, which child pointer point to<br />
© 2010 ACADEMY PUBLISHER<br />
AP-PROC-CS-10CN006<br />
54