
Homomorphism Resolving of XPath Trees Based on Automata*



Ming Fu and Yu Zhang 1,2

1 Department of Computer Science & Technology, University of Science & Technology of China, Hefei, 230027, China
2 Laboratory of Computer Science, Chinese Academy of Sciences, Beijing, 100080, China
brightfu@ustc.edu, yuzhang@ustc.edu.cn

Abstract. As a query language for navigating XML trees and selecting a set of element nodes, XPath is ubiquitous in XML applications. One important issue of XPath queries is containment checking, which is known to be co-NP complete. The homomorphism relationship between two XPath trees, which is a PTIME problem, is a sufficient but not necessary condition for the containment relationship.
We propose a new tree structure that depicts an XPath expression based on the levels of its tree nodes, and we share the prefixes of multiple trees to incrementally construct an efficient automaton, named XTHC (XPath Trees Homomorphism Checker). XTHC takes an XPath tree as input and reports, for every tree in the set of trees it was built from, whether a homomorphism exists between that tree and the input tree. The input tree is transformed into events that drive the automaton. Moreover, we narrow the discrepancy between the homomorphism relationship and the containment relationship as far as we can.

Keywords: XPath tree, containment, homomorphism, automata.

1 Introduction

XML has become the standard for exchanging a wide variety of data on the Web and elsewhere. An XML document is essentially a directed labeled tree.
XPath [1] is a simple and popular query language for navigating XML trees and extracting information from them. An XPath expression p is said to contain another XPath expression q, denoted by q ⊆ p, if and only if for any XML document D, the result set returned by querying p on D contains the result set of q. Containment checking is one of the most important issues in XPath queries. Query containment is crucial in many contexts, such as query optimization and reformulation, information integration, integrity checking, etc. However, [2] shows that containment in the fragment XP{[ ],*,//} is co-NP complete. The authors proposed a complete algorithm for containment, whose complexity is EXPTIME. The authors also proposed a sound but incomplete PTIME

* This work is supported by the National Natural Science Foundation of China under Grant No. 60673126, and the Foundation of Laboratory of Computer Science, Chinese Academy of Science under Grant No. SYSKF0502.

G. Dong et al. (Eds.): APWeb/WAIM 2007, LNCS 4505, pp. 821–828, 2007. © Springer-Verlag Berlin Heidelberg 2007


where a and b are the minimum and maximum numbers of levels between v and v' respectively. The relationship between nodes in tree T is given as:

1) If n is a root node test, i.e. /n or //n, there exists an edge in T between the node v in T that corresponds to n and the root r, edge(r, v), where r is the parent node of v. For /n, L(v) = [1, 1]; for //n, L(v) = [1, ∞].
2) If n is not a root node test, there is an adjacent node test n' in q that satisfies n'/n, n'[n], n'//n or n'[.//n]; therefore, there exists an edge in T between v and v' (corresponding to n and n' respectively), where v' is the parent node of v. For n'/n or n'[n], L(v) = [1, 1]; for n'//n or n'[.//n], L(v) = [1, ∞].

Definition 2: Given an XPath tree T, let NODES(T) be the set of nodes in T, EDGES(T) be the set of edges in T, and ROOT(T) be the root node of T.
If v ∈ NODES(T) and the outdegree of v is greater than 1, or the outdegree or the indegree of v is 0, then node v is called a key node of the XPath tree T. For every edge(x,y) ∈ EDGES(T), where x, y ∈ NODES(T), edge(x,y) implies that x is the parent node of y. If nid is the unique identifier of node y and ln is the label of node y, we denote node y by nid[a,b], where [a,b] equals L(y).

Informally, the key nodes of an XPath tree are its branching nodes (nodes with outdegree greater than 1), its leaves, and its root.

An XPath expression often contains wildcard location steps without predicates, which are represented as non-branching nodes '*', as in the expression /a/*//*/b. We can remove those wildcard nodes from the XPath tree for simplification, but must then revise the L(v) value of the non-wildcard node v that is the descendant of the removed wildcard node. Fig. 1(a) illustrates the two XPath trees of the expression /a/*//*/b before and after removing the non-branching wildcard nodes, where L(b) is revised.
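The wildcard removal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles predicate-free paths, the function names `level_intervals` and `kind` are hypothetical, and it assumes that the interval of a surviving node is obtained by adding up the minimum and maximum levels of the removed steps in front of it. A trailing wildcard is kept, since a leaf is a key node rather than a non-branching node.

```python
import math
import re

INF = math.inf

def level_intervals(path):
    """Return [(label, a, b)] for a predicate-free path such as /a/*//*/b.

    Each step contributes [1, 1] for the child axis '/' and [1, INF] for the
    descendant axis '//'. Non-branching wildcard steps are dropped and their
    levels are added to the next surviving node (assumed reading of the
    revision of L(v)).
    """
    steps = re.findall(r"(//|/)([^/]+)", path)
    nodes = []
    lo, hi = 0, 0
    for i, (axis, name) in enumerate(steps):
        lo += 1
        hi += 1 if axis == "/" else INF
        is_leaf = i == len(steps) - 1
        if name != "*" or is_leaf:       # keep labeled nodes and the leaf
            nodes.append((name, lo, hi))
            lo, hi = 0, 0
    return nodes

def kind(a, b):
    """Classify a node: a == b is a fixed node, b == INF an alterable one."""
    return "fixed" if a == b else "alterable"

if __name__ == "__main__":
    for label, a, b in level_intervals("/a/*//*/b"):
        print(label, [a, b], kind(a, b))
```

Under this reading, the two wildcard steps of /a/*//*/b disappear and node b carries the interval [3, ∞].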
In the following, all XPath trees are trees from which the non-branching wildcard nodes have been removed.

Fig. 1. (a) XPath Tree /a/*//*/b; (b) XPath Tree /a/*/b[.//*/c]//d

Definition 3: Given an XPath tree T, let CNODES(T) be the set of alterable nodes and FNODES(T) be the set of fixed nodes, with NODES(T) = {ROOT(T)} ∪ CNODES(T) ∪ FNODES(T). For every n ∈ NODES(T) with n ≠ ROOT(T), let L(n) = [a,b]. If a = b, then n ∈ FNODES(T); if b = ∞, then n ∈ CNODES(T). When CNODES(T) is not empty, the XPath tree T is an alterable tree; otherwise it is a fixed tree.

As an example, the XPath tree of the XPath expression /a/*/b[.//*/c]//d is shown in Fig. 1(b). The set of level relationships between node x2 and its parent node is L(x2) = [2,2]. By Definition 3, node x2 is a fixed node. The set of level relationships


between node x3 and its parent node is L(x3) = [2, ∞], and node x3 is an alterable node, so the corresponding XPath tree is an alterable tree.

Definition 4: A function h: NODES(p) → NODES(q) describes the homomorphism relationship from XPath tree p to XPath tree q if:
1) h(ROOT(p)) = ROOT(q);
2) for each x ∈ NODES(p), LABEL(x) = '*' or LABEL(x) = LABEL(h(x));
3) for each edge(x,y) ∈ EDGES(p), where x, y ∈ NODES(p), L(x,y) ⊇ L(h(x),h(y)).

Fig. 2 shows the homomorphism mapping h from XPath tree p to XPath tree q based on the XPath expressions /a/*//b and /a[c]//*/*//b.

Fig. 2. Homomorphism mapping h: p → q

3 Homomorphism Resolution Based on XTHC Machine

3.1 Construction of Basic XTHC Machine

We incrementally construct an NFA with prefix-sharing on the set of XPath trees P = {p1, p2, …, pn}. Each node nid[a,b] in an XPath tree is mapped to an automata fragment in the NFA, and such a fragment has a unique start state and a unique end state. There are two cases when constructing the fragment from the node nid[a,b]:

1.
When a = b, nid[a,b] is a fixed node, and the constructed automata fragment is shown in Fig. 3(a). The states s-1 and s+a-1 are the start and end states of the fragment, respectively. Since a represents the minimum number of levels between node nid[a,b] and its parent node, starting from state s-1 we construct in turn a-1 states along arcs labeled '*', which are called extended states; we then construct state s+a-1 along the arc labeled ln from state s+a-2. Extended states therefore exist in the automata fragment based on nid[a,b] exactly when a > 1.

Fig. 3. (a) The automata fragment corresponding to the fixed node nid[a,a]; (b) The automata fragment corresponding to the alterable node nid[a,∞]

2. When b = ∞, nid[a,b] is an alterable node, and many kinds of automata fragments can be constructed; one example is shown in Fig. 3(b). As in case 1, we first construct a-1 extended states and the end state s+a-1, starting from state s-1. Since b = ∞, it is necessary to add a self-looping arc, labeled '*', to one or more of the states among state s-1 and the following a-1 extended states. The chain consisting of the start state and the extended states is called the extended state-chain. Fig. 3(b) shows only one self-looping arc, at the last state of the extended state-chain. Obviously,


an automata fragment corresponding to an alterable node nid[a,b] (a > 1) in an XPath tree p is optimal if and only if exactly one state in the fragment has a self-looping arc, and this state is the last state along the extended state-chain.

Definition 5: Suppose the NFA constructed from the set P of XPath trees is A, called the XTHC machine. We create the following two index tables for each state s in A:
1) LP(s): list of leaf nodes. For every p ∈ P and each leaf node n_l in p, if s is the last state constructed from n_l, then n_l ∈ LP(s). LP(s) is non-empty only when s is a leaf state.
2) LB(s): list of branching nodes. For every p ∈ P and each branching node n_b in p, if s is the last state constructed from n_b, then n_b ∈ LB(s). LB(s) is non-empty only when s is a branching state.

Fig. 4(b) is the XTHC machine constructed from the XPath trees p1, p2 and p3 shown in Fig. 4(a). Here p_i.x represents node x in XPath tree p_i, and a state is denoted by a circle.
An arc denotes a state transition, where dashed lines represent transitions of descendant-axis type and solid lines represent transitions of child-axis type. A label on an arc is a node test. State S1 has an arc to itself since it has a transition of descendant-axis type.

Fig. 4. (a) The XPath tree set P; (b) The XTHC machine constructed from XPath tree set P

Definition 6: A basic non-deterministic XTHC machine A is defined as:

A = (Q_s, Σ, δ, q_s0, F, B, S_s)

where
• Q_s is the set of NFA states;
• Σ is the set of input symbols;
• q_s0 is the initial (or start) NFA state of A, i.e. the root state;
• δ is the set of state transition functions; it contains at least the NFA state transition function, i.e.
t_forward: Q_s × Σ → 2^(Q_s);
• F ⊆ Q_s is the set of final states, which is also the set of leaf states;
• B ⊆ Q_s is the set of branching states;
• for every q_s ∈ Q_s, we call q_s an NFA state of A; LP(q_s) and LB(q_s) are the two index tables of q_s (see Definition 5); S_s is the stack for state transitions, and each stack frame of S_s is a subset of Q_s.
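The two fragment constructions of Section 3.1 can be illustrated with a small Python sketch. It is a simplification, not the XTHC machine itself: it builds the fragment for a single node nid[a,b] only, places the one self-loop on the last state of the extended state-chain (the optimal choice stated above), and treats an arc labeled '*' as matching any input symbol. The names `build_fragment` and `accepts`, and the numbering of states from 0, are assumptions made for this example.

```python
import math

def build_fragment(label, a, b, start=0):
    """Build the automata fragment of a node with label `label` and L = [a, b].

    Returns (transitions, end_state), where transitions maps a state to a list
    of (arc_label, next_state) pairs. The fragment has a-1 extended states
    reached via '*' arcs, followed by one arc carrying the node label.
    """
    trans = {}
    s = start
    for _ in range(a - 1):                 # extended states along '*' arcs
        trans.setdefault(s, []).append(("*", s + 1))
        s += 1
    if b == math.inf:                      # alterable node: one self-loop on
        trans.setdefault(s, []).append(("*", s))   # the last chain state
    trans.setdefault(s, []).append((label, s + 1))
    return trans, s + 1

def accepts(trans, start, end, symbols):
    """Simulate the NFA on a sequence of symbols; '*' arcs match anything."""
    current = {start}
    for sym in symbols:
        nxt = set()
        for q in current:
            for arc, target in trans.get(q, []):
                if arc == "*" or arc == sym:
                    nxt.add(target)
        current = nxt
    return end in current

if __name__ == "__main__":
    trans, end = build_fragment("b", 3, math.inf)
    print(accepts(trans, 0, end, ["x", "y", "b"]))
    print(accepts(trans, 0, end, ["x", "b"]))
```

For the alterable node b with L = [3, ∞], the fragment accepts any chain of at least three symbols that ends in b; for a fixed node the chain length must equal a exactly.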


3.2 Running an XTHC Machine

To resolve the homomorphism relationship using an XTHC machine, a depth-first traversal of the input XPath tree is required to generate SAX events. These events are used as input to drive the XTHC machine.

Four types of events are generated during the depth-first traversal of the input XPath tree p: startXPathTree, startElement, endElement and endXPathTree. They are generated as follows:
1) send a startXPathTree event when entering the root of p;
2) send a startElement event when entering a non-root node of p;
3) send an endElement event when tracing back to a non-root node of p;
4) send an endXPathTree event when tracing back to the root of p.

Since a and b are not always 1 in a node nid[a,b] of an XPath tree, more than one event is sent when entering or tracing back to node nid[a,b]:
1) the startElement event sequence sent when a = b is shown in Fig. 5(a);
2) the startElement event sequence sent when b = ∞ is shown in Fig. 5(b).

In particular, some restrictions apply to a startElement('//') event:
1) it occurs only when node nid[a,b] is an alterable
node;
2) a state transition driven by this event occurs only at a state s in the extended state-chain corresponding to the alterable node, and there is a unique state transition: t_forward(s, '//') → s.

Similarly, more than one endElement event is sent when tracing back to node nid[a,b] in the tree, as shown in Fig. 5(c) and 5(d).

Fig. 5. (a) The startElement ("SE" for short) event sequence of the fixed node nid[a,a]; (b) The startElement event sequence of the alterable node nid[a,∞]; (c) The endElement ("EE" for short) event sequence of the fixed node nid[a,a]; (d) The endElement event sequence of the alterable node nid[a,∞]

Fig. 6 shows the rules for processing SAX events in an XTHC machine. The homomorphism relationship between a tree p_i in a set of XPath trees P = {p1, p2, …, pn} and an input tree q can be resolved by running the XTHC machine. While the XTHC machine is running, for every p ∈ P, homomorphism information between each node v in p and the nodes in the input tree q is recorded. Let v ∈ p, and let a be the label of node u in the input XPath tree q. We define the following three operations to mark, deliver and reset information about the mapping in the XPath tree p:


1) mark(v, u): when the XTHC machine is at a leaf state q_s (q_s ∈ F), for every v ∈ LP(q_s), mark on v the information about the mapping from the leaf node v to the node u in the input XPath tree q;
2) deliver(v): when the machine traces back to a key state q_s (q_s ∈ F ∪ B), for every v ∈ LB(q_s) ∪ LP(q_s), if information about the mapping was marked on node v, deliver the mapping information of v to the nearest ancestor key node of v in the XPath tree;
3) reset(v): when the machine traces back to a key state q_s (q_s ∈ F ∪ B), for every v ∈ LB(q_s) ∪ LP(q_s), reset the mapping information on node v.

startXPathTree()
    push(S_s, {q_s0});
    other initialization

startElement(a)
    q_s_set = {};    // current NFA state set
    u = getCurrentInputNode();
    for each q_s in peek(S_s)
        merge t_forward(q_s, a) into q_s_set
    push(S_s, q_s_set);
    for each q_s in q_s_set
        if (q_s ∈ F)
            for each v in LP(q_s)
                mark(v, u);

endElement(a)
    q_s_set = pop(S_s);
    for each q_s in q_s_set
        if (q_s ∈ B or q_s ∈ F) {
            for each v in LB(q_s) or LP(q_s)
                if exists mapping of v {
                    deliver(v);
                    reset(v);
                }
        }

endXPathTree()

Fig. 6.
The processing rules of SAX events in XTHC

The time complexity of the algorithm resolving homomorphism from one XPath tree p to another XPath tree q is O(|p||q|^2) [2]. Therefore, the time complexity of resolving homomorphism from each tree p in a set of XPath trees P = {p1, p2, …, pn} to q is O(n|p||q|^2) without prefix-sharing automata. With prefix-sharing automata, the time complexity is O(m|q|^2), where m is the number of states in the NFA. When the XPath trees in P have common branches and prefixes, n|p| is much greater than m, so it is much more efficient to resolve homomorphism from multiple XPath trees to one single XPath tree using prefix-sharing automata.

4 Experiments

An algorithm resolving homomorphism based on the XTHC machine (XHO) was implemented in Java. The experimental platform is the Windows XP operating system on a Pentium 4 CPU with a frequency of 1.6 GHz and 512 MB of memory. We compared several algorithms: the homomorphism algorithm (HO) [2], the complete algorithm in a canonical model (CM), the branch homomorphism algorithm (BHO) [4], and the proposed XHO algorithm.
We checked the scope of each algorithm in resolving containment of XPath expressions (see Table 1, where T/F represents p containing/not containing q), and the running time of these algorithms (see Fig. 7). This experiment shows that XHO is as capable as existing homomorphism algorithms. Furthermore, XHO supports containment calculation from multiple XPath expressions to one single XPath expression. Although BHO also supports such calculation, it may


give incorrect results in some cases, as shown in Table 1. BHO gives a result that is rather different from the correct result that CM gives. Compared to BHO, XHO gives a smaller discrepancy between containment and homomorphism.

Table 1. Some pairs of XPath trees for experiments and containment results

No    p                        q                               HO  BHO  XHO  CM
no.1  /a//*[.//c]//d           /a//b[c]//d                     T   T    T    T
no.2  /a/*/*/c                 /a/b[c]/e/c                     T   T    T    T
no.3  /a//b[*//c]/b/c          /a//b[*//c]/b[b/c]//c           T   T    T    T
no.4  /a//*/b                  /a/*//b                         T   F    T    T
no.5  /a/*[.//b]//c            /a//*/b/c                       F   F    T    T
no.6  /a[a//b[c/*//d]/b/c/d]   /a[a//b[c/*//d]/b[c//d]/b/c/d]  F   F    F    T
no.7  /a/*/*/*/c               /a//*/b//b/c                    F   F    F    F
no.8  /a//b[c]/d               /a/b[.//c]//d                   F   F    F    F

[Figure: running time (microsecond), axis from 0 to 18000, of test pairs no.1 to no.8 for the homomorphism algorithms HO, BHO, XHO and CM]

Fig. 7. The experimental results for some homomorphism algorithms

5 Conclusion

This paper considers an algorithm to resolve containment between multiple XPath expressions and one single XPath expression through homomorphism. While high efficiency is kept in calculating multi-containment relationships, we also consider the discrepancy between containment and homomorphism. The algorithm works correctly when calculating containment for a special type of XPath expressions.
Experiments showed that our algorithm is more complete than conventional homomorphism algorithms. Future research can be done on how to resolve homomorphism between one XPath tree and multiple XPath trees simultaneously.

References

[1] World Wide Web Consortium. XML Path Language (XPath) Version 1.0. http://www.w3.org/TR/xpath, W3C Recommendation, November 1999.
[2] G. Miklau and D. Suciu. Containment and equivalence for a fragment of XPath. Journal of the ACM, 51(1):2-45, 2004.
[3] Yuguo Liao, Jianhua Feng, Yong Zhang and Lizhu Zhou. Hidden conditioned homomorphism for XPath fragment containment. In DASFAA 2006, LNCS 3882, 454-467, 2006.
[4] Sanghyun Yoo, Jin Hyun Son and Myoung Ho Kim. Maintaining homomorphism information of XPath patterns. IASTED-DBA 2005, 192-197, 2005.
