
Optimization techniques for XML query processing
Ioana Manolescu-Goujot
Gemo/IASI group
INRIA Saclay–Île-de-France and LRI, Université de Paris Sud-11
October 30, 2009
I. Manolescu (Gemo/IASI) Optimizations techniques for XML October 30, 2009 1 / 64


Outline of this talk
Part I: Materialized views for XQuery
Part II: XML data management on DHTs
Closing
  Related works
  Context of our work
  Perspectives


Part I
Materialized views for XQuery


Materialized views for XQuery: outline
1 Motivation
2 The XAM language
  XPath{/,//,∗,[ ]}
  Nesting
  Optionality
  IDs
  Semantics
3 From XQuery to XAMs
4 XAM query rewriting


Motivation
Historical context
1998 XML standard is out, gaining traction
1998 W3C holds XML QL workshop
2000 Intelligent XML node identifiers for persistent storage
2001 W3C work on XQuery under way... then come XPath, XSchema, XSL
2001 Shredding-based implementations in RDBMSs. Query translation dependent on shredding
2002 Main-memory tree-based implementations (Galax, Qizx, Saxon)
2003 SQL/XML: built-in type xml, XML import and export, some XPath
2004 XQuery translated to relational algebra for column-based in-memory store (MonetDB/XQuery), general RDBMSs (PathFinder)
No generic framework for access path selection


Contributions
1 Language for XML materialized views: XAMs [ABM05]
  Describe storage structures, materialized views, indices
  Features from XQuery and modern stores
  Access path selection relies on view-based query rewriting
2 Access path selection for an XQuery subset
  Algebraic decomposition [ABM+06]
  XQuery = query XAMs + join/restructuring
  Algorithms for rewriting XAM queries using XAM views
    Under Dataguide constraints [ABMP07]
    In the general case [MZ09]
3 Prototype implementations in ULoad [ABMV05] and ViP2P [MZ09]


Access path selection for XQuery
[Figure: architecture diagram. An XQuery query goes through the XQuery-to-XAMs step (algebraic translation), giving query XAMs q1, q2, q3; a Rewriting step per query XAM uses the views v1, v2, v3, v4 to produce rewritings rew(q1), rew(q2), rew(q3); a cost-based optimizer produces the query plan; the execution engine and an XMLize step deliver the XML result.]


XAMs: materialized views for XQuery
[Figure: the same architecture diagram as for access path selection: XQuery query, XQuery-to-XAMs algebraic translation, rewritings of q1, q2, q3 over the views v1 to v4, cost-based optimizer, query plan, execution engine, XMLize, XML result.]


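The XAM language: XPath{/,//,∗,[ ]}
XML Access Modules (XAMs)
[Figure: sample XML document tree, a( b( c( d, e("some") ) ), f, b( b( d("text") ), d ) ), next to a XAM tree pattern rooted at ⊤ over the nodes a, b and f, labelled a[//f]b in the extracted text.]
Result (one b element per line):
  <b><c><d/><e>some</e></c></b>
  <b><b><d>text</d></b><d/></b>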


XML Access Modules (XAMs)
XAM: ⊤, then a with a child b [cont] and a descendant f [cont]
for $x in /a, $y in $x/b, $z in $x//f
return $y, $z
Result:
  b cont                           f cont
  <b><c><d/><e>some</e></c></b>    <f/>
  <b><b><d>text</d></b><d/></b>    <f/>


XML Access Modules (XAMs)
XAM: ⊤, then a with a child b [val] and a descendant f [cont]
for $x in /a, $y in $x/b, $z in $x//f
return $y/text(), $z
Result:
  b val    f cont
  some     <f/>
  text     <f/>
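
A minimal Python sketch of how the cont and val annotations of the two XAMs above could be evaluated. It assumes the sample document is the tree implied by the result tables; the helper names cont, val and evaluate are illustrative and not part of any XAM implementation, and the standard-library ElementTree stands in for a real store.

```python
import xml.etree.ElementTree as ET

# Sample document, reconstructed from the result tables (an assumption).
DOC = "<a><b><c><d/><e>some</e></c></b><f/><b><b><d>text</d></b><d/></b></a>"


def cont(elem):
    """'cont' annotation: the serialized subtree rooted at elem."""
    # ElementTree writes empty elements as '<d />'; normalize to '<d/>'.
    return ET.tostring(elem, encoding="unicode").replace(" />", "/>")


def val(elem):
    """'val' annotation: the text found under elem, space-separated."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


def evaluate(doc_text, b_annotation):
    """for $x in /a, $y in $x/b, $z in $x//f: one tuple per ($y, $z) pair."""
    root = ET.fromstring(doc_text)
    if root.tag != "a":              # $x in /a
        return []
    rows = []
    for b in root.findall("b"):      # $y in $x/b (child edge)
        for f in root.iter("f"):     # $z in $x//f (descendant edge)
            rows.append((b_annotation(b), cont(f)))
    return rows


CONT_ROWS = evaluate(DOC, cont)   # XAM with b [cont], f [cont]
VAL_ROWS = evaluate(DOC, val)     # XAM with b [val], f [cont]
```

The only difference between the two runs is the annotation applied to b, which mirrors how the two XAMs differ only in what they store for that node.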


XML Access Modules (XAMs)
XAM: ⊤, then a [val] with a descendant d [cont]
for $x in //a, $z in $x//d
return $x/text(), $z
Result:
  a val        d cont
  some text    <d/>
  some text    <d>text</d>
  some text    <d/>


The XAM language: Nesting
XML Access Modules (XAMs)
XAM: ⊤, then a [val] with a nested (n) edge to d [cont]
Result (the d cont values are nested inside the single a tuple):
  a val        d cont
  some text    <d/>
               <d>text</d>
               <d/>


The XAM language: Optionality
XML Access Modules (XAMs)
XAM: ⊤, then a [val] with a nested (n) edge to d [cont]
for $x in //a
return $x/text(), <y>{$x//d}</y>
Result (the d cont values are nested inside the single a tuple):
  a val        d cont
  some text    <d/>
               <d>text</d>
               <d/>


The XAM language: IDs
XML Access Modules (XAMs)
Document nodes with their identifiers:
  a 1,12
    b 2,5
      c 3,4
        d 4,1
        e 5,3
          some 6,2
    f 7,6
    b 8,10
      b 9,9
        d 10,8
          text 11,7
      d 12,11
XAM: ⊤, then a [id, val] with a nested (n) edge to d [id, cont]
Result:
  a id    a val        d id     d cont
  1,12    some text    4,1      <d/>
                       10,8     <d>text</d>
                       12,11    <d/>
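
A small Python sketch of one way such identifiers can be computed and used, reading each pair as (pre-order rank, post-order rank); that reading is an assumption, as are the function names. With it, x is an ancestor of y exactly when x starts before y and ends after it. On the sample document a plain traversal gives the pairs listed above, except for the last b and its child d, where it yields 8,11 and 12,10 while the extracted figure reads 8,10 and 12,11.

```python
import xml.etree.ElementTree as ET

# Sample document, reconstructed from the slides (an assumption).
DOC = "<a><b><c><d/><e>some</e></c></b><f/><b><b><d>text</d></b><d/></b></a>"


def label(doc_text):
    """Return [(name, pre, post)] for elements and text nodes, in document order."""
    labels = []
    counters = {"pre": 0, "post": 0}

    def leaf(text):
        # A text node is entered and left immediately.
        counters["pre"] += 1
        counters["post"] += 1
        labels.append((text, counters["pre"], counters["post"]))

    def visit(elem):
        counters["pre"] += 1
        pre = counters["pre"]
        slot = len(labels)
        labels.append(None)              # reserve the document-order position
        if elem.text and elem.text.strip():
            leaf(elem.text.strip())
        for child in elem:
            visit(child)
            if child.tail and child.tail.strip():
                leaf(child.tail.strip())
        counters["post"] += 1            # post rank is known only on exit
        labels[slot] = (elem.tag, pre, counters["post"])

    visit(ET.fromstring(doc_text))
    return labels


def is_ancestor(x, y):
    """Structural test on identifiers only: x starts before y and ends after it."""
    return x[1] < y[1] and x[2] > y[2]


LABELS = label(DOC)
ROOT = LABELS[0]
# All d descendants of a, found by comparing identifiers, no navigation needed.
D_UNDER_A = [n for n in LABELS if n[0] == "d" and is_ancestor(ROOT, n)]
```

Storing such identifiers in a view is what lets a rewriting combine tuples from different views with structural joins instead of going back to the documents.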


The XAM language: Semantics
Formal XAM semantics
Nested relational tables including ⊥
Two equivalent specifications:
  Natural semantics
    Based on tree embeddings
    Useful for reasoning about containment
  Algebraic semantics
    Based on a canonical database and structural (outer-) (nested) joins
    Useful for pattern extraction from XQuery


From XQuery to XAMs
[Figure: the access path selection architecture diagram again (XQuery query, XQuery-to-XAMs algebraic translation, rewritings over the views v1 to v4, cost-based optimizer, query plan, execution engine, XMLize, XML result).]


Algebraic approach based on nested relational model [MP05, ABM+06]
1 XQuery dialect: nested for-where-return blocks
2 Syntax-driven translation to a nested relational algebra expression
3 Algebraic operation reordering leads to identifying sub-expressions corresponding to XAMs


XAM query rewriting
[Figure: the access path selection architecture diagram again (XQuery query, XQuery-to-XAMs algebraic translation, rewritings over the views v1 to v4, cost-based optimizer, query plan, execution engine, XMLize, XML result).]


Algebraic rewriting & operators
Let q be a query XAM and V = {v_1, v_2, ..., v_k} a set of view XAMs.
A rewriting of q using V is an algebraic expression e(v_1, v_2, ..., v_k) such that e(d) = q(d) for any document d.
Algebra operators
  scan(v)
  ×
  π_cols(op)
  π^o(op)
  sort_cols(op)
  σ_cond(op)
  nav_{i,np}(op) evaluates np over the cont attribute op.i


The nav operator
[Figure: the sample document with its node identifiers, and a view v1 storing b [cont].]
v1:
  b cont
  <b><c><d/><e>some</e></c></b>
  <b><b><d>text</d></b><d/></b>
nav_{b.cont,//c}(v1):
  b cont                           c cont
  <b><c><d/><e>some</e></c></b>    <c><d/><e>some</e></c>
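
The nav example above can be sketched in Python as follows. This is an illustration only: the tuple layout (dictionaries keyed by attribute name), the function names, and the choice to return only proper descendants are assumptions, and ElementTree is used as a stand-in for the engine's own navigation.

```python
import xml.etree.ElementTree as ET


def cont(elem):
    """Serialized subtree, with ElementTree's '<d />' normalized to '<d/>'."""
    return ET.tostring(elem, encoding="unicode").replace(" />", "/>")


def nav(tuples, attribute, tag):
    """Sketch of nav_{attribute,//tag}: parse the XML stored in `attribute`
    and emit one extended tuple per descendant element named `tag`."""
    out = []
    for row in tuples:
        root = ET.fromstring(row[attribute])
        for match in root.iter(tag):
            if match is root:
                continue                  # assumption: proper descendants only
            extended = dict(row)
            extended[tag + ".cont"] = cont(match)
            out.append(extended)
    return out


# The view v1 from the slide: one tuple per b element, storing its content.
V1 = [
    {"b.cont": "<b><c><d/><e>some</e></c></b>"},
    {"b.cont": "<b><b><d>text</d></b><d/></b>"},
]

NAV_RESULT = nav(V1, "b.cont", "c")
```

Tuples whose stored content has no match simply disappear from the output, which is why only the first b survives here.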


Rewriting example
[Figure: a query XAM q over the nodes a, b, c [id, cont], d [val=5] and e [val]; three views v1, v2, v3 storing a [id], b [id, cont], c [id, cont] and d [val]; and a rewriting plan built bottom-up over v1, v2, v3 from the operators nav_{b.cont,e.val}, ⊲⊳_{a.id≺b.id}, σ_{b.id≺c.id}, σ_{val=5}, ⊲⊳_{a.id=a.id} and a final π_{c.cont,e.val}.]


Rewriting algorithms
Based on subset enumeration
  Test if a rewriting can be built out of a view subset
  Recalls bucket algorithm or [TYÖ+08]
  Test minimality at the end
  Exponential complexity, polynomial problem
Bottom-up algorithms
  Use smaller partial rewritings to build bigger ones
  Dynamic Programming Rewriting
  Greedy based on the biggest query coverage
  Reuse of earlier information ⇒ more efficient
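
To make the two families concrete, here is a toy Python sketch. It is not the ULoad or ViP2P algorithm: the real test for "a rewriting can be built out of a view subset" is replaced by a deliberately crude stand-in (the views' nodes must cover the query's nodes), and the view and query contents are hypothetical. Minimality is enforced here by enumerating subsets by increasing size and skipping supersets of earlier answers, rather than by a separate pass at the end.

```python
from itertools import combinations


def covers(query_nodes, views, subset):
    """Toy stand-in for the real rewriting test: node coverage only."""
    stored = set().union(*(views[name] for name in subset))
    return query_nodes <= stored


def minimal_rewritings(query_nodes, views):
    """Subset enumeration: exponential in the number of views."""
    found = []
    names = sorted(views)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if any(set(f) <= set(subset) for f in found):
                continue                      # superset of an answer: not minimal
            if covers(query_nodes, views, subset):
                found.append(subset)
    return found


def greedy_rewriting(query_nodes, views):
    """Bottom-up greedy: repeatedly add the view covering most missing nodes."""
    remaining = set(query_nodes)
    chosen = []
    while remaining:
        best = max(sorted(views), key=lambda name: len(views[name] & remaining))
        if not views[best] & remaining:
            return None                       # no view helps: no rewriting
        chosen.append(best)
        remaining -= views[best]
    return chosen


# Hypothetical query and views, described only by the nodes they can provide.
QUERY = {"a", "b", "c", "d", "e"}
VIEWS = {"v1": {"a", "b", "e"}, "v2": {"a", "c"}, "v3": {"d"}, "v4": {"a", "b"}}

ENUMERATED = minimal_rewritings(QUERY, VIEWS)
GREEDY = greedy_rewriting(QUERY, VIEWS)
```

The greedy variant looks at each view a handful of times instead of visiting every subset, at the price of possibly missing some minimal rewritings.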


Architecture(s)
Store
  Plain relational (no nesting) in Postgres
  Our own built on BerkeleyDB
  Relational + SQL/XML on Oracle 12
Outside the store
  XQuery analysis and XAM extraction
  XAM rewriting
    ULoad: full XAMs, under Dataguide constraints [ABMP07]
    ViP2P: conjunctive XAMs, no constraints [MZ09]
  Optimizer
  Execution engine


Part II
XML data management on DHTs


XML data management on DHTs
5 Motivation
6 KadoP: XML indexing on DHTs
  Indexing and query processing
  Scaling up
7 ViP2P: materialized views on DHTs
  View materialization
  View indexing


Motivation
Distributed data management
Old goal (1970)
  distributed versions of industrial-strength DBMSs
  massively parallel with map/reduce
Still missing: the flexible federation
  high independence of the sites: when to be in, what to store
  data distribution transparency
  . . . with the usual performance requirements


Motivation: distributed warehouses of Web content
Web content
  structured documents, schemas, annotations, concepts, mappings, Web services, inter-document links
Distributed Web content warehouse operations
  publish resources
  connect (annotate, map, link...) existing resources
  update resources
  enhance resources by combining them
In the style of the RNTL WebContent project (2005-2009)


Distributed hash tables
[Figure: eight peers p1 to p8 arranged in a ring. put(k1, v1) and put(k1, v2) are issued in turn; the peer in charge of k1 ends up storing (k1, {v1, v2}); a later get(k1) returns {v1, v2}.]
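
The put/get behaviour shown in the figure can be mimicked with a toy, single-process Python sketch. It assumes a consistent-hashing ring with SHA-1 placement and multi-valued keys; the class and method names are made up for illustration and no real DHT library, routing or network maintenance is involved.

```python
import hashlib
from bisect import bisect_left


class ToyDHT:
    """Single-process stand-in for a DHT: keys are hashed onto a ring of peers."""

    def __init__(self, peer_names):
        self.ring = sorted((self._hash(p), p) for p in peer_names)
        self.store = {p: {} for p in peer_names}

    @staticmethod
    def _hash(text):
        return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

    def responsible_peer(self, key):
        """First peer clockwise from the key's position on the ring."""
        points = [point for point, _ in self.ring]
        index = bisect_left(points, self._hash(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        """Values accumulate: put(k1, v1) then put(k1, v2) keeps both."""
        peer = self.responsible_peer(key)
        self.store[peer].setdefault(key, set()).add(value)

    def get(self, key):
        peer = self.responsible_peer(key)
        return set(self.store[peer].get(key, set()))


dht = ToyDHT(["p%d" % i for i in range(1, 9)])
dht.put("k1", "v1")
dht.put("k1", "v2")
K1_VALUES = dht.get("k1")
HOLDERS = [peer for peer, data in dht.store.items() if "k1" in data]
```

Whichever peer issues the put or the get, the same peer ends up responsible for k1, which is the property the indexing schemes in the rest of the talk build on.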


From DHTs to distributed data management
DHTs provide:
  logical network maintenance
  efficient message routing
  shared (key, value) repository
Still need:
  data indexing algorithms
  storage for application data and even DHT index data
  local query processing
  distributed query processing: operators, including data transfers, optimization . . .


DHT index queries
The part of a user query that can be answered directly by consulting the DHT content index
Typically less precise than the user query
  Find the IDs of documents matching the query
  Find the IDs of documents which may match the query
Many trade-offs


Trade-offs in DHT indexing and query processing
Level of detail of the indexing algorithm:
  index query precision ↗ ⇒ execution time ↘
  data publication time ↗, possibly execution time ↗
Data re-placement or clustering:
  fewer peers contacted for a query (message no. ↘, execution time ?)
  data transfers in the absence of queries (message no. ↗, total message size ↗)


Building XML stores on DHTs
Peers retain control over the data they store/publish
no global schema
documents published independently
annotations, triples, links can freely connect content
peers collaborate for storing the index
load balancing
Systems
  XML indexing: KadoP [AMP05, AMP+08]
  XML materialized views in P2P networks: ViP2P [MZ09]


KadoP: XML indexing on DHTs
Indexing and query processing
KadoP: DHT-based XML indexing
[Figure: peers p1 to p8 on a ring, holding the documents and the index entries below.]
Published documents:
  doc1.xml: <article><title>XML</title></article>
  doc2.xml: <article><title>Web</title></article>
Index entries stored in the DHT:
  (article, (doc1,1,3)), (article, (doc2,1,3))
  (title, (doc1,2,2)), (title, (doc2,2,2))
  ('XML', (doc1,3,1))
  ('Web', (doc2,3,1))
Query: //article[cont(.,'XML')]//title
  index query result, after joining (⊲⊳) the posting lists: (doc1,2,2)
  final answer: <title>XML</title>
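
A compact Python sketch of the indexing and index-query steps above, kept in one process. It assumes the triples are (document, pre-order rank, post-order rank), which matches the entries shown, and that every word of a text node is indexed as its own leaf; all function names are illustrative and this is not KadoP's code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(doc_id, xml_text, index):
    """Add one (doc, pre, post) posting per element name and per word."""
    counters = {"pre": 0, "post": 0}

    def leaf(word):
        counters["pre"] += 1
        counters["post"] += 1
        index[word].append((doc_id, counters["pre"], counters["post"]))

    def visit(elem):
        counters["pre"] += 1
        pre = counters["pre"]
        for word in (elem.text or "").split():
            leaf(word)
        for child in elem:
            visit(child)
            for word in (child.tail or "").split():
                leaf(word)
        counters["post"] += 1
        index[elem.tag].append((doc_id, pre, counters["post"]))

    visit(ET.fromstring(xml_text))


def is_ancestor(x, y):
    """Same document, x starts before y and ends after it."""
    return x[0] == y[0] and x[1] < y[1] and x[2] > y[2]


def titles_of_articles_containing(index, word):
    """Index query for //article[cont(., word)]//title via structural joins."""
    articles = [a for a in index["article"]
                if any(is_ancestor(a, w) for w in index[word])]
    return [t for t in index["title"]
            if any(is_ancestor(a, t) for a in articles)]


INDEX = defaultdict(list)   # in KadoP the posting lists live on the DHT peers
index_document("doc1", "<article><title>XML</title></article>", INDEX)
index_document("doc2", "<article><title>Web</title></article>", INDEX)
ANSWER_IDS = titles_of_articles_containing(INDEX, "XML")
```

Only identifiers travel during the index query; the actual <title> element is fetched afterwards from the peer that published doc1.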


Scaling up KadoP
Engineering issues:
  1 DHT values were too large for efficient storage ⇒ new store
  2 Blocking get operation ⇒ pipelined get
Scalability: longest posting list in a query
  long posting list = frequent term [LHSH04]
  distributed B-tree organization ⇒ parallelized posting list transfers
  Also Bloom filters to reduce transfers
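
The slide does not say how the Bloom filters are applied, so the following Python sketch shows just one plausible use, stated as an assumption: the holder of a short posting list sends a compact filter of its document identifiers, and the holder of a long list drops postings whose document cannot match before shipping anything. Class and function names, filter size and hash count are all illustrative.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(("%d:%s" % (seed, item)).encode("utf-8"))
            yield int(digest.hexdigest(), 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def prefilter(long_postings, short_postings):
    """Keep only the long-list postings whose document may join with the short list."""
    bloom = BloomFilter()
    for doc_id, _pre, _post in short_postings:
        bloom.add(doc_id)
    return [posting for posting in long_postings if posting[0] in bloom]


SHORT = [("doc1", 3, 1)]                                   # rare term
LONG = [("doc%d" % i, 2, 2) for i in range(1, 200)]        # frequent term
TO_TRANSFER = prefilter(LONG, SHORT)
```

Because a Bloom filter never reports a stored item as absent, the pre-filtered list still contains every posting that the exact join would keep.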


KadoP indexing experiments on Grid5K


KadoP querying experiments on Grid5K


Lessons learned with KadoP
Performant message routing (redundant fingers)
Simulation ≠ deployment
(Some) DHTs were not built for intensive, detailed indexing. This somewhat improved with time.
Indexing takes time (orders of magnitude wrt first try)
Parallelism a big plus


ViP2P: materialized views on DHTs
ViP2P: views in peer-to-peer
[Figure: peers p1 to p8 on a ring.]
The peers may store:
  documents
  views


When q arrives:
  view definition lookup
  rewriting
  execution of physical plan


When d arrives:
  search view definitions for which v_i(d) ≠ ∅
  compute v_i(d)
  send results
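
The three steps for an arriving document can be sketched in Python as below. The sketch simplifies heavily and every name in it is hypothetical: view definitions are plain ElementTree path expressions rather than XAMs, they are handed in as a dictionary instead of being looked up in the DHT, and "sending" just means grouping results per destination peer.

```python
import xml.etree.ElementTree as ET


def publish_document(doc_id, xml_text, view_definitions):
    """Return {holder_peer: [(view_name, doc_id, extracted values)]}.

    view_definitions maps a view name to (peer storing the view, path)."""
    root = ET.fromstring(xml_text)
    outbox = {}
    for view_name, (holder_peer, path) in sorted(view_definitions.items()):
        matches = root.findall(path)               # compute v_i(d)
        if not matches:                            # v_i(d) is empty: skip the view
            continue
        values = [(m.tag, m.text) for m in matches]
        outbox.setdefault(holder_peer, []).append((view_name, doc_id, values))
    return outbox


# Hypothetical view definitions: name -> (peer holding the view, path).
VIEW_DEFINITIONS = {
    "v1": ("p3", ".//title"),
    "v2": ("p6", ".//abstract"),
}

OUTBOX = publish_document(
    "doc1", "<article><title>XML</title></article>", VIEW_DEFINITIONS
)
```

Only peers holding a view to which the document contributes receive a message, so a document that matches no view costs nothing beyond the definition search.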


View materialization experiment
1000 peers, 250 machines, 2000 documents, 500 views (70 views contribute to all the documents)


View indexing and lookup for query rewriting
[Figure: two tree patterns, the view to index and the query to look up, over the nodes a, b [id], c [id], d and e.]
[Table: index keys and lookup keys for the view indexing strategies, with columns LI index keys, LI & RLI lookup keys, RLI index keys, LPI index keys, LPI & RPI lookup keys, RPI index keys. The cell alignment did not survive extraction; the keys that appear are the labels a, b, c, d, e and the paths a/b, a/c, a/d, a/e, b/c, b/d, a/b/c, a/b/d.]


ViP2P: materialized views on DHTsView indexingView look up performanceWe used 1440 views related to but different from query qI. Manolescu (Gemo/IASI) Optimizations techniques for XML October 30, 2009 48 / 64


ViP2P: materialized views on DHTs
View indexing
Performance of rewriting algorithms


ViP2P: materialized views on DHTs
View indexing
Query execution: sample plan


ViP2P: materialized views on DHTs
View indexing
Query execution


Part III
Closing


Closing
8 Related works
9 Context of this work
10 Perspectives


Related works
- Distributed data management [ÖV99, Kos00]
- XPath query rewriting [BOB+04, XO05, CDO08, TYÖ+08]
    XPath: wildcard *, union
    Rewritings: intersection, navigations, joins
- DHT-based relational data management [LHSH04, HRVM08, APV07]
- DHT-based XML indexing [GWJD03, BC06, SHA05, AMP+08]
- DHT-based shared XML caches [LP08]
- Layered architecture for Web content warehousing [AAC+08]
- RDF querying and reasoning on DHT [KMK08, LIK06]


Context of this work
Main research grants
- MDP2P (2003-2006) Massive Data Management in Peer-to-Peer (P. Valduriez, INRIA)
- Tralala (2004-2008) Transformations, Logics and Languages for XML (G. Castagna, ENS)
- WebContent (2006-2009) Platform for Semantic Web Applications (G. de Chalendar, CEA)
- WebStand (2006-2009) "Young Investigator", with B. Nguyen (U. Versailles) and D. Colazzo (U. Paris Sud-11)
- CODEX (2009-2012) Efficiency, Dynamicity and Composition for XML (I. Manolescu)
- DataRing (2009-2012) Peer-to-peer Data Mgmt. (P. Valduriez)
- WebDam (2009-2013) Web Data Management (S. Abiteboul)


Context of this work
PhD students (co-)advised
- Andrei Arion (2004-2007) XML Access Modules: Towards Physical Data Independence in XML Databases (with V. Benzaken, U. Paris Sud-11)
- Nicoleta Preda (2004-2008) Efficient Web Resource Management in Structured Peer-to-Peer Networks (with S. Abiteboul, INRIA)
- Spyros Zoupanos (2006-2009) Efficient Peer-to-Peer Data Management (with S. Abiteboul, INRIA)
Also visiting PhD students:
- Gabriela Ruberg (2004) ActiveXML optimization
- Melanie Weiss (2005) Declarative XML data cleaning


Context of this work
Ongoing PhDs
- Wael Khemiri (2008-) Efficient Interactive Workflows for Scientific Applications (with V. Benzaken, U. Paris Sud-11 and J.-D. Fekete, INRIA Aviz)
- Konstantinos Karanasos (2009-) Semantic Web and XML Data Management (with F. Goasdoué, IASI, U. Paris Sud-11)
- Asterios Katsifodimos (2009-) Peer-to-peer Optimization


Perspectives
Distributed data management
We have only seen the beginning
Why this time will be better
Unparalleled opportunities
- hardware? clouds
- storage and data placement? key-value stores, cheap disks
- connectivity? P2P infrastructures with reliability
- syntactic interoperability? XML
- semantic interoperability? Semantic Web
- tables and joins on the Web? Google and Search Computing ERC
Is the Web the database? No, just part of it


Perspectives
Ways to go
Data access transparency for new distributed architectures
- Automatic strategies for establishing which data to store where
- Efficient ways of exploiting any data from anywhere (RDF, XML, annotations...)
- Query rewriting, execution, optimization
High-level models for expressive data management applications
- The right compromise between WfMC and Java programming
- Natural and expressive manipulation paradigms (also liquid queries...)
If you build it, they will come


Perspectives
New INRIA/University team: Leo


Perspectives
Thank you!


[AAC+08] Serge Abiteboul, Tristan Allard, Philippe Chatalic, Georges Gardarin, A. Ghitescu, François Goasdoué, Ioana Manolescu, Benjamin Nguyen, M. Ouazara, A. Somani, Nicolas Travers, Gabriel Vasile, and Spyros Zoupanos. Webcontent: efficient P2P warehousing of web data. PVLDB, 1(2):1428–1431, 2008.
[ABM05] Andrei Arion, Véronique Benzaken, and Ioana Manolescu. XML access modules: Towards physical data independence in XML databases. In XIME-P, 2005.
[ABM+06] Andrei Arion, Véronique Benzaken, Ioana Manolescu, Yannis Papakonstantinou, and Ravi Vijay. Algebra-based identification of tree patterns in XQuery. In FQAS, pages 13–25, 2006.
[ABMP07] Andrei Arion, Véronique Benzaken, Ioana Manolescu, and Yannis Papakonstantinou. Structured materialized views for XML queries. In VLDB, pages 87–98, 2007.
[ABMV05] Andrei Arion, Véronique Benzaken, Ioana Manolescu, and Ravi Vijay. ULoad: Choosing the right storage for your XML application. In VLDB, pages 1330–1333, 2005.
[AMP05] S. Abiteboul, I. Manolescu, and N. Preda. Constructing and querying peer-to-peer warehouses of XML resources. In ICDE '05: Demo Session, 2005.
[AMP+08] Serge Abiteboul, Ioana Manolescu, Neoklis Polyzotis, Nicoleta Preda, and Chong Sun. XML processing in DHT networks. In ICDE, pages 606–615, 2008.
[APV07] Reza Akbarinia, Esther Pacitti, and Patrick Valduriez. Data currency in replicated DHTs. In SIGMOD Conference, pages 211–222, 2007.
[BC06] Angela Bonifati and Alfredo Cuzzocrea. Storing and retrieving XPath fragments in structured P2P networks. Data Knowl. Eng., 59(2), 2006.
[BOB+04] A. Balmin, F. Ozcan, K. Beyer, R. Cochrane, and H. Pirahesh. A framework for using materialized XPath views in XML query processing. In VLDB, 2004.
[CDO08] Bogdan Cautis, Alin Deutsch, and Nicola Onose. XPath rewriting using multiple views: Achieving completeness and efficiency. In WebDB, 2008.
[GWJD03] L. Galanis, Y. Wang, S.R. Jeffery, and D.J. DeWitt. Locating data sources in large distributed systems. In VLDB, 2003.
[HRVM08] Rabab Hayek, Guillaume Raschia, Patrick Valduriez, and Noureddine Mouaddib. Summary management in P2P systems. In EDBT, pages 16–25, 2008.
[KMK08] Zoi Kaoudi, Iris Miliaraki, and Manolis Koubarakis. RDFS reasoning and query answering on top of DHTs. In International Semantic Web Conference, pages 499–516, 2008.
[Kos00] Donald Kossmann. The state of the art in distributed query processing. ACM Comput. Surv., 32(4):422–469, 2000.
[LHSH04] Boon Thau Loo, Ryan Huebsch, Ion Stoica, and Joseph M. Hellerstein. The case for a hybrid P2P search infrastructure. In IPTPS, pages 141–150, 2004.
[LIK06] Erietta Liarou, Stratos Idreos, and Manolis Koubarakis. Evaluating conjunctive triple pattern queries over large structured overlay networks. In International Semantic Web Conference, pages 399–413, 2006.
[LP08] Kostas Lillis and Evaggelia Pitoura. Cooperative XPath caching. In SIGMOD Conference, pages 327–338, 2008.
[MP05] Ioana Manolescu and Yannis Papakonstantinou. XQuery midflight: Emerging database-oriented paradigms and a classification of research advances. In ICDE, page 1143, 2005.
[MZ09] Ioana Manolescu and Spyros Zoupanos. Materialized views for P2P XML warehousing. Journées de Bases de Données Avancées, 2009.
[ÖV99] M. Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems, Second Edition. Prentice-Hall, 1999.
[SHA05] Gleb Skobeltsyn, Manfred Hauswirth, and Karl Aberer. Efficient processing of XPath queries with structured overlay networks. In OTM Conferences (2), 2005.
[TYÖ+08] Nan Tang, Jeffrey Xu Yu, M. Tamer Özsu, Byron Choi, and Kam-Fai Wong. Multiple materialized view selection for XPath query rewriting. In ICDE, pages 873–882, 2008.
[XO05] W. Xu and M. Ozsoyoglu. Rewriting XPath queries using materialized views. In VLDB, 2005.


Perspectives
Minimal canonical rewritings
Let q be a query XAM and V = {v_1, v_2, ..., v_k} a set of view XAMs.
A rewriting of q using V is an algebraic expression e(v_1, v_2, ..., v_k) such that e(d) = q(d) for any document d.
Minimal rewritings
- Do not use a view if we could do without it
Canonical rewritings
- Operators are organized in a certain way (avoids equivalent plan explosion)
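One way to read the minimality condition is as a filter over candidate rewritings. The sketch below is an illustration under that assumption and not the rewriting algorithm of the talk: each candidate is reduced to the set of view names it uses, and a candidate is dropped when another candidate gets by with a strict subset of its views. The function name and the example view sets are hypothetical.

```python
def minimal_rewritings(candidates):
    """Keep the candidates whose view set has no strict subset among the others."""
    view_sets = [frozenset(c) for c in candidates]
    return [set(s) for s in view_sets
            if not any(other < s for other in view_sets)]


# Hypothetical candidates: each set names the views used by one rewriting of q.
candidates = [{"v1", "v2", "v3"}, {"v1", "v2"}, {"v2", "v4"}]
minimal = minimal_rewritings(candidates)
```

Here the rewriting over `v1`, `v2`, `v3` is discarded because `v1` and `v2` alone already suffice, while the two remaining candidates are incomparable and both survive.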


Perspectives
Minimal canonical rewriting
[Figure: query q (nodes a, b, c id,cont, d [val=5], e val); views v_1, v_2, v_3 (nodes a id, a id, b id,cont, c id,cont, d val); rewriting plan built from the operators π_{c.cont,e.val}, ⋈_{a.id=a.id}, σ_{b.id≺c.id}, ⋈_{a.id≺b.id}, nav_{b.cont,e.val} and σ_{val=5} over v_1, v_2, v_3]


Perspectives
Minimal canonical rewriting
[Figure: the same query q and views v_1, v_2, v_3, with the canonical plan: π_{c.cont,e.val} over σ_{a.id=a.id ∧ b.id≺c.id ∧ a.id≺b.id ∧ d.val=5} over a product × of the view inputs, one of which goes through nav_{b.cont,e.val}]
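The canonical shape on this slide, a projection over a single selection over a product of inputs, can be obtained from a tree of joins and selections by pulling every predicate up into one conjunction. The following Python sketch shows that idea only. The tuple encoding of plans, the function name and the arrangement of v1, v2, v3 in the example plan are assumptions, because the slide text does not preserve the tree layout; predicates are treated as opaque strings, and a navigation over a view is kept as an input.

```python
def canonicalize(plan):
    """Rewrite project(joins/selections) as project(select(conj, product(inputs)))."""
    predicates, inputs = [], []

    def walk(node):
        kind = node[0]
        if kind == "select":        # ("select", predicate, child)
            predicates.append(node[1])
            walk(node[2])
        elif kind == "join":        # ("join", predicate, left, right)
            predicates.append(node[1])
            walk(node[2])
            walk(node[3])
        else:                       # ("view", name) or ("nav", attrs, view)
            inputs.append(node)

    _, attrs, child = plan          # ("project", attrs, child)
    walk(child)
    return ("project", attrs, ("select", predicates, ("product", inputs)))


# Hypothetical plan: the placement of v1, v2, v3 is assumed, not taken from the slide.
plan = ("project", ["c.cont", "e.val"],
        ("join", "a.id=a.id",
         ("select", "b.id≺c.id",
          ("join", "a.id≺b.id",
           ("nav", ["b.cont", "e.val"], ("view", "v2")),
           ("view", "v3"))),
         ("select", "d.val=5", ("view", "v1"))))
canonical = canonicalize(plan)
```

Because every plan over the same inputs and predicates collapses to this one shape, equivalent plans that differ only in join order or selection placement need not be enumerated separately, which matches the slide's remark about avoiding equivalent plan explosion.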
