15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


subchild. What would you do? With the simple element term list from above you can find documents having <book>, <metadata>, and <title> elements. That's good, but it doesn't respect the required hierarchical relationship. It's like a phrase query that looks for words without concern for placement.

MarkLogic uses a unique parent-child index to track element hierarchies. It's much like a fast phrase searches index, except that instead of using adjoining words as the term key it uses parent-child names. There's a term list tracking documents that have a book/metadata relationship (that is, a <book> as a parent of a <metadata>) and another for metadata/title. There's even one tracking which documents have any particular root element. Intersecting these three lists produces an even better set of candidate documents. Now you can see how the XPath /a/b/c can be resolved very much like the phrase "a b c".
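The term-list intersection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MarkLogic's implementation: the function names, the "root:" term prefix, and the three sample documents are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parent_child_terms(xml_text):
    """Return the index terms for one document: one root term plus one
    parent/child term per distinct parent-child element-name pair."""
    root = ET.fromstring(xml_text)
    terms = {"root:" + root.tag}  # hypothetical key format for the root term
    for parent in root.iter():
        for child in parent:
            terms.add(parent.tag + "/" + child.tag)
    return terms


def build_index(docs):
    """Map each term to the set of document ids containing it (a term list)."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for term in parent_child_terms(xml_text):
            index[term].add(doc_id)
    return index


def candidates(index, path):
    """Resolve a simple path such as /a/b/c by intersecting term lists."""
    steps = path.strip("/").split("/")
    terms = ["root:" + steps[0]]
    terms += [p + "/" + c for p, c in zip(steps, steps[1:])]
    term_lists = [index.get(term, set()) for term in terms]
    return set.intersection(*term_lists)


# Hypothetical sample documents.
docs = {
    1: "<book><metadata><title>Inside</title></metadata></book>",
    2: "<book><title>Loose title</title><metadata/></book>",
    3: "<article><metadata><title>Other</title></metadata></article>",
}

index = build_index(docs)
result = candidates(index, "/book/metadata/title")
print(result)
```

Document 2 has a book root and a book/metadata pair but no metadata/title pair, and document 3 has the right pairs under the wrong root, so only document 1 survives the intersection of the three lists.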

The parent-child index lets you search against an XPath even when you don't know the path in advance. The index is so useful that it's one MarkLogic always maintains; there's no configuration option to turn it off.

Note that even with the parent-child index, there's still a small potential for documents to be in the candidate set that aren't an actual match. Knowing that somewhere inside an XML document there's a <book> parent of <metadata> and a <metadata> parent of <title> doesn't mean it's the same <metadata> between them. That's where filtering comes in. MarkLogic confirms each of the results by looking inside the document before the programmer sees them. While the document is open, MarkLogic also extracts any nodes that match the query.
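To make the false-positive case concrete, here is a minimal Python sketch of a filtering pass. It assumes a toy index keyed on "root:name" and "parent/child" strings; the helper names and both sample documents are hypothetical, and the real filtering step inside MarkLogic is certainly more involved.

```python
import xml.etree.ElementTree as ET


def parent_child_terms(xml_text):
    """Collect a root term and all parent/child name pairs for a document."""
    root = ET.fromstring(xml_text)
    terms = {"root:" + root.tag}
    for parent in root.iter():
        for child in parent:
            terms.add(parent.tag + "/" + child.tag)
    return terms


def filter_matches(xml_text, path):
    """Open the document and return the nodes that really match the path."""
    steps = path.strip("/").split("/")
    root = ET.fromstring(xml_text)
    if root.tag != steps[0]:
        return []
    if len(steps) == 1:
        return [root]
    # ElementTree paths are relative to the element, so drop the root step.
    return root.findall("/".join(steps[1:]))


# Hypothetical documents. In document 2 the <metadata> under <book> has no
# <title>; the <title> sits under a different <metadata> inside <appendix>.
docs = {
    1: "<book><metadata><title>Inside MarkLogic Server</title></metadata></book>",
    2: "<book><metadata><author>A</author></metadata>"
       "<appendix><metadata><title>Notes</title></metadata></appendix></book>",
}

needed = {"root:book", "book/metadata", "metadata/title"}

# Index resolution: both documents carry all three terms.
candidate_ids = [i for i, x in docs.items() if needed <= parent_child_terms(x)]

# Filtering: confirm each candidate and extract the matching nodes.
confirmed = {}
for doc_id in candidate_ids:
    nodes = filter_matches(docs[doc_id], "/book/metadata/title")
    if nodes:
        confirmed[doc_id] = [node.text for node in nodes]

print(candidate_ids, confirmed)
```

Both documents pass index resolution, but only document 1 is confirmed, because in document 2 the two parent-child pairs involve different <metadata> elements.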

Remember that the goal of index resolution is to make the candidate set so small and accurate that very little filtering is needed.

Doc 1

Parent-Child Index

Term    Doc
a/b     1
a/e     1
b/c     1
b/d     1

Figure 3: MarkLogic indexes the parent-child relationships of XML elements to enable search based on document structure. (It does the same for the properties in JSON documents.)
