15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


INDEXING VALUES

Now what if we want to search for element values? Let's imagine that I ask you for books published in a certain year. In XPath, that can be expressed as /book[metadata/pubyear = 2016]. How do we resolve this efficiently?

Thinking back to the paper-based approach, what you want to do is maintain a term list for each XML element value (or JSON property value). In other words, you can track a term list for documents having a <pubyear> equal to 2016, as well as any other element name with any other value you find during indexing. Then, for any query asking for an element with a particular value, you can immediately resolve which documents have it directly from the indexes. Intersect that value index with the parent-child structural indexes discussed above, and you've used several small indexes in cooperation to match a larger query. It works even when you don't know the schema or query in advance. This is how you build a database using the heart of a search engine.
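To make the intersection concrete, here is a minimal Python sketch of the idea, not MarkLogic's actual implementation. The sample documents, the document IDs, and the names `value_terms`, `child_terms`, and `books_published_in` are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents, keyed by an illustrative document ID.
DOCS = {
    1: "<book><metadata><pubyear>2016</pubyear></metadata></book>",
    2: "<book><metadata><pubyear>2015</pubyear></metadata></book>",
    3: "<article><metadata><pubyear>2016</pubyear></metadata></article>",
}

# Term lists: each key maps to the set of document IDs containing that term.
value_terms = defaultdict(set)  # (element name, text value) -> doc IDs
child_terms = defaultdict(set)  # (parent name, child name) -> doc IDs


def index_element(doc_id, elem):
    """Record value and parent-child terms for elem and its descendants."""
    text = (elem.text or "").strip()
    if text:
        value_terms[(elem.tag, text)].add(doc_id)
    for child in elem:
        child_terms[(elem.tag, child.tag)].add(doc_id)
        index_element(doc_id, child)


for doc_id, xml in DOCS.items():
    index_element(doc_id, ET.fromstring(xml))


def books_published_in(year):
    """Resolve /book[metadata/pubyear = year] by intersecting term lists."""
    return (
        value_terms[("pubyear", str(year))]
        & child_terms[("book", "metadata")]
        & child_terms[("metadata", "pubyear")]
    )


result = books_published_in(2016)
print(result)
```

Document 3 drops out because it has no book/metadata relationship, even though its pubyear matches. In this toy version the intersection only narrows the candidates: a document that satisfied each term in unrelated places would still be returned, so a real system would need some way to confirm the match.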

Can an element-value index be stored efficiently? Yes, thanks to hashing. Instead of storing the full element name and value, you can hash the element name and value down to a succinct integer and use that as the term list lookup key. Then, no matter how long the element name and value are, the entry in the index stays small. MarkLogic uses hashes behind the scenes to store all term list keys, element-value or otherwise, for the sake of efficiency. The element-value index has proven to be so efficient and useful that it's always and automatically enabled within MarkLogic.
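The hashing step can be sketched in a few lines of Python. This is only an illustration: the text does not say which hash function or key width MarkLogic uses, so the 64-bit BLAKE2b digest, the NUL separator, and the names `term_key` and `add_term` are assumptions.

```python
import hashlib
from collections import defaultdict


def term_key(element_name, value):
    """Hash an element name and value down to a fixed-size integer key.

    The NUL separator keeps ("ab", "c") and ("a", "bc") from producing
    the same input bytes. The 8-byte digest size is an assumption.
    """
    data = f"{element_name}\x00{value}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


# Term lists keyed by the hashed integer rather than by the full strings.
term_lists = defaultdict(set)


def add_term(doc_id, element_name, value):
    term_lists[term_key(element_name, value)].add(doc_id)


add_term(1, "pubyear", "2016")
add_term(2, "a-very-long-element-name" * 10, "a very long value " * 100)
add_term(3, "pubyear", "2016")

# A lookup hashes the query's name and value the same way.
matches = term_lists[term_key("pubyear", "2016")]
print(matches)
```

However long the name and value are, the key is always a single 64-bit integer. The trade-off is that two different terms could in principle hash to the same key, which this sketch does not handle.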

In the above example, 2016 is queried as an integer. Does MarkLogic actually store the value as an integer? By default, no. It's stored as the textual representation of the integer value, the same as it appeared in the document, and the above query executes the same as if 2016 were written as a quoted string. Often this type of fuzziness is sufficient. For cases where data type encoding matters, it's possible to use a range index, which is discussed later.
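A toy Python sketch shows what a text-keyed lookup implies. It models only the behavior described above, not MarkLogic itself. The `index_value` and `lookup` helpers and the "02016" document are hypothetical.

```python
from collections import defaultdict

# Value terms keyed by the text exactly as it appeared in the document.
value_terms = defaultdict(set)


def index_value(doc_id, element_name, raw_text):
    value_terms[(element_name, raw_text)].add(doc_id)


def lookup(element_name, value):
    """Look up a value by its textual form, whatever type the caller passes."""
    return value_terms.get((element_name, str(value)), set())


index_value(1, "pubyear", "2016")
index_value(2, "pubyear", "02016")  # numerically equal, textually different

as_integer = lookup("pubyear", 2016)
as_string = lookup("pubyear", "2016")
print(as_integer, as_string)
```

In this sketch the integer and string queries return the same document, while the document holding "02016" is missed because its text differs. That is the kind of case where typed comparison, and so a range index, would matter.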
