15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server

For each range index, MarkLogic creates data structures that make it easy to do two things: for any document ID, get the document's range index value(s); for any range index value, get the document IDs that have that value.

Conceptually, you can think of a range index as implemented by two data structures, written to disk and then memory mapped for efficient access. One can be thought of as an array of structures holding document IDs and values, sorted by document IDs, and the other an array of structures holding values and document IDs, sorted by values. It's not actually this simple or wasteful with memory and disk (in reality, the values are only stored once), but it's a good mental model. With our party example, you'd have a list of birthdays mapped to people, sorted by birthday, and a list of people mapped to birthdays, sorted by person.
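
The mental model above can be sketched in a few lines of Python. This is only an illustration of the two sorted arrays, not MarkLogic's actual on-disk format; the names and birthdays are made-up sample data, and people stand in for document IDs as in the party example.

```python
from datetime import date

# Hypothetical sample data: person (standing in for a document ID) -> birthday.
birthdays = {
    "alice": date(1980, 3, 2),
    "bob": date(1979, 11, 23),
    "carol": date(1980, 3, 2),
    "dave": date(1981, 7, 4),
}

# "Document ID -> value" array, sorted by document ID (here, by person).
by_person = sorted(birthdays.items())

# "Value -> document ID" array, sorted by value (here, by birthday).
by_birthday = sorted((day, person) for person, day in birthdays.items())
```

In this sketch each value is stored twice, once per array, which is exactly the waste the text says the real implementation avoids.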

RANGE QUERIES

To perform a fast range query, MarkLogic uses the "value to document ID" lookup array. Because the lookup array is sorted by value, there's a specific subsequence in the array that holds the values between the two user-defined endpoints. Each of the values in the range has a document ID associated with it, and those document IDs can be quickly gathered and used like a synthetic term list to limit the search result to documents having a matching value within the user's specified range.

For example, to find partygoers with a birthday between January 1, 1980, and May 16, 1980, you'd find the point in the date-sorted range index for the start date, then the end date. Every date in the array between those two endpoints is a birthday of someone at the party, and it's easy to get the people's names because every birthdate has the person listed right next to it. If multiple people have the same birthday, you'll have multiple entries in the array with the same value but a different corresponding name. In MarkLogic, instead of people's names, the system tracks document IDs.
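
The birthday lookup can be sketched with a binary search over a value-sorted array. This is a minimal illustration, assuming inclusive endpoints and a small in-memory list; `range_query` and the sample entries are hypothetical and are not MarkLogic APIs or data.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical "value -> document ID" array, kept sorted by value.
by_birthday = [
    (date(1979, 11, 23), "bob"),
    (date(1980, 1, 15), "erin"),
    (date(1980, 3, 2), "alice"),
    (date(1980, 3, 2), "carol"),
    (date(1981, 7, 4), "dave"),
]

def range_query(index, low, high):
    """Return the IDs whose value v satisfies low <= v <= high."""
    # Copying the values out keeps the sketch short; a real index would
    # binary-search the sorted structure in place instead.
    values = [value for value, _ in index]
    start = bisect_left(values, low)    # first entry with value >= low
    end = bisect_right(values, high)    # one past the last entry <= high
    return [doc_id for _, doc_id in index[start:end]]

matches = range_query(by_birthday, date(1980, 1, 1), date(1980, 5, 16))
```

Here `matches` holds the contiguous run of entries between the two endpoints, including both people who share the March 2 birthday.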

Range queries can be combined with any other types of queries in MarkLogic. Let's say that you want to limit results to those within a date range as above but also having a certain metadata tag. MarkLogic uses the range index to get the set of document IDs in the range, uses a term list to get the set of document IDs with the metadata tag, and intersects the sets of IDs to determine the set of results matching both constraints. All indexes in MarkLogic are fully composable with each other.
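
The intersection step can be sketched with plain Python sets. The document IDs below are invented for illustration, and a set is only a stand-in for whatever structure the server actually uses to hold each list of IDs.

```python
# Hypothetical document IDs returned by the range index for the date range.
ids_in_date_range = {101, 104, 107, 112}

# Hypothetical document IDs from the term list for the metadata tag.
ids_with_tag = {104, 105, 112, 120}

# Documents matching both constraints are those present in both sets.
results = sorted(ids_in_date_range & ids_with_tag)
```

Because both sources reduce to sets of document IDs, adding a third constraint would just mean one more intersection.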

Programmers probably think they're using range indexes only via functions such as cts:element-range-query(). In fact, range indexes are also used to accelerate regular XPath and XQuery expressions. The XPath expression /book[metadata/price > 19.99] looks for books above a certain price and will leverage a decimal (or double or float) range index on the price element if it exists. What if there's no such range index? Then MarkLogic won't be able to use any index to assist with the price
