For each range index, MarkLogic creates data structures that make it easy to do two
things: for any document ID, get the document's range index value(s); for any range
index value, get the document IDs that have that value.

Conceptually, you can think of a range index as implemented by two data structures,
written to disk and then memory mapped for efficient access. One can be thought of
as an array of structures holding document IDs and values, sorted by document IDs,
and the other an array of structures holding values and document IDs, sorted by values.
It's not actually this simple or wasteful with memory and disk (in reality, the values are
only stored once), but it's a good mental model. With our party example, you'd have a
list of birthdays mapped to people, sorted by birthday, and a list of people mapped to
birthdays, sorted by person.
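
The two-array mental model can be sketched in a few lines of Python. This is only an illustration of the concept, not MarkLogic's on-disk format: the document IDs, the birthdays, and the helper names (lookup, values_for_doc, docs_for_value) are all hypothetical, and the values are deliberately stored twice for simplicity.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical party data: document ID -> birthday (one value per document here).
birthdays = {
    101: date(1980, 3, 2),
    102: date(1975, 11, 30),
    103: date(1980, 3, 2),
    104: date(1982, 7, 19),
}

# Array 1: (document ID, value) pairs, sorted by document ID.
by_doc_id = sorted(birthdays.items())

# Array 2: (value, document ID) pairs, sorted by value.
by_value = sorted((value, doc_id) for doc_id, value in birthdays.items())


def lookup(pairs, key):
    """Binary-search sorted (key, payload) pairs and return every payload for key."""
    # The 1-tuple (key,) sorts just before any (key, payload) pair.
    i = bisect_left(pairs, (key,))
    found = []
    while i < len(pairs) and pairs[i][0] == key:
        found.append(pairs[i][1])
        i += 1
    return found


def values_for_doc(doc_id):
    """Document ID -> range index value(s)."""
    return lookup(by_doc_id, doc_id)


def docs_for_value(value):
    """Range index value -> document IDs holding that value."""
    return lookup(by_value, value)
```

Calling values_for_doc(101) returns the single birthday stored for that document, and docs_for_value(date(1980, 3, 2)) returns both documents that share that birthday.
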

RANGE QUERIES

To perform a fast range query, MarkLogic uses the "value to document ID" lookup
array. Because the lookup array is sorted by value, there's a specific subsequence in
the array that holds the values between the two user-defined endpoints. Each of the
values in the range has a document ID associated with it, and those document IDs
can be quickly gathered and used like a synthetic term list to limit the search result to
documents having a matching value within the user's specified range.

For example, to find partygoers with a birthday between January 1, 1980, and May
16, 1980, you'd find the point in the date-sorted range index for the start date, then
the end date. Every date in the array between those two endpoints is a birthday of
someone at the party, and it's easy to get the people's names because every birthdate has
the person listed right next to it. If multiple people have the same birthday, you'll have
multiple entries in the array with the same value but a different corresponding name. In
MarkLogic, instead of people's names, the system tracks document IDs.
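
As a rough sketch of the birthday lookup just described, the value-sorted array can be binary-searched for both endpoints and the document IDs in between collected. The data, the parallel-list layout, and the function name range_query are hypothetical, and treating both endpoints as inclusive is an assumption made for this example.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical "value to document ID" array, sorted by value.
by_value = [
    (date(1975, 11, 30), 102),
    (date(1980, 1, 1), 105),
    (date(1980, 3, 2), 101),
    (date(1980, 3, 2), 103),  # same birthday, different document
    (date(1980, 5, 17), 106),
    (date(1982, 7, 19), 104),
]

# Parallel lists keep the binary search on plain values.
values = [value for value, _ in by_value]
doc_ids = [doc_id for _, doc_id in by_value]


def range_query(low, high):
    """Return the document IDs whose value v satisfies low <= v <= high."""
    start = bisect_left(values, low)   # first entry with value >= low
    end = bisect_right(values, high)   # one past the last entry with value <= high
    return set(doc_ids[start:end])     # acts like a synthetic term list


matches = range_query(date(1980, 1, 1), date(1980, 5, 16))
print(sorted(matches))  # prints [101, 103, 105]
```

The two binary searches cost O(log n), and the work after that is proportional to the number of entries inside the range, which is why a sorted array suits this kind of query.
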

Range queries can be combined with any other types of queries in MarkLogic. Let's
say that you want to limit results to those within a date range as above but also having
a certain metadata tag. MarkLogic uses the range index to get the set of document IDs
in the range, uses a term list to get the set of document IDs with the metadata tag, and
intersects the sets of IDs to determine the set of results matching both constraints. All
indexes in MarkLogic are fully composable with each other.
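
The composition step can be pictured as a set intersection. The sketch below is a simplified model under stated assumptions: the term-list dictionary, the tag names, the document IDs, and the functions ids_in_range and search are hypothetical and do not reflect MarkLogic's internal representation.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical range index on birthday, as parallel lists sorted by value.
values = [
    date(1975, 11, 30), date(1980, 1, 1), date(1980, 3, 2),
    date(1980, 3, 2), date(1980, 5, 17), date(1982, 7, 19),
]
doc_ids = [102, 105, 101, 103, 106, 104]

# Hypothetical term lists: metadata tag -> document IDs carrying that tag.
term_lists = {
    "tag:vip": [101, 104, 105],
    "tag:guest": [102, 103, 106],
}


def ids_in_range(low, high):
    """Document IDs with low <= value <= high (inclusive endpoints assumed)."""
    start = bisect_left(values, low)
    end = bisect_right(values, high)
    return set(doc_ids[start:end])


def search(low, high, term):
    """Intersect the range-index result with the term list for term."""
    in_range = ids_in_range(low, high)
    with_term = set(term_lists.get(term, []))
    return sorted(in_range & with_term)


print(search(date(1980, 1, 1), date(1980, 5, 16), "tag:vip"))  # prints [101, 105]
```

Because each index yields a plain set of document IDs, further constraints can be added by intersecting more sets in the same way.
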

Programmers probably think they're using range indexes only via functions such as
cts:element-range-query(). In fact, range indexes are also used to accelerate
regular XPath and XQuery expressions. The XPath expression
/book[metadata/price > 19.99] looks for books above a certain price and will leverage
a decimal (or double or float) range index on the price element if it exists. What if
there's no such range index? Then MarkLogic won't be able to use any index to assist with the price