15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


Fields are another way to pinpoint what parts of your documents get indexed. They enable you to include (or exclude) different XML elements or JSON properties as a single indexable unit. For example, you can combine two elements (but exclude a third) as a field. By adding a range index on the field, you can perform range operations on those combined values. Or maybe you have documents that define last names in different ways, using differently named elements. By creating a range-indexed field for those variations, you can order your search results across these documents by last name even though the markup varies. See the "Advanced Topics" chapter for more on fields.
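To make the last-name case concrete, here is a minimal Python sketch of the idea behind a range-indexed field. It is a toy model, not MarkLogic's implementation: the element names `last-name` and `surname`, the document URIs, and the helper functions are all hypothetical, chosen only for illustration.

```python
# Toy model of a range-indexed field; not MarkLogic's implementation.
# Element names, URIs, and function names are hypothetical.
import xml.etree.ElementTree as ET

DOCS = {
    "/people/1.xml": "<person><last-name>Nguyen</last-name></person>",
    "/people/2.xml": "<person><surname>Alvarez</surname></person>",
    "/people/3.xml": "<person><last-name>Baker</last-name></person>",
}

# The "field": a set of element names treated as one indexable unit.
FIELD_INCLUDES = {"last-name", "surname"}


def field_values(xml_text, includes):
    """Return the text of every element whose name is part of the field."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.tag in includes and el.text]


def build_field_range_index(docs, includes):
    """Build a sorted list of (value, uri) pairs across all documents."""
    entries = []
    for uri, xml_text in docs.items():
        for value in field_values(xml_text, includes):
            entries.append((value, uri))
    return sorted(entries)


index = build_field_range_index(DOCS, FIELD_INCLUDES)

# Documents ordered by last name, regardless of which element held it.
ordered_uris = [uri for _, uri in index]
```

Because both element names feed the same sorted list, ordering by the field's value works across documents even though their markup differs.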

LEXICONS

When configuring a database, you have the option to configure a "URI lexicon" and a "collection lexicon." These are range indexes in disguise. The URI lexicon tracks the document URIs held in the system, making it possible to quickly extract the URIs matching a query without touching the disk. The collection lexicon tracks the collection URIs, letting you do the same with them. Internally, they're both just like any other range index, with the value being the URI. The lexicon retrieval calls can accept a passed-in cts:query object constraint, making it easy and efficient to find the distinct URIs or collections for documents matching a query.
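The "range index in disguise" idea can be sketched in Python as a sorted list of (value, document ID) pairs. This is a toy model rather than MarkLogic's implementation: the sample documents and function names are hypothetical, and `matching_ids` simply stands in for the set of document IDs that a cts:query would resolve to.

```python
# Toy model of URI and collection lexicons; not MarkLogic's implementation.
# Sample data and function names are hypothetical.

# doc id -> (document URI, collections the document belongs to)
DOCS = {
    1: ("/articles/a.xml", {"articles", "drafts"}),
    2: ("/articles/b.xml", {"articles"}),
    3: ("/reports/q1.xml", {"reports"}),
}


def build_lexicon(docs, values_of):
    """Sorted (value, doc_id) pairs, like a range index keyed on the value."""
    return sorted(
        (value, doc_id)
        for doc_id, doc in docs.items()
        for value in values_of(doc)
    )


URI_LEXICON = build_lexicon(DOCS, lambda doc: [doc[0]])
COLLECTION_LEXICON = build_lexicon(DOCS, lambda doc: doc[1])


def lexicon_values(lexicon, matching_ids=None):
    """Distinct values in sorted order, optionally limited to the documents
    whose IDs are in matching_ids (a stand-in for a query constraint)."""
    values = []
    for value, doc_id in lexicon:
        if matching_ids is not None and doc_id not in matching_ids:
            continue
        if not values or values[-1] != value:
            values.append(value)
    return values
```

For example, `lexicon_values(COLLECTION_LEXICON, {1, 2})` lists only the collections used by documents 1 and 2, without opening either document.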

Why do we need a range index on these things? Isn't this stuff in the indexes? Not by default. Remember that term list lookup keys are hashes, so while it's possible with the Universal Index to find all documents in a collection (hash the collection name to find the term list and look at the document IDs), it's not efficient to find all collections (hashes are one-way). The lexicons can calculate the document and collection URIs the same way regular range indexes extract values from within documents.
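The asymmetry described here can be illustrated with a short Python sketch. It is a toy model under stated assumptions: SHA-256 is used only as a convenient stand-in for whatever hash the server uses internally, and the data and function names are hypothetical.

```python
# Toy model of hashed term-list keys; not MarkLogic's implementation.
# SHA-256 is an assumed stand-in hash; data and names are hypothetical.
import hashlib
from collections import defaultdict


def term_key(kind, value):
    """Hash a term to a fixed-size key; the original string is not stored."""
    return hashlib.sha256(f"{kind}:{value}".encode()).hexdigest()[:16]


DOC_COLLECTIONS = {1: ["articles"], 2: ["articles", "drafts"], 3: ["reports"]}

# Term lists keyed by hash: hashed term -> list of document IDs.
term_lists = defaultdict(list)
for doc_id, collections in DOC_COLLECTIONS.items():
    for name in collections:
        term_lists[term_key("collection", name)].append(doc_id)


def docs_in_collection(name):
    """Forward lookup: hash the known name and read its term list."""
    return term_lists.get(term_key("collection", name), [])


# The keys of term_lists are opaque hashes, so the collection names cannot
# be read back from them. A lexicon keeps the actual values, in sorted order.
collection_lexicon = sorted(
    {name for names in DOC_COLLECTIONS.values() for name in names}
)
```

Looking up a known collection is a single hash and a dictionary read, while listing all collection names needs the separately stored lexicon.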

It's also possible to configure a "word lexicon" to track the individual words in a database (or limited to a certain element). For memory efficiency the word lexicon is kept as a flat list of words with no associated document IDs, lest a common word require a large number of sequential entries in the list. You can still pass a cts:query object constraint to the word lexicon retrieval call to retrieve only words in documents matching the query. To impose that constraint, MarkLogic pulls each word from the lexicon and checks it, via a quick term list lookup, to see if any document IDs from its term list are in the document IDs matching the query. If so, the word is returned; if not, it's on to the next word. Word lexicons can help with wildcard queries, which will be discussed later.
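The filtering loop described above can be sketched in a few lines of Python. This is a toy model, not MarkLogic's implementation: term lists are keyed by the word itself rather than by a hash to keep the example short, and the sample data and function name are hypothetical.

```python
# Toy model of constraining a word lexicon by a query; not MarkLogic's
# implementation. Data and names are hypothetical.

# Flat, sorted list of words with no document IDs attached.
WORD_LEXICON = ["index", "lexicon", "query", "range"]

# Term lists: word -> IDs of documents containing it.
# (Keyed by word here for brevity; the text says real keys are hashes.)
TERM_LISTS = {
    "index": {1, 2, 3},
    "lexicon": {2},
    "query": {3, 4},
    "range": {1},
}


def words_matching(lexicon, term_lists, matching_ids):
    """Return lexicon words that occur in at least one matching document."""
    result = []
    for word in lexicon:
        # Keep the word only if its term list shares a document ID with
        # the set of IDs that matched the query.
        if term_lists.get(word, set()) & matching_ids:
            result.append(word)
    return result
```

Here `matching_ids` plays the role of the document IDs matching the cts:query constraint; a word is kept as soon as its term list overlaps that set.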

DATA MANAGEMENT

In the next section, we'll take a look at how MarkLogic manages data on disk and handles concurrent reads and writes.
