Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
MarkLogic then sorts the document IDs based on relevance score, calculated using<br />
term frequency data held in the term lists and term frequency statistics. This produces a<br />
score-sorted set of candidate document IDs.<br />
At this point MarkLogic starts iterating over the sorted document IDs. Because this is a<br />
filtered search, MarkLogic opens the highest-scoring document and filters it by looking<br />
inside it to make sure it's a match. If it's not a match, it gets skipped. If it is, it continues<br />
on to the return clause. 1<br />
As each document makes it to the return clause, MarkLogic extracts a few nodes from<br />
the document and constructs a new HTML node in memory for the result sequence.<br />
After 10 hits (because of the [1 to 10] predicate), the search expression finishes and<br />
MarkLogic stops iterating.<br />
Now, what if all indexes aren't enabled? As always, MarkLogic does the best it can<br />
with the indexes you've enabled. For example, if you don't have any positional indexes,<br />
MarkLogic applies the positional constraints during the filter phase. If you don't have<br />
fast element phrase searches to help with the phrase match, MarkLogic<br />
uses fast element word searches and element word positions instead. If you don't have those, it<br />
keeps falling back to other indexes like fast phrase searches. The less specific the indexes,<br />
the more candidate documents to consider and the more documents that might have to<br />
be discarded during the filter phase. The end result after filtering, however, is always<br />
the same.<br />
INDEXING DOCUMENT METADATA<br />
By now you should have a clear understanding of how MarkLogic uses term lists for<br />
indexing both text and structure. MarkLogic doesn't stop there; it turns out that term<br />
lists are useful for indexing many other things such as collections, directories, and<br />
security rules. That's why it's named the Universal Index.<br />
COLLECTION INDEXES<br />
Collections in MarkLogic are a way to tag documents as belonging to a named group (a<br />
taxonomic designation, origin, whether it's draft or published, etc.) without modifying<br />
the document itself. Each document can be tagged as belonging to any number of<br />
collections. A query constraint can limit the scope of a search to a certain collection or<br />
set of collections.<br />
Collection constraints are implemented internally as just another term list to be<br />
intersected. There's a term list for each collection tracking the documents within that<br />
collection. If you limit a query to a collection, it intersects that collection's term list with<br />
the other constraints in the query. If you limit a query to two collections, the two term<br />
lists are unioned into one before being intersected.<br />
1 Previously read term lists and documents get cached for efficiency, a topic discussed later.<br />
22