15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server

Inside-MarkLogic-Server

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

MarkLogic then sorts the document IDs based on relevance score, calculated using<br />

term frequency data held in the term lists and term frequency statistics. This produces a<br />

score-sorted set of candidate document IDs.<br />

At this point MarkLogic starts iterating over the sorted document IDs. Because this is a<br />

filtered search, MarkLogic opens the highest-scoring document and filters it by looking<br />

inside it to make sure it's a match. If it's not a match, it gets skipped. If it is, it continues<br />

on to the return clause. 1<br />

As each document makes it to the return clause, MarkLogic extracts a few nodes from<br />

the document and constructs a new HTML node in memory for the result sequence.<br />

After 10 hits (because of the [1 to 10] predicate), the search expression finishes and<br />

MarkLogic stops iterating.<br />

Now, what if all indexes aren't enabled? As always, MarkLogic does the best it can<br />

with the indexes you've enabled. For example, if you don't have any positional indexes,<br />

MarkLogic applies the positional constraints during the filter phase. If you don't have<br />

fast element phrase searches to help with the phrase match, MarkLogic<br />

uses fast element word searches and element word positions instead. If you don't have those, it<br />

keeps falling back to other indexes like fast phrase searches. The less specific the indexes,<br />

the more candidate documents to consider and the more documents that might have to<br />

be discarded during the filter phase. The end result after filtering, however, is always<br />

the same.<br />

INDEXING DOCUMENT METADATA<br />

By now you should have a clear understanding of how MarkLogic uses term lists for<br />

indexing both text and structure. MarkLogic doesn't stop there; it turns out that term<br />

lists are useful for indexing many other things such as collections, directories, and<br />

security rules. That's why it's named the Universal Index.<br />

COLLECTION INDEXES<br />

Collections in MarkLogic are a way to tag documents as belonging to a named group (a<br />

taxonomic designation, origin, whether it's draft or published, etc.) without modifying<br />

the document itself. Each document can be tagged as belonging to any number of<br />

collections. A query constraint can limit the scope of a search to a certain collection or<br />

set of collections.<br />

Collection constraints are implemented internally as just another term list to be<br />

intersected. There's a term list for each collection tracking the documents within that<br />

collection. If you limit a query to a collection, it intersects that collection's term list with<br />

the other constraints in the query. If you limit a query to two collections, the two term<br />

lists are unioned into one before being intersected.<br />

1 Previously read term lists and documents get cached for efficiency, a topic discussed later.<br />

22

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!