15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server

Inside-MarkLogic-Server

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

If you have the fast phrase searches option enabled, MarkLogic will incorporate two-word<br />

terms into its inverted index. With this index enabled, MarkLogic can use the third<br />

approach listed above. This makes phrase queries very efficient at the cost of slightly<br />

larger indexes on disk and slightly slower performance during document ingestion, since<br />

extra entries must be added to the indexes.<br />

If you have the word positions option enabled, MarkLogic will use position information<br />

to resolve the phrase—the second approach listed above. This index resolution isn't<br />

as efficient as fast phrase searches because it requires position-matching work, but word<br />

positions can also support proximity queries that look for words near each other but not<br />

necessarily next to each other. Word positions also enables "mild not" where you can, for<br />

example, search for Mexico but not if it appears in the phrase New Mexico.<br />

FAST PHRASE SEARCHES<br />

WORD POSITIONS<br />

Term Doc<br />

Term Doc:Pos<br />

a 1<br />

a blue 1<br />

blue<br />

1, 2<br />

blue car 2, 3<br />

…<br />

…<br />

a 1:1<br />

blue 1:2<br />

car<br />

1:3, 2:3<br />

red 2:2, 3:2<br />

…<br />

…<br />

Figure 2: With fast phrase searches turned on, MarkLogic also indexes word pairs. The word positions<br />

setting indexes information about a term's location in the document.<br />

If neither the fast phrase searches nor word positions index is enabled, then MarkLogic will<br />

use the simple word indexes as best it can and rely on filtering the results. Filtering is<br />

what MarkLogic calls the act of opening a document and checking to see if it's a true<br />

match. Having the indexes off and relying on filtering keeps the indexes smaller but will<br />

slow queries proportional to how many candidate documents aren't actually matching<br />

documents—that is, how many have to be read as candidates but are thrown away<br />

during filtering.<br />

It's important to note that the query results will include the same documents in each<br />

case; it's mostly just a matter of performance tradeoffs. Relevance order may differ<br />

slightly based on index settings because having more indexes available (as with fast phrase<br />

searches) will enable MarkLogic to do more accurate relevance calculations, because the<br />

calculations have to be made using indexes before pulling documents off disk. We'll talk<br />

more about relevance calculation later.<br />

14

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!