15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


CHAPTER 3

ADVANCED TOPICS

ADVANCED TEXT HANDLING

At the start of this book, we introduced MarkLogic's Universal Index and explained how MarkLogic uses term lists to index words and phrases as well as structure. In that section, we only scratched the surface of what MarkLogic can do regarding text indexing. In this section, we'll dig a little deeper.

Note that these indexes work the same as the ones you've already learned about. Each new index option just tells MarkLogic to track a new type of term list, making index resolution more efficient and xdmp:estimate() calls more precise.

TEXT SENSITIVITY OPTIONS

Sometimes when querying text, you'll need to specify whether you desire a case-sensitive match. For example, "Polish" and "polish" mean different things.[1] It's easy to specify this as part of your query; just pass a "case-sensitive" or "case-insensitive" option to each query term.[2] The question is, how does MarkLogic resolve these queries?
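When a query term carries neither option, the default described in note 2 applies. As a rough illustration only, that rule can be sketched in a few lines of Python. The function name `effective_case_option` is hypothetical and is not part of any MarkLogic API; the sketch simply restates the rule from the note.

```python
from typing import Optional

VALID_OPTIONS = ("case-sensitive", "case-insensitive")


def effective_case_option(term: str, option: Optional[str] = None) -> str:
    """Return the case option a query term ends up with (toy model of note 2)."""
    if option is not None:
        if option not in VALID_OPTIONS:
            raise ValueError(f"unknown option: {option!r}")
        # An explicit option always wins over the default rule.
        return option
    # No explicit option: any uppercase character signals that case matters.
    if any(ch.isupper() for ch in term):
        return "case-sensitive"
    # An all-lowercase term is treated as case-insensitive.
    return "case-insensitive"
```

Under this sketch, `effective_case_option("polish")` gives "case-insensitive", `effective_case_option("Polish")` gives "case-sensitive", and an explicit option such as `effective_case_option("Polish", "case-insensitive")` overrides the default.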

By default, MarkLogic maintains only case-insensitive term list entries (think of them as having every term lowercased). If you conduct a query with a case-sensitive term, MarkLogic will rely on indexes to find case-insensitive matches and on filtering to identify the true case-sensitive matches. That's fine when case-sensitive searches are rare, but when they're more common, you can improve efficiency by turning on the "fast case sensitive searches" index option. This tells MarkLogic to maintain case-sensitive term list entries alongside the case-insensitive ones. With the index enabled, case-insensitive terms will use the case-insensitive term list entries, case-sensitive terms will use the case-sensitive term list entries, and all results will resolve quickly and accurately out of indexes.
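To make the trade-off concrete, here is a toy model in Python of the two resolution paths just described. It is only a sketch under simplifying assumptions: the `ToyIndex` class, its methods, and the sample documents are all hypothetical, words are split on whitespace, and nothing here reflects MarkLogic's actual on-disk structures. The `candidates` method stands in for index resolution alone (roughly what an unfiltered estimate would count), while `search` adds the filtering step when the index cannot answer precisely.

```python
from collections import defaultdict


class ToyIndex:
    """Toy term-list index; not MarkLogic's real implementation."""

    def __init__(self, fast_case_sensitive: bool = False):
        self.fast_case_sensitive = fast_case_sensitive
        self.docs = {}                       # doc id -> original text
        self.insensitive = defaultdict(set)  # lowercased term -> doc ids
        self.sensitive = defaultdict(set)    # exact-case term -> doc ids

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for word in text.split():
            # Case-insensitive entries are always maintained.
            self.insensitive[word.lower()].add(doc_id)
            # Exact-case entries exist only when the option is on.
            if self.fast_case_sensitive:
                self.sensitive[word].add(doc_id)

    def candidates(self, term: str, case_sensitive: bool) -> set:
        """Index resolution only: may over-report without exact-case entries."""
        if case_sensitive and self.fast_case_sensitive:
            return set(self.sensitive.get(term, set()))
        return set(self.insensitive.get(term.lower(), set()))

    def search(self, term: str, case_sensitive: bool) -> set:
        """Index resolution, plus filtering when the index is imprecise."""
        hits = self.candidates(term, case_sensitive)
        if case_sensitive and not self.fast_case_sensitive:
            # Filtering: re-read each candidate document to check exact case.
            hits = {d for d in hits if term in self.docs[d].split()}
        return hits


sample = {
    1: "Polish sausage recipe",
    2: "how to polish silver",
    3: "shoe polish and Polish pottery",
}
default_index = ToyIndex()
fast_index = ToyIndex(fast_case_sensitive=True)
for doc_id, text in sample.items():
    default_index.add(doc_id, text)
    fast_index.add(doc_id, text)

print(sorted(default_index.candidates("Polish", True)))  # [1, 2, 3]
print(sorted(default_index.search("Polish", True)))      # [1, 3]
print(sorted(fast_index.candidates("Polish", True)))     # [1, 3]
```

In the default configuration the index over-reports document 2 and filtering has to remove it; with the exact-case entries in place, the candidate set is already correct and no documents need to be re-read. The price in this model, as in the text above, is a second set of term list entries to maintain.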

[1] Words like this that have different meanings when capitalized are called "capitonyms." Another example: "March" and "march." Or "Josh" and "josh."

[2] If you don't specify "case-sensitive" or "case-insensitive," MarkLogic does something interesting: it looks at the case of your query term. If it's all lowercase, MarkLogic assumes that case doesn't matter to you and treats it as case-insensitive. If the query term includes any uppercase characters, MarkLogic assumes that case does matter and treats it as case-sensitive.
