CHAPTER 3<br />
ADVANCED TOPICS<br />
ADVANCED TEXT HANDLING<br />
At the start of this book, we introduced MarkLogic's Universal Index and explained<br />
how MarkLogic uses term lists to index words and phrases as well as structure. In<br />
that section, we only scratched the surface of what MarkLogic can do regarding text<br />
indexing. In this section, we'll dig a little deeper.<br />
Note that these indexes work the same way as the ones you've already learned about. Each<br />
new index option just tells MarkLogic to track a new type of term list, making index<br />
resolution more efficient and xdmp:estimate() calls more precise.<br />
TEXT SENSITIVITY OPTIONS<br />
Sometimes when querying text, you'll need to specify whether you desire a case-sensitive<br />
match. For example, "Polish" and "polish" mean different things.¹ It's easy to specify this<br />
as part of your query; just pass a "case-sensitive" or "case-insensitive" option to each<br />
query term.² The question is, how does MarkLogic resolve these queries?
By default, MarkLogic maintains only case-insensitive term list entries (think of them<br />
as having every term lowercased). If you conduct a query with a case-sensitive term,<br />
MarkLogic will rely on indexes to find case-insensitive matches and filtering to identify<br />
the true case-sensitive matches. That's fine when case-sensitive searches are rare, but<br />
when they're more common, you can improve efficiency by turning on the fast case<br />
sensitive searches index option. This tells MarkLogic to maintain case-sensitive term list<br />
entries alongside the case-insensitive ones. With the index enabled, case-insensitive terms will<br />
use the case-insensitive term list entries, case-sensitive terms will use the case-sensitive<br />
term list entries, and all results will resolve quickly and accurately out of indexes.<br />
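The difference between the two configurations can be sketched with a toy model in Python. This is only an illustration of the idea described above, not MarkLogic's implementation: the `ToyIndex` class, its `candidates` and `search` methods, and the sample documents are all hypothetical names invented for this sketch. `candidates` stands in for index resolution (what an estimate would count), and `search` adds a filtering pass over the stored text.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (a simplification)."""
    words = (w.strip(PUNCTUATION) for w in text.split())
    return [w for w in words if w]


class ToyIndex:
    """Hypothetical model of term lists; not MarkLogic's actual data structures."""

    def __init__(self, fast_case_sensitive=False):
        self.fast_case_sensitive = fast_case_sensitive
        self.docs = {}
        self.insensitive = defaultdict(set)  # lowercased term -> document ids
        self.sensitive = defaultdict(set)    # exact-case term -> document ids

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for word in tokenize(text):
            self.insensitive[word.lower()].add(doc_id)
            if self.fast_case_sensitive:
                self.sensitive[word].add(doc_id)

    def candidates(self, term, case_sensitive):
        """Index resolution only: the ids that the term lists alone can supply."""
        if case_sensitive and self.fast_case_sensitive:
            return set(self.sensitive.get(term, set()))
        return set(self.insensitive.get(term.lower(), set()))

    def search(self, term, case_sensitive):
        """Index resolution followed by filtering against the stored text."""
        hits = self.candidates(term, case_sensitive)
        if case_sensitive:
            hits = {d for d in hits if term in tokenize(self.docs[d])}
        return hits


SAMPLE_DOCS = {
    1: "She speaks Polish at home.",
    2: "Please polish the silver.",
    3: "The march begins in March.",
}

default = ToyIndex()                          # case-insensitive term lists only
fast = ToyIndex(fast_case_sensitive=True)     # both kinds of term list
for doc_id, text in SAMPLE_DOCS.items():
    default.add(doc_id, text)
    fast.add(doc_id, text)

# Case-sensitive query for "Polish" under each configuration.
print(sorted(default.candidates("Polish", True)))  # [1, 2]: index over-matches
print(sorted(default.search("Polish", True)))      # [1]: filtering removes doc 2
print(sorted(fast.candidates("Polish", True)))     # [1]: accurate from the index
```

In this sketch, the default configuration needs the filtering pass to discard document 2, so a count taken from the index alone overstates the matches. With the extra term list, the index already returns the exact answer.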
¹ Words like this that have different meanings when capitalized are called "capitonyms." Other examples
include "March" and "march," and "Josh" and "josh."
² If you don't specify "case-sensitive" or "case-insensitive," MarkLogic does something interesting: it looks at
the case of your query term. If it's all lowercase, MarkLogic assumes that case doesn't matter to you and treats<br />
it as case-insensitive. If the query term includes any uppercase characters, MarkLogic assumes that case does<br />
matter and treats it as case-sensitive.<br />
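The default described in this footnote amounts to a one-line rule, sketched here in Python. The function name `default_case_option` is hypothetical and exists only for illustration; it mirrors the rule as stated in the footnote and is not a MarkLogic API.

```python
def default_case_option(term: str) -> str:
    """Return the option implied by the term's own casing (illustrative only).

    Any uppercase character means the query is treated as case-sensitive;
    an all-lowercase term is treated as case-insensitive.
    """
    if any(ch.isupper() for ch in term):
        return "case-sensitive"
    return "case-insensitive"


print(default_case_option("polish"))  # case-insensitive
print(default_case_option("Polish"))  # case-sensitive
```

Under this rule, a lowercase query for "polish" matches both the verb and the nationality, while a query for "Polish" matches only the capitalized form.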