MARKLOGIC SERVER

subchild. What would you do? With the simple element term list from above you can find documents having <book>, <metadata>, and <title> elements. That's good, but it doesn't respect the required hierarchical relationship. It's like a phrase query that looks for words without concern for placement.

MarkLogic uses a unique parent-child index to track element hierarchies. It's much like the fast phrase searches index, except that instead of using adjoining words as the term key it uses parent-child names. There's a term list tracking documents that have a book/metadata relationship (that is, a <book> as a parent of a <metadata>) and another for metadata/title. There's even one tracking which documents have any particular root element. Intersecting these three lists produces an even better set of candidate documents. Now you can see how the XPath /a/b/c can be resolved very much like the phrase "a b c".

The parent-child index lets you search against an XPath even when you don't know the path in advance. The index is so useful that MarkLogic always maintains it; there's no configuration option to turn it off.

Note that even with the parent-child index, there's still a small potential for documents to be in the candidate set that aren't an actual match. Knowing that somewhere inside an XML document there's a <book> parent of <metadata> and a <metadata> parent of <title> doesn't mean it's the same <metadata> between them. That's where filtering comes in. MarkLogic confirms each of the results by looking inside the document before the programmer sees them. While the document is open, MarkLogic also extracts any nodes that match the query. Remember that the goal of index resolution is to make the candidate set so small and accurate that very little filtering is needed.

Parent-Child Index (Doc 1)

Term    Doc
a/b     1
a/e     1
b/c     1
b/d     1

Figure 3: MarkLogic indexes the parent-child relationships of XML elements to enable search based on document structure. (It does the same for the properties in JSON documents.)

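The two-stage process described above (intersect parent-child term lists, then filter the candidates) can be sketched in a few lines of Python. This is only an illustration of the idea, not MarkLogic's implementation: the function names, the "root:" term prefix, and the three sample documents are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def parent_child_terms(xml_text):
    """Terms for one document: a root term plus one 'parent/child' term per edge."""
    root = ET.fromstring(xml_text)
    terms = {"root:" + root.tag}          # hypothetical key format for the root term
    for parent in root.iter():
        for child in parent:
            terms.add(parent.tag + "/" + child.tag)
    return terms

def build_index(docs):
    """Map each term to the set of document IDs containing it (a term list)."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for term in parent_child_terms(xml_text):
            index[term].add(doc_id)
    return index

def candidates(index, path):
    """Index resolution: intersect the term lists for every step of the path."""
    steps = path.strip("/").split("/")
    terms = ["root:" + steps[0]] + [p + "/" + c for p, c in zip(steps, steps[1:])]
    return set.intersection(*[index.get(t, set()) for t in terms])

def filter_matches(docs, doc_ids, path):
    """Filtering: open each candidate and confirm the path really exists."""
    steps = path.strip("/").split("/")
    rest = "/".join(steps[1:])
    confirmed = set()
    for doc_id in doc_ids:
        root = ET.fromstring(docs[doc_id])
        if root.tag == steps[0] and (not rest or root.find(rest) is not None):
            confirmed.add(doc_id)
    return confirmed

# Sample documents (invented for this sketch).
docs = {
    1: "<book><metadata><title>A</title></metadata></book>",
    # Doc 2 has book/metadata and metadata/title, but on different <metadata> elements.
    2: "<book><metadata><author>B</author></metadata>"
       "<appendix><metadata><title>C</title></metadata></appendix></book>",
    3: "<article><metadata><title>D</title></metadata></article>",
}
index = build_index(docs)
path = "/book/metadata/title"
found = candidates(index, path)
print(sorted(found))                               # [1, 2]
print(sorted(filter_matches(docs, found, path)))   # [1]
```

Document 2 is the kind of false positive the text warns about: every term list contains it, yet the path does not actually occur, so only the filtering step removes it.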
INDEXING VALUES

Now what if we want to search for element values? Let's imagine that I ask you for books published in a certain year. In XPath, that can be expressed as /book[metadata/pubyear = 2016]. How do we resolve this efficiently? Thinking back to the paper-based approach, what you want to do is maintain a term list for each XML element value (or JSON property value). In other words, you can track a term list for documents having a <pubyear> equal to 2016, as well as any other element name with any other value you find during indexing. Then, for any query asking for an element with a particular value, you can immediately resolve which documents have it directly from indexes. Intersect that value index with the parent-child structural indexes discussed above, and you've used several small indexes in cooperation to match a larger query. It works even when you don't know the schema or query in advance. This is how you build a database using the heart of a search engine.

Can an element-value index be stored efficiently? Yes, thanks to hashing. Instead of storing the full element name and value, you can hash the element name and value down to a succinct integer and use that as the term list lookup key. Then no matter how long the element name and value, it's actually a small entry in the index. MarkLogic uses hashes behind the scenes to store all term list keys, element-value or otherwise, for the sake of efficiency. The element-value index has proven to be so efficient and useful that it's always and automatically enabled within MarkLogic.

In the above example, 2016 is queried as an integer. Does MarkLogic actually store the value as an integer? By default, no. It's stored as the textual representation of the integer value, the same as it appeared in the document, and the above query executes the same as if 2016 were surrounded by string quotes. Often this type of fuzziness is sufficient. For cases where data type encoding matters, it's possible to use a range index, which is discussed later.
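As a rough illustration of the hashed element-value index, the Python sketch below keys each term list by a fixed-size integer derived from the element name and the text of its value. The hash function (a truncated SHA-256), the helper names, and the sample documents are assumptions chosen for the example; the text does not say which hash MarkLogic uses.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def term_key(name, value):
    """Hash an element name and the text form of its value to a 64-bit integer.

    Assumption: SHA-256 truncated to 8 bytes stands in for whatever hash the
    real server uses. The value is converted with str(), so 2016 and "2016"
    produce the same key.
    """
    data = (name + "\x00" + str(value)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def build_value_index(docs):
    """Map hashed (element name, value) keys to sets of document IDs."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            text = (elem.text or "").strip()
            if text and len(elem) == 0:      # leaf elements that carry a value
                index[term_key(elem.tag, text)].add(doc_id)
    return index

def lookup(index, name, value):
    """Return the term list for documents where element `name` equals `value`."""
    return index.get(term_key(name, value), set())

# Sample documents (invented for this sketch).
docs = {
    1: "<book><metadata><title>Alpha</title><pubyear>2016</pubyear></metadata></book>",
    2: "<book><metadata><title>Beta</title><pubyear>2015</pubyear></metadata></book>",
    3: "<book><metadata><title>Gamma</title><pubyear>2016</pubyear></metadata></book>",
}
value_index = build_value_index(docs)
print(sorted(lookup(value_index, "pubyear", 2016)))     # [1, 3]
print(sorted(lookup(value_index, "pubyear", "2016")))   # [1, 3]
```

Because the key is built from the value's text, the integer and string forms of the query land on the same term list. In this sketch a differently written value such as "02016" would hash to a different key, which is the sort of case where typed comparison becomes relevant.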