MARKLOGIC SERVER

PROPERTIES INDEXES

Each document within MarkLogic has an optional XML-based properties sheet. Properties are a convenient place for holding metadata about JSON, binary, or text documents that otherwise wouldn't have a place for an XML description. They're also useful for adding XML metadata to an XML document whose schema can't be altered. Depending on configuration settings, MarkLogic can use properties for auto-tracking each document's last modified time.

Properties are represented as regular XML documents, held under the same URI (document name) as the document they describe but only available via specific calls. Properties are indexed and can be queried just like regular XML documents. If a query declares constraints on both the main document and the properties sheet (like finding documents matching a query that were updated within the last hour), MarkLogic uses indexes to independently find the properties matches and the main document matches and does a hash join (based on the URI that they share) to determine the final set of matches. It's not quite as efficient as if the properties values were held within the main document itself, but it's close.

FRAGMENTATION

So far we've used the word document to represent each unit of content. That's a bit of a simplification. MarkLogic actually indexes, retrieves, and stores things called fragments. The default fragment size is the document, and that's how most people leave it (and should leave it). However, with XML documents, it's also possible to break documents into sub-document fragments through configured fragment root or fragment parent settings controlled via the Admin Interface or the APIs. This can be helpful when handling a large document where the unit of indexing, retrieval, storage, and relevance scoring should be something smaller than a document. You specify a QName (a technical term for an XML element name) as a fragment root, and the system automatically splits the document internally at that breakpoint.
Or, you can specify a QName as a fragment parent to make each of its child elements a fragment root. You cannot break JSON documents into sub-document fragments.

FRAGMENT VS. DOCUMENT

You can picture using fragmentation on a book. A full book may be the right unit of index, retrieval, and update, but perhaps it's too large. Perhaps in a world where you're doing chapter-based search and chapter-based display it would be better to have <chapter> as the fragment root. With that change, each book document then becomes a series of fragments: one for the root element holding the metadata about the book and a series of distinct fragments for each <chapter>. The book still exists as a document, with a single URI, and it can be stored and retrieved as a single item, but internally it's broken into pieces.
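The book example can be pictured with a small toy model. The Python sketch below is not how MarkLogic splits documents internally; it only illustrates the idea that a fragment root turns one document into a root fragment plus one fragment per matching element. The sample XML, its element names, and the split_into_fragments helper are all hypothetical, and for simplicity the helper only looks at direct children of the document root.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
BOOK_XML = """
<book>
  <title>Example Book</title>
  <author>A. Writer</author>
  <chapter><heading>One</heading><p>First chapter text.</p></chapter>
  <chapter><heading>Two</heading><p>Second chapter text.</p></chapter>
</book>
"""


def split_into_fragments(xml_text, fragment_root):
    """Toy model: return [root fragment, sub-fragment, sub-fragment, ...].

    Assumption: only direct children of the document root that are named
    `fragment_root` become their own fragments.
    """
    root = ET.fromstring(xml_text)

    # The root fragment keeps everything except the fragment-root elements.
    root_fragment = copy.deepcopy(root)
    for child in list(root_fragment):
        if child.tag == fragment_root:
            root_fragment.remove(child)

    # Each fragment-root element becomes a self-contained fragment.
    sub_fragments = [copy.deepcopy(c) for c in root if c.tag == fragment_root]
    return [root_fragment] + sub_fragments


fragments = split_into_fragments(BOOK_XML, "chapter")
for number, fragment in enumerate(fragments):
    print(number, fragment.tag, [child.tag for child in fragment])
```

In this model the document still has one identity (the original XML text), while the list of fragments stands in for the pieces that would be indexed and retrieved separately.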
Every fragment acts as its own self-contained unit. It's the unit of indexing. A term list doesn't truly reference document IDs; it references fragment IDs. The filtering and retrieval process doesn't actually load documents; it loads fragments.

There's actually very little difference between fragmenting a book at the chapter level and just splitting each chapter element into its own document as part of the load. That's why people generally avoid fragmentation and just keep each document as its own singular fragment. It's a slightly easier mental model. In fact, if you see "fragment" in MarkLogic literature (including this book), you can substitute "document" and the statement will be correct for any database where no fragmentation is enabled.

There's one noticeable difference between a fragmented document and a document split into individual documents: a query pulling data from two fragments residing in the same document can perform slightly more efficiently than a query pulling data from two documents. See the documentation for the cts:document-fragment-query() query construct for more details. Even with this advantage, fragmentation isn't something you should enable unless you're sure you need it.

ESTIMATE AND COUNT

You'll find that you really understand MarkLogic's indexing and fragmentation system when you understand the difference between the xdmp:estimate() and fn:count() functions, so let's look at them here. Both take an expression and return the number of items matching that expression.

The xdmp:estimate() call estimates the number of items using nothing but indexes. That's why it's so fast. It resolves the given expression using indexes and returns how many fragments the indexes see as satisfying all of the term list constraints. The fn:count() call returns a number of items based on the actual number of document fragments.
This involves indexes and also filtering of the fragments to see which ones truly match the expression and how many times it matches per document. That filtering takes time (due mostly to disk I/O), which is why it's not always fast, even if it's always accurate.

It's interesting to note that the xdmp:estimate() call can return results both higher and lower than, as well as identical to, those of fn:count(), depending on the query, data schema, and index options. The estimate results are higher when the index system returns fragments that would be filtered away. For example, a case-sensitive search performed without benefit of a case-sensitive index will likely have some candidate
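The contrast between an index-only estimate and a filtered count can be sketched with a toy model in Python. This is not MarkLogic's implementation and does not call xdmp:estimate() or fn:count(); the estimate and count functions, the FRAGMENTS data, and the word-level index are all hypothetical stand-ins. The sketch assumes a case-insensitive term index, a case-sensitive query, and fragments that each hold several items.

```python
# Toy model only. Each fragment ID maps to the items (for example,
# paragraphs) it contains. All data and names here are hypothetical.
FRAGMENTS = {
    1: ["polish the silver"],
    2: ["Visit Polish cities", "Polish grammar", "grammar drills"],
    3: ["shoe polish"],
}


def build_case_insensitive_index(fragments):
    """Map each lowercased word to the set of fragment IDs containing it."""
    index = {}
    for fragment_id, items in fragments.items():
        for item in items:
            for word in item.split():
                index.setdefault(word.lower(), set()).add(fragment_id)
    return index


INDEX = build_case_insensitive_index(FRAGMENTS)


def estimate(word):
    """Index-only answer: how many fragments the term list points at."""
    return len(INDEX.get(word.lower(), set()))


def count(word):
    """Filtered answer: load each candidate fragment and count the items
    that truly contain the word, compared case-sensitively."""
    candidates = INDEX.get(word.lower(), set())
    return sum(
        1
        for fragment_id in candidates
        for item in FRAGMENTS[fragment_id]
        if word in item.split()
    )


print(estimate("Polish"), count("Polish"))    # 3 2 (estimate is higher)
print(estimate("grammar"), count("grammar"))  # 1 2 (estimate is lower)
print(estimate("shoe"), count("shoe"))        # 1 1 (identical)
```

In this model the estimate runs high for "Polish" because the case-insensitive index offers fragments 1 and 3 as candidates that filtering then rejects, and it runs low for "grammar" because a single fragment holds two matching items.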