MARKLOGIC SERVER


The first block of code finds all of the author IDs for people who've been on Twitter for at least a year. It uses the signup-date range index to resolve the cts:element-range-query() constraint and an author-id range index for the cts:element-values() retrieval. This should quickly get us a long list of $author-ids.

The second block uses that set of $author-ids as a search constraint, combining it with the actual text constraint. Without the capabilities of a range index, MarkLogic would have to read a separate term list for every author ID to find the documents associated with that author, with a potential disk seek per author. With a range index, MarkLogic can map author IDs to document IDs using just in-memory lookups. This is often called a "shotgun or" or (for the more politically correct) a "scatter query." For long lists, it's vastly more efficient than looking up the individual term lists.

USING RANGE INDEXES ON PATHS AND FIELDS FOR EXTRA OPTIMIZATION

Traditional range indexes let you specify the name of an element or element-attribute against which to build a range index. Path range indexes let you be more specific. Instead of having to include all elements or element-attributes with the same name, you can limit inclusion to those in a certain path. This proves particularly useful if you have elements with the same name but slightly different meanings based on placement. For example, the DocBook standard has a <title> element, but that can represent a book title, a chapter title, or a section title, among others. To handle this difference, you can define range index paths for book/title, chapter/title, and section/title.

As another example, perhaps you have prices that differ by currency, and you want to maintain separate range indexes. They can be defined using predicates such as product/price[@currency = "USD"] and product/price[@currency = "SGD"].

Path definitions are very flexible.
They can be relative or absolute, can include wildcard steps (*), and can even include predicates (the things in square brackets). [4]

The core purpose of path range indexes is to give you more specific control over what goes into a range index. However, they also enable a deeper optimization of XPath. Earlier we looked at the expression /book[metadata/pubyear > 2010] and noted how a range index on <pubyear> can be used to resolve the query. If there's also a path range index that matches, then because it's more specific, it will be used instead. If there's an integer path range index on /book/metadata/pubyear, that range index alone can resolve the full XPath; the term lists aren't really necessary.

[4] There are also costs associated with path indexes, since it's more work to calculate them and filter with them compared to straight element indexes. If you have a lot of overlapping paths defined, you may see greater index sizes and query times. So if straight element indexes work for you, use them.
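To make the /book[metadata/pubyear > 2010] case concrete, here is a toy model in Python. It is only a sketch of the idea, not how MarkLogic stores or evaluates its indexes: the documents, the helper names (element_index, path_index, docs_greater_than), and the choice to match paths by exact string comparison are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical toy data: doc ID -> list of (absolute path, value) leaf nodes.
DOCS = {
    1: [("/book/title", "Old Book"), ("/book/metadata/pubyear", 2008)],
    2: [("/book/title", "New Book"), ("/book/metadata/pubyear", 2012)],
    3: [("/article/title", "New Article"), ("/article/metadata/pubyear", 2015)],
}

def element_index(docs, name):
    """Sorted (value, doc_id) pairs for every element named `name`, anywhere."""
    return sorted((value, doc_id)
                  for doc_id, nodes in docs.items()
                  for path, value in nodes
                  if path.rsplit("/", 1)[-1] == name)

def path_index(docs, indexed_path):
    """Sorted (value, doc_id) pairs for elements at exactly `indexed_path`."""
    return sorted((value, doc_id)
                  for doc_id, nodes in docs.items()
                  for path, value in nodes
                  if path == indexed_path)

def docs_greater_than(index, bound):
    """Doc IDs whose indexed value is > bound, located by binary search."""
    values = [value for value, _ in index]
    start = bisect_right(values, bound)
    return {doc_id for _, doc_id in index[start:]}

by_name = element_index(DOCS, "pubyear")
by_path = path_index(DOCS, "/book/metadata/pubyear")

# The name-only index also returns the article, so a separate /book check is needed.
print(docs_greater_than(by_name, 2010))
# The path index holds only book pubyears, so it answers the XPath by itself.
print(docs_greater_than(by_path, 2010))
```

In this toy, the name-only index yields documents 2 and 3, while the path index yields only document 2, which is the sense in which a matching path range index can resolve the whole expression without consulting term lists.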
Fields are another way to pinpoint what parts of your documents get indexed. They enable you to include (or exclude) different XML elements or JSON properties as a single indexable unit. For example, you can combine two elements (but exclude a third) as a field. By adding a range index on the field, you can perform range operations on those combined values. Or maybe you have documents that define last names in different ways, using differently named elements. By creating a range-indexed field for those variations, you can order your search results across these documents by last name even though the markup varies. See the "Advanced Topics" chapter for more on fields.

LEXICONS

When configuring a database, you have the option to configure a "URI lexicon" and a "collection lexicon." These are range indexes in disguise. The URI lexicon tracks the document URIs held in the system, making it possible to quickly extract the URIs matching a query without touching the disk. The collection lexicon tracks the collection URIs, letting you do the same with them. Internally, they're both just like any other range index, with the value being the URI. The lexicon retrieval calls can accept a passed-in cts:query object constraint, making it easy and efficient to find the distinct URIs or collections for documents matching a query.

Why do we need a range index on these things? Isn't this stuff in the indexes? Not by default. Remember that term list lookup keys are hashes, so while it's possible with the Universal Index to find all documents in a collection (hash the collection name to find the term list and look at the document IDs), it's not efficient to find all collections (hashes are one-way). The lexicons can calculate the document and collection URIs the same way regular range indexes extract values from within documents.

It's also possible to configure a "word lexicon" to track the individual words in a database (or limited to a certain element).
For memory efficiency the word lexicon is kept as a flat list of words with no associated document IDs, lest a common word require a large number of sequential entries in the list. You can still pass a cts:query object constraint to the word lexicon retrieval call to retrieve only words in documents matching the query. To impose that constraint, MarkLogic pulls each word from the lexicon and checks it, via a quick term list lookup, to see if any document IDs from its term list are in the document IDs matching the query. If so, the word is returned; if not, it's on to the next word. Word lexicons can help with wildcard queries, which will be discussed later.

DATA MANAGEMENT

In the next section, we'll take a look at how MarkLogic manages data on disk and handles concurrent reads and writes.
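Before moving on, the word lexicon constraint check described in the LEXICONS section can be sketched in a few lines of Python. This is a toy model under stated assumptions, not MarkLogic's implementation: the lexicon contents, the term lists, and the function name words_matching are all hypothetical, and real term lists are keyed by hashes rather than by the words themselves.

```python
# Hypothetical toy data.
# The lexicon is a flat, sorted list of words with no document IDs attached.
WORD_LEXICON = ["blue", "car", "red", "truck"]

# Term lists map each word to the IDs of the documents that contain it.
TERM_LISTS = {
    "blue": {1, 3},
    "car": {1, 2},
    "red": {2},
    "truck": {3},
}

def words_matching(lexicon, term_lists, matching_doc_ids):
    """Yield lexicon words that occur in at least one matching document.

    Each word is pulled from the lexicon in order and checked against its
    term list; words with no overlap are skipped.
    """
    for word in lexicon:
        if not term_lists[word].isdisjoint(matching_doc_ids):
            yield word

# Suppose some query matched only document 2.
print(list(words_matching(WORD_LEXICON, TERM_LISTS, {2})))
```

With document 2 as the only match, the toy returns "car" and "red" and skips the other two words, mirroring the pull, check, and move-on loop in the description.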