MARKLOGIC SERVER

For each range index, MarkLogic creates data structures that make it easy to do two things: for any document ID, get the document's range index value(s); and for any range index value, get the document IDs that have that value.

Conceptually, you can think of a range index as implemented by two data structures, written to disk and then memory mapped for efficient access. One can be thought of as an array of structures holding document IDs and values, sorted by document IDs, and the other an array of structures holding values and document IDs, sorted by values. It's not actually this simple or wasteful with memory and disk (in reality, the values are only stored once), but it's a good mental model. With our party example, you'd have a list of birthdays mapped to people, sorted by birthday, and a list of people mapped to birthdays, sorted by person.

RANGE QUERIES

To perform a fast range query, MarkLogic uses the "value to document ID" lookup array. Because the lookup array is sorted by value, there's a specific subsequence in the array that holds the values between the two user-defined endpoints. Each of the values in the range has a document ID associated with it, and those document IDs can be quickly gathered and used like a synthetic term list to limit the search result to documents having a matching value within the user's specified range.

For example, to find partygoers with a birthday between January 1, 1980, and May 16, 1980, you'd find the point in the date-sorted range index for the start date, then the end date. Every date in the array between those two endpoints is a birthday of someone at the party, and it's easy to get the people's names because every birthdate has the person listed right next to it. If multiple people have the same birthday, you'll have multiple entries in the array with the same value but a different corresponding name. In MarkLogic, instead of people's names, the system tracks document IDs.
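The mental model above can be sketched in a few lines of Python. This is only an illustration of the two sorted arrays and the endpoint lookup, not how MarkLogic stores range indexes internally; the document IDs, the birthdays, and the range_query helper are all made up for the example.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical sample data: document ID -> birthday.
birthdays = {
    101: date(1980, 3, 9),
    102: date(1975, 11, 2),
    103: date(1980, 1, 1),
    104: date(1980, 5, 16),
    105: date(1980, 3, 9),   # same birthday as document 101
    106: date(1992, 6, 16),
}

# "Document ID to value" array, sorted by document ID.
doc_to_value = sorted(birthdays.items())

# "Value to document ID" array, sorted by value.
value_to_doc = sorted((value, doc_id) for doc_id, value in birthdays.items())
values_only = [value for value, _ in value_to_doc]


def range_query(low, high):
    """Return the IDs of documents whose value lies in [low, high]."""
    # Binary search finds both endpoints in the value-sorted array.
    start = bisect_left(values_only, low)
    end = bisect_right(values_only, high)
    # Everything between the endpoints matches; collect the document IDs.
    return sorted(doc_id for _, doc_id in value_to_doc[start:end])


matches = range_query(date(1980, 1, 1), date(1980, 5, 16))
print(matches)  # [101, 103, 104, 105]
```

Documents 101 and 105 share a birthday, so the value-sorted array holds two entries with the same value and different document IDs, and both come back from the lookup.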
Range queries can be combined with any other types of queries in MarkLogic. Let's say that you want to limit results to those within a date range as above but also having a certain metadata tag. MarkLogic uses the range index to get the set of document IDs in the range, uses a term list to get the set of document IDs with the metadata tag, and intersects the sets of IDs to determine the set of results matching both constraints. All indexes in MarkLogic are fully composable with each other.

Programmers probably think they're using range indexes only via functions such as cts:element-range-query(). In fact, range indexes are also used to accelerate regular XPath and XQuery expressions. The XPath expression /book[metadata/price > 19.99] looks for books above a certain price and will leverage a decimal (or double or float) range index on <price> if it exists. What if there's no such range index? Then MarkLogic won't be able to use any index to assist with the price constraint (term lists are of no use) and will examine all books with any <price> elements. The performance difference can be dramatic.

Figure 4: Conceptually, range indexes are implemented with two data structures. One is an array of values and document IDs, sorted by value. The other is an array of document IDs and values, sorted by document ID. You can use the structures to quickly look up documents associated with a value range (1) as well as individual values associated with one or more documents (2).

DATA-TYPE-AWARE EQUALITY QUERIES

The same "value to document ID" lookup array can support equality queries that have to be data-type aware. Imagine that you're tracking not just the birthday but the exact time when every person was born. The challenge is that there are numerous ways to serialize the same timestamp value, due to trailing time zone decorations. The timestamps 2013-04-03T00:14:25Z and 2013-04-02T17:14:25-07:00 are semantically identical. It's the same with the numbers 1.5 and 1.50. If all of your values are serialized the exact same way, you can use a term list index to match the string representation. If the serialization can vary, however, it's best to use a range index because those are based on the underlying data-type value (but instead of specifying a range to match, you specify a singular value).

To perform a data-type-aware equality query, you can use cts:element-range-query() with the "=" operator, or you can use XPath and XQuery. Consider the XPath mentioned earlier, /book[metadata/pubyear = 2013]. Because 2013 is an integer value, if there's a range index on <pubyear> of a type castable as an integer, it will be used to resolve this query.

Note that for numeric and Boolean values in JSON documents, you can perform equality comparisons without a range index since those data types are indexed in type-specific indexes (see the section "Indexing JSON" for details). However, you still need range indexes for inequality comparisons.
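To make the difference between string matching and data-type-aware matching concrete, here is a small Python sketch. It does not use any MarkLogic API; the document IDs, the third timestamp, and the helper names (parse_timestamp, string_match, typed_match) are assumptions made for illustration. The first two timestamps and the numbers 1.5 and 1.50 are the ones from the text above.

```python
from datetime import datetime
from decimal import Decimal


def parse_timestamp(text):
    """Parse an ISO 8601 timestamp into a time-zone-aware datetime."""
    # Spell out the "Z" suffix as an explicit UTC offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


# Hypothetical documents: document ID -> serialized time of birth.
born = {
    1: "2013-04-03T00:14:25Z",
    2: "2013-04-02T17:14:25-07:00",  # same instant as document 1
    3: "2013-04-03T09:00:00Z",
}


def string_match(query):
    """Match on the serialized text, as a term list lookup would."""
    return sorted(doc_id for doc_id, text in born.items() if text == query)


def typed_match(query):
    """Match on the underlying typed value, as a range index would."""
    wanted = parse_timestamp(query)
    return sorted(
        doc_id for doc_id, text in born.items()
        if parse_timestamp(text) == wanted
    )


print(string_match("2013-04-03T00:14:25Z"))  # [1]
print(typed_match("2013-04-03T00:14:25Z"))   # [1, 2]
print(Decimal("1.5") == Decimal("1.50"))     # True
```

The string comparison misses document 2 because its serialization differs, while the typed comparison treats both timestamps as the same instant.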