MARKLOGIC SERVER

a transaction is a good middle ground. Each batch gets held in memory (by default, you can also stream), so dividing a transaction into smaller batches minimizes the memory overhead. MLCP can also perform direct forest placement (which it calls "fastload"), but only does this if requested, as direct forest placement doesn't do full duplicate URI checking. MLCP, as with vanilla Hadoop, always communicates directly to the nodes with the target forests.

Additionally, MLCP's Direct Access feature enables you to bypass MarkLogic and extract documents from a database by reading them directly from forests on disk. This is primarily intended for accessing data that is archived through tiered storage. Let's say that you have data that ages out over time and doesn't need to be available for real-time queries through MarkLogic. You can archive that data by taking the containing forests offline but still access the contents using Direct Access.

For most purposes, MLCP replaces the open source (but unsupported) RecordLoader and XQSync projects.

CONTENT PROCESSING FRAMEWORK

The Content Processing Framework (CPF) is another officially supported service included with the MarkLogic distribution. It's an automated system for managing document lifecycles: transforming documents from one file format type to another or one schema to another, or breaking documents into pieces.

Internally, CPF uses properties sheet entries to track document states and uses triggers and background processing to move documents through their states. It's highly customizable, and you can plug in your own set of processing steps (called a pipeline) to control document processing.

MarkLogic includes a "Default Conversion Option" pipeline that takes Microsoft Office, Adobe PDF, and HTML documents and converts them into XHTML and simplified DocBook documents. There are many steps in the conversion process, and all of them are designed to execute automatically, based on the outcome of other steps in the process.
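
To make the state-driven idea concrete, here is a minimal Python sketch of a pipeline that moves a document through states based on the outcome of each step. It is an illustration only, not the CPF API: the `Document` class, the state names (`initial`, `converted`, `done`, `error`), and the two conversion functions are all hypothetical, and a plain dictionary stands in for the properties sheet.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for a stored document plus its properties sheet."""
    uri: str
    content: str
    properties: Dict[str, str] = field(default_factory=lambda: {"state": "initial"})


# A pipeline maps a state to (action, state on success, state on failure).
Pipeline = Dict[str, Tuple[Callable[[Document], None], str, str]]


def to_xhtml(doc: Document) -> None:
    """Toy conversion step: wrap the raw text in minimal XHTML."""
    if not doc.content:
        raise ValueError("nothing to convert")
    doc.content = f"<html><body><p>{doc.content}</p></body></html>"


def simplify(doc: Document) -> None:
    """Toy second step: rewrite the XHTML wrapper into a simpler element."""
    doc.content = (doc.content
                   .replace("<html><body>", "<article>")
                   .replace("</body></html>", "</article>"))


PIPELINE: Pipeline = {
    "initial": (to_xhtml, "converted", "error"),
    "converted": (simplify, "done", "error"),
}


def process(doc: Document, pipeline: Pipeline) -> Document:
    """Run steps until the document reaches a state with no registered action."""
    while doc.properties["state"] in pipeline:
        action, on_success, on_failure = pipeline[doc.properties["state"]]
        try:
            action(doc)
            doc.properties["state"] = on_success
        except Exception:
            doc.properties["state"] = on_failure
    return doc
```

In this sketch the loop runs synchronously; in the system described above, each transition would instead be picked up by a trigger and handled in the background.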

OFFICE TOOLKITS

MarkLogic offers Office Toolkits for Word, Excel, and PowerPoint. These toolkits make it easy for MarkLogic programs to read, write, and interact with documents in the native Microsoft Office file formats. The toolkits also include a plug-in capability whereby you can add your own custom sidebar or control ribbon to the application, making it easy to, for example, search and select content from within MarkLogic and drag it into a Word document.

CONNECTOR FOR SHAREPOINT

Microsoft SharePoint is a popular system for document management. MarkLogic offers (and supports) a Connector for SharePoint that integrates with SharePoint, providing more advanced access to the documents held within the system. The connector lets you mirror the SharePoint documents in MarkLogic for search, assembly, and reuse, or it lets MarkLogic act as a node in a SharePoint workflow.

DOCUMENT FILTERS

Built into MarkLogic behind the unassuming xdmp:document-filter() function is a robust system for extracting metadata and text from binary documents that handles hundreds of document formats. You can filter office documents, emails, database dumps, movies, images, and other multimedia formats, and even archive files. The filter process doesn't attempt to convert these documents to a rich XML format, but instead extracts the standard metadata and whatever text is within the files. It's great for search, classification, or other text-processing needs. For richer extraction (such as feature identification in an image or transcribing a movie) there are third-party tools.

LIBRARY SERVICES API

Library Services offers an interface for managing documents, letting you do check-in/check-out and versioning. You can combine the Library Services features with role-based security and the Search API to build a content management system on top of MarkLogic.

COMMUNITY-SUPPORTED TOOLS, LIBRARIES, AND PLUG-INS

The MarkLogic Developer Site (http://developer.marklogic.com) also hosts or references a number of highly useful projects. Many are built collaboratively on GitHub (https://github.com/marklogic), where you can contribute to their development if you are so inclined.

Converter for MongoDB

A Java-based tool for importing data from MongoDB into MarkLogic. It reads JSON data from MongoDB's mongodump tool and loads data into MarkLogic using an XDBC Server.

Corb2

A Java-based tool designed for bulk content reprocessing of documents stored in MarkLogic. It works off of a list of database documents and performs operations against them. Operations can include generating a report across all documents, manipulating individual documents, or a combination thereof.
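
The "list of documents, then an operation per document" pattern can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not Corb2 itself (which is a Java tool): the `reprocess` function, the `uppercase_and_count` operation, and the in-memory dictionary standing in for the database are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# An operation receives a URI and the document content and returns a report
# line plus optional replacement content (None means "leave unchanged").
Operation = Callable[[str, str], Tuple[str, Optional[str]]]


def reprocess(database: Dict[str, str], uris: Iterable[str],
              operation: Operation, workers: int = 4) -> List[str]:
    """Apply `operation` to every listed document and collect a report."""
    uri_list = list(uris)
    # Run the per-document work in parallel; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda u: operation(u, database[u]), uri_list))
    report = []
    # Apply any updates sequentially, after all workers have finished.
    for uri, (line, new_content) in zip(uri_list, results):
        if new_content is not None:
            database[uri] = new_content
        report.append(line)
    return report


def uppercase_and_count(uri: str, content: str) -> Tuple[str, Optional[str]]:
    """Toy operation: report the original length and uppercase the content."""
    return f"{uri}: {len(content)} chars", content.upper()
```

Calling `reprocess(db, sorted(db), uppercase_and_count)` on a small dictionary both rewrites each document and returns one report line per document, which mirrors the "report, manipulate, or both" combination described above.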