15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


a transaction is a good middle ground. By default, each batch is held in memory (you can also stream), so dividing a transaction into smaller batches minimizes the memory overhead. MLCP can also perform direct forest placement (which it calls "fastload"), but only does so if requested, because direct forest placement doesn't do full duplicate URI checking. MLCP, as with vanilla Hadoop, always communicates directly with the nodes that host the target forests.
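The relationship between batches and transactions described above can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not MLCP's implementation: the helper names `chunked` and `plan_transactions`, the example URIs, and the sizes used are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_transactions(
    uris: Iterable[str], batch_size: int, batches_per_txn: int
) -> List[List[List[str]]]:
    """Group URIs into batches, then group batches into transactions.

    Only one batch needs to be held in memory at a time by a loader that
    works this way; a transaction commits after its last batch is sent.
    """
    return list(chunked(chunked(uris, batch_size), batches_per_txn))


# Seven hypothetical documents, two per batch, three batches per transaction.
plan = plan_transactions([f"/doc/{i}.xml" for i in range(7)], 2, 3)
```

Under these assumed sizes, the first transaction carries three batches and the second carries the single leftover batch. A smaller batch size lowers the memory held per batch, while more batches per transaction reduces the number of commits.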

Additionally, MLCP's Direct Access feature enables you to bypass MarkLogic and extract documents from a database by reading them directly from forests on disk. This is primarily intended for accessing data that is archived through tiered storage. Let's say that you have data that ages out over time and doesn't need to be available for real-time queries through MarkLogic. You can archive that data by taking the containing forests offline but still access the contents using Direct Access.

For most purposes, MLCP replaces the open source (but unsupported) RecordLoader and XQSync projects.

CONTENT PROCESSING FRAMEWORK

The Content Processing Framework (CPF) is another officially supported service included with the MarkLogic distribution. It's an automated system for managing document lifecycles: transforming documents from one file format to another or from one schema to another, or breaking documents into pieces.

Internally, CPF uses properties sheet entries to track document states and uses triggers and background processing to move documents through their states. It's highly customizable, and you can plug in your own set of processing steps (called a pipeline) to control document processing.
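The state-tracking idea can be pictured as a small state machine. The Python sketch below is a conceptual model only and does not use any MarkLogic API: the `Document` and `Step` classes, the `run_pipeline` function, the state names, and the two example actions are all hypothetical, and a plain loop stands in for the triggers and background processing that CPF uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Document:
    uri: str
    content: str
    # Stand-in for properties sheet entries; "state" tracks the lifecycle.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Step:
    action: Callable[[Document], None]  # processing to run in this state
    on_success: str                     # next state if the action succeeds
    on_failure: str                     # next state if the action raises


def run_pipeline(doc: Document, pipeline: Dict[str, Step], max_steps: int = 100) -> str:
    """Move `doc` through its states until no step matches the current state."""
    for _ in range(max_steps):
        step = pipeline.get(doc.properties.get("state", "initial"))
        if step is None:
            break  # no step is registered for this state, so processing ends
        try:
            step.action(doc)
            doc.properties["state"] = step.on_success
        except Exception:
            doc.properties["state"] = step.on_failure
    return doc.properties.get("state", "initial")


def normalize(doc: Document) -> None:
    """Hypothetical conversion step: trim surrounding whitespace."""
    doc.content = doc.content.strip()


def require_content(doc: Document) -> None:
    """Hypothetical validation step: reject empty documents."""
    if not doc.content:
        raise ValueError("empty document")


pipeline = {
    "initial": Step(normalize, on_success="converted", on_failure="error"),
    "converted": Step(require_content, on_success="done", on_failure="error"),
}
```

In this model, which step runs next depends only on the state left behind by the previous step, which is how a set of steps can execute automatically without a central script driving them.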

MarkLogic includes a "Default Conversion Option" pipeline that takes Microsoft Office, Adobe PDF, and HTML documents and converts them into XHTML and simplified DocBook documents. There are many steps in the conversion process, and all of them are designed to execute automatically, based on the outcome of other steps in the process.

OFFICE TOOLKITS

MarkLogic offers Office Toolkits for Word, Excel, and PowerPoint. These toolkits make it easy for MarkLogic programs to read, write, and interact with documents in the native Microsoft Office file formats.

The toolkits also include a plug-in capability whereby you can add your own custom sidebar or control ribbon to the application, making it easy to, for example, search and select content from within MarkLogic and drag it into a Word document.
