a transaction is a good middle ground. Each batch is held in memory by default (you
can also stream), so dividing a transaction into smaller batches minimizes the memory
overhead. MLCP can also perform direct forest placement (which it calls "fastload"),
but only on request, because direct forest placement doesn't do full duplicate URI
checking. MLCP, as with vanilla Hadoop, always communicates directly with the nodes
that host the target forests.
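
To make the relationship between batches and transactions concrete, here is a minimal Python sketch. It is not MLCP's implementation: the `load`, `send`, and `commit` names are hypothetical stand-ins for whatever a loader does to ship a batch and commit a transaction. It only illustrates why a single batch needs to sit in memory at a time, even when one transaction spans several batches.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, batch


def load(
    items: Iterable[T],
    batch_size: int,
    batches_per_txn: int,
    send: Callable[[List[T]], None],   # hypothetical: ships one batch
    commit: Callable[[], None],        # hypothetical: commits the open transaction
) -> None:
    """Send items batch by batch, committing after every batches_per_txn batches."""
    if batches_per_txn < 1:
        raise ValueError("batches_per_txn must be positive")
    sent = 0
    for batch in batches(items, batch_size):
        send(batch)  # only this one batch is held in memory
        sent += 1
        if sent % batches_per_txn == 0:
            commit()
    if sent % batches_per_txn != 0:
        commit()  # commit the trailing partial transaction


if __name__ == "__main__":
    uris = [f"/docs/{i}.xml" for i in range(25)]
    load(
        uris,
        batch_size=4,
        batches_per_txn=3,
        send=lambda b: print("send", len(b)),
        commit=lambda: print("commit"),
    )
```

With 25 documents, a batch size of 4, and 3 batches per transaction, this sketch sends seven batches and commits three times. Raising the batch size cuts round trips at the cost of memory; raising the number of batches per transaction cuts commits without raising the per-batch memory footprint.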
Additionally, MLCP's Direct Access feature enables you to bypass MarkLogic and
extract documents from a database by reading them directly from forests on disk. This
is primarily intended for accessing data that is archived through tiered storage. Let's say
that you have data that ages out over time and doesn't need to be available for real-time
queries through MarkLogic. You can archive that data by taking the containing forests
offline but still access the contents using Direct Access.
For most purposes, MLCP replaces the open source (but unsupported) RecordLoader
and XQSync projects.
CONTENT PROCESSING FRAMEWORK
The Content Processing Framework (CPF) is another officially supported service
included with the MarkLogic distribution. It's an automated system for managing
document lifecycles: transforming documents from one file format to another or
from one schema to another, or breaking documents into pieces.
Internally, CPF uses properties sheet entries to track document states and uses triggers
and background processing to move documents through their states. It's highly
customizable, and you can plug in your own set of processing steps (called a pipeline) to
control document processing.
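
The state-tracking idea can be sketched in a few lines of Python. This is an illustration only, not CPF's API: the `Document` class, the `run_pipeline` function, and the state names ("initial", "converted", "done", "error") are all hypothetical, and the sketch runs steps synchronously in a loop, whereas CPF relies on triggers and background processing. What it shows is a pipeline as a table of states, where each step's outcome decides the document's next state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Document:
    uri: str
    content: str
    # Stand-in for a properties sheet: the current state lives here.
    properties: Dict[str, str] = field(default_factory=dict)


# A step does some work on the document and reports success or failure.
Step = Callable[[Document], bool]
# Pipeline: state -> (step to run, state on success, state on failure).
Pipeline = Dict[str, Tuple[Step, str, str]]


def run_pipeline(doc: Document, pipeline: Pipeline,
                 initial: str = "initial", max_steps: int = 100) -> str:
    """Move doc from state to state until it reaches a state with no step."""
    doc.properties["state"] = initial
    for _ in range(max_steps):
        state = doc.properties["state"]
        if state not in pipeline:
            return state  # terminal state: nothing left to do
        step, on_success, on_failure = pipeline[state]
        try:
            succeeded = step(doc)
        except Exception:
            succeeded = False  # treat an exception as a failed step
        doc.properties["state"] = on_success if succeeded else on_failure
    raise RuntimeError("pipeline did not reach a terminal state")


def to_xhtml(doc: Document) -> bool:
    """Hypothetical conversion step: wrap non-empty text in a paragraph."""
    text = doc.content.strip()
    if not text:
        return False
    doc.content = f"<p>{text}</p>"
    return True


def mark_reviewed(doc: Document) -> bool:
    """Hypothetical follow-up step that only records a property."""
    doc.properties["reviewed"] = "true"
    return True


EXAMPLE_PIPELINE: Pipeline = {
    "initial": (to_xhtml, "converted", "error"),
    "converted": (mark_reviewed, "done", "error"),
}

if __name__ == "__main__":
    doc = Document("/a.txt", "  hello  ")
    print(run_pipeline(doc, EXAMPLE_PIPELINE), doc.content)
```

Because the state is stored with the document rather than in the loop, processing could stop after any step and resume later from the recorded state, which is the property that makes a trigger-driven design workable.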
MarkLogic includes a "Default Conversion Option" pipeline that takes Microsoft
Office, Adobe PDF, and HTML documents and converts them into XHTML and
simplified DocBook documents. There are many steps in the conversion process, and all
of them are designed to execute automatically, based on the outcome of other steps in
the process.
OFFICE TOOLKITS
MarkLogic offers Office Toolkits for Word, Excel, and PowerPoint. These toolkits make
it easy for MarkLogic programs to read, write, and interact with documents in the
native Microsoft Office file formats.
The toolkits also include a plug-in capability whereby you can add your own custom
sidebar or control ribbon to the application, making it easy to, for example, search and
select content from within MarkLogic and drag it into a Word document.