15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


[Figure 5: The hierarchical relationship between a database and its forests and stands in MarkLogic. The diagram shows the database MyDocuments with two forests: MyForest-1 (an in-memory stand and on-disk stands 00000000 and 00000001) and MyForest-2 (an in-memory stand and on-disk stand 00000000).]

INGESTING DATA

To see how MarkLogic ingests data, let's start with an empty database having a single forest that (because it has no documents) has only an in-memory stand and no on-disk stands. At some point, a new document is loaded into MarkLogic through an XCC, XQuery, JavaScript, REST, or WebDAV call. It doesn't matter which; the effect is the same. MarkLogic puts this document in an in-memory stand and writes the action to the forest's on-disk journal to maintain durability and transactional integrity in case of system failure.
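
The write path described above can be illustrated with a toy model. This is only a sketch, not MarkLogic's implementation: the class name ToyForest, the JSON journal format, the file name journal.log, and the choice to write the journal entry before updating the in-memory stand are all assumptions made for illustration.

```python
import json
import os


class ToyForest:
    """Toy model of a forest: one in-memory stand plus an on-disk journal."""

    def __init__(self, directory):
        self.directory = directory
        # Hypothetical journal file name; the real on-disk layout differs.
        self.journal_path = os.path.join(directory, "journal.log")
        self.in_memory_stand = {}  # maps document URI -> document content

    def insert(self, uri, content):
        # Record the action in the journal and force it to disk, so the
        # insert can be replayed after a system failure.
        entry = {"op": "insert", "uri": uri, "content": content}
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(entry) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        # Make the document visible in the in-memory stand.
        self.in_memory_stand[uri] = content

    def recover(self):
        """Rebuild the in-memory stand by replaying the journal."""
        self.in_memory_stand = {}
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                entry = json.loads(line)
                if entry["op"] == "insert":
                    self.in_memory_stand[entry["uri"]] = entry["content"]
```

In this model, a second ToyForest opened on the same directory can call recover() to get back every document that was inserted before a simulated crash, which is the durability role the journal plays in the text.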

As new documents are loaded, they're also placed in the in-memory stand. A query request at this point will see all of the data on disk (technically, nothing yet) as well as everything in the in-memory stand (our small set of documents). The query request won't be able to tell where the data is, but it will see the full view of data loaded at this point in time.
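
As a rough illustration of that unified view, the sketch below runs one search over a list of stands without caring which are on disk and which are in memory. The function name search, the dictionary representation of a stand, and the sample documents are hypothetical; real queries are resolved through indexes rather than by scanning content.

```python
def search(stands, predicate):
    """Return the URIs of matching documents across every stand."""
    results = []
    for stand in stands:
        for uri, content in stand.items():
            if predicate(content):
                results.append(uri)
    return sorted(results)


# At this point in the walkthrough there are no on-disk stands yet.
on_disk_stands = []
in_memory_stand = {
    "/doc1.xml": "<doc>alpha</doc>",
    "/doc2.xml": "<doc>beta</doc>",
}

# The caller passes all stands together and never learns where a hit lives.
hits = search(on_disk_stands + [in_memory_stand], lambda c: "alpha" in c)
```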

After enough documents are loaded, the in-memory stand will fill up and be checkpointed, which means it is written out as an on-disk stand. Each new stand gets its own subdirectory under the forest directory, with names that are monotonically increasing hexadecimal numbers. The first stand gets the lovely name 00000000. That on-disk stand contains all of the data and indexes for the documents loaded thus far. It's written from the in-memory stand out to disk as a sequential write for maximum efficiency. Once it's written, the in-memory stand's allocated memory is freed, and the data in the journal is released.
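
The checkpoint step can be sketched as follows. This is a simplified model under stated assumptions: the helper names next_stand_name and checkpoint, the data.json file, lowercase hexadecimal digits, and truncating a single journal file are all illustrative choices, not MarkLogic's actual on-disk format.

```python
import json
import os

HEX_DIGITS = set("0123456789abcdef")


def next_stand_name(forest_dir):
    """Return the next stand directory name as an 8-digit hexadecimal number."""
    existing = [
        int(name, 16)
        for name in os.listdir(forest_dir)
        if len(name) == 8 and set(name) <= HEX_DIGITS
    ]
    number = max(existing) + 1 if existing else 0
    return format(number, "08x")


def checkpoint(forest_dir, in_memory_stand, journal_path):
    """Write the in-memory stand out as a new on-disk stand."""
    name = next_stand_name(forest_dir)
    stand_dir = os.path.join(forest_dir, name)
    os.mkdir(stand_dir)
    # One sequential write of everything held in memory.
    with open(os.path.join(stand_dir, "data.json"), "w", encoding="utf-8") as out:
        json.dump(in_memory_stand, out)
        out.flush()
        os.fsync(out.fileno())
    # Once the stand is safely on disk, free the memory and release the journal.
    in_memory_stand.clear()
    with open(journal_path, "w", encoding="utf-8"):
        pass  # opening in "w" mode truncates the journal
    return name
```

Calling checkpoint twice on the same forest directory yields the stand names 00000000 and then 00000001, matching the naming pattern in the text.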

As more documents are loaded, they go into a new in-memory stand. At some point this in-memory stand fills up as well and gets written as a new on-disk stand, probably named 00000001 and about the same size as the first. Sometimes, under heavy load, you have two in-memory stands at once, because the first stand is still being written to disk when a new stand is created for additional documents. At all times, an incoming query or update request can see all of the data across all of the stands.
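
The heavy-load case, where one in-memory stand is still being written while another accepts new documents, might be modeled like this. The class LoadedForest, its capacity parameter (a document count standing in for the real memory limit), and the finish_flush method are hypothetical names for illustration only.

```python
class LoadedForest:
    """Toy forest that keeps a full stand readable while it is written to disk."""

    def __init__(self, capacity):
        self.capacity = capacity  # documents per in-memory stand (assumed limit)
        self.active = {}          # in-memory stand accepting new documents
        self.flushing = []        # full in-memory stands still being written
        self.on_disk = []         # stands whose disk write has completed

    def insert(self, uri, content):
        self.active[uri] = content
        if len(self.active) >= self.capacity:
            # The full stand starts its write to disk; new documents
            # go to a fresh in-memory stand in the meantime.
            self.flushing.append(self.active)
            self.active = {}

    def finish_flush(self):
        """Mark the oldest flushing stand as fully written to disk."""
        self.on_disk.append(self.flushing.pop(0))

    def all_uris(self):
        """Every document is visible, whichever kind of stand holds it."""
        stands = self.on_disk + self.flushing + [self.active]
        return sorted(uri for stand in stands for uri in stand)
```

With a capacity of 2, inserting three documents leaves one stand flushing and one active, so two in-memory stands exist at once, and all_uris() returns the same three URIs before and after finish_flush() is called.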
