[Figure: the database MyDocuments contains two forests, MyForest-1 and MyForest-2. One forest holds an in-memory stand and on-disk stands 00000000 and 00000001; the other holds an in-memory stand and on-disk stand 00000000.]
Figure 5: The hierarchical relationship between a database and its forests and stands in MarkLogic.

INGESTING DATA

To see how MarkLogic ingests data, let's start with an empty database having a single
forest that (because it has no documents) has only an in-memory stand and no on-disk
stands. At some point, a new document is loaded into MarkLogic through an XCC,
XQuery, JavaScript, REST, or WebDAV call. It doesn't matter which; the effect is the same.
MarkLogic puts this document in the in-memory stand and writes the action to the
forest's on-disk journal to maintain durability and transactional integrity in case of
system failure.

As new documents are loaded, they're also placed in the in-memory stand. A query
request at this point will see all of the data on disk (technically, nothing yet) as well as
everything in the in-memory stand (our small set of documents). The query request
won't be able to tell where the data is, but it will see the full view of data loaded at this
point in time.

After enough documents are loaded, the in-memory stand will fill up and be
checkpointed, which means it is written out as an on-disk stand. Each new stand gets
its own subdirectory under the forest directory, with names that are monotonically
increasing hexadecimal numbers. The first stand gets the lovely name 00000000. That
on-disk stand contains all of the data and indexes for the documents loaded thus far.
It's written from the in-memory stand out to disk as a sequential write for maximum
efficiency. Once it's written, the in-memory stand's allocated memory is freed, and the
data in the journal is released.

As more documents are loaded, they go into a new in-memory stand. At some point
this in-memory stand fills up as well and gets written as a new on-disk stand, probably
named 00000001 and about the same size as the first. Sometimes, under heavy load,
you have two in-memory stands at once, because the first stand is still being written to
disk while a new stand is created for additional documents. At all times, an incoming
query or update request can see all of the data across all of the stands.
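
The load-and-checkpoint cycle described above can be sketched as a toy model in Python. This is only an illustration, not MarkLogic code or its API: the class `ToyForest`, its methods, and the document-count threshold `max_in_memory_docs` are all hypothetical names chosen for the example, and the sketch ignores indexes, updates, deletes, and the case of two in-memory stands existing at once. Stand names are assumed to be zero-padded, eight-digit hexadecimal counters, matching the examples 00000000 and 00000001.

```python
from dataclasses import dataclass, field


@dataclass
class ToyForest:
    """Toy model of one forest: a journal, one in-memory stand, on-disk stands."""

    max_in_memory_docs: int = 3                      # assumed, arbitrary "full" threshold
    journal: list = field(default_factory=list)      # stands in for the on-disk journal
    in_memory: dict = field(default_factory=dict)    # in-memory stand: URI -> document
    on_disk: dict = field(default_factory=dict)      # stand name -> {URI: document}
    next_stand_id: int = 0                           # counter behind the stand names

    def load(self, uri, doc):
        # Record the action in the journal, then place the document
        # in the in-memory stand.
        self.journal.append(("insert", uri, doc))
        self.in_memory[uri] = doc
        if len(self.in_memory) >= self.max_in_memory_docs:
            self.checkpoint()

    def checkpoint(self):
        # Write the whole in-memory stand out as a new on-disk stand,
        # named with the next hexadecimal number.
        name = format(self.next_stand_id, "08x")
        self.on_disk[name] = dict(self.in_memory)
        self.next_stand_id += 1
        # Free the in-memory stand and release the journaled data.
        self.in_memory = {}
        self.journal.clear()
        return name

    def visible_uris(self):
        # A query sees every stand, wherever the data lives.
        uris = set(self.in_memory)
        for stand in self.on_disk.values():
            uris.update(stand)
        return sorted(uris)


forest = ToyForest(max_in_memory_docs=3)
for i in range(7):
    forest.load(f"/doc{i}.xml", f"<doc>{i}</doc>")

print(sorted(forest.on_disk))        # ['00000000', '00000001']
print(list(forest.in_memory))        # ['/doc6.xml']
print(len(forest.visible_uris()))    # 7
```

After seven loads with a threshold of three, the toy forest holds two on-disk stands plus one document still in memory, and a query over `visible_uris()` sees all seven documents without knowing which stand holds each one.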