15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


Every fragment acts as its own self-contained unit. It's the unit of indexing. A term list doesn't truly reference document IDs; it references fragment IDs. The filtering and retrieval process doesn't actually load documents; it loads fragments.
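To make the point about term lists concrete, here is a minimal Python sketch of a toy inverted index. It is not MarkLogic code: the `fragments` data, the `build_term_lists` helper, and the URIs are all hypothetical and exist only to show a term list holding fragment IDs rather than document identifiers.

```python
from collections import defaultdict

# Hypothetical toy data: a book fragmented at the chapter level (two fragments)
# plus a second, unfragmented document. Keys are (document URI, fragment ID).
fragments = {
    ("/book.xml", 1): "indexes make search fast",
    ("/book.xml", 2): "fragments are the unit of indexing",
    ("/notes.xml", 3): "search uses term lists",
}

def build_term_lists(frags):
    """Map each word to the set of fragment IDs whose text contains it."""
    term_lists = defaultdict(set)
    for (_uri, frag_id), text in frags.items():
        for word in text.split():
            term_lists[word].add(frag_id)
    return term_lists

term_lists = build_term_lists(fragments)

# The term list for "search" names fragments, not document URIs.
print(sorted(term_lists["search"]))
```

In this model the document URI never appears in a term list, so anything that consumes the index works fragment by fragment.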

There's actually very little difference between fragmenting a book at the chapter level and just splitting each chapter element into its own document as part of the load. That's why people generally avoid fragmentation and just keep each document as its own singular fragment. It's a slightly easier mental model.

In fact, if you see "fragment" in MarkLogic literature (including this book), you can substitute "document" and the statement will be correct for any databases where no fragmentation is enabled.

There's one noticeable difference between a fragmented document and a document split into individual documents: a query pulling data from two fragments residing in the same document can perform slightly more efficiently than a query pulling data from two documents. See the documentation for the cts:document-fragment-query() query construct for more details. Even with this advantage, fragmentation isn't something you should enable unless you're sure you need it.

ESTIMATE AND COUNT

You'll find that you really understand MarkLogic's indexing and fragmentation system when you understand the difference between the xdmp:estimate() and fn:count() functions, so let's look at them here. Both take an expression and return the number of items matching that expression.

The xdmp:estimate() call estimates the number of items using nothing but indexes. That's why it's so fast. It resolves the given expression using indexes and returns how many fragments the indexes see as satisfying all of the term list constraints.

The fn:count() call returns the number of items based on the actual content of the document fragments. This involves indexes and also filtering of the fragments to see which ones truly match the expression and how many times it matches per document. That filtering takes time (due mostly to disk I/O), which is why it's not always fast, even if it's always accurate.
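The contrast between the two calls can be sketched with a toy model in Python. This is a simplification under stated assumptions, not MarkLogic's implementation: the `estimate` and `count` functions below are hypothetical stand-ins, the index is assumed to be case-insensitive, and "items" are modeled as exact word occurrences inside a fragment.

```python
# Hypothetical toy data: fragment ID -> words stored in that fragment.
fragments = {
    1: ["MarkLogic", "stores", "fragments"],
    2: ["marklogic", "in", "lowercase"],
    3: ["MarkLogic", "and", "MarkLogic", "and", "MarkLogic"],
}

# Assumed case-insensitive index: lowercased word -> fragment IDs.
index = {}
for frag_id, words in fragments.items():
    for word in words:
        index.setdefault(word.lower(), set()).add(frag_id)

def estimate(word):
    """Index-only answer: how many fragments the term list names."""
    return len(index.get(word.lower(), set()))

def count(word):
    """Filtered answer: open each candidate fragment and count exact matches."""
    candidates = index.get(word.lower(), set())
    return sum(fragments[frag_id].count(word) for frag_id in candidates)

for query in ("marklogic", "MarkLogic"):
    print(query, estimate(query), count(query))
```

In this model the index-only answer never looks inside a fragment, so it can overshoot when candidates would be filtered away and undershoot when one fragment holds several matches. The filtered answer pays for reading every candidate fragment.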

It's interesting to note that the xdmp:estimate() call can return results both higher and lower than, as well as identical to, those of fn:count(), depending on the query, data schema, and index options. The estimate results are higher when the index system returns fragments that would be filtered away. For example, a case-sensitive search performed without benefit of a case-sensitive index will likely have some candidate
