Every fragment acts as its own self-contained unit. It's the unit of indexing. A term list
doesn't truly reference document IDs; it references fragment IDs. The filtering and
retrieval process doesn't actually load documents; it loads fragments.
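
To make the idea of fragment-level term lists concrete, here is a minimal toy sketch in Python. It is not how MarkLogic stores its indexes; the fragment IDs, the sample text, and the `build_term_lists` helper are all hypothetical and exist only for illustration. The point it shows is that each term maps to a set of fragment IDs rather than document IDs.

```python
from collections import defaultdict

# Hypothetical data: "book.xml" is fragmented into two chapters, while
# "notes.xml" is a single fragment. The keys are made-up fragment IDs.
fragments = {
    "book.xml#frag1": "indexes make search fast",
    "book.xml#frag2": "fragments are the unit of indexing",
    "notes.xml#frag1": "search loads fragments",
}

def build_term_lists(frags):
    """Map each word to the set of fragment IDs that contain it."""
    term_lists = defaultdict(set)
    for frag_id, text in frags.items():
        for word in text.split():
            term_lists[word].add(frag_id)
    return term_lists

term_lists = build_term_lists(fragments)

# The term list for "fragments" names two fragments, one of which is only
# part of a larger document.
print(sorted(term_lists["fragments"]))
```

In this toy, a lookup for "fragments" returns the second chapter of the book and the notes document, without ever mentioning the book as a whole.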
There's actually very little difference between fragmenting a book at the chapter level
and just splitting each chapter element into its own document as part of the load. That's
why people generally avoid fragmentation and just keep each document as its own
singular fragment. It's a slightly easier mental model.
In fact, if you see "fragment" in MarkLogic literature (including this book), you can
substitute "document" and the statement will be correct for any database where
fragmentation isn't enabled.
There's one noticeable difference between a fragmented document and a document
split into individual documents: a query pulling data from two fragments residing in
the same document can perform slightly more efficiently than a query pulling data
from two documents. See the documentation for the cts:document-fragment-query()
query construct for more details. Even with this advantage, fragmentation
isn't something you should enable unless you're sure you need it.<br />
ESTIMATE AND COUNT<br />
You'll find that you really understand MarkLogic's indexing and fragmentation system<br />
when you understand the difference between the xdmp:estimate() and fn:count()<br />
functions, so let's look at them here. Both take an expression and return the number of<br />
items matching that expression.<br />
The xdmp:estimate() call estimates the number of items using nothing but indexes.<br />
That's why it's so fast. It resolves the given expression using indexes and returns how<br />
many fragments the indexes see as satisfying all of the term list constraints.<br />
The fn:count() call returns the number of items based on the actual contents of the document
fragments. This involves indexes and also filtering of the fragments to see which ones
truly match the expression and how many times it matches per document. That filtering<br />
takes time (due mostly to disk I/O), which is why it's not always fast, even if it's<br />
always accurate.<br />
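
The contrast between the two functions can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not MarkLogic's implementation: the `estimate` and `count` helpers, the sample fragments, and the choice of a case-insensitive index combined with a case-sensitive filter are all hypothetical. The estimate is answered from the index alone and counts candidate fragments; the count opens each candidate and counts the true matches inside it.

```python
# Toy model only: real term lists and filtering are far richer than this.
fragments = {
    1: "MarkLogic indexes MarkLogic fragments",  # two exact matches
    2: "marklogic in lower case",                # index hit, no exact match
    3: "more marklogic in lower case",           # index hit, no exact match
    4: "chapter chapter chapter",                # one fragment, three matches
}

def build_index(frags):
    """Assumed case-insensitive index: lowercased word -> set of fragment IDs."""
    index = {}
    for frag_id, text in frags.items():
        for word in text.split():
            index.setdefault(word.lower(), set()).add(frag_id)
    return index

def estimate(index, term):
    """Index-only answer: the number of candidate fragments. No fragment is read."""
    return len(index.get(term.lower(), set()))

def count(index, frags, term):
    """Filtered answer: read each candidate and count exact, case-sensitive matches."""
    candidates = index.get(term.lower(), set())
    return sum(frags[frag_id].split().count(term) for frag_id in candidates)

index = build_index(fragments)

# The index sees three candidate fragments, but only fragment 1 truly matches.
print(estimate(index, "MarkLogic"), count(index, fragments, "MarkLogic"))  # 3 2

# One fragment satisfies the index, but it contains three matching items.
print(estimate(index, "chapter"), count(index, fragments, "chapter"))      # 1 3
```

The toy `estimate` never touches the fragment text, which is why it is cheap. The toy `count` has to read every candidate, which stands in for the disk I/O cost of filtering.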
It's interesting to note that the xdmp:estimate() call can return results both higher<br />
and lower than, as well as identical to, those of fn:count()—depending on the query,<br />
data schema, and index options. The estimate results are higher when the index system<br />
returns fragments that would be filtered away. For example, a case-sensitive search<br />
performed without benefit of a case-sensitive index will likely have some candidate<br />