15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server


database allowed to have most of its documents in one forest would do most of the work on that forest's host, whereas redistributing the documents will spread the work more evenly across all of the hosts. In this way, rebalancing results in more efficient use of hardware and improved cluster performance.

Rebalancing is triggered by any reconfiguration of a database, such as when forests are added or retired. [18] If you plan to detach and delete a forest, you'll want to retire it first. This ensures that any content in that forest first gets redistributed among the remaining forests. If you just detach and delete a forest without retiring it, any content in that deleted forest is lost.

[Figure 23 diagram: four states of one database, showing documents per forest. UNREBALANCED: Forest-1 700, Forest-2 200. REBALANCED (rebalancing enabled): Forest-1 450, Forest-2 450. REBALANCED AFTER ADDITION (forest added): Forest-1 300, Forest-2 300, Forest-3 300. REBALANCED AFTER RETIREMENT (forest retired): Forest-1 450, Forest-2 0, Forest-3 450.]

Figure 23: Rebalancing keeps documents evenly distributed among a database's forests.
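The document counts in Figure 23 can be reproduced with a toy model. The sketch below is not MarkLogic's rebalancer; it simply assumes that rebalancing splits the total document count evenly across the forests that remain active and that a retired forest is drained to zero. The function name `rebalance_counts` and the dictionary layout are hypothetical.

```python
def rebalance_counts(counts, retired=()):
    """Return per-forest document counts after an idealized rebalance.

    counts  -- dict mapping forest name to its current document count
    retired -- names of forests that should be drained to zero
    """
    active = [name for name in counts if name not in retired]
    if not active:
        raise ValueError("at least one forest must remain active")

    total = sum(counts.values())
    share, extra = divmod(total, len(active))

    # Retired forests end at zero; active forests split the total evenly,
    # with the first `extra` forests absorbing any remainder.
    result = {name: 0 for name in counts}
    for i, name in enumerate(active):
        result[name] = share + (1 if i < extra else 0)
    return result


unbalanced = {"Forest-1": 700, "Forest-2": 200}
balanced = rebalance_counts(unbalanced)
after_addition = rebalance_counts({**balanced, "Forest-3": 0})
after_retirement = rebalance_counts(after_addition, retired=("Forest-2",))
```

In this model the total stays at 900 documents throughout, which is why retiring a forest before deleting it preserves content: the retired forest's share is moved elsewhere first.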

ASSIGNMENT POLICIES

Rebalancing is driven by an assignment policy, a set of rules that determines which document goes into which forest. A database's assignment policy governs rebalancing as well as how documents are distributed to forests in the first place, during ingest.
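One way to picture this is a single policy function consulted on both paths. The sketch below is a simplified model rather than MarkLogic's implementation: `modulo_policy`, `ingest`, and `rebalance` are hypothetical names, and the CRC32-modulo rule merely stands in for a real assignment policy.

```python
import zlib


def modulo_policy(uri, forests):
    """Hypothetical policy: choose a forest from a stable hash of the URI."""
    return forests[zlib.crc32(uri.encode("utf-8")) % len(forests)]


def ingest(uris, forests, policy):
    """Place each new document in the forest the policy selects."""
    placement = {name: set() for name in forests}
    for uri in uris:
        placement[policy(uri, forests)].add(uri)
    return placement


def rebalance(placement, forests, policy):
    """Re-apply the same policy to every stored document after a change.

    Returns the new placement and the number of documents that moved.
    """
    new_placement = {name: set() for name in forests}
    moved = 0
    for current, uris in placement.items():
        for uri in uris:
            target = policy(uri, forests)
            new_placement[target].add(uri)
            if target != current:
                moved += 1
    return new_placement, moved


uris = [f"/docs/{i}.xml" for i in range(900)]
two_forests = ["Forest-1", "Forest-2"]
three_forests = ["Forest-1", "Forest-2", "Forest-3"]

before = ingest(uris, two_forests, modulo_policy)
after, moved = rebalance(before, three_forests, modulo_policy)
```

Because ingest and rebalancing share one policy in this model, a document that arrives after the forest was added lands exactly where the rebalancer would have moved it.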

An assignment policy defines how data can be distributed horizontally across multiple hosts to improve performance. This is what MarkLogic does when it allocates a database's documents across multiple forests that reside on different hosts. But even if you're running MarkLogic on a single host, spreading documents across more than one forest can improve performance because it takes advantage of parallel processing on the host's multiple cores.

There are four assignment policies: bucket, legacy, statistical, and range.

The bucket policy (the default) uses an algorithm to map a document's URI to one of 16,000 "buckets," with each bucket being associated with a forest. (A table mapping

[18] You can control how aggressively rebalancing occurs by setting a throttle value, which establishes the rebalancer's priority in the use of system resources.
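The two-step lookup of the bucket policy (URI to bucket, bucket to forest) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the text says only that a URI maps to one of 16,000 buckets and that each bucket is associated with a forest, so the SHA-256 hash and the round-robin bucket table below are stand-ins, not MarkLogic's actual algorithm. All function names are hypothetical.

```python
import hashlib

BUCKET_COUNT = 16_000  # number of buckets quoted in the text


def bucket_for(uri):
    """Map a URI to a bucket number using a stable hash (assumed: SHA-256)."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKET_COUNT


def build_bucket_table(forests):
    """Associate every bucket with a forest (assumed: round-robin)."""
    return [forests[b % len(forests)] for b in range(BUCKET_COUNT)]


def forest_for(uri, table):
    """Resolve a URI to its forest: URI -> bucket -> forest."""
    return table[bucket_for(uri)]


forests = ["Forest-1", "Forest-2", "Forest-3"]
table = build_bucket_table(forests)
target = forest_for("/docs/example.xml", table)
```

The indirection is the point of the design in this sketch: the URI-to-bucket step never changes, so only the bucket table has to be updated when forests are added or retired.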
