15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server

Inside-MarkLogic-Server

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

cache in order to keep the update thread-safe. It's a short-lived lock, but it still has the<br />

effect of serializing write access. With only one partition, all threads need to serialize<br />

through that single write lock. With two partitions you get what's in essence two different<br />

caches and two different locks, and double the number of threads can make cache<br />

updates concurrent.<br />

How does a thread know in which cache partition to store or retrieve an entry? It's<br />

deterministic based on the cache lookup key (the name of the item being looked up). A<br />

thread accessing a cache first determines the lookup key, determines which cache would<br />

have that key, and goes to that cache. There's no need to read-lock or write-lock any<br />

partition other than the one appropriate for the key.<br />

You don't want to have an excessive number of partitions because it reduces the<br />

efficiency of the cache. Each cache partition has to manage its own aging-out of entries<br />

and can only elect to remove the most stale entries from itself, even if there's a more stale<br />

entry in another partition.<br />

For more information on cache tuning, see the Query Performance and Tuning guide.<br />

NO NEED FOR GLOBAL CACHE INVALIDATION<br />

A typical search engine forces its administrator to make a tradeoff between update<br />

frequency and cache performance because it maintains a global cache and any document<br />

change invalidates the cache. MarkLogic avoids this problem by making its caches<br />

stand-aware. Stands, if you recall, are the read-only building blocks of forests. Existing<br />

stand contents don't change when new documents are loaded, so there's no need for the<br />

performance-killing global cache invalidation when updates occur.<br />

LOCKS AND TIMESTAMPS IN A CLUSTER<br />

Managing locks and the coordinated transaction timestamp across a cluster seems like a<br />

task that could introduce a bottleneck and hurt performance. Luckily, that's not the case<br />

with MarkLogic.<br />

MarkLogic manages locks in a decentralized way. Each D-node is responsible for<br />

managing the locks for the documents under its forest(s). It does this as part of its regular<br />

data access work. If, for example, an E-node running an update request needs to read a<br />

set of documents, the D-nodes with those documents will acquire the necessary read locks<br />

before returning the data. The D-nodes don't immediately inform the E-node about their<br />

lock actions; it's not necessary. In fact, a D-node doesn't have to check with any other<br />

hosts in the cluster to acquire locks on its subset of data. (This is why all fragments for a<br />

document are always placed in the same forest.)<br />

50

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!