11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0

21SiXmO

21SiXmO

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

CDCR Configuration<br />

In order to configure CDCR, the Source Data Center requires the host address of the ZooKeeper cluster<br />

associated with the Target Data Center. The ZooKeeper host address is the only information needed by CDCR<br />

to instantiate the communication with the Target <strong>Solr</strong> cluster. The CDCR configuration file on the Source cluster<br />

will therefore contain a list of ZooKeeper hosts. The CDCR configuration file might also contain<br />

secondary/optional configuration, such as the number of CDC Replicator threads, batch updates related settings,<br />

etc.<br />

CDCR Initialization<br />

CDCR supports incremental upddates to either new or existing collections. CDCR may not be able to keep up<br />

with very high volume updates, especially if there are significant communications latencies due to a slow "pipe"<br />

between the data centers. Some scenarios:<br />

There is an initial bulk load of a corpus followed by lower volume incremental updates. In this case, one<br />

can do the initial bulk load, replicate the index and then keep then synchronized via CDCR. See the<br />

section Starting CDCR the first time with an existing index for more information.<br />

The index is being built up from scratch, without a significant initial bulk load. CDCR can be set up on<br />

empty collections and keep them synchronized from the start.<br />

The index is always being updated at a volume too high for CDCR to keep up. This is especially possible<br />

in situations where the connection between the Source and Target data centers is poor. This scenario is<br />

unsuitable for CDCR in its current form.<br />

Inter-Data Center Communication<br />

Communication between Data Centers will be achieved through HTTP and the <strong>Solr</strong> REST API using the <strong>Solr</strong>J<br />

client. The <strong>Solr</strong>J client will be instantiated with the ZooKeeper host of the Target Data Center. <strong>Solr</strong>J will manage<br />

the shard leader discovery process.<br />

Updates Tracking & Pushing<br />

CDCR replicates data updates from the Source to the Target Data Center by leveraging the Updates Log.<br />

A background thread regularly checks the Updates Log for new entries, and then forwards them to the Target<br />

Data Center. The thread therefore needs to keep a checkpoint in the form of a pointer to the last update<br />

successfully processed in the Updates Log. Upon acknowledgement from the Target Data Center that updates<br />

have been successfully processed, the Updates Log pointer is updated to reflect the current checkpoint.<br />

This pointer must be synchronized across all the replicas. In the case where the leader goes down and a new<br />

leader is elected, the new leader will be able to resume replication from the last update by using this<br />

synchronized pointer. The strategy to synchronize such a pointer across replicas will be explained next.<br />

If for some reason, the Target Data Center is offline or fails to process the updates, the thread will periodically try<br />

to contact the Target Data Center and push the updates.<br />

Synchronization of Update Checkpoints<br />

A reliable synchronization of the update checkpoints between the shard leader and shard replicas is critical to<br />

avoid introducing inconsistency between the Source and Target Data Centers. Another important requirement is<br />

that the synchronization must be performed with minimal network traffic to maximize scalability.<br />

In order to achieve this, the strategy is to:<br />

Uniquely identify each update operation. This unique identifier will serve as pointer.<br />

<strong>Apache</strong> <strong>Solr</strong> <strong>Reference</strong> <strong>Guide</strong> <strong>6.0</strong><br />

612

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!