11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


The goal of the project is to replicate data to multiple Data Centers. The initial version of the solution covers the active-passive scenario, in which data updates are replicated from a Source Data Center to a Target Data Center. Data updates include adding, updating, and deleting documents.

Data changes on the Source Data Center are replicated to the Target Data Center only after they are persisted to disk. The changes can be replicated in near real time (with a small delay) or scheduled to be sent at intervals to the Target Data Center. This solution presupposes that the Source and Target Data Centers begin with the same documents indexed; the indexes may also both be empty to start.

Each shard leader in the Source Data Center is responsible for replicating its updates to the appropriate collection in the Target Data Center. When receiving updates from the Source Data Center, shard leaders in the Target Data Center replicate the changes to their own replicas.
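
The Target-side half of this flow can be pictured with a small sketch. This is an illustrative in-memory model only, not Solr code: the `Replica` and `TargetShard` classes and the update dictionary layout are hypothetical names chosen for this example. It shows a Target shard leader applying each update it receives and then passing the same update to its own replicas.

```python
from typing import Dict, List


class Replica:
    """One copy of a shard's index, modelled as an in-memory dict (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[str, dict] = {}

    def apply(self, update: dict) -> None:
        """Apply a single add/update or delete operation."""
        if update["op"] == "delete":
            self.docs.pop(update["id"], None)
        else:  # "add" covers both adding and updating a document
            self.docs[update["id"]] = update["doc"]


class TargetShard:
    """A Target shard: one leader plus its non-leader replicas (hypothetical)."""

    def __init__(self, leader: Replica, followers: List[Replica]) -> None:
        self.leader = leader
        self.followers = followers

    def receive(self, updates: List[dict]) -> None:
        """Apply updates on the leader, then forward them to every follower."""
        for update in updates:
            self.leader.apply(update)
            for follower in self.followers:
                follower.apply(update)


if __name__ == "__main__":
    shard = TargetShard(Replica("leader"), [Replica("follower-1")])
    shard.receive([
        {"op": "add", "id": "1", "doc": {"title": "first"}},
        {"op": "add", "id": "2", "doc": {"title": "second"}},
        {"op": "delete", "id": "1"},
    ])
    print(shard.leader.docs == shard.followers[0].docs)
```

In this model the Source never addresses followers directly; it only hands updates to the Target leader, which is what keeps the cross-data-center traffic to one stream per shard.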

This replication model is designed to tolerate some degradation in connectivity, accommodate limited bandwidth, and support batch updates to optimize communication.
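
One way to picture how batching and tolerance of a degraded link fit together is the sketch below. It is a simplified model under stated assumptions, not the CDCR implementation: `UpdateLogReplicator`, its `send` callback, and the `batch_size` value are all hypothetical. The idea shown is that a Source shard leader keeps a checkpoint into its already persisted update log, forwards pending entries in batches, and leaves the checkpoint untouched when a send fails so that the same entries are retried later.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UpdateLogReplicator:
    """Hypothetical model of a Source shard leader forwarding its update log."""

    send: Callable[[List[dict]], None]  # pushes one batch to the Target; may raise
    batch_size: int = 128               # illustrative value, not a Solr default
    log: List[dict] = field(default_factory=list)  # append-only, already persisted
    checkpoint: int = 0                 # index of the first entry not yet replicated

    def append(self, op: str, doc_id: str) -> None:
        """Record an add/update or delete after it has been persisted locally."""
        self.log.append({"op": op, "id": doc_id})

    def run_once(self) -> int:
        """Forward pending entries in batches; return how many were sent."""
        sent = 0
        while self.checkpoint < len(self.log):
            batch = self.log[self.checkpoint:self.checkpoint + self.batch_size]
            try:
                self.send(batch)
            except ConnectionError:
                break  # link is down: keep the checkpoint and retry on the next run
            self.checkpoint += len(batch)
            sent += len(batch)
        return sent
```

Calling `run_once()` on a timer corresponds to the scheduled mode described above, while calling it after every append approximates the near-real-time mode. Because the checkpoint only advances after a successful send, a temporary outage delays replication without losing updates.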

Replication supports both a new empty index and pre-built indexes. When replication is set up on a pre-built index, CDCR ensures consistency of the replicated updates but cannot ensure consistency of the full index. Therefore, any index created before CDCR was set up must be replicated by other means (described in the section Starting CDCR the first time with an existing index) so that the Source and Target indexes are fully consistent.

The active-passive nature of the initial implementation implies a "push" model from the Source collection to the Target collection. Therefore, the Source cluster must be able to "see" the ZooKeeper ensemble in the Target cluster. The address of the Target's ZooKeeper ensemble is configured in the Source's solrconfig.xml file.

CDCR is configured to replicate from collections in the Source cluster to collections in the Target cluster on a collection-by-collection basis. Since CDCR is configured in solrconfig.xml (on both Source and Target clusters), the settings can be tailored to the needs of each collection.
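
To make the per-collection configuration more concrete, the sketch below generates what a Source-side request handler fragment for solrconfig.xml might look like. The handler name `/cdcr`, the class `solr.CdcrRequestHandler`, and the `replica` parameters `zkHost`, `source`, and `target` reflect the CDCR configuration as understood here and should be checked against the CDCR configuration section of this guide; the helper function, host names, and collection names are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET


def source_cdcr_handler(target_zk_host: str, source: str, target: str) -> ET.Element:
    """Build a Source-side CDCR request handler element (illustrative helper)."""
    handler = ET.Element(
        "requestHandler",
        {"name": "/cdcr", "class": "solr.CdcrRequestHandler"},
    )
    # The "replica" list names the Target's ZooKeeper ensemble and the
    # Source/Target collection pair for this one collection.
    replica = ET.SubElement(handler, "lst", {"name": "replica"})
    for key, value in (("zkHost", target_zk_host),
                       ("source", source),
                       ("target", target)):
        ET.SubElement(replica, "str", {"name": key}).text = value
    return handler


if __name__ == "__main__":
    # Placeholder hosts and collection names, not real defaults.
    element = source_cdcr_handler(
        "target-zk1:2181,target-zk2:2181", "products", "products"
    )
    print(ET.tostring(element, encoding="unicode"))
```

Because the fragment lives in each collection's own solrconfig.xml, two collections in the same Source cluster could point at different Target collections or even different Target clusters.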

CDCR can be configured to replicate from one collection to a second collection within the same cluster. That is a specialized scenario not covered in this document.

Glossary

Terms used in this document include:

Node: A JVM instance running Solr; a server.

Cluster: A set of Solr nodes managed as a single unit by a ZooKeeper ensemble, hosting one or more Collections.

Data Center: A group of networked servers hosting a Solr cluster. In this document, the terms Cluster and Data Center are interchangeable as we assume that each Solr cluster is hosted in a different group of networked servers.

Shard: A sub-index of a single logical collection. This may be spread across multiple nodes of the cluster. Each shard can have as many replicas as needed.

Leader: Each shard has one node identified as its leader. All the writes for documents belonging to a shard are routed through the leader.

Replica: A copy of a shard for use in failover or load balancing. Replicas comprising a shard can either be leaders or non-leaders.

Follower: A convenience term for a replica that is not the leader of a shard.

Collection: Multiple documents that make up one logical index. A cluster can have multiple collections.

Updates Log: An append-only log of write operations maintained by each node.
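
The relationships between these terms can be summarized as a small data model. The sketch below is only a reading aid: the class and field names are hypothetical and do not correspond to Solr's internal classes. It encodes that a cluster hosts collections, a collection is split into shards, and each shard has replicas of which exactly one is the leader and the rest are followers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    node: str               # the node (JVM instance) hosting this copy of the shard
    is_leader: bool = False


@dataclass
class Shard:
    name: str
    replicas: List[Replica] = field(default_factory=list)

    @property
    def leader(self) -> Replica:
        """Return the single leader; all writes for the shard go through it."""
        leaders = [r for r in self.replicas if r.is_leader]
        if len(leaders) != 1:
            raise ValueError(f"shard {self.name} must have exactly one leader")
        return leaders[0]

    @property
    def followers(self) -> List[Replica]:
        """Return every replica that is not the leader."""
        return [r for r in self.replicas if not r.is_leader]


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)


@dataclass
class Cluster:
    zk_ensemble: str        # address of the ZooKeeper ensemble managing the nodes
    collections: Dict[str, Collection] = field(default_factory=dict)
```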

Architecture

Here is a picture of the data flow.
