11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


updateRequestProcessorChains and updateProcessors

Update processors in SolrCloud

In a single-node, standalone Solr installation, each update is run through all the update processors in a chain exactly once. The behavior of update request processors in SolrCloud, however, deserves special consideration.

A critical SolrCloud function is the routing and distribution of requests. For update requests this routing is implemented by the DistributedUpdateProcessor, and Solr gives this processor a special status because of its important function.

In a distributed SolrCloud setup, all processors in the chain before the DistributedUpdateProcessor are run on the first node that receives an update from the client, regardless of this node's status as a leader or replica. The DistributedUpdateProcessor then forwards the update to the appropriate shard leader for the update (or to multiple leaders in the event of an update that affects multiple documents, such as a delete by query or a commit). The shard leader uses a transaction log to apply Atomic Updates and Optimistic Concurrency, and then forwards the update to all of the shard replicas. The leader and each replica run all of the processors in the chain that are listed after the DistributedUpdateProcessor.

For example, consider the "dedupe" chain which we saw in a section above. Assume that a three-node SolrCloud cluster exists where node A hosts the leader of shard1, node B hosts the leader of shard2, and node C hosts the replica of shard2. Assume that an update request is sent to node A, which forwards the update to node B (because the update belongs to shard2), which then distributes the update to its replica on node C. Let's see what happens at each node:

Node A: Runs the update through the SignatureUpdateProcessor (which computes the signature and puts it in the "id" field), then the LogUpdateProcessor, and then the DistributedUpdateProcessor. This processor determines that the update actually belongs to node B and forwards it to node B. The update is not processed further on node A. This is required because the next processor, RunUpdateProcessor, would execute the update against the local shard1 index, which would lead to duplicate data on shard1 and shard2.

Node B: Receives the update and sees that it was forwarded by another node. The update is sent directly to the DistributedUpdateProcessor because it has already been through the SignatureUpdateProcessor on node A, and doing the same signature computation again would be redundant. The DistributedUpdateProcessor determines that the update indeed belongs to this node, distributes it to its replica on node C, and then passes the update further down the chain to the RunUpdateProcessor.

Node C: Receives the update and sees that it was distributed by its leader. The update is sent directly to the DistributedUpdateProcessor, which performs some consistency checks and passes the update further down the chain to the RunUpdateProcessor.

In summary:

1. All processors before the DistributedUpdateProcessor are run only on the first node that receives an update request, whether it is a forwarding node (e.g. node A in the above example) or a leader (e.g. node B). We call these "pre-processors" or just "processors".

2. All processors after the DistributedUpdateProcessor run only on the leader and the replica nodes. They are not executed on forwarding nodes. Such processors are called "post-processors".
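
The split between pre-processors and post-processors can be made visible by listing the distributed processor explicitly in the chain. The following is only a sketch of the "dedupe" chain discussed above: the parameter values of the signature processor (`signatureField`, `fields`, `signatureClass`, and so on) are illustrative assumptions rather than values taken from this page. If `solr.DistributedUpdateProcessorFactory` is left out of a chain, Solr should insert it automatically just before `solr.RunUpdateProcessorFactory`, which gives the same ordering.

```xml
<!-- Sketch only: parameter values are illustrative assumptions -->
<updateRequestProcessorChain name="dedupe">
  <!-- Pre-processors: run only on the first node that receives the update -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />

  <!-- Routing boundary: forwards the update to the shard leader and its replicas -->
  <processor class="solr.DistributedUpdateProcessorFactory" />

  <!-- Post-processors: run on the shard leader and on each replica -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Read against the walkthrough, node A executes everything down to and including the routing boundary, while nodes B and C enter the chain at the boundary and execute what follows it.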

In the previous section, we saw that the updateRequestProcessorChain was configured with processor="remove_blanks, signature". This means that such processors are of the #1 kind and are run only on the forwarding nodes. Similarly, we can configure them as the #2 kind by specifying them with the attribute "post-processor" as follows:

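A minimal sketch of such a chain, assuming that `signature` and `remove_blanks` are the names of `updateProcessor` elements defined elsewhere in `solrconfig.xml`, as in the previous section. The chain name `custom` is a placeholder chosen for illustration.

```xml
<!-- "custom" is a placeholder chain name. "signature" and "remove_blanks" are assumed
     to refer to named updateProcessor elements defined elsewhere in solrconfig.xml. -->
<updateRequestProcessorChain name="custom"
                             processor="signature"
                             post-processor="remove_blanks">
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With this layout, `signature` would run once on the node that first receives the update, while `remove_blanks` would run on the shard leader and on every replica.
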