11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Each shard serves top-level query requests and then makes sub-requests to all of the other shards. Care should be taken to ensure that the max number of threads serving HTTP requests is greater than the possible number of requests from both top-level clients and other shards. If this is not the case, the configuration may result in a distributed deadlock.

For example, a deadlock might occur in the case of two shards, each with just a single thread to service HTTP requests. Both threads could receive a top-level request concurrently, and make sub-requests to each other. Because there are no more remaining threads to service requests, the incoming requests will be blocked until the other pending requests are finished, but they will not finish since they are waiting for the sub-requests. By ensuring that Solr is configured to handle a sufficient number of threads, you can avoid deadlock situations like this.
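The two-shard scenario above can be imitated with a small simulation. This is only a sketch and does not use Solr: each "shard" is a Python thread pool, and the function name run_two_shards, the barrier, and the timeout are assumptions made for illustration. The timeout exists only so the demonstration terminates; in a real deadlock the requests would wait indefinitely.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_two_shards(threads_per_shard, timeout=0.5):
    """Return True if both top-level requests complete, False if any stalls.

    Hypothetical model: each shard is a thread pool whose size stands in
    for the number of threads serving HTTP requests on that shard.
    """
    pools = [ThreadPoolExecutor(max_workers=threads_per_shard) for _ in range(2)]
    # Make sure both top-level requests are in flight at the same time.
    both_started = threading.Barrier(2)

    def sub_request(shard):
        return f"partial result from shard {shard}"

    def top_level(shard):
        both_started.wait(timeout=5)
        other = 1 - shard
        # The top-level request needs a thread on the other shard.
        future = pools[other].submit(sub_request, other)
        try:
            future.result(timeout=timeout)
            return True
        except TimeoutError:
            # No free thread on the other shard: the sub-request never ran.
            future.cancel()
            return False

    futures = [pools[i].submit(top_level, i) for i in range(2)]
    outcomes = [f.result() for f in futures]
    for pool in pools:
        pool.shutdown(wait=True)
    return all(outcomes)
```

With one thread per shard, each top-level request occupies the only thread on its shard while waiting for the other shard, so the call reports a stall. With two threads per shard, a thread is free to run each sub-request and both requests complete.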

Prefer Local Shards

Solr allows you to pass an optional boolean parameter named preferLocalShards to indicate that a distributed query should prefer local replicas of a shard when available. In other words, if a query includes preferLocalShards=true, then the query controller will look for local replicas to service the query instead of selecting replicas at random from across the cluster. This is useful when a query requests many fields or large fields to be returned per document, because it avoids moving large amounts of data over the network when that data is available locally. In addition, this feature can be useful for minimizing the impact of a problematic replica with degraded performance, as it reduces the likelihood that the degraded replica will be queried by other, healthy replicas.

The value of this feature diminishes as the number of shards in a collection increases, because the query controller will have to direct the query to non-local replicas for most of the shards. In other words, this feature is mostly useful for optimizing queries directed towards collections with a small number of shards and many replicas. Also, this option should only be used if you are load balancing requests across all nodes that host replicas for the collection you are querying, as Solr's CloudSolrClient will do. Without load balancing, this feature can introduce a hotspot in the cluster, since queries won't be evenly distributed across the cluster.
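As a rough illustration of passing the parameter, the sketch below builds a query URL that includes preferLocalShards=true. The base URL, the collection name, the /select path, and the helper name build_select_url are assumptions made for the example; only the preferLocalShards parameter itself comes from the text above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, prefer_local=True):
    """Build a query URL carrying the preferLocalShards parameter.

    base_url, collection, and the /select path are placeholders for this
    sketch; adjust them to match the actual deployment.
    """
    params = {
        "q": query,
        "preferLocalShards": "true" if prefer_local else "false",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection names.
url = build_select_url("http://localhost:8983/solr", "products", "title:solr")
```

The request should be sent through a load balancer or a client that spreads queries across all nodes hosting replicas, for the hotspot reason given above.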

Read and Write Side Fault Tolerance

SolrCloud supports elasticity, high availability, and fault tolerance in reads and writes. In practice, this means that when you have a large cluster, you can always make requests to the cluster: reads will return results whenever possible, even if some nodes are down, and writes will be acknowledged only if they are durable; i.e., you won't lose data.

Read Side Fault Tolerance

In a SolrCloud cluster, each individual node load balances read requests across all the replicas in the collection. You still need a load balancer on the 'outside' that talks to the cluster, or you need a smart client that understands how to read and interact with Solr's metadata in ZooKeeper and needs only the ZooKeeper ensemble's address to start discovering which nodes it should send requests to. (Solr provides a smart Java SolrJ client called CloudSolrClient.)

Even if some nodes in the cluster are offline or unreachable, a Solr node will be able to correctly respond to a search request as long as it can communicate with at least one replica of every shard, or one replica of every relevant shard if the user limited the search via the 'shards' or '_route_' parameters. The more replicas there are of every shard, the more likely it is that the Solr cluster will be able to return search results in the event of node failures.
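The availability rule in the preceding paragraph can be written as a small check. This is a sketch of the rule as stated, not Solr's implementation; the function name can_serve_query, the data layout, and the node names are hypothetical.

```python
def can_serve_query(replica_nodes_by_shard, live_nodes, relevant_shards=None):
    """Return True if at least one replica of every relevant shard is reachable.

    replica_nodes_by_shard: dict mapping a shard name to the nodes hosting
        its replicas (hypothetical layout for this sketch).
    live_nodes: set of nodes that are currently reachable.
    relevant_shards: optional subset of shards, standing in for a search
        limited via the 'shards' or '_route_' parameters.
    """
    if relevant_shards is None:
        shards = replica_nodes_by_shard.keys()
    else:
        shards = relevant_shards
    return all(
        any(node in live_nodes for node in replica_nodes_by_shard[shard])
        for shard in shards
    )


# Hypothetical cluster: shard1 has two replicas, shard2 has one.
cluster = {"shard1": ["nodeA", "nodeB"], "shard2": ["nodeC"]}
```

In this example, losing nodeB still leaves every shard reachable, while losing nodeC makes shard2 unavailable unless the search is limited to shard1.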

zkConnected

A Solr node will return the results of a search request as long as it can communicate with at least one replica of

