11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Limitations to Distributed Search

Distributed searching in Solr has the following limitations:

Each document indexed must have a unique key.

If Solr discovers duplicate document IDs, it selects the first document and discards subsequent ones.

The index for distributed searching may become momentarily out of sync if a commit happens between the first and second phase of the distributed search. This might cause a situation where a document that once matched a query and was subsequently changed may no longer match the query but will still be retrieved. This situation is expected to be quite rare, however, and is only possible for a single query request.

The number of shards is limited by the number of characters allowed in the URI of a GET request; most Web servers generally support at least 4000 characters, but many servers limit URI length to reduce their vulnerability to Denial of Service (DoS) attacks.

Shard information can be returned with each document in a distributed search by including fl=id,[shard] in the search request. This returns the shard URL.

In a distributed search, the data directory from the core descriptor overrides any data directory in solrconfig.xml.

Update commands may be sent to any server with distributed indexing configured correctly. Document adds and deletes are forwarded to the appropriate server/shard based on a hash of the unique document id. The commit and deleteByQuery commands are sent to every server in shards.
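To make the URI-length and fl=id,[shard] points concrete, here is a minimal sketch of a distributed query built in the shell. The host names, ports, and the core name core1 are assumptions chosen for illustration, and the 4000-character threshold is only the figure quoted above, not a limit enforced by Solr itself.

```shell
# Hypothetical shard list: hosts, ports and the core name "core1" are assumptions.
shards="localhost:8983/solr/core1,localhost:8984/solr/core1"

# fl=id,[shard] requests each document's id plus the shard it came from.
# The square brackets are percent-encoded so curl does not treat them as a glob range.
url="http://localhost:8983/solr/core1/select?q=*:*&fl=id,%5Bshard%5D&shards=${shards}"

# Every extra shard lengthens the URI, so compare it with the limit
# your server enforces (4000 is the figure quoted in the text).
echo "URI length: ${#url}"
if [ "${#url}" -gt 4000 ]; then
  echo "warning: URI may exceed the server's limit" >&2
fi

# Send the request only when RUN_QUERY=1, so the sketch runs without a live server.
if [ "${RUN_QUERY:-0}" = "1" ]; then
  curl "$url"
fi
```

Running it with RUN_QUERY=1 against two running nodes should return documents annotated with a [shard] field, provided the assumed core exists on both.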

Formerly, a limitation was that TF/IDF relevancy computations only used shard-local statistics. This is still the case by default. If your data isn't randomly distributed, or if you want more exact statistics, then remember to configure the ExactStatsCache.
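As a sketch of what that configuration could look like, the fragment below declares the stats cache in solrconfig.xml. It assumes the element sits directly inside the top-level config element and that the class name matches your Solr version; verify both against your installation before relying on it.

```xml
<!-- Inside the <config> element of solrconfig.xml (placement assumed). -->
<!-- Asks Solr to score with global rather than shard-local term statistics. -->
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
```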

Avoiding Distributed Deadlock<br />

As in SolrCloud mode, inter-shard requests could lead to a distributed deadlock. It can be avoided by following the instructions here.

Testing Index Sharding on Two Local Servers<br />

For simple functionality testing, it's easiest to just set up two local Solr servers on different ports. (In a production environment, of course, these servers would be deployed on separate machines.)

1. Make two Solr home directories:

mkdir example/nodes
mkdir example/nodes/node1
# Copy solr.xml into this solr.home
cp server/solr/solr.xml example/nodes/node1/.
# Repeat the above steps for the second node
mkdir example/nodes/node2
cp server/solr/solr.xml example/nodes/node2/.

2. Start the two Solr instances:

# Start first node on port 8983
bin/solr start -s example/nodes/node1 -p 8983
# Start second node on port 8984
bin/solr start -s example/nodes/node2 -p 8984

3.
