11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


fields (default: all fields): The fields to use to generate the signature hash, in a comma-separated list. By default, all fields on the document will be used.

signatureField (default: signatureField): The name of the field used to hold the fingerprint/signature. The field should be defined in schema.xml.

enabled (default: true): Enable/disable de-duplication processing.

overwriteDupes (default: true): If true, when a document exists that already matches this signature, it will be overwritten.
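As a rough illustration, the settings above might appear together in a processor chain in solrconfig.xml, as in the sketch below. The chain name dedupe matches the name used elsewhere in this section, and the processor class names are those shipped with Solr. The field names listed under fields are hypothetical values chosen for this example.

```xml
<!-- Sketch only: the values under "fields" are hypothetical field names. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
  </processor>
  <!-- Logging and the actual index update run after the signature is computed. -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```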

In schema.xml

If you are using a separate field for storing the signature, it must be indexed (indexed="true" in its field definition).
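A minimal sketch of such a field definition follows. It assumes the field is literally named signatureField and stores the hash as a plain string. The type, stored, and multiValued values are assumptions for illustration; only indexed="true" is required by the text above.

```xml
<!-- Assumed name and type; the text above only requires indexed="true". -->
<field name="signatureField" type="string" stored="true" indexed="true" multiValued="false" />
```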

Be sure to change your update handlers to use the defined chain, as below:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
  ...
</requestHandler>

(This example assumes you have other sections of your request handler defined.)

The update processor can also be specified per request with a parameter of update.chain=dedupe.

Detecting Languages During Indexing

Solr can identify languages and map text to language-specific fields during indexing using the langid UpdateRequestProcessor. Solr supports two implementations of this feature:

Tika's language detection feature: http://tika.apache.org/0.10/detection.html

LangDetect language detection: http://code.google.com/p/language-detection/
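In configuration terms, choosing between the two implementations comes down to which processor factory class is named in an update chain. The sketch below is illustrative only. The chain name langid, the input fields title,text, and the output field language_s are hypothetical. The langid.fl and langid.langField parameter names come from Solr's language identification documentation rather than from this excerpt.

```xml
<updateRequestProcessorChain name="langid">
  <!-- LangDetect-based implementation. For the Tika-based one, use
       org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory instead. -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- Hypothetical input fields to inspect for language. -->
    <str name="langid.fl">title,text</str>
    <!-- Hypothetical field that receives the detected language code. -->
    <str name="langid.langField">language_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```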

You can see a comparison between the two implementations here: http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html. In general, the LangDetect implementation supports more languages with higher performance.

For specific information on each of these language identification implementations, including a list of supported languages for each, see the relevant project websites. For more information about the langid UpdateRequestProcessor, see the Solr wiki: http://wiki.apache.org/solr/LanguageDetection. For more information about language analysis in Solr, see Language Analysis.

Configuring Language Detection

You can configure the langid UpdateRequestProcessor in solrconfig.xml. Both implementations take the same parameters, which are described in the following section. At a minimum, you must specify the fields for
