11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Levenshtein metric, which is the same metric used with the other spell checker implementations.

Because this spell checker queries the main index, you may want to limit how often it does so, to avoid performance conflicts with user queries. The accuracy setting defines the threshold for a valid suggestion, while maxEdits defines the number of changes to the term to allow. Since most spelling mistakes are only 1 letter off, setting this to 1 will reduce the number of possible suggestions (the default, however, is 2); the value can only be 1 or 2. minPrefix defines the minimum number of leading characters a suggestion must share with the query term. Setting this to 1 means that the spelling suggestions will all start with the same letter, for example.

The maxInspections parameter defines the maximum number of possible matches to review before returning results; the default is 5. minQueryLength defines how many characters must be in the query before suggestions are provided; the default is 4. maxQueryFrequency sets the maximum number of documents a query term may appear in and still be considered for correction; terms that appear more often are assumed to be spelled correctly. This can be a percentage (such as .01, or 1%) or an absolute value (such as 4). A lower threshold is better for small indexes. Finally, thresholdTokenFrequency sets the minimum number of documents a term must appear in to be offered as a suggestion, and can also be expressed as a percentage or an absolute value.
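As a rough illustration of how the parameters described above fit together, the following sketch shows one spellchecker definition for solrconfig.xml. It assumes the spell checker being discussed is solr.DirectSolrSpellChecker. The dictionary name default and the field name are placeholders, and the values are illustrative rather than recommended; the accuracy value in particular is an arbitrary choice.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- "default" and the field "name" are placeholder choices for this sketch -->
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- "internal" selects the Levenshtein metric -->
    <str name="distanceMeasure">internal</str>
    <!-- threshold for a valid suggestion (illustrative value) -->
    <float name="accuracy">0.5</float>
    <!-- number of changes allowed per term: 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- suggestions must share at least this many leading characters -->
    <int name="minPrefix">1</int>
    <!-- possible matches to review before returning results -->
    <int name="maxInspections">5</int>
    <!-- minimum query length before suggestions are provided -->
    <int name="minQueryLength">4</int>
    <!-- fraction (or absolute count) of documents; see the text above -->
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">.01</float>
  </lst>
</searchComponent>
```

Lowering maxEdits to 1 in a definition like this would narrow the candidate list to single-character corrections.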

FileBasedSpellChecker

The FileBasedSpellChecker uses an external file as a spelling dictionary. This can be useful if using Solr as a spelling server, or if spelling suggestions don't need to be based on actual terms in the index. In solrconfig.xml, you would define the searchComponent as follows:

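<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>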

The differences here are the use of sourceLocation to define the location of the file of terms and the use of characterEncoding to define the encoding of the terms file.

In the previous example, name is used to name this specific definition of the spellchecker. Multiple definitions can co-exist in a single solrconfig.xml, and the name helps to differentiate them. If only defining one spellchecker, no name is required.
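To make the point about co-existing definitions concrete, here is a minimal sketch of two named spellcheckers inside one searchComponent. The first definition (its name default, its class solr.DirectSolrSpellChecker, and its field name) is an assumption added for illustration; the second reuses the file-based values from the example above.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- First definition: index-based suggestions; name and field are placeholders -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">name</str>
  </lst>
  <!-- Second definition: the file-based dictionary, distinguished by its name -->
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>
```

With a configuration along these lines, a request could choose between the two by name, for instance by passing spellcheck.dictionary=file.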

WordBreakSolrSpellChecker

WordBreakSolrSpellChecker offers suggestions by combining adjacent query terms and/or breaking terms into multiple words. It is a SpellCheckComponent enhancement, leveraging Lucene's WordBreakSpellChecker. It can detect spelling errors resulting from misplaced whitespace without the use of shingle-based dictionaries and provides collation support for word-break errors, including cases where the user has a mix of single-word spelling errors and word-break errors in the same query. It also provides shard support.

Here is how it might be configured in solrconfig.xml:
