Apache Solr Reference Guide Covering Apache Solr 6.0

In: "the quick brown fox jumped over the lazy dog"

Tokenizer to Filter: "the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"

Out: "brown_dog_fox_jumped_lazy_over_quick_the"
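The transformation in this example can be modeled with a minimal Python sketch: deduplicate the incoming tokens, sort them, and join them with a separator into a single output token. The function name `fingerprint` and its `separator` parameter are hypothetical illustrations, not part of Solr's API, and the sketch ignores any other options the real filter may support.

```python
def fingerprint(tokens, separator="_"):
    """Collapse a token stream into one token: unique terms, sorted, joined.

    Hypothetical helper for illustration only; it models the example above,
    not Solr's actual filter implementation.
    """
    unique_terms = set(tokens)  # "the" appears twice but is kept once
    return separator.join(sorted(unique_terms))


tokens = "the quick brown fox jumped over the lazy dog".split()
print(fingerprint(tokens))  # prints brown_dog_fox_jumped_lazy_over_quick_the
```

Because the output depends only on the set of distinct terms, two inputs that differ only in word order or repetition produce the same single token.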

Hunspell Stem Filter

The Hunspell Stem Filter provides support for several languages. You must provide the dictionary (.dic) and rules (.aff) files for each language you wish to use with the Hunspell Stem Filter. You can download those language files here. Be aware that your results will vary widely based on the quality of the provided dictionary and rules files. For example, some languages have only a minimal word list with no morphological information. On the other hand, for languages that have no stemmer but do have an extensive dictionary file, the Hunspell stemmer may be a good choice.

Factory class: solr.HunspellStemFilterFactory

Arguments:

dictionary: (required) The path of a dictionary file.

affix: (required) The path of a rules file.

ignoreCase: (boolean) Controls whether matching is case sensitive or not. The default is false.

strictAffixParsing: (boolean) Controls whether affix parsing is strict or not. If true, an error while reading an affix rule causes a ParseException; otherwise, the error is ignored. The default is true.

Example:

In: "jump jumping jumped"

Tokenizer to Filter: "jump", "jumping", "jumped"

Out: "jump", "jump", "jump"
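Why the dictionary matters can be shown with a deliberately tiny Python sketch: a set of known stems stands in for the .dic file, a list of suffixes stands in for the .aff rules, and a token is reduced only when stripping a suffix yields a dictionary word. This is a simplified dictionary-lookup stemmer, not the Hunspell algorithm; real .aff files define conditional prefix and suffix rules that this sketch does not model. The names `toy_stem`, `DICTIONARY`, `SUFFIXES`, and the `ignore_case` flag (loosely mirroring the ignoreCase argument) are hypothetical.

```python
# Hypothetical toy data: a real .dic file lists stems (often with affix flags)
# and a real .aff file defines conditional prefix/suffix rules.
DICTIONARY = {"jump", "walk", "talk"}
SUFFIXES = ("ing", "ed", "s")


def toy_stem(token, dictionary=DICTIONARY, suffixes=SUFFIXES, ignore_case=False):
    """Return a dictionary stem for token, or token itself if none is found."""
    word = token.lower() if ignore_case else token
    if word in dictionary:
        return word  # already a known stem
    for suffix in suffixes:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in dictionary:  # only accept stems the dictionary knows
                return candidate
    return token  # in this sketch, unknown words pass through unchanged


print([toy_stem(t) for t in "jump jumping jumped".split()])
# prints ['jump', 'jump', 'jump']
```

With a sparse word list, the `candidate in dictionary` check fails more often and words pass through unstemmed, which is one way to picture how results depend on dictionary quality.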

Hyphenated Words Filter

This filter reconstructs hyphenated words that have been tokenized as two tokens because of a line break or other intervening whitespace in the field text. If a token ends with a hyphen, it is joined with the following token and the hyphen is discarded. Note that for this filter to work properly, the upstream tokenizer must not remove trailing hyphen characters. This filter is generally only useful at index time.
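The joining rule described above can be sketched in Python as a simplified model, assuming a whitespace-splitting tokenizer that keeps trailing hyphens. The function name `join_hyphenated` is hypothetical, and the handling of a final token that ends in a hyphen (re-attaching the hyphen and emitting it) is an assumption of this sketch rather than something stated in the text.

```python
def join_hyphenated(tokens):
    """Join each token ending in '-' with the token that follows, dropping the hyphen.

    Hypothetical helper; a simplified model, not Lucene's implementation.
    """
    out = []
    pending = None  # text of a word split across whitespace, hyphen removed
    for tok in tokens:
        if pending is not None:
            tok = pending + tok  # glue the continuation onto the saved prefix
            pending = None
        if tok.endswith("-"):
            pending = tok[:-1]  # hold it until the next token arrives
        else:
            out.append(tok)
    if pending is not None:
        out.append(pending + "-")  # assumption: nothing followed, keep the hyphen
    return out


text = "the ecologi-\ncal impact"
print(join_hyphenated(text.split()))  # prints ['the', 'ecological', 'impact']
```

Because the hyphen is always discarded when joining, a genuinely hyphenated compound that happens to break across a line ends up as one unhyphenated word.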
