11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


This filter converts alphabetic, numeric, and symbolic Unicode characters that are not in the Basic Latin Unicode block (the first 128 code points, U+0000 to U+007F, which correspond to ASCII) to their ASCII equivalents, where one exists. It converts characters from the following Unicode blocks:

C1 Controls and Latin-1 Supplement
Latin Extended-A
Latin Extended-B
Latin Extended Additional
Latin Extended-C
Latin Extended-D
IPA Extensions
Phonetic Extensions
Phonetic Extensions Supplement
General Punctuation
Superscripts and Subscripts
Enclosed Alphanumerics
Dingbats
Supplemental Punctuation
Alphabetic Presentation Forms
Halfwidth and Fullwidth Forms

Factory class: solr.ASCIIFoldingFilterFactory

Arguments: None

Example:

In: "á" (Unicode character 00E1)

Out: "a" (ASCII character 97)
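
A minimal sketch of a field type that applies this filter. The field type name "text_folded" and the choice of solr.StandardTokenizerFactory are assumptions made for illustration; only the filter's factory class comes from the text above.

```xml
<!-- Hypothetical field type: the name and the tokenizer are illustrative assumptions -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Folds characters such as "á" (U+00E1) to their ASCII equivalent "a" -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the analyzer element carries no type attribute, the same chain runs at both index and query time, so a query for "a" and a query for "á" should reach the same indexed term.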

Beider-Morse Filter

Implements the Beider-Morse Phonetic Matching (BMPM) algorithm, which allows identification of similar names, even if they are spelled differently or in different languages. More information about how this works is available in the section on Phonetic Matching.

BeiderMorseFilter changed its behavior in Solr 5.0: it now implements version 3.04 of the BMPM algorithm, whereas earlier Solr versions implemented BMPM version 3.00 (see http://stevemorse.org/phoneticinfo.htm). Any index built using this filter with an earlier version of Solr will need to be rebuilt.

Factory class: solr.BeiderMorseFilterFactory

Arguments:

nameType: Types of names. Valid values are GENERIC, ASHKENAZI, or SEPHARDIC. If not processing Ashkenazi or Sephardic names, use GENERIC.

ruleType: Types of rules to apply. Valid values are APPROX or EXACT.

concat: Defines whether multiple possible matches should be combined with a pipe ("|").

languageSet: The language set to use. The value "auto" allows the filter to identify the language; alternatively, a comma-separated list of languages can be supplied.
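
The arguments above could be combined as in the following sketch. The field type name "text_names_bm", the tokenizer, and the particular argument values are assumptions chosen for illustration, not recommended settings.

```xml
<!-- Hypothetical field type: the name, tokenizer, and argument values are illustrative assumptions -->
<fieldType name="text_names_bm" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- GENERIC names, approximate rules, alternatives joined with "|", language detected automatically -->
    <filter class="solr.BeiderMorseFilterFactory"
            nameType="GENERIC"
            ruleType="APPROX"
            concat="true"
            languageSet="auto"/>
  </analyzer>
</fieldType>
```

Changing any of these values alters the phonetic codes that are emitted, so documents indexed under the previous settings would likely need to be reindexed.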
