11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Algorithms discussed in this section:<br />

Beider-Morse Phonetic Matching (BMPM)<br />

Daitch-Mokotoff Soundex<br />

Double Metaphone<br />

Metaphone<br />

Soundex<br />

Refined Soundex<br />

Caverphone<br />

Kölner Phonetik a.k.a. Cologne Phonetic<br />

NYSIIS<br />

Beider-Morse Phonetic Matching (BMPM)<br />

To use this encoding in your analyzer, see Beider Morse Filter in the Filter Descriptions section.<br />
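
As a minimal sketch of what that looks like in a schema, the field type below applies the filter after tokenizing. The field type name `text_bmpm` and the choice of `solr.StandardTokenizerFactory` are illustrative assumptions; the filter class and its `nameType`, `ruleType`, `concat`, and `languageSet` attributes are the ones documented for the Beider Morse Filter.<br />

```xml
<!-- Hypothetical field type name; choose one that fits your schema. -->
<fieldType name="text_bmpm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- languageSet="auto" lets the filter guess the language from the spelling. -->
    <filter class="solr.BeiderMorseFilterFactory"
            nameType="GENERIC" ruleType="APPROX"
            concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>
```

Because the analyzer is not split into index and query variants, the same phonetic encoding is applied to indexed names and to search terms, which is what allows differently spelled names to match.<br />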

Beider-Morse Phonetic Matching (BMPM) is a "soundalike" phonetic matching system for searching names.<br />

BMPM helps you search for personal names (or just surnames) in a <strong>Solr</strong>/Lucene index, and typically<br />

returns more relevant matches than older phonetic codecs such as Soundex, Metaphone, and Caverphone.<br />

In general, phonetic matching lets you search a name list for names that are phonetically equivalent to the<br />

desired name. BMPM is similar to a soundex search in that an exact spelling is not required. Unlike soundex, it<br />

does not generate a large quantity of false hits.<br />

From the spelling of the name, BMPM attempts to determine the language. It then applies phonetic rules for that<br />

particular language to transliterate the name into a phonetic alphabet. If it is not possible to determine the<br />

language with a fair degree of certainty, it uses generic phonetic rules instead. Finally, it applies language-independent<br />

rules regarding such things as voiced and unvoiced consonants and vowels to further ensure the reliability of the<br />

matches.<br />

For example, assume that the matches found when searching for Stephen in a database are "Stefan", "Steph",<br />

"Stephen", "Steve", "Steven", "Stove", and "Stuffin". "Stefan", "Stephen", and "Steven" are probably relevant, and<br />

are names that you want to see. "Stuffin", however, is probably not relevant. Also rejected were "Steph", "Steve",<br />

and "Stove". Of those, "Stove" is probably not one that you would have wanted, but "Steph" and "Steve" are<br />

names that you might be interested in.<br />

For <strong>Solr</strong>, BMPM searching is available for the following languages:<br />

English<br />

French<br />

German<br />

Greek<br />

Hebrew written in Hebrew letters<br />

Hungarian<br />

Italian<br />

Polish<br />

Romanian<br />

Russian written in Cyrillic letters<br />

Russian transliterated into English letters<br />

Spanish<br />

Turkish<br />

The name matching is also applicable to non-Jewish surnames from the countries in which those languages are<br />

spoken.<br />
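
If the indexed names are known to come from only a few of the languages listed above, automatic detection can be replaced with an explicit `languageSet`. The fragment below is a sketch under that assumption: the lowercase identifiers `english` and `german` follow the naming used by the underlying phonetic rule files and should be verified against your Solr version.<br />

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- Assumed language identifiers; "auto" remains the safe default. -->
  <filter class="solr.BeiderMorseFilterFactory"
          nameType="GENERIC" ruleType="APPROX"
          concat="true" languageSet="english,german"/>
</analyzer>
```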

For more information, see http://stevemorse.org/phoneticinfo.htm and http://stevemorse.org/phonetics/bmpm.htm.<br />

Daitch-Mokotoff Soundex<br />
