Apache Solr Reference Guide Covering Apache Solr 6.0

In: "To be, or not to be."

Tokenizer to Filter: "To"(1), "be"(2), "or"(3), "not"(4), "to"(5), "be"(6)

Out: "To be"(1), "To be or"(1), "To be or not"(1), "be or"(2), "be or not"(2), "be or not to"(2), "or not"(3), "or not to"(3), "or not to be"(3), "not to"(4), "not to be"(4), "to be"(5)
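The output above corresponds to shingles of two to four tokens with single-token unigrams suppressed. The filter's settings are not shown on this page, so those sizes are inferred from the output rather than taken from the configuration. The following Python sketch reproduces the list; the function name shingles is hypothetical and is not a Solr or Lucene API. Each shingle is reported at the position of its first token, as in the example. Position gaps, such as those left by removed stopwords, are not modeled.

```python
def shingles(tokens, min_size=2, max_size=4, output_unigrams=False):
    """Return (shingle, position) pairs; positions are 1-based, as above.

    Hypothetical helper: sizes 2..4 and no unigrams are inferred from the
    example output, not read from a real Solr configuration.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append((tokens[start], start + 1))
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for a shingle this long
            # A shingle is reported at the position of its first token.
            out.append((" ".join(tokens[start:end]), start + 1))
    return out


tokens = ["To", "be", "or", "not", "to", "be"]
for text, pos in shingles(tokens):
    print(f'"{text}"({pos})')
```

Setting output_unigrams=True in this sketch would also emit each single token at its own position.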

Snowball Porter Stemmer Filter

This filter factory instantiates a language-specific stemmer generated by Snowball. Snowball is a software package that generates pattern-based word stemmers. This type of stemmer is not as accurate as a table-based stemmer, but it is faster and less complex. Table-driven stemmers are labor-intensive to create and maintain, so they are typically commercial products.

Solr contains Snowball stemmers for Armenian, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish. For more information on Snowball, visit http://snowball.tartarus.org/.

StopFilterFactory, CommonGramsFilterFactory, and CommonGramsQueryFilterFactory can optionally read stopwords in Snowball format (specify format="snowball" in the configuration of those filter factories).
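The Snowball stopword format is not described here. As we understand the format read by Lucene's word-list loader, a line may hold several whitespace-separated words, and a "|" starts a comment that runs to the end of the line. The Python sketch below parses that format under this assumption; parse_snowball_stopwords is a hypothetical helper, not part of Solr, and the sample list is illustrative only.

```python
def parse_snowball_stopwords(text):
    """Parse a stopword list in (assumed) Snowball format.

    '|' starts a comment that runs to the end of the line, and a line may
    contain several words separated by whitespace.
    """
    words = set()
    for line in text.splitlines():
        line = line.split("|", 1)[0]  # drop the comment, if any
        words.update(line.split())    # split() skips empty strings
    return words


sample = """
 | A Snowball-format stopword list (illustrative content)
le la les   | articles
de          | preposition
"""
print(sorted(parse_snowball_stopwords(sample)))  # ['de', 'la', 'le', 'les']
```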

Factory class: solr.SnowballPorterFilterFactory

Arguments:

language: (default "English") The name of a language, used to select the appropriate Porter stemmer to use. Case is significant. This string is used to select a stemmer class in the "org.tartarus.snowball.ext" package (for example, "English" selects EnglishStemmer).

protected: Path of a text file containing a list of protected words, one per line. Protected words will not be stemmed. Blank lines and lines that begin with "#" are ignored. This may be an absolute path or a simple file name in the Solr config directory.
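The two arguments can be illustrated with a short Python sketch. It assumes that the language value is concatenated into a class name of the form org.tartarus.snowball.ext.<language>Stemmer, which is why case matters, and that a protected-words file is read line by line after trimming whitespace. All function names are hypothetical, and toy_stem is a placeholder that only strips a trailing "ing"; it is not a Snowball stemmer.

```python
def stemmer_class_name(language="English"):
    # Assumption: the language value is used verbatim, so "english" would
    # produce "...englishStemmer", which does not match "EnglishStemmer".
    return "org.tartarus.snowball.ext." + language + "Stemmer"


def load_protected_words(text):
    # One word per line; blank lines and lines starting with "#" are skipped.
    # Trimming surrounding whitespace first is an assumption of this sketch.
    protected = set()
    for line in text.splitlines():
        word = line.strip()
        if word and not word.startswith("#"):
            protected.add(word)
    return protected


def stem_unless_protected(tokens, protected, stem):
    # Protected tokens pass through unchanged; every other token is stemmed.
    # Membership is case-sensitive in this sketch.
    return [tok if tok in protected else stem(tok) for tok in tokens]


def toy_stem(word):
    # Placeholder stemmer for the demo only: strips a trailing "ing".
    return word[:-3] if word.endswith("ing") else word


protected = load_protected_words("# words to leave alone\n\ntalking\n")
print(stemmer_class_name())  # org.tartarus.snowball.ext.EnglishStemmer
print(stem_unless_protected(["walking", "talking"], protected, toy_stem))  # ['walk', 'talking']
```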

Example:

Default behavior:

In: "flip flipped flipping"

Tokenizer to Filter: "flip", "flipped", "flipping"

Out: "flip", "flip", "flip"
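The default example maps all three forms to "flip". A tiny subset of step 1b of the Porter algorithm is enough to show how a pattern-based stemmer gets there: strip -ed or -ing when the remaining stem contains a vowel, then undouble a final double consonant other than l, s, or z. This Python sketch is a simplified illustration, not Snowball's English stemmer. It omits the other Porter rules, and treating only a, e, i, o, and u as vowels is a simplification.

```python
VOWELS = set("aeiou")  # simplification: Porter also treats some 'y's as vowels


def _has_vowel(stem):
    return any(ch in VOWELS for ch in stem)


def _ends_double_consonant(word):
    return len(word) >= 2 and word[-1] == word[-2] and word[-1] not in VOWELS


def porter_step_1b_subset(word):
    """Reduced subset of Porter step 1b (illustration only).

    Strip -ing or -ed when the remaining stem contains a vowel, then undouble
    a final double consonant unless it is l, s or z. Other rules are omitted.
    """
    for suffix in ("ing", "ed"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if not _has_vowel(stem):
                return word  # e.g. "sing": the stem "s" has no vowel
            if _ends_double_consonant(stem) and stem[-1] not in "lsz":
                stem = stem[:-1]  # "flipp" -> "flip"
            return stem
    return word


print([porter_step_1b_subset(w) for w in "flip flipped flipping".split()])
# ['flip', 'flip', 'flip']
```

Real Snowball stemmers apply many more ordered rule steps; the point here is only that a handful of suffix patterns, with no word table, already collapse these three forms.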

Example:

French stemmer, English words:
