11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


useWhitelist: If true, the file defined in types is used as an include list. If false or undefined, the file defined in types is used as a blacklist.

enablePositionIncrements: If luceneMatchVersion is 4.3 or earlier and enablePositionIncrements="false", no position holes will be left by this filter when it removes tokens. This argument is invalid if luceneMatchVersion is 5.0 or later.

Example:
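A minimal sketch of how the arguments above might be set, assuming they belong to the Type Token Filter with factory class solr.TypeTokenFilterFactory. The tokenizer choice and the file name stoptypes.txt are hypothetical and do not come from the text above.

```xml
<analyzer>
  <!-- Tokenizer is an assumption; token types are assigned by whichever tokenizer is used -->
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- stoptypes.txt is a hypothetical file listing one token type per line -->
  <filter class="solr.TypeTokenFilterFactory" types="stoptypes.txt" useWhitelist="true"/>
</analyzer>
```

With useWhitelist="true", only tokens whose type appears in the file are kept. Omitting the attribute or setting it to "false" treats the same file as a blacklist.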

Word Delimiter Filter

This filter splits tokens at word delimiters. Delimiters are determined by the following rules:

A change in case within a word: "CamelCase" -> "Camel", "Case". This can be disabled by setting splitOnCaseChange="0".

A transition from alpha to numeric characters or vice versa: "Gonzo5000" -> "Gonzo", "5000"; "4500XL" -> "4500", "XL". This can be disabled by setting splitOnNumerics="0".

Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"

A trailing "'s" is removed: "O'Reilly's" -> "O", "Reilly"

Any leading or trailing delimiters are discarded: "--hot-spot--" -> "hot", "spot"
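The two rules that can be switched off might be disabled as in the sketch below. The field type name text_wd_nosplit and the whitespace tokenizer are assumptions chosen for illustration; only the two filter attributes come from the rules above.

```xml
<fieldType name="text_wd_nosplit" class="solr.TextField">
  <analyzer>
    <!-- Whitespace tokenizer (an assumption) keeps delimiters so the filter can see them -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Disable splitting on case changes and on letter/digit transitions -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="0"/>
  </analyzer>
</fieldType>
```

Under this configuration, "CamelCase" and "Gonzo5000" should no longer be split, while "hot-spot" is still split at the hyphen.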

Factory class: solr.WordDelimiterFilterFactory

Arguments:

generateWordParts: (integer, default 1) If non-zero, splits words at delimiters. For example: "CamelCase", "hot-spot" -> "Camel", "Case", "hot", "spot"

generateNumberParts: (integer, default 1) If non-zero, splits numeric strings at delimiters: "1947-32" -> "1947", "32"

splitOnCaseChange: (integer, default 1) If 0, words are not split on camel-case changes: "BugBlaster-XL" -> "BugBlaster", "XL". Example 1 below illustrates the default (non-zero) splitting behavior.

splitOnNumerics: (integer, default 1) If 0, don't split words on transitions from alpha to numeric: "FemBot3000" -> "Fem", "Bot3000"

catenateWords: (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor's" -> "hotspotsensor"

catenateNumbers: (integer, default 0) If non-zero, maximal runs of number parts will be joined: "1947-32" -> "194732"

catenateAll: (0/1, default 0) If non-zero, runs of word and number parts will be joined: "Zap-Master-9000" -> "ZapMaster9000"

preserveOriginal: (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" -> "Zap-Master-9000", "Zap", "Master", "9000"

protected: (optional) The pathname of a file that contains a list of protected words that should be passed through without splitting.
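As an illustration of how several of these arguments might be combined, the sketch below enables part generation, catenation of all parts, and preservation of the original token. The field type name text_wd_catenate, the whitespace tokenizer, and the file name protwords.txt are hypothetical choices, not values taken from the text above.

```xml
<fieldType name="text_wd_catenate" class="solr.TextField">
  <analyzer>
    <!-- Whitespace tokenizer (an assumption) leaves hyphens in place for the filter -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- protwords.txt is a hypothetical file of words to pass through unsplit -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateAll="1"
            preserveOriginal="1"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

Going by the argument descriptions, a token such as "Zap-Master-9000" would be expected to yield the original token, the parts "Zap", "Master", and "9000", and the joined form "ZapMaster9000", unless it is listed in the protected words file.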

