11.05.2016

Apache Solr Reference Guide Covering Apache Solr 6.0

Path Hierarchy Tokenizer

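This tokenizer creates synonyms from file path hierarchies. For each path, it emits one token for every level of the hierarchy, from the first path component down to the full path.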

Factory class: solr.PathHierarchyTokenizerFactory

Arguments:

delimiter: (character, no default) Specifies the file path delimiter used in the input. Together with replace, it lets you substitute a different delimiter in the output, which is useful when working with backslash delimiters.

replace: (character, no default) Specifies the delimiter character Solr uses in the tokenized output.

Example, using "\" as the delimiter and "/" as the replace character:

In: "c:\usr\local\apache"

Out: "c:", "c:/usr", "c:/usr/local", "c:/usr/local/apache"
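
To make the expected output concrete, here is a minimal plain-Java sketch that rebuilds the token strings from the example above. It is not Lucene's PathHierarchyTokenizer: it only reproduces the token text and ignores offsets and positions. The class and method names (PathHierarchySketch, tokenize) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: a plain-Java sketch of the token strings that
// PathHierarchyTokenizer produces. It is NOT the Lucene implementation and
// ignores token offsets and positions.
final class PathHierarchySketch {

    // Emits one token per hierarchy level. Input delimiters are rewritten
    // to the replacement character in every emitted token.
    static List<String> tokenize(String path, char delimiter, char replace) {
        List<String> tokens = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == delimiter) {
                // A delimiter closes one level: emit the path seen so far
                // (skipped for a leading delimiter, where nothing precedes it).
                if (prefix.length() > 0) {
                    tokens.add(prefix.toString());
                }
                prefix.append(replace);
            } else {
                prefix.append(c);
            }
        }
        // The complete path is the last token.
        if (prefix.length() > 0) {
            tokens.add(prefix.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // In a Java string literal "\\" is one backslash: c:\usr\local\apache
        System.out.println(tokenize("c:\\usr\\local\\apache", '\\', '/'));
        // prints [c:, c:/usr, c:/usr/local, c:/usr/local/apache]
    }
}
```

Each emitted token is a prefix of the next one, which is why the tokenizer's output behaves like a set of synonyms for the different levels of one path.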

Regular Expression Pattern Tokenizer

This tokenizer uses a Java regular expression to break the input text stream into tokens. The expression provided by the pattern argument can be interpreted either as a delimiter that separates tokens or as a pattern whose matches are extracted from the text as tokens.

See the Javadocs for java.util.regex.Pattern for more information on Java regular expression syntax.

Factory class: solr.PatternTokenizerFactory

Arguments:

pattern: (Required) The regular expression, as defined by java.util.regex.Pattern.

group: (Optional, default -1) Specifies which regex group to extract as the token(s). The value -1 means the regex should be treated as a delimiter that separates tokens. Non-negative group numbers (>= 0) indicate that character sequences matching that regex group should be converted to tokens. Group zero refers to the entire regex; groups greater than zero refer to parenthesized sub-expressions of the regex, counted from left to right.
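
The two modes of group can be illustrated with java.util.regex directly. The following is a sketch under our own assumptions, not the PatternTokenizer source: with group set to -1 it emits the text between matches, and with group >= 0 it emits the chosen group of each match. Dropping empty strings is a choice made for this sketch only. The class name, method name, and sample regex '([^']*)' are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: a sketch of the two "group" modes using
// java.util.regex directly. It is NOT Lucene's PatternTokenizer.
final class PatternGroupSketch {

    static List<String> tokenize(String text, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            // group = -1: the regex is a delimiter, so tokens are the
            // stretches of text between consecutive matches.
            int start = 0;
            while (m.find()) {
                addIfNonEmpty(tokens, text.substring(start, m.start()));
                start = m.end();
            }
            addIfNonEmpty(tokens, text.substring(start));
        } else {
            // group >= 0: every match contributes the text captured by that
            // group (0 = whole match, 1.. = parenthesized sub-expressions).
            while (m.find()) {
                addIfNonEmpty(tokens, m.group(group));
            }
        }
        return tokens;
    }

    // This sketch drops empty strings and non-participating (null) groups.
    private static void addIfNonEmpty(List<String> tokens, String s) {
        if (s != null && !s.isEmpty()) {
            tokens.add(s);
        }
    }

    public static void main(String[] args) {
        String text = "'aaa' 'bbb' 'ccc'";
        String regex = "'([^']*)'"; // hypothetical: quoted text, contents in group 1
        System.out.println(tokenize(text, regex, 0)); // prints ['aaa', 'bbb', 'ccc']
        System.out.println(tokenize(text, regex, 1)); // prints [aaa, bbb, ccc]
    }
}
```

With group 0 the quotes are part of each token because the whole match is used; with group 1 only the parenthesized contents are kept.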

Example:

A comma-separated list. Tokens are separated by a sequence of zero or more spaces, a comma, and zero or more spaces.

In: "fee,fie, foe , fum, foo"

Out: "fee", "fie", "foe", "fum", "foo"
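
The comma-separated example can be approximated in plain Java with Pattern.split, which treats the regex as a delimiter in the same way as group -1. The regex \s*,\s* is our assumption, written from the description above (it accepts any whitespace around the comma, not only spaces), and the class name CommaListSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class: approximates the comma-separated example
// with Pattern.split, i.e. the regex used as a delimiter (group -1).
final class CommaListSketch {

    // Assumed regex: optional whitespace, a comma, optional whitespace.
    private static final Pattern DELIMITER = Pattern.compile("\\s*,\\s*");

    static List<String> split(String text) {
        return Arrays.stream(DELIMITER.split(text))
                // Pattern.split can return a leading empty string (input
                // starting with a comma); this sketch discards empties.
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(split("fee,fie, foe , fum, foo"));
        // prints [fee, fie, foe, fum, foo]
    }
}
```

Because the surrounding whitespace is part of the delimiter match, " foe " comes out as "foe" without any separate trimming step.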

Example:
