11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Each non-comment, non-blank line consists of a mapping of the form: "source" => "target"

Double-quoted source string, optional whitespace, an arrow ( => ), optional whitespace, double-quoted target string.

Trailing comments on mapping lines are not allowed.

The source string must contain at least one character, but the target string may be empty.
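The line format described above can be checked with a short regular expression. The following is a minimal Python sketch, not Solr's own parser: the name `parse_mapping_line` is hypothetical, and treating lines that start with `#` as comments, and ignoring whitespace around the whole line, are assumptions not stated in the text above. Escape sequences are left undecoded.

```python
import re

# One mapping line: "source" => "target". The source needs at least one
# character; the target may be empty. Backslash escapes are kept raw here.
MAPPING_LINE = re.compile(r'^"((?:[^"\\]|\\.)+)"\s*=>\s*"((?:[^"\\]|\\.)*)"$')


def parse_mapping_line(line):
    """Return (source, target) with escapes still raw, or None for a skipped line."""
    stripped = line.strip()
    # Assumption: comment lines begin with '#'; blank lines are skipped too.
    if not stripped or stripped.startswith("#"):
        return None
    match = MAPPING_LINE.match(stripped)
    if match is None:
        # Covers an empty source string and trailing comments after the target.
        raise ValueError("not a valid mapping line: " + repr(line))
    return match.group(1), match.group(2)
```

Because the pattern is anchored at both ends, anything after the closing quote of the target, including a trailing comment, makes the line invalid.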

The following character escape sequences are recognized within source and target strings:

| Escape sequence | Resulting character (ECMA-48 alias) | Unicode character | Example mapping line |
|---|---|---|---|
| \\ | \ | U+005C | "\\" => "/" |
| \" | " | U+0022 | "\"and\"" => "'and'" |
| \b | backspace (BS) | U+0008 | "\b" => " " |
| \t | tab (HT) | U+0009 | "\t" => "," |
| \n | newline (LF) | U+000A | "\n" => "" |
| \f | form feed (FF) | U+000C | "\f" => "\n" |
| \r | carriage return (CR) | U+000D | "\r" => "/carriage-return/" |
| \uXXXX | Unicode char referenced by the 4 hex digits | U+XXXX | "\uFEFF" => "" |

A backslash followed by any other character is interpreted as if the character were present without the backslash.
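The escape rules can be illustrated with a small decoder. This is a minimal Python sketch under stated assumptions, not the code Solr uses: `unescape` is a hypothetical helper, a lone backslash at the very end of the string is kept literally, and a `\u` that is not followed by exactly four hex digits raises an error. The text above does not specify either of those two cases.

```python
import string

# Single-character escapes recognized in source and target strings.
SIMPLE_ESCAPES = {"\\": "\\", '"': '"', "b": "\b", "t": "\t",
                  "n": "\n", "f": "\f", "r": "\r"}


def unescape(text):
    """Decode the backslash escapes of a mapping-file string."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 == len(text):
            # Ordinary character (or, by assumption, a trailing lone backslash).
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1]
        if nxt == "u":
            digits = text[i + 2:i + 6]
            if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
                raise ValueError("\\u must be followed by 4 hex digits")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            # Unknown escapes fall through to the character itself.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
    return "".join(out)
```

For example, `unescape(r'\"and\"')` yields `"and"` with real quotation marks, and `unescape(r'\q')` yields `q`, following the rule that a backslash before any other character is dropped.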

solr.HTMLStripCharFilterFactory

This filter creates org.apache.solr.analysis.HTMLStripCharFilter. This Char Filter strips HTML from the input stream and passes the result to another Char Filter or a Tokenizer.

This filter:

Removes HTML/XML tags while preserving other content.

Removes attributes within tags and supports optional attribute quoting.

Removes XML processing instructions, such as: <?processing instruction?>

Removes XML comments.

Removes XML elements starting with <!.

Removes contents of <script> and <style> elements.

Handles XML comments inside these elements (normal comment processing will not always work).

Replaces numeric character entity references like &#65; or &#x7f; with the corresponding character.

The terminating ';' is optional if the entity reference is at the end of the input; otherwise the terminating ';' is mandatory, to avoid false matches on something like "Alpha&Omega Corp".

Replaces all named character entity references with the corresponding character.

&nbsp; is replaced with a space instead of the 0xa0 character.

Newlines are substituted for block-level elements.

CDATA sections are recognized.

Inline tags, such as <b>, <i>, or <span>, will be removed.

Uppercase forms of character entities such as quot, gt, lt, and amp are recognized and handled as lowercase.
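The rule about the terminating ';' on numeric character entity references can be sketched with a regular expression. This Python fragment is only an illustration of that one rule, not the filter's implementation: `replace_numeric_entities` is a hypothetical name, and leaving out-of-range code points untouched is an assumption.

```python
import re

# Decimal (&#65) or hexadecimal (&#x7f) reference, terminated either by ';'
# or by the end of the input.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+))(?:;|\Z)")


def replace_numeric_entities(text):
    """Replace numeric character entity references with their characters."""
    def to_char(match):
        hex_part, dec_part = match.groups()
        code = int(hex_part, 16) if hex_part else int(dec_part)
        if code > 0x10FFFF:
            # Assumption: references beyond the Unicode range are left as-is.
            return match.group(0)
        return chr(code)

    return NUMERIC_ENTITY.sub(to_char, text)
```

With this pattern, `&#65` is converted only when it is the last thing in the input; followed by more text and no ';', it is left unchanged.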

The input need not be an HTML document. The filter removes only constructs that look like HTML. If the input doesn't include anything that looks like HTML, the filter won't remove any input.
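To get a feel for this kind of stripping outside Solr, the behavior can be roughly approximated with Python's standard `html.parser` module. This is a sketch under stated assumptions, not a port of HTMLStripCharFilter: the class `TextOnly`, the function `strip_html`, and the short `BLOCK_TAGS` set are all illustrative, and details differ from the filter (for instance, this version turns &nbsp; into the 0xa0 character rather than a space).

```python
from html.parser import HTMLParser

# Illustrative subset of block-level elements that are replaced by a newline.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}
# Elements whose contents are dropped entirely.
SKIPPED_CONTENT_TAGS = {"script", "style"}


class TextOnly(HTMLParser):
    """Collect text content while discarding tags, comments, and scripts."""

    def __init__(self):
        # convert_charrefs=True decodes entity references inside text data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_CONTENT_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(text):
    """Return the text with HTML-like constructs removed."""
    parser = TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)
```

Input without anything that looks like HTML passes through unchanged, so `strip_html("Alpha&Omega Corp")` returns the same string it was given.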
