Apache Solr Reference Guide Covering Apache Solr 6.0

Analyzers

An analyzer examines the text of fields and generates a token stream. Analyzers are specified as a child of the <fieldType> element in the schema.xml configuration file (in the same conf/ directory as solrconfig.xml).

In normal usage, only fields of type solr.TextField will specify an analyzer. The simplest way to configure an analyzer is with a single <analyzer> element whose class attribute is a fully qualified Java class name. The named class must derive from org.apache.lucene.analysis.Analyzer. For example:
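
A minimal sketch of such a declaration in schema.xml, assuming a field type named nametext (the name used later in this section) and assuming WhitespaceAnalyzer lives in the Lucene 6 package org.apache.lucene.analysis.core:

```xml
<!-- Sketch: the fieldType name "nametext" is an assumption. -->
<fieldType name="nametext" class="solr.TextField">
  <!-- One Analyzer class, given by its fully qualified Java name, does all
       of the analysis; WhitespaceAnalyzer splits the text on whitespace only. -->
  <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
</fieldType>
```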

In this case, a single class, WhitespaceAnalyzer, is responsible for analyzing the content of the named text field and emitting the corresponding tokens. For simple cases, such as plain English prose, a single analyzer class like this may be sufficient. But it's often necessary to do more complex analysis of the field content.

Even the most complex analysis requirements can usually be decomposed into a series of discrete, relatively simple processing steps. As you will soon discover, the Solr distribution comes with a large selection of tokenizers and filters that cover most scenarios you are likely to encounter. Setting up an analyzer chain is very straightforward; you specify a simple <analyzer> element (no class attribute) with child elements that name factory classes for the tokenizer and filters to use, in the order you want them to run.

For example:
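
A minimal sketch of such a chain, again assuming a field type named nametext. The tokenizer and the final filter are the ones named in the next paragraph; the lowercase and stop-word filters in between are illustrative choices, not a required part of the chain.

```xml
<fieldType name="nametext" class="solr.TextField">
  <!-- No class attribute: the child elements below form the analysis chain. -->
  <analyzer>
    <!-- Runs first: splits the field text into tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Illustrative filters (assumption): lowercase tokens, then drop stop words. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <!-- Runs last: its output tokens are the indexed and query terms.
         Assumption: if your Solr release does not ship this factory,
         solr.PorterStemFilterFactory is a Porter-stemming alternative. -->
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

Filters run in the order they are listed. In this sketch, placing LowerCaseFilterFactory before StopFilterFactory means a capitalized stop word such as "The" is lowercased before the stop filter sees it, so it is removed even though the stop filter is case-sensitive by default.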

Note that classes in the org.apache.solr.analysis package may be referred to here with the shorthand solr. prefix.

In this case, no Analyzer class was specified on the <analyzer> element. Rather, a sequence of more specialized classes is wired together and collectively acts as the Analyzer for the field. The text of the field is passed to the first item in the list (solr.StandardTokenizerFactory), and the tokens that emerge from the last one (solr.EnglishPorterFilterFactory) are the terms that are used for indexing or querying any fields that use the "nametext" fieldType.

Field Values versus Indexed Terms

The output of an Analyzer affects the terms indexed in a given field (and the terms used when parsing queries against those fields), but it has no impact on the stored value for the fields. For example, an analyzer might split "Brown Cow" into two indexed terms, "brown" and "cow", but the stored value will still be a single String: "Brown Cow".
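
Whether a field keeps its stored value, its indexed terms, or both is set per field in schema.xml. A minimal sketch, assuming a hypothetical field named title whose nametext type is an analyzer chain that lowercases its tokens:

```xml
<!-- Hypothetical field "title"; assumes "nametext" lowercases tokens. -->
<!-- indexed="true": the analyzer output (e.g. "brown", "cow") becomes the searchable terms. -->
<!-- stored="true": the original value ("Brown Cow") is kept verbatim and returned in results. -->
<field name="title" type="nametext" indexed="true" stored="true"/>
```

Because an <analyzer> with no type attribute is applied at both index and query time, a query for Cow is also reduced to the term cow and matches this field, while the returned document still shows the stored value "Brown Cow".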
