11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


While there are use cases where you might need to create a Solr document for each line read from a file, in most cases the lines read by this processor are expected to consist of a pathname, which in turn is consumed by another EntityProcessor, such as XPathEntityProcessor.

PlainTextEntityProcessor

This EntityProcessor reads all content from the data source into a single implicit field called plainText. The content is not parsed in any way; however, you may add transformers to manipulate the data within plainText as needed, or to create additional fields.

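A configuration along these lines might look like the following minimal sketch. The entity name, data source name, file path, and the target field name `text` are placeholders chosen for illustration, not values taken from this guide.

```xml
<dataConfig>
  <!-- FileDataSource supplies a character stream, which PlainTextEntityProcessor needs -->
  <dataSource name="plainFile" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Hypothetical entity: the whole file is read into the implicit plainText column -->
    <entity name="notes" processor="PlainTextEntityProcessor"
            dataSource="plainFile" url="/path/to/notes.txt">
      <!-- Copy the implicit plainText column into a Solr field named "text" (assumed to exist in the schema) -->
      <field column="plainText" name="text"/>
    </entity>
  </document>
</dataConfig>
```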

Ensure that the dataSource is of type DataSource<Reader> (FileDataSource, URLDataSource).

SolrEntityProcessor

Uses a Solr instance as a data source; see https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
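As a rough illustration, an entity that pulls documents from another Solr instance could be declared as sketched below. The entity name, host, and core name are hypothetical; `url` and `query` are the attributes that point the processor at the source instance and select the documents to import.

```xml
<dataConfig>
  <document>
    <!-- Hypothetical source instance and core; query="*:*" selects every document -->
    <entity name="sourceSolr" processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/source_core"
            query="*:*"/>
  </document>
</dataConfig>
```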

Transformers

Transformers manipulate the fields in a document returned by an entity. A transformer can create new fields or modify existing ones. You must tell the entity which transformers your import operation will use by adding a transformer attribute, containing a comma-separated list, to the <entity> element.

Specific transformation rules are then added to the attributes of a <field> element, as shown in the examples below. The transformers are applied in the order in which they are specified in the transformer attribute.
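The relationship between the entity-level transformer list and the field-level rules can be sketched as follows. The entity name, SQL query, and column names are invented for illustration; the rule attributes shown (`regex`, `replaceWith`, `dateTimeFormat`) belong to RegexTransformer and DateFormatTransformer respectively.

```xml
<!-- Transformers run in the order listed: RegexTransformer first, then DateFormatTransformer -->
<entity name="item" query="select id, title, created from item"
        transformer="RegexTransformer,DateFormatTransformer">
  <!-- RegexTransformer rule: collapse runs of whitespace in the title column to a single space -->
  <field column="title" regex="\s+" replaceWith=" "/>
  <!-- DateFormatTransformer rule: parse the created column using the given pattern -->
  <field column="created" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
</entity>
```

Fields without any rule attribute, such as `id` here, pass through the transformers unchanged.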

The Data Import Handler contains several built-in transformers. You can also write your own custom transformers, as described in the Solr Wiki (see http://wiki.apache.org/solr/DIHCustomTransformer). The ScriptTransformer (described below) offers an alternative method for writing your own transformers.

Solr includes the following built-in transformers:

Transformer Name | Use
ClobTransformer | Used to create a String out of a Clob type in a database.
DateFormatTransformer | Parse date/time instances.
HTMLStripTransformer | Strip HTML from a field.
LogTransformer | Used to log data to log files or a console.
NumberFormatTransformer | Uses the NumberFormat class in Java to parse a string into a number.
RegexTransformer | Use regular expressions to manipulate fields.
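To give a feel for how several of these transformers are enabled together, here is a hedged sketch. The entity name, query, and column names are hypothetical; the rule attributes (`clob`, `stripHTML`, `formatStyle`, `logTemplate`, `logLevel`) are the ones these transformers look for, with the LogTransformer settings placed on the entity rather than on a field.

```xml
<entity name="product"
        query="select id, description, summary_html, price from product"
        transformer="ClobTransformer,HTMLStripTransformer,NumberFormatTransformer,LogTransformer"
        logTemplate="Imported product ${product.id}" logLevel="info">
  <!-- ClobTransformer: convert the Clob column into a String -->
  <field column="description" clob="true"/>
  <!-- HTMLStripTransformer: remove markup from the column value -->
  <field column="summary_html" stripHTML="true"/>
  <!-- NumberFormatTransformer: parse the string value as a number -->
  <field column="price" formatStyle="number"/>
</entity>
```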
