11.05.2016

Apache Solr Reference Guide Covering Apache Solr 6.0

Uploading Structured Data Store Data with the Data Import Handler

Many search applications store the content to be indexed in a structured data store, such as a relational database. The Data Import Handler (DIH) provides a mechanism for importing content from a data store and indexing it. In addition to relational databases, DIH can index content from HTTP-based data sources such as RSS and Atom feeds, e-mail repositories, and structured XML where an XPath processor is used to generate fields.
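The idea of generating fields from structured XML can be made concrete with a small sketch. This is not DIH code. It uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath, to turn each `<item>` of an RSS feed into one document. The feed contents, the field names in `FIELD_PATHS`, and the helper `xml_to_docs` are hypothetical. In DIH itself, the equivalent mapping is declared in configuration rather than written as code.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 feed used only for illustration.
RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <pubDate>Mon, 02 May 2016 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <pubDate>Tue, 03 May 2016 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

# Hypothetical mapping: output field name -> path evaluated relative to one <item>.
FIELD_PATHS = {
    "title": "title",
    "link": "link",
    "pubdate": "pubDate",
}


def xml_to_docs(xml_text, for_each="./channel/item", field_paths=FIELD_PATHS):
    """Turn every element matched by `for_each` into one document (a dict)."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    docs = []
    for item in root.findall(for_each):
        doc = {}
        for field, path in field_paths.items():
            node = item.find(path)
            # Skip fields that are missing or empty in this item.
            if node is not None and node.text is not None:
                doc[field] = node.text.strip()
        docs.append(doc)
    return docs


for doc in xml_to_docs(RSS):
    print(doc)
```

Each printed dict corresponds to one document. If an item lacks a mapped element, that field is simply left out.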

The example/example-DIH directory contains several collections that demonstrate many of the features of the Data Import Handler. To run this "dih" example:

bin/solr -e dih
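Once the example is running, an import is typically triggered by sending a command such as `full-import` to a collection's `/dataimport` request handler over HTTP. The minimal sketch below only builds such a URL and does not contact Solr. The default port 8983, the handler path `/dataimport`, and the collection name `db` are assumptions based on Solr's defaults and the bundled example. The helper `dataimport_url` is hypothetical.

```python
from urllib.parse import urlencode


def dataimport_url(collection, command="full-import",
                   base="http://localhost:8983/solr", **params):
    """Build a URL asking a collection's /dataimport handler to run a command.

    Assumes the handler is registered at /dataimport and that Solr listens
    on the default port 8983; adjust `base` for other setups.
    """
    query = urlencode({"command": command, **params})
    return f"{base}/{collection}/dataimport?{query}"


# "db" is assumed to be one of the collections started by `bin/solr -e dih`.
print(dataimport_url("db"))
print(dataimport_url("db", command="status"))
```

The resulting URL can be opened in a browser or requested with any HTTP client once Solr is up.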

For more information about the Data Import Handler, see https://wiki.apache.org/solr/DataImportHandler.

Topics covered in this section:

Concepts and Terminology
Configuration
Data Import Handler Commands
Property Writer
Data Sources
Entity Processors
Transformers
Special Commands for the Data Import Handler

Concepts and Terminology

Descriptions of the Data Import Handler use several familiar terms, such as entity and processor, in specific ways, as explained in the table below.

Term          Definition

Datasource    As its name suggests, a datasource defines the location of the data of interest. For a database, it's a DSN. For an HTTP datasource, it's the base URL.

Entity        Conceptually, an entity is processed to generate a set of documents, each containing multiple fields, which (after optionally being transformed in various ways) are sent to Solr for indexing. For an RDBMS data source, an entity is a view or table, which would be processed by one or more SQL statements to generate a set of rows (documents) with one or more columns (fields).

Processor     An entity processor does the work of extracting content from a data source, transforming it, and adding it to the index. Custom entity processors can be written to extend or replace the ones supplied.

Transformer   Each set of fields fetched by the entity may optionally be transformed. This process can modify the fields, create new fields, or generate multiple rows/documents from a single row. There are several built-in transformers in the DIH, which perform functions such as modifying dates and stripping HTML. It is possible to write custom transformers using the publicly available interface.
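The four terms fit together as a pipeline. The following sketch models that flow in plain Python under stated assumptions; it is not how DIH is implemented. An in-memory SQLite database plays the datasource, and a SQL query plays the entity. Two hypothetical transformer functions are applied to each row in order: `strip_html`, a crude regex stand-in for HTML stripping, and `add_type`, which creates a new field. The table, its columns, and all function names are invented for illustration. The final step of sending the documents to Solr is omitted.

```python
import re
import sqlite3

# Datasource (hypothetical): an in-memory SQLite database standing in for an RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO item VALUES (1, 'Widget', '<p>A <b>small</b> widget</p>');
    INSERT INTO item VALUES (2, 'Gadget', '<p>A useful gadget</p>');
""")

# Entity (hypothetical): each row becomes a document, each column a field.
ENTITY_QUERY = "SELECT id, name, description FROM item ORDER BY id"


def strip_html(row):
    """Transformer: return a copy of the row with tags removed from 'description'.

    The regex is a crude stand-in for illustration, not a real HTML parser.
    """
    out = dict(row)
    if out.get("description"):
        out["description"] = re.sub(r"<[^>]+>", "", out["description"])
    return out


def add_type(row):
    """Transformer: create a new field that is not present in the source row."""
    return {**row, "doc_type": "item"}


def run_entity(conn, query, transformers):
    """Run the entity's query, then apply each transformer to every row, in order."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    docs = []
    for values in cursor:
        row = dict(zip(columns, values))
        for transform in transformers:
            row = transform(row)
        docs.append(row)
    return docs


docs = run_entity(conn, ENTITY_QUERY, [strip_html, add_type])
for doc in docs:
    print(doc)
```

Because each transformer here takes one row and returns one row, the transformers compose in order. A transformer that splits one row into several documents would instead need to return a list, and `run_entity` would have to flatten the results.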
