11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0

Indexing and Basic Data Operations

This section describes how Solr adds data to its index. It covers the following topics:

Introduction to Solr Indexing: An overview of Solr's indexing process.

Post Tool: Information about using post.jar to quickly upload some content to your system.

Uploading Data with Index Handlers: Information about using Solr's Index Handlers to upload XML/XSLT, JSON and CSV data.

Uploading Data with Solr Cell using Apache Tika: Information about using the Solr Cell framework to upload data for indexing.

Uploading Structured Data Store Data with the Data Import Handler: Information about uploading and indexing data from a structured data store.

Updating Parts of Documents: Information about how to use atomic updates and optimistic concurrency with Solr.

Detecting Languages During Indexing: Information about using language identification during the indexing process.

De-Duplication: Information about configuring Solr to mark duplicate documents as they are indexed.

Content Streams: Information about streaming content to Solr Request Handlers.

UIMA Integration: Information about integrating Solr with Apache's Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of Analysis Engines that incrementally add metadata to your documents as annotations.

Indexing Using Client APIs

Using client APIs, such as SolrJ, from your applications is an important option for updating Solr indexes. See the Client APIs section for more information.

Introduction to Solr Indexing

This section describes the process of indexing: adding content to a Solr index and, if necessary, modifying that content or deleting it. By adding content to an index, we make it searchable by Solr.

A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.

Here are the three most common ways of loading data into a Solr index:

Using the Solr Cell framework built on Apache Tika for ingesting binary files or structured files such as Office, Word, PDF, and other proprietary formats.

Uploading XML files by sending HTTP requests to the Solr server from any environment where such requests can be generated.

Writing a custom Java application to ingest data through Solr's Java Client API (which is described in more detail in Client APIs). Using the Java API may be the best choice if you're working with an application, such as a Content Management System (CMS), that offers a Java API.
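To make the second approach more concrete, here is a minimal Python sketch that builds (but does not send) an HTTP POST request carrying an XML update message. The host and port, the collection name `gettingstarted`, the `commit=true` parameter, and the helper name `build_update_request` are assumptions chosen for illustration; adjust them to match your own installation.

```python
from urllib.request import Request

# Assumed values for illustration only; change them for your installation.
SOLR_BASE_URL = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"


def build_update_request(xml_payload: str, commit: bool = True) -> Request:
    """Build an HTTP POST request aimed at the collection's /update handler."""
    url = f"{SOLR_BASE_URL}/{COLLECTION}/update"
    if commit:
        # Ask Solr to commit right away so the documents become searchable.
        url += "?commit=true"
    return Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# A one-document XML update message (the document ID is made up).
payload = '<add><doc><field name="id">doc-1</field></doc></add>'
request = build_update_request(payload)

# Sending is left out on purpose; with a running Solr server you could call
# urllib.request.urlopen(request) and inspect the response.
```

Because the request is plain HTTP, the same message could be sent from any environment that can issue a POST, which is the point of this approach.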

Regardless of the method used to ingest data, there is a common basic data structure for data being fed into a Solr index: a document containing multiple fields, each with a name and containing content, which may be empty. One of the fields is usually designated as a unique ID field (analogous to a primary key in a database), although the use of a unique ID field is not strictly required by Solr.
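The document-and-fields structure can be sketched in Python as a dictionary serialized into an XML `<add>` message. This is an illustrative sketch only: the helper name `build_add_message`, the field names `title` and `tags`, and the use of `id` as the unique ID field are assumptions, and the fields would need to match whatever your schema defines.

```python
import xml.etree.ElementTree as ET


def build_add_message(documents):
    """Serialize a list of dicts into an XML <add> message.

    Each dict is one document; each key is a field name. A list value is
    written as repeated <field> elements, one per value.
    """
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: "id" plays the role of the unique ID field.
message = build_add_message([
    {"id": "doc-1", "title": "Indexing basics", "tags": ["solr", "indexing"]},
])
print(message)
```

Keeping the document as a plain dictionary until the last step mirrors the idea that the structure is the same regardless of how the data is eventually delivered.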
