11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


EEEE, dd-MMM-yy HH:mm:ss zzz

EEE MMM d HH:mm:ss yyyy

You may also need to adjust the multipartUploadLimitInKB attribute as follows if you are submitting very large documents.

...

Multi-Core Configuration

For a multi-core configuration, you can specify sharedLib='lib' in the <solr/> section of solr.xml and place the necessary jar files there.

For more information about Solr cores, see The Well-Configured Solr Instance.

Indexing Encrypted Documents with the ExtractingUpdateRequestHandler

The ExtractingRequestHandler will decrypt encrypted files and index their content if you supply a password, either in the resource.password parameter on the request or in a passwordsFile file.
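As a minimal sketch of the first option, the snippet below builds a request URL that carries resource.password as a query parameter. The host, the core name (mycore), the document id, and the password are hypothetical placeholders, and build_extract_url is an illustrative helper, not part of Solr. It assumes the handler is registered at /update/extract.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, password):
    """Return an extract-handler URL that passes the password on the request.

    base_url is the core's base URL (hypothetical here); the handler path
    /update/extract is an assumption about how the handler is registered.
    """
    params = {
        "literal.id": doc_id,           # unique key for the indexed document
        "resource.password": password,  # password used to decrypt the file
        "commit": "true",
    }
    # urlencode escapes any characters in the password that are unsafe in a URL.
    return f"{base_url}/update/extract?{urlencode(params)}"


url = build_extract_url("http://localhost:8983/solr/mycore", "doc1", "s3cret")
print(url)
```

The encrypted file itself would still be sent as the body of a POST to this URL. Note that a password in a query string can end up in server logs, which is one reason to consider the passwordsFile option instead.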

In the case of passwordsFile, the file supplied must be formatted so there is one line per rule. Each rule contains a file name regular expression, followed by "=", then the password in clear-text. Because the passwords are in clear-text, the file should have strict access restrictions.

# This is a comment

myFileName = myPassword

.*\.docx$ = myWordPassword

.*\.pdf$ = myPdfPassword
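To illustrate how a rule file in this format can be read, here is a rough Python sketch that parses the rules and looks up the password for a file name. It is not Solr's implementation. It assumes that each line is split at the first "=", that a rule must match the whole file name, and that the first matching rule wins. The function names are hypothetical.

```python
import re


def parse_password_rules(text):
    """Parse 'regex = password' lines into (compiled pattern, password) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumption: split at the first "=", so the regex itself cannot contain "=".
        pattern, sep, password = line.partition("=")
        if not sep:
            continue  # not a rule line
        rules.append((re.compile(pattern.strip()), password.strip()))
    return rules


def lookup_password(rules, file_name):
    """Return the password of the first rule matching the whole file name, else None."""
    for pattern, password in rules:
        if pattern.fullmatch(file_name):
            return password
    return None


SAMPLE = r"""# This is a comment
myFileName = myPassword
.*\.docx$ = myWordPassword
.*\.pdf$ = myPdfPassword
"""

rules = parse_password_rules(SAMPLE)
print(lookup_password(rules, "report.docx"))  # prints "myWordPassword"
```

Keeping the rules in file order means a specific rule such as myFileName can be listed ahead of broader patterns like .*\.pdf$.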

Examples

Metadata

As mentioned before, Tika produces metadata about the document. Metadata describes different aspects of a document, such as the author's name, the number of pages, the file size, and so on. The metadata produced depends on the type of document submitted. For instance, PDFs have different metadata than Word documents do.

In addition to Tika's metadata, Solr adds the following metadata (defined in ExtractingMetadataConstants):

Solr Metadata          Description

stream_name            The name of the Content Stream as uploaded to Solr. Depending on how the file is uploaded, this may or may not be set.

stream_source_info     Any source info about the stream. (See the section on Content Streams later in this section.)

stream_size            The size of the stream in bytes.
