11.05.2016

Apache Solr Reference Guide Covering Apache Solr 6.0


The literal.id=doc1 parameter provides the necessary unique ID for the document being indexed.

The commit=true parameter causes Solr to perform a commit after indexing the document, making it immediately searchable. For optimum performance when loading many documents, don't call the commit command until you are done.

The -F flag instructs curl to POST data using the Content-Type multipart/form-data and supports the uploading of binary files. The @ symbol instructs curl to upload the attached file.

The argument myfile=@tutorial.html needs a valid path, which can be absolute or relative.
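The pieces described above fit together into a single request. The following is a minimal sketch, not a definitive command: it assumes Solr is listening on localhost:8983 with a core named techproducts (as in the query URLs in this section) and that the handler is reachable at /update/extract. The helper name extract_url is hypothetical.

```shell
#!/bin/sh
# Sketch only. Host, port, and core name are assumptions taken from the
# query URLs in this section; extract_url is a hypothetical helper.
extract_url() {
  # $1 = unique document ID, passed as literal.id; commit=true commits right away
  printf 'http://localhost:8983/solr/techproducts/update/extract?literal.id=%s&commit=true\n' "$1"
}

# Show the URL the request would be sent to
extract_url doc1

# To send the file (requires a running Solr instance and an existing tutorial.html):
#   curl "$(extract_url doc1)" -F "myfile=@tutorial.html"
```

The URL is quoted in the commented curl call because an unquoted & would be interpreted by the shell. When loading many files, commit=true could be dropped from the URL and a single commit issued at the end, as noted above.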

You can also use bin/post to send a PDF file into Solr (without the -params option, the literal.id parameter would be set to the absolute path to the file):

bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=a"

Now you should be able to execute a query and find that document. You can make a request like http://localhost:8983/solr/techproducts/select?q=pdf.

You may notice that although the content of the sample document has been indexed and stored, there are not many metadata fields associated with it. This is because unknown fields are ignored under the default parameters configured for the /update/extract handler in solrconfig.xml; this behavior can easily be changed or overridden. For example, to store and see all metadata and content, execute the following:

bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=doc1&uprefix=attr_"

In this command, the uprefix=attr_ parameter causes all generated fields that aren't defined in the schema to be prefixed with attr_, which matches a dynamic field that is stored and indexed.

This command allows you to query the document using an attribute, as in: http://localhost:8983/solr/techproducts/select?q=attr_meta:microsoft.
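The renaming rule that uprefix applies can be pictured with a small sketch. It only imitates the rule as described above; Solr performs the real mapping on the server. The KNOWN_FIELDS list and the apply_uprefix helper are hypothetical stand-ins, and any other field mapping Solr may apply is ignored here.

```shell
#!/bin/sh
# Illustrative only: imitate how uprefix renames a field the schema does not define.
# KNOWN_FIELDS is a hypothetical stand-in for the field names declared in the schema.
KNOWN_FIELDS="id title content"

apply_uprefix() {
  field=$1
  prefix=$2
  for known in $KNOWN_FIELDS; do
    if [ "$field" = "$known" ]; then
      printf '%s\n' "$field"            # defined in the schema: name is kept
      return 0
    fi
  done
  printf '%s%s\n' "$prefix" "$field"    # not defined: the prefix is prepended
}

apply_uprefix meta attr_     # an undefined field gets the prefix
apply_uprefix title attr_    # a defined field keeps its name
```

Under these assumptions, a generated field named meta ends up as attr_meta, which is the field name used in the query URL above.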

Input Parameters

The table below describes the parameters accepted by the Extracting Request Handler.

| Parameter | Description |
|---|---|
| boost.<fieldname> | Boosts the specified field by the defined float amount. (Boosting a field alters its importance in a query response. To learn about boosting fields, see Searching.) |
| capture | Captures XHTML elements with the specified name for a supplementary addition to the Solr document. This parameter can be useful for copying chunks of the XHTML into a separate field. For instance, it could be used to grab paragraphs (<p>) and index them into a separate field. Note that content is still also captured into the overall "content" field. |
| captureAttr | Indexes attributes of the Tika XHTML elements into separate fields, named after the element. If set to true, for example, when extracting from HTML, Tika can return the href attributes in <a> tags as fields named "a". See the examples below. |
| commitWithin | Add the document within the specified number of milliseconds. |
| date.formats | Defines the date format patterns to identify in the documents. |
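Several of the parameters in the table can be combined in one request by appending them to the handler URL. The following is a minimal sketch under stated assumptions: the host, port, and core name are the ones used in this section's query URLs, the helper name build_extract_url and the values doc2 and 10000 are hypothetical, and parameter values are assumed to need no URL encoding.

```shell
#!/bin/sh
# Sketch only: join name=value pairs into an /update/extract request URL.
# Host, port, and core are assumptions; values are assumed to be URL-safe.
build_extract_url() {
  url='http://localhost:8983/solr/techproducts/update/extract'
  sep='?'
  for param in "$@"; do
    url="$url$sep$param"   # first parameter follows '?', the rest follow '&'
    sep='&'
  done
  printf '%s\n' "$url"
}

# Index attributes as separate fields and ask for the document to be
# added within 10000 milliseconds (hypothetical values)
build_extract_url literal.id=doc2 captureAttr=true commitWithin=10000
```

The resulting URL could then be passed to curl together with -F "myfile=@..." as in the earlier upload example.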
