11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


carrot.produceSummary: When true, the clustering component will run a highlighter pass on the content of logical fields pointed to by carrot.title and carrot.snippet. Otherwise, the full content of those fields will be clustered.

carrot.fragSize: The size, in characters, of the snippets (aka fragments) created by the highlighter. If not specified, the default highlighting fragsize (hl.fragsize) will be used.

carrot.summarySnippets: The number of summary snippets to generate for clustering. If not specified, the default highlighting snippet count (hl.snippets) will be used.
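As a minimal sketch of how the three summary parameters above might be combined into a request query string, the following Python helper builds them with the standard library. The helper name `summary_params` and the values 200 and 2 are illustrative assumptions, not defaults taken from Solr.

```python
from urllib.parse import urlencode


def summary_params(frag_size=None, snippets=None):
    """Build carrot.* parameters that switch on query-context summaries.

    frag_size and snippets are optional; when they are omitted, the
    parameters are left out so that the highlighter defaults
    (hl.fragsize, hl.snippets) apply.
    """
    params = {"carrot.produceSummary": "true"}
    if frag_size is not None:
        params["carrot.fragSize"] = str(frag_size)
    if snippets is not None:
        params["carrot.summarySnippets"] = str(snippets)
    return params


# Illustrative values only: 200-character fragments, 2 snippets per document.
query = urlencode(summary_params(frag_size=200, snippets=2))
print(query)
```

The resulting string would be appended to whatever search request already triggers the clustering component in your setup.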

Logical to Document Field Mapping

As already mentioned in Preliminary Concepts, the clustering component clusters "documents" consisting of logical parts that need to be mapped onto the physical schema of the data stored in Solr. The field mapping attributes provide a connection between fields and logical document parts. Note that the content of the title and snippet fields must be stored so that it can be retrieved at search time.

Parameter: Description

carrot.title: The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's title. The clustering algorithms typically give more weight to the content of the title field compared to the content (snippet). For best results, the field should contain concise, noise-free content. If there is no clear title in your data, you can leave this parameter blank.

carrot.snippet: The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's main content. If this mapping points to very large content fields, the performance of clustering may drop significantly. An alternative then is to use query-context snippets for clustering instead of full field content. See the description of the carrot.produceSummary parameter for details.

carrot.url: The field that should be mapped to the logical document's content URL. Leave blank if not required.
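The field mapping described above can be sketched as a small Python helper that joins several schema fields into the comma-separated form the parameters accept. The function name `field_mapping` and the field names `name`, `features`, `description`, and `id` are hypothetical and stand in for fields in your own schema. Unset mappings are simply omitted, which this sketch assumes is equivalent to leaving them blank.

```python
def field_mapping(title=(), snippet=(), url=None):
    """Map physical schema fields to the logical title, snippet and URL.

    title and snippet take any number of field names and are joined with
    commas; url takes a single field name. Empty mappings are left out.
    """
    params = {}
    if title:
        params["carrot.title"] = ",".join(title)
    if snippet:
        params["carrot.snippet"] = ",".join(snippet)
    if url:
        params["carrot.url"] = url
    return params


# Hypothetical schema fields, for illustration only.
mapping = field_mapping(
    title=["name"],
    snippet=["features", "description"],
    url="id",
)
print(mapping)
```

Remember that whichever fields you map to the title and snippet must be stored, otherwise there is no content to retrieve for clustering.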

Clustering Multilingual Content

The field mapping specification can include a carrot.lang parameter, which defines the field that stores the ISO 639-1 code of the language in which the title and content of the document are written. This information can be stored in the index based on a priori knowledge of the documents' source or on a language detection filter applied at indexing time. All algorithms inside the Carrot2 framework will accept ISO codes of languages defined in the LanguageCode enum.

The language hint makes it easier for clustering algorithms to separate documents in different languages on input and to pick the right language resources for clustering. If you have multilingual query results (or query results in a language other than English), it is strongly advised to map the language field appropriately.

Parameter: Description

carrot.lang: The field that stores the ISO 639-1 code of the language of the document's text fields.

carrot.lcmap: A mapping of arbitrary strings into ISO 639 two-letter codes used by carrot.lang. The syntax of this parameter is the same as langid.map.lcmap, for example: langid.map.lcmap=japanese:ja polish:pl english:en
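To make the lcmap syntax concrete, here is a minimal Python sketch that reads a space-separated list of `source:code` pairs into a dictionary. It only illustrates the syntax shown in the example above and is not how Solr itself parses the parameter. The function name `parse_lcmap` is an assumption made for this example.

```python
def parse_lcmap(spec):
    """Parse 'source:code' pairs separated by whitespace into a dict.

    Raises ValueError for an entry that lacks a colon or has an empty
    side, so that typos in the mapping surface early.
    """
    mapping = {}
    for pair in spec.split():
        source, sep, code = pair.partition(":")
        if not sep or not source or not code:
            raise ValueError(f"malformed lcmap entry: {pair!r}")
        mapping[source] = code
    return mapping


# The example value from the parameter description.
lcmap = parse_lcmap("japanese:ja polish:pl english:en")
print(lcmap["polish"])
```

With such a mapping, a language field that holds a value like "polish" would be translated to the two-letter code "pl" before it is handed to the clustering algorithm.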
