11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


The query issued to the system was Solr. It seems clear that faceting could not yield a similar set of groups, although the goals of both techniques are similar: to let the user explore the set of search results and either rephrase the query or narrow the focus to a subset of current documents. Clustering is also similar to Result Grouping in that it can help to look deeper into search results, beyond the top few hits.

Topics covered in this section:

Preliminary Concepts
Quick Start Example
Installation
Configuration
Tweaking Algorithm Settings
Performance Considerations
Additional Resources

Preliminary Concepts

Each document passed to the clustering component is composed of several logical parts:

a unique identifier,
an origin URL,
the title,
the main content,
a language code of the title and content.

The identifier part is mandatory. Everything else is optional, but at least one of the text fields (title or content) is required for clustering to produce reasonable results. It is important to remember that logical document parts must be mapped to a particular schema and its fields. The content (text) for clustering can either be taken from a stored text field or be context-filtered using a highlighter; all these options are explained below in the configuration section.
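The mapping from logical parts to schema fields can be sketched as a set of request parameters. This is a minimal sketch, not the guide's own code: the helper `carrot_field_params` is hypothetical, the `carrot.*` parameter names are assumed to match those described in the guide's configuration section, and the field names `name`, `features` and `id` are assumptions based on the "techproducts" example schema. The unique identifier is not listed because it is taken from the schema's unique key.

```python
from typing import Dict, Optional


def carrot_field_params(
    title_field: Optional[str] = None,
    snippet_field: Optional[str] = None,
    url_field: Optional[str] = None,
    lang_field: Optional[str] = None,
) -> Dict[str, str]:
    """Map logical document parts to schema fields (hypothetical helper).

    The carrot.* names are assumed from the guide's configuration section.
    """
    # The text above requires at least one text field: title or content.
    if not (title_field or snippet_field):
        raise ValueError("map at least one text field (title or content)")
    candidates = {
        "carrot.title": title_field,      # logical title
        "carrot.snippet": snippet_field,  # logical main content
        "carrot.url": url_field,          # logical origin URL
        "carrot.lang": lang_field,        # logical language code
    }
    # Optional parts that were not mapped are simply left out.
    return {name: field for name, field in candidates.items() if field}


# Field names below are assumptions taken from the "techproducts" example.
params = carrot_field_params(
    title_field="name", snippet_field="features", url_field="id"
)
print(params)
```

Leaving unmapped parts out of the dictionary, rather than sending empty values, keeps the request limited to the fields the schema actually provides.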

A clustering algorithm is the actual logic (implementation) that discovers relationships among the documents in the search result and forms human-readable cluster labels. Depending on the choice of algorithm, the clusters may (and probably will) vary. Solr comes with several algorithms implemented in the open source Carrot2 project; commercial alternatives also exist.
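Because the clusters depend on the algorithm, it can be useful to issue the same query against several engines and compare the labels. The sketch below only builds the request URLs and makes several assumptions: a `/clustering` request handler as in the "techproducts" example, a `clustering.engine` request parameter that selects the engine, and engine names `lingo`, `stc` and `kmeans` as declared in that example's configuration. The helper `clustering_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def clustering_url(
    base_url: str,
    collection: str,
    query: str,
    rows: int = 100,
    engine: Optional[str] = None,
) -> str:
    """Build a clustering request URL (hypothetical helper).

    Assumes a /clustering handler and the clustering.engine parameter.
    """
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if engine is not None:
        # Without this parameter the handler's default engine is used.
        params.append(("clustering.engine", engine))
    return f"{base_url}/{collection}/clustering?{urlencode(params)}"


# Engine names are assumptions; use the names declared in your solrconfig.xml.
urls = {
    engine: clustering_url(
        "http://localhost:8983/solr", "techproducts", "*:*", engine=engine
    )
    for engine in ("lingo", "stc", "kmeans")
}
for engine, url in urls.items():
    print(engine, url)
```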

Quick Start Example

The "techproducts" example included with Solr is pre-configured with all the necessary components for result clustering, but they are disabled by default.
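Once the components are enabled (in the Solr 6 distribution this is believed to be done by starting the example with the `-Dsolr.clustering.enabled=true` system property), a clustering response can be reduced to a list of labels and cluster sizes. The sketch below assumes the parsed JSON response carries a top-level `clusters` list whose entries have `labels` and `docs` arrays; the helper `summarize_clusters` is hypothetical, and the sample response is made up for illustration and is not real Solr output.

```python
from typing import Any, Dict, List, Tuple


def summarize_clusters(response: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (label, document count) pairs (hypothetical helper).

    Assumes each cluster has a "labels" list and a "docs" list of identifiers.
    """
    summary = []
    for cluster in response.get("clusters", []):
        label = ", ".join(cluster.get("labels", []))
        summary.append((label, len(cluster.get("docs", []))))
    return summary


# Made-up sample in the assumed shape; not actual output from Solr.
sample_response = {
    "clusters": [
        {"labels": ["Label A"], "docs": ["doc-1", "doc-2"]},
        {"labels": ["Label B"], "docs": ["doc-3"]},
    ]
}
for label, size in summarize_clusters(sample_response):
    print(f"{label}: {size} document(s)")
```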
