11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


clustering: When true, clustering component is enabled.

clustering.engine: Declares which clustering engine to use. If not present, the first declared engine will become the default one.

clustering.results: When true, the component will perform clustering of search results (this should be enabled).

clustering.collection: When true, the component will perform clustering of the whole document index (this section does not cover full-index clustering).
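The request parameters above can be sketched as a query URL. This is a minimal illustration, not a definitive client: the host, the `techproducts` collection, the `/clustering` handler path, the `rows` value, and the engine name `lingo` are all assumptions that do not come from the text above.

```python
from urllib.parse import urlencode

# Assumed values for illustration only: host, collection and handler path
# are hypothetical and must match your own Solr setup.
SOLR_BASE = "http://localhost:8983/solr/techproducts"
HANDLER = "/clustering"


def clustering_query_url(query, engine=None, rows=100):
    """Build a search URL that asks Solr to cluster the returned results."""
    params = [
        ("q", query),
        ("rows", rows),
        ("clustering", "true"),          # enable the clustering component
        ("clustering.results", "true"),  # cluster the search results
    ]
    if engine is not None:
        # When omitted, the first declared engine is used as the default.
        params.append(("clustering.engine", engine))
    return SOLR_BASE + HANDLER + "?" + urlencode(params)


url = clustering_query_url("memory", engine="lingo")
print(url)
```

Leaving `engine` unset relies on the documented default, in which the first declared engine is used.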

At the engine declaration level, the following parameters are supported.

Parameter: Description

carrot.algorithm: The algorithm class.

carrot.resourcesDir: Algorithm-specific resources and configuration files (stop words, other lexical resources, default settings). By default points to conf/clustering/carrot2/

carrot.outputSubClusters: If true and the algorithm supports hierarchical clustering, sub-clusters will also be emitted.

carrot.numDescriptions: Maximum number of per-cluster labels to return (if the algorithm assigns more than one label to a cluster).
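To make the engine-level parameters concrete, the following sketch generates an engine declaration as XML. It assumes the declaration is an `<lst name="engine">` block of `<str>` entries inside the clustering search component in solrconfig.xml. The engine name `lingo`, the helper `engine_declaration`, and the value `3` are hypothetical; the algorithm class is one of the Carrot2 classes listed in this section.

```python
import xml.etree.ElementTree as ET


def engine_declaration(name, algorithm, **carrot_params):
    """Return an <lst name="engine"> element (assumed solrconfig.xml layout)."""
    engine = ET.Element("lst", {"name": "engine"})
    ET.SubElement(engine, "str", {"name": "name"}).text = name
    ET.SubElement(engine, "str", {"name": "carrot.algorithm"}).text = algorithm
    for key, value in carrot_params.items():
        # Keyword arguments become carrot.* parameters, e.g. numDescriptions.
        ET.SubElement(engine, "str", {"name": "carrot." + key}).text = str(value)
    return engine


engine = engine_declaration(
    "lingo",  # hypothetical engine name
    "org.carrot2.clustering.lingo.LingoClusteringAlgorithm",
    numDescriptions=3,  # illustrative value
)
xml_text = ET.tostring(engine, encoding="unicode")
print(xml_text)
```

The printed element could be pasted into the clustering component definition. Check the element types against the solrconfig.xml shipped with your Solr version before relying on it.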

The carrot.algorithm parameter should contain a fully qualified class name of an algorithm supported by the Carrot2 framework. Currently, the following algorithms are available:

org.carrot2.clustering.lingo.LingoClusteringAlgorithm (open source)

org.carrot2.clustering.stc.STCClusteringAlgorithm (open source)

org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm (open source)

com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm (commercial)

For a comparison of characteristics of these algorithms see the following links:

http://doc.carrot2.org/#section.advanced-topics.fine-tuning.choosing-algorithm

http://project.carrot2.org/algorithms.html

http://carrotsearch.com/lingo3g-comparison.html

Which algorithm to choose depends on the expected amount of traffic (STC is faster than Lingo but arguably produces less intuitive clusters; Lingo3G is the fastest algorithm but is neither free nor open source), the expected result (Lingo3G provides hierarchical clusters, while Lingo and STC provide flat clusters), and the input data (each algorithm will cluster the input slightly differently). There is no single answer as to which algorithm is "the best".
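The trade-offs in this paragraph can be written down as a rough selection heuristic. This is only a sketch of the reasoning, not a recommendation from the Carrot2 project: the function `suggest_algorithm` and its flags are hypothetical. It leaves out the bisecting k-means algorithm, which the paragraph does not characterize, and it ignores how each algorithm behaves on your particular data.

```python
LINGO = "org.carrot2.clustering.lingo.LingoClusteringAlgorithm"
STC = "org.carrot2.clustering.stc.STCClusteringAlgorithm"
LINGO3G = "com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm"


def suggest_algorithm(need_hierarchy=False, commercial_ok=False, high_traffic=False):
    """Return a candidate class name, or None if none of the three fits."""
    if commercial_ok:
        # Lingo3G: fastest and hierarchical, but commercial.
        return LINGO3G
    if need_hierarchy:
        # Lingo and STC only provide flat clusters.
        return None
    # STC is faster; Lingo arguably gives more intuitive clusters.
    return STC if high_traffic else LINGO


print(suggest_algorithm(high_traffic=True))
```

Treat the result as a starting point for experiments on your own data.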

Contextual and Full Field Clustering

The clustering engine can apply clustering to the full content of (stored) fields, or it can run an internal highlighter pass to extract context snippets before clustering. Highlighting is recommended when the logical snippet field contains a lot of content, because clustering the full content would hurt performance. Highlighting can also increase the quality of clustering because the content passed to the algorithm will be more focused around the query (it will be query-specific context). The following parameters control the internal highlighter.

Parameter: Description
