11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


The default language can also be set using Carrot2-specific algorithm attributes (in this case the MultilingualClustering.defaultLanguage attribute).

Tweaking Algorithm Settings

The algorithms that come with Solr use their default settings, which may not be adequate for all data sets. All algorithms have lexical resources and other settings (stop words, stemmers, parameters) that may require tweaking to get better clusters (and cluster labels). For Carrot2-based algorithms it is probably best to use a dedicated tuning application called Carrot2 Workbench (screenshot below). From this application one can export a set of algorithm attributes as an XML file, which can then be placed under the location pointed to by carrot.resourcesDir.

Providing Defaults

The default attributes for all engines (algorithms) declared in the clustering component are placed under carrot.resourcesDir, with an expected file name of engineName-attributes.xml. So for an engine named lingo and the default value of carrot.resourcesDir, the attributes would be read from the file conf/clustering/carrot2/lingo-attributes.xml.
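The naming convention described above can be sketched as a small helper. This is only an illustration: the function name `attributes_file` and its parameters are hypothetical, and the default directory `clustering/carrot2` (relative to `conf`) is inferred from the example path in the text rather than taken from Solr's source.

```python
from pathlib import PurePosixPath


def attributes_file(engine_name: str,
                    resources_dir: str = "clustering/carrot2",
                    conf_dir: str = "conf") -> PurePosixPath:
    """Return the path where an engine's default attributes are expected.

    The file name follows the engineName-attributes.xml convention.
    resources_dir stands in for the value of carrot.resourcesDir
    (assumed here to be relative to the core's conf directory).
    """
    return PurePosixPath(conf_dir) / resources_dir / f"{engine_name}-attributes.xml"


if __name__ == "__main__":
    print(attributes_file("lingo"))
```

For the engine named lingo and the assumed default directory, the helper yields the same path as the one given in the text.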

An example XML file changing the default language of documents to Polish is shown below.

[XML listing not preserved in this copy; only the label "attributes" survives.]
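As a rough illustration of what such an attributes file might contain, the sketch below embeds a candidate XML document in a Python string and reads the language value back out. The XML layout (`attribute-sets`, `attribute-set`, `value-set`, `label`, `attribute`, `value`) and the `org.carrot2.core.LanguageCode` type name are assumptions based on the Carrot2 attribute-set format, not text recovered from this page. Only the key `MultilingualClustering.defaultLanguage`, the label "attributes", and the target language (Polish) come from the surrounding text. The helper name `default_language` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed layout of a Carrot2 attribute-set file (element names are NOT
# confirmed by the text above; verify against a file exported from
# Carrot2 Workbench before relying on it).
LINGO_ATTRIBUTES_XML = """\
<attribute-sets default="attributes">
  <attribute-set id="attributes">
    <value-set>
      <label>attributes</label>
      <attribute key="MultilingualClustering.defaultLanguage">
        <value type="org.carrot2.core.LanguageCode" value="POLISH"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
"""

LANGUAGE_KEY = "MultilingualClustering.defaultLanguage"


def default_language(xml_text: str) -> str:
    """Parse the attribute file and return the configured default language."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//attribute[@key='{LANGUAGE_KEY}']/value")
    if node is None:
        raise ValueError(f"no {LANGUAGE_KEY} attribute found")
    return node.get("value")


if __name__ == "__main__":
    print(default_language(LINGO_ATTRIBUTES_XML))
```

Parsing the string this way at least confirms that the sketch is well-formed XML. Whether Carrot2 accepts it depends on the assumed element names being right.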

Tweaking at Query-Time

The clustering component and Carrot2 clustering algorithms can accept query-time attribute overrides. Note that certain things (for example lexical resources) can only be initialized once (at startup, via the XML configuration files).

An example query that changes the LingoClusteringAlgorithm.desiredClusterCountBase parameter for the Lingo algorithm:

http://localhost:8983/solr/techproducts/clustering?q=*:*&rows=100&LingoClusteringAlgorithm.desiredClusterCountBase=20

The clustering engine (the algorithm declared in solrconfig.xml) can also be changed at runtime by passing the clustering.engine=name request attribute:

http://localhost:8983/solr/techproducts/clustering?q=*:*&rows=100&clustering.engine=kmeans
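The two query-time overrides above can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The helper name `clustering_url` and its parameters are hypothetical, and the base URL is the techproducts example from the text. The sketch only builds the request URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Example endpoint from the text; adjust host, port and collection as needed.
BASE_URL = "http://localhost:8983/solr/techproducts/clustering"


def clustering_url(q="*:*", rows=100, engine=None, attributes=None):
    """Build a clustering request URL with optional query-time overrides.

    engine     -- value for clustering.engine (must name an engine that is
                  declared in solrconfig.xml)
    attributes -- dict of algorithm attribute overrides, for example
                  {"LingoClusteringAlgorithm.desiredClusterCountBase": 20}
    """
    params = {"q": q, "rows": rows}
    if engine is not None:
        params["clustering.engine"] = engine
    params.update(attributes or {})
    # Keep '*' and ':' unescaped so the URL reads like the examples above.
    return f"{BASE_URL}?{urlencode(params, safe='*:')}"


if __name__ == "__main__":
    print(clustering_url(
        attributes={"LingoClusteringAlgorithm.desiredClusterCountBase": 20}))
    print(clustering_url(engine="kmeans"))
```

Attribute overrides are passed as a dictionary because their names contain dots and so cannot be written as Python keyword arguments.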

Performance Considerations

Dynamic clustering of search results comes with two major performance penalties:<br />

Increased cost of fetching a larger-than-usual number of search results (50, 100 or more documents),<br />

Additional computational cost of the clustering itself.<br />
