Apache Solr Reference Guide Covering Apache Solr 6.0

Merging Index Segments

mergePolicyFactory

Defines how merging segments is done. The default in Solr is to use a TieredMergePolicy, which merges segments of approximately equal size, subject to an allowed number of segments per tier. Other policies available are the LogByteSizeMergePolicy and LogDocMergePolicy. For more information on these policies, please see the MergePolicy javadocs.

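<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>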

Controlling Segment Sizes: Merge Factors

The most common adjustment made to the configuration of TieredMergePolicy (or LogByteSizeMergePolicy) is to the "merge factors", which change how many segments should be merged at one time. For TieredMergePolicy, this is controlled by setting the maxMergeAtOnce and segmentsPerTier options, while LogByteSizeMergePolicy has a single mergeFactor option (all of which default to "10").
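For comparison, a LogByteSizeMergePolicy configuration with its single merge factor might look like the following in the indexConfig section of solrconfig.xml. This is a minimal sketch that assumes the LogByteSizeMergePolicyFactory class in Solr's org.apache.solr.index package; the value shown is simply the default named above.

```xml
<!-- Sketch: LogByteSizeMergePolicy with the default merge factor -->
<mergePolicyFactory class="org.apache.solr.index.LogByteSizeMergePolicyFactory">
  <!-- number of same-level segments that are merged at one time -->
  <int name="mergeFactor">10</int>
</mergePolicyFactory>
```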

To understand why these options are important, consider what happens when an update is made to an index using LogByteSizeMergePolicy: Documents are always added to the most recently opened segment. When a segment fills up, a new segment is created and subsequent updates are placed there. If creating a new segment would cause the number of lowest-level segments to exceed the mergeFactor value, then all those segments are merged together to form a single large segment. Thus, if the merge factor is 10, each merge results in the creation of a single segment that is roughly ten times larger than each of its ten constituents. When there are 10 of these larger segments, then they in turn are merged into an even larger single segment. This process can continue indefinitely.
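The cascade described above can be put in numbers. The following is a rough, idealized model for illustration only, not a description of Lucene's exact behavior: it assumes every newly filled segment has the same size s, the merge factor is m, and every m segments at one level are merged into one segment at the next level. The symbols s, m, n, and k are introduced here and do not appear in the guide.

```latex
% Size of a segment that has been through k rounds of merging
\[ \text{size}_k \approx m^{k} \, s \]

% Example with m = 10 after n = 1234 filled segments:
% the segments remaining at each level are the base-10 digits of n
\[ 1234 = 1 \cdot 10^{3} + 2 \cdot 10^{2} + 3 \cdot 10^{1} + 4 \cdot 10^{0}
   \quad\Longrightarrow\quad 1 + 2 + 3 + 4 = 10 \text{ segments} \]

% A document is rewritten once for every level it passes through
\[ \text{rewrites per document} \le \lfloor \log_{m} n \rfloor = 3 \]
```

Under this model a smaller m tends to leave fewer segments but pushes each document through more merge levels, while a larger m does the opposite.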

When using TieredMergePolicy, the process is the same, but instead of a single mergeFactor value, the segmentsPerTier setting is used as the threshold to decide if a merge should happen, and the maxMergeAtOnce setting determines how many segments should be included in the merge.

Choosing the best merge factors is generally a trade-off of indexing speed vs. searching speed. Having fewer segments in the index generally accelerates searches, because there are fewer places to look. It can also result in fewer physical files on disk. But to keep the number of segments low, merges will occur more often, which can add load to the system and slow down updates to the index.

Conversely, keeping more segments can accelerate indexing, because merges happen less often, making an update less likely to trigger a merge. But searches become more computationally expensive and will likely be slower, because search terms must be looked up in more index segments. Faster index updates also mean shorter commit turnaround times, which means more timely search results.
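As an illustration of leaning toward the indexing side of this trade-off, the sketch below raises both TieredMergePolicy options above their default of 10. The value 20 is an arbitrary assumption chosen for the example, not a recommendation; suitable values depend on the index and workload.

```xml
<!-- Sketch: favor indexing throughput by tolerating more segments per tier -->
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <!-- how many segments are included in a single merge -->
  <int name="maxMergeAtOnce">20</int>
  <!-- threshold of segments per tier before a merge is triggered -->
  <int name="segmentsPerTier">20</int>
</mergePolicyFactory>
```

Values below the default would shift the balance the other way, toward fewer segments and faster searches at the cost of more frequent merges.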

Customizing Merge Policies

If the configuration options for the built-in merge policies do not fully suit your use case, you can customize them: either by creating a custom merge policy factory that you specify in your configuration, or by configuring a merge policy wrapper which uses a wrapped.prefix configuration option to control how the factory it wraps will be configured.
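A wrapper configuration of this kind might look like the sketch below. It assumes the SortingMergePolicyFactory wrapper from Solr's org.apache.solr.index package and a hypothetical timestamp field; the prefix name "inner" is likewise chosen only for the example.

```xml
<!-- Sketch: a wrapping factory whose inner factory is configured via a prefix -->
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <!-- option of the wrapper itself; "timestamp" is a hypothetical field -->
  <str name="sort">timestamp desc</str>
  <!-- options starting with "inner." are handed to the wrapped factory -->
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
</mergePolicyFactory>
```

Options without the prefix configure the wrapper, while the prefixed ones are passed on to the wrapped factory named by inner.class.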
