11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Document Centric Versioning Constraints

Optimistic Concurrency is extremely powerful and works very efficiently because it uses internally assigned, globally unique values for the _version_ field. However, in some situations users may want to configure their own document-specific version field, where the version values are assigned on a per-document basis by an external system, and have Solr reject updates that attempt to replace a document with an "older" version. In situations like this, the DocBasedVersionConstraintsProcessorFactory can be useful.

The basic usage of DocBasedVersionConstraintsProcessorFactory is to configure it in solrconfig.xml as part of the UpdateRequestProcessorChain and specify the name of your custom versionField in your schema that should be checked when validating updates:

    <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
      <str name="versionField">my_version_l</str>
    </processor>

Once configured, this update processor will reject (HTTP error code 409) any attempt to update an existing document where the value of the my_version_l field in the "new" document is not greater than the value of that field in the existing document.
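The comparison rule can be modeled in a few lines. This is only an illustrative sketch of the rule as stated here, not Solr's implementation: the function name `check_update` and the dictionary-based documents are hypothetical, and the case of a document with no existing copy is assumed to be accepted.

```python
from typing import Optional

VERSION_FIELD = "my_version_l"  # the versionField name used in this section


def check_update(existing: Optional[dict], new: dict) -> int:
    """Return the HTTP status this toy model assigns to an add/replace request.

    Hypothetical helper: it models only the "must be greater" rule.
    """
    if existing is None:
        # Assumption: with no existing document there is nothing to compare.
        return 200
    if new[VERSION_FIELD] > existing[VERSION_FIELD]:
        return 200  # strictly greater version: the update is accepted
    return 409  # equal or lower version: rejected as a conflict


stored = {"id": "doc1", VERSION_FIELD: 5}
print(check_update(stored, {"id": "doc1", VERSION_FIELD: 6}))  # 200
print(check_update(stored, {"id": "doc1", VERSION_FIELD: 5}))  # 409
```

Note that an equal version is rejected as well, because the new value must be strictly greater than the stored one.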

versionField vs _version_

The _version_ field used by Solr for its normal optimistic concurrency also has important semantics in how updates are distributed to replicas in SolrCloud, and MUST be assigned internally by Solr. Users cannot repurpose that field and specify it as the versionField for use in the DocBasedVersionConstraintsProcessorFactory configuration.

DocBasedVersionConstraintsProcessorFactory supports two additional, optional configuration parameters:

ignoreOldUpdates - A boolean option which defaults to false. If set to true, then instead of rejecting updates where the versionField is too low, the update will be silently ignored (and return a status 200 to the client).

deleteVersionParam - A String parameter that can be specified to indicate that this processor should also inspect Delete By Id commands. The value of this configuration option should be the name of a request parameter that the processor will then consider mandatory for all attempts to Delete By Id, and which clients must use to specify a value for the versionField that is greater than the existing value of the document to be deleted. When using this request param, any Delete By Id command with a high enough document version number to succeed will be internally converted into an Add Document command that replaces the existing document with a new one which is empty except for the Unique Key and versionField. This keeps a record of the deleted version, so future Add Document commands will fail if their "new" version is not high enough.
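The effect of these two options can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Solr's code: the class `VersionedIndex` and its methods are hypothetical, the 400 status for a delete that omits the version is an assumption (the text only says the parameter is mandatory), and applying ignoreOldUpdates to stale deletes as well as stale adds is also an assumption.

```python
class VersionedIndex:
    """Toy in-memory index modeling the two optional settings (hypothetical)."""

    def __init__(self, version_field, ignore_old_updates=False):
        self.version_field = version_field
        self.ignore_old_updates = ignore_old_updates
        self.docs = {}  # unique key -> stored document or deletion placeholder

    def _is_newer(self, doc_id, version):
        existing = self.docs.get(doc_id)
        return existing is None or version > existing[self.version_field]

    def _stale_status(self):
        # ignoreOldUpdates=true: drop silently with 200; otherwise reject with 409.
        return 200 if self.ignore_old_updates else 409

    def add(self, doc):
        if not self._is_newer(doc["id"], doc[self.version_field]):
            return self._stale_status()
        self.docs[doc["id"]] = dict(doc)
        return 200

    def delete_by_id(self, doc_id, version=None):
        if version is None:
            # Assumption: a missing mandatory version parameter is a bad request.
            return 400
        if not self._is_newer(doc_id, version):
            # Assumption: stale deletes are handled like stale adds.
            return self._stale_status()
        # The delete becomes an add of a document that holds only the
        # unique key and the version field, recording the deleted version.
        self.docs[doc_id] = {"id": doc_id, self.version_field: version}
        return 200


index = VersionedIndex("my_version_l")
index.add({"id": "doc1", "title": "first", "my_version_l": 1})
index.delete_by_id("doc1", version=3)
print(index.docs["doc1"])  # {'id': 'doc1', 'my_version_l': 3}
print(index.add({"id": "doc1", "title": "late", "my_version_l": 2}))  # 409
```

Because the placeholder document keeps version 3, the late add with version 2 is rejected even though the original document is gone. With `ignore_old_updates=True` the same late add would return 200 and leave the index unchanged.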

Please consult the processor javadocs and test configs for additional information and example usages.

De-Duplication

Preventing duplicate or near-duplicate documents from entering an index, or tagging documents with a signature/fingerprint for duplicate field collapsing, can be efficiently achieved with a low-collision or fuzzy hash algorithm. Solr natively supports de-duplication techniques of this type via the Signature class and allows for the easy addition of new hash/signature implementations. A Signature can be implemented several ways:
