11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


Handling text properly will make your users happy by providing them with the best possible results for text searches.

One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches, and the most common search is likely to be a simple keyword search. You can use copyField to take a variety of fields and funnel them all into a single text field for keyword searches. In the schema.xml file for the "techproducts" example included with Solr, copyField declarations are used to dump the contents of cat, name, manu, features, and includes into a single field, text. In addition, it could be a good idea to copy ID into text in case users want to search for a particular product by passing its product number to a keyword search.
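The catch-all idea above can be modeled in a few lines of Python. This is only a conceptual sketch, not Solr code: the function name `apply_copy_fields`, the sample document, and its values are hypothetical, and the source list simply mirrors the field names mentioned in the text. In Solr itself the copying is declared in the schema rather than written by hand.

```python
# Conceptual model of a catch-all field: values from several source fields
# are copied, unmodified, into one destination field before indexing.
# All names and sample values here are illustrative assumptions.

COPY_SOURCES = ["cat", "name", "manu", "features", "includes", "id"]


def apply_copy_fields(doc, sources=COPY_SOURCES, dest="text"):
    """Return a copy of doc with every source value collected into dest."""
    out = dict(doc)
    collected = []
    for field in sources:
        value = doc.get(field)
        if value is None:
            continue  # a missing source field contributes nothing
        # Multi-valued fields arrive as lists; single values are wrapped.
        values = value if isinstance(value, list) else [value]
        collected.extend(str(v) for v in values)
    out[dest] = collected
    return out


doc = {
    "id": "SAMPLE-001",
    "name": "Example 500GB Drive",
    "manu": "Example Corp.",
    "cat": ["electronics", "hard drive"],
    "features": ["7200RPM", "8MB cache"],
}
indexed = apply_copy_fields(doc)
print(indexed["text"])
```

Because several sources feed one destination, the destination ends up holding multiple values, which is why the sketch stores `text` as a list.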

Another technique is using copyField to use the same field in different ways. Suppose you have a field that is a list of authors, like this:

Schildt, Herbert; Wolpert, Lewis; Davies, P.

For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:

schildt / herbert / wolpert / lewis / davies / p

For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:

schildt herbert wolpert lewis davies p

Finally, for faceting, use the primary author only via a StrField:

Schildt, Herbert
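The three treatments of the author list can be reproduced with a short Python sketch. In Solr these transformations would normally come from the analysis chain of each field type, not from application code; the helper names below (`search_tokens`, `sort_key`, `primary_author`) are hypothetical and exist only to show what each variant of the value looks like.

```python
import re

raw = "Schildt, Herbert; Wolpert, Lewis; Davies, P."


def search_tokens(value):
    """Search form: tokenize, lowercase, and drop punctuation."""
    return re.findall(r"[a-z0-9]+", value.lower())


def sort_key(value):
    """Sort form: one untokenized, lowercased string without punctuation."""
    stripped = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(stripped.split())  # normalize any leftover whitespace


def primary_author(value):
    """Facet form: the first author only, kept verbatim."""
    return value.split(";")[0].strip()


print(" / ".join(search_tokens(raw)))  # schildt / herbert / wolpert / lewis / davies / p
print(sort_key(raw))                   # schildt herbert wolpert lewis davies p
print(primary_author(raw))             # Schildt, Herbert
```

The point of the sketch is that one source value yields three differently shaped values, each suited to one job, which is why the same content is copied into three differently typed fields.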

Related Topics

SchemaXML

DocValues

DocValues are a way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing.

Why DocValues?

The standard way that Solr builds the index is with an inverted index. This style builds a list of terms found in all the documents in the index, and next to each term is a list of the documents that the term appears in (as well as how many times the term appears in each document). This makes search very fast: since users search by terms, having a ready list of term-to-document values makes the query process faster.
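As a rough illustration of the structure just described, the following Python sketch builds a toy inverted index that maps each term to the documents containing it, along with the term frequency per document. The sample documents and the function name `build_inverted_index` are made up for this example, and whitespace splitting stands in for real text analysis.

```python
from collections import defaultdict

# Three tiny sample documents keyed by document ID (illustrative data).
docs = {
    0: "solr builds an inverted index",
    1: "an inverted index maps terms to documents",
    2: "solr makes search fast and solr scales",
}


def build_inverted_index(documents):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


index = build_inverted_index(docs)

# A term query is a single dictionary lookup.
print(sorted(index["solr"]))  # [0, 2]
print(index["solr"][2])       # 2 occurrences in document 2
```

Answering "which documents contain this term?" is one lookup, which is the property that makes term-based search fast.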

For other features that we now commonly associate with search, such as sorting, faceting, and highlighting, this approach is not very efficient. The faceting engine, for example, must look up each term that appears in each document that will make up the result set and pull the document IDs in order to build the facet list. In Solr, this information is maintained in memory, and can be slow to load (depending on the number of documents, terms, etc.).

In Lucene 4.0, a new approach was introduced. DocValues fields are column-oriented fields with a document-to-value mapping built at index time. This approach promises to relieve some of the memory requirements of the fieldCache and make lookups for faceting, sorting, and grouping much faster.
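To make the contrast concrete, here is a small Python sketch that computes the same facet counts two ways: from a column-style document-to-value mapping and from a term-to-documents mapping. It is a conceptual model only and does not reflect Lucene's actual storage format; the field name `cat`, the sample values, and both function names are assumptions made for the example.

```python
from collections import Counter

# Column-oriented view: position i holds the "cat" value of document i.
cat_column = ["electronics", "memory", "electronics", "camera", "memory"]

# Inverted view of the same data: term -> list of document IDs.
cat_inverted = {"camera": [3], "electronics": [0, 2], "memory": [1, 4]}


def facet_from_column(column, result_doc_ids):
    """One direct lookup per matching document."""
    return Counter(column[doc_id] for doc_id in result_doc_ids)


def facet_from_inverted(inverted, result_doc_ids):
    """Every term's posting list must be scanned against the result set."""
    wanted = set(result_doc_ids)
    counts = Counter()
    for term, postings in inverted.items():
        hits = sum(1 for doc_id in postings if doc_id in wanted)
        if hits:
            counts[term] = hits
    return counts


results = [4, 0, 2]  # document IDs matched by some query
print(facet_from_column(cat_column, results))
print(facet_from_inverted(cat_inverted, results))

# Sorting by field value is likewise a per-document lookup in the column.
sorted_ids = sorted(results, key=lambda doc_id: cat_column[doc_id])
print(sorted_ids)  # [0, 2, 4]
```

Both functions return the same counts, but the column version touches only the matching documents, while the inverted version has to walk every term in the field.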

Enabling DocValues

To use docValues, you only need to enable it for the fields that will use it. As with all schema design, you need to define a field type and then define fields of that type with docValues enabled. All of these actions are
