11.05.2016

Apache Solr Reference Guide Covering Apache Solr 6.0


worker nodes. It involves sorting and partitioning the entire result set and sending it to worker nodes. In this approach the tuples arrive at the worker nodes sorted by the GROUP BY fields. The worker nodes can then roll up the aggregates one group at a time. This allows for unlimited cardinality aggregation, but you pay the price of sending the entire result set across the network to worker nodes.

facet: This uses the JSON Facet API or StatsComponent for aggregations. In this scenario the aggregation logic is pushed down into the search engine and only the aggregates are sent across the network. This is Solr's normal mode of operation. It is fast when the cardinality of the GROUP BY fields is low to moderate, but it breaks down when the GROUP BY clause contains high-cardinality fields.
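To make the trade-off between the two modes concrete, here is a small single-process Python simulation. It illustrates the ideas described above and is not Solr's implementation: the sample tuples, the helper names (`partition`, `rollup`, `map_reduce_counts`, `facet_counts`), and the CRC32-based partitioning are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter
import zlib

# Hypothetical result set: one tuple per matching document.
TUPLES = [
    {"to": "a@example.com"}, {"to": "b@example.com"},
    {"to": "a@example.com"}, {"to": "c@example.com"},
    {"to": "b@example.com"}, {"to": "a@example.com"},
]

def partition(tuples, field, num_workers):
    """map_reduce, step 1: route every tuple to a worker chosen by hashing the
    GROUP BY field, so all tuples of one group land on the same worker."""
    shards = [[] for _ in range(num_workers)]
    for t in tuples:
        # crc32 is stable across runs, unlike Python's salted str hash
        shards[zlib.crc32(t[field].encode()) % num_workers].append(t)
    # each worker receives its tuples sorted by the GROUP BY field
    return [sorted(s, key=itemgetter(field)) for s in shards]

def rollup(sorted_tuples, field):
    """map_reduce, step 2 (on one worker): sorted input makes each group
    contiguous, so a group's aggregate is emitted as soon as it ends."""
    for key, group in groupby(sorted_tuples, key=itemgetter(field)):
        yield key, sum(1 for _ in group)

def map_reduce_counts(tuples, field, num_workers=2):
    results = {}
    for shard in partition(tuples, field, num_workers):
        results.update(rollup(shard, field))
    return results

def facet_counts(tuples, field):
    """facet: aggregate where the data lives; only one (value, count)
    entry per distinct value would cross the network."""
    return dict(Counter(t[field] for t in tuples))

print(map_reduce_counts(TUPLES, "to"))
print(facet_counts(TUPLES, "to"))
```

Both functions compute the same counts; what differs is the data that moves. In the map_reduce simulation every tuple is shipped to a worker, while in the facet simulation only the per-value counts would be.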

These modes are selected with the aggregationMode parameter when sending the request to Solr.
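As a sketch of what such a request looks like, the Python function below only builds the URL and form body using the standard library; it does not contact a server. The host, port, and collection name (`localhost:8983`, `collection1`), the `to` field, and the helper name `build_sql_request` are placeholders. Sending the SQL text in a `stmt` form parameter follows the usual convention for the /sql handler.

```python
from urllib.parse import urlencode

def build_sql_request(base_url, collection, stmt, aggregation_mode="facet"):
    """Return (url, body) for a POST to a collection's /sql handler.
    base_url and collection are placeholders for your own deployment."""
    if aggregation_mode not in ("facet", "map_reduce"):
        raise ValueError("aggregationMode must be 'facet' or 'map_reduce'")
    query = urlencode({"aggregationMode": aggregation_mode})
    url = f"{base_url}/solr/{collection}/sql?{query}"
    body = urlencode({"stmt": stmt})  # form-encoded SQL statement
    return url, body

url, body = build_sql_request(
    "http://localhost:8983", "collection1",
    "SELECT to, count(*) FROM collection1 GROUP BY to ORDER BY count(*) desc LIMIT 10",
    aggregation_mode="map_reduce",
)
# To send it against a running SolrCloud cluster (import urllib.request first):
# urllib.request.urlopen(url, data=body.encode())
```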

As noted, the choice between aggregation modes depends on the cardinality of the fields you are working with. If the fields you are grouping by have low-to-moderate cardinality, the "facet" aggregation mode will give you higher performance because only the final groups are returned, very similar to how facets work today. If, however, those fields have high cardinality, the "map_reduce" aggregation mode with worker nodes provides a much more performant option.
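The rule of thumb above can be written down as a tiny helper. The guide gives no numeric boundary between "moderate" and "high" cardinality, so the threshold below is a hypothetical placeholder, as is the function name `choose_aggregation_mode`; measure your own data before relying on any cut-off.

```python
# Hypothetical cut-off: not from the guide; tune it for your own data and hardware.
HIGH_CARDINALITY_THRESHOLD = 100_000

def choose_aggregation_mode(estimated_distinct_values, threshold=HIGH_CARDINALITY_THRESHOLD):
    """Return 'facet' for low-to-moderate cardinality GROUP BY fields and
    'map_reduce' (worker nodes) for high-cardinality ones."""
    if estimated_distinct_values < 0:
        raise ValueError("cardinality cannot be negative")
    return "facet" if estimated_distinct_values < threshold else "map_reduce"

print(choose_aggregation_mode(250))        # a field such as a country code
print(choose_aggregation_mode(5_000_000))  # a field such as a user or message id
```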

More detail on the architecture of the "map_reduce" query is in the section Parallel Query Architecture.

Configuration

The request handlers used for the SQL interface are configured to load implicitly, meaning there is little to do to start using this feature.

/sql Request Handler

The /sql handler is the front end of the Parallel SQL interface. All SQL queries are sent to the /sql handler to be processed. The handler also coordinates the distributed MapReduce jobs when running GROUP BY and SELECT DISTINCT queries in map_reduce mode. By default the /sql handler will choose worker nodes from its own collection to handle the distributed operations. In this default scenario the collection where the /sql handler resides acts as the default worker collection for MapReduce queries.

By default, the /sql request handler is configured as an implicit handler, meaning that it is always enabled in every Solr installation and no further configuration is required.

As described below in the section Best Practices, you may want to set up a separate collection for parallelized SQL queries. If you have high cardinality fields and a large amount of data, please be sure to review that section and consider using a separate collection.

/stream and /export Request Handlers

The Streaming API is an extensible parallel computing framework for SolrCloud. Streaming Expressions provide a query language and a serialization format for the Streaming API. The Streaming API provides support for fast MapReduce, allowing it to perform parallel relational algebra on extremely large data sets. Under the covers, the SQL interface parses SQL queries using the Presto SQL Parser. It then translates the queries into a parallel query plan, which is expressed using the Streaming API and Streaming Expressions.

Like the /sql request handler, the /stream and /export request handlers are configured as implicit handlers, and no further configuration is required.
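Because SQL results are produced by the Streaming API, they come back as a stream of tuples that ends with a marker tuple carrying `"EOF": true`. The Python sketch below assumes a JSON body of the form `{"result-set": {"docs": [...]}}` and an `EXCEPTION` key on error tuples; both are assumptions to verify against your Solr version. The helper name `read_tuples` and the sample payload are invented for illustration.

```python
import json

def read_tuples(response_text):
    """Split a streaming JSON response into data tuples and the final EOF tuple.
    Assumes the {"result-set": {"docs": [...]}} response shape."""
    docs = json.loads(response_text)["result-set"]["docs"]
    rows = []
    for doc in docs:
        if "EXCEPTION" in doc:   # assumed error-reporting convention
            raise RuntimeError(doc["EXCEPTION"])
        if doc.get("EOF"):       # end-of-stream marker tuple
            return rows, doc
        rows.append(doc)
    raise ValueError("stream ended without an EOF tuple")

# Invented sample payload for a GROUP BY query.
SAMPLE = '''{"result-set": {"docs": [
  {"to": "a@example.com", "count(*)": 3},
  {"to": "b@example.com", "count(*)": 2},
  {"EOF": true, "RESPONSE_TIME": 11}]}}'''

rows, eof = read_tuples(SAMPLE)
for row in rows:
    print(row["to"], row["count(*)"])
```

Returning the EOF tuple separately keeps stream metadata (such as a response time) out of the data rows.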

Fields

In some cases, fields used in SQL queries must be configured as DocValue fields. If queries are unlimited, all fields must be DocValue fields. If queries are limited (with the LIMIT clause), then fields do not have to have

