11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


the Data Tables. The worker nodes execute the query plan and stream tuples back to the /sql handler.

Data Table Tier

The Data Table tier is where the tables reside. Each table is its own SolrCloud collection. The Data Table tier receives queries from the worker nodes and emits tuples (search results). The Data Table tier also handles the initial sorting and partitioning of tuples sent to the workers. This means the tuples are always sorted and partitioned before they hit the network. The partitioned tuples are sent directly to the correct worker nodes in the proper sort order, ready to be reduced.
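The sort-then-partition step described above can be illustrated with a minimal sketch. It assumes tuples are plain dictionaries and uses `zlib.crc32` as a stand-in hash chosen only for determinism; the function name `partition_sorted` and the sample field `to` are hypothetical, and Solr's actual partitioning function is not shown in this section.

```python
import zlib
from collections import defaultdict


def partition_sorted(tuples, key, num_workers):
    """Sort tuples by `key`, then route each one to a worker by hashing the key."""
    partitions = defaultdict(list)
    for t in sorted(tuples, key=lambda t: t[key]):
        # crc32 is an assumed stand-in hash, not the function Solr uses.
        worker = zlib.crc32(str(t[key]).encode("utf-8")) % num_workers
        partitions[worker].append(t)
    return dict(partitions)


# Hypothetical sample tuples keyed on a "to" field.
rows = [
    {"to": "carol", "id": 1},
    {"to": "alice", "id": 2},
    {"to": "bob", "id": 3},
    {"to": "alice", "id": 4},
]
parts = partition_sorted(rows, "to", num_workers=5)
```

Because the worker is chosen from the key alone, every tuple with the same key lands on the same worker, and each worker receives its tuples already in sort order.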

The image above shows the three tiers broken out into different SolrCloud collections for clarity. In practice, the /sql handler and the worker collection share the same collection by default.

Note: The image shows the network flow for a single Parallel SQL query (SQL over MapReduce). This network flow is used when the map_reduce aggregation mode is used for GROUP BY aggregations or SELECT DISTINCT queries. The traditional SolrCloud network flow (without workers) is used when the facet aggregation mode is used.
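As a rough illustration of choosing the aggregation mode, the sketch below builds a request URL for the /sql handler. The parameter names `stmt`, `aggregationMode`, and `numWorkers` are assumptions based on the Parallel SQL documentation for this release and should be checked against your Solr version; the host, collection name, and helper name `build_sql_request` are hypothetical.

```python
from urllib.parse import urlencode


def build_sql_request(base_url, collection, stmt,
                      aggregation_mode="map_reduce", num_workers=None):
    """Build a /sql handler URL (parameter names are assumptions, see lead-in)."""
    if aggregation_mode not in ("map_reduce", "facet"):
        raise ValueError("aggregation_mode must be 'map_reduce' or 'facet'")
    params = {"stmt": stmt, "aggregationMode": aggregation_mode}
    if num_workers is not None:
        # Only meaningful for the map_reduce flow, which uses worker nodes.
        params["numWorkers"] = str(num_workers)
    return f"{base_url}/{collection}/sql?{urlencode(params)}"


# Hypothetical host and collection.
url = build_sql_request(
    "http://localhost:8983/solr",
    "collection1",
    "SELECT DISTINCT to FROM collection1",
    num_workers=5,
)
```

Passing `aggregation_mode="facet"` instead would select the traditional SolrCloud flow without workers.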

Below is a description of the flow:

1. The client sends a SQL query to the /sql handler. The request is handled by a single /sql handler instance.
2. The /sql handler parses the SQL query and creates the parallel query plan.
3. The query plan is sent to worker nodes (in green).
4. The worker nodes execute the plan in parallel. The diagram shows each worker node contacting a collection in the Data Table tier (in blue).
5. The collection in the Data Table tier is the table from the SQL query. Notice that the collection has five shards, each with 3 replicas.
6. Notice that each worker contacts one replica from each shard. Because there are 5 workers, each worker is returned 1/5 of the search results from each shard. The partitioning is done inside of the Data Table tier so there is no duplication of data across the network.
7. Also notice that with this design ALL replicas in the data layer are shuffling (sorting & partitioning) data simultaneously. As the number of shards, replicas and workers grows, this design allows for a massive amount of computing power to be applied to a single query.
8. The worker nodes process the tuples returned from the Data Table tier in parallel. The worker nodes perform the relational algebra needed to satisfy the query plan.
9. The worker nodes stream tuples back to the /sql handler where the final merge is done, and finally the tuples are streamed back to the client.
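The flow described above can be simulated in a few lines. This is a toy, single-process sketch of a GROUP BY count, not Solr's implementation: the workers run sequentially here rather than in parallel, `zlib.crc32` is an assumed stand-in for the partitioning hash, and all function names and sample data are hypothetical.

```python
import heapq
import zlib
from itertools import groupby

NUM_WORKERS = 5


def worker_for(key):
    # Assumed stand-in hash for the sketch; Solr's partitioning function differs.
    return zlib.crc32(key.encode("utf-8")) % NUM_WORKERS


def shard_stream(shard_docs, worker_id):
    """Data Table tier: sort a shard's keys and emit only this worker's partition."""
    return [k for k in sorted(shard_docs) if worker_for(k) == worker_id]


def run_worker(shards, worker_id):
    """Worker: merge the sorted per-shard streams, then count each group."""
    merged = heapq.merge(*(shard_stream(s, worker_id) for s in shards))
    return [(key, sum(1 for _ in group)) for key, group in groupby(merged)]


def sql_handler(shards):
    """/sql handler: merge the sorted worker outputs into one result stream."""
    outputs = [run_worker(shards, w) for w in range(NUM_WORKERS)]
    return list(heapq.merge(*outputs))


# Five hypothetical shards, each holding the grouping key of its documents.
shards = [
    ["alice", "bob", "alice"],
    ["carol", "alice"],
    ["bob", "dave"],
    ["erin", "carol"],
    ["dave", "alice"],
]
result = sql_handler(shards)
# result == [("alice", 4), ("bob", 2), ("carol", 2), ("dave", 2), ("erin", 1)]
```

Because each key is routed to exactly one worker and every stream arrives sorted, a worker can count a group in a single pass, and the final merge in the handler only interleaves already-sorted outputs.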

SQL Clients and Database Visualization Tools
