13.02.2013 Views

2 Debian Code Search: An Overview


4 Search result quality

4.10 Overall performance

Outside of theoretical simulations, a search engine is rarely used by exactly one person at
a time; several people may want to access it at once. This section examines the overall
performance of Debian Code Search by simulating real-world usage in various ways.

The goal is to determine the number of queries per second (qps) that Debian Code Search can
handle. Keep in mind that this number is a worst-case limit: only actual search queries are
measured, while the time users normally spend looking at the search results or accessing
static pages (the index or help page, for example) is neglected. That is, assuming DCS could
handle 5 queries per second, then in the worst case, when 6 people want to search in the very
same second, one of them has to wait longer than the others (or everyone has to wait a little
longer, depending on whether requests to the backend are artificially limited).
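The worst-case reasoning above can be made concrete with a small calculation. The following is a minimal sketch, assuming a hypothetical backend that serves requests strictly one after another at a fixed rate; the function name `completion_times` and this serial model are illustrative assumptions, not a description of how DCS schedules requests.

```python
def completion_times(num_requests, capacity_qps):
    """Return the time (in seconds) at which each request finishes.

    ASSUMPTION: all requests arrive at t = 0 and are served one at a
    time, each taking exactly 1 / capacity_qps seconds.
    """
    return [(i + 1) / capacity_qps for i in range(num_requests)]


if __name__ == "__main__":
    # The scenario from the text: capacity of 5 qps, 6 simultaneous searches.
    times = completion_times(6, 5)
    late = [t for t in times if t > 1.0]
    print(f"{len(late)} of {len(times)} requests finish after the first second")
```

Under this model the first five requests complete within the first second and the sixth spills over, which is the "one of them has to wait longer" case.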

4.10.1 Measurement setup

To measure HTTP requests per second for certain access patterns, multiple programs are
available:

ApacheBench: Shipped with the Apache HTTP server, this tool has been used for years to
measure the performance of the Apache server and others. While it supports a gnuplot output
format, the gnuplot output is a histogram since it is sorted by total request time, not
sequentially.

siege: A multi-threaded HTTP load testing and benchmarking utility.

weighttp: Developed as part of the lighttpd project, weighttp describes itself as a
lightweight and small benchmarking tool for webservers. It is multi-threaded and event-based.
weighttp's command-line options are similar to those of ApacheBench, but it does not support
dumping the raw timing measurements.

httperf: A single-threaded but non-blocking HTTP performance measurement tool. Its strength
is its workload generator, with which more realistic scenarios can be tested rather than just
hitting the same URL over and over again.

Unfortunately, none of these tools provides a way of obtaining the raw timing measurements.
Therefore, an additional parameter has been implemented in dcs-web which makes dcs-web
measure and save the time spent processing each request. The resulting measurement file can
later be processed with gnuplot.
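Before plotting, it can help to sanity-check such a measurement file numerically. The sketch below is not part of dcs-web and is not a replacement for the gnuplot step. The exact file format is not stated here, so the code assumes a gnuplot-style text file in which the duration is the last whitespace-separated field on each line and lines starting with `#` are comments. The names `parse_durations` and `summarize` are hypothetical.

```python
import statistics


def parse_durations(lines):
    """Extract one duration per line.

    ASSUMPTION: the duration is the last whitespace-separated field;
    blank lines and '#' comment lines are skipped.
    """
    durations = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        durations.append(float(line.split()[-1]))
    return durations


def summarize(durations):
    """Return basic statistics for a non-empty list of durations."""
    return {
        "count": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Made-up sample lines, only to show the call pattern.
    sample = ["# request duration", "0 12.5", "1 7.5", "2 10.0"]
    print(summarize(parse_durations(sample)))
```

With a real file, the lines would come from `open(path)` instead of the inline sample; the unit of the values is whatever dcs-web writes.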

To verify that these measurements are correct, two lines of the very simple weighttp source
code were changed to make it print each request's duration to stdout. Then, dcs-web was
started and the benchmark was run as follows:


export DCS="http://localhost:28080"
./dcs-web -timing_total_path="dcs-total-1.dat" &
./weighttp -c 1 -n 100 "$DCS/search?q=XCreateWindow" \
    | grep 'req duration' > weighttp-vs-internal.raw
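Because the benchmark uses a single concurrent client (`-c 1`), the durations seen by weighttp and those recorded by dcs-web should line up request by request. The sketch below shows one way to compare the two series once they have been parsed into lists of numbers. It assumes both lists use the same unit and the same request order, and the function name `overheads` is hypothetical.

```python
def overheads(client_durations, server_durations):
    """Return client-side minus server-side duration for each request.

    ASSUMPTION: both lists are in request order and use the same unit.
    """
    if len(client_durations) != len(server_durations):
        raise ValueError("both series must contain the same number of requests")
    return [c - s for c, s in zip(client_durations, server_durations)]


if __name__ == "__main__":
    # Made-up numbers, only to show the call pattern.
    diffs = overheads([10.0, 12.0], [9.0, 11.5])
    print("largest client/server difference:", max(diffs))
```

Small, roughly constant differences would suggest that the internal measurement tracks what the client observes; large or erratic ones would call the internal timing into question.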
