2 Debian Code Search: An Overview

3 Architecture

server can simply cache requests to /search. To cache index query results, a new cache could be added in front of requests to index-backend processes.
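The idea of a cache sitting in front of /search (or in front of index-backend queries) can be illustrated with a small in-process cache. This is a minimal sketch, not the setup Debian Code Search actually uses. The names `TTLCache` and `cached_search` are hypothetical, the backend is a stand-in function, and the size limit counts entries rather than bytes. When the cache is full, entries are evicted in least-recently-used order, and every entry expires a fixed time after it was stored.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLCache:
    """Hypothetical LRU cache whose entries also expire after a fixed TTL.

    Illustration only: keys are query strings, values are rendered results.
    The limit is a number of entries, not a size in bytes.
    """

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: "OrderedDict[str, tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            # Expired: drop it so the next request goes to the backend.
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry


def cached_search(cache: TTLCache, query: str,
                  backend: Callable[[str], str]) -> "tuple[str, bool]":
    """Return (result, was_cached); only cache misses reach the backend."""
    hit = cache.get(query)
    if hit is not None:
        return hit, True
    result = backend(query)
    cache.put(query, result)
    return result, False
```

Keying on the raw query string means that trivially different spellings of the same query (extra whitespace, different parameter order) miss the cache. Normalizing the key before the lookup would be one way to raise the hit ratio.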

When Debian Code Search was launched on 2012-11-06, no explicit caching was configured.

The launch announcement¹⁶ included three sample queries: “XCreateWindow”, “workaround package:linux” and “AnyEvent::I3 filetype:perl”. These queries were chosen to demonstrate particular features, and also because they are answered quickly.

Monitoring the web server log files after the announcement quickly made it clear that explicitly caching the /search URL would help: people shared interesting search queries, such as http://codesearch.debian.net/search?q=fuck¹⁷ or http://codesearch.debian.net/search?q=The+Software+shall+be+used+for+Good%2C+not+Evil¹⁸. As table 3.2 shows, these shared queries are the most popular ones. At least some of them also profited from explicit caching, e.g. “The Software shall be used for Good, not Evil” with a cache ratio of 80.2 %.

Search term                                    Hits  Cached  Cache ratio
The Software shall be used for Good, not Evil  2066    1657       80.2 %
fuck                                           1247     423       33.9 %
workaround package:linux                        683     128       18.7 %
XCreateWindow                                   528      71       13.4 %
idiot                                           265       8        3.0 %
AnyEvent::I3 filetype:perl                      255      35       13.7 %
shit                                            130      10        7.7 %
FIXME                                           116      30       25.9 %
B16B00B5                                        105      45       42.9 %
(babefee1|B16B00B5|0B00B135|deadbeef)            94      38       40.4 %

Table 3.2: Top 20 search queries and their cache hit ratio from 2012-11-07 to 2012-11-14. The cache was 500 MiB in size and entries expire after 15 minutes; see listing 3.3 (page 29).
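The cache ratio column of table 3.2 is the number of cached responses divided by the number of hits. The sketch below recomputes it from the hits and cached columns as a consistency check. The row data is copied from the table; `cache_ratio` is a hypothetical helper name.

```python
# Rows of table 3.2: (search term, hits, cached, cache ratio in % as printed)
TABLE_3_2 = [
    ("The Software shall be used for Good, not Evil", 2066, 1657, 80.2),
    ("fuck", 1247, 423, 33.9),
    ("workaround package:linux", 683, 128, 18.7),
    ("XCreateWindow", 528, 71, 13.4),
    ("idiot", 265, 8, 3.0),
    ("AnyEvent::I3 filetype:perl", 255, 35, 13.7),
    ("shit", 130, 10, 7.7),
    ("FIXME", 116, 30, 25.9),
    ("B16B00B5", 105, 45, 42.9),
    ("(babefee1|B16B00B5|0B00B135|deadbeef)", 94, 38, 40.4),
]


def cache_ratio(hits: int, cached: int) -> float:
    """Percentage of requests for a query that were answered from the cache."""
    return 100.0 * cached / hits


for term, hits, cached, printed in TABLE_3_2:
    computed = round(cache_ratio(hits, cached), 1)
    status = "ok" if computed == printed else "MISMATCH"
    print(f"{term:<46} {computed:5.1f} %  {status}")
```

All ten rows agree with the printed ratios to one decimal place.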

The cache hit ratios vary because each query was popular during a different time span. The top query was requested so often that it practically always stayed in the cache, while other queries were not requested often enough to stay cached for long.
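A toy model of a single query illustrates why request frequency matters. The model rests on assumptions and does not describe the actual server: a cache miss stores the response, the stored entry expires exactly 15 minutes later no matter how often it is read, and the 500 MiB size limit never evicts it. `simulate_hits` is a hypothetical helper.

```python
from typing import Iterable, Optional

# Assumed expiry model: an entry is valid for TTL_SECONDS after it was stored.
TTL_SECONDS = 15 * 60


def simulate_hits(request_times: Iterable[float], ttl: float = TTL_SECONDS) -> int:
    """Count cache hits for requests of one query at the given times (seconds).

    Toy model: a miss stores a fresh entry, which expires `ttl` seconds after
    being stored; entries are never evicted for lack of space.
    """
    stored_at: Optional[float] = None
    hits = 0
    for t in sorted(request_times):
        if stored_at is not None and t - stored_at < ttl:
            hits += 1  # served from the cache
        else:
            stored_at = t  # miss: the backend answers and the result is cached
    return hits


two_hours = 2 * 60 * 60
every_minute = range(0, two_hours, 60)           # 120 requests
every_20_minutes = range(0, two_hours, 20 * 60)  # 6 requests

for label, times in [("every minute", every_minute),
                     ("every 20 minutes", every_20_minutes)]:
    hits = simulate_hits(times)
    print(f"{label}: {hits} of {len(times)} requests cached "
          f"({100 * hits / len(times):.1f} %)")
```

Over two hours, requests arriving every minute yield 112 cache hits out of 120 (93.3 %): every 15 minutes the entry expires and one request misses. Requests arriving every 20 minutes always find the entry expired and yield no hits at all.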

16 http://lists.debian.org/debian-devel-announce/2012/11/msg00001.html

17 https://twitter.com/antanst/status/266095288266153984 and http://www.reddit.com/r/programming/comments/12sni3/debian_code_search/c6xz82h

18 Referred to in http://apebox.org/wordpress/rants/456/, a blog post about a harmful software license.
