3 Architecture
server can simply cache requests to /search. To cache index query results, a new cache
could be added in front of requests to index-backend processes.
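The idea of placing a cache in front of index-backend requests can be illustrated with a minimal sketch. It is not the configuration the service uses (see listing 3.3 for that); the class name `TTLCache`, the `fetch` callback standing in for a backend query, and the injectable `clock` are assumptions made for illustration only.

```python
import time


class TTLCache:
    """Caches results of fetch(key) for a fixed lifetime (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # stand-in for a request to an index backend
        self._ttl = ttl_seconds      # lifetime of an entry, counted from when it is stored
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            # Entry is still fresh: answer without contacting the backend.
            self.hits += 1
            return entry[1]
        # Missing or expired: ask the backend and store the fresh result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

In this sketch an entry expires a fixed time after it was stored, regardless of how often it is read. A size limit with eviction, as a real cache would need, is deliberately left out.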
When Debian Code Search was launched on 2012-11-06, no explicit caching was configured.
Three sample queries were included in the launch announcement [16]: “XCreateWindow”,
“workaround package:linux” and “AnyEvent::I3 filetype:perl”. These queries were
selected to demonstrate certain features, but also because they are served quickly.
By monitoring the web server log files after sending the announcement, it quickly became
clear that explicit caching of the /search URL would be helpful: people shared interesting
search queries, such as http://codesearch.debian.net/search?q=fuck [17] or
http://codesearch.debian.net/search?q=The+Software+shall+be+used+for+Good%2C+not+Evil [18].
As table 3.2 shows, these shared queries are the most popular queries. At least some
of them also profited from explicit caching, e.g. “The Software shall be used for Good, not
Evil” with a cache ratio of 80.2 %.
search term                                     hits  cached  cache ratio
The Software shall be used for Good, not Evil   2066    1657       80.2 %
fuck                                            1247     423       33.9 %
workaround package:linux                         683     128       18.7 %
XCreateWindow                                    528      71       13.4 %
idiot                                            265       8        3.0 %
AnyEvent::I3 filetype:perl                       255      35       13.7 %
shit                                             130      10        7.7 %
FIXME                                            116      30       25.9 %
B16B00B5                                         105      45       42.9 %
(babefee1|B16B00B5|0B00B135|deadbeef)             94      38       40.4 %
Table 3.2: Top 20 search queries and their cache hit ratio from 2012-11-07 to 2012-11-14. The
cache was 500 MiB in size and entries expire after 15 minutes; see listing 3.3 (page 29).
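The cache ratio column in table 3.2 is the number of cached responses divided by the total number of hits, expressed as a percentage. The following sketch reproduces a few of the values; the function name `cache_hit_ratio` is chosen for illustration, and the counts are copied from the table.

```python
def cache_hit_ratio(hits, cached):
    """Return the share of requests served from the cache, in percent."""
    if hits == 0:
        return 0.0  # no requests at all: define the ratio as zero
    return 100.0 * cached / hits


# (search term, hits, cached) taken from table 3.2
rows = [
    ("The Software shall be used for Good, not Evil", 2066, 1657),
    ("XCreateWindow", 528, 71),
    ("FIXME", 116, 30),
]

for term, hits, cached in rows:
    print(f"{term}: {cache_hit_ratio(hits, cached):.1f} %")
```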
The varying cache hit ratios are caused by the different time spans in which the queries are
popular. The top query was so popular that it always stayed in the cache, while other queries
did not stay in the cache for very long.
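The effect of request timing on the hit ratio can be made concrete with a small simulation. This is a simplified model, not a reconstruction of the actual traffic: it assumes a single query, entries that expire a fixed 15 minutes after being stored, and no eviction due to the size limit. The function name and the two request patterns are invented for illustration.

```python
def simulate_hit_ratio(request_times, ttl):
    """Return the fraction of requests answered from a cache with fixed-lifetime entries.

    request_times: ascending timestamps (seconds) of requests for one query.
    ttl: lifetime of a cache entry in seconds, counted from when it is stored.
    """
    expires_at = None
    hits = 0
    for now in request_times:
        if expires_at is not None and now < expires_at:
            hits += 1                 # entry still fresh: cache hit
        else:
            expires_at = now + ttl    # miss: result is fetched and stored again
    return hits / len(request_times)


TTL = 15 * 60  # 15 minutes, as in table 3.2

# Hypothetical patterns: one request per minute vs. one request every 20 minutes.
popular = [i * 60 for i in range(60)]
sparse = [i * 20 * 60 for i in range(60)]

print(simulate_hit_ratio(popular, TTL))
print(simulate_hit_ratio(sparse, TTL))
```

In this model, a query requested more often than once per 15 minutes misses only when its entry expires, whereas a query requested less often than that never finds a fresh entry.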
[16] http://lists.debian.org/debian-devel-announce/2012/11/msg00001.html
[17] https://twitter.com/antanst/status/266095288266153984 and
http://www.reddit.com/r/programming/comments/12sni3/debian_code_search/c6xz82h
[18] Referred to in http://apebox.org/wordpress/rants/456/, a blog post about a harmful software license