Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
4.2 Ranking<br />
4.2 Ranking<br />
When a search engine gets a search query, it is not feasible to compute all matches. Doing so<br />
would take a long time: for XCreateWindow, computing the matches within all files which<br />
are returned by the index query takes about 20 s to complete.<br />
Given that typical web search engine queries are answered in the fraction of a second, this<br />
is clearly unacceptable. Therefore, the ranking is not only used for presenting the matches<br />
in the best order to the user, but also before even computing any matches this reduces the<br />
amount of data which needs to be processed. This kind of ranking is called “pre-ranking”<br />
from now on. Since results with low ranking are unlikely to be requested by the user, they<br />
are not even computed in the first place.<br />
Each of the ranking factors which are presented in the subsequent chapters aims to assign<br />
a single number to each package or source code file.<br />
4.2.1 Ranking factors which can be pre-computed per-package<br />
Popularity contest installations See section 4.2.4 (page 33).<br />
Number of reverse dependencies See section 4.2.5 (page 34).<br />
Number of bugreports While a relation between bug frequency and package popularity<br />
may exist [3] , the two previous ranking factors also indicate popularity. Aside from<br />
popularity, it is not intuitively clear whether the number of bug reports should be<br />
weighted in favor of or against any particular package. The number of bugreports has<br />
therefore not been considered as part of the ranking within this work.<br />
File modification time See section 4.2.6 (page 35).<br />
Version number cleanliness A very low version number like 0.1 might be an indication<br />
for an immature FOSS project. Therefore, one might intuitively think that it would be<br />
a good idea to use version numbers as a ranking factor: The higher the version, the<br />
better.<br />
However, the perception of what a version number means differs from project to<br />
project. Some projects use the release date as version number 1 while others have<br />
version numbers that asymptotically approach π 2 .<br />
Due to the wildly varying meaning of version numbers, they have not been used for<br />
ranking within <strong>Debian</strong> <strong>Code</strong> <strong>Search</strong>.<br />
4.2.2 Ranking factors which depend on the query<br />
Query contained in file path When the query string is contained in the file path, the file<br />
should get a higher ranking. This is very simple to implement and computationally<br />
cheap.<br />
1 e.g. git-annex, TeXlive, SLIME, axiom, batctl, caspar, …<br />
2 TEX approaches π, Metafont approaches e, there might be others<br />
31