13.02.2013 Views

2 Debian Code Search: An Overview


4.2 Ranking

4.2.6 Modification time of source code files

In general, newer source code is more interesting than old source code. Either the code was modified to fix a bug, in which case the user is interested in obtaining the latest version of the code with as many bugfixes as possible, or the code was written more recently, meaning that it is more relevant to the user because it solves a current problem.

In Debian, and Linux in general, there are three different timestamps for each file: the access time (atime), the change time (ctime) and the modification time (mtime). The difference between change time and modification time is that changes to the inode (permissions or owner, for example) are recorded in the former, while the latter is only set when the actual file contents are modified.
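To make the three timestamps concrete, here is a minimal Python sketch that reads them with `os.stat`. The helper name `file_timestamps` is hypothetical and only serves this illustration. The sketch assumes a Linux (or other Unix) system, where `st_ctime` is the inode change time.

```python
import os


def file_timestamps(path: str) -> dict:
    """Return the three timestamps of a file as seconds since the epoch.

    Hypothetical helper for illustration; assumes a Unix system, where
    st_ctime is the inode change time (not the creation time).
    """
    st = os.stat(path)
    return {
        "atime": st.st_atime,  # last access
        "ctime": st.st_ctime,  # last inode change (permissions, owner, ...)
        "mtime": st.st_mtime,  # last modification of the file contents
    }
```

Calling `os.chmod` on a file and comparing the result of `file_timestamps` before and after should show the ctime moving forward while the mtime stays the same, which is the distinction described above.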

One concern with using the mtimes of individual files within a source package is that, due to packaging mechanisms, the mtimes might be set to the time at which the package was created, rendering them useless for our purpose. To determine whether this effect has a measurable impact, the following analysis was run for each source package: find each file within the source package whose name ends in .c or .h (a quick heuristic for recognizing a portion of our source code), then calculate the difference between each file’s mtime and the package’s mtime. For each source package, the average difference was stored.
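The analysis described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code that was actually used. The function name `average_mtime_difference` is hypothetical, and because the text does not say how the package’s mtime is obtained, it is passed in as a parameter.

```python
from pathlib import Path
from typing import Optional


def average_mtime_difference(package_dir: str, package_mtime: float) -> Optional[float]:
    """Average |mtime(file) - package_mtime| in seconds over all .c/.h files.

    package_mtime is an assumption of this sketch: the caller decides what
    counts as the package's mtime. Returns None when the unpacked package
    contains no .c or .h files.
    """
    differences = []
    for path in Path(package_dir).rglob("*"):
        # Quick heuristic from the text: only files ending in .c or .h count.
        if path.is_file() and path.suffix in (".c", ".h"):
            differences.append(abs(path.stat().st_mtime - package_mtime))
    if not differences:
        return None
    return sum(differences) / len(differences)
```

Running this once per unpacked source package and collecting the results gives one average per package, which is the quantity the analysis stores.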

This analysis can be expressed with the following formula (with f being the list of files for each source package s):

$$d_s = \frac{\sum_{f} \left| \mathrm{mtime}(f) - \mathrm{mtime}(s) \right|}{|f|}$$

[Plot "mtime difference"; x-axis: average mtime difference [s], 0 to 80000; y-axis: amount of source packages, 0 to 60]

Figure 4.1: Average mtime difference between the source package and its .c or .h files

A little over 60 packages (0.6 % of all analyzed source packages) have an average mtime difference near zero. In these cases, mtime is not a good ranking factor, but 60 packages is a negligible number.
