Abstract

Since the shutdown of Google Code Search in January 2012, developers of free/libre open
source software (FOSS) have lacked a source code search engine covering the large corpus of
open source computer program code.

Without such a tool, developers could not easily find example code for poorly documented
libraries. They also could not quickly determine the scope of a problem, for example by
finding out how many packages call a specific library function that contains a bug.

This thesis discusses the design and implementation of Debian Code Search, a search engine
based on a re-implementation of the ideas behind Google Code Search, but with Debian’s
FOSS archive as a corpus, and using Debian’s clean metadata about software packages to
improve search results.

The work presented includes optimizing Russ Cox’s re-implementation to work with a large
corpus of data, refactoring it as a scalable web application backend, and enriching it with
various ranking factors so that more relevant results are presented first. Detailed analysis of
these main contributions, as well as of various smaller utilities to update, index and rank
Debian packages, is included.

With the completion of this thesis, Debian Code Search is working and accessible to the
public at http://codesearch.debian.net/. Debian Code Search can be used to search
129 GiB of source code, typically within one second.