13.02.2013 Views

2 Debian Code Search: An Overview


3.4 Architecture overview

[Figure 3.2. Diagram labels: HTTP frontend (nginx): delivers static assets, load-balances requests; dcs-web: queries backends, generates response; index backends (sharded); source-backend; PostgreSQL.]

Figure 3.2: Architecture overview, showing which different processes are involved in handling requests to Debian Code Search. [1]

Debian Code Search consists of three different types of processes (dcs-web, index-backend, source-backend) running “behind” an nginx webserver and accessing a PostgreSQL database when starting up.

When a new request comes in to http://codesearch.debian.net/, nginx delivers the static index page. However, when the request is not for a static page but for an actual search query, say http://codesearch.debian.net/search?q=XCreateWindow, nginx forwards the request to the dcs-web process.
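The routing decision described above can be illustrated with a small sketch. It is not the actual nginx configuration: the function name `route` and the handler labels are hypothetical, and treating a `/search` request without a `q` parameter as static is an assumption made only for this example.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def route(url: str) -> Tuple[str, Optional[str]]:
    """Decide which component would handle a request (illustrative only).

    Returns ("dcs-web", query) for search requests and ("static", None)
    for everything else.
    """
    parts = urlsplit(url)
    if parts.path == "/search":
        terms = parse_qs(parts.query).get("q", [])
        if terms:
            # A real search query: hand it to the dcs-web process.
            return ("dcs-web", terms[0])
    # Assumption: anything else is answered with static content.
    return ("static", None)
```

For the two URLs from the text, `route` returns `("static", None)` for the index page and `("dcs-web", "XCreateWindow")` for the search request.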

dcs-web first parses the search query, meaning it handles special keywords contained in the query term, e.g. “filetype:perl”, and stores parameters for pagination. Afterwards, dcs-web sends a request to every index-backend process and gets back from each a list of filenames which possibly contain the search query. See section 3.5 on why there are multiple index-backend instances. See section 3.8, “The trigram index”, on page 15 for details of the index-backend lookup process.
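A minimal sketch of these two steps, keyword parsing and the fan-out to the index backends, might look as follows. All names (`KNOWN_KEYWORDS`, `parse_query`, `query_index_backends`) are hypothetical, only the `filetype` keyword is taken from the text, and the backends are modelled as plain callables queried one after another, which says nothing about how dcs-web actually communicates with them.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Only "filetype" appears in the text; the set itself is an assumption.
KNOWN_KEYWORDS = {"filetype"}


def parse_query(raw: str) -> Tuple[str, Dict[str, str]]:
    """Separate "key:value" keywords from the actual search term."""
    keywords: Dict[str, str] = {}
    terms: List[str] = []
    for token in raw.split():
        key, sep, value = token.partition(":")
        if sep and value and key in KNOWN_KEYWORDS:
            keywords[key] = value
        else:
            # Tokens such as "std::string" stay part of the search term.
            terms.append(token)
    return " ".join(terms), keywords


def query_index_backends(
    term: str, backends: Iterable[Callable[[str], List[str]]]
) -> List[str]:
    """Ask every index backend and concatenate the candidate filenames."""
    candidates: List[str] = []
    for backend in backends:
        candidates.extend(backend(term))
    return candidates
```

With this sketch, `parse_query("XCreateWindow filetype:perl")` yields the term `XCreateWindow` and the keyword mapping `{"filetype": "perl"}`. The returned filenames are only candidates: they possibly contain the query and still have to be searched.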

These filenames are then ranked with ranking data loaded from PostgreSQL in such a way that the filename which is most likely to contain a good result comes first. Afterwards, the list of ranked filenames is sent to the source-backend process, which performs the actual searching using a regular expression matcher, just like the UNIX tool grep(1).
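The following sketch illustrates both steps under simplifying assumptions: the ranking data is a plain dictionary from filename to score (the real data comes from PostgreSQL and its structure is not described here), unknown files get a score of 0.0, and file contents are passed in as strings instead of being read from disk. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple


def rank_filenames(filenames: Iterable[str], ranking: Dict[str, float]) -> List[str]:
    """Order filenames so that the highest-scored file comes first."""
    return sorted(filenames, key=lambda name: ranking.get(name, 0.0), reverse=True)


def grep_files(
    pattern: str, files: Iterable[Tuple[str, str]]
) -> List[Tuple[str, int, str]]:
    """Search (filename, content) pairs line by line, similar to grep(1).

    Returns (filename, line number, matching line) tuples in file order.
    """
    regex = re.compile(pattern)
    matches: List[Tuple[str, int, str]] = []
    for filename, content in files:
        for line_no, line in enumerate(content.splitlines(), start=1):
            if regex.search(line):
                matches.append((filename, line_no, line))
    return matches
```

Ranking before searching means the most promising files are searched first, so good matches tend to arrive early.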

As soon as the source-backend has returned enough results, dcs-web ranks them again with the new information that was obtained by actually looking into the files and then presents the results to the user.
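As a rough illustration of this last step, the sketch below stops consuming matches once a given number has arrived and then sorts them by a second score. The threshold `enough` and the `score` function are placeholders; the text does not say how many results are enough or which information the second ranking uses.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple

# (filename, line number, matching line); a hypothetical result shape.
Match = Tuple[str, int, str]


def collect_and_rerank(
    matches: Iterable[Match], enough: int, score: Callable[[Match], float]
) -> List[Match]:
    """Take the first `enough` matches, then order them by `score`."""
    # islice stops pulling from the source once `enough` items were taken.
    first = list(islice(matches, enough))
    return sorted(first, key=score, reverse=True)
```

Because the matches are consumed lazily, any results beyond the threshold are never requested from the source in this sketch.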

[1] This figure has been created with dia. The icons are gnomeDIAicons, licensed under the GPL.

