3.5 Resource requirements and load-balancing<br />
As you can see in figure 3.2, when the user requests http://codesearch.debian.net/,<br />
the browser first needs to resolve the name codesearch.debian.net to an IP address using<br />
the Domain Name System (DNS). This is the first step where the load can be balanced between<br />
multiple servers: the browser will connect to the first IP address it gets, so a DNS server<br />
can just return all IP addresses in a different order (e.g. round-robin). DNS for debian.net is<br />
hosted by the <strong>Debian</strong> project, so <strong>Debian</strong> <strong>Code</strong> <strong>Search</strong> doesn’t have to set up or maintain any<br />
software or hardware for that.<br />
After resolving the hostname, the browser will open a TCP connection on port 80 to the<br />
resolved IP address and send an HTTP request. This request will be answered by the HTTP<br />
frontend webserver, which is the second step where the load can be balanced and redundancy<br />
can be added: the frontend can split the load between the available backends, and requests<br />
can still be answered if a certain number of backends fail.<br />
Furthermore, the backend only has to communicate with the frontend, so the burden<br />
of handling TCP connections — especially slow connections — is entirely on the frontend.<br />
Requests which can be answered from the cache (such as static pages, images, stylesheets<br />
and JavaScript files) can be served directly from the frontend without causing any load on<br />
the backend. The HTTP frontend runs on a <strong>Debian</strong> <strong>Code</strong> <strong>Search</strong> machine.<br />
dcs-web receives actual search requests and runs on a <strong>Debian</strong> <strong>Code</strong> <strong>Search</strong> machine. This<br />
might be the same machine the frontend runs on, or a separate, dedicated machine if the<br />
demand is high enough to require it for good performance. To answer a request,<br />
dcs-web needs to perform the following steps:<br />
1. Query all index backends. The index is sharded into multiple index backend processes<br />
due to technical limitations, see section 3.8.1, page 16.<br />
2. Rank the results.<br />
3. Send the results to one of the source backends, which performs the actual searching.<br />
4. Format the response.<br />
Each index backend and source backend corresponds to one process, which typically will<br />
run on the same machine that dcs-web runs on. Should the index size grow so much that it<br />
cannot be held by one machine anymore, index backends can also run on different machines<br />
which are connected by a low-latency network.<br />
Should it turn out that disk bandwidth is a problem, one can run multiple source backends,<br />
one for each disk. These source backend processes can be run on the same machine with<br />
different disks or on different machines, just like the index backend processes.<br />
If all index backends are deployed on a single machine, that machine needs at<br />
least 8 GiB of RAM. Not keeping the index in RAM means that each request needs to perform<br />
a lot of additional random disk accesses, which are particularly slow when the machine does<br />
not use a solid state disk (SSD) for storing the index [28].<br />
Source backends benefit from storing their data on an SSD for low-latency,<br />
high-bandwidth random file access. Keeping the filesystem metadata in RAM reduces disk<br />
access even further. The more RAM the machine which hosts the source backend has, the<br />
better: unused RAM will be used by Linux to cache file contents [24], so search queries for<br />
popular files might never hit the disk at all. 16 GiB<br />