13.02.2013 Views

2 Debian Code Search: An Overview


3.8 The trigram index<br />

[Figure 3.9, plot "average steps until ∆R = 0": intersection steps (0 to 20) over # trigrams (0 to 60); series: possible steps, necessary steps.]<br />

Figure 3.9: It takes only a fraction (≈ 1/3) of intersection steps to get to the final result R. 1500 random function and variable names have been tested.<br />

Instead, DCS can only use a heuristic to decide when decoding can stop because<br />

reading more posting lists will no longer reduce the number of false positives significantly.<br />

Figure 3.9 confirms the hunch: on average, it is not necessary to perform all intersections<br />

to arrive at the final result R, or at a result very close to it.<br />
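The measurement behind Figure 3.9 can be illustrated with a minimal sketch. It assumes posting lists are plain lists of file IDs and that a step is "necessary" if the result set still shrinks at that step; the function name `necessary_steps` and the sample data are hypothetical, not taken from DCS.

```python
def necessary_steps(posting_lists):
    """Return (possible, necessary) intersection steps for an AND query.

    possible:  number of intersections when every posting list is used.
    necessary: last step at which the result set still shrank,
               i.e. after which the change in R stays 0.
    """
    if not posting_lists:
        return 0, 0
    result = set(posting_lists[0])
    necessary = 0
    for step, plist in enumerate(posting_lists[1:], start=1):
        narrowed = result & set(plist)
        if len(narrowed) < len(result):
            necessary = step  # this step still removed candidates
        result = narrowed
    return len(posting_lists) - 1, necessary


# Hypothetical posting lists for a query with five trigrams.
lists = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5],
    [2, 3, 9],
    [2, 3, 7],
    [2, 3, 8],
]
possible, necessary = necessary_steps(lists)
print(possible, necessary)
```

In this toy input the result stops changing after the second of four intersections, which is the kind of gap between possible and necessary steps that the figure reports.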

A heuristic which yields a low number of false positives but still saves a considerable<br />

number of steps (and thus time) is:<br />

Stop processing if ∆P_{i−1} < 10 and i > 0.70 × n (70 % of the posting lists have been decoded).<br />

As Figure 3.10 shows, the number of false positives does not exceed one or two files on average,<br />

while the total speed-up for executing the AND query is ≈ 2×.<br />
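The stopping rule can be sketched as follows. This is an illustration under stated assumptions, not the DCS implementation: ∆P_{i−1} is interpreted here as the number of candidate files removed by the previous intersection step, posting lists are plain lists of file IDs, and the names `and_query`, `max_delta`, and `min_percent` are hypothetical.

```python
def and_query(posting_lists, max_delta=10, min_percent=70):
    """Intersect posting lists, stopping early per the heuristic.

    Returns (candidates, lists_decoded). Assumption: the "delta" of a
    step is the number of candidates that step removed.
    """
    n = len(posting_lists)
    if n == 0:
        return set(), 0
    candidates = set(posting_lists[0])
    prev_delta = None  # delta of the previous intersection step
    for i in range(1, n):
        # At this point i posting lists have been decoded.
        # Integer arithmetic avoids float rounding in "i > 0.70 * n".
        if (prev_delta is not None and prev_delta < max_delta
                and 100 * i > min_percent * n):
            return candidates, i
        narrowed = candidates & set(posting_lists[i])
        prev_delta = len(candidates) - len(narrowed)
        candidates = narrowed
    return candidates, n


# Hypothetical query with ten trigrams; only the last list removes file 19.
lists = [list(range(100)), list(range(50)), list(range(20))]
lists += [list(range(20)) for _ in range(6)]
lists.append(list(range(19)))

approx, decoded = and_query(lists)
exact, _ = and_query(lists, max_delta=0)  # delta < 0 never holds: no early stop
print(decoded, len(approx), len(exact))
```

Because intersections only ever remove candidates, the early-stopped set is always a superset of the exact result; the extra files are the false positives that the heuristic trades for skipping the remaining posting lists.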

[Figure 3.10, left plot "false positives by skipping decoding": # of files (0 to 10) over query # (200 to 1200); series: false positives. Right plot "saved time": total AND-query time (0 ms to 10 ms) over query # (200 to 1200); series: saved time, total time.]<br />

Figure 3.10: With the heuristic explained above, the number of false positives does<br />

not exceed two files on average; the total speed-up is ≈ 2×.<br />
