17.01.2013 Views

Algorithms and Data Structures for External Memory


[Figure image not reproduced. Recoverable labels: internal-node indexes 0, 3, 4, 5, 6, 6; edge labels a, b, c; leaf strings, in lexicographic order: abaaba, abaabbabba, abac, bcbcaba, bcbcabb, bcbcbba, bcbcbbba; leaf annotations: 6, 10, 4, 7, 7, 7, 8.]

14.2 String B-Trees

Fig. 14.1 Patricia trie representation of a single node of an SB-tree, with branching factor B = 8. The seven strings used for partitioning are pictured at the leaves; in the actual data structure, pointers to the strings, not the strings themselves, are stored at the leaves. The pointers to the B children of the SB-tree node are also stored at the leaves.

label for each of its outgoing edges. Navigation from root to leaf in the Patricia trie is done using the bit representation of the strings. For example, suppose we want to search for the leaf “abac.” We start at the root, which has index 0; the index indicates that we should examine character 0 of the search string “abac” (namely, “a”), which leads us to follow the branch labeled “a” (left branch). The next node we encounter has index 3, and so we examine character 3 (namely, “c”), follow the branch labeled “c” (right branch), and arrive at the leaf “abac.”
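The root-to-leaf navigation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trie shape below is reconstructed from the seven leaf strings and the node indexes of Fig. 14.1, leaves hold the strings themselves rather than pointers, characters are compared instead of bits, and the names `TRIE` and `blind_search` are hypothetical.

```python
# Internal node: (index, {edge_label: child}); leaf: the string itself.
# Assumption: this shape is reconstructed from the leaf strings and
# node indexes of Fig. 14.1; it is not copied from the book.
TRIE = (0, {
    "a": (3, {
        "a": (5, {"a": "abaaba", "b": "abaabbabba"}),
        "c": "abac",
    }),
    "b": (4, {
        "a": (6, {"a": "bcbcaba", "b": "bcbcabb"}),
        "b": (6, {"a": "bcbcbba", "b": "bcbcbbba"}),
    }),
})


def blind_search(node, pattern):
    """Descend from the root, examining only the indexed characters.

    Returns (leaf, visited_indexes). The leaf is None when the pattern is
    too short or no edge matches; that fallback is a simplification made
    for this sketch.
    """
    visited = []
    while not isinstance(node, str):
        index, children = node
        visited.append(index)
        if index >= len(pattern) or pattern[index] not in children:
            return None, visited
        node = children[pattern[index]]
    return node, visited


print(blind_search(TRIE, "abac"))  # ('abac', [0, 3])
```

Only characters 0 and 3 of “abac” are examined, matching the walk in the text; the descent never looks at the characters in between.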

Searching for a text string that does not match one of the leaves is more complicated and exploits the full power of the data structure, using an amortization technique of Ajtai et al. [25]. Suppose we want to search for “bcbabcba.” Starting at the root, with index 0, we examine character 0 of “bcbabcba” (namely, “b”) and follow the branch labeled “b” (right branch). We continue searching in this manner, which leads along the rightmost path, examining indexes 4 and 6, and eventually we arrive at the far-right leaf “bcbcbbba.” However, the search string “bcbabcba” does not match the leaf string “bcbcbbba.” The problem is that they differ at index 3, but only indexes 0, 4, and 6 were examined in the traversal, and thus the difference was not detected. In order to determine efficiently whether or not there is a match, we go back and sequentially compare the characters of the search string
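The undetected difference in this example can be checked with a direct character-by-character comparison of the search string against the leaf that the traversal reached. The sketch below shows only that comparison, not the amortized technique cited in the text; the function name `first_mismatch` is hypothetical.

```python
def first_mismatch(pattern, leaf):
    """Return the first index at which the two strings differ.

    Returns None when the strings are identical. If one string is a
    proper prefix of the other, the length of the shorter one is returned.
    """
    for i, (p, q) in enumerate(zip(pattern, leaf)):
        if p != q:
            return i
    if len(pattern) != len(leaf):
        return min(len(pattern), len(leaf))
    return None


# The traversal examined only indexes 0, 4, and 6, where the strings agree.
print(first_mismatch("bcbabcba", "bcbcbbba"))  # 3
```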

