
OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...




alphabet employed by each of the m strings in L. The alphabet used across all strings is constructed as \(\Sigma_0 \cup \cdots \cup \Sigma_{m-1}\) and will be denoted by Σ.

A generalized suffix tree is a tree data structure that can be constructed from a collection of strings. An equivalent tree can be constructed for a single string formed by concatenating all of the original strings, provided that a special sentinel character is placed after each original string. Each sentinel character must be unique and must not occur anywhere else in any string. The set of sentinel characters will be denoted by S. The examples in the remainder of this document use the punctuation marks # and $ as the sentinel characters. The symbol s will be used to represent the string generated by concatenating the strings in L, with a distinct sentinel character following each string.
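As a minimal sketch of this construction, assuming the strings in L are Python `str` values and the caller supplies one candidate sentinel per string, the hypothetical helper `concatenate_with_sentinels` below computes Σ as the union of the per-string alphabets, checks that the sentinels are distinct and absent from Σ, and appends one sentinel after each string to form s.

```python
def concatenate_with_sentinels(strings, sentinels):
    """Build s by appending a distinct sentinel after each string in L.

    Hypothetical helper: `sentinels` is any sequence of single characters,
    one per string (the examples in the text use '$' and '#').
    """
    if len(sentinels) < len(strings):
        raise ValueError("need one sentinel per string")
    sentinels = list(sentinels[:len(strings)])
    # Sigma is the union of the per-string alphabets Sigma_0 ... Sigma_{m-1}.
    sigma = set().union(*(set(t) for t in strings))
    # Each sentinel must be unique and must not occur inside any string.
    if len(set(sentinels)) != len(sentinels):
        raise ValueError("sentinels must be distinct")
    if sigma & set(sentinels):
        raise ValueError("a sentinel occurs inside one of the strings")
    s = "".join(t + c for t, c in zip(strings, sentinels))
    return s, sigma, set(sentinels)


s, sigma, S = concatenate_with_sentinels(["ababab", "abab"], "$#")
print(s)  # ababab$abab#
```

The validation mirrors the two requirements stated above: sentinels must be pairwise distinct and must not appear in Σ.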

Several suffix tree construction algorithms have been developed that run in linear time and space [119, 78, 112, 113]. Assuming that the string ends in a sentinel character, the following properties hold for a suffix tree:

1. The tree contains n leaves, each corresponding to one of the n suffixes of a string of length n. The number of interior nodes in the tree is bounded by n, guaranteeing that the total number of nodes in the tree is O(n).

2. Each branch in the tree is labeled with one or more characters from the string. The label is represented by two integers that specify its starting and ending positions as indices into the concatenated string. Collecting the characters on the branches traversed from the root of the tree to a leaf yields one of the suffixes of the string.

3. The first character on each branch leaving a node is distinct. As a result, no node has more than |Σ| + |S| children.

4. Each non-leaf node has at least two children.

5. The string depth of a node is the total length of all the labels on the branches that form the path from the root of the tree to the node. Because a branch may be labeled with more than one character, the string depth of a node is always greater than or equal to its depth, that is, the number of branches on that path.
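The properties above can be checked on a small example. The Python sketch below uses a naive, quadratic-time construction that inserts each suffix into a compressed trie; it is not one of the linear-time algorithms cited above. Branch labels are stored as half-open (start, end) index pairs into s, which is an assumption (the text only says two integers), and `build_suffix_tree` and `check_properties` are hypothetical names.

```python
class Node:
    """A suffix tree node; suffix_index is set only on leaves."""

    def __init__(self, suffix_index=None):
        # first character of a branch label -> (start, end, child);
        # the label is s[start:end] (half-open indices, an assumption)
        self.children = {}
        self.suffix_index = suffix_index


def build_suffix_tree(s):
    """Naive O(n^2) construction: insert every suffix into a compressed trie."""
    n = len(s)
    if n == 0 or s.count(s[-1]) != 1:
        raise ValueError("s must end in a character that occurs only once")
    root = Node()
    for i in range(n):
        node, pos = root, i
        while True:
            c = s[pos]
            if c not in node.children:
                node.children[c] = (pos, n, Node(i))  # new leaf branch
                break
            start, end, child = node.children[c]
            k = 0  # length of the match along this branch
            while start + k < end and s[start + k] == s[pos + k]:
                k += 1
            if start + k == end:  # whole label matched: descend
                node, pos = child, pos + k
                continue
            # mismatch inside the label: split the branch at offset k
            mid = Node()
            node.children[c] = (start, start + k, mid)
            mid.children[s[start + k]] = (start + k, end, child)
            mid.children[s[pos + k]] = (pos + k, n, Node(i))
            break
    return root


def check_properties(s, root):
    """Assert properties 1-5 and return (leaf count, interior node count)."""
    n, alphabet_size = len(s), len(set(s))  # |Sigma| + |S|
    leaves = interior = 0
    stack = [(root, 0, 0, "")]  # (node, string depth, depth, path label)
    while stack:
        node, string_depth, depth, path = stack.pop()
        assert string_depth >= depth  # property 5
        if not node.children:
            leaves += 1
            assert path == s[node.suffix_index:]  # property 2
            continue
        interior += 1
        assert node is root or len(node.children) >= 2  # property 4
        assert len(node.children) <= alphabet_size  # property 3
        for c, (start, end, child) in node.children.items():
            assert end > start and s[start] == c  # non-empty, keyed by first char
            stack.append((child, string_depth + end - start, depth + 1,
                          path + s[start:end]))
    assert leaves == n and interior <= n  # property 1
    return leaves, interior


s = "ababab$abab#"
print(check_properties(s, build_suffix_tree(s)))  # (12, 5)
```

The unique final sentinel is what makes the naive insertion safe: no suffix can be a prefix of another, so every insertion ends by creating a new leaf.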

Figure 9.1 shows a suffix tree constructed for the strings ababab and abab after they have been merged into a single, sentinel-delimited string, ababab$abab#. Each leaf node shows the suffix spelled out by the path from the root to that leaf.
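The leaf labels of the tree in Figure 9.1 can be enumerated directly, since each of its 12 leaves corresponds to one suffix of ababab$abab# (property 1). In this sketch the helper `leaf_labels` and its `string_number` field, which records the string in L in which a suffix begins, are hypothetical illustrations; the order in which the figure draws its leaves may differ.

```python
def leaf_labels(s, sentinels):
    """List (start, suffix, string_number) for every suffix of s.

    string_number is the index in L of the original string in which the
    suffix begins; a sentinel is counted with the string it terminates.
    """
    labels = []
    for start in range(len(s)):
        # count the sentinels that precede this suffix's first character
        string_number = sum(1 for c in s[:start] if c in sentinels)
        labels.append((start, s[start:], string_number))
    return labels


for start, suffix, string_number in leaf_labels("ababab$abab#", "$#"):
    print(start, suffix, string_number)
```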
