13.10.2014 Views

OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...

9.4 An Algorithm for the Most Contributory Substring Problem

The algorithm for determining the most contributory substring begins by constructing a suffix tree for s using one of the previously published linear-time algorithms. Once the suffix tree has been constructed, a transformation is applied to some of its leaf nodes. The purpose of this transformation is to ensure that every leaf node is reached by a branch whose label begins with a sentinel character. After the transformation, all strings drawn from the original input set are represented by interior nodes of the tree, while all strings that arise from forming the merged, sentinel-delimited string are represented by leaf nodes. The transformation is performed in the following manner.

• Any leaf node reached by a branch whose label begins with a sentinel character is left untouched.

• Any leaf node reached by a branch whose label begins with a letter in Σ is split into two nodes: a new interior node with only one child is inserted between the leaf and its parent. The branch to the new node is labeled with all characters of the original branch label that precede its first sentinel character, and the branch from the new node to the leaf is labeled with the remaining characters.
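The splitting rule above can be illustrated on a small, explicitly built tree. This is a minimal Python sketch, not the algorithm of Figure 9.2: the `Node` class, the `split_leaf_edges` function, and the choice to store branch labels as plain strings in a dictionary are hypothetical. It also assumes that the concatenated string ends with a sentinel, so every leaf's branch label contains at least one sentinel character.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical representation: branch label -> child node.
    # A node with no outgoing branches is a leaf.
    edges: dict[str, Node] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.edges


def split_leaf_edges(node: Node, sentinels: set[str]) -> None:
    """Apply the leaf-splitting rule to every leaf in the subtree rooted at node."""
    # Iterate over a snapshot because the loop replaces entries in node.edges.
    for label, child in list(node.edges.items()):
        if not child.is_leaf():
            split_leaf_edges(child, sentinels)
        elif label[0] not in sentinels:
            # Assumes the label contains a sentinel (s ends with one), so
            # next() always finds a position.
            cut = next(i for i, ch in enumerate(label) if ch in sentinels)
            # Insert a one-child interior node between the parent and the leaf.
            middle = Node(edges={label[cut:]: child})
            del node.edges[label]
            node.edges[label[:cut]] = middle
        # Leaves whose label already begins with a sentinel are left untouched.


# Suffix tree of "ab#b$" for the input set {"ab", "b"}, built by hand.
root = Node(edges={
    "ab#b$": Node(),
    "b": Node(edges={"#b$": Node(), "$": Node()}),
    "#b$": Node(),
    "$": Node(),
})
split_leaf_edges(root, {"#", "$"})
print(sorted(root.edges))              # ['#b$', '$', 'ab', 'b']
print(sorted(root.edges["ab"].edges))  # ['#b$']
```

Because the branches leaving a suffix-tree node begin with distinct characters, replacing the key `label` with its prefix `label[:cut]` cannot collide with a sibling branch.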

Once this transformation has been performed, the number of nodes in the tree has increased by at most n, so the total number of nodes remains O(n). The transformation can be performed in O(n log |S|) time if the positions of the sentinel characters are recorded when the concatenated string is formed (for example, the first sentinel in a leaf's branch label can then be located by binary search over the |S| recorded positions). Recording these positions does not affect the ability to construct the suffix tree in linear time, nor does it affect the space complexity of the data structure, because the positions can be stored as |S| integers. The algorithm employed to transform the leaf nodes is shown in Figure 9.2. Figure 9.3 shows the example tree after the splitting algorithm is applied.
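The complexity argument depends on recording the sentinel positions while the concatenated string is formed. The following minimal Python sketch shows one way to do this, assuming that each input string receives its own distinct sentinel character not in Σ and that a leaf's branch label is the index range s[start:end] with end = len(s). The function names `concatenate`, `first_sentinel_at_or_after`, and `split_leaf_label` are hypothetical. A binary search (Python's `bisect_left`) over the |S| recorded positions finds the split point for one leaf in O(log |S|) time, which over the O(n) leaves gives O(n log |S|).

```python
from __future__ import annotations

from bisect import bisect_left


def concatenate(strings: list[str], sentinels: list[str]) -> tuple[str, list[int]]:
    """Join the input strings, appending a distinct sentinel after each one.

    Returns the merged string s and the index of every sentinel in s,
    in ascending order (|S| integers in total).
    """
    if len(sentinels) < len(strings):
        raise ValueError("need one sentinel per input string")
    parts: list[str] = []
    positions: list[int] = []
    offset = 0
    for text, sentinel in zip(strings, sentinels):
        parts.append(text + sentinel)
        positions.append(offset + len(text))  # index of this sentinel in s
        offset += len(text) + 1
    return "".join(parts), positions


def first_sentinel_at_or_after(positions: list[int], start: int) -> int:
    """Index in s of the first sentinel at or after start, in O(log |S|) time."""
    # Assumes start <= positions[-1], which holds because s ends with a sentinel.
    return positions[bisect_left(positions, start)]


def split_leaf_label(
    positions: list[int], start: int, end: int
) -> tuple[tuple[int, int], tuple[int, int]] | None:
    """Split the leaf label s[start:end] at its first sentinel.

    Returns None when the label already begins with a sentinel (the leaf is
    left untouched); otherwise returns the index ranges of the label on the
    branch to the new interior node and on the branch to the leaf.
    """
    cut = first_sentinel_at_or_after(positions, start)
    if cut == start:
        return None
    return (start, cut), (cut, end)


s, positions = concatenate(["ab", "b"], ["#", "$"])
print(s, positions)                            # ab#b$ [2, 4]
print(split_leaf_label(positions, 0, len(s)))  # ((0, 2), (2, 5))
print(split_leaf_label(positions, 2, len(s)))  # None
```

Working with index ranges rather than copied substrings keeps each split O(1) in space beyond the lookup, so the only extra storage is the list of |S| sentinel positions.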

A depth-first traversal of the tree is performed once the leaf nodes have been split. The purpose of this traversal is to compute a score for each node in order to identify the most contributory substring. Both the string depth of each node and the number of leaf nodes below it are determined, and the score for each node is computed as the product of its string depth and the number of leaf nodes below it. Depending on the application, this