13.10.2014 Views

OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...


sequences at any level in the tree will be at most n and the number of lists for any one node within the tree will be constrained by |Σ| + |L|. As a result, performing the multiway merge for each level in the tree is O(n log(|Σ| + |L|)).
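As a rough illustration of the multiway merge (a sketch, not the code used in this work), the example below merges k already-sorted lists of starting positions with Python's `heapq.merge`. It maintains a heap holding one entry per input list, so merging n positions in total costs O(n log k); here k plays the role of the |Σ| + |L| bound on the number of lists per node. The function name `merge_position_lists` and the sample lists are assumptions made for this example.

```python
import heapq


def merge_position_lists(child_lists):
    """Merge k sorted lists of starting positions into one sorted list.

    heapq.merge keeps one heap entry per input list, so each of the
    n output elements costs O(log k) to produce.
    """
    return list(heapq.merge(*child_lists))


# Hypothetical position lists, one per child of a node.
child_lists = [[0, 7, 12], [2, 9], [4, 5, 15]]
print(merge_position_lists(child_lists))  # [0, 2, 4, 5, 7, 9, 12, 15]
```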

The multiway merge is the most expensive operation that must be performed. Each of the loops based on the number of children of the node is known to be O(n) because the number of leaves in the entire tree is O(n). Similarly, CountUniqueOccurrences, shown in Figure 9.6, executes in linear time because the number of elements in the list being checked will never be larger than the number of leaves in the tree. As a result, it is concluded that the complexity of the algorithm is O(n log(|Σ| + |L|)) for each level in the tree.
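The linear-time bound can be illustrated with a single pass over a sorted list of starting positions. The sketch below is an assumption about what such a count might look like, not a reproduction of Figure 9.6: it greedily keeps an occurrence whenever it starts at or after the end of the previously kept one. The name `count_non_overlapping`, its parameters, and the sample string are hypothetical.

```python
def count_non_overlapping(positions, length):
    """Count non-overlapping occurrences of a string of the given length.

    positions must be sorted, non-negative starting indices. Each
    position is examined exactly once, so the scan is linear in the
    length of the list.
    """
    count = 0
    next_free = 0  # first index not covered by a kept occurrence
    for start in positions:
        if start >= next_free:
            count += 1
            next_free = start + length
    return count


# In the hypothetical text "abababab", "abab" starts at 0, 2 and 4,
# while "ab" starts at 0, 2, 4 and 6.
print(count_non_overlapping([0, 2, 4], 4))     # 2
print(count_non_overlapping([0, 2, 4, 6], 2))  # 4
```

Because every occurrence has the same length, keeping the leftmost available occurrence never reduces the number that can be kept later, which is why one left-to-right pass suffices in this sketch.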

In the worst case, the number of levels in a suffix tree is equal to n, resulting in an overall complexity of O(n² log(|Σ| + |L|)). However, this worst case only occurs in pathological cases. Studies have shown that, in contrast to the worst-case height, the expected height of a suffix tree is O(log n) [40, 15, 67]. Because the height of the tree in the expected case is O(log n) and the complexity of the NonOverlappingOccurrences algorithm is O(n log(|Σ| + |L|)) per level in the tree, it is concluded that the overall complexity of the algorithm is O(n log n log(|Σ| + |L|)).

The NonOverlappingScore algorithm is O(n) in space. By its construction, a suffix tree is O(n), containing at most 2n nodes. After leaf node splitting is performed, the total number of nodes has grown to at most 3n. The other data structure required for this algorithm is the list of starting positions for each occurrence of the string represented by a node. While it is necessary to maintain the list for the current node, these lists can be discarded once the node has been processed. As a result, each starting position only ever exists in at most two lists: one of the lists pointed to by the ChildPositions array for the current node, and the Retval array for the current node. Consequently, the number of elements that may be stored at any time is bounded by 2n, making the space requirement for their storage O(n).
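The space argument above can be sketched as a post-order traversal in which each node merges its children's position lists and then releases them. This is a minimal illustration under stated assumptions, not the algorithm as implemented in this work: the `Node` class, the `collect_positions` function, and the `visit` callback are hypothetical, and the local names `child_positions` and `retval` merely echo the ChildPositions and Retval arrays mentioned in the text.

```python
import heapq


class Node:
    """Hypothetical tree node: leaves carry one starting position."""

    def __init__(self, children=None, position=None):
        self.children = children or []
        self.position = position


def collect_positions(node, visit):
    """Return the sorted starting positions of all leaves below node."""
    if not node.children:
        return [node.position]
    # One sorted list per child, built bottom-up.
    child_positions = [collect_positions(c, visit) for c in node.children]
    retval = list(heapq.merge(*child_positions))
    # The children's lists are no longer needed once merged.
    del child_positions
    visit(node, retval)
    return retval


# Small hypothetical tree with five leaves.
root = Node([
    Node([Node(position=4), Node(position=1)]),
    Node(position=2),
    Node([Node(position=0), Node(position=3)]),
])
sizes = []
print(collect_positions(root, lambda n, p: sizes.append(len(p))))
print(sizes)
```

The sketch uses recursion for brevity; for a tree whose height approaches n, an explicit stack would be needed to stay within Python's recursion limit.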

After this algorithm executes, the number of non-overlapping substrings has been determined and the resulting score has been computed. Again, the score may be stored in the node and used subsequently, or single variables may be used to store the best string and score encountered so far. Figure 9.7 shows the tree after scoring has been performed using the NonOverlappingOccurrences algorithm. It shows that the best substring is ab with a score of 10, a different result from the string abab, which was obtained when overlapping strings were permitted.
