13.10.2014

OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...


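Algorithm CountUniqueOccurrences(ArrayOfIntegers list, String testString)
    // list: starting positions of testString, assumed sorted in ascending order
    if (list.size() == 0)
        return 0;
    count = 1;              // the first occurrence is always counted
    current = list[0];      // start of the most recently counted occurrence
    for (i = 1; i < list.size(); i++)
    {
        // count occurrence i only if it does not overlap the last counted one
        if (list[i] - current >= testString.length())
        {
            count++;
            current = list[i];
        }
    }
    return count;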

Figure 9.6: Algorithm CountUniqueOccurrences

9.6 Other Uses

The algorithm presented in this chapter is not restricted to evaluating profile data. Because its running time is linear in the total length of the input strings when overlapping occurrences are permitted, it can also be applied to other research areas that examine large strings. One such area is bioinformatics, which analyzes strings of millions of characters representing DNA strands.

This algorithm may also have applications in data compression. Determining the substring that contributes the most to a set of strings is equivalent to identifying the substring that will shrink the set of strings by the largest number of characters if all occurrences of the substring are removed. This makes the most contributory substring an ideal candidate for encoding with a short codeword in a data compression algorithm. The author intends to explore the possibility of applying this algorithm to Huffman or arithmetic coding as an area of unrelated future work.
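The equivalence between a substring's contribution and the number of characters saved by removing it can be checked with a small sketch. The Java code below is an illustration only, not the author's implementation. It assumes that a substring's contribution is its length multiplied by its number of non-overlapping occurrences, summed over all input strings. Occurrences are counted greedily in the spirit of Figure 9.6: the first occurrence in each string is counted, then each later occurrence that starts at least |sub| characters after the last counted one. The class and method names (ContributionSketch, occurrences, countNonOverlapping, contribution, charactersSaved) and the sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ContributionSketch {

    // Starting positions of every (possibly overlapping) occurrence of sub in s,
    // in ascending order. Hypothetical helper.
    static List<Integer> occurrences(String s, String sub) {
        if (sub.isEmpty()) {
            throw new IllegalArgumentException("substring must be non-empty");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = s.indexOf(sub); i >= 0; i = s.indexOf(sub, i + 1)) {
            positions.add(i);
        }
        return positions;
    }

    // Greedy count of non-overlapping occurrences over sorted start positions.
    static int countNonOverlapping(List<Integer> positions, String sub) {
        if (positions.isEmpty()) {
            return 0;
        }
        int count = 1;                   // first occurrence is always counted
        int current = positions.get(0);  // start of the last counted occurrence
        for (int i = 1; i < positions.size(); i++) {
            if (positions.get(i) - current >= sub.length()) {
                count++;
                current = positions.get(i);
            }
        }
        return count;
    }

    // Assumed contribution measure: characters covered by non-overlapping
    // occurrences of sub, summed over all strings.
    static long contribution(List<String> strings, String sub) {
        long total = 0;
        for (String s : strings) {
            total += (long) countNonOverlapping(occurrences(s, sub), sub) * sub.length();
        }
        return total;
    }

    // Characters removed when every occurrence of sub is deleted; String.replace
    // scans left to right and replaces non-overlapping matches.
    static long charactersSaved(List<String> strings, String sub) {
        long saved = 0;
        for (String s : strings) {
            saved += s.length() - s.replace(sub, "").length();
        }
        return saved;
    }

    public static void main(String[] args) {
        List<String> strings = List.of("abcabcab", "xabcx", "aaaa");
        for (String sub : List.of("abc", "ab", "aa")) {
            System.out.println(sub + ": contribution=" + contribution(strings, sub)
                    + ", saved=" + charactersSaved(strings, sub));
        }
    }
}
```

For the sample strings, the two measures agree for every candidate: 9 for "abc", 8 for "ab" and 4 for "aa". Under this assumed measure, "abc" would be the most contributory of the three and therefore the best candidate for a short codeword. The overlapping case "aaaa" shows why the greedy non-overlapping count, rather than the raw number of matches (three), is what corresponds to characters actually removed.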

9.7 Conclusion

This chapter has defined and solved the Most Contributory Substring Problem. Two solutions were presented. The first offers a running time of O(n log |L|), where n is the total length of the input strings and |L| is the number of input strings. However,
