23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


using an alternative shift heuristic to the partially matched text string, whenever it shifts the pattern more than the character-jump heuristic. This alternative shift heuristic is based on applying the main idea from the Knuth-Morris-Pratt pattern matching algorithm, which we discuss next.

12.2.3 The Knuth-Morris-Pratt Algorithm

In studying the worst-case performance of the brute-force and BM pattern matching algorithms on specific instances of the problem, such as that given in Example 12.3, we should notice a major inefficiency. Specifically, we may perform many comparisons while testing a potential placement of the pattern against the text, yet if we discover a pattern character that does not match in the text, then we throw away all the information gained by these comparisons and start over again from scratch with the next incremental placement of the pattern. The Knuth-Morris-Pratt (or "KMP") algorithm, discussed in this section, avoids this waste of information and, in so doing, achieves a running time of O(n + m), which is optimal in the worst case. That is, in the worst case any pattern matching algorithm will have to examine all the characters of the text and all the characters of the pattern at least once.
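
The wasted work described above can be made concrete by counting comparisons. The following is a minimal sketch of brute-force matching only (not the BM algorithm); the class name BruteForceCount, the comparisons counter, and the sample text and pattern are illustrative assumptions, not taken from Example 12.3.

```java
class BruteForceCount {
    // Number of character comparisons made by the most recent call
    // (a hypothetical counter added for illustration).
    static int comparisons;

    // Brute-force matching: tries every placement i of p against t.
    // Returns the index of the first match, or -1 if there is none.
    static int bruteForceMatch(String t, String p) {
        int n = t.length(), m = p.length();
        comparisons = 0;
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            while (j < m) {
                comparisons++;
                if (t.charAt(i + j) != p.charAt(j)) break;
                j++;
            }
            if (j == m) return i;
            // On a mismatch, everything learned about t[i..i+j-1] is
            // discarded and the next placement restarts from j = 0.
        }
        return -1;
    }

    public static void main(String[] args) {
        int index = bruteForceMatch("aaaaaaaaab", "aaab");
        System.out.println("match at " + index + " after " + comparisons + " comparisons");
    }
}
```

On this sample input every one of the 7 placements costs 4 comparisons, so 28 comparisons are spent on a text of only 10 characters, even though the first three characters of each new placement were already examined during the previous one.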

The Failure Function

The main idea of the KMP algorithm is to preprocess the pattern string P so as to compute a failure function f that indicates the proper shift of P so that, to the largest extent possible, we can reuse previously performed comparisons. Specifically, the failure function f(j) is defined as the length of the longest prefix of P that is a suffix of P[1..j] (note that we did not put P[0..j] here). We also use the convention that f(0) = 0. Later, we will discuss how to compute the failure function efficiently. The importance of this failure function is that it "encodes" repeated substrings inside the pattern itself.
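
To make the definition concrete, here is a minimal sketch that evaluates f(j) directly from the definition above. It is deliberately naive and is not the efficient computation the text discusses later; the class and method names (FailureFunctionSketch, failureByDefinition) are hypothetical, and 0-based Java string indices are assumed to correspond to the indices of P.

```java
import java.util.Arrays;

class FailureFunctionSketch {
    // Computes f(j) straight from the definition: the length of the longest
    // prefix of p that is also a suffix of p[1..j]. This is a naive check
    // of every candidate length, not an efficient algorithm.
    static int[] failureByDefinition(String p) {
        int m = p.length();
        int[] f = new int[m];          // f[0] = 0 by convention
        for (int j = 1; j < m; j++) {
            // A suffix of p[1..j] has at most j characters; try longest first.
            for (int k = j; k > 0; k--) {
                // p.substring(j - k + 1, j + 1) is the length-k suffix of p[1..j].
                if (p.startsWith(p.substring(j - k + 1, j + 1))) {
                    f[j] = k;
                    break;
                }
            }
        }
        return f;
    }

    public static void main(String[] args) {
        // Prints [0, 0, 1, 0, 1, 2]
        System.out.println(Arrays.toString(failureByDefinition("abacab")));
    }
}
```

For the sample pattern "abacab", f(5) = 2 because the prefix "ab" is also a suffix of "bacab". Restricting the suffix to P[1..j] rather than P[0..j] is what keeps the trivial answer, the whole of P[0..j] itself, from being counted.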

Example 12.4: Consider the pattern string P = "abacab" from Example 12.3. The Knuth-Morris-Pratt (KMP) failure function f(j) for the string P is as shown in the following table:

j      0  1  2  3  4  5
P[j]   a  b  a  c  a  b
f(j)   0  0  1  0  1  2

The KMP pattern matching algorithm, shown in Code Fragment 12.4, incrementally processes the text string T, comparing it to the pattern string P. Each time there is a match, we increment the current indices. On the other hand, if there is a mismatch and we have previously made progress in P, then we consult the failure function to determine the new index in P where we need to continue checking P against T. Otherwise (there was a mismatch and we are at the
