25.11.2014 Views

Algorithms and Data Structures


N. Wirth. Algorithms and Data Structures. Oberon version

i will now store the comparison point; the variable j will, as before, point to the corresponding element of the pattern. See Fig. 1.10.

[Figure 1.10: the string, with positions running from 0 to N-1, contains A B C D, and i marks the comparison point; the pattern A B C E is aligned beneath it.]

Fig. 1.10. In the notations of the KMP algorithm, the alignment position of the pattern is now i-j (and not i, as was the case with the simple algorithm).

The central point of the algorithm is the comparison of s[i] and p[j]. If they are equal, then i and j are both increased by 1; otherwise the pattern must be shifted by assigning to j some smaller value D. The boundary case j = 0 shows that one should provide for a shift of the pattern entirely beyond the current comparison point (so that p[0] becomes aligned with s[i+1]). For this, it is convenient to choose D = -1. The main loop of the algorithm takes the following form:

    i := 0; j := 0;
    WHILE (i < N) & (j < M) & ((j < 0) OR (s[i] = p[j])) DO
      INC( i ); INC( j )
    ELSIF (i < N) & (j < M) DO (* (j >= 0) & (s[i] # p[j]) *)
      j := D
    END

This formulation is admittedly not quite complete, because it contains an unspecified shift value D. We shall return to it shortly, but first point out that the invariant here is chosen the same as in the simple algorithm; in the new notation it is Q(i-j) & P(i-j, j).
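As a rough illustration, the loop can be transliterated into Python with one concrete choice for the still-unspecified D. The helper `shift_value` is hypothetical and not part of the text: it assumes that D is the length of the longest proper prefix of p[0..j-1] that is also a suffix of it, and -1 when j = 0. The text has not yet committed to a definition of D at this point, so this is only one admissible choice, computed here by brute force.

```python
def shift_value(p, j):
    """Assumed choice of D: largest d < j with p[0..d-1] == p[j-d..j-1].

    Returns -1 when j == 0, the boundary case discussed in the text.
    """
    for d in range(j - 1, -1, -1):
        if p[:d] == p[j - d:j]:
            return d
    return -1


def search(s, p):
    """Return the position of the first match of p in s, or -1 if none."""
    n, m = len(s), len(p)
    i = j = 0
    # Python has no WHILE ... ELSIF loop, so the two guards become
    # an if/else inside a loop over their common part.
    while i < n and j < m:
        if j < 0 or s[i] == p[j]:
            i += 1
            j += 1
        else:
            # Here j >= 0 and s[i] != p[j]: shift the pattern.
            j = shift_value(p, j)
    return i - m if j == m else -1
```

With these assumptions, `search("ABCABCE", "ABCE")` finds the match at position 3, and `search("ABCD", "ABCE")` reports -1 after the pattern has been shifted past the end of the string.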

The postcondition of the loop, evaluated as a conjunction of the negations of all guards, is given by the expression (j >= M) OR (i >= N), but in reality only equalities can occur. If the algorithm terminates due to j = M, the term P(i-j, j) of the invariant implies P(i-M, M) = R(i), that is, a match at position i-M. Otherwise it terminates with i = N, and since j < M, the first term of the invariant, Q(i-j), implies that no match exists at all.
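The step from the guards to this expression can be spelled out. The shorthands A and B below are labels introduced here for the two parts of the first guard and do not appear in the text; the ELSIF guard is then A alone.

```latex
\begin{align*}
A &\equiv (i < N) \land (j < M), \qquad
B \equiv (j < 0) \lor (s[i] = p[j]) \\
\neg(A \land B) \land \neg A &\equiv \neg A \\
&\equiv (i \ge N) \lor (j \ge M)
\end{align*}
```

The second line holds because the negation of A already implies the negation of A and B. Since i and j grow by at most 1 per step and the loop stops as soon as either bound is reached, neither variable can overshoot, which is why only the equalities i = N or j = M arise.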

We must now demonstrate that the algorithm never falsifies the invariant. It is easy to show that it is established at the beginning with the values i = j = 0. Let us first investigate the effect of the two statements incrementing i and j by 1 in the first branch. They evidently do not falsify Q(i-j), since the difference i-j remains unchanged. Nor do they falsify P(i-j, j), thanks to the equality in the guard (see the definition of P). As to the second branch, we shall simply postulate that the value D always be such that replacing j by D will maintain the invariant.

Provided that D < j, the assignment j := D represents a shift of the pattern to the right by j-D positions. Naturally, we wish this shift to be as large as possible, i.e., D to be as small as possible. This is illustrated by Fig. 1.11.

[Figure 1.11: the string contains A B C D at the comparison point i, and the pattern is A B C E. Before the shift, j = 3; with D = 0 the assignment gives j = 0, so the pattern has moved right by 3 positions.]

Fig. 1.11. Assignment j := D shifts pattern by j-D positions
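The postulate on D can be exercised directly with a small Python sketch that asserts the invariant Q(i-j) & P(i-j, j) on every iteration. It picks D as the smallest value below j that keeps the invariant true. The predicate definitions are assumptions taken from the usual formulation of the simple algorithm, which is not reproduced in this section: P(i, j) says that p[0..j-1] equals s[i..i+j-1], R(i) says that a full match starts at i, and Q(i) says that no full match starts before i. All function names are hypothetical. Because Q inspects the whole string, this is a checking device, not an efficient search.

```python
def P(s, p, i, j):
    # P(i, j): p[0..j-1] equals s[i..i+j-1]; trivially true for j <= 0.
    return all(s[i + k] == p[k] for k in range(j))


def R(s, p, i):
    # R(i): a full match of the pattern starts at position i.
    return i + len(p) <= len(s) and P(s, p, i, len(p))


def Q(s, p, i):
    # Q(i): no full match starts at any position before i.
    return not any(R(s, p, k) for k in range(i))


def smallest_admissible_d(s, p, i, j):
    # Try D = -1, 0, ..., j-1 and return the first that keeps the invariant.
    for d in range(-1, j):
        if Q(s, p, i - d) and P(s, p, i - d, d):
            return d
    raise ValueError("no admissible D below j")


def checked_search(s, p):
    """Return (match position or -1, list of shift sizes j - D)."""
    n, m = len(s), len(p)
    i = j = 0
    shifts = []
    while True:
        assert Q(s, p, i - j) and P(s, p, i - j, j), "invariant violated"
        if not (i < n and j < m):
            break
        if j < 0 or s[i] == p[j]:
            i += 1
            j += 1
        else:
            d = smallest_admissible_d(s, p, i, j)
            shifts.append(j - d)  # the pattern moves right by j - D
            j = d
    return (i - m if j == m else -1), shifts
```

For the string "ABCABCE" and the pattern "ABCE", the mismatch at j = 3 yields D = 0 and a shift of 3, as in the figure. For the string "ABCD", no later match exists, so the sketch selects D = -1 and shifts by 4.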
