13.11.2014 Views

Introduction to Computational Linguistics


17. Using Finite State Transducers

In the fourth step, we have no choice in the first line: we have consumed the input, so now we need to choose t(2). In the other cases we still have both options.

(171) Step 4

t(0), t(1), t(1), t(2) : 1, aaa| : bb|bb
t(0), t(1), t(2), t(1) : 1, aaa| : bb|bb
t(0), t(1), t(2), t(2) : 1, aaa| : bbb|b
t(0), t(2), t(1), t(1) : 1, aaa| : bbb|b
t(0), t(2), t(1), t(2) : 1, aa|a : bbb|b
t(0), t(2), t(2), t(1) : 1, aa|a : bbb|b
t(0), t(2), t(2), t(2) : 1, a|aa : bbbb|

And so on.

It should be noted that remembering each and every run is actually excessive. If two runs terminate in the same configuration (state plus pair of read strings), they can be extended in the same way. Basically, the number of runs initially shows exponential growth, while the number of configurations is quadratic in the length of the smaller of the strings. So removing this excess can be vital for computational reasons. In Step 3 we had 4 runs and 3 different end configurations; in Step 4 we have 7 runs, but only 4 different end configurations.
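To see the gap between runs and configurations concretely, here is a minimal Python sketch. The transducer `TOY`, its encoding as (source, input, output, target) tuples, and the function name `step_counts` are assumptions made for illustration. `TOY` is not claimed to be the machine of the running example, so its counts need not match the table above.

```python
# Hypothetical toy transducer (an assumption, not taken from the text).
# Each transition is (source state, input piece, output piece, target state);
# the empty string stands for "consume nothing on this side".
TOY = [
    (0, "a", "b", 1),
    (1, "a", "", 1),
    (1, "", "b", 1),
]


def step_counts(transitions, start, x, y, steps):
    """For each step, return (number of runs, number of distinct end configurations).

    A run is represented only by the configuration it ends in, but every run
    keeps its own list entry, so duplicates show how many runs coincide.
    """
    ends = [(start, 0, 0)]
    counts = []
    for _ in range(steps):
        extended = []
        for (q, j, k) in ends:
            for (src, inp, out, tgt) in transitions:
                # A transition applies if it leaves q and both pieces match
                # the strings at the current positions.
                if src == q and x.startswith(inp, j) and y.startswith(out, k):
                    extended.append((tgt, j + len(inp), k + len(out)))
        ends = extended
        counts.append((len(ends), len(set(ends))))
    return counts


if __name__ == "__main__":
    for step, (runs, configs) in enumerate(step_counts(TOY, 0, "a" * 10, "b" * 10, 8), 1):
        print(step, runs, configs)
```

On this toy machine with ten a's against ten b's, step 8 has 128 runs but only 8 distinct end configurations: the runs double at every step, while the configurations grow by one.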

Thus we can improve the algorithm once more as follows. We do not keep a record of the run, only of its resulting configuration, which consists of the state and the positions in the input and the output string. In each step we just calculate the possible next configurations for the configurations that have recently been added. Each step will advance the positions by at least one, so we are sure to make progress.
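The improved procedure can be sketched in Python as follows. This is only one possible reading of the description, under stated assumptions: transitions are encoded as (source, input, output, target) tuples, and the names `accepts` and `TOY` are hypothetical. The set `seen` holds every configuration found so far, and only newly added ones are placed on the agenda.

```python
from collections import deque

# Hypothetical toy transducer (an assumption, not taken from the text).
TOY = [
    (0, "a", "b", 1),
    (1, "a", "", 1),
    (1, "", "b", 1),
]


def accepts(transitions, start, finals, x, y):
    """Decide whether the pair (x, y) is accepted, storing configurations only.

    A configuration is (state, position in x, position in y).
    """
    seen = {(start, 0, 0)}
    agenda = deque(seen)  # configurations added but not yet expanded
    while agenda:
        q, j, k = agenda.popleft()
        for (src, inp, out, tgt) in transitions:
            if src != q:
                continue
            # The transition applies if its pieces match at positions j and k.
            if x.startswith(inp, j) and y.startswith(out, k):
                nxt = (tgt, j + len(inp), k + len(out))
                if nxt not in seen:  # each configuration is expanded once
                    seen.add(nxt)
                    agenda.append(nxt)
    # Accept if some final state is reached with both strings fully read.
    return any((q, len(x), len(y)) in seen for q in finals)


if __name__ == "__main__":
    print(accepts(TOY, 0, {1}, "aaa", "bbbb"))
    print(accepts(TOY, 0, {1}, "a", ""))
```

Because a configuration is never put on the agenda twice, the loop body runs at most once per configuration, however many runs lead to it.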

How long does this algorithm take to work? Let's count. The configurations are triples 〈x, j, k〉, where x is a state of the machine, j is the position of the character to the right of the bar on the input string, and k the position of the character to the right of the bar on the output string. On 〈x⃗, y⃗〉 there are |Q| · |x⃗| · |y⃗| many such triples. All the algorithm does is compute which ones are accessible from 〈0, 0, 0〉. At the end, it checks whether there is one of the form 〈q, |x⃗|, |y⃗|〉, where q ∈ F (q is accepting). Thus the algorithm takes |Q| · |x⃗| · |y⃗| time, that is, time proportional to both lengths. (Recall here the discussion of Section ??. What we are computing is the reachability relation in the set of configurations. This can be done in linear time.)
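As a cross-check on this count, note that a position can be anything from 0 up to the full length of its string, since the final configuration has both strings completely read. A slightly more careful bound, which for nonempty strings gives the same order of growth, is therefore:

```latex
\[
  \#\{\langle q, j, k\rangle : q \in Q,\ 0 \le j \le |\vec{x}|,\ 0 \le k \le |\vec{y}|\}
  \;=\; |Q| \cdot (|\vec{x}| + 1) \cdot (|\vec{y}| + 1)
  \;\le\; 4 \cdot |Q| \cdot |\vec{x}| \cdot |\vec{y}|
  \qquad (|\vec{x}|, |\vec{y}| \ge 1).
\]
```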

One might be inclined to think that the first algorithm is faster because it goes into the run very quickly. However, it is possible to show that if the input is
