13.11.2014 Views

Introduction to Computational Linguistics

Introduction to Computational Linguistics

Introduction to Computational Linguistics

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

17. Using Finite State Transducers 66<br />

appropriately chosen, it might take exponential time <strong>to</strong> reach an answer. So, on<br />

certain examples this algorithm is excessively slow. (Nevertheless, sometimes it<br />

might be much faster that the breadth first algorithm.)<br />

Now we look ata different problem: given an input string, what are the possible<br />

output strings? We have already seen that there can be infinitely many, so one<br />

might need a regular expression <strong>to</strong> list them all. Let us not do that here. Let<br />

us look instead in<strong>to</strong> ways of finding at least one output string. First, an artificial<br />

example. The au<strong>to</strong>ma<strong>to</strong>n has the following transitions.<br />

(172)<br />

0 a:c −→ 1<br />

1 a:c −→ 1<br />

2 b:ε<br />

−→ 3<br />

0 a:d −→ 2<br />

2 a:d −→ 2<br />

0 is initial, and 1 and 3 are accepting. This machine has as input strings all strings<br />

of the form a + b?. However, the relation it defines is this:<br />

(173) {〈a n , c n 〉 : n > 0} ∪ {〈a n b, d〉 : n > 0}<br />

The translation of each given string is unique; however, it depends on the last<br />

character of the input! So, we cannot simply think of the machine as taking each<br />

character from the input and immediately returning the corresponding output character.<br />

We have <strong>to</strong> buffer the entire output until we are sure that the run we are<br />

looking at will actually succeed.<br />

Let us look at a real problem. Suppose we design a transducer for Arabic.<br />

Let us say that the surface form katab should be translated in<strong>to</strong> the deep form<br />

ktb+3Pf, but the form baktib in<strong>to</strong> ktb+1Pr. Given: input string baktib. Configuration<br />

is now: 0, |baktib. No output yet. The first character is b. We face<br />

two choices: consider it as part of a root (say, of a form badan <strong>to</strong> be translated<br />

as bdn+3Pf), or <strong>to</strong> consider it is the first letter of the prefix ba. So far there is<br />

nothing that we can do <strong>to</strong> eliminate either option, so we keep them both open. It<br />

is only when we have reached the fourth letter, t, when the situation is clear: if<br />

this was a 3rd perfect form, we should see a. Thus, the algorithm we propose<br />

is this: try <strong>to</strong> find a run that matches the input string and then look at the output<br />

string it defines. The algorithm <strong>to</strong> find a run for the input string can be done as<br />

above, using depth first or breadth first search. Once again, this can be done in<br />

quadratic time. (Basically, the possible runs can be compactly represented and<br />

then converted in<strong>to</strong> an output string.)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!