20.07.2013

Notes on computational linguistics.pdf - UCLA Department of ...


Stabler - Lx 185/209 2003

17.2 A simple phonology, orthography

Phonological analysis of an acoustic input, and orthographic analysis of a written input, will commonly yield more than one possible analysis of the input to be parsed. In fact, the relation between the input and the morpheme sequence to be parsed will typically be many-many: the indefinite articles a and an will both be mapped to the same syntactic article, and an input element like read will be mapped to the bare verb, the bare noun, the verb + present, and the verb + past.
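The many-many relation can be pictured, under assumptions of our own, as a table from written words to sets of morpheme sequences. The sketch below uses a small hypothetical lexicon (the atom names D, V, N, -pres, -past and the extra word book are illustrative, not the notes' own) and enumerates every morpheme sequence a string of words could correspond to. It deliberately uses an explicit table rather than finite state machinery.

```python
from itertools import product

# Hypothetical word-to-analyses table (illustrative atoms, not from the notes).
# Each written word maps to a set of morpheme sequences (tuples of atoms).
LEXICON = {
    "a": {("D",)},
    "an": {("D",)},  # many-to-one: a and an share one syntactic article
    "read": {("V",), ("N",), ("V", "-pres"), ("V", "-past")},  # one-to-many
    "book": {("N",), ("V",)},
}


def analyses(words):
    """Return the set of all morpheme sequences for a list of written words."""
    per_word = [LEXICON[w] for w in words]
    result = set()
    # Choose one analysis per word and concatenate the chosen sequences.
    for choice in product(*per_word):
        result.add(tuple(atom for seq in choice for atom in seq))
    return result


print(analyses(["a"]) == analyses(["an"]))  # True
print(len(analyses(["read", "a", "book"])))  # 8
```

Even this three-word input already has 4 × 1 × 2 = 8 candidate morpheme sequences, which is why a compact representation of the whole set is attractive.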

Sometimes it is assumed that the set of possible analyses can be represented with a regular grammar or finite state machine. Let’s explore this idea first, before considering reasons for thinking that it cannot be right.

(10) For any set $S$, let $S^{\epsilon} = S \cup \{\epsilon\}$. Then, as usual, a finite state machine (FSM) is a tuple $A = \langle Q, \Sigma, \delta, I, F \rangle$ where

$Q$ is a finite set of states ($\neq \emptyset$);

$\Sigma$ is a finite set of input symbols ($\neq \emptyset$);

$\delta \subseteq Q \times \Sigma^{\epsilon} \times Q$;

$I \subseteq Q$, the initial states;

$F \subseteq Q$, the final states.
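Definition (10) translates fairly directly into a data structure. The sketch below is one possible Python rendering, assuming states and symbols are hashable values and using `None` as our own stand-in for $\epsilon$; the class name `FSM` and the methods `eps_closure` and `accepts` are hypothetical helpers. Acceptance is computed by the standard technique of tracking the set of reachable states, closed under $\epsilon$ moves.

```python
from dataclasses import dataclass

EPS = None  # our stand-in for the empty string epsilon


@dataclass(frozen=True)
class FSM:
    """A = <Q, Sigma, delta, I, F> with delta a subset of Q x Sigma^eps x Q."""
    states: frozenset
    sigma: frozenset
    delta: frozenset  # triples (q, a, q2), a in sigma or EPS
    initial: frozenset
    final: frozenset

    def eps_closure(self, qs):
        """All states reachable from qs using only epsilon transitions."""
        closure, stack = set(qs), list(qs)
        while stack:
            q = stack.pop()
            for (p, a, r) in self.delta:
                if p == q and a is EPS and r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure

    def accepts(self, word):
        """Follow all paths at once, keeping the set of current states."""
        current = self.eps_closure(self.initial)
        for sym in word:
            moved = {r for (p, a, r) in self.delta if p in current and a == sym}
            current = self.eps_closure(moved)
        return bool(current & self.final)


# Accepts (a | eps) b*: an optional a followed by any number of b's.
A = FSM(
    states=frozenset({0, 1}),
    sigma=frozenset({"a", "b"}),
    delta=frozenset({(0, "a", 1), (0, EPS, 1), (1, "b", 1)}),
    initial=frozenset({0}),
    final=frozenset({1}),
)
print([A.accepts(w) for w in ["", "a", "abb", "bb", "ba"]])
# [True, True, True, True, False]
```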

(11) Intuitively, a finite transducer is an acceptor where the transitions between states are labeled by pairs. Formally, we let the pairs come from different alphabets: $T = \langle Q, \Sigma_1, \Sigma_2, \delta, I, F \rangle$ where

$Q$ is a finite set of states ($\neq \emptyset$);

$\Sigma_1$ is a finite set of input symbols ($\neq \emptyset$);

$\Sigma_2$ is a finite set of output symbols ($\neq \emptyset$);

$\delta \subseteq Q \times \Sigma_1^{\epsilon} \times \Sigma_2^{\epsilon} \times Q$;

$I \subseteq Q$, the initial states;

$F \subseteq Q$, the final states.

(12) And as usual, we assume that for any state $q$ and any transition relation $\delta$, $\langle q, \epsilon, \epsilon, q \rangle \in \delta$.
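A transducer as in (11), together with the convention in (12), can be sketched the same way. Here `FST`, `with_eps_loops`, `well_formed`, and the toy orthography `M` are hypothetical illustrations: the empty string `""` stands in for $\epsilon$, and the made-up example maps a and an to an atom D, and read to V, to N, to V followed by -pres, or to V followed by -past.

```python
from typing import NamedTuple

EPS = ""  # our stand-in for epsilon


class FST(NamedTuple):
    """T = <Q, Sigma1, Sigma2, delta, I, F>."""
    states: frozenset
    sigma1: frozenset  # input alphabet
    sigma2: frozenset  # output alphabet
    delta: frozenset   # 4-tuples (q, a, b, q2)
    initial: frozenset
    final: frozenset


def with_eps_loops(t):
    """Apply convention (12): add <q, eps, eps, q> for every state q."""
    loops = {(q, EPS, EPS, q) for q in t.states}
    return t._replace(delta=frozenset(t.delta) | loops)


def well_formed(t):
    """Check that delta is a subset of Q x Sigma1^eps x Sigma2^eps x Q."""
    return all(
        q in t.states and r in t.states
        and a in (t.sigma1 | {EPS}) and b in (t.sigma2 | {EPS})
        for (q, a, b, r) in t.delta
    )


# Hypothetical toy orthography: state 1 waits to emit a tense suffix
# after read has been mapped to V.
M = with_eps_loops(FST(
    states=frozenset({0, 1}),
    sigma1=frozenset({"a", "an", "read"}),
    sigma2=frozenset({"D", "V", "N", "-pres", "-past"}),
    delta=frozenset({
        (0, "a", "D", 0), (0, "an", "D", 0),
        (0, "read", "V", 0), (0, "read", "N", 0), (0, "read", "V", 1),
        (1, EPS, "-pres", 0), (1, EPS, "-past", 0),
    }),
    initial=frozenset({0}),
    final=frozenset({0}),
))
print(len(M.delta), well_formed(M))  # 9 True
```

The $\epsilon{:}\epsilon$ self-loops let one machine stand still while a partner machine makes an $\epsilon$ move, which matters once transducers are composed.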

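(13) For any transducers $T = \langle Q, \Sigma_1, \Sigma_2, \delta_1, I, F \rangle$ and $T' = \langle Q', \Sigma'_1, \Sigma'_2, \delta_2, I', F' \rangle$, define the composition

$$T \circ T' = \langle Q \times Q', \Sigma_1, \Sigma'_2, \delta, I \times I', F \times F' \rangle$$

where

$$\delta = \{ \langle \langle q_i, q'_i \rangle, a, b, \langle q_j, q'_j \rangle \rangle \mid \text{for some } c \in (\Sigma_2^{\epsilon} \cap {\Sigma'_1}^{\epsilon}),\ \langle q_i, a, c, q_j \rangle \in \delta_1 \text{ and } \langle q'_i, c, b, q'_j \rangle \in \delta_2 \}$$

(Kaplan and Kay, 1994, for example).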
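Definition (13) can be transcribed almost literally. In the sketch below (hypothetical names; `""` stands in for $\epsilon$, and transducers are assumed to already contain the $\epsilon{:}\epsilon$ self-loops of (12)), each composed transition pairs a step of $T$ that writes some $c$ with a step of $T'$ that reads the same $c$. The example composes a machine rewriting x as y with one rewriting y as z.

```python
from typing import NamedTuple

EPS = ""  # our stand-in for epsilon


class FST(NamedTuple):
    states: frozenset
    sigma1: frozenset  # input alphabet
    sigma2: frozenset  # output alphabet
    delta: frozenset   # 4-tuples (q, a, b, q2)
    initial: frozenset
    final: frozenset


def pairs(xs, ys):
    """Cartesian product of two state sets, as a frozenset of pairs."""
    return frozenset((x, y) for x in xs for y in ys)


def compose(t1, t2):
    """T o T' as in (13): pair each step of T writing c with a step of T' reading c."""
    shared = (t1.sigma2 | {EPS}) & (t2.sigma1 | {EPS})
    delta = frozenset(
        ((qi, ri), a, b, (qj, rj))
        for (qi, a, c, qj) in t1.delta
        for (ri, c2, b, rj) in t2.delta
        if c == c2 and c in shared
    )
    return FST(pairs(t1.states, t2.states), t1.sigma1, t2.sigma2, delta,
               pairs(t1.initial, t2.initial), pairs(t1.final, t2.final))


def single_pair(inp, out):
    """Hypothetical helper: a two-state transducer for inp:out plus (12)'s loops."""
    loops = {(0, EPS, EPS, 0), (1, EPS, EPS, 1)}
    return FST(frozenset({0, 1}), frozenset({inp}), frozenset({out}),
               frozenset({(0, inp, out, 1)} | loops),
               frozenset({0}), frozenset({1}))


T = compose(single_pair("x", "y"), single_pair("y", "z"))
print(((0, 0), "x", "z", (1, 1)) in T.delta, len(T.delta))  # True 5
```

The five transitions are the single x:z step plus one $\epsilon{:}\epsilon$ loop on each of the four product states, each loop arising from pairing the self-loops of the two components.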

(14) And finally, for any transducer $T = \langle Q, \Sigma_1, \Sigma_2, \delta, I, F \rangle$, let its second projection $\pi_2(T)$ be the FSM $A = \langle Q, \Sigma_2, \delta', I, F \rangle$, where $\delta' = \{ \langle q_i, b, q_j \rangle \mid \text{for some } a \in \Sigma_1^{\epsilon},\ \langle q_i, a, b, q_j \rangle \in \delta \}$.
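A projection simply forgets one side of each label. Because (16) uses $\pi_2$ to obtain sequences over the output vocabulary $\Sigma$, the sketch below keeps the output symbol $b$ of each transition $\langle q_i, a, b, q_j \rangle$ and drops the input $a$. Transducers and FSMs are plain named tuples here, `""` stands in for $\epsilon$, and `project2` is a hypothetical name.

```python
from typing import NamedTuple


class FST(NamedTuple):
    states: frozenset
    sigma1: frozenset
    sigma2: frozenset
    delta: frozenset  # 4-tuples (q, a, b, q2)
    initial: frozenset
    final: frozenset


class FSM(NamedTuple):
    states: frozenset
    sigma: frozenset
    delta: frozenset  # triples (q, b, q2)
    initial: frozenset
    final: frozenset


def project2(t):
    """Second projection: keep each transition's output label, drop its input."""
    delta = frozenset((q, b, r) for (q, a, b, r) in t.delta)
    return FSM(t.states, t.sigma2, delta, t.initial, t.final)


# "" plays the role of epsilon in the (12)-style self-loops.
T = FST(frozenset({0, 1}), frozenset({"an"}), frozenset({"D"}),
        frozenset({(0, "an", "D", 1), (0, "", "", 0), (1, "", "", 1)}),
        frozenset({0}), frozenset({1}))
print(sorted(project2(T).delta))  # [(0, '', 0), (0, 'D', 1), (1, '', 1)]
```

Since $\delta'$ is a set, transitions that differ only in their input label collapse into one, exactly as the existential "for some $a$" in the definition implies.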

(15) Now for any input $s \in V^*$ where $s = w_1 w_2 \ldots w_n$ for some $n \geq 0$, let $\mathit{string}(s)$ be the transducer $\langle \{0, 1, \ldots, n\}, V, V, \delta_0, \{0\}, \{n\} \rangle$, where $\delta_0 = \{ \langle i-1, w_i, w_i, i \rangle \mid 0 < i \leq n \}$.
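The transducer $\mathit{string}(s)$ is just a chain of states that copies each word to itself. A minimal sketch follows, with `string_fst` as a hypothetical name; it also adds the $\epsilon$ self-loops assumed by convention (12), using `""` for $\epsilon$, and by default takes the vocabulary to be the set of words in $s$.

```python
from typing import NamedTuple

EPS = ""  # our stand-in for epsilon


class FST(NamedTuple):
    states: frozenset
    sigma1: frozenset
    sigma2: frozenset
    delta: frozenset  # 4-tuples (q, a, b, q2)
    initial: frozenset
    final: frozenset


def string_fst(words, vocab=None):
    """string(s) for s = w1 ... wn: states 0..n, transitions <i-1, wi, wi, i>."""
    n = len(words)
    vocab = frozenset(words if vocab is None else vocab)
    chain = {(i - 1, words[i - 1], words[i - 1], i) for i in range(1, n + 1)}
    loops = {(i, EPS, EPS, i) for i in range(n + 1)}  # convention (12)
    return FST(frozenset(range(n + 1)), vocab, vocab,
               frozenset(chain | loops), frozenset({0}), frozenset({n}))


S = string_fst(["an", "read"])
print(sorted(t for t in S.delta if t[1] != EPS))
# [(0, 'an', 'an', 1), (1, 'read', 'read', 2)]
```

For the empty input ($n = 0$) the machine has the single state 0, which is both initial and final, so it accepts only the empty string.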

(16) Let a (finite state) orthography be a transducer $M = \langle Q, V, \Sigma, \delta, I, F \rangle$ such that for any $s \in V^*$, $\pi_2(\mathit{string}(s) \circ M)$ represents the sequences of syntactic atoms to be parsed with a grammar whose vocabulary is $\Sigma$. For any orthography $M$, let the function $\mathit{input}_M$ map each $s \in V^*$ to $\mathit{input}_M(s) = \pi_2(\mathit{string}(s) \circ M)$, an FSM whose language is a set of strings in $\Sigma^*$.
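Putting (12) to (16) together, the self-contained sketch below computes $\mathit{input}_M(s)$ for a hypothetical toy orthography $M$ (a and an map to D; read maps to V, N, V -pres, or V -past) and then lists the strings in the language of the resulting FSM. All names are our own, `""` stands in for $\epsilon$, and the `language` enumeration is an addition for illustration, not part of the notes' definitions; its `max_len` bound guarantees termination even if the projected machine had non-$\epsilon$ cycles.

```python
from typing import NamedTuple

EPS = ""  # our stand-in for epsilon


class FST(NamedTuple):
    states: frozenset
    sigma1: frozenset
    sigma2: frozenset
    delta: frozenset  # 4-tuples (q, a, b, q2)
    initial: frozenset
    final: frozenset


def loops(states):
    """Convention (12): an eps:eps self-loop on every state."""
    return {(q, EPS, EPS, q) for q in states}


def pairs(xs, ys):
    return frozenset((x, y) for x in xs for y in ys)


def string_fst(words):
    """(15): a chain 0..n copying each word, plus (12)'s loops."""
    n, v = len(words), frozenset(words)
    chain = {(i - 1, words[i - 1], words[i - 1], i) for i in range(1, n + 1)}
    return FST(frozenset(range(n + 1)), v, v, frozenset(chain | loops(range(n + 1))),
               frozenset({0}), frozenset({n}))


def compose(t1, t2):
    """(13): match each output c of t1 with an input c of t2."""
    shared = (t1.sigma2 | {EPS}) & (t2.sigma1 | {EPS})
    delta = frozenset(((qi, ri), a, b, (qj, rj))
                      for (qi, a, c, qj) in t1.delta
                      for (ri, c2, b, rj) in t2.delta if c == c2 and c in shared)
    return FST(pairs(t1.states, t2.states), t1.sigma1, t2.sigma2, delta,
               pairs(t1.initial, t2.initial), pairs(t1.final, t2.final))


def input_m(words, m):
    """(16): pi_2(string(s) o M), returned as (states, delta', initial, final)."""
    t = compose(string_fst(words), m)
    delta = frozenset((q, b, r) for (q, a, b, r) in t.delta)  # (14): keep outputs
    return t.states, delta, t.initial, t.final


def language(fsm, max_len=10):
    """Strings (tuples of atoms) accepted by the FSM, up to max_len atoms."""
    _, delta, initial, final = fsm
    seen, stack, found = set(), [(q, ()) for q in initial], set()
    while stack:
        q, out = stack.pop()
        if (q, out) in seen or len(out) > max_len:
            continue
        seen.add((q, out))
        if q in final:
            found.add(out)
        for (p, b, r) in delta:
            if p == q:
                stack.append((r, out if b == EPS else out + (b,)))
    return found


# Hypothetical toy orthography (atom names are illustrative).
M = FST(frozenset({0, 1}), frozenset({"a", "an", "read"}),
        frozenset({"D", "V", "N", "-pres", "-past"}),
        frozenset({(0, "a", "D", 0), (0, "an", "D", 0), (0, "read", "V", 0),
                   (0, "read", "N", 0), (0, "read", "V", 1),
                   (1, EPS, "-pres", 0), (1, EPS, "-past", 0)} | loops({0, 1})),
        frozenset({0}), frozenset({0}))

print(sorted(language(input_m(["read"], M))))
# [('N',), ('V',), ('V', '-past'), ('V', '-pres')]
```

The output shows the many-many relation from the start of this section falling out of the composition: one written word yields four candidate atom sequences, while a and an yield the same one.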
