Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003
17.2 A simple phonology, orthography
Phonological analysis of an acoustic input, and orthographic analysis of a written input, will commonly yield
more than one possible analysis of the input to be parsed. In fact, the relation between the input and the morpheme
sequence to be parsed will typically be many-many: the indefinite articles a, an will get mapped to the same
syntactic article, and an input element like read will get mapped to the bare verb, the bare noun, the verb +
present, and the verb + past.
Sometimes it is assumed that the set of possible analyses can be represented with a regular grammar or
finite state machine. Let's explore this idea first, before considering reasons for thinking that it cannot be right.
(10) For any set $S$, let $S^\epsilon = (S \cup \{\epsilon\})$. Then as usual, a finite state machine (FSM) is $A = \langle Q, \Sigma, \delta, I, F \rangle$ where
$Q$ is a finite set of states ($\neq \emptyset$);
$\Sigma$ is a finite set of input symbols ($\neq \emptyset$);
$\delta \subseteq Q \times \Sigma^\epsilon \times Q$;
$I \subseteq Q$, the initial states;
$F \subseteq Q$, the final states.
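The definition in (10) can be made concrete with a small sketch. The encoding below is an assumption made for illustration, not part of the notes: the empty string stands in for $\epsilon$, transitions are stored as triples, and the machine `toy` (accepting $a^*b$) is hypothetical.

```python
from typing import NamedTuple

EPS = ""  # assumption: the empty string stands in for epsilon


class FSM(NamedTuple):
    states: frozenset
    alphabet: frozenset
    delta: frozenset    # triples (q, a, r) with a in alphabet or a == EPS
    initial: frozenset
    final: frozenset


def eps_closure(m, qs):
    """All states reachable from qs using only epsilon transitions."""
    seen, todo = set(qs), list(qs)
    while todo:
        q = todo.pop()
        for (p, a, r) in m.delta:
            if p == q and a == EPS and r not in seen:
                seen.add(r)
                todo.append(r)
    return seen


def accepts(m, word):
    """True iff some path from an initial to a final state spells word."""
    current = eps_closure(m, m.initial)
    for sym in word:
        step = {r for (p, a, r) in m.delta if p in current and a == sym}
        current = eps_closure(m, step)
    return bool(current & m.final)


# Hypothetical toy machine over {"a", "b"} accepting a*b
toy = FSM(
    states=frozenset({0, 1}),
    alphabet=frozenset({"a", "b"}),
    delta=frozenset({(0, "a", 0), (0, "b", 1)}),
    initial=frozenset({0}),
    final=frozenset({1}),
)
```

Here `accepts(toy, "aab")` holds while `accepts(toy, "aba")` does not, since no transition leaves state 1.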
(11) Intuitively, a finite transducer is an acceptor where the transitions between states are labeled by pairs.
Formally, we let the pairs come from different alphabets: $T = \langle Q, \Sigma_1, \Sigma_2, \delta, I, F \rangle$ where
$Q$ is a finite set of states ($\neq \emptyset$);
$\Sigma_1$ is a finite set of input symbols ($\neq \emptyset$);
$\Sigma_2$ is a finite set of output symbols ($\neq \emptyset$);
$\delta \subseteq Q \times \Sigma_1^\epsilon \times \Sigma_2^\epsilon \times Q$;
$I \subseteq Q$, the initial states;
$F \subseteq Q$, the final states.
(12) And as usual, we assume that for any state $q$ and any transition relation $\delta$, $\langle q, \epsilon, \epsilon, q \rangle \in \delta$.
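As a rough illustration of (11) and (12), a transducer can be stored as a set of quadruples, with the identity $\epsilon$ transitions added explicitly. Everything named below (`with_identity_loops`, `relates`, the one-state machine and its labels such as `"ART"`) is a hypothetical encoding chosen for this sketch. Checking whether an input sequence is related to an output sequence is a search over triples of state, input position, and output position, which is a finite space.

```python
from collections import deque

EPS = ""  # assumption: the empty string stands in for epsilon


def with_identity_loops(states, delta):
    """Add <q, eps, eps, q> for every state q, as convention (12) assumes."""
    return set(delta) | {(q, EPS, EPS, q) for q in states}


def relates(delta, initial, final, xs, ys):
    """True iff some accepting path reads the sequence xs and writes ys."""
    start = {(q, 0, 0) for q in initial}
    seen, todo = set(start), deque(start)
    while todo:
        q, i, j = todo.popleft()
        if q in final and i == len(xs) and j == len(ys):
            return True
        for (p, a, b, r) in delta:
            if p != q:
                continue
            if a != EPS and (i >= len(xs) or xs[i] != a):
                continue  # input label does not match the next input symbol
            if b != EPS and (j >= len(ys) or ys[j] != b):
                continue  # output label does not match the next output symbol
            nxt = (r, i + (a != EPS), j + (b != EPS))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False


# Hypothetical one-state transducer: "a" and "an" both map to "ART"
STATES = {0}
DELTA = with_identity_loops(STATES, {
    (0, "a", "ART", 0),
    (0, "an", "ART", 0),
    (0, "read", "read", 0),
})
```

With this machine, both `["a", "read"]` and `["an", "read"]` are related to `["ART", "read"]`, which is the many-to-one half of the many-many relation described in the introduction.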
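(13) For any transducers $T = \langle Q, \Sigma_1, \Sigma_2, \delta_1, I, F \rangle$ and $T' = \langle Q', \Sigma'_1, \Sigma'_2, \delta_2, I', F' \rangle$, define the composition
$T \circ T' = \langle Q \times Q', \Sigma_1, \Sigma'_2, \delta, I \times I', F \times F' \rangle$ where
$\delta = \{ \langle \langle q_i, q'_i \rangle, a, b, \langle q_j, q'_j \rangle \rangle \mid \text{for some } c \in (\Sigma_2^\epsilon \cap {\Sigma'_1}^\epsilon),\ \langle q_i, a, c, q_j \rangle \in \delta_1 \text{ and } \langle q'_i, c, b, q'_j \rangle \in \delta_2 \}$
(Kaplan and Kay, 1994, for example).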
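The composition in (13) can be sketched directly as a set comprehension over pairs of transitions that agree on the middle symbol $c$. The tuple layout `(states, delta, initial, final)`, the use of `""` for $\epsilon$, and the two one-transition machines are assumptions made for this example; alphabets are left implicit, so the intersection condition on $c$ reduces to an equality test. The identity transitions of (12) are added first, since they are what let one machine take an $\epsilon$ step while the other stays put.

```python
from itertools import product

EPS = ""  # assumption: the empty string stands in for epsilon


def add_loops(states, delta):
    """Add the identity transitions <q, eps, eps, q> assumed in (12)."""
    return set(delta) | {(q, EPS, EPS, q) for q in states}


def compose(t1, t2):
    """Compose two transducers given as (states, delta, initial, final)."""
    q1, d1, i1, f1 = t1
    q2, d2, i2, f2 = t2
    d1, d2 = add_loops(q1, d1), add_loops(q2, d2)
    delta = {((p1, p2), a, b, (r1, r2))
             for (p1, a, c, r1) in d1
             for (p2, c2, b, r2) in d2
             if c == c2}  # the shared middle symbol c
    return (set(product(q1, q2)), delta,
            set(product(i1, i2)), set(product(f1, f2)))


# Hypothetical machines: T1 rewrites "an" as "a", T2 rewrites "a" as "ART"
T1 = ({0, 1}, {(0, "an", "a", 1)}, {0}, {1})
T2 = ({0, 1}, {(0, "a", "ART", 1)}, {0}, {1})
STATES, DELTA, INITIAL, FINAL = compose(T1, T2)
```

The composed relation contains the transition `((0, 0), "an", "ART", (1, 1))` plus the four identity transitions on the paired states.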
(14) And finally, for any transducer $T = \langle Q, \Sigma_1, \Sigma_2, \delta, I, F \rangle$ let its second projection $2(T)$ be the FSM $A = \langle Q, \Sigma_2, \delta', I, F \rangle$,
where $\delta' = \{ \langle q_i, b, q_j \rangle \mid \text{for some } a \in \Sigma_1^\epsilon,\ \langle q_i, a, b, q_j \rangle \in \delta \}$.
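Projection amounts to dropping one of the two labels on every transition. The helper below is a hypothetical sketch, parameterized by which tape is kept, so that `project(delta, 2)` yields the output side that (16) relies on; the states, initial states, and final states carry over unchanged.

```python
def project(delta, tape):
    """Keep the tape-1 (input) or tape-2 (output) label of each transition."""
    if tape not in (1, 2):
        raise ValueError("tape must be 1 or 2")
    return {(p, (a, b)[tape - 1], r) for (p, a, b, r) in delta}


# Hypothetical transition set for illustration
DELTA = {(0, "an", "ART", 1), (1, "read", "V", 2)}
OUTPUT_SIDE = project(DELTA, 2)
INPUT_SIDE = project(DELTA, 1)
```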
(15) Now for any input $s \in V^*$ where $s = w_1 w_2 \ldots w_n$ for some $n \geq 0$, let $\mathit{string}(s)$ be the transducer
$\langle \{0, 1, \ldots, n\}, V, V, \delta_0, \{0\}, \{n\} \rangle$, where $\delta_0 = \{ \langle i-1, w_i, w_i, i \rangle \mid 0 < i \leq n \}$.
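A minimal sketch of the $\mathit{string}(s)$ construction, assuming the input is given as a Python list of symbols and a transducer is a tuple `(states, delta, initial, final)`. Note that the notes index $w_1 \ldots w_n$ from 1, while the list is indexed from 0, so the transition into state $i$ carries `words[i - 1]`.

```python
def string_transducer(words):
    """Linear transducer that maps the sequence words to itself, and nothing else."""
    n = len(words)
    states = set(range(n + 1))
    delta = {(i - 1, words[i - 1], words[i - 1], i) for i in range(1, n + 1)}
    return states, delta, {0}, {n}


EXAMPLE = string_transducer(["the", "cat"])
EMPTY = string_transducer([])
```

For the empty input ($n = 0$) the result has the single state 0, which is both initial and final, and no transitions.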
(16) Let a (finite state) orthography be a transducer $M = \langle Q, V, \Sigma, \delta, I, F \rangle$ such that for any $s \in V^*$,
$2(\mathit{string}(s) \circ M)$ represents the sequences of syntactic atoms to be parsed with a grammar whose
vocabulary is $\Sigma$. For any such orthography $M$, let the function $\mathit{input}_M$ from $V^*$ to $\Sigma^*$ be such that for any
$s \in V^*$, $\mathit{input}_M(s) = 2(\mathit{string}(s) \circ M)$.
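Putting the pieces together, the following sketch computes the output side of $\mathit{string}(s) \circ M$ and enumerates the sequences it accepts. Several things here are assumptions made for illustration only: the tuple encoding `(states, delta, initial, final)`, `""` for $\epsilon$, the toy orthography `M` with its labels (`"ART"`, `"V:read"`, `"N:read"`, `"PRES"`, `"PAST"`), and the `max_len` bound, which is there only to keep the enumeration finite if the machine has output-producing cycles.

```python
from itertools import product

EPS = ""  # assumption: the empty string stands in for epsilon


def string_transducer(words):
    delta = {(i, w, w, i + 1) for i, w in enumerate(words)}
    return set(range(len(words) + 1)), delta, {0}, {len(words)}


def compose(t1, t2):
    (q1, d1, i1, f1), (q2, d2, i2, f2) = t1, t2
    d1 = set(d1) | {(q, EPS, EPS, q) for q in q1}  # identity loops, as in (12)
    d2 = set(d2) | {(q, EPS, EPS, q) for q in q2}
    delta = {((p1, p2), a, b, (r1, r2))
             for (p1, a, c, r1) in d1
             for (p2, c2, b, r2) in d2 if c == c2}
    return (set(product(q1, q2)), delta,
            set(product(i1, i2)), set(product(f1, f2)))


def output_projection(t):
    states, delta, initial, final = t
    return states, {(p, b, r) for (p, a, b, r) in delta}, initial, final


def sequences(fsm, max_len):
    """Label sequences of accepting paths with at most max_len non-epsilon labels."""
    states, delta, initial, final = fsm
    found = set()
    seen = {(q, ()) for q in initial}
    todo = list(seen)
    while todo:
        q, out = todo.pop()
        if q in final:
            found.add(out)
        for (p, b, r) in delta:
            if p != q:
                continue
            nxt = (r, out if b == EPS else out + (b,))
            if len(nxt[1]) <= max_len and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return found


def input_m(m, words, max_len=10):
    """Sketch of input_M(s): the analyses of words under the orthography m."""
    composed = compose(string_transducer(words), m)
    return sequences(output_projection(composed), max_len)


# Hypothetical toy orthography: "read" is a bare verb, a bare noun,
# or a verb followed by a tense morpheme.
M = ({0, 1},
     {(0, "a", "ART", 0), (0, "an", "ART", 0),
      (0, "read", "V:read", 0), (0, "read", "N:read", 0),
      (0, "read", "V:read", 1),
      (1, EPS, "PRES", 0), (1, EPS, "PAST", 0)},
     {0}, {0})

ANALYSES = input_m(M, ["a", "read"])
```

Under this toy `M`, the input `["a", "read"]` receives four analyses, and `["an", "read"]` receives the same four, so the relation between inputs and morpheme sequences is many-many in both directions.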