Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003
With this code we need only about 2.58 bits per symbol.
(126) Consider

12123333123333123312

Here we have P(1) = P(2) = 1/4 and P(3) = 1/2, so the entropy is 1.5 bits per symbol.
The sequence has length 20, so we should be able to encode it with 30 bits.
However, consider blocks of 2. P(12) = 1/2, P(33) = 1/2, and the entropy is 1 bit per block.
For the sequence of 10 blocks of 2, we need only 10 bits.
So it is often worth looking for structure in larger and larger blocks.
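The arithmetic in (126) can be checked directly. A small sketch (not from the notes): compute the empirical entropy of the sequence symbol-by-symbol, then again over blocks of 2, and compare the total bits each view requires.

```python
from collections import Counter
from math import log2

def entropy_bits(seq):
    """Shannon entropy (bits per item) of the empirical distribution of seq."""
    counts = Counter(seq)
    total = len(seq)
    return -sum(c / total * log2(c / total) for c in counts.values())

s = "12123333123333123312"
# Per-symbol view: P(1) = P(2) = 1/4, P(3) = 1/2, so 1.5 bits/symbol.
h_symbol = entropy_bits(s)
# Block view: the 10 blocks are drawn from {12, 33} with P = 1/2 each.
blocks = [s[i:i + 2] for i in range(0, len(s), 2)]
h_block = entropy_bits(blocks)

print(h_symbol * len(s))     # 30.0 bits for the symbol-by-symbol encoding
print(h_block * len(blocks)) # 10.0 bits for the block encoding
```

The block view wins exactly because the symbols are not independent: 1 is always followed by 2, and 3s come in pairs.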
8.1.17 Kraft's inequality and Shannon's theorem
(127) MacMillan: If a uniquely decodable code C has K codewords of lengths l1,...,lK, then

∑_{i=1}^{K} 2^{-l_i} ≤ 1.

(128) Kraft (1949): If a sequence l1,...,lK satisfies the previous inequality, then there is a uniquely decodable code C that has K codewords of lengths l1,...,lK.
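Kraft's direction of the theorem is constructive. As an illustration (the construction below is a standard one, not taken from the notes): sort the lengths, and at each length take the next unused left-aligned binary value as the codeword; this succeeds exactly when the Kraft sum is at most 1.

```python
def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality for binary codeword lengths."""
    return sum(2 ** -l for l in lengths)

def prefix_code(lengths):
    """Build a binary prefix code (hence uniquely decodable) with the
    given codeword lengths; possible iff kraft_sum(lengths) <= 1.
    Codewords are returned in order of increasing length."""
    assert kraft_sum(lengths) <= 1
    code, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev)            # left-align at the new length
        code.append(format(value, f"0{l}b"))
        value += 1                      # next free codeword at this length
        prev = l
    return code

print(kraft_sum([1, 2, 2]))   # 1.0: the lengths admit a code
print(prefix_code([1, 2, 2])) # ['0', '10', '11']
```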
(129) Shannon's theorem. Using the definition of Hr in (116), Shannon (1948) proves the following famous theorem, which specifies the information-theoretic limits of data compression:

Suppose that X is a first order source with outcomes (or outputs) ΩX. Encoding the characters of ΩX in a code with characters Γ where |Γ| = r > 1 requires an average of at least Hr(X) characters of Γ per character of ΩX.

Furthermore, for any real number ε > 0, there is a code that uses an average of fewer than Hr(X) + ε characters of Γ per character of ΩX.
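The bound is achievable in practice. One hedged illustration (Huffman coding is not discussed in the notes at this point, but it realizes the binary case r = 2): for the distribution of example (126), a Huffman code's average codeword length lands between H(X) and H(X) + 1, and here it meets the entropy exactly.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for distribution probs."""
    # Heap entries: (probability, tiebreak id, member symbol indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1   # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.25, 0.25, 0.5]                    # the source of example (126)
H = -sum(p * log2(p) for p in probs)         # 1.5 bits
avg = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(H, avg)                                # here avg equals H exactly
```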
8.1.18 String edits and other varieties of sequence comparison

Overviews of string edit distance methods are provided in Hall and Dowling (1980) and Kukich (1992).
Masek and Paterson (1980) present a fast algorithm for computing string edit distances.
Ristad (1997) and Ristad and Yianilos (1996) consider the problem of learning string edit distances.
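The basic method these works build on is the classic dynamic-programming computation of Levenshtein distance. A minimal sketch (unit costs assumed; the cited papers consider more general cost models):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions turning string a into string b."""
    # prev[j] holds the distance between a[:i-1] and b[:j]; we keep
    # only one previous row, so space is O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute/match
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

This runs in O(|a|·|b|) time; the Masek and Paterson (1980) algorithm cited above improves on that bound.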
8.2 Probabilistic context free grammars and parsing
8.2.1 PCFGs
(130) A probabilistic context free grammar (PCFG) is a structure

G = 〈Σ, N, (→), S, P〉,

where

1. Σ, N are finite, nonempty sets,
2. S is some symbol in N,
3. the binary relation (→) ⊆ N × (Σ ∪ N)* is also finite (i.e. it has finitely many pairs),