Notes on computational linguistics.pdf - UCLA Department of ...


Stabler - Lx 185/209 2003

With this code we need only about 2.58 bits per symbol.

(126) Consider

12123333123333123312

Here we have P(1) = P(2) = 1/4 and P(3) = 1/2, so the entropy is 1.5 bits per symbol.

The sequence has length 20, so we should be able to encode it with 30 bits.

However, consider blocks of 2. P(12) = 1/2, P(33) = 1/2, and the entropy is 1 bit per block symbol.

For the sequence of 10 blocks of 2, we need only 10 bits.

So it is often worth looking for structure in larger and larger blocks.
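The calculation in (126) can be checked with a few lines of Python; `entropy_bits` is an illustrative helper name, not anything from the notes:

```python
from collections import Counter
from math import log2

def entropy_bits(seq):
    """Shannon entropy (bits per symbol) of the empirical distribution of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

s = "12123333123333123312"

# Per-symbol: P(1) = P(2) = 1/4, P(3) = 1/2  ->  1.5 bits/symbol,
# so the 20 symbols need 20 * 1.5 = 30 bits.
h1 = entropy_bits(list(s))

# Blocks of 2: P(12) = P(33) = 1/2  ->  1 bit per block,
# so the 10 blocks need only 10 bits.
blocks = [s[i:i + 2] for i in range(0, len(s), 2)]
h2 = entropy_bits(blocks)

print(h1, len(s) * h1)        # 1.5  30.0
print(h2, len(blocks) * h2)   # 1.0  10.0
```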

8.1.17 Kraft’s inequality and Shannon’s theorem

(127) MacMillan:

If a uniquely decodable code C has K codewords of lengths l1,…,lK then

∑_{i=1}^{K} 2^{−l_i} ≤ 1.

(128) Kraft (1949):

If a sequence l1,…,lK satisfies the previous inequality, then there is a uniquely decodable code C that has K codewords of lengths l1,…,lK.
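Both directions can be illustrated in a short sketch: checking the Kraft sum for a list of lengths, and (for the converse) building a binary prefix code from lengths that satisfy the inequality, by the standard greedy construction. The function names are illustrative, not from the notes:

```python
def kraft_sum(lengths, r=2):
    """Sum of r^{-l_i}; by MacMillan's theorem, a uniquely decodable
    code over an r-symbol alphabet must have kraft_sum <= 1."""
    return sum(r ** -l for l in lengths)

def prefix_code(lengths):
    """Build a binary prefix code with the given codeword lengths,
    assuming they satisfy Kraft's inequality: assign codewords
    greedily in order of increasing length."""
    codes = []
    next_code = 0
    prev_len = 0
    for l in sorted(lengths):
        next_code <<= (l - prev_len)          # extend to the new length
        codes.append(format(next_code, f'0{l}b'))
        next_code += 1
        prev_len = l
    return codes

print(kraft_sum([1, 2, 3, 3]))   # 1.0, so a prefix code exists
print(prefix_code([1, 2, 3, 3])) # ['0', '10', '110', '111']
```

The resulting code is prefix-free (no codeword is a prefix of another), which is a stronger property than unique decodability.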

(129) Shannon’s theorem. Using the definition of Hr in (116), Shannon (1948) proves the following famous theorem, which specifies the information-theoretic limits of data compression:

Suppose that X is a first order source with outcomes (or outputs) ΩX. Encoding the characters of ΩX in a code with characters Γ where |Γ| = r > 1 requires an average of at least Hr(X) characters of Γ per character of ΩX.

Furthermore, for any real number ɛ > 0, there is a code that uses an average of at most Hr(X) + ɛ characters of Γ per character of ΩX.
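Huffman coding is one standard way to approach Shannon's bound; the sketch below (with illustrative names, not code from the notes) computes binary Huffman codeword lengths for the distribution from (126) and confirms that the average length falls between H(X) and H(X) + 1:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given
    probabilities: repeatedly merge the two least probable nodes,
    adding one bit to every symbol under the merged node."""
    # heap entries: (probability, unique id, symbol indices under this node)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, i2, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, s1 + s2))
    return lengths

probs = [0.25, 0.25, 0.5]               # the distribution from (126)
H = -sum(p * log2(p) for p in probs)    # 1.5 bits
lengths = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lengths))
print(H, lengths, avg)                  # 1.5  [2, 2, 1]  1.5
```

Here the probabilities are powers of 2, so the Huffman code meets the entropy exactly; in general only H(X) ≤ average length < H(X) + 1 is guaranteed for a symbol-by-symbol code.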

8.1.18 String edits and other varieties of sequence comparison

Overviews of string edit distance methods are provided in Hall and Dowling (1980) and Kukich (1992).

Masek and Paterson (1980) present a fast algorithm for computing string edit distances.

Ristad (1997) and Ristad and Yianilos (1996) consider the problem of learning string edit distances.
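For concreteness, here is the simple O(mn) dynamic-programming computation of Levenshtein distance (unit-cost insertions, deletions, and substitutions) — the baseline that the faster Masek and Paterson (1980) algorithm improves on, not their algorithm itself:

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b: the minimum number
    of single-character insertions, deletions, and substitutions needed
    to turn a into b, by the standard dynamic-programming recurrence."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match/substitute
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```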

8.2 Probabilistic context free grammars and parsing

8.2.1 PCFGs

(130) A probabilistic context free grammar (PCFG)

G = ⟨Σ, N, (→), S, P⟩,

where

1. Σ, N are finite, nonempty sets,
2. S is some symbol in N,
3. the binary relation (→) ⊆ N × (Σ ∪ N)* is also finite (i.e. it has finitely many pairs),
