20.07.2013 Views

Notes on computational linguistics.pdf - UCLA Department of ...


Stabler - Lx 185/209 2003

8.1.15 Information and entropy

(108) Suppose $|\Omega_X| = 10$, where these events are equally likely and partition $\Omega$.

If we find out that $X = a$, how much information have we gotten?

9 possibilities are ruled out.

The possibilities are reduced by a factor of 10.

But Shannon (1948, p32) suggests that a more natural measure of the amount of information is the number of “bits.” (A name from J.W. Tukey? Is it an acronym for BInary digiT?)

How many binary decisions would it take to pick one element out of the 10? We can pick 1 out of 8 with 3 bits; 1 out of 16 with 4 bits; so 1 out of 10 with 4 (and a little redundancy). More precisely, the number of bits we need is $\log_2(10) \approx 3.32$.
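The counting argument above can be checked directly. The following is a minimal sketch; the function names `bits_exact` and `bits_whole` are illustrative and not part of the notes.

```python
import math

def bits_exact(n: int) -> float:
    """Information, in bits, gained by learning which of n equally likely outcomes occurred."""
    return math.log2(n)

def bits_whole(n: int) -> int:
    """Smallest whole number of binary decisions that can single out one of n outcomes."""
    # (n - 1).bit_length() is the ceiling of log2(n), computed with integers only.
    return (n - 1).bit_length()

for n in (8, 10, 16):
    print(n, round(bits_exact(n), 2), bits_whole(n))
```

For 10 outcomes the whole-bit count is 4 while the exact measure is about 3.32, which is the "little redundancy" mentioned above.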

Exponentiation and logarithms review

$k^m \cdot k^n = k^{m+n}$

$k^0 = 1$

$k^{-n} = \frac{1}{k^n}$

$\frac{a^m}{a^n} = a^{m-n}$

$\log_k x = y$ iff $k^y = x$

$\log_k(k^x) = x$, since: $k^x = k^x$

and so: $\log_k k = 1$

and: $\log_k 1 = 0$

$\log_k\left(\frac{M}{N}\right) = \log_k M - \log_k N$

$\log_k(MN) = \log_k M + \log_k N$

so, in general: $\log_k(M^p) = p \cdot \log_k M$

and we will use: $\log_k \frac{1}{x} = \log_k x^{-1} = -1 \cdot \log_k x = -\log_k x$
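The logarithm rules in this review can be spot-checked numerically. This is only a sketch: the values chosen for `k`, `M`, `N`, `p` and `x` are arbitrary, and a numerical check at one point is an illustration, not a proof.

```python
import math

# Arbitrary illustrative values; any positive numbers with k != 1 would do.
k, M, N, p, x = 2.0, 12.0, 5.0, 3.0, 7.0

def log_k(value: float) -> float:
    """Logarithm of value in base k."""
    return math.log(value, k)

# Each entry pairs the left-hand side of a rule with its right-hand side.
identities = {
    "log_k(k^x) = x": (log_k(k ** x), x),
    "log_k k = 1": (log_k(k), 1.0),
    "log_k 1 = 0": (log_k(1.0), 0.0),
    "log_k(M/N) = log_k M - log_k N": (log_k(M / N), log_k(M) - log_k(N)),
    "log_k(MN) = log_k M + log_k N": (log_k(M * N), log_k(M) + log_k(N)),
    "log_k(M^p) = p log_k M": (log_k(M ** p), p * log_k(M)),
    "log_k(1/x) = -log_k x": (log_k(1.0 / x), -log_k(x)),
}

# Compare with a small tolerance, since floating-point logs are not exact.
results = {
    name: math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)
    for name, (lhs, rhs) in identities.items()
}
for name, ok in results.items():
    print(name, ok)
```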

E.g. $512 = 2^9$ and so $\log_2 512 = 9$. And $\log_{10} 3000 \approx 3.48$, since $3000 \approx 10^3 \cdot 10^{0.48}$. And $5^{-2} = \frac{1}{25}$, so $\log_5 \frac{1}{25} = -2$.

We’ll stick to $\log_2$ and “bits,” but another common choice is $\log_e$, where

$e = \lim_{x \to 0} (1 + x)^{\frac{1}{x}} = \sum_{n=0}^{\infty} \frac{1}{n!} \approx 2.7182818284590452$
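Both characterizations of e, the limit of (1 + x)^(1/x) as x goes to 0 and the sum of 1/n!, can be approximated numerically. This is a minimal sketch with hypothetical helper names; it shows that the series converges much faster than the limit.

```python
import math

def e_by_limit(x: float) -> float:
    """Approximate e as (1 + x)^(1/x) for a small nonzero x."""
    return (1.0 + x) ** (1.0 / x)

def e_by_series(terms: int) -> float:
    """Approximate e by the partial sum of 1/n! for n = 0 .. terms - 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term        # here term equals 1/n!
        term /= (n + 1)      # advance to 1/(n+1)!
    return total

print(e_by_limit(1e-6))
print(e_by_series(20))
print(math.e)
```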

Or, more commonly, e is defined as the x such that a unit area is found under the curve $\frac{1}{u}$ from $u = 1$ to $u = x$, that is, it is the positive root x of $\int_1^x \frac{1}{u}\, du = 1$.
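The area definition can also be tried out numerically: integrate 1/u from 1 to x and search for the x that makes the area equal to 1. The sketch below assumes a midpoint rule and bisection on the bracket [2, 3]; both are choices made here for illustration, not methods taken from the notes.

```python
import math

def area_under_reciprocal(x: float, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of the area under 1/u from u = 1 to u = x."""
    width = (x - 1.0) / steps
    return sum(width / (1.0 + (i + 0.5) * width) for i in range(steps))

def solve_unit_area(lo: float = 2.0, hi: float = 3.0, iterations: int = 40) -> float:
    """Bisect for the x in [lo, hi] whose area under 1/u from 1 to x is 1."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if area_under_reciprocal(mid) < 1.0:
            lo = mid   # area too small, so the root lies to the right
        else:
            hi = mid   # area too large, so the root lies to the left
    return (lo + hi) / 2

print(solve_unit_area(), math.e)
```

The bracket works because the area up to 2 is below 1 and the area up to 3 is above 1, and the area grows as x grows.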

This number is irrational, as shown by the Swiss mathematician Leonhard Euler (1707-1783), in whose honor we call it e. In general: $e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$. And furthermore, as Euler discovered, $e^{\pi\sqrt{-1}} + 1 = 0$. This is sometimes called the most famous of all formulas (Maor, 1994). It’s not, but it’s amazing.
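The power series for e^x also makes sense for complex arguments, so Euler's formula can be checked with a truncated sum. This is a sketch only: `exp_series` is a hypothetical helper, and 40 terms is an arbitrary cutoff that is ample for arguments of this size.

```python
import cmath
import math

def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum of z^k / k! for k = 0 .. terms - 1."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term          # here term equals z^k / k!
        term *= z / (k + 1)    # advance to z^(k+1) / (k+1)!
    return total

# e^(pi * sqrt(-1)) + 1 should vanish up to floating-point rounding.
euler = exp_series(cmath.pi * 1j) + 1
print(abs(euler))

# The same series at x = 1 recovers e.
print(exp_series(1.0).real, math.e)
```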

Using $\log_2$ gives us “bits,” $\log_e$ gives us “nats,” and $\log_{10}$ gives us “hartleys.”
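Since the three units differ only in the base of the logarithm, they are related by constant factors. The sketch below measures the 10-outcome example in each unit; the function name `information` is illustrative.

```python
import math

def information(n_outcomes: int, base: float) -> float:
    """Information from one of n equally likely outcomes, using logs in the given base."""
    return math.log(n_outcomes) / math.log(base)

bits = information(10, 2)          # log base 2
nats = information(10, math.e)     # log base e
hartleys = information(10, 10)     # log base 10

print(bits, nats, hartleys)

# Changing units is multiplication by a constant: nats = bits * ln 2.
print(math.isclose(nats, bits * math.log(2)))
```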
