Notes on computational linguistics.pdf - UCLA Department of ...

Stabler - Lx 185/209 2003

?- sum_lengths([],N).
N = 0
Yes

?- sum_lengths([a,akjjdfkpodsaijfospdafpods],N).
N = 26
Yes
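The definition behind these queries is not shown in this section. As a minimal sketch of one definition that is consistent with the two answers above, assuming the list elements are atoms and that size is measured with the ISO built-in atom_length/2:

```prolog
% sum_lengths(+Atoms, -N): N is the total number of characters
% in the atoms of the list Atoms.
% Assumption: every list element is an atom.
sum_lengths([], 0).
sum_lengths([A|As], N) :-
    atom_length(A, Len),      % number of characters in the first atom
    sum_lengths(As, Rest),    % total for the remaining atoms
    N is Len + Rest.
```

With this sketch, the second query adds 1 for a and 25 for the longer atom, giving 26.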

Extra credit: The number of characters in a string is not a very good measure of its size, since it matters whether the elements of the string are taken from a 26 symbol alphabet like {a-z} or a 10 symbol alphabet like {0-9} or a two symbol alphabet like {0,1}.

The most common size measures are given in terms of two symbol alphabets: we consider how many symbols are needed for a binary encoding, that is, how many “bits” are needed.

Now suppose that we want to represent a sequence of letters or numbers. Let’s consider sequences of the digits 0-9 first. A naive idea is this: to code up a number like 52 in binary notation, simply represent each digit in binary notation. Since 5 is 101 and 2 is 10, we would write 10110 for 52. This is obviously not a good strategy, since there is no indication of the boundary between the 5 and the 2. The same sequence would also be the code for 26, since 2 is 10 and 6 is 110.
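The ambiguity can be checked directly. The following is a sketch only: the predicate names digit_bits/2 and naive_code/2 are made up for illustration, bit sequences are represented as Prolog lists of 0s and 1s, and append/3 is taken from library(lists) as in SWI-Prolog.

```prolog
:- use_module(library(lists)).   % append/3 (SWI-Prolog assumed)

% digit_bits(?Digit, ?Bits): ordinary binary notation for one decimal digit.
digit_bits(0, [0]).
digit_bits(1, [1]).
digit_bits(2, [1,0]).
digit_bits(3, [1,1]).
digit_bits(4, [1,0,0]).
digit_bits(5, [1,0,1]).
digit_bits(6, [1,1,0]).
digit_bits(7, [1,1,1]).
digit_bits(8, [1,0,0,0]).
digit_bits(9, [1,0,0,1]).

% naive_code(?Digits, ?Code): Code is the concatenation of the
% binary notations of the digits, with no boundary markers.
naive_code([], []).
naive_code([D|Ds], Code) :-
    digit_bits(D, Bits),
    append(Bits, Rest, Code),
    naive_code(Ds, Rest).
```

The queries naive_code([5,2],C) and naive_code([2,6],C) both bind C to [1,0,1,1,0]. Running the predicate backwards, as in naive_code(Ds,[1,0,1,1,0]), enumerates every digit sequence with that code, and both [5,2] and [2,6] are among the answers.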

Instead, we could just express 52 in base 2, which happens to be 110100. While this is possible, it is a rather inefficient code, because there are actually infinitely many binary representations of 52:

110100, 0110100, 00110100, 000110100, ...

Adding any number of preceding zeroes has no effect! A better code would not be so wasteful.
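To see that preceding zeroes have no effect, one can compute the value of a base-2 numeral. This is a small sketch, with the hypothetical name bits_value and with numerals represented as lists of 0s and 1s, most significant digit first:

```prolog
% bits_value(+Bits, -N): N is the value of the list Bits read as a
% base-2 numeral, most significant digit first.
bits_value(Bits, N) :-
    bits_value(Bits, 0, N).

% bits_value(+Bits, +Acc, -N): Acc is the value of the digits read so far.
bits_value([], N, N).
bits_value([B|Bs], Acc, N) :-
    Acc1 is 2*Acc + B,        % shift left by one place and add the new digit
    bits_value(Bs, Acc1, N).
```

Both bits_value([1,1,0,1,0,0],N) and bits_value([0,0,1,1,0,1,0,0],N) bind N to 52: a leading 0 leaves the accumulator at 0, so it contributes nothing to the result.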

Here is a better idea. We will represent the numbers with binary sequences as follows:

decimal number    0   1   2   3    4    5    6    …
binary sequence   ɛ   0   1   00   01   10   11   …
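One way to see why this code wastes nothing, assuming the "…" continues in the same order (shorter sequences first, sequences of equal length in numerical order): every binary sequence, including the empty sequence ɛ, is used exactly once. Since there are 2^k sequences of length k, the lengths can be worked out as follows, where |e(n)| is notation introduced here for the length of the sequence assigned to n:

```latex
\[
  |e(n)| = k
  \iff
  2^{k} - 1 \;\le\; n \;\le\; 2^{k+1} - 2,
  \qquad\text{equivalently}\qquad
  |e(n)| = \lfloor \log_2 (n+1) \rfloor .
\]
```

For example, the numbers 3 through 6 are exactly those with two-symbol sequences, matching the table.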

Now here is the Prolog exercise:

i. Write a Prolog predicate e(N,L) that relates each decimal number N to its binary sequence representation L.

ii. Write a Prolog predicate elength(N,Length) that relates each decimal number N to the length of its binary sequence representation.

iii. We saw above that the length of the shortest binary representation of 52 (the one with no preceding zeroes) is 6. Use the definition you just wrote to have Prolog compute the length of the binary sequence encoding for 52.
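One possible approach to parts i and ii is sketched below. It is not taken from the notes and rests on several assumptions: N is given as a non-negative integer, L is a list of 0s and 1s, and the auxiliary predicate e/3 is introduced only for this sketch. The idea is to work from the right-hand end of the sequence: an odd number ends in 0, an even number greater than zero ends in 1, and the remaining symbols encode (N-1)//2.

```prolog
% e(+N, -L): L is the binary sequence (a list of 0s and 1s) assigned
% to the non-negative integer N: 0 -> [], 1 -> [0], 2 -> [1], 3 -> [0,0], ...
e(N, L) :-
    e(N, [], L).

% e(+N, +Acc, -L): Acc holds the rightmost symbols found so far,
% already in left-to-right order.
e(0, L, L).
e(N, Acc, L) :-
    N > 0,
    Bit is (N + 1) mod 2,     % odd N ends in 0, even N ends in 1
    Rest is (N - 1) // 2,     % number encoded by the remaining symbols
    e(Rest, [Bit|Acc], L).

% elength(+N, -Length): Length is the number of symbols in the sequence for N.
elength(N, Length) :-
    e(N, L),
    length(L, Length).
```

Under these assumptions, e(52,L) binds L to [1,0,1,0,1], so elength(52,Length) gives 5, one symbol fewer than the 6 digits of 110100.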
