Notes on computational linguistics.pdf - UCLA Department of ...

Stabler - Lx 185/209 2003

Two harder extra credit problems for the go-getters

(7) More extra credit, part 1. The previous extra credit problem can be solved in lots of ways. Here is a simple way to do part a of that problem: to encode N, we count up to N with our binary sequences. But since the front of the list is the easiest to access, we count with the digits in the order least-significant to most-significant, and then reverse the result. Here is a Prolog program that does this:

% e(N,L): L is the binary sequence that encodes the number N.
e(N,L) :-
    countupReverse(N,R),
    reverse(R,L).

% countupReverse(N,L): L encodes N, least-significant digit first.
countupReverse(0,[]).
countupReverse(N,L) :-
    N>0,
    N1 is N-1,
    countupReverse(N1,L1),
    addone(L1,L).

% addone(L0,L): L is the successor of L0, least-significant digit first.
addone([],[0]).
addone([0|R],[1|R]).
addone([1|R0],[0|R]) :- addone(R0,R).

This is good, but suppose that we want to communicate two numbers in sequence. For this purpose, our binary representations are still no good, because you cannot tell where one number ends and the next one begins.

One way to solve this problem is to decide, in advance, that every number will be represented by a certain number of bits – say 7. This is what is done in standard ASCII codes, for example. But blocks of n bits limit you in advance to encoding no more than $2^n$ elements, and they are inefficient if some symbols are more common than others.
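
To make the fixed-width idea concrete, here is a minimal sketch of how a receiver could cut a stream of bits into blocks when the block size is agreed in advance. The predicate name blocks/3 is my own, not part of the problem set, and the sketch assumes append/3 and length/2 behave as they do in SWI-Prolog.

```prolog
% blocks(+Bits, +N, -Blocks): hypothetical helper, not from the notes.
% Splits the list Bits into consecutive blocks of exactly N bits each.
% Fails if the length of Bits is not a multiple of N.
blocks([], _, []).
blocks(Bits, N, [Block|Blocks]) :-
    Bits = [_|_],
    N > 0,
    length(Block, N),            % Block is a list of N fresh variables
    append(Block, Rest, Bits),   % Block is the first N bits of Bits
    blocks(Rest, N, Blocks).
```

For example, the query blocks([1,0,0,0,0,0,1,1,0,0,0,0,1,0], 7, B) gives B = [[1,0,0,0,0,0,1],[1,0,0,0,0,1,0]], the 7-bit ASCII codes for A and B. No separator is needed, but only because both sides fixed the width beforehand.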

For many purposes, a better strategy is to use a coding scheme where no symbol (represented by a sequence of bits) is the prefix of any other one. That means we would never get confused about where one symbol ends and the next one begins. One extremely simple way to encode numbers in this way is this. To represent a number like 5, we put in front of [1,0] an (unambiguous) representation of the length n of [1,0] – namely, we use n 1’s followed by a 0. So then, to represent 5, we use [1,1,0,1,0]. The first 3 bits indicate that the number we have encoded is two bits long.

So in this notation, we can unambiguously determine what sequence of numbers is represented by

    [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1].

This is a binary code for the number sequence [1, 2, 6]. Define a predicate e1(NumberSequence,BinaryCode) that transforms any sequence of numbers into this binary code. (We will improve on this code later.)
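
To see why this code can be read unambiguously, it may help to look at the receiving end. The following is only a sketch of the opposite direction from e1: it cuts a coded bit list back into the binary sequences of the individual numbers. The names split/2 and ones/3 are my own, and the sketch assumes append/3 and length/2 as provided by SWI-Prolog.

```prolog
% split(+Code, -Blocks): hypothetical helper, not from the notes.
% Code is a bit list in the length-prefixed code described above;
% Blocks is the list of binary sequences it contains, in order.
split([], []).
split(Code, [Digits|Rest]) :-
    Code = [_|_],
    ones(Code, N, AfterPrefix),            % read the length prefix
    length(Digits, N),                     % the number itself has N bits
    append(Digits, Remainder, AfterPrefix),
    split(Remainder, Rest).

% ones(+Bits, -N, -Rest): Bits starts with N 1's followed by a 0,
% and Rest is whatever comes after that 0.
ones([0|Rest], 0, Rest).
ones([1|Bits], N, Rest) :-
    ones(Bits, N0, Rest),
    N is N0 + 1.
```

The query split([1,0,0,1,0,1,1,1,0,1,1], B) gives B = [[0],[1],[1,1]], which are the binary sequences for 1, 2 and 6. At each step the prefix tells us exactly how many bits to take, so there is never a choice about where a number ends.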

(8) More extra credit, part 2 (hard!). While the definition of e(N, L) given above works, it involves counting from 0=[] all the way up to the number you want. Can you find a simpler way?

Hint: The empty sequence ε represents 0, and any other sequence of binary digits $[a_n, a_{n-1}, \ldots, a_0]$ represents

\[ \sum_{i=0}^{n} (a_i + 1)\,2^i . \]

So for example, [1,0] represents $(0 + 1)2^0 + (1 + 1)2^1 = 1 + 4 = 5$. Equivalently, $[a_n, a_{n-1}, \ldots, a_0]$ represents

\[ 2^{n+1} - 1 + \sum_{i=0}^{n} a_i 2^i . \]

So for example, [1,0] represents $2^{1+1} - 1 + (0 \cdot 2^0) + (1 \cdot 2^1) = 4 - 1 + 0 + 2 = 5$.

(Believe it or not, some students in the class already almost figured this out, instead of using a simple counting strategy like the one I used in the definition of e above.)
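
The first formula in the hint can be checked mechanically. Here is a minimal sketch that goes from a binary sequence to the number it represents, which is the opposite direction from the one the problem asks for. The names value/2 and value/3 are my own. The sketch reads the digits from the front and doubles an accumulator at each step, which gives the same result as the sum of (a_i + 1) 2^i.

```prolog
% value(+Digits, -N): hypothetical helper, not from the notes.
% N is the number represented by Digits, most-significant digit first.
value(Digits, N) :-
    value(Digits, 0, N).

% value(+Digits, +Acc, -N): Acc is the value of the digits read so far.
value([], N, N).
value([D|Ds], Acc0, N) :-
    Acc is 2*Acc0 + D + 1,   % shift what we have and add (D + 1)
    value(Ds, Acc, N).
```

The query value([1,0], N) gives N = 5, as in the worked example, and value([], N) gives N = 0 for the empty sequence.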
