Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003
Two harder extra credit problems for the go-getters
(7) More extra credit, part 1. The previous extra credit problem can be solved in lots of ways. Here is a
simple way to do part a of that problem: to encode N, we count up to N with our binary sequences. But
since the front of the list is the easiest to access, we count with the least significant digit at the
front of the list, and then reverse the result to get the order most-significant-digit to
least-significant-digit. Here is a Prolog program that does this:
% e(N,L): L is the binary sequence that encodes the number N.
e(N,L) :-
    countupReverse(N,R),
    reverse(R,L).

% countupReverse(N,L): L encodes N, least significant digit first.
countupReverse(0,[]).
countupReverse(N,L) :-
    N>0,
    N1 is N-1,
    countupReverse(N1,L1),
    addone(L1,L).

% addone(L0,L): L is the successor of L0 (least significant digit first).
addone([],[0]).
addone([0|R],[1|R]).
addone([1|R0],[0|R]) :- addone(R0,R).
This is good, but suppose that we want to communicate two numbers in sequence. For this purpose,
our binary representations are still no good, because you cannot tell where one number ends and the
next one begins.
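
To see the ambiguity concretely, here is a minimal sketch that enumerates every way of cutting a bit list into non-empty pieces. The predicate name split/2 is my own choice for illustration, not something defined earlier in these notes. Since every non-empty binary sequence encodes some number in our scheme, each way of cutting the list is a legitimate reading of it.

```prolog
:- use_module(library(lists)).   % append/3

% split(+Bits, -Words): Words is a list of non-empty lists
% whose concatenation is Bits.
split([], []).
split(Bits, [[B|Bs]|Words]) :-
    append([B|Bs], Rest, Bits),  % pick a non-empty prefix as the first word
    split(Rest, Words).          % and cut up what is left
```

For example, the query `findall(Ws, split([0,0], Ws), All)` finds two readings of [0,0]: [[0],[0]], which is the number 1 followed by the number 1, and [[0,0]], which is the single number 3.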
One way to solve this problem is to decide, in advance, that every number will be represented by a
certain number of bits, say 7. This is what is done in standard ASCII codes, for example. But blocks
of n bits limit you in advance to encoding no more than $2^n$ elements, and they are inefficient if some
symbols are more common than others.
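
With fixed-width blocks, finding the boundaries is just a matter of counting. Here is a rough sketch, where blocks/3 is a hypothetical name of my own, and where the bits are ordinary binary as in ASCII rather than the encoding computed by e above. It assumes the block width N is a positive integer.

```prolog
:- use_module(library(lists)).   % append/3

% blocks(+Bits, +N, -Blocks): cut Bits into consecutive blocks of
% exactly N bits each. Fails if the length of Bits is not a multiple of N.
blocks([], _, []).
blocks(Bits, N, [Block|Blocks]) :-
    Bits = [_|_],                % only applies to a non-empty list
    length(Block, N),            % a block is a list of exactly N bits
    append(Block, Rest, Bits),
    blocks(Rest, N, Blocks).
```

For example, `blocks([1,0,0,0,0,0,1,1,0,0,0,0,1,0], 7, Bs)` gives the two blocks [1,0,0,0,0,0,1] and [1,0,0,0,0,1,0], the 7-bit ASCII codes for A (65) and B (66).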
For many purposes, a better strategy is to use a coding scheme where no symbol (represented by a
sequence of bits) is the prefix of any other one. That means we would never get confused about where
one symbol ends and the next one begins. One extremely simple way to encode numbers in this way
is this. To represent a number like 5, we put in front of [1,0] an (unambiguous) representation of the
length n of [1,0]: namely, we use n 1’s followed by a 0. So then, to represent 5, we use [1,1,0,1,0]. The
first 3 bits indicate that the number we have encoded is two bits long.
So in this notation, we can unambiguously determine what sequence of numbers is represented by
[1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1].
This is a binary code for the number sequence [1, 2, 6]. Define a predicate e1(NumberSequence,BinaryCode)
that transforms any sequence of numbers into this binary code. (We will improve on this code later.)
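
Going in the opposite direction from the exercise, here is a rough sketch of how the boundaries can be recovered from such a code. The predicate names words/2 and unary/3 are my own, chosen only for illustration. The sketch stops at the binary sequences and does not convert them back into numbers.

```prolog
:- use_module(library(lists)).   % append/3

% words(+Code, -Words): cut a prefix-coded bit list into the binary
% sequences it contains, using each length marker to find the boundary.
words([], []).
words(Code, [Word|Words]) :-
    unary(Code, N, Rest0),       % read the marker: N 1's followed by a 0
    length(Word, N),             % the next N bits are the encoded sequence
    append(Word, Rest, Rest0),
    words(Rest, Words).

% unary(+Bits, -N, -Rest): Bits begins with N 1's and then a 0,
% and Rest is everything after that 0.
unary([0|Rest], 0, Rest).
unary([1|Bits], N, Rest) :-
    unary(Bits, N0, Rest),
    N is N0 + 1.
```

For example, `words([1,0,0,1,0,1,1,1,0,1,1], Ws)` gives Ws = [[0],[1],[1,1]], the binary sequences for 1, 2 and 6. There is never a choice to make about where a word ends, which is the whole point of the prefix property.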
(8) More extra credit, part 2 (hard!). While the definition of e(N, L) given above works, it involves counting
from 0=[] all the way up to the number you want. Can you find a simpler way?
Hint: The empty sequence ε represents 0, and any other sequence of binary digits $[a_n, a_{n-1}, \ldots, a_0]$
represents
$\sum_{i=0}^{n} (a_i + 1)\,2^i$.
So for example, [1,0] represents $(0 + 1)2^0 + (1 + 1)2^1 = 1 + 4 = 5$. Equivalently, $[a_n, a_{n-1}, \ldots, a_0]$
represents
$2^{n+1} - 1 + \sum_{i=0}^{n} a_i 2^i$.
So for example, [1,0] represents $2^{1+1} - 1 + (0 \cdot 2^0) + (1 \cdot 2^1) = 4 - 1 + 0 + 2 = 5$.
(Believe it or not, some students in the class already almost figured this out, instead of using a simple
counting strategy like the one I used in the definition of e above.)
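
In case it is not obvious why the two expressions in the hint are equivalent, here is the missing step. It uses only the standard fact that $1 + 2 + 4 + \cdots + 2^n = 2^{n+1} - 1$.

```latex
\begin{align*}
\sum_{i=0}^{n} (a_i + 1)\,2^i
  &= \sum_{i=0}^{n} 2^i + \sum_{i=0}^{n} a_i 2^i \\
  &= 2^{n+1} - 1 + \sum_{i=0}^{n} a_i 2^i .
\end{align*}
```

In other words, a sequence of n+1 digits represents its ordinary binary value plus $2^{n+1} - 1$, which is the number of binary sequences that are strictly shorter than it.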