EE3J2 Data Mining Lecture 3 Zipf's Law Stemming & Stop Lists
More formal analysis<br />
Suppose the alphabet has M characters, plus a<br />
space character <br />
p = p(A) = … = p(Z) = 1/(M+1)<br />
So, the probability of a particular ‘word’ $w_k$ of<br />
length k is (remember the spaces before and<br />
after the word)<br />
$p(w_k) = p^{k+2}$<br />
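As a rough numerical check of the formula above, the sketch below evaluates the probability of one specific word of length k when every character, including the space, is equally likely. The function name `word_probability` and the choice M = 26 are assumptions made for illustration; they do not appear on the slide.

```python
from fractions import Fraction


def word_probability(k: int, alphabet_size: int) -> Fraction:
    """Probability of one specific word of length k, with a space on each side.

    Assumes all alphabet_size letters and the space are equally likely,
    so each character has probability p = 1 / (alphabet_size + 1).
    """
    if k < 1 or alphabet_size < 1:
        raise ValueError("k and alphabet_size must be positive")
    p = Fraction(1, alphabet_size + 1)
    # k letters plus the leading and trailing space: k + 2 characters in total
    return p ** (k + 2)


if __name__ == "__main__":
    M = 26  # illustrative alphabet size (assumption)
    for k in range(1, 5):
        print(k, float(word_probability(k, M)))
```

Each extra letter multiplies the probability by p, so under this model longer words become exponentially less likely.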
EE3J2 Data Mining 2008 – lecture 3<br />
Slide 14