17.06.2014 Views

Notes for the Lifebox, the Seashell, and the Soul - Rudy Rucker


Notes for The Lifebox, the Seashell, and the Soul, by Rudy Rucker

Zipf’s Law Table

The table below illustrates Zipf’s law, based on a word sample of some five hundred old articles from Time magazine, totaling about a quarter of a million words.[8]

The idea is that, row by row, the last two columns should be roughly equal to each other.[9]
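A minimal Python sketch of how such a table can be built from raw text. The column layout is an assumption, since the table itself is not reproduced here: each row pairs a word's observed count with the simplest Zipf prediction $f_1/R$, where $f_1$ is the count of the top-ranked word and R is the rank. The function name `zipf_table` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def zipf_table(text, top=10):
    """Return rows (rank, word, observed count, predicted count).

    The prediction assumes count ~ f1 / rank, where f1 is the count of
    the most frequent word (a hypothetical layout for the last two columns).
    """
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common(top)
    if not ranked:
        return []
    f1 = ranked[0][1]
    return [
        (rank, word, count, f1 / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


sample = "the cat and the dog and the bird"
for row in zipf_table(sample):
    print(row)
```

On a real corpus such as the Time sample, the observed and predicted columns agree only roughly; a toy sentence this short is far too small to show the law.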

[8] I found this data on a web page by computer scientist Jamie Callan of the University of Massachusetts, http://web.archive.org/web/20001005120011/hobart.cs.umass.edu/~allan/cs646-f97/char_of_text.html. For the mother of all Zipf’s law web pages, see Wentian Li’s site, http://linkage.rockefeller.edu/wli/zipf/.

[9] He scanned an XML edition of Webster's 1913 Revised Unabridged Dictionary, as revised and extended by the GNU Collaborative International Dictionary of English project, found at http://www.ibiblio.org/webster/.

We defined two words to be linked if either appears in the definition of the other. Walker created a table of pairs (L, N) giving the number of words N that have a given number of links L, with L running from 1 to 149. There are three traditional ways of describing this kind of data.

An inverse power law of the form $N \sim 1/L^{D}$. This means that the number of words N that have a given linkiness L is proportional to $1/L^{D}$.

A Zipf-style law (to be discussed a bit later in this section), in which we rank words from the most linked to the least; if R is a word’s rank order, then its linkiness is $L \sim 1/R^{E}$.

A Pareto-style law, which says that for any linkiness L, the number of words M having linkiness greater than L is $M \sim 1/L^{F}$.
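The linking rule described above can be sketched in Python on a toy dictionary. This is a hypothetical stand-in for Walker's scan of the Webster's data: the function name `link_table`, the naive tokenizer (lowercase headwords, no stemming, so "animals" would not match "animal"), and the five-entry dictionary are all assumptions. It produces the (L, N) pairs that the power-law description summarizes.

```python
import re
from collections import Counter


def link_table(dictionary):
    """Return {L: N}, the number of headwords N having exactly L links.

    Two headwords are linked if either appears in the other's definition.
    Headwords are assumed lowercase. Words with no links are left out,
    matching a table that starts at L = 1.
    """
    heads = set(dictionary)
    links = {word: set() for word in heads}
    for word, definition in dictionary.items():
        for token in set(re.findall(r"[a-z]+", definition.lower())):
            if token in heads and token != word:
                links[word].add(token)   # the relation is symmetric,
                links[token].add(word)   # so record it on both sides
    counts = Counter(len(s) for s in links.values() if s)
    return dict(sorted(counts.items()))


toy = {
    "cat": "a small animal",
    "animal": "a living thing",
    "dog": "an animal that barks",
    "thing": "an object",
    "small": "not large",
}
print(link_table(toy))  # {1: 3, 2: 1, 3: 1}
```

Here "animal" has three links (cat, dog, thing), "cat" has two, and the other three words have one each.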

A paper by Lada A. Adamic, “Zipf, Power-laws, and Pareto - a Ranking Tutorial,” at http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html, points out that we can get from a power law to a Pareto-style law to a Zipf-style law, and vice versa. Let’s see how to go from our power-law data to a Zipf form.
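For tabulated data, the same conversions can be done directly by counting, before any curve is fitted. A minimal Python sketch, assuming the table is a dict {L: N}; the function names and the toy table are hypothetical. `pareto_counts` gives M, the number of words with linkiness strictly greater than L, and `zipf_ranking` lists each word's (rank, linkiness), most linked first.

```python
def pareto_counts(table):
    """Map each linkiness L to M(L), the number of words with linkiness > L."""
    return {
        L: sum(n for other, n in table.items() if other > L)
        for L in sorted(table)
    }


def zipf_ranking(table):
    """Return (R, L) pairs: the rank R of each word, most linked word first.

    Words sharing a linkiness value get consecutive ranks.
    """
    values = sorted((L for L, n in table.items() for _ in range(n)), reverse=True)
    return list(enumerate(values, start=1))


toy_table = {1: 3, 2: 1, 3: 1}
print(pareto_counts(toy_table))  # {1: 2, 2: 1, 3: 0}
print(zipf_ranking(toy_table))   # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
```

For a word with no ties, $R = M(L) + 1$: the word at rank 2 has linkiness 2, and exactly $M(2) = 1$ word outranks it. That off-by-one stops mattering once R is large, which is why the continuous treatment can identify R with M.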

Power. I have $N = c/L^{D}$, where L is a linkiness level and N is the number of words at that level.

Pareto. To get the number of words with linkiness greater than $L_0$, integrate $c\,L^{-D}$ with respect to L from $L_0$ to infinity. Assume $D > 1$, so that the integral converges. I get

$$\int_{L_0}^{\infty} c\,L^{-D}\,dL = \left[\frac{c}{1-D}\,L^{1-D}\right]_{L_0}^{\infty} = \frac{c/(D-1)}{L_0^{D-1}}.$$

So I can say that if M is the number of words with linkiness greater than L, then $M = \dfrac{c/(D-1)}{L^{D-1}}$.
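The closed form can be checked numerically. A minimal Python sketch with hypothetical function names: substituting $L = L_0/t$ maps $[L_0, \infty)$ onto $(0, 1]$ and turns the integrand into $c\,L_0^{1-D}\,t^{D-2}$, which a midpoint rule can handle without ever evaluating at $t = 0$. The parameter values ($c = 1{,}000{,}000$, $D = 2.2$, $L_0 = 5$) are arbitrary choices for illustration.

```python
def tail_numeric(c, D, L0, steps=100_000):
    """Midpoint-rule estimate of the integral of c*L**(-D) from L0 to infinity.

    With L = L0/t the integral becomes c * L0**(1 - D) times the integral of
    t**(D - 2) over (0, 1]. It converges only for D > 1; for 1 < D < 2 the
    integrand blows up at t = 0 and the midpoint rule converges more slowly.
    """
    h = 1.0 / steps
    total = sum(((i + 0.5) * h) ** (D - 2) for i in range(steps))
    return c * L0 ** (1 - D) * total * h


def tail_closed(c, D, L0):
    """Closed form from the derivation: (c/(D-1)) / L0**(D-1)."""
    return (c / (D - 1)) / L0 ** (D - 1)


c, D, L0 = 1_000_000, 2.2, 5
print(tail_numeric(c, D, L0), tail_closed(c, D, L0))
```

The two printed values should agree to several significant figures.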

Zipf. If I rank words in order of linkiness, and the word of rank R has linkiness L, then the words ranked above it (roughly R of them) all have linkiness greater than that word's. So R is essentially the same as the M of the Pareto form, and I can say $R = \dfrac{c/(D-1)}{L^{D-1}}$. Turning this around to solve for L in terms of R, I get

$$L = \frac{e}{R^{1/(D-1)}}, \qquad \text{where } e = \left(\frac{c}{D-1}\right)^{1/(D-1)}.$$
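The two conversions can be written as small functions. A minimal Python sketch (the function names are hypothetical) that maps the power-law parameters $(c, D)$ to the Pareto form $M = k/L^{a}$, then to the Zipf form $L = e/R^{b}$, and checks that going from L to R and back recovers L.

```python
def power_to_pareto(c, D):
    """N = c / L**D  ->  M = k / L**a, with k = c/(D-1) and a = D-1 (needs D > 1)."""
    if D <= 1:
        raise ValueError("the tail integral diverges unless D > 1")
    return c / (D - 1), D - 1


def pareto_to_zipf(k, a):
    """R = k / L**a  ->  L = e / R**b, with e = k**(1/a) and b = 1/a."""
    return k ** (1 / a), 1 / a


k, a = power_to_pareto(1_000_000, 2.2)
e, b = pareto_to_zipf(k, a)

# Round trip: the rank implied by linkiness L maps back to the same L.
L = 7.0
R = k / L ** a
print(k, a, b, e / R ** b)
```

With $c = 1{,}000{,}000$ and $D = 2.2$, this gives $k = 1{,}000{,}000/1.2 \approx 833{,}333$ with exponent $a = D - 1 = 1.2$, and a Zipf exponent $b = 1/1.2 \approx 0.83$.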

Some Mathematica curve-fitting to Walker’s data gave me these numbers:

Power law. $N = 1{,}000{,}000 / L^{2.2}$.

Pareto-style law. M is the number of words with linkiness above L, and $M = 833{,}333 / L^{1.1}$.

Zipf-style law. L is the linkiness of the Rth-ranking word, and $L = 244{,}312 / R^{0.91}$.
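Mathematica's fitting routine is not reproduced here. As a hedged stand-in, a common quick method is an ordinary least-squares line through the points $(\log L, \log N)$, since $N = c/L^{D}$ becomes $\log N = \log c - D \log L$. The sketch below (function name hypothetical) runs it on synthetic, noise-free data generated from the power law above for L = 1 to 149; Walker's actual counts are not reproduced.

```python
import math


def fit_power_law(pairs):
    """Fit N = c / L**D by least squares on log N = log c - D log L.

    Pairs with N <= 0 are skipped, since their logarithm is undefined.
    Returns (c, D). A stand-in for curve fitting, not Mathematica's method.
    """
    pts = [(math.log(L), math.log(N)) for L, N in pairs if N > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Hypothetical synthetic data following N = 1,000,000 / L**2.2 exactly.
synthetic = [(L, 1_000_000 / L ** 2.2) for L in range(1, 150)]
c_fit, D_fit = fit_power_law(synthetic)
print(c_fit, D_fit)
```

On real counts, a straight-line fit in log-log space is sensitive to the sparse, noisy high-L end of the table, so exponents fitted by different methods can differ somewhat.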
