Notes for the Lifebox, the Seashell, and the Soul - Rudy Rucker
Zipf’s Law Table<br />
The table below illustrates Zipf’s law, based on a word sample consisting of some<br />
five hundred old articles from Time magazine totaling about a quarter of a million words. 8<br />
The idea is that, row by row, the last two columns should be roughly equal to each other. 9<br />
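A row-by-row comparison of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `zipf_table`, the crude tokenizer, and the choice of "top word's count divided by rank" as the Zipf prediction are all hypothetical, and the columns of the original table may have been defined differently.

```python
from collections import Counter
import re


def zipf_table(text, top=10):
    """Return rows (rank, word, count, predicted) for the most common words.

    `predicted` is the count that a pure 1/rank law would give, scaled so
    that it matches the observed count of the top-ranked word (an assumed
    normalization, not necessarily the one used in the original table).
    """
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common(top)
    if not ranked:
        return []
    top_count = ranked[0][1]
    rows = []
    for rank, (word, count) in enumerate(ranked, start=1):
        predicted = top_count / rank
        rows.append((rank, word, count, predicted))
    return rows


# Tiny synthetic sample, just to show the shape of the output.
for row in zipf_table("a a a a b b c"):
    print(row)
```

On a real corpus, the last two entries of each row (observed count and 1/rank prediction) are the pair one would expect to be roughly equal if Zipf's law holds.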
8 I found this data on a web page by computer scientist Jamie Callan of the University of
Massachusetts, http://web.archive.org/web/20001005120011/hobart.cs.umass.edu/~allan/cs646-f97/char_of_text.html.
For the mother of all Zipf’s law web pages, see Wentian Li’s site,
http://linkage.rockefeller.edu/wli/zipf/.<br />
9 He scanned an XML edition of Webster's 1913 Revised Unabridged Dictionary, as revised and<br />
extended by the GNU Collaborative International Dictionary of English project, found at<br />
http://www.ibiblio.org/webster/<br />
We defined two words to be linked if either appears in the definition of the other. Walker created a
table of pairs (L, N) which states the number of words N having a given number of links L, with L from 1 to
149. There are three traditional ways of describing this kind of data.<br />
An inverse power law of the form N ~ 1/L^D. This means that the number of words N that have a given
linkiness L is proportional to 1/L^D.<br />
A Zipf style law (to be discussed a bit later in this section) in which we rank words from the most
linked to the least, and if R is a word’s rank order, then the linkiness L ~ 1/R^E.<br />
A Pareto style law that would say that for any linkiness L, the number of words M having linkiness
greater than L is M ~ 1/L^F.<br />
A paper by Lada A. Adamic, “Zipf, Power-laws, and Pareto - a Ranking Tutorial,” at
http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html, points out that we can get from a power law to
a Pareto style law to a Zipf style law and vice versa. Let’s see how to go from our power law data to a Zipf
form.<br />
Power. I have N = c/L^D. L is a linkiness level and N is the number of words at that level.<br />
Pareto. To get the number of words with linkiness greater than L_0, integrate c*L^(-D) with respect to L
from L_0 to infinity. Assume D > 1. I get (c/(1-D))*L^(1-D) evaluated from L_0 to ∞, which cooks down to
(c/(D-1)) / L_0^(D-1). So I can say that if M is the number of words with linkiness greater than L, then
M = (c/(D-1)) / L^(D-1).<br />
Zipf. If I rank words in order of linkiness, and the Rth ranking word has linkiness L, then the R words
of that rank or higher all have linkiness at least L, so in fact R is essentially the same as the M of
the Pareto form, and I can say R = (c/(D-1)) / L^(D-1). And now if I turn this around to solve for L in terms of R, I
get L = e / R^(1/(D-1)), where e = (c/(D-1))^(1/(D-1)).<br />
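The chain from power law to Pareto form to Zipf form can be sketched as two small Python functions. This is a minimal illustration, not the author's Mathematica work: the function names and the sample values of c and D are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def power_to_pareto(c, D):
    """N = c / L^D  ->  M = K / L^F, with K = c/(D-1) and F = D-1."""
    if D <= 1:
        # The integral from L_0 to infinity only converges for D > 1.
        raise ValueError("D must be greater than 1")
    return c / (D - 1), D - 1


def pareto_to_zipf(K, F):
    """M = K / L^F, read with R = M  ->  L = e / R^E, with e = K^(1/F), E = 1/F."""
    return K ** (1 / F), 1 / F


# Hypothetical parameters, not taken from Walker's data.
c, D = 1000.0, 3.0
K, F = power_to_pareto(c, D)   # K = 500.0, F = 2.0
e, E = pareto_to_zipf(K, F)    # e = 500 ** 0.5, E = 0.5

# Round trip: the rank R implied by a linkiness L maps back to the same L.
L = 5.0
R = K / L ** F                 # 500 / 25 = 20.0
L_back = e / R ** E
print(K, F, e, E, R, L_back)
```

The round trip at the end is just a consistency check that the Zipf form really is the Pareto form solved for L.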
Some Mathematica curve-fitting to Walker’s data gave me these numbers:<br />
Power Law. N = 1,000,000 / L^2.2.<br />
Pareto Style Law. M is the number of words with linkiness above L, and M = 833,333 / L^1.1.<br />
Zipf Style Law. L is the linkiness of the Rth ranking word, and L = 244,312 / R^0.91.<br />
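As a quick arithmetic check on the last step, the Pareto-style numbers quoted above can be inverted into Zipf form in Python. The variable names are hypothetical. The sketch assumes, as the derivation does, that the rank R can be read as the Pareto count M. It also suggests that the quoted Zipf coefficient corresponds to using the rounded exponent 0.91 rather than the unrounded 1/1.1.

```python
# Pareto-style law as quoted in the text: M = 833,333 / L^1.1
K, F = 833_333, 1.1

# Reading the rank R as M and solving for L gives L = K^(1/F) / R^(1/F).
zipf_exponent = 1 / F                    # about 0.909, quoted as 0.91
zipf_coeff_rounded = K ** 0.91           # coefficient with the rounded exponent
zipf_coeff_exact = K ** zipf_exponent    # coefficient with the unrounded exponent

print(round(zipf_exponent, 2))
print(round(zipf_coeff_rounded), round(zipf_coeff_exact))
```

The coefficient computed with the rounded exponent comes out close to the quoted 244,312, while the unrounded exponent gives a somewhat smaller value.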