6. DISCUSSION: POWER LAWS AND DEVIATIONS

The approach avoids the loss of data due to averaging inside a histogram bin. To see what the plot of $F(x)$ versus $x$ will look like, we can bound $F(x)$:

$$\int_x^{\infty} A z^{-\gamma}\, dz$$
6.2. DEVIATIONS FROM POWER LAWS

distributions of subsets of the WWW with exponents 2.1 and 2.38-2.72 respectively [37, 61, 179], the in-degree distribution of the African web graph with exponent 1.92 [48], a citation graph with exponent 3 [237], distributions of website sizes and traffic [4], and many others. Newman [215] provides a long list of such works.

One may wonder: is every distribution a power law? The answer is no: there are deviations. On log-log scales, a parabola or some more complicated curve sometimes fits better. For example, Pennock et al. [231], and others, have observed deviations from a pure power-law distribution in several datasets. Common deviations are exponential cutoffs, the so-called “lognormal” distribution, and the “doubly-Pareto-lognormal” distribution. We briefly cover each of them next.

6.2.1 EXPONENTIAL CUTOFFS

Sometimes the distribution looks like a power law over the lower range of values along the x-axis, but decays very quickly for higher values. Often this decay is exponential, and it is then usually called an exponential cutoff:

$$y(x = k) \propto e^{-k/\kappa}\, k^{-\gamma} \tag{6.3}$$

where $e^{-k/\kappa}$ is the exponential cutoff term and $k^{-\gamma}$ is the power-law term. Amaral et al. [23] find such behavior in the electric power-grid graph of Southern California and in the network of airports, where the vertices are airports and the links are non-stop connections between them. They offer two possible explanations for the existence of such cutoffs. One: high-degree nodes might have taken a long time to acquire all their edges and might now be “aged,” which might lead them to attract fewer new edges (for example, older actors might act in fewer movies). Two: high-degree nodes might end up reaching their “capacity” to handle new edges; this might be the case for airports, where airlines prefer a small number of high-degree hubs for economic reasons but are constrained by limited airport capacity.
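As a rough numerical illustration of Equation (6.3), the following Python sketch evaluates the cutoff distribution on a finite range and compares its log-log slope at small and large k. The function names, the truncation point `k_max`, and the parameter values are assumptions made for this example only; they are not taken from Amaral et al. [23].

```python
import math

def cutoff_pmf(gamma, kappa, k_max):
    """Normalized y(k) proportional to exp(-k/kappa) * k**(-gamma), k = 1..k_max.

    Truncating at k_max is an assumption of this sketch; the tail beyond it
    is negligible when k_max is many multiples of kappa.
    """
    weights = [math.exp(-k / kappa) * k ** (-gamma) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(pmf, k1, k2):
    """Slope of the straight line through (ln k1, ln y(k1)) and (ln k2, ln y(k2))."""
    y1, y2 = pmf[k1 - 1], pmf[k2 - 1]
    return (math.log(y2) - math.log(y1)) / (math.log(k2) - math.log(k1))

# Illustrative parameter values (hypothetical, chosen only for the demo).
gamma, kappa = 2.1, 100.0
pmf = cutoff_pmf(gamma, kappa, k_max=2000)

head_slope = loglog_slope(pmf, 1, 2)        # k << kappa: close to -gamma
tail_slope = loglog_slope(pmf, 500, 1000)   # k >> kappa: much steeper
print(f"slope for small k: {head_slope:.3f}")
print(f"slope for large k: {tail_slope:.3f}")
```

Taking logarithms of Equation (6.3) gives a local log-log slope of $-\gamma - k/\kappa$, so the curve is nearly a straight line of slope $-\gamma$ while $k \ll \kappa$ and bends downward sharply once $k$ becomes comparable to $\kappa$.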
6.2.2 LOGNORMALS OR THE “DGX” DISTRIBUTION

Pennock et al. [231] found that, while the whole WWW does exhibit power-law degree distributions, subsets of the WWW (such as university homepages and newspaper homepages) deviate significantly. They observed unimodal distributions on the log-log scale. Similar distributions were studied by Bi et al. [46], who found that a discrete truncated lognormal (called the Discrete Gaussian Exponential or “DGX” by the authors) gives a very good fit. A lognormal is the distribution of a variable whose logarithm is Gaussian; its pdf (probability density function) looks like a parabola on log-log scales. The DGX distribution extends the lognormal to discrete distributions (which is what we get in degree distributions), and can be expressed by the formula:

$$y(x = k) = \frac{A(\mu, \sigma)}{k} \exp\left[-\frac{(\ln k - \mu)^2}{2\sigma^2}\right], \qquad k = 1, 2, \ldots \tag{6.4}$$
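Equation (6.4) can be evaluated numerically with a short Python sketch such as the one below. It assumes that $A(\mu, \sigma)$ is the constant that makes the probabilities sum to one, and it approximates the infinite sum by truncating at a hypothetical `k_max`. The function name and the parameter values are illustrative and do not come from Bi et al. [46].

```python
import math

def dgx_pmf(mu, sigma, k_max):
    """Evaluate Equation (6.4) for k = 1..k_max.

    Returns (A, pmf), where A is computed so that the truncated probabilities
    sum to 1. Truncation at k_max is an assumption of this sketch.
    """
    weights = [
        math.exp(-((math.log(k) - mu) ** 2) / (2 * sigma ** 2)) / k
        for k in range(1, k_max + 1)
    ]
    A = 1.0 / sum(weights)
    return A, [A * w for w in weights]

# Illustrative parameters (hypothetical values for the demo).
mu, sigma = 1.0, 1.0
A, pmf = dgx_pmf(mu, sigma, k_max=100_000)

# Sample ln y(k) at three points equally spaced in ln k (k = 1, 10, 100).
# For a parabola in log-log scales, the second difference divided by the
# squared step is constant and equals -1 / sigma**2.
log_y = [math.log(pmf[k - 1]) for k in (1, 10, 100)]
step = math.log(10)
curvature = (log_y[2] - 2 * log_y[1] + log_y[0]) / step ** 2
print(f"A = {A:.6f}, log-log curvature = {curvature:.6f}")
```

The constant curvature is what makes the DGX curve appear as a parabola on log-log axes, in contrast to the straight line of a pure power law.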