Download Chapters 3-6 (.PDF) - ODBMS

CHAPTER 6 

Discussion—Power Laws and 

Deviations 

6.1 POWER LAWS—SLOPE ESTIMATION 

We saw many power laws in the previous sections. Here we describe how to estimate the slope of a 

power law, and how to estimate the goodness of fit. We discuss these issues below, using the detection 

of power laws in degree distributions as an example. 

Computing the power-law exponent: This is no simple task: the power law could be only in the 

tail of the distribution and not over the entire distribution, estimators of the power-law exponent 

could be biased, some required assumptions may not hold, and so on. Several methods are currently 

employed, though there is no clear “winner” at present. 

1. Linear regression on the log-log scale: We could plot the data on a log-log scale, then optionally 

“bin” them into equal-sized buckets, and finally find the slope of the linear fit. However, there 

are at least three problems: (i) this can lead to biased estimates [130], (ii) sometimes the power 

law is only in the tail of the distribution, and the point where the tail begins needs to be 

hand-picked, and (iii) the right end of the distribution is very noisy [215]. Still, this is

the simplest technique, and it seems to be the most popular one.

2. Linear regression after logarithmic binning: This is the same as above, but the bin widths increase 

exponentially as we go toward the tail. That is, the number of data points in each bin

is counted, and then the height of each bin is divided by its width to normalize. Plotting the

histogram on a log-log scale makes the bins appear equally wide, and the power law can be fitted to

the heights of the bins. This reduces the noise in the tail buckets, fixing problem (iii). However,

binning leads to loss of information; all that we retain in a bin is its average. In addition, issues 

(i) and (ii) still exist. 

3. Regression on the cumulative distribution: We convert the pdf p(x) (that is, the scatter plot) into 

a cumulative distribution F(x): 

$$F(x) = P(X \geq x) = \sum_{z=x}^{\infty} p(z) = \sum_{z=x}^{\infty} A z^{-\gamma} \tag{6.1}$$
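
The three regression approaches listed so far can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`ols_slope`, `slope_raw`, `slope_log_binned`, `slope_ccdf`), the `k_min`/`k_max` cut-offs, the choice of base-2 bins, and the idealized synthetic `freq` table are all assumptions made for the example. It also assumes integer degrees, and it relies on the standard approximation that a pdf proportional to x^(-γ) has a cumulative distribution F(x) decaying roughly as x^(-(γ-1)), so the slope from method 3 estimates -(γ-1) rather than -γ.

```python
import math


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_raw(freq, k_min=1):
    """Method 1: regress log(count) on log(degree), without binning."""
    pts = [(k, c) for k, c in freq.items() if k >= k_min and c > 0]
    return ols_slope([math.log(k) for k, _ in pts],
                     [math.log(c) for _, c in pts])


def slope_log_binned(freq, k_min=1):
    """Method 2: bins [2^i, 2^(i+1)); each height is the bin total / bin width.

    If k_min is not a power of two, the first bin is only partly filled.
    """
    totals = {}
    for k, c in freq.items():
        if k >= k_min and c > 0:
            i = k.bit_length() - 1  # index of the power-of-two bin holding k
            totals[i] = totals.get(i, 0.0) + c
    xs = [math.log(2 ** (i + 0.5)) for i in totals]      # geometric bin centre
    ys = [math.log(totals[i] / 2 ** i) for i in totals]  # bin i has width 2^i
    return ols_slope(xs, ys)


def slope_ccdf(freq, k_min=1, k_max=None):
    """Method 3: regress log F(k) on log k, where F(k) = P(X >= k)."""
    total = sum(freq.values())
    xs, ys, tail = [], [], 0.0
    for k in sorted(freq, reverse=True):  # accumulate counts from the tail inward
        tail += freq[k]
        if k >= k_min and (k_max is None or k <= k_max) and tail > 0:
            xs.append(math.log(k))
            ys.append(math.log(tail / total))
    return ols_slope(xs, ys)


# Idealized, noise-free degree counts following count(k) = A * k^(-GAMMA).
# Real data would be noisy, especially in the tail.
GAMMA = 2.5
freq = {k: 1e6 * k ** -GAMMA for k in range(1, 1024)}

print("raw log-log slope:   ", slope_raw(freq))                          # near -GAMMA
print("log-binned slope:    ", slope_log_binned(freq, k_min=8))          # near -GAMMA
print("CCDF slope:          ", slope_ccdf(freq, k_min=10, k_max=100))    # near -(GAMMA - 1)
```

The `k_min` and `k_max` arguments stand in for the hand-picked tail boundary of problem (ii). On this noise-free table all three fits land close to the true exponent; on real, noisy degree counts the estimates can differ noticeably.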
