02.04.2013 Views

Download Chapters 3-6 (.PDF) - ODBMS


6. DISCUSSION: POWER LAWS AND DEVIATIONS

where μ and σ are parameters and A(μ, σ) is a constant (used for normalization if y(x) is a
probability distribution). The DGX distribution has been used to fit the degree distribution of a
bipartite “clickstream” graph linking websites and users (Figure 2.2(c)), telecommunications, and
other data.
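The role of the normalization constant A(μ, σ) can be illustrated numerically. The sketch below assumes the usual DGX form, a lognormal shape restricted to the positive integers, y(k) = A(μ, σ)/k · exp(−(ln k − μ)² / (2σ²)) for k = 1, 2, ...; the defining equation sits just before this passage and is not repeated here, so that form is an assumption. The function name, the truncation point `k_max`, and the parameter values are all hypothetical.

```python
import math

def dgx_pmf(mu, sigma, k_max):
    """Return (A, [p(1), ..., p(k_max)]) for a DGX distribution truncated at k_max.

    Assumed form: p(k) = A / k * exp(-(ln k - mu)^2 / (2 sigma^2)), k = 1, 2, ...
    """
    weights = [
        math.exp(-((math.log(k) - mu) ** 2) / (2.0 * sigma ** 2)) / k
        for k in range(1, k_max + 1)
    ]
    total = sum(weights)
    normalizer = 1.0 / total  # plays the role of A(mu, sigma), up to truncation error
    return normalizer, [w * normalizer for w in weights]

# Hypothetical parameter values, for illustration only.
A, pmf = dgx_pmf(mu=1.0, sigma=1.0, k_max=100_000)
print(A, pmf[:3])
```

Because the unnormalized terms decay like a lognormal tail, truncating the sum at a large `k_max` changes A only negligibly for moderate σ; for heavier settings the cutoff would need to grow.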

6.2.3 DOUBLY-PARETO LOGNORMAL (DPLN)

Another deviation is well modeled by the so-called Doubly Pareto Lognormal (dPln). Mitzenmacher
[210] obtained good fits for file size distributions using dPln. Seshadri et al. [245] studied
the distribution of phone calls per customer, and also found dPln to be a good fit. We will describe
the results of Seshadri et al. below.

Informally, a random variable that follows the dPln distribution looks like the plots of Figure
6.1: in log-log scales, the distribution is approximated by two lines that meet in the middle of
the plot. More specifically, Figure 6.1 shows the empirical pdf (that is, the density histogram) for a
switch in a telephone company, over a time period of several months. Figure 6.1(a) gives the
distribution of the number of distinct partners (“callees”) per customer. The overwhelming majority
of customers called only one person; up to about 80-100 “callees,” a power law seems to fit well,
but after that there is a sudden drop that follows a power law with a different slope. This is exactly
the behavior of the dPln: piece-wise linear in log-log scales. Similarly, Figure 6.1(b) shows the
empirical pdf for the count of phone calls per customer: again, the vast majority of customers make
just one phone call, the behavior is piece-wise linear, and the “knee” is at around 200 phone calls.
Figure 6.1(c) shows the empirical pdf for the count of minutes per customer. The qualitative behavior
is the same: piece-wise linear in log-log scales. Additional plots from the same source [245] showed
similar behavior for several other switches and several other time intervals. In fact, the dataset in
[245] included four switches, over month-long periods; each switch recorded calls made to and from
callers who were physically present in a contiguous geographical area.
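The "two lines in log-log scales" behavior can be checked against a closed-form density. The sketch below assumes the standard four-parameter dPln form of Reed and Jorgensen, which this section does not restate: log X is a Normal(ν, τ²) variable plus the difference of two exponential variables with rates α and β. Under that assumption the density has log-log slope −(α + 1) far above exp(ν) and β − 1 far below it. The function names are illustrative, and the parameter values are hypothetical rather than the fitted values reported in Figure 6.1.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with erfc so far tails stay accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def dpln_pdf(x, alpha, beta, nu, tau):
    """Density of the assumed dPln(alpha, beta, nu, tau) form at x > 0."""
    y = math.log(x)
    scale = alpha * beta / (alpha + beta)
    # Upper-tail term: behaves like x^(-alpha-1) once x is well above exp(nu).
    upper = math.exp(alpha * nu + 0.5 * (alpha * tau) ** 2 - (alpha + 1.0) * y)
    upper *= normal_cdf((y - nu - alpha * tau ** 2) / tau)
    # Lower-tail term: behaves like x^(beta-1) once x is well below exp(nu).
    lower = math.exp(-beta * nu + 0.5 * (beta * tau) ** 2 + (beta - 1.0) * y)
    lower *= normal_cdf(-(y - nu + beta * tau ** 2) / tau)
    return scale * (upper + lower)

def loglog_slope(x, alpha, beta, nu, tau, h=1e-3):
    """Central-difference estimate of d log f / d log x at x."""
    hi = dpln_pdf(x * math.exp(h), alpha, beta, nu, tau)
    lo = dpln_pdf(x * math.exp(-h), alpha, beta, nu, tau)
    return (math.log(hi) - math.log(lo)) / (2.0 * h)

# Hypothetical parameters, chosen only to make the two regimes visible.
ALPHA, BETA, NU, TAU = 2.5, 0.5, 4.0, 0.5

left_slope = loglog_slope(math.exp(NU - 8.0), ALPHA, BETA, NU, TAU)
right_slope = loglog_slope(math.exp(NU + 8.0), ALPHA, BETA, NU, TAU)
print(left_slope, right_slope)
```

With these values the two printed slopes should land close to β − 1 and −(α + 1) respectively; the "knee" sits in the transition region around exp(ν), whose width is governed by τ.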

[Figure 6.1: three log-log plots of the empirical PDF (“Data”) with a fitted dPln curve. (a) pdf of partners: x-axis “Partners”, fitted DPLN[2.8, 0.01, 0.35, 3.8]. (b) pdf of calls: x-axis “Calls”, fitted DPLN[2.8, 0.01, 0.55, 5.6]. (c) pdf of minutes: x-axis “Duration”, fitted DPLN[2.5, 0.01, 0.45, 6.5].]

Figure 6.1: Results of using dPln to model (a) the number of call-partners, (b) the number of calls made,
and (c) total duration (in minutes) talked, by users at a telephone-company switch, during a given
time period.
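An empirical pdf of the kind plotted in Figure 6.1 is typically built as a density histogram with logarithmically spaced bins. The sketch below shows one way to do that on synthetic data. It assumes the Reed and Jorgensen construction of dPln (log X equals a normal variable plus the difference of two exponentials), which is not spelled out in this section; the parameter values, function names, and bin resolution are hypothetical. The synthetic samples are continuous, whereas the telephone data are counts.

```python
import math
import random

def sample_dpln(alpha, beta, nu, tau, n, seed=0):
    """Draw n samples with log X = nu + tau*Z + E1/alpha - E2/beta (assumed form)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        log_x = (nu + tau * rng.gauss(0.0, 1.0)
                 + rng.expovariate(1.0) / alpha
                 - rng.expovariate(1.0) / beta)
        samples.append(math.exp(log_x))
    return samples

def log_binned_density(samples, bins_per_decade=5):
    """Return (left_edge, right_edge, density) for log-spaced bins covering the data."""
    lo = math.log10(min(samples))
    hi = math.log10(max(samples))
    if hi == lo:
        raise ValueError("need at least two distinct sample values")
    n_bins = max(1, math.ceil((hi - lo) * bins_per_decade))
    step = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # Clamp so the maximum sample falls in the last bin.
        index = min(int((math.log10(x) - lo) / step), n_bins - 1)
        counts[index] += 1
    n = len(samples)
    bins = []
    for i, count in enumerate(counts):
        left = 10.0 ** (lo + i * step)
        right = 10.0 ** (lo + (i + 1) * step)
        # Divide by bin width so the result is a density, not a raw count.
        bins.append((left, right, count / (n * (right - left))))
    return bins

# Hypothetical parameters; not the fitted values from the figure.
samples = sample_dpln(alpha=2.5, beta=0.5, nu=4.0, tau=0.5, n=20_000, seed=1)
for left, right, density in log_binned_density(samples)[:3]:
    print(f"[{left:.3g}, {right:.3g}): {density:.3g}")
```

Plotting the bin centers against the densities on log-log axes should show the two roughly linear regimes; dividing by bin width matters because log-spaced bins grow with x, and raw counts would distort the slopes.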
