Download Chapters 3-6 (.PDF) - ODBMS
6. DISCUSSION—POWER LAWS AND DEVIATIONS<br />
where μ and σ are parameters and A(μ, σ) is a constant (used for normalization if y(x) is a<br />
probability distribution). The DGX distribution has been used to fit the degree distribution of a<br />
bipartite “clickstream” graph linking websites and users (Figure 2.2(c)), telecommunications, and<br />
other data.<br />
6.2.3 DOUBLY-PARETO LOGNORMAL (DPLN)<br />
Another deviation is well modeled by the so-called Doubly Pareto Lognormal (dPln). Mitzenmacher<br />
[210] obtained good fits for file size distributions using dPln. Seshadri et al. [245] studied<br />
the distribution of phone calls per customer, and also found it to be a good fit. We will describe the<br />
results of Seshadri et al. below.<br />
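To get a feel for the distribution, dPln variates can be simulated through the standard product representation: a lognormal variable multiplied by the ratio of two independent Pareto variables, or, on the log scale, a normal variable plus the difference of two scaled exponentials. The following is a minimal sketch under that assumption, not the fitting procedure of [245]. The function name `sample_dpln` and the parameter names `alpha`, `beta`, `nu`, `tau` are illustrative, and the values used are arbitrary rather than the fitted ones.

```python
import math
import random

def sample_dpln(alpha, beta, nu, tau, rng=random):
    """Draw one dPln variate as exp(nu + tau*Z + E1/alpha - E2/beta).

    Z is standard normal; E1 and E2 are independent unit exponentials.
    alpha shapes the upper (large-x) tail and beta the lower (small-x) tail.
    Parameter names are illustrative assumptions, not taken from the text.
    """
    z = rng.gauss(0.0, 1.0)      # lognormal "body"
    e1 = rng.expovariate(1.0)    # exp(e1 / alpha) is Pareto with index alpha
    e2 = rng.expovariate(1.0)    # exp(-e2 / beta) gives the power-law lower tail
    return math.exp(nu + tau * z + e1 / alpha - e2 / beta)

# Arbitrary illustrative parameters (not the fitted values of Figure 6.1).
rng = random.Random(0)
samples = [sample_dpln(2.5, 1.5, 0.0, 0.5, rng) for _ in range(10)]
```

Plotting a density histogram of many such samples on log-log axes should show the two roughly linear regimes described in this section, joined by a curved lognormal-like middle.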
Informally, a random variable that follows the dPln distribution looks like the plots of Figure<br />
6.1: in log-log scales, the distribution is approximated by two lines that meet in the middle of<br />
the plot. More specifically, Figure 6.1 shows the empirical pdf (that is, the density histogram) for a<br />
switch in a telephone company, over a time period of several months. Plot (a) gives the distribution<br />
of the number of distinct partners (“callees”) per customer. The overwhelming majority of customers<br />
called only one person; until about 80-100 “callees,” a power law seems to fit well; but after that,<br />
there is a sudden drop, following a power law with a different slope. This is exactly the behavior of<br />
the dPln: piece-wise linear, in log-log scales. Similarly, Figure 6.1(b) shows the empirical pdf for the<br />
count of phone calls per customer: again, the vast majority of customers make just one phone call,<br />
with a piece-wise linear behavior, and the “knee” at around 200 phone calls. Figure 6.1(c) shows the<br />
empirical pdf for the count of minutes per customer. The qualitative behavior is the same: piece-wise<br />
linear, in log-log scales. Additional plots from the same source ([245]) showed similar behavior for<br />
several other switches and several other time intervals. In fact, the dataset in [245] included four<br />
switches, over month-long periods; each switch recorded calls made to and from callers who were<br />
physically present in a contiguous geographical area.<br />
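The "two lines in log-log scales" description can be checked numerically. The sketch below builds an empirical pdf on logarithmically spaced bins and fits one least-squares slope on each side of a knee. It is an illustration only: the helper names (`log_binned_pdf`, `loglog_slope`, `two_slopes`) are hypothetical, the knee is supplied by the caller (for example, read off a plot), and [245] fits the parametric dPln form rather than two separate lines. Treating count data as continuous within each bin is a further simplification.

```python
import math

def log_binned_pdf(values, bins_per_decade=5):
    """Empirical pdf on logarithmically spaced bins.

    Returns (bin_center, density) pairs for non-empty bins only.
    """
    values = [v for v in values if v > 0]
    n = len(values)
    lo = math.floor(math.log10(min(values)))   # exponent of the first bin edge
    counts = {}
    for v in values:
        k = math.floor((math.log10(v) - lo) * bins_per_decade)
        counts[k] = counts.get(k, 0) + 1
    pdf = []
    for k in sorted(counts):
        left = 10 ** (lo + k / bins_per_decade)
        right = 10 ** (lo + (k + 1) / bins_per_decade)
        center = math.sqrt(left * right)        # geometric midpoint of the bin
        pdf.append((center, counts[k] / (n * (right - left))))
    return pdf

def loglog_slope(points):
    """Least-squares slope of log10(density) against log10(x)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def two_slopes(pdf, knee):
    """Slopes of the log-log line segments below and above `knee`."""
    left = [p for p in pdf if p[0] < knee]
    right = [p for p in pdf if p[0] >= knee]
    return loglog_slope(left), loglog_slope(right)
```

For data like plot (a), one would call `two_slopes(log_binned_pdf(partner_counts), knee=90)`, where `partner_counts` is a hypothetical list of per-customer partner counts. Two clearly different slopes are consistent with the piece-wise linear shape, though they do not by themselves establish a dPln fit.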
[Figure 6.1 panels, each plotting PDF against the quantity on log-log axes, data versus fitted curve: (a) pdf of partners, fitted DPLN[2.8, 0.01, 0.35, 3.8]; (b) pdf of calls, fitted DPLN[2.8, 0.01, 0.55, 5.6]; (c) pdf of minutes (duration), fitted DPLN[2.5, 0.01, 0.45, 6.5].]
Figure 6.1: Results of using dPln to model (a) the number of call-partners, (b) the number of calls made,
and (c) total duration (in minutes) talked, by users at a telephone-company switch, during a given
time period.