11.07.2015 Views

A Data Streaming Algorithm for Estimating Entropies of OD Flows

A Data Streaming Algorithm for Estimating Entropies of OD Flows

A Data Streaming Algorithm for Estimating Entropies of OD Flows

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Fraction <strong>of</strong> experiments10.90.80.70.60.50.40.30.20.100 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5Relative Error35.5%14.5%5.3%3.3%2.5%Figure 8: Varying fraction <strong>of</strong> traffic from ingressFraction <strong>of</strong> experiments10.90.80.70.60.50.40.30.20.10Entropy error0 0.05 0.1 0.15 0.2 0.25Relative ErrorFigure 9: Error distribution <strong>for</strong> actual entropysource and destination. In Figure 9 we observe that the errorplot <strong>for</strong> the entropy is comparable to that <strong>for</strong> the entropynorm. Hence, this confirms the fact that our algorithm is arobust estimator <strong>of</strong> the entropy <strong>of</strong> <strong>OD</strong> flows.9. RELATED WORKThere has been considerable previous work in computingthe traffic matrix in a network [18, 21, 22, 23, 27]. The trafficmatrix is simply the matrix defined by the packet (byte)counts between each pair <strong>of</strong> ingress and egress nodes in thenetwork over some measurement interval. For fine-grainednetwork measurement we are sometimes interested in theflow matrix [28], which quantifies the volume <strong>of</strong> the trafficbetween individual <strong>OD</strong> flows. In this paper we propose thecomputation <strong>of</strong> the entropy <strong>of</strong> every <strong>OD</strong> flows, which givesus more in<strong>for</strong>mation than a simple traffic matrix and hasconsiderably less overhead than maintaining the entire flowmatrix.In the last few years, entropy has been suggested as a usefulmeasure <strong>for</strong> different network-monitoring applications.Lakhina et al. [15] use the entropy measure to per<strong>for</strong>m anomalydetection and network diagnosis. In<strong>for</strong>mation measures suchas entropy have been suggested <strong>for</strong> tracking malicious networkactivity [11, 24]. Entropy has been used by Xu etal. [25] to infer patterns <strong>of</strong> interesting activity by using itto cluster traffic. For detecting specific types <strong>of</strong> attacks,researchers have suggested the use <strong>of</strong> entropy <strong>of</strong> differenttraffic features <strong>for</strong> worm [24] and DDoS detection [11]. Recently,it has been shown that entropy-based techniques <strong>for</strong>anomaly detection are much more resistant to the effects <strong>of</strong>sampling [3] than other, volume-based methods.The use <strong>of</strong> stable distributions to produce a sketch wasfirst proposed in [12] to measure the L 1 distance betweentwo streams. This result was generalized to the L p distance<strong>for</strong> all 0 < p ≤ 2 in [8, 13]. This sketch data structure isthe starting point <strong>for</strong> the one that we propose in this paper.The main difference is that we have to make several keymodifications to make it work in practice. In particular, wehave to ensure that the number <strong>of</strong> updates per packet issmall enough to feasibly per<strong>for</strong>m in realtime.Other than <strong>for</strong> p = 1, 2, there is no known closed <strong>for</strong>m<strong>for</strong> the p-stable distribution. To independently draw valuesfrom an arbitrary p-stable distribution, we make use <strong>of</strong> the<strong>for</strong>mula proposed by [6]. Since this <strong>for</strong>mula is expensiveto compute, we create a lookup table to hold several precomputedvalues.In [7] it is suggested that the stable distribution sketchcan be used as a building block to compute empirical entropy,but no methods <strong>for</strong> doing this are suggested. Moreimportantly, there are already known to be algorithms thatwork well <strong>for</strong> the single stream case [16] and we believe thatit is <strong>for</strong> this distributed (i.e., traffic matrix) setting that thestable sketch really shines.10. CONCLUSIONIn this paper we motivate the problem <strong>of</strong> estimating theentropy between origin destination pairs in a network andpresent an algorithm <strong>for</strong> solving this problem. Along theway, we present a completely novel scheme <strong>for</strong> estimatingentropy, introduce applications <strong>for</strong> non-standard L p norms,present an extension <strong>of</strong> Indyk’s algorithm, and show howit can be used in our distributed setting. Via simulationon real-world data, collected at a tier-1 ISP, we are able todemonstrate that our algorithm is practically viable.11. REFERENCES[1] A. Chakrabarti, K. Do Ba, and S. Muthukrishnan.<strong>Estimating</strong> entropy and entropy norm on data streams. InSTACS, 2006.[2] L. Bhuvanagiri and S. Ganguly. <strong>Estimating</strong> entropy overdata streams. In ESA, 2006.[3] D. Brauckh<strong>of</strong>f, B. Tellenbach, A. Wagner, M. May, andA. Lakhina. Impact <strong>of</strong> packet sampling on anomalydetection metrics. In IMC, 2006.[4] G. Casella and R. L. Berger. Statistical Inference. Duxbury,2nd edition, 2002.[5] A. Chakrabarti and G. Cormode. A near-optimal algorithm<strong>for</strong> computing the entropy <strong>of</strong> a stream. In S<strong>OD</strong>A, 2007.[6] J. M. Chambers, C. L. Mallows, and B. W. Stuck. Amethod <strong>for</strong> simulating stable random variables. Journal <strong>of</strong>the American Statistical Association, 71(354), 1976.[7] G. Cormode. Stable distributions <strong>for</strong> stream computations:It’s as easy as 0,1,2. In Workshop on Management andProcessing <strong>of</strong> <strong>Data</strong> Streams, 2003.[8] G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishnan.Fast mining <strong>of</strong> massive tabular data via approximatedistance computations. In ICDE, 2002.[9] M. Durand and P. Flajolet. Loglog counting <strong>of</strong> largecardinalities. In ESA, 2003.[10] C. Estan and G. Varghese. New Directions in TrafficMeasurement and Accounting. In SIGCOMM, Aug. 2002.[11] L. Feinstein, D. Schnackenberg, R. Balupari, andD. Kindred. Statistical approaches to DDoS attackdetection and response. In Proceedings <strong>of</strong> the DARPAIn<strong>for</strong>mation Survivability Conference and Exposition, 2003.[12] P. Indyk. Stable distributions, pseudorandom generators,embeddings and data stream computation. In FOCS, 2000.[13] P. Indyk. Stable distributions, pseudorandom generators,embeddings, and data stream computation. J. ACM,53(3):307–323, 2006.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!