11.07.2015 Views

A Data Streaming Algorithm for Estimating Entropies of OD Flows

A Data Streaming Algorithm for Estimating Entropies of OD Flows

A Data Streaming Algorithm for Estimating Entropies of OD Flows

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

hardware problems) and is cost out <strong>for</strong> maintenance. Thetraffic flows that used to be carried on that link will mostlikely be redistributed to alternative paths. In the very rarecase where no alternative path exists or the network experienceslong convergence delay, the flows may abort completely.In either case, traffic distribution on a number <strong>of</strong>links across the network will change. It is possible that thetraffic volume changes on those impacted links are small,and sometimes too small to be detected on each link usingvolume based detection algorithms. However, the overallchange in the traffic distribution across the network can bequite large. If so, the examination <strong>of</strong> traffic entropy alongwith the traffic matrix <strong>for</strong> all <strong>OD</strong> flows is likely to enableus to capture such distributed and dynamically occurringevents.Another example is a DoS attack. When there is a distributedDoS attack launched from outside the network, itsimpact may not be immediately visible in the traffic matrixin terms <strong>of</strong> traffic volume change. However, it is very important<strong>for</strong> operators to be able to detect and mitigate theattack at the earliest time to minimize the possible negativeimpact on their services. That is, it may be too late<strong>for</strong> certain types <strong>of</strong> events to be detected when their impactbecomes visible in the traffic volume matrix. In addition,certain types <strong>of</strong> DoS attacks are difficult to detect due tothe low-rate nature <strong>of</strong> attack flows [14, 26]. In such cases,we hope to be able to achieve real time detection based ontraffic entropy estimation.1.2 Our solution and contributionsIn this work, we propose the first solution to the challengingproblem <strong>of</strong> estimating the entropies <strong>of</strong> all <strong>OD</strong> flows.Our key innovation is to invent a sketch that allows us toapproximately estimate not only (a) the entropy <strong>of</strong> a datastream, but also (b) the entropy <strong>of</strong> the intersection <strong>of</strong> twodata streams A and B from the sketches <strong>of</strong> A and B. We referto property (b) as the “intersection measurable property”(IMP) in the sequel. Note that all existing entropy estimationalgorithms (their sketches) have property (a), but none<strong>of</strong> them has IMP.The intersection measurable property (IMP) <strong>of</strong> our sketchleads to the following very elegant solution <strong>of</strong> our problem.Each participating ingress and egress node in the networkmaintains a sketch <strong>of</strong> the traffic that flows in/out <strong>of</strong> it duringa measurement interval. When we would like to measure theentropy <strong>of</strong> an <strong>OD</strong> flow stream between an ingress point i andan egress point j, we simply summon from node i the sketch<strong>of</strong> its ingress stream O i, and from node j the sketch <strong>of</strong> itsegress stream D j, and then infer the entropy <strong>of</strong> O i ∩ D jusing the sketches’ IMP. This solution is extremely costeffectivein that we will be able to estimate O(k 2 ) quantities(the entropies <strong>of</strong> a k by k <strong>OD</strong> flow stream matrix) usingonly O(k) sketches. However sketches that have IMP aremuch harder to design than those without it, and hence thesignificant challenge we have in this work.Our sketch builds upon and extends the L p sketch <strong>of</strong> Indyk[12] with significant additional innovations. Indyk’ssketch was designed <strong>for</strong> the estimation <strong>of</strong> the L p norm <strong>of</strong>a stream (defined later). We observe that Indyk’s L p sketchhas the desirable IMP, and there<strong>for</strong>e allows <strong>for</strong> the estimation<strong>of</strong> the L p norm <strong>of</strong> an <strong>OD</strong> flow O i ∩ D j using the L psketches at O i and D j. However, we have to develop theIMP theory <strong>of</strong> L p sketch ourselves since IMP was never evenclaimed in [13] (probably because there was no applicationin sight <strong>for</strong> IMP at that time).Our most important contribution is to discover that theentropy <strong>of</strong> an <strong>OD</strong> flow can be approximated by a function<strong>of</strong> just two L p norms (L p1 and L p2 , p 1 ≠ p 2) <strong>of</strong> the <strong>OD</strong>flow. With this insight, our sketch at each ingress and egresspoint is simply one L p1 sketch and one L p2 sketch. In thisway, we not only solve this difficult problem, but also find anice application <strong>for</strong> the L p sketch. From a theoretical point<strong>of</strong> view, our solution resolves one <strong>of</strong> the open questions <strong>of</strong>Cormode [7].Our contributions can be summarized as follows. First,our discovery builds a mathematical connection between ourproblem and Indyk’s L p norm estimation problem, leadingto a highly cost-effective solution. Second, we extend Indyk’s(ɛ, δ) analysis <strong>of</strong> L p sketches [13] using asymptotic normality<strong>of</strong> order statistics, leading to much tighter (yet slightlyless rigorous) accuracy bounds. We also modify Indyk’s algorithmto make it run fast enough <strong>for</strong> processing traffic atvery high-speed links (e.g., 10 million packets per second).Finally, we thoroughly evaluate the accuracy <strong>of</strong> our solutionby simulating it on real-world packet traces collected at atier-1 ISP backbone. We show that our algorithm deliversvery accurate estimations <strong>of</strong> the <strong>OD</strong> flow entropies.The remainder <strong>of</strong> this paper is organized as follows. InSection 2 we give a high-level overview <strong>of</strong> how the differentcomponents <strong>of</strong> our entire scheme works. In Section 3through 6 we describe the major components <strong>of</strong> our algorithm.• In Section 3 we introduce stable distributions and describeIndyk’s algorithm to estimate the L p norm usingthem.• In Section 4 we show how to estimate entropy from L pnorm.• In Section 5 we show the intersection measurable property(IMP) <strong>of</strong> the L p norm algorithm.• In Section 6 we modify the L p norm algorithm to improveits accuracy under tight resource constraints.In Section 7 we discuss the hardware implementation <strong>of</strong>our algorithm. In Section 8 we demonstrate the accuracy<strong>of</strong> our algorithm via simulations run on real-world data. InSection 9 we briefly survey previous work that is related toours. Finally, we conclude in Section 10. The Appendixcontains the pro<strong>of</strong> <strong>of</strong> some <strong>of</strong> the theorems.2. PROBLEM STATEMENT AND OVERVIEWIn this section, we describe precisely the problem <strong>of</strong> estimatingthe entropies <strong>of</strong> <strong>OD</strong> flows and <strong>of</strong>fer an overview <strong>of</strong>our solution approach. As defined be<strong>for</strong>e in [16], given astream S <strong>of</strong> packets that contains n (transport-layer) flowswith sizes (number <strong>of</strong> packets) a 1, a 2, ..., a n respectively,its empirical entropy H(S) is defined as − P n a ii=1 slog 2 ( a is),where s = P ni=1ai is the total number <strong>of</strong> packets in S.1We have shown be<strong>for</strong>e [16] that to estimate the empirical1 Throughout this paper logarithms will be base e except inthe definition <strong>of</strong> empirical entropy here. Since the logarithmbase e and the logarithm base 2 are different by a constantmultiplicative factor, it does not affect our analysis <strong>of</strong> relativeerrors.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!