A Unifying Model and Analysis of P2P VoD Replication and ...
popularity. Since the peer request population is stationary (although random), it could have been taken into consideration, and is by PFS. The result is that FBA is ineffective in reducing server load⁶.

⁶In [7], the peers and the bandwidth they allocate to each movie are adjusted by an algorithm to deal with peer churn and movie popularity churn. Such adaptation can indirectly help amend the limitations of FBA.

B. Analysis of FSFD

First, we derive the bandwidth a peer obtains by sending one request to a neighbor i. For a peer i, we use λ_i to represent the expected number of sub-requests it receives to serve the movies replicated in its local storage. If a peer viewing movie j continues to get bandwidth from peer i, the expected bandwidth it obtains from peer i is:

$$E[X_j(i)] = \frac{1}{\lambda_i}. \qquad (5)$$

The proof is in the Appendix. This leads to the following proposition:

Proposition 2: In FSFD, proportional movie replication achieves load balancing in peer bandwidth allocation as defined in EQ. 3. The average server load is bounded by $B \le \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{y}}$.

The bound is derived by considering the case when Var[X_j] is maximized; the detailed proof is in the Appendix. With a fixed number of copies C_j = NL × η_j, we propose the following content placement algorithm to minimize Var[X_j].

Algorithm 1 Content Placement Algorithm for FSFD
1: for j = 1 to K do
2:   C_j = NL × η_j
3: end for
4: for i = 1 to N do
5:   Peer i randomly selects L movies to form the movie set Q_i;
6:   for all j ∈ Q_i do
7:     C_j = C_j − 1
8:     if C_j ≤ 0 then
9:       Never select movie j any more
10:    end if
11:  end for
12: end for

This algorithm is adapted from RLB; the main difference is that it enforces proportional replication in the FSFD case. By randomly allocating movies to Q_i, the variance of X_j, and hence the average server load, is minimized as in RLB. Here, when a peer selects neighboring peers to request service, it must also select at random, to minimize the correlation among the X_j(i) from different neighbors i and thereby minimize the variance of X_j for each movie j.
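To make the placement procedure concrete, here is a minimal Python sketch of Algorithm 1. The rounding of the copy budgets NL × η_j to integers, and the fallback when budgets run out early (peers may then store fewer than L movies), are our own assumptions, not specified by the pseudocode above.

import random

def place_content_fsfd(N, L, K, eta, seed=0):
    """Randomized proportional content placement (sketch of Algorithm 1).

    N peers each store up to L movies; movie j has popularity eta[j].
    Returns Q, where Q[i] is the movie set of peer i.
    """
    rng = random.Random(seed)
    # Lines 1-3: copy budget per movie, proportional to popularity.
    budget = [max(1, round(N * L * eta[j])) for j in range(K)]
    selectable = list(range(K))
    Q = []
    for _ in range(N):
        # Line 5: randomly select L movies among those still selectable.
        picks = rng.sample(selectable, min(L, len(selectable)))
        for j in picks:
            budget[j] -= 1                # line 7
            if budget[j] <= 0:            # lines 8-10: never select j again
                selectable.remove(j)
        Q.append(set(picks))
    return Q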
Selecting only one neighbor for service, i.e. y = 1, is a special case, studied in [4]. Its performance is the worst compared to FSFD with y > 1, which is consistent with the result in [4].

C. Analysis of PFS

Since the PFS case is analyzed in [6], we discuss it only briefly here; for details, please refer to [6].

Proposition 3: In PFS, the average server load is bounded by $B \le \frac{1}{\sqrt{2\pi}} \sqrt{\frac{NK}{L}}$.

Proof: The worst case happens when peers form several clusters, where the peers in the same cluster replicate exactly the same movie set. Each cluster can be analyzed as a super server providing service. The proof is very similar to the proof for FSFD; the difference is that the cluster size depends on the sum of the replicated movies' popularity.

D. The power of unlimited service distribution

In this section, we summarize and discuss the results so far.
• The optimal movie replication strategy depends on the request scheduling model assumed. For both FBA and FSFD, the optimal replication is proportional to movie popularity, whereas for PFS, the optimal replication is biased towards cold movies.
• PFS and its optimal replication strategy (RLB) outperform FBA and FSFD based on worst-case analysis, as summarized in Table I.

TABLE I
COMPARISON OF DIFFERENT REQUEST SCHEDULING

       Worst-case Expected Server Load
FBA    $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{H/L}}$
FSFD   $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{y}}$, $y \le H$
PFS    $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{H}}$

In Table I, we use H = NL/K to represent the average storage resource per movie provided by the P2P system. This allows us to express the worst-case bounds of the expected server load for all three cases in terms of N, L, H and y, for ease of comparison. Clearly, PFS is the best. As y becomes large (approaching H), the performance of FSFD becomes the same as that of PFS. We can consider PFS to represent unlimited service distribution, whereas FSFD is limited service distribution. From this comparison, we have characterized the power of unlimited service distribution for load balancing. Unfortunately, PFS (like a fluid approximation) is not practical in real-world implementations. Yet FSFD is also problematic, since it requires y to be no larger than the minimum copy number over all movies (for cold movies this number can be very small). This motivates us to consider a unifying service scheduling model and the corresponding replication algorithm in the next two sections.
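As a quick numeric reading of Table I, the sketch below evaluates the three bounds; the function name and the use of min(y, H) to keep the FSFD entry inside its validity range y ≤ H are our additions.

import math

def worst_case_bounds(N, L, K, y):
    """Worst-case expected server load from Table I, with H = N*L/K."""
    H = N * L / K
    c = 1 / math.sqrt(2 * math.pi)
    return {
        "FBA":  c * N / math.sqrt(H / L),      # equals sqrt(N*K / (2*pi))
        "FSFD": c * N / math.sqrt(min(y, H)),  # table entry valid for y <= H
        "PFS":  c * N / math.sqrt(H),          # equals sqrt(N*K / (2*pi*L))
    }

# With the simulation setting of Section VII (N=4000, L=4, K=400, so H=40),
# FBA gives ~505, PFS ~252, and FSFD moves between them as y grows.
print(worst_case_bounds(4000, 4, 400, y=10))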


IV. A UNIFYING SCHEDULING MODEL

In practical systems, the adoption of FSFD scheduling has some difficulties because it is not realistic to maintain a fixed value of y for all movie requests. Given a skewed movie popularity distribution, cold movies are replicated with few copies, and FSFD means we have to set y to be no bigger than the minimum number of copies of any movie in the system. Therefore, we introduce Fair Sharing with Bounded Out-degree (FSBD), defined as follows:
• each peer is allowed to send out at most Y sub-requests to other peers. The actual number of sub-requests sent by a peer viewing movie j is y = min(Y, C_j).

Based on the analysis in Section III, Y is a key parameter that determines the optimal replication strategy and system performance. If Y = +∞, the scheduling is the same as PFS, first defined in [6]. If y = Y for all movie sub-requests, then the scheduling is the same as FSFD. Thus FSBD is more general, and its analysis is more complicated than that of FSFD or PFS.

We use I(j, i) to indicate whether movie j is stored by peer i:

$$I(j,i) = \begin{cases} 1, & j \in Q_i \\ 0, & \text{otherwise.} \end{cases}$$

The probability of getting bandwidth from a copy of movie j is min(Y/C_j, 1). According to EQ. 5, the expected bandwidth obtained from a single copy is $\frac{1}{\sum_{k \in Q_i} n_k \min(Y/C_k, 1)}$. For the FSBD case, EQ. 4 can be re-expressed as the following optimization problem:

$$\min \sum_{j=1}^{K} \eta_j \,\mathrm{Var}[X_j] \qquad (6)$$

$$\text{s.t.} \quad \sum_{i=1}^{N} I(j,i) \times \frac{\min(Y/C_j, 1)}{\sum_{k \in Q_i} n_k \min(Y/C_k, 1)} = 1,$$

$$C_j = \sum_{i=1}^{N} I(j,i), \quad 1 \le j \le K,$$

$$|Q_i| = L, \quad 1 \le i \le N.$$

From the proof of Proposition 2 and reference [6], the worst-case server load is achieved when the peers form different clusters such that X_j(i) has perfect correlation over all i providing bandwidth to the same peer, in which case the variance of X_j is maximized. In each cluster, peers store exactly the same movie set Q_i. This worst-case analysis can also be extended to FSBD. We assume that there are K/L clusters, and let the movie set in the i-th cluster be Q_i. To satisfy EQ. 3, the peer population in the i-th cluster is $R_i = \sum_{j \in Q_i} N_j$. Therefore, the expected server load caused by the i-th cluster is:

$$\frac{1}{\sqrt{2\pi}} \frac{R_i}{\sqrt{\min(Y, R_i)}}. \qquad (7)$$

The total expected server load is:

$$\frac{1}{\sqrt{2\pi}} \sum_{i=1}^{K/L} \frac{R_i}{\sqrt{\min(Y, R_i)}}, \qquad \text{s.t.} \quad \sum_{i=1}^{K/L} R_i = N. \qquad (8)$$

The value of this upper bound depends on the value of min(Y, R_i) for each cluster i. If Y = +∞, then min(Y, R_i) = R_i and EQ. 8 becomes:

$$B = \frac{1}{\sqrt{2\pi}} \sum_{i=1}^{K/L} \sqrt{R_i}.$$

Because $\sqrt{R_i}$ is concave, the maximum value of B is $\frac{1}{\sqrt{2\pi}} \sqrt{\frac{NK}{L}}$, achieved when $R_1 = \cdots = R_{K/L} = \frac{NL}{K}$. The result is exactly the same as in the PFS case. On the other hand, if min(Y, R_i) = Y for every cluster i, EQ. 8 simplifies to:

$$B = \frac{1}{\sqrt{2\pi}} \sum_{i=1}^{K/L} \frac{R_i}{\sqrt{Y}} = \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{Y}},$$

which is the same as the server load bound derived in Proposition 2. For the FSBD case, we have the following result:

Proposition 4: When K/L ≫ 1, the expected server load can be approximately expressed as:

$$B \le \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{Y}} \left(1 + \frac{Y}{4H}\right), \quad \text{when } Y \in [1, 4H), \qquad (9)$$

and

$$B \le \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{H}}, \quad \text{when } Y \in [4H, +\infty). \qquad (10)$$

Here, H is the average storage resource per movie, i.e. H = NL/K.

The detailed proof is in the Appendix. It is not trivial to achieve the upper bound for a given movie popularity distribution η_1, ..., η_K. Each cluster is like a bag: allocating K movies to K/L clusters without repetition, such that each cluster receives L movies and the maximum expected server load is attained, is a classical integer programming problem, and no polynomial-time algorithm is known for it. But our analysis proves that the expected server load will not exceed the result given in Proposition 4, even if the worst case happens.
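Proposition 4 is easy to evaluate numerically; below is a minimal sketch (the function name is ours).

import math

def fsbd_worst_case(N, L, K, Y):
    """Approximate worst-case server load under FSBD (Proposition 4)."""
    H = N * L / K                       # average storage resource per movie
    c = 1 / math.sqrt(2 * math.pi)
    if Y < 4 * H:
        return c * (N / math.sqrt(Y)) * (1 + Y / (4 * H))   # EQ. 9
    return c * N / math.sqrt(H)                             # EQ. 10

# The bound interpolates between the FSFD-like regime (small Y) and the
# PFS bound: with N=4000, L=4, K=400 it is ~536 at Y=10 and ~252 for
# all Y >= 4H = 160.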
V. HETEROGENEOUS CASE

Proposition 5: In the heterogeneous upload capacity case, the server loads under FBA, FSFD, PFS and FSBD share the same bounds as in the homogeneous case.

Proof:
• PFS: It is proved in [6] that the bound on server load in the heterogeneous upload capacity case with PFS is the same as the bound in the homogeneous upload capacity case.
• FBA: FBA allocates bandwidth to any particular movie according to the movie's popularity. In the heterogeneous case, the bandwidth allocated by one copy should be adjusted by the upload capacity of the peer that replicates the movie. Thus, line 7 in the Proportional Content Placement Algorithm should become C_j = C_j − U_i.
• FSFD and FSBD: The worst case of FSFD and FSBD is achieved when peers form into different clusters, where the peers in each cluster replicate exactly the same movie set. In the homogeneous case, the size of a cluster equals the sum of the expected peer populations viewing its movie set. In the heterogeneous case, the cluster size should be adjusted by the upload capacity of each peer in the cluster, such that $R_j = \sum_{i \in \mathrm{Cluster}_j} U_i$, where R_j is the expected peer population viewing the movies replicated by cluster j and U_i is the upload capacity of peer i.


We will propose a distributed adaptive movie replication algorithm for P2P VoD systems. The algorithm is applicable to both heterogeneous and homogeneous networks.

VI. REPLICATION ALGORITHM

For both FSFD and PFS, randomization is used to place movies in Q_i so as to minimize the variance of the service capacity for each movie. For FSFD, random placement works because sub-requests for a movie are sent randomly to the peers holding that movie; for PFS, it works because all (randomly selected) peers storing a movie serve requests for that movie together. But it is a challenge to design or construct a solution that satisfies the constraints in EQ. 6 and minimizes its objective function in polynomial time. Therefore, we propose a new distributed and adaptive replication (DAR) algorithm that iteratively converges to a solution that balances load as well as minimizes the variance of each movie's rate.

The idea of our algorithm is simple. Each peer i viewing movie j uses a sliding window to calculate the average streaming rate d_j provided by other peers (i.e. excluding any contribution from the server). After a peer finishes viewing a movie, a replication decision is made based on the value of d_j. The peer also obtains the current value of d_k for each k ∈ Q_i by communicating with other peers (or a central server maintaining such information). Then, the peer selects the stored movie with the maximum downloading rate d_max. If d_j < d_max, the peer stores movie j locally, replacing the movie with downloading rate d_max; otherwise, movie j is discarded. The only overhead generated is the exchange of the d_j information between peers, and it is incurred only when there exist sub-requests for a particular movie. Compared with the transmission of the movie content, this overhead is negligible.

Note that this algorithm is independent of the out-degree bound Y of the scheduling policy. In FSBD, the peers viewing the same movie are expected to get the same expected downloading rate due to the fair-sharing service. Therefore, the bandwidth allocation for movie j is reflected in the downloading rate from other peers for any particular peer viewing movie j. For any particular movie j, if no peer has viewed it for a while, the (default) downloading rate is set to d_j = +∞ for replication decision purposes. This replication algorithm is in fact a distributed iterative algorithm for solving the optimization problem in EQ. 6. In each round, the system allocates more bandwidth resource to movies that satisfy the inequality E[X] < 1; the movie with the largest excess bandwidth resource E[X] − 1 is replaced with the highest priority.

Algorithm 2 Distributed and Adaptive Replication (DAR)
1: Peer i viewing movie j uses a sliding window to estimate the average downloading rate d_j from other peers.
2: After viewing, peer i has movie j as a candidate for replication.
3: d_max = 0
4: for all k ∈ Q_i do
5:   Get the value of d_k from other peers viewing movie k
6:   if d_k > d_max then
7:     d_max = d_k
8:     ID_max = k
9:   end if
10: end for
11: if d_max > d_j then
12:   Use movie j to replace movie ID_max
13: end if
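Below is a minimal Python rendering of one DAR replication decision (Algorithm 2). The callback get_rate, standing in for the peer-to-peer (or server-assisted) exchange of d_k values, and the guard against re-storing an already-stored movie are our assumptions.

import math

def dar_decision(j, d_j, Q_i, get_rate):
    """One DAR replication decision (sketch of Algorithm 2).

    j        movie just finished viewing (replication candidate)
    d_j      sliding-window average download rate observed for movie j
    Q_i      mutable list of movies currently stored by this peer
    get_rate get_rate(k) returns the current d_k reported by peers
             viewing movie k (math.inf if nobody is viewing it)
    """
    if j in Q_i:                     # assumed guard: already stored
        return
    d_max, id_max = 0.0, None        # lines 3-10: find the best-served movie
    for k in Q_i:
        d_k = get_rate(k)
        if d_k > d_max:
            d_max, id_max = d_k, k
    if d_max > d_j:                  # lines 11-13: replace it with movie j
        Q_i[Q_i.index(id_max)] = j

A movie that nobody is viewing reports d_k = +∞ and is therefore evicted first, matching the default rule stated above.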
Finally, a word about the heterogeneous peer case: although all the discussion so far is for the homogeneous peer case, it is not difficult to extend the replication algorithms (at least in theoretical terms) to the heterogeneous case, in a similar fashion as in [6]. But due to limited space, we cannot include that discussion here.

VII. SIMULATION

Simulation is used to validate the performance bounds on server load, and to compare different replication algorithms.

A. Simulation Setting

Time is divided into time slots. As reported by N. Venkatasubramanian et al. [9], movie popularity is assumed to follow the Zipf distribution. If movies are ranked by popularity in descending order, the popularity of the j-th movie can be expressed as in Eq. (11):

$$\eta_j = \frac{j^{-\theta}}{\sum_{i=1}^{K} i^{-\theta}}, \qquad (11)$$

where θ is a parameter reflecting the skewness of the distribution curve: the larger the value of θ, the more skewed the popularity distribution. As suggested by [9], the value of θ is usually in the range [0.271, 1].

For all our experiments, we set the peer population N = 4000 and the movie population K = 400. Each peer can store L = 4 movies.
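The Zipf popularities of Eq. (11) can be generated directly; a minimal sketch follows (the function name is ours).

def zipf_popularity(K, theta):
    """Movie popularities from Eq. (11): eta_j proportional to j**(-theta)."""
    weights = [j ** -theta for j in range(1, K + 1)]
    total = sum(weights)
    return [w / total for w in weights]

eta = zipf_popularity(K=400, theta=0.6)   # the setting used in Section VII-C
assert abs(sum(eta) - 1.0) < 1e-9         # popularities form a distribution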


B. FBA vs. FSFD vs. PFS

The first simulation experiment tries to validate our propositions. We set θ = 0, i.e. all movies have the same popularity, and by symmetry they are all replicated with the same number of copies to balance the load. This setting allows us to vary the out-degree y from 1 to NL/K. For FBA, we use the proportional content placement algorithm and FBA scheduling, which is independent of the out-degree y; the result is a flat curve. For FSFD, the out-degree varies from 1 to NL/K = 40. We also plot the worst-case bound of the expected server load for the PFS strategy (the PFS - Worst Case curve). The result is shown in Fig. 1. We observe: (a) The server load consumed by the FBA algorithm is very close to the numerical result calculated by our model. For FSFD, proportional replication reduces the server load as we increase the out-degree y; in particular, when y ≥ 40, the worst-case bounds for FSFD and PFS converge. (b) Overall, FBA is not an efficient strategy. Only when the out-degree is very small does FBA outperform FSFD.

[Fig. 1. Proportional content placement with θ = 0 and homogeneous upload capacity.]

C. FSBD

From our analysis, the optimal replication for FSFD is proportional, whereas PFS favors cold movies more. We then designed the DAR algorithm for FSBD as described in the last section. Since FSBD is like FSFD when Y is small, and like PFS when Y is large, it is interesting to find out how the DAR algorithm behaves as we vary Y. To help visualize the behavior of the DAR algorithm, we define the following metric:

$$D = \sum_{j=1}^{K} |C_j - NL \times \eta_j|. \qquad (12)$$

Given a replication solution (Q_i), D is its distance from the proportional movie replication solution.
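Given the copy counts C_j of a replication solution, D is one line of code; this sketch assumes C and eta are indexed from 0.

def distance_from_proportional(C, N, L, eta):
    """Distance D of Eq. (12): sum over movies of |C_j - N*L*eta_j|."""
    return sum(abs(C[j] - N * L * eta[j]) for j in range(len(C)))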
In this simulation experiment, we compare several adaptive replication algorithms using their distance D for different values of Y (from 1 to 100). The results are shown in Fig. 2 and Fig. 3. Besides the DAR algorithm, the other two adaptive algorithms considered are ARLB (used for PFS) and FIFO (which is known to produce proportional replication). In each of these adaptive replication algorithms, after a movie is viewed by a peer, the peer decides whether to replace one of the locally stored movies with the one it just finished viewing. The movie to be replaced is selected based on the following criteria:
• ARLB: selection is based on centralized information, as defined in the ARLB algorithm [6];
• FIFO: the earliest movie that got stored locally.

[Fig. 2. Comparing server load of three adaptive strategies, θ = 0.6 and homogeneous upload capacity.]

[Fig. 3. Comparing distance of three adaptive strategies, θ = 0.6 and homogeneous upload capacity.]

From the figures, the performance of FIFO is indeed close to proportional, since its D is small and constant over all Y. The server load consumed by FIFO increases for large Y; this is because proportional replication is known to deviate from optimal as Y increases (and the scheduling becomes similar to PFS [6]). The performance of ARLB is not as good as the other strategies for small values of Y, because ARLB is designed with PFS in mind (hence favoring cold movies, which is not optimal for small Y). For DAR under FSBD scheduling, when Y is small its distance D is very close to the distance for FIFO (proportional), but when Y is large its D is close to the distance of ARLB. Overall, the DAR algorithm works best across different values of Y. For Y < 5, the performances of all three strategies are very close, because the room for improvement is limited for small Y. For very large values of Y, ARLB+PFS beats DAR+FSBD, but only marginally.


D. Robustness Validation

In addition to the above simulations validating our theoretical model, we also conduct the following experiments to validate the robustness of the DAR algorithm.

In practical systems, upload capacity is heterogeneous and difficult to predict. We therefore ran a simulation with heterogeneous upload capacity, distributed 80/20: 20% of the peers take up 80% of the total upload capacity, while the other 80% of the peers share the remaining 20%. In the heterogeneous case, the performance gap between DAR and ARLB is smaller than in the homogeneous network, since ARLB requires knowledge of each peer's upload capacity. In a practical system, collecting peers' upload capacities is challenging; fortunately, DAR does not need this information, yet performs better than ARLB. For the other parameters, we set N = 4000, K = 400, L = 4, θ = 0.6 and Y = 40.

[Fig. 4. Detailed server load of three adaptive algorithms in each time slot, θ = 0.6, Y = 40 and heterogeneous upload capacity.]

1) Convergence time: To get a feel for the convergence time of these adaptive replication algorithms, the server load in each time slot is plotted in Fig. 4. All three curves start at a point of heavy server load (around 1200), because we use the Equal Copy content placement algorithm to initialize the peers' storage at the beginning of the simulation. After that, the curves quickly converge to the stable server load level of each algorithm. This experiment is run with Y = 40, and we find that the DAR algorithm achieves the best performance (minimum server load) while FIFO is the worst. Furthermore, the DAR algorithm also shows the smallest oscillation among the three, which indicates that it is more stable than the other two algorithms.

2) Limiting both upload and download connections: In our analysis, only the connections a peer sends out are limited by Y. In this experiment, the number of requests a peer can serve is also limited by Y. As shown in Fig. 5, DAR results in a smaller server load than the other algorithms (the server load is close to 3000 for all three algorithms when Y = 1).

[Fig. 5. Comparing server load with both upload and download connections limited, θ = 0.6, Y = 40 and heterogeneous upload capacity.]

3) Varying U: In the analysis, we assume that U = 1. In this experiment, we relax this assumption by varying U from 0.6 to 2, as shown in Fig. 6. The distribution of upload capacity is still 80/20, but its average value equals U.

[Fig. 6. Comparing server load by varying U, θ = 0.6, Y = 40.]

DAR is still the best. When U is small, it is easy to utilize peers' bandwidth efficiently, because the probability of obtaining redundant bandwidth is small; thus the performance gap when U is less than 1 is very small. On the other hand, when U is large, a little wasted bandwidth may not cause server load, because the bandwidth is sufficient. The experiment shows that it is meaningful to study the replication problem at U = 1.
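For reproducibility, here is one way to draw the 80/20 heterogeneous upload capacities used above. The two-point distribution (one high and one low capacity value) is our reading of "80/20 distributed", and the average is normalized to U.

import random

def capacities_80_20(N, U=1.0, seed=0):
    """80/20 upload capacities: 20% of peers hold 80% of the total.

    The total capacity is N*U, so the average capacity is U.
    """
    rng = random.Random(seed)
    n_high = N // 5                        # the 20% of "rich" peers
    high = 0.8 * N * U / n_high
    low = 0.2 * N * U / (N - n_high)
    caps = [high] * n_high + [low] * (N - n_high)
    rng.shuffle(caps)                      # assign capacities to random peers
    return caps

caps = capacities_80_20(N=4000, U=1.0)     # the Section VII-D setting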
VIII. RELATED WORK

Zhou et al. [6] first proposed the P2P VoD model and analysis technique used in this paper. The Perfect Fair Sharing (PFS) scheduling model, the RLB and ARLB replication algorithms, and the server-load bounds for these cases are all from [6]. But PFS is a very idealized scheduling policy that is not practical. In this paper, our contribution is to unify the P2P VoD replication work from several important papers in the literature and to propose a unifying, more practical request scheduling policy, FSBD. Furthermore, we propose a new distributed and adaptive replication algorithm that works well with FSBD.


Wu et al. [5], [7] proposed view-upload decoupling for P2P streaming to deal with the problem of channel churn. Their problem is similar to P2P VoD: they need to assign peers to serve channels (like replication) and allocate peers' bandwidth to different channels (like scheduling). Their user demand model is the same as ours, and their bandwidth allocation strategy is similar to FBA, which led to our interest in FBA.

Our model of FSFD is similar to the scheduling policy used in an important early work on P2P VoD. In [8], each peer can send exactly y sub-requests to the neighbors that are serving the fewest sub-requests. They rely on a "balancer" to distribute the sub-requests evenly, and each peer only serves a fixed number of sub-requests (excess load is rejected and waits in the queue). The authors also proposed applying rateless coding [10], [11] to encode movie windows; a small part of the coded content is distributed to every peer in the P2P VoD system.

Wu and Li [4] studied a special case of fair sharing, in which each peer seeks only one neighbor to serve it, and the neighbor randomly selects one sub-request to serve if more than one is received. Due to this constraint on the scheduling strategy, the server load cannot be reduced significantly by P2P. They conclude that the performance of LRU is very close to the theoretical optimum. Their results are consistent with ours.

Both [12] and [13] studied chunk scheduling strategies in P2P VoD systems. They focus on the process of distributing a single movie to multiple peers, where at the beginning only the server stores the movie. The trade-offs among three metrics (throughput, sequential delivery and chunk availability) are studied, and the conclusion is that no chunk selection strategy maximizes all three metrics at the same time. However, in practical P2P VoD systems, movies can be pushed to peers, or cached by peers after viewing; in this case there is no need to study chunk availability, which corresponds to our formulation of the problem.

Tan et al. [14] recently obtained analytical results by dividing movies into hot, cold and warm categories. Under a different service scheduling model (random peer selection), they show that it is optimal for each peer to store only the hottest L − 1 movies, plus one additional movie from the warm category. Their request scheduling is similar to FSFD in our analysis, but their user demand model is somewhat different, so a direct comparison is not possible; still, their results are consistent with ours.

Applegate et al. [15] considered the movie replication problem in P2P VoD as an integer programming problem. The authors use linear programming to derive an approximate solution.
Due to the large scale of the problem, even solving the linear program via approximation takes several hours for experiments at the scale of a practical system. Therefore, their optimization procedure would be run once per day or once every several days, whereas in our case the P2P system continuously adapts its replication. These are two different approaches to the problem.

In [16], the authors try to characterize the optimal replication strategy. Their conclusion is that proportional replication is not optimal and a P2P VoD system should allocate more resources to cold movies, which is consistent with our results. But in their model, peers also serve the movie that they are downloading, a detail not in our model.

The paper [11] is an early work studying the relationship between the movie population and peer resources in P2P VoD systems, where peer resources include the peers' upload capacity and storage size. It studied the number of movies that peers can support so as to satisfy a minimum number of distinct movie requests with limited peer resources.

The paper [17] discusses the relationship between the out-degree and streaming performance in P2P live streaming systems, where streaming performance includes minimum server load, streaming continuity and tree depth.

The sustainable streaming rate supported by a P2P system is studied in [18]. In this work, the peers' upload capacity and the ratio of peers to seeds are the key factors determining the sustainable streaming rate.

The paper [19] discusses an adaptive movie replication algorithm with feedback information, where the feedback information is the downloading rate from other peers. The server assists peers in making replication decisions based on the collected feedback.

IX. CONCLUSION

In this paper, we apply the same methodology to analyze three kinds of request scheduling strategies and their corresponding optimal movie replication strategies, so as to achieve balanced bandwidth allocation and minimize the expected server load. Through this analysis, we can explain why some P2P VoD systems prefer proportional replication, whereas others prefer more-than-proportional replication for cold movies. Request scheduling in real-world systems is likely to be in between fair sharing (with some fixed degree) and perfect fair sharing. Therefore, we propose the FSBD model with a varying out-degree limit for different movies. This allows us to illustrate the effect of the out-degree in request scheduling, and to visualize the reason for allocating more copy resources to cold movies in networks with large out-degree. We use simulation to validate our analysis and make various comparisons between different scenarios.

REFERENCES

[1] BitTorrent, http://www.bittorrent.com/.
[2] PPLive, http://www.pplive.com/.
[3] Y. Huang, T. Z. J. Fu, D. M. Chiu, J. C. S. Lui, and C. Huang, “Challenges, design and analysis of a large-scale p2p vod system,” in Proc. of ACM Sigcomm, 2008.
[4] J. Wu and B. Li, “Keep cache replacement simple in peer-assisted vod systems,” in Proc. of IEEE Infocom, 2009.
[5] D. Wu, Y. Liu, and K. W. Ross, “Queuing network models for multi-channel live streaming systems,” in Proc. of IEEE Infocom, 2009.
[6] Y. Zhou, T. Z. J. Fu, and D. M. Chiu, “Statistical modeling and analysis of p2p replication to support vod service,” in Proc. of IEEE Infocom, 2011.


[7] D. Wu, C. Liang, Y. Liu, and K. W. Ross, “View-upload decoupling: A redesign of multi-channel p2p video systems,” in Proc. of IEEE Infocom, 2009.
[8] K. Suh, C. Diot, J. Kurose, L. Massoulie, C. Neumann, D. Towsley, and M. Varvello, “Push-to-peer video-on-demand system: design and evaluation,” in IEEE Journal on Selected Areas in Communications, special issue on Advances in Peer-to-Peer Streaming Systems, 2007.
[9] N. Venkatasubramanian and S. Ramanathan, “Load management in distributed video servers,” in Proc. of IEEE ICDCS ’97, 1997.
[10] P. Maymounkov and D. Mazieres, “Rateless codes and big downloads,” in Proc. of International Workshop on Peer-to-Peer Systems, 2003.
[11] M. Luby, “LT codes,” in Proc. of IEEE Symposium on Foundations of Computer Science (FOCS), 2002.
[12] B. Fan, D. Andersen, M. Kaminsky, and K. Papagiannaki, “Balancing throughput, robustness, and in-order delivery in p2p vod,” in Proc. of ACM CoNEXT, 2010.
[13] N. Parvez, C. Williamson, A. Mahanti, and N. Carlsson, “Analysis of bittorrent-like protocols for on-demand stored media streaming,” in Proc. of ACM SIGMETRICS, 2008.
[14] B. R. Tan and L. Massoulie, “Optimal content placement for peer-to-peer video-on-demand systems,” in Proc. of IEEE Infocom, 2011.
[15] D. Applegate, A. Archer, V. Gopalakrishnan, S. Lee, and K. K. Ramakrishnan, “Optimal content placement for a large-scale vod system,” in Proc. of ACM CoNEXT, 2010.
[16] W. Wu and J. C. S. Lui, “Exploring the optimal replication strategy in p2p-vod systems: Characterization and evaluation,” in Proc. of IEEE Infocom, 2011.
[17] S. Liu, R. Zhang-Shen, W. Jiang, J. Rexford, and M. Chiang, “Performance bounds for peer-assisted live streaming,” in Proc. of ACM SIGMETRICS, 2008.
[18] F. Benbadis, N. Hegde, F. Mathieu, and D. Perino, “Playing with the bandwidth conservation law,” in Proc. of P2P, 2008.
[19] Y. Zhou, T. Z. J. Fu, and D. M. Chiu, “Server-assisted adaptive video replication for p2p vod,” in Elsevier Signal Processing: Image Communication, special issue on Advances in Video Streaming for P2P Networks, 2011.

APPENDIX

Proof of EQ. 5: The arrival process of requests to view a particular movie is Poisson. We assume these arrivals generate sub-requests evenly to the other peers replicating the required movie; thus, the arrival process of sub-requests for the movie is also Poisson. As discussed in the user behavior model, the viewing time follows an exponential distribution. Therefore, the number of sub-requests received by each peer for its replicated movies is Binomial with expected value λ_i, and we use the Poisson distribution as an approximation of this Binomial distribution. If a particular peer continues to get content from peer i, the obtained bandwidth depends on the number of other requests, and can be calculated approximately as:

$$E[X_j(i)] = \sum_{k=0}^{+\infty} \frac{1}{k+1} \Pr(\#req = k) \approx \sum_{k=0}^{+\infty} \frac{1}{k+1} \frac{\lambda_i^k e^{-\lambda_i}}{k!} = \frac{1}{\lambda_i}\left(1 - e^{-\lambda_i}\right) \approx \frac{1}{\lambda_i},$$

where k is the number of sub-requests received from other peers. We assume N − 1 ≈ N when deriving Pr(#req = k).
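A quick Monte Carlo check of EQ. 5 and of the closed form (1 − e^{−λ})/λ derived above; the inverse-CDF Poisson sampler is our own helper, not part of the proof.

import math, random

def poisson_sample(rng, lam):
    """Inverse-CDF Poisson sampler (adequate for moderate lam)."""
    u = rng.random()
    k, term = 0, math.exp(-lam)
    cum = term
    while cum < u:
        k += 1
        term *= lam / k
        cum += term
    return k

def check_eq5(lam, trials=200_000, seed=0):
    """Estimate E[1/(k+1)] for k ~ Poisson(lam) and compare to 1/lam."""
    rng = random.Random(seed)
    est = sum(1.0 / (poisson_sample(rng, lam) + 1) for _ in range(trials)) / trials
    closed_form = (1 - math.exp(-lam)) / lam
    return est, closed_form, 1 / lam

# For lam = 4: estimate ~0.245, closed form ~0.245, vs. 1/lam = 0.25.
print(check_eq5(4.0))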
Proof of Proposition 1: The peer population watching any movie j follows a Binomial distribution with expected value E[n_j] = N_j; the Poisson distribution with mean N_j is a good approximation. Now we prove that proportional replication achieves E[X_j] = 1. For a peer watching movie j, if the rest of the peers make random selections and N − 1 ≈ N, the expected allocated bandwidth is:

$$E[X_j] = \sum_{n_j} \frac{N_j}{n_j} \Pr(n_j) \approx \sum_{k=0}^{+\infty} \frac{N_j}{k+1} \frac{N_j^k e^{-N_j}}{k!} = 1 - e^{-N_j} \approx 1.$$

Based on proportional replication, we can derive the expected server load consumed by movie j as:

$$B_j = 0 \times \Pr(n_j < N_j) + \sum_{k=N_j}^{+\infty} \left(1 - \frac{N_j}{k+1}\right)\Pr(n_j = k) = \frac{N_j^{N_j}}{N_j!} e^{-N_j}.$$

According to Stirling's approximation, i.e. $N! \approx \sqrt{2\pi N}\left(\frac{N}{e}\right)^N$, substituting back into the above equation, the expected server load becomes:

$$B = \sum_{j=1}^{K} N \eta_j B_j = \sum_{j=1}^{K} \frac{\sqrt{N_j}}{\sqrt{2\pi}}.$$

With the constraint $\sum_{j=1}^{K} N_j = N$, it is not difficult to verify that the maximum value of B is $B = \frac{1}{\sqrt{2\pi}}\sqrt{NK}$, achieved when $N_1 = N_2 = \cdots = N_K = \frac{N}{K}$.

From the proof, we find that the bound remains unchanged under a different bandwidth allocation, i.e. when U_i(k) is not equal to 1/L, as long as EQ. 3 is still satisfied. The difference is the replication result, which may no longer be proportional.
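The Stirling step, from $B_j = \frac{N_j^{N_j}}{N_j!} e^{-N_j}$ to $B_j \approx \frac{1}{\sqrt{2\pi N_j}}$, can be sanity-checked numerically; evaluating the factorial via lgamma in log space is our implementation choice.

import math

def bj_exact(Nj):
    """B_j = Nj^Nj * e^-Nj / Nj!  computed in log space to avoid overflow."""
    return math.exp(Nj * math.log(Nj) - Nj - math.lgamma(Nj + 1))

def bj_stirling(Nj):
    """Stirling approximation: B_j ~ 1/sqrt(2*pi*Nj)."""
    return 1 / math.sqrt(2 * math.pi * Nj)

for Nj in (5, 20, 100):
    print(Nj, bj_exact(Nj), bj_stirling(Nj))
# The two agree to within ~2% already at Nj = 5.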


Proof of Proposition 2: To make sure every peer can send out y sub-requests, we assume y ≤ min_j C_j without loss of generality. If the number of copies of movie j is C_j = NLη_j, the expected number of sub-requests received by each copy is y/L; hence the expected number of sub-requests received by each peer is y. The distribution of the number of received requests is Binomial, and we again use the Poisson distribution as an approximation. According to EQ. 5, the expected bandwidth obtained from a single copy is 1/y. Since a total of y sub-requests are sent out by each peer, the expected total obtained bandwidth is 1, which proves that proportional replication satisfies EQ. 3.

The worst case is achieved by maximizing the variance of X_j, which in turn is obtained by maximizing the correlation of the X_j(i) from different neighbors i. Therefore, the worst case is when peers form K/L clusters; within a cluster, the peers have the same movie set, so the correlation of the bandwidth provided by peers in one cluster is 1. We assume the movie set provided by the i-th cluster is Q_i. Under proportional replication, the peer population of the i-th cluster is $N \times \sum_{j \in Q_i} \eta_j$. For a peer viewing any movie j, the obtained bandwidth depends on the number of sub-requests received by its neighbors, and the expected server load caused by a single peer is:

$$0 \times \Pr(\#req < y) + \sum_{k=y}^{+\infty} \left(1 - \frac{y}{k+1}\right)\Pr(\#req = k) = \frac{y^y e^{-y}}{y!} \approx \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{y}}.$$

Thus, the total expected server load is $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{y}}$.

Proof of Proposition 4: We assume there is no constraint on the movie popularity distribution and analyze B as the sum of two separate parts. The first part of B is contributed by clusters with Y < R_i, while in the second part Y ≥ R_i. Assume the first part has γ clusters; then EQ. 8 can be expressed as:

$$B = \frac{1}{\sqrt{2\pi}}\left(\sum_{i=1}^{\gamma} \frac{R_i}{\sqrt{Y}} + \sum_{i=\gamma+1}^{K/L} \sqrt{R_i}\right). \qquad (13)$$

The case γ = 0 is the same as the PFS case; therefore, we focus on the case γ > 0. For any value of γ, we can use a Lagrange multiplier to find the maximum value of B, expressing EQ. 13 as:

$$B = \frac{1}{\sqrt{2\pi}}\left(\sum_{i=1}^{\gamma} \frac{R_i}{\sqrt{Y}} + \sum_{i=\gamma+1}^{K/L} \sqrt{R_i}\right) + \tau\left(\sum_{i=1}^{K/L} R_i - N\right).$$

Solving this optimization problem, the condition to achieve the maximum expected server load is $R_i = \frac{Y}{4}$ for i > γ. Therefore, EQ. 13 simplifies to:

$$B = \frac{1}{\sqrt{2\pi}}\left(\frac{N}{\sqrt{Y}} + \frac{1}{4}\left(\frac{K}{L} - \gamma\right)\sqrt{Y}\right). \qquad (14)$$

From EQ. 14, we have dB/dγ < 0 for γ ≥ 1. There are two possible values of γ that can lead to the maximum server load. One case is γ = 0, which yields the same result as PFS. The other is γ = 1, for which the expected server load is:

$$B = \frac{1}{\sqrt{2\pi}}\left(\frac{N}{\sqrt{Y}} + \frac{1}{4}\left(\frac{K}{L} - 1\right)\sqrt{Y}\right),$$

which is worse than the bound derived in the FSFD analysis. This is because FSBD makes no assumption on skewness whereas FSFD does; hence, for some highly skewed movie popularities, FSBD can incur more expected server load than FSFD. The condition to achieve the maximum expected server load is $R_1 = N - \frac{Y}{4}\left(\frac{K}{L} - 1\right)$ and $R_i = \frac{Y}{4}$ for i > 1. The constraint is $R_1 \ge Y$, hence $Y \le \frac{4N}{K/L + 3}$. On the other hand, dB/dY

When $Y \in \left(\frac{4N}{(\sqrt{K/L}+1)^2}, +\infty\right)$, the expected server load is bounded by:

$$B \le \frac{1}{\sqrt{2\pi}}\sqrt{\frac{NK}{L}}. \qquad (16)$$

If K/L ≫ 1, the server load can be simplified approximately to the expression in Proposition 4.
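Under the stated assumptions, EQ. 14 is straightforward to explore numerically; this sketch (with parameter values borrowed from the simulation section) illustrates that the γ = 1 case essentially matches EQ. 9 when K/L ≫ 1.

import math

def eq14_load(N, K, L, Y, gamma):
    """Worst-case expected server load from EQ. 14, for gamma >= 1
    over-subscribed clusters (those with R_i > Y)."""
    c = 1 / math.sqrt(2 * math.pi)
    return c * (N / math.sqrt(Y) + 0.25 * (K / L - gamma) * math.sqrt(Y))

# dB/dgamma < 0, so among gamma >= 1 the load is maximized at gamma = 1;
# with N=4000, K=400, L=4, Y=20 this gives ~401, essentially EQ. 9's
# (1/sqrt(2*pi)) * (N/sqrt(Y)) * (1 + Y/(4H)), since K/L = 100 >> 1.
print(eq14_load(4000, 400, 4, 20, gamma=1))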
