A Unifying Model and Analysis of P2P VoD Replication and ...
popularity. Since the peer request population is stationary (although random), it could have been taken into consideration, and is by PFS. The result is that FBA is ineffective in reducing server load⁶.

⁶In [7], the peers and the bandwidth they allocate to each movie are adjusted by an algorithm to deal with peer churn and movie popularity churn. Such adaptation can indirectly help amend the limitations of FBA.

B. Analysis of FSFD

First, we derive the bandwidth a peer obtains by sending one request to a neighbor i. For a peer i, we use λ_i to represent the expected number of sub-requests it receives to serve the movies replicated in its local storage. If a peer viewing movie j continues to get bandwidth from peer i, the expected bandwidth it obtains from peer i is:

$$E[X_j(i)] = \frac{1}{\lambda_i}. \qquad (5)$$

The proof is in the Appendix. This leads to the following proposition:

Proposition 2: In FSFD, proportional movie replication achieves load balancing in peer bandwidth allocation as defined in EQ. 3. The average server load is bounded by $B \le \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{y}}$.

The bound is derived by considering the case when Var[X_j] is maximized; the detailed proof is in the Appendix. With a fixed number of copies C_j = NL × η_j, we propose the following content placement algorithm to minimize Var[X_j].

Algorithm 1 Content Placement Algorithm for FSFD
1: for j = 1 to K do
2:   C_j = NL × η_j
3: end for
4: for i = 1 to N do
5:   Peer i randomly selects L movies to form the movie set Q_i;
6:   for all j ∈ Q_i do
7:     C_j = C_j − 1
8:     if C_j ≤ 0 then
9:       Never select movie j any more
10:    end if
11:  end for
12: end for

This algorithm is adapted from RLB; the main difference is that it enforces proportional replication in the FSFD case. By randomly allocating movies to Q_i, the variance of X_j, and hence the average server load, is minimized as in RLB. Here, when a peer selects neighboring peers to request service, it must also select at random, to minimize the correlation among the X_j(i) from different neighbors i and thereby minimize the variance of X_j for each movie j.
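To make the placement procedure concrete, here is a minimal Python sketch of Algorithm 1. The rounding of the copy budgets NL × η_j to integers, and the fallback when budgets run out early (peers may then store fewer than L movies), are our own assumptions, not specified by the pseudocode above.

import random

def place_content_fsfd(N, L, K, eta, seed=0):
    """Randomized proportional content placement (sketch of Algorithm 1).

    N peers each store up to L movies; movie j has popularity eta[j].
    Returns Q, where Q[i] is the movie set of peer i.
    """
    rng = random.Random(seed)
    # Lines 1-3: copy budget per movie, proportional to popularity.
    budget = [max(1, round(N * L * eta[j])) for j in range(K)]
    selectable = list(range(K))
    Q = []
    for _ in range(N):
        # Line 5: randomly select L movies among those still selectable.
        picks = rng.sample(selectable, min(L, len(selectable)))
        for j in picks:
            budget[j] -= 1                # line 7
            if budget[j] <= 0:            # lines 8-10: never select j again
                selectable.remove(j)
        Q.append(set(picks))
    return Q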
Selecting only one neighbor for service, i.e. y = 1, is a special case, studied in [4]. Its performance is the worst compared to FSFD with y > 1, which is consistent with the result in [4].

C. Analysis of PFS

Since the PFS case is analyzed in [6], we discuss it only briefly here; for details, please refer to [6].

Proposition 3: In PFS, the average server load is bounded by $B \le \frac{1}{\sqrt{2\pi}} \sqrt{\frac{NK}{L}}$.

Proof: The worst case happens when peers form several clusters, where the peers in the same cluster replicate exactly the same movie set. Each cluster can be analyzed as a super server providing service. The proof is very similar to the proof for FSFD; the difference is that the cluster size depends on the sum of the replicated movies' popularity.

D. The power of unlimited service distribution

In this section, we summarize and discuss the results so far.
• The optimal movie replication strategy depends on the request scheduling model assumed. For both FBA and FSFD, the optimal replication is proportional to movie popularity, whereas for PFS, the optimal replication is biased towards cold movies.
• PFS and its optimal replication strategy (RLB) outperform FBA and FSFD based on worst-case analysis, as summarized in Table I.

TABLE I
COMPARISON OF DIFFERENT REQUEST SCHEDULING

       Worst-case Expected Server Load
FBA    $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{H/L}}$
FSFD   $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{y}}$, $y \le H$
PFS    $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{H}}$

In Table I, we use H = NL/K to represent the average storage resource per movie provided by the P2P system. This allows us to express the worst-case bounds of the expected server load for all three cases in terms of N, L, H and y, for ease of comparison. Clearly, PFS is the best. As y becomes large (approaching H), the performance of FSFD becomes the same as that of PFS. We can consider PFS to represent unlimited service distribution, whereas FSFD is limited service distribution. From this comparison, we have characterized the power of unlimited service distribution for load balancing. Unfortunately, PFS (like a fluid approximation) is not practical in real-world implementations. Yet FSFD is also problematic, since it requires y to be no larger than the minimum copy number over all movies (for cold movies this number can be very small). This motivates us to consider a unifying service scheduling model and the corresponding replication algorithm in the next two sections.
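As a quick numeric reading of Table I, the sketch below evaluates the three bounds; the function name and the use of min(y, H) to keep the FSFD entry inside its validity range y ≤ H are our additions.

import math

def worst_case_bounds(N, L, K, y):
    """Worst-case expected server load from Table I, with H = N*L/K."""
    H = N * L / K
    c = 1 / math.sqrt(2 * math.pi)
    return {
        "FBA":  c * N / math.sqrt(H / L),      # equals sqrt(N*K / (2*pi))
        "FSFD": c * N / math.sqrt(min(y, H)),  # table entry valid for y <= H
        "PFS":  c * N / math.sqrt(H),          # equals sqrt(N*K / (2*pi*L))
    }

# With the simulation setting of Section VII (N=4000, L=4, K=400, so H=40),
# FBA gives ~505, PFS ~252, and FSFD moves between them as y grows.
print(worst_case_bounds(4000, 4, 400, y=10))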


IV. A UNIFYING SCHEDULING MODEL

In practical systems, the adoption of FSFD scheduling has some difficulties because it is not realistic to maintain a fixed value of y for all movie requests. Given a skewed movie popularity distribution, cold movies are replicated with few copies, and FSFD means we have to set y to be no bigger than the minimum number of copies of any movie in the system. Therefore, we introduce Fair Sharing with Bounded Out-degree (FSBD), defined as follows:
• each peer is allowed to send out at most Y sub-requests to other peers. The actual number of sub-requests sent by a peer viewing movie j is y = min(Y, C_j).

Based on the analysis in Section III, Y is a key parameter that determines the optimal replication strategy and system performance. If Y = +∞, the scheduling is the same as PFS, first defined in [6]. If y = Y for all movie sub-requests, then the scheduling is the same as FSFD. Thus FSBD is more general, and its analysis is more complicated than that of FSFD or PFS.

We use I(j, i) to indicate whether movie j is stored by peer i:

$$I(j,i) = \begin{cases} 1, & j \in Q_i \\ 0, & \text{otherwise.} \end{cases}$$

The probability of getting bandwidth from a copy of movie j is min(Y/C_j, 1). According to EQ. 5, the expected bandwidth obtained from a single copy is $\frac{1}{\sum_{k \in Q_i} n_k \min(Y/C_k, 1)}$. For the FSBD case, EQ. 4 can be re-expressed as the following optimization problem:

$$\min \sum_{j=1}^{K} \eta_j \,\mathrm{Var}[X_j] \qquad (6)$$

$$\text{s.t.} \quad \sum_{i=1}^{N} I(j,i) \times \frac{\min(Y/C_j, 1)}{\sum_{k \in Q_i} n_k \min(Y/C_k, 1)} = 1,$$

$$C_j = \sum_{i=1}^{N} I(j,i), \quad 1 \le j \le K,$$

$$|Q_i| = L, \quad 1 \le i \le N.$$

From the proof of Proposition 2 and reference [6], the worst-case server load is achieved when the peers form different clusters such that X_j(i) has perfect correlation over all i providing bandwidth to the same peer, in which case the variance of X_j is maximized. In each cluster, peers store exactly the same movie set Q_i. This worst-case analysis can also be extended to FSBD. We assume that there are K/L clusters, and let the movie set in the i-th cluster be Q_i. To satisfy EQ. 3, the peer population in the i-th cluster is $R_i = \sum_{j \in Q_i} N_j$. Therefore, the expected server load caused by the i-th cluster is:

$$\frac{1}{\sqrt{2\pi}} \frac{R_i}{\sqrt{\min(Y, R_i)}}. \qquad (7)$$

The total expected server load is:

$$\frac{1}{\sqrt{2\pi}} \sum_{i=1}^{K/L} \frac{R_i}{\sqrt{\min(Y, R_i)}}, \qquad \text{s.t.} \quad \sum_{i=1}^{K/L} R_i = N. \qquad (8)$$

The value of this upper bound depends on the value of min(Y, R_i) for each cluster i. If Y = +∞, then min(Y, R_i) = R_i and EQ. 8 becomes:

$$B = \frac{1}{\sqrt{2\pi}} \sum_{i=1}^{K/L} \sqrt{R_i}.$$

Because $\sqrt{R_i}$ is concave, the maximum value of B is $\frac{1}{\sqrt{2\pi}} \sqrt{\frac{NK}{L}}$, achieved when $R_1 = \cdots = R_{K/L} = \frac{NL}{K}$. The result is exactly the same as in the PFS case. On the other hand, if min(Y, R_i) = Y for every cluster i, EQ. 8 simplifies to:

$$B = \frac{1}{\sqrt{2\pi}} \sum_{i=1}^{K/L} \frac{R_i}{\sqrt{Y}} = \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{Y}},$$

which is the same as the server load bound derived in Proposition 2. For the FSBD case, we have the following result:

Proposition 4: When K/L ≫ 1, the expected server load can be approximately expressed as:

$$B \le \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{Y}} \left(1 + \frac{Y}{4H}\right), \quad \text{when } Y \in [1, 4H), \qquad (9)$$

and

$$B \le \frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{H}}, \quad \text{when } Y \in [4H, +\infty). \qquad (10)$$

Here, H is the average storage resource per movie, i.e. H = NL/K.

The detailed proof is in the Appendix. It is not trivial to achieve the upper bound for a given movie popularity distribution η_1, ..., η_K. Each cluster is like a bag: allocating K movies to K/L clusters without repetition, such that each cluster receives L movies and the maximum expected server load is attained, is a classical integer programming problem, and no polynomial-time algorithm is known for it. But our analysis proves that the expected server load will not exceed the result given in Proposition 4, even if the worst case happens.
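Proposition 4 is easy to evaluate numerically; below is a minimal sketch (the function name is ours).

import math

def fsbd_worst_case(N, L, K, Y):
    """Approximate worst-case server load under FSBD (Proposition 4)."""
    H = N * L / K                       # average storage resource per movie
    c = 1 / math.sqrt(2 * math.pi)
    if Y < 4 * H:
        return c * (N / math.sqrt(Y)) * (1 + Y / (4 * H))   # EQ. 9
    return c * N / math.sqrt(H)                             # EQ. 10

# The bound interpolates between the FSFD-like regime (small Y) and the
# PFS bound: with N=4000, L=4, K=400 it is ~536 at Y=10 and ~252 for
# all Y >= 4H = 160.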
V. HETEROGENEOUS CASE

Proposition 5: In the heterogeneous upload capacity case, the server loads under FBA, FSFD, PFS and FSBD share the same bounds as in the homogeneous case.

Proof:
• PFS: It is proved in [6] that the bound on server load in the heterogeneous upload capacity case with PFS is the same as the bound in the homogeneous upload capacity case.
• FBA: FBA allocates bandwidth to any particular movie according to the movie's popularity. In the heterogeneous case, the bandwidth allocated by one copy should be adjusted by the upload capacity of the peer that replicates the movie. Thus, line 7 in the Proportional Content Placement Algorithm should become C_j = C_j − U_i.
• FSFD and FSBD: The worst case of FSFD and FSBD is achieved when peers form into different clusters, where the peers in each cluster replicate exactly the same movie set. In the homogeneous case, the size of a cluster equals the sum of the expected peer populations viewing its movie set. In the heterogeneous case, the cluster size should be adjusted by the upload capacity of each peer in the cluster, such that $R_j = \sum_{i \in \mathrm{Cluster}_j} U_i$, where R_j is the expected peer population viewing the movies replicated by cluster j and U_i is the upload capacity of peer i.


We will propose a distributed adaptive movie replication algorithm for P2P VoD systems. The algorithm is applicable to both heterogeneous and homogeneous networks.

VI. REPLICATION ALGORITHM

For both FSFD and PFS, randomization is used to place movies in Q_i so as to minimize the variance of the service capacity for each movie. For FSFD, random placement works because sub-requests for a movie are sent randomly to the peers holding that movie; for PFS, it works because all (randomly selected) peers storing a movie serve requests for that movie together. But it is a challenge to design or construct a solution that satisfies the constraints in EQ. 6 and minimizes its objective function in polynomial time. Therefore, we propose a new distributed and adaptive replication (DAR) algorithm that iteratively converges to a solution that balances load as well as minimizes the variance of each movie's rate.

The idea of our algorithm is simple. Each peer i viewing movie j uses a sliding window to calculate the average streaming rate d_j provided by other peers (i.e. excluding any contribution from the server). After a peer finishes viewing a movie, a replication decision is made based on the value of d_j. The peer also obtains the current value of d_k for each k ∈ Q_i by communicating with other peers (or a central server maintaining such information). Then, the peer selects the stored movie with the maximum downloading rate d_max. If d_j < d_max, the peer stores movie j locally, replacing the movie with downloading rate d_max; otherwise, movie j is discarded. The only overhead generated is the exchange of the d_j information between peers, and it is incurred only when there exist sub-requests for a particular movie. Compared with the transmission of the movie content, this overhead is negligible.

Note that this algorithm is independent of the out-degree bound Y of the scheduling policy. In FSBD, the peers viewing the same movie are expected to get the same expected downloading rate due to the fair-sharing service. Therefore, the bandwidth allocation for movie j is reflected in the downloading rate from other peers for any particular peer viewing movie j. For any particular movie j, if no peer has viewed it for a while, the (default) downloading rate is set to d_j = +∞ for replication decision purposes. This replication algorithm is in fact a distributed iterative algorithm for solving the optimization problem in EQ. 6. In each round, the system allocates more bandwidth resource to movies that satisfy the inequality E[X] < 1; the movie with the largest excess bandwidth resource E[X] − 1 is replaced with the highest priority.

Algorithm 2 Distributed and Adaptive Replication (DAR)
1: Peer i viewing movie j uses a sliding window to estimate the average downloading rate d_j from other peers.
2: After viewing, peer i has movie j as a candidate for replication.
3: d_max = 0
4: for all k ∈ Q_i do
5:   Get the value of d_k from other peers viewing movie k
6:   if d_k > d_max then
7:     d_max = d_k
8:     ID_max = k
9:   end if
10: end for
11: if d_max > d_j then
12:   Use movie j to replace movie ID_max
13: end if
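Below is a minimal Python rendering of one DAR replication decision (Algorithm 2). The callback get_rate, standing in for the peer-to-peer (or server-assisted) exchange of d_k values, and the guard against re-storing an already-stored movie are our assumptions.

import math

def dar_decision(j, d_j, Q_i, get_rate):
    """One DAR replication decision (sketch of Algorithm 2).

    j        movie just finished viewing (replication candidate)
    d_j      sliding-window average download rate observed for movie j
    Q_i      mutable list of movies currently stored by this peer
    get_rate get_rate(k) returns the current d_k reported by peers
             viewing movie k (math.inf if nobody is viewing it)
    """
    if j in Q_i:                     # assumed guard: already stored
        return
    d_max, id_max = 0.0, None        # lines 3-10: find the best-served movie
    for k in Q_i:
        d_k = get_rate(k)
        if d_k > d_max:
            d_max, id_max = d_k, k
    if d_max > d_j:                  # lines 11-13: replace it with movie j
        Q_i[Q_i.index(id_max)] = j

A movie that nobody is viewing reports d_k = +∞ and is therefore evicted first, matching the default rule stated above.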
Finally, a word about the heterogeneous peer case: although all the discussion so far is for the homogeneous peer case, it is not difficult to extend the replication algorithms (at least in theoretical terms) to the heterogeneous case, in a similar fashion as in [6]. But due to limited space, we cannot include that discussion here.

VII. SIMULATION

Simulation is used to validate the performance bounds on server load, and to compare different replication algorithms.

A. Simulation Setting

Time is divided into time slots. As reported by N. Venkatasubramanian et al. [9], movie popularity is assumed to follow the Zipf distribution. If movies are ranked by popularity in descending order, the popularity of the j-th movie can be expressed as in Eq. (11):

$$\eta_j = \frac{j^{-\theta}}{\sum_{i=1}^{K} i^{-\theta}}, \qquad (11)$$

where θ is a parameter reflecting the skewness of the distribution curve: the larger the value of θ, the more skewed the popularity distribution. As suggested by [9], the value of θ is usually in the range [0.271, 1].

For all our experiments, we set the peer population N = 4000 and the movie population K = 400. Each peer can store L = 4 movies.
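The Zipf popularities of Eq. (11) can be generated directly; a minimal sketch follows (the function name is ours).

def zipf_popularity(K, theta):
    """Movie popularities from Eq. (11): eta_j proportional to j**(-theta)."""
    weights = [j ** -theta for j in range(1, K + 1)]
    total = sum(weights)
    return [w / total for w in weights]

eta = zipf_popularity(K=400, theta=0.6)   # the setting used in Section VII-C
assert abs(sum(eta) - 1.0) < 1e-9         # popularities form a distribution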


B. FBA vs. FSFD vs. PFS

The first simulation experiment tries to validate our propositions. We set θ = 0, i.e. all movies have the same popularity, and by symmetry they are all replicated with the same number of copies to balance the load. This setting allows us to vary the out-degree y from 1 to NL/K. For FBA, we use the proportional content placement algorithm and FBA scheduling, which is independent of the out-degree y; the result is a flat curve. For FSFD, the out-degree varies from 1 to NL/K = 40. We also plot the worst-case bound of the expected server load for the PFS strategy (the PFS - Worst Case curve). The result is shown in Fig. 1. We observe: (a) The server load consumed by the FBA algorithm is very close to the numerical result calculated by our model. For FSFD, proportional replication reduces the server load as we increase the out-degree y; in particular, when y ≥ 40, the worst-case bounds for FSFD and PFS converge. (b) Overall, FBA is not an efficient strategy. Only when the out-degree is very small does FBA outperform FSFD.

[Fig. 1. Proportional content placement with θ = 0 and homogeneous upload capacity.]

C. FSBD

From our analysis, the optimal replication for FSFD is proportional, whereas PFS favors cold movies more. We then designed the DAR algorithm for FSBD as described in the last section. Since FSBD is like FSFD when Y is small, and like PFS when Y is large, it is interesting to find out how the DAR algorithm behaves as we vary Y. To help visualize the behavior of the DAR algorithm, we define the following metric:

$$D = \sum_{j=1}^{K} |C_j - NL \times \eta_j|. \qquad (12)$$

Given a replication solution (Q_i), D is its distance from the proportional movie replication solution.
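Given the copy counts C_j of a replication solution, D is one line of code; this sketch assumes C and eta are indexed from 0.

def distance_from_proportional(C, N, L, eta):
    """Distance D of Eq. (12): sum over movies of |C_j - N*L*eta_j|."""
    return sum(abs(C[j] - N * L * eta[j]) for j in range(len(C)))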
In this simulation experiment, we compare several adaptive replication algorithms using their distance D for different values of Y (from 1 to 100). The results are shown in Fig. 2 and Fig. 3. Besides the DAR algorithm, the other two adaptive algorithms considered are ARLB (used for PFS) and FIFO (which is known to produce proportional replication). In each of these adaptive replication algorithms, after a movie is viewed by a peer, the peer decides whether to replace one of the locally stored movies with the one it just finished viewing. The movie to be replaced is selected based on the following criteria:
• ARLB: selection is based on centralized information, as defined in the ARLB algorithm [6];
• FIFO: the earliest movie that got stored locally.

[Fig. 2. Comparing server load of three adaptive strategies, θ = 0.6 and homogeneous upload capacity.]

[Fig. 3. Comparing distance of three adaptive strategies, θ = 0.6 and homogeneous upload capacity.]

From the figures, the performance of FIFO is indeed close to proportional, since its D is small and constant over all Y. The server load consumed by FIFO increases for large Y; this is because proportional replication is known to deviate from optimal as Y increases (and the scheduling becomes similar to PFS [6]). The performance of ARLB is not as good as the other strategies for small values of Y, because ARLB is designed with PFS in mind (hence favoring cold movies, which is not optimal for small Y). For DAR under FSBD scheduling, when Y is small its distance D is very close to the distance for FIFO (proportional), but when Y is large its D is close to the distance of ARLB. Overall, the DAR algorithm works best across different values of Y. For Y < 5, the performances of all three strategies are very close, because the room for improvement is limited for small Y. For very large values of Y, ARLB+PFS beats DAR+FSBD, but only marginally.


D. Robustness Validation

In addition to the above simulations validating our theoretical model, we also conduct the following experiments to validate the robustness of the DAR algorithm.

In practical systems, upload capacity is heterogeneous and difficult to predict. We therefore ran a simulation with heterogeneous upload capacity, distributed 80/20: 20% of the peers take up 80% of the total upload capacity, while the other 80% of the peers share the remaining 20%. In the heterogeneous case, the performance gap between DAR and ARLB is smaller than in the homogeneous network, since ARLB requires knowledge of each peer's upload capacity. In a practical system, collecting peers' upload capacities is challenging; fortunately, DAR does not need this information, yet performs better than ARLB. For the other parameters, we set N = 4000, K = 400, L = 4, θ = 0.6 and Y = 40.

[Fig. 4. Detailed server load of three adaptive algorithms in each time slot, θ = 0.6, Y = 40 and heterogeneous upload capacity.]

1) Convergence time: To get a feel for the convergence time of these adaptive replication algorithms, the server load in each time slot is plotted in Fig. 4. All three curves start at a point of heavy server load (around 1200), because we use the Equal Copy content placement algorithm to initialize the peers' storage at the beginning of the simulation. After that, the curves quickly converge to the stable server load level of each algorithm. This experiment is run with Y = 40, and we find that the DAR algorithm achieves the best performance (minimum server load) while FIFO is the worst. Furthermore, the DAR algorithm also shows the smallest oscillation among the three, which indicates that it is more stable than the other two algorithms.

2) Limiting both upload and download connections: In our analysis, only the connections a peer sends out are limited by Y. In this experiment, the number of requests a peer can serve is also limited by Y. As shown in Fig. 5, DAR results in a smaller server load than the other algorithms (the server load is close to 3000 for all three algorithms when Y = 1).

[Fig. 5. Comparing server load with both upload and download connections limited, θ = 0.6, Y = 40 and heterogeneous upload capacity.]

3) Varying U: In the analysis, we assume that U = 1. In this experiment, we relax this assumption by varying U from 0.6 to 2, as shown in Fig. 6. The distribution of upload capacity is still 80/20, but its average value equals U.

[Fig. 6. Comparing server load by varying U, θ = 0.6, Y = 40.]

DAR is still the best. When U is small, it is easy to utilize peers' bandwidth efficiently, because the probability of obtaining redundant bandwidth is small; thus the performance gap when U is less than 1 is very small. On the other hand, when U is large, a little wasted bandwidth may not cause server load, because the bandwidth is sufficient. The experiment shows that it is meaningful to study the replication problem at U = 1.
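For reproducibility, here is one way to draw the 80/20 heterogeneous upload capacities used above. The two-point distribution (one high and one low capacity value) is our reading of "80/20 distributed", and the average is normalized to U.

import random

def capacities_80_20(N, U=1.0, seed=0):
    """80/20 upload capacities: 20% of peers hold 80% of the total.

    The total capacity is N*U, so the average capacity is U.
    """
    rng = random.Random(seed)
    n_high = N // 5                        # the 20% of "rich" peers
    high = 0.8 * N * U / n_high
    low = 0.2 * N * U / (N - n_high)
    caps = [high] * n_high + [low] * (N - n_high)
    rng.shuffle(caps)                      # assign capacities to random peers
    return caps

caps = capacities_80_20(N=4000, U=1.0)     # the Section VII-D setting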
VIII. RELATED WORK

Zhou et al. [6] first proposed the P2P VoD model and analysis technique used in this paper. The Perfect Fair Sharing (PFS) scheduling model, the RLB and ARLB replication algorithms, and the server-load bounds for these cases are all from [6]. But PFS is a very idealized scheduling policy that is not practical. In this paper, our contribution is to unify the P2P VoD replication work from several important papers in the literature and to propose a unifying, more practical request scheduling policy, FSBD. Furthermore, we propose a new distributed and adaptive replication algorithm that works well with FSBD.


Wu et al. [5], [7] proposed view-upload decoupling for P2P streaming to deal with the problem of channel churn. Their problem is similar to P2P VoD: they need to assign peers to serve channels (like replication) and allocate peers' bandwidth to different channels (like scheduling). Their user demand model is the same as ours, and their bandwidth allocation strategy is similar to FBA, which led to our interest in FBA.

Our model of FSFD is similar to the scheduling policy used in an important early work on P2P VoD. In [8], each peer can send exactly y sub-requests to the neighbors that are serving the fewest sub-requests. They rely on a "balancer" to distribute the sub-requests evenly, and each peer only serves a fixed number of sub-requests (excess load is rejected and waits in the queue). The authors also proposed applying rateless coding [10], [11] to encode movie windows; a small part of the coded content is distributed to every peer in the P2P VoD system.

Wu and Li [4] studied a special case of fair sharing, in which each peer seeks only one neighbor to serve it, and the neighbor randomly selects one sub-request to serve if more than one is received. Due to this constraint on the scheduling strategy, the server load cannot be reduced significantly by P2P. They conclude that the performance of LRU is very close to the theoretical optimum. Their results are consistent with ours.

Both [12] and [13] studied chunk scheduling strategies in P2P VoD systems. They focus on the process of distributing a single movie to multiple peers, where at the beginning only the server stores the movie. The trade-offs among three metrics (throughput, sequential delivery and chunk availability) are studied, and the conclusion is that no chunk selection strategy maximizes all three metrics at the same time. However, in practical P2P VoD systems, movies can be pushed to peers, or cached by peers after viewing; in this case there is no need to study chunk availability, which corresponds to our formulation of the problem.

Tan et al. [14] recently obtained analytical results by dividing movies into hot, cold and warm categories. Under a different service scheduling model (random peer selection), they show that it is optimal for each peer to store only the hottest L − 1 movies, plus one additional movie from the warm category. Their request scheduling is similar to FSFD in our analysis, but their user demand model is somewhat different, so a direct comparison is not possible; still, their results are consistent with ours.

Applegate et al. [15] considered the movie replication problem in P2P VoD as an integer programming problem. The authors use linear programming to derive an approximate solution.
Due to the large scale of the problem, even solving the linear program via approximation takes several hours for experiments at the scale of a practical system. Therefore, their optimization procedure would be run once per day or once every several days, whereas in our case the P2P system continuously adapts its replication. These are two different approaches to the problem.

In [16], the authors try to characterize the optimal replication strategy. Their conclusion is that proportional replication is not optimal and a P2P VoD system should allocate more resources to cold movies, which is consistent with our results. But in their model, peers also serve the movie that they are downloading, a detail not in our model.

The paper [11] is an early work studying the relationship between the movie population and peer resources in P2P VoD systems, where peer resources include the peers' upload capacity and storage size. It studied the number of movies that peers can support so as to satisfy a minimum number of distinct movie requests with limited peer resources.

The paper [17] discusses the relationship between the out-degree and streaming performance in P2P live streaming systems, where streaming performance includes minimum server load, streaming continuity and tree depth.

The sustainable streaming rate supported by a P2P system is studied in [18]. In this work, the peers' upload capacity and the ratio of peers to seeds are the key factors determining the sustainable streaming rate.

The paper [19] discusses an adaptive movie replication algorithm with feedback information, where the feedback information is the downloading rate from other peers. The server assists peers in making replication decisions based on the collected feedback.

IX. CONCLUSION

In this paper, we apply the same methodology to analyze three kinds of request scheduling strategies and their corresponding optimal movie replication strategies, so as to achieve balanced bandwidth allocation and minimize the expected server load. Through this analysis, we can explain why some P2P VoD systems prefer proportional replication, whereas others prefer more-than-proportional replication for cold movies. Request scheduling in real-world systems is likely to be in between fair sharing (with some fixed degree) and perfect fair sharing. Therefore, we propose the FSBD model with a varying out-degree limit for different movies. This allows us to illustrate the effect of the out-degree in request scheduling, and to visualize the reason for allocating more copy resources to cold movies in networks with large out-degree. We use simulation to validate our analysis and make various comparisons between different scenarios.

REFERENCES

[1] BitTorrent, http://www.bittorrent.com/.
[2] PPLive, http://www.pplive.com/.
[3] Y. Huang, T. Z. J. Fu, D. M. Chiu, J. C. S. Lui, and C. Huang, “Challenges, design and analysis of a large-scale p2p vod system,” in Proc. of ACM Sigcomm, 2008.
[4] J. Wu and B. Li, “Keep cache replacement simple in peer-assisted vod systems,” in Proc. of IEEE Infocom, 2009.
[5] D. Wu, Y. Liu, and K. W. Ross, “Queuing network models for multi-channel live streaming systems,” in Proc. of IEEE Infocom, 2009.
[6] Y. Zhou, T. Z. J. Fu, and D. M. Chiu, “Statistical modeling and analysis of p2p replication to support vod service,” in Proc. of IEEE Infocom, 2011.


[7] D. Wu, C. Liang, Y. Liu, and K. W. Ross, “View-upload decoupling: A redesign of multi-channel p2p video systems,” in Proc. of IEEE Infocom, 2009.
[8] K. Suh, C. Diot, J. Kurose, L. Massoulie, C. Neumann, D. Towsley, and M. Varvello, “Push-to-peer video-on-demand system: design and evaluation,” in IEEE Journal on Selected Areas in Communications, special issue on Advances in Peer-to-Peer Streaming Systems, 2007.
[9] N. Venkatasubramanian and S. Ramanathan, “Load management in distributed video servers,” in Proc. of IEEE ICDCS ’97, 1997.
[10] P. Maymounkov and D. Mazieres, “Rateless codes and big downloads,” in Proc. of International Workshop on Peer-to-Peer Systems, 2003.
[11] M. Luby, “LT codes,” in Proc. of IEEE Symposium on Foundations of Computer Science (FOCS), 2002.
[12] B. Fan, D. Andersen, M. Kaminsky, and K. Papagiannaki, “Balancing throughput, robustness, and in-order delivery in p2p vod,” in Proc. of ACM CoNEXT, 2010.
[13] N. Parvez, C. Williamson, A. Mahanti, and N. Carlsson, “Analysis of bittorrent-like protocols for on-demand stored media streaming,” in Proc. of ACM SIGMETRICS, 2008.
[14] B. R. Tan and L. Massoulie, “Optimal content placement for peer-to-peer video-on-demand systems,” in Proc. of IEEE Infocom, 2011.
[15] D. Applegate, A. Archer, V. Gopalakrishnan, S. Lee, and K. K. Ramakrishnan, “Optimal content placement for a large-scale vod system,” in Proc. of ACM CoNEXT, 2010.
[16] W. Wu and J. C. S. Lui, “Exploring the optimal replication strategy in p2p-vod systems: Characterization and evaluation,” in Proc. of IEEE Infocom, 2011.
[17] S. Liu, R. Zhang-Shen, W. Jiang, J. Rexford, and M. Chiang, “Performance bounds for peer-assisted live streaming,” in Proc. of ACM SIGMETRICS, 2008.
[18] F. Benbadis, N. Hegde, F. Mathieu, and D. Perino, “Playing with the bandwidth conservation law,” in Proc. of P2P, 2008.
[19] Y. Zhou, T. Z. J. Fu, and D. M. Chiu, “Server-assisted adaptive video replication for p2p vod,” in Elsevier Signal Processing: Image Communication, special issue on Advances in Video Streaming for P2P Networks, 2011.

APPENDIX

Proof of EQ. 5: The arrival process of requests to view a particular movie is Poisson. We assume these arrivals generate sub-requests evenly to the other peers replicating the required movie; thus, the arrival process of sub-requests for the movie is also Poisson. As discussed in the user behavior model, the viewing time follows an exponential distribution. Therefore, the number of sub-requests received by each peer for its replicated movies is Binomial with expected value λ_i, and we use the Poisson distribution as an approximation of this Binomial distribution. If a particular peer continues to get content from peer i, the obtained bandwidth depends on the number of other requests, and can be calculated approximately as:

$$E[X_j(i)] = \sum_{k=0}^{+\infty} \frac{1}{k+1} \Pr(\#req = k) \approx \sum_{k=0}^{+\infty} \frac{1}{k+1} \frac{\lambda_i^k e^{-\lambda_i}}{k!} = \frac{1}{\lambda_i}\left(1 - e^{-\lambda_i}\right) \approx \frac{1}{\lambda_i},$$

where k is the number of sub-requests received from other peers. We assume N − 1 ≈ N when deriving Pr(#req = k).
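A quick Monte Carlo check of EQ. 5 and of the closed form (1 − e^{−λ})/λ derived above; the inverse-CDF Poisson sampler is our own helper, not part of the proof.

import math, random

def poisson_sample(rng, lam):
    """Inverse-CDF Poisson sampler (adequate for moderate lam)."""
    u = rng.random()
    k, term = 0, math.exp(-lam)
    cum = term
    while cum < u:
        k += 1
        term *= lam / k
        cum += term
    return k

def check_eq5(lam, trials=200_000, seed=0):
    """Estimate E[1/(k+1)] for k ~ Poisson(lam) and compare to 1/lam."""
    rng = random.Random(seed)
    est = sum(1.0 / (poisson_sample(rng, lam) + 1) for _ in range(trials)) / trials
    closed_form = (1 - math.exp(-lam)) / lam
    return est, closed_form, 1 / lam

# For lam = 4: estimate ~0.245, closed form ~0.245, vs. 1/lam = 0.25.
print(check_eq5(4.0))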
Proof of Proposition 1: The peer population watching any movie j follows a Binomial distribution with expected value E[n_j] = N_j; the Poisson distribution with mean N_j is a good approximation. Now we prove that proportional replication achieves E[X_j] = 1. For a peer watching movie j, if the rest of the peers make random selections and N − 1 ≈ N, the expected allocated bandwidth is:

$$E[X_j] = \sum_{n_j} \frac{N_j}{n_j} \Pr(n_j) \approx \sum_{k=0}^{+\infty} \frac{N_j}{k+1} \frac{N_j^k e^{-N_j}}{k!} = 1 - e^{-N_j} \approx 1.$$

Based on proportional replication, we can derive the expected server load consumed by movie j as:

$$B_j = 0 \times \Pr(n_j < N_j) + \sum_{k=N_j}^{+\infty} \left(1 - \frac{N_j}{k+1}\right)\Pr(n_j = k) = \frac{N_j^{N_j}}{N_j!} e^{-N_j}.$$

According to Stirling's approximation, i.e. $N! \approx \sqrt{2\pi N}\left(\frac{N}{e}\right)^N$, substituting back into the above equation, the expected server load becomes:

$$B = \sum_{j=1}^{K} N \eta_j B_j = \sum_{j=1}^{K} \frac{\sqrt{N_j}}{\sqrt{2\pi}}.$$

With the constraint $\sum_{j=1}^{K} N_j = N$, it is not difficult to verify that the maximum value of B is $B = \frac{1}{\sqrt{2\pi}}\sqrt{NK}$, achieved when $N_1 = N_2 = \cdots = N_K = \frac{N}{K}$.

From the proof, we find that the bound remains unchanged under a different bandwidth allocation, i.e. when U_i(k) is not equal to 1/L, as long as EQ. 3 is still satisfied. The difference is the replication result, which may no longer be proportional.
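The Stirling step, from $B_j = \frac{N_j^{N_j}}{N_j!} e^{-N_j}$ to $B_j \approx \frac{1}{\sqrt{2\pi N_j}}$, can be sanity-checked numerically; evaluating the factorial via lgamma in log space is our implementation choice.

import math

def bj_exact(Nj):
    """B_j = Nj^Nj * e^-Nj / Nj!  computed in log space to avoid overflow."""
    return math.exp(Nj * math.log(Nj) - Nj - math.lgamma(Nj + 1))

def bj_stirling(Nj):
    """Stirling approximation: B_j ~ 1/sqrt(2*pi*Nj)."""
    return 1 / math.sqrt(2 * math.pi * Nj)

for Nj in (5, 20, 100):
    print(Nj, bj_exact(Nj), bj_stirling(Nj))
# The two agree to within ~2% already at Nj = 5.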


Proof of Proposition 2: To make sure every peer can send out y sub-requests, we assume y ≤ min_j C_j without loss of generality. If the number of copies of movie j is C_j = NLη_j, the expected number of sub-requests received by each copy is y/L; hence the expected number of sub-requests received by each peer is y. The distribution of the number of received requests is Binomial, and we again use the Poisson distribution as an approximation. According to EQ. 5, the expected bandwidth obtained from a single copy is 1/y. Since a total of y sub-requests are sent out by each peer, the expected total obtained bandwidth is 1, which proves that proportional replication satisfies EQ. 3.

The worst case is achieved by maximizing the variance of X_j, which in turn is obtained by maximizing the correlation of the X_j(i) from different neighbors i. Therefore, the worst case is when peers form K/L clusters; within a cluster, the peers have the same movie set, so the correlation of the bandwidth provided by peers in one cluster is 1. We assume the movie set provided by the i-th cluster is Q_i. Under proportional replication, the peer population of the i-th cluster is $N \times \sum_{j \in Q_i} \eta_j$. For a peer viewing any movie j, the obtained bandwidth depends on the number of sub-requests received by its neighbors, and the expected server load caused by a single peer is:

$$0 \times \Pr(\#req < y) + \sum_{k=y}^{+\infty} \left(1 - \frac{y}{k+1}\right)\Pr(\#req = k) = \frac{y^y e^{-y}}{y!} \approx \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{y}}.$$

Thus, the total expected server load is $\frac{1}{\sqrt{2\pi}} \frac{N}{\sqrt{y}}$.

Proof of Proposition 4: We assume there is no constraint on the movie popularity distribution and analyze B as the sum of two separate parts. The first part of B is contributed by clusters with Y < R_i, while in the second part Y ≥ R_i. Assume the first part has γ clusters; then EQ. 8 can be expressed as:

$$B = \frac{1}{\sqrt{2\pi}}\left(\sum_{i=1}^{\gamma} \frac{R_i}{\sqrt{Y}} + \sum_{i=\gamma+1}^{K/L} \sqrt{R_i}\right). \qquad (13)$$

The case γ = 0 is the same as the PFS case; therefore, we focus on the case γ > 0. For any value of γ, we can use a Lagrange multiplier to find the maximum value of B, expressing EQ. 13 as:

$$B = \frac{1}{\sqrt{2\pi}}\left(\sum_{i=1}^{\gamma} \frac{R_i}{\sqrt{Y}} + \sum_{i=\gamma+1}^{K/L} \sqrt{R_i}\right) + \tau\left(\sum_{i=1}^{K/L} R_i - N\right).$$

Solving this optimization problem, the condition to achieve the maximum expected server load is $R_i = \frac{Y}{4}$ for i > γ. Therefore, EQ. 13 simplifies to:

$$B = \frac{1}{\sqrt{2\pi}}\left(\frac{N}{\sqrt{Y}} + \frac{1}{4}\left(\frac{K}{L} - \gamma\right)\sqrt{Y}\right). \qquad (14)$$

From EQ. 14, we have dB/dγ < 0 for γ ≥ 1. There are two possible values of γ that can lead to the maximum server load. One case is γ = 0, which yields the same result as PFS. The other is γ = 1, for which the expected server load is:

$$B = \frac{1}{\sqrt{2\pi}}\left(\frac{N}{\sqrt{Y}} + \frac{1}{4}\left(\frac{K}{L} - 1\right)\sqrt{Y}\right),$$

which is worse than the bound derived in the FSFD analysis. This is because FSBD makes no assumption on skewness whereas FSFD does; hence, for some highly skewed movie popularities, FSBD can incur more expected server load than FSFD. The condition to achieve the maximum expected server load is $R_1 = N - \frac{Y}{4}\left(\frac{K}{L} - 1\right)$ and $R_i = \frac{Y}{4}$ for i > 1. The constraint is $R_1 \ge Y$, hence $Y \le \frac{4N}{K/L + 3}$. On the other hand, dB/dY

When $Y \in \left(\frac{4N}{(\sqrt{K/L}+1)^2}, +\infty\right)$, the expected server load is bounded by:

$$B \le \frac{1}{\sqrt{2\pi}}\sqrt{\frac{NK}{L}}. \qquad (16)$$

If K/L ≫ 1, the server load can be simplified approximately to the expression in Proposition 4.
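Under the stated assumptions, EQ. 14 is straightforward to explore numerically; this sketch (with parameter values borrowed from the simulation section) illustrates that the γ = 1 case essentially matches EQ. 9 when K/L ≫ 1.

import math

def eq14_load(N, K, L, Y, gamma):
    """Worst-case expected server load from EQ. 14, for gamma >= 1
    over-subscribed clusters (those with R_i > Y)."""
    c = 1 / math.sqrt(2 * math.pi)
    return c * (N / math.sqrt(Y) + 0.25 * (K / L - gamma) * math.sqrt(Y))

# dB/dgamma < 0, so among gamma >= 1 the load is maximized at gamma = 1;
# with N=4000, K=400, L=4, Y=20 this gives ~401, essentially EQ. 9's
# (1/sqrt(2*pi)) * (N/sqrt(Y)) * (1 + Y/(4H)), since K/L = 100 >> 1.
print(eq14_load(4000, 400, 4, 20, gamma=1))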
