Non-homogeneous distributed storage systems - Singapore ...

Non-homogeneous distributed storage systemsVo Tam Van, Chau YuenSingapore Univ. of Tech. and DesignEmail: {tamvan_vo,yuenchau@sutd.edu.sg}@sutd.edu.sgJing (Tiffany) LiLehigh UniversityEmail: jingli@lehigh.eduarXiv:1208.2078v1 [cs.IT] 10 Aug 2012Abstract—This paper describes a non-homogeneous distributedstorage systems (DSS), where there is one super node whichhas a larger storage size and higher reliability and availabilitythan the other storage nodes. We propose three distributedstorage schemes based on (k+2,k) maximum distance separable(MDS) codes and non-MDS codes to show the efficiency of suchnon-homogeneous DSS in terms of repair efficiency and dataavailability. Our schemes achieve optimal bandwidth k+1 Mwhen 2 krepairing 1-node failure, but require only one fourth of theminimum required file size and can operate with a smaller fieldsize leading to significant complexity reduction than traditionalhomogeneous DSS. Moreover, with non-MDS codes, our schemecan achieve an even smaller repair bandwidth of M . Finally, we2kshow that our schemes can increase the data availability by 10%than the traditional homogeneous DSS scheme.Index Terms— Exact-repair MDS codes, non-homogeneousDSS, minimum storage regenerating (MSR) codes.I. INTRODUCTIONDistributed storage systems (DSS) are widely used todayfor storing data reliably over long periods of time using adistributed collection of storage nodes, which may be individuallyunreliable. Application scenarios include large datacenters such as Total Recall [4], OceanStore [12] and peer-topeerstorage systems such as Wuala [9], that use nodes acrossthe Internet for distributed file storage.One of the challenges for DSS is the repair problem: If anode storing a coded piece fails or leaves the system, in orderto maintain the same level of reliability, we need to createa new encoded piece and store it at a new node with theminimum repair bandwidth. To solve this problem, Dimakiset al. introduced a generic framework based on regeneratingcodes (RC) in [2]. RC use ideas from network coding to definea new family of erasure codes that can achieve different tradeoffsin the optimization of storage capacity and communicationcosts. The optimal tradeoff curve for achievable codes has twoextremal points which are of particular interest: the minimumstorage regenerating (MSR) codes with minimum possiblestorage size for a given repair capability, and the minimumbandwidth regenerating (MBR) codes with minimum possiblerepair bandwidth.Consider a minimum storage system, where a source file ofsize M units is split into k parts, defined over a finite field F qand stored across n nodes in the DSS. While the economyin storage is highly desirable, issues may arise when thesystem tries to repair failure at the optimal repair bandwidth.Specifically, if q or M grows to be arbitrarily large, then thesystem may become inefficient and impractical due to theFigure 1.An example of traditional repairing 1-node failure.high computational complexity or the fast growing storageconsumption.Another challenge for DSS is data availability, which isof critical importance to a peer-to-peer (P2P) storage/backupsystem that relies on a swarm of distributed and independentnodes for file storage. As the nodes not only differ in theirstorage capacity and traffic bandwidth, but they may not beonline or available at all times. Hence, there is a pressingneed to increase the data availability, such that infomationis available with a probability approaching 1. Clearly, P2Penrironments are heterogeneous by nature, and code design forsuch systems must explicitly account for this heterogeneity.The primary interest of this paper is to study a nonhomogeneousDSS, where one "super node" has a largerstorage size and higher reliability and availability than theother storage nodes. We study a class of high-rate (k +2,k)MDS storage codes, and show that with MDS code such nonhomogeneousDSS can achieve the optimal bound in [2] whenrepairing single or double-node failures, but require smallerMand q than the traditional homogeneous model in [1]. Anotherproposed scheme based on non-MDS codes is shown to repair1-node failure below the optimal repair bandwidth bound in[2]. Moreover, we show that our proposed non-homogeneousDSS schemes can achieve a higher data availability than thetraditional homogeneous DSS scheme.This paper is organized as follows. Section II shows thedefinition of non-homogeneous DSS. Section III shows threeproposed schemes of exact repair with (k+2,k) storing codesin non-homogeneous DSS. Section IV shows the numericalresults of our schemes and the comparison with previousmethods. Finally, the paper is concluded in Section V.

II. MODELS OF DISTRIBUTED STORAGE SYSTEMSIn this section, we present a brief review of the traditionalhomogeneous DSS proposed in [2]. Then, a new model of nonhomogeneousDSS is proposed to realize the practical DSS.A. Model of traditional homogeneous DSSWe follow the definition of traditional homogeneous DSSusing (n,k,d,α,γ) regenerating codes over finite field F q .This network has n storage nodes and every k nodes sufficeto reconstruct all the data. The size of the file to be stored isM units 1 and partitioned into k equal parts f 1 ,··· ,f k ∈ F N qwhere N = M k. After encoding them into n coded parts usingan (n,k) maximum distance separable (MDS) code, we storethem at n nodes.We define here the MDS property of a storage code usingthe notion of data collectors as presented in [2]. A storagecode where each node contains M kworth of storage, has theMDS property if a data collector can reconstruct the all theM units by connecting to any k out of n storage nodes.When a node fails, the data stored therein is recoveredby downloading β packets each from any d (≥ k) of theremaining (n − 1) nodes; the total repair bandwidth is thenγ = dβ as shown in Fig. 1. It has been shown in [2] that thereexists an optimal tradeoff between the storage per node,α, andthe bandwidth to repair one node, γ. In this paper, we focuson the extreme point where the smallest α = M k correspondsto a minimum-storage regenerating (MSR) code.( ) M(α MSR ,γ MSR ) =k , Md(1)k(d−k +1)To minimize γ MSR , let d = n − 1 and we get( ) ( )αMSR ,γMSRmin = . In the case of high-rateMk , M k .n−1 n−kcodes n = k + 2, a lower bound for repair bandwidth γ 1 of1-node failure was shown as [2]:γ 1 = (n−1)β ≥ M k .n−1 n−k = M +1.kk 2B. Model of the proposed non-homogeneous DSSDefinition 1. A non-homogeneous DSS with the parameter(n,k,h) is a distributed storage systems with h nodes basedon (n,k) storing codes and the amount of data stored anddownloaded from any nodes are variable. Node i in thenetwork stores α i ≥ M kunits. When node i fails then it isrepaired by downloading β j packets from node j, j ∈ {n}\i.✷It is clear that we must have β j ≤ α j for all j ≠ i sincea node can not transmit more information than it is storing.When n = h,α i = α,β j = β for all i, j ≠ i, we obtain thetraditional homogeneous DSS. When n > h, there are moreredundant blocks than the storage nodes. The storage processhas to decide which node(s) to store more blocks.Example 2. In this paper, we present the idea of nonhomogeneousDSS using the following setting: there is one1 We use “packets”, “units”, “blocks” interchangeably.(2)big node, called the super node, which has a larger storagecapacity and higher reliability and availability than the othernodes. Such scenario is possible in practical system, e.g.in a peer-to-peer backup system, the super node could bethe service provider that has higher availability and provideshigher storage capacity than other peers.Consider a system with one super node and three otherstorage nodes non-homogeneous DSS based on a (5,3) MDScode, which can be denoted as (n = 5,k = 3,h = 4). Assumea file of size M = 6, then this file is divided into k = 3 parts,each part containing N = M k= 2 packets. After encodingthem into 5 encoded parts or 10 packets, we store the first2N = 4 packets in the super node, and each of the remainingthree nodes stores N = 2 packets as shown in Fig. 2.Figure 2. An example of non-homogeneous DSS based on (5,3) MDS codesand 4 storage nodes.(Node 1 is the super node.)III. EXACT REPAIR OF (k +2,k) STORING CODES INNON-HOMOGENEOUS DSSIn this paper, we limit our study to high-rate (n = k +2,k) exact-repair storing codes. This homogeneous problemhas been considered in [1], [5], [14]. We propose three efficientDSS schemes using MDS and non-MDS storage codes in such(n = k +2,k,h = k +1) non-homogeneous DSS, which aredenoted as Scheme A, B, and C in Table. I. Scheme A and Cuse MDS codes while Scheme B uses non-MDS codes. Thenew system consists of (k +1) nodes which include k nodesof storage size N and one super node of size 2N.A. Scheme A: Store two systematic data at the super nodewith MDS codes.It can be seen in Table. I, the first (k − 1) storage nodesof Scheme A store the systematic file parts f 1 ,··· ,f k wheref 1 ,f 2 are stored in the same systematic node s 1 and the otherfile parts f 3 ,··· ,f k are stored individually in the remaining(k−2) systematic nodes s 2 ,··· ,s k−1 , respectively. The firstand second parity p 1 , p 2 store a linear combination of allk systematic parts as f 1 A 1 + ··· + f k A k and f 1 B 1 + ··· +f k B k . Here, A i and B i denote an N × N matrix of codingcoefficients defined over finite field F q .• Repair systematic failure nodes for Scheme A:We first consider repairing 1-node failure (in the case of supernode s 1 fails, it is considered as 2-node failures which will bediscussed in more detail later). Without loss of generality weassume node s 2 that contains f 3 is failed. For simplicity, wefirst consider the case (n = 5,k = 3). To recover desired dataf 3 , we have to download the following equations from the twosurvival parity nodes:

Table ITHREE SCHEMES OF (k +2,k,k +1) NON-HOMOGENEOUS MODEL VS. TRADITIONAL MODEL BASED ON (k +2,k) MDS CODES WHERE S AND P ARETHE ABBREVIATION OF SYSTEMATIC AND PARITY, RESPECTIVELY. HERE,f i ∈ F 1×Nq AND A i ,B i ∈ F N×Nq FOR ALL 1 ≤ i ≤ k. NOTE THAT ALLSCHEMES A, B AND C USE ONLY (k +1) STORAGE NODES TO STORE(k +2) PACKETS.Non-homogeneousHomogeneousProposed Scheme A and B Proposed Scheme C Traditional model [1]S. nodes 1 f 1 f 2 f 1 f 1s 2 f 3 f 2 f 2....s k−1 f k f k−1 f k−1s k × f k f kP. nodep 1 f 1 A 1 +···+f k A kf 1 A 1 +···+f k A kf 1 B 1 +···+f k B kf 1 A 1 +···+f k A kp 2 f 1 B 1 +···+f k B k × f 1 B 1 +···+f k B kwhere A i , B i ∈ F N×NqF N×N 2q{f1 A 1 V 1 +f 2 A 2 V 1 +f 3 A 3 V 1f 1 B 1 V 2 +f 2 B 2 V 2 +f 3 B 3 V 2 (3)for all 1 ≤ i ≤ k and V 1 ,V 2 ∈can be derived based on the failure node. To repairdifferent failure nodes, different V 1 ,V 2 are needed whichcan be precalculated. It can be seen from Fig. 3 that theterm ( f 1 A 1 V 1 +f 2 A 2 V 1) and ( f 1 B 1 V 2 +f 2 B 2 V 2) are removableby downloading ( N2 + ) N2 packets from super node.Therefore, the desired dataf 3 can be recovered if the followingrank constraint is satisfied:rank [ A 3 V 1 , B 3 V 2] = N (4)To recover the desired data f 3 in the general (n = k+2,k)case, we have to use the following equations:{f1 A 1 V 1 +f 2 A 2 V 1 +f 3 A 3 V 1 +···+f k A k V 1f 1 B 1 V 2 +f 2 B 2 V 2 +f 3 B 3 V 2 +···+f k B k V 2 (5)(( Similarly, the term f1 A 1 V 1 +f 2 A 2 V 1) and( f1 B 1 V 2 +f 2 B 2 V 2) are removable by downloadingN2 + ) N2 packets from super node 1. The followingconditions must be satisfied to achieve the optimal repairbandwidth:rank [ A 3 V 1 , B 3 V 2] = Nrank [ A 4 V 1 , B 4 V 2] = N 2rank [ A k V 1 , B k V 2] = N 2.To relax the complexity of the constraints found in (6),we set A i = I N and V 1 = V 2 , then obtain the followingequations:rank [ B 3 V 1 , V 1] = Nrank [ B 4 V 1 , V 1] = N 2rank [ B k V 1 , V 1] = N 2.(6)(7)Figure 3. An example of repairing 1-node failure using Scheme A based on(5,3) MDS codes in non-homogeneous DSSThe problem of finding matrixB i is similar to [1]. However,here we only need to solve (k −2) equations. Therefore, thefragment size and finite field will be smaller, with M = 2 k−1 k(which means the fragment size reduce to 1 4of the traditionalhomogeneous model when k ≥ 3), and q = 2k − 1. Theseadvantages allow us to reduce the minimum size unit of storingfile and reduce the complexity of computation to a smallerfinite field. It can be seen that in this case of 1-node failure, theproposed Scheme A can achieve the optimal repair bandwidthof k+1 M2 k, which is the same as traditional homogenous DSSscheme.• Repair first parity node for Scheme A:If the first parity node p 1 fails, we make a change of variableto obtain a new representation for our code such that the firstparityp 1 becomes a systematic node in the new representation.We make the change of variables as follows:k∑f i = y 3 , f s = y s for 1 ≤ s ≠ 3 ≤ k (8)i=1We solve (8) by replacing f 3 in terms of the y i variables andobtainf 3 = y 3 −(y 1 +y 2 +y 4 +···+y k )

The problem of repairing first parity is equivalent to repairsystematic node y 3 in the new presentation. Note that y 1 ,y 2are stored in the same node since they are correspondentto f 1 ,f 2 . To repair y 3 , we have to download the followingequations from node s 2 and p 2 :{(−y1 )+(−y 2 )+y 3 +···+(−y k )(B 1 −B 3 )y 1 +(B 2 −B 3 )y 2 +B 3 y 3 +···+(B k −B 3 )y kAgain, the V 1 ,V 2 matrices need to satisfy the followingconditions in order to achieve the optimal repair bandwidth.rank [ B 3 V 1 , V 1] = Nrank [ (B 4 −B 3 )V 1 , V 1] = N 2.rank [ (B k −B 3 )V 1 , V 1] = N 2Similar to the systematic case, the solution of matrix B iis similar to [1]. However, we need to solve only (k − 2)equations, which means the fragment size and the finite fieldwill be smaller M = 2 k−1 k, and q = 2k −1.• Repair second parity node for Scheme A:Similar to the above, we rewrite this code in a form where thesecond parity is a systematic node in some presentation⎡⎤I N 0 0 ··· 00 I N 0 ··· 00 0 I N ··· 0. . . . .f⎢ 0 0 0 ··· I N⎥⎣ I N I N I N ··· I N⎦B 1 B 2 B 3 ···⎡B k⎤I N 0 0 ··· 00 I N 0 ··· 00 0 I N ··· 0=. . . . .f ′ ⎢ 0 0 0 ··· I N⎥⎣ B ′ 1 B ′ 2 B ′ 3 ··· B ′ ⎦kI N I N I N ··· I N(10)wheref ′ is a full rank row transformation off. We proceed in away similar to how we handled the first parity repair to achievethe optimal repair bandwidth. A similar set of equations to thecase of repairing the first parity node can be obtained as shownbelow.]rank[B ′ 3 V1 , V 1 = N]rank[(B ′ 4 −B′ 3 )V1 , V 1 = N 2, (11). ]rank[(B ′ k −B′ 3 )V1 , V 1= N 2It can be seen that in this case the size and the finite field willbe again M = 2 k−1 k, and q = 2k−1, which are smaller thanthose in [1], and still achieve the optimal repair bandwidth.(9)Figure 4. Illustration of exact repair of 2-node failure for a (5,3) Exact-RepairMDS code for Scheme A in non-homogeneous DSS. Total repair bandwidthγ 2 = M + M achieves lower bandwidth bound.k• Repair 2-node failures for Scheme ATo repair 2-node failure at the optimal repair bandwidth, onesolution is shown in Fig. 4. Lets assume that s 2 and p 1 fail, torepair them, first download the k packets from the survivalnodes, then the original file can be recovered due to theproperty of MDS codes. Therefore, we can obtain the dataof node s 2 and p 1 , and store them in new node, say new p 1 .Next, the data of the failure node s 2 (i.e. f 3 in this case) isforwarded to the new node s 2 . The total repair bandwidth willbe γ 2 = M + M k . It is trivial to repair the super node s 1 atthe repair bandwidth of M by downloading data from survivalnodes. It should be noted that the failure of one super nodeplus one additional node cannot be repaired since it can beregarded as three-node failure, therefore beyond the correctingability of (k +2,k) MDS codes.B. Scheme B: Store two systematic data at the super nodewith non-MDS codes.Scheme B uses the same model as Scheme A. However,we can achieve the repair bandwidth of 1-node failure belowthe optimal bound in this non-homogeneous model ifthe term ( f 1 A 1 V 1 +f 2 A 2 V 1) and ( f 1 B 1 V 2 +f 2 B 2 V 2) in(5) are the same or the following constraints are satisfiedA 1 V 1 = λB 1 V 2 ,A 2 V 1 = λB 2 V 2 . It means that we onlyneed to download N 2 packets instead of ( N2 + ) N2 packetsfrom the super node to eliminate these terms. The followingexample is used to present the idea of repairing 1-node failurebelow the optimal bandwidth bound for the case k = 3,n = 5.Suppose f 1 = [a 1 ,a 2 ] T ,f 2 = [b 1 ,b 2 ] T ,f 3 = [c 1 ,c 2 ] T andp 1 = f 1 A 1 +f 2 A 2 +f 3 A 3 , p 2 = f 1 B 1 +f 2 B 2 +f 3 B 3 are thesystematic and parity data of a (5,3) storage code over finitefield F 3 where[ ] [ ] [ ]2 0 1 2 2 0A 1 = , A2 1 2 = , A0 2 3 = ,[ ] [ ] [ 1 2 ]2 0 1 1 1 1B 1 = , B1 2 2 = , B2 1 3 = .0 1(12)It can be seen that any 1-node failure (systematic or paritynode) except the super node can be repaired with bandwidth

Figure 5. Total repair bandwidth of 1-node failure γ 1 = M is smaller than 2the bound. In this example, γ 1 = 3 < 4 of repair bandwidth in the traditionalcaseFigure 6. An example of repairing 1-node failure using Scheme C based on(5,3) MDS codes in non-homogeneous DSSof M 2 , which is below the optimal bound ( )k+1 M2 k . Fig 5[ shows ] the process [ ] of using two projection vectors V 1 =1 1,V0 2 = for repairing one systematic node failure2below the optimal bandwidth bound. It is straightforward forthe general case (k +2,k). In the case of systematic 1-nodefailure, the general design constraints for Scheme B is:A 1 V 1 = λB 1 V 2A 2 V 1 = λB 2 V 2rank [ A 3 V 1 , B 3 V 2] = Nrank [ A 4 V 1 , B 4 V 2] = N 2rank [ A k V 1 , B k V 2] = N 2.(13)Similar to Scheme A, we can find the solution for scheme B.However, in Scheme B the MDS property of the storage codebreaks since we cannot reconstruct the original informationfrom the survival nodes in the case of 2-node failure.C. Scheme C: Store two parity data at the super node withMDS codes.We first consider an example with n = 5,k = 3 forsimplicity. Without loss of generality, assume that node 1 withdata f 1 is failed and the two parity packets p 1 ,p 2 are stored atthe super node. To recoverf 1 , we have the following equationsafter eliminating f 2 and f 3 from the parity node:{ {f1 A 1 V 1 +f 2 A 2 V 1 +f 3 A 3 V 1 f1 Cf 1 B 1 V 2 +f 2 B 2 V 2 +f 3 B 3 V 2 → 1 V 1 +f 2 C 2 V 1f 1 D 1 V 2 +f 3 D 2 V 2(14)where C i ,D i ∈ F N×Nq for i = 1,2 and C 1 = A 1 A −13 −B 1 B −13 ,C 2 = A 2 A3 −1 − B 2 B −13 ,D 1 = A 1 A −12 −B 1 B −12 ,D 2 = A 3 A −12 −B 3 B2 −1 . It can be seen from Fig. 6that the term f 2 C 2 V 1 and f 3 D 2 V 2 are removable by downloading( N2 + ) N2 packets from the parity node. Therefore,the desired data f 1 can be recovered if the following rankconstraint is satisfied:rank [ C 1 V 1 , D 1 V 2] = N (15)For general (k + 2,k) case, similar to Scheme A, we setA i = I N for all i ≤ N. To recover the desired data f 1 , wehave the following equations reduction from parity node:{f 1 (B 1 −B 3 )+f 2 (B 2 −B 3 )+ ∑ kj=4 f j (B j −B 3 )f 1 (B 1 −B 2 )+f 3 (B 3 −B 2 )+ ∑ kj=4 f j (B j −B 2 )(16)The following conditions must be satisfied to achieve theoptimal repair bandwidth of k+12Mk :rank [ (B 1 −B 2 )V 1 , (B 1 −B 3 )V 2] = Nrank [ (B 4 −B 2 )V 1 , (B 4 −B 3 )V 2] = N 2.rank [ (B k −B 2 )V 1 , (B k −B 3 )V 2] = N 2(17)In general, solving (17) is still an open problem. Here, wegive a numerical solution for the case n = 6,k = 4. Considerf 1 = [a 1 ,a 2 ] T ,f 2 = [b 1 ,b 2 ] T ,f 3 = [c 1 ,c 2 ] T ,f 4 = [c 1 ,c 2 ] Tand p 1 = f 1 +f 2 +f 3 +f 4 , p 2 = f 1 B 1 +f 2 B 2 +f 3 B 3 +f 4 B 4are the systematic and parity data of a (6,4) storage code overfinite field F 3 where[ ] [ ]0 1 0 2B 1 =[ 1 02 0B 3 =0 1, B 2 =], B 4 =[ 2 01 11 2,].(18)It can be seen that any 1-node failure except the supernode can be repaired with optimal bandwidth ( )k+1 M2 k . Fig.7 shows [ the]process of using two projection vectors V 1 =0V 2 = for repairing the first systematic node.1For the case of super node p 1 and 2-node fail, the repairprocess is similar to scheme A. The total repair bandwidthwill be M and M + M k , respectively.

achieve the optimal repair bandwidth in the case of 1-nodefailure. [5] and [14] methods are also impractical since theycan repair only the systematic nodes.Figure 7. An example of repairing 1-node failure using Scheme C based on(6,4) MDS codes in non-homogeneous DSSIV. PERFORMANCE ANALYSISAs compared with the previous work in [1], our schemes Aand C can achieve optimal repair bandwidth of 1-node failureat a smaller finite field q and 75% smaller data size M than[1]. Moreover, Scheme B that uses non-MDS code can repair1-node failure with M 2ksmaller bandwidth than the optimalbound. A summary is presented in Table II for the varioustechnologies.A. Numerical Case StudyTo show an illustration, we continue using the example n =5, k = 3. Assume we have the data file of size 48 blocks tostore across the DSS.Scheme A and C: Divide the file into 4 fragments of sizeM 1 = 12 blocks. These fragments are stored across k+1 = 4nodes in the non-homogeneous DSS. In the case of 1-nodefailure, the repair bandwidth will be 4× M1 k+1k 2= 32 blocks.In the case of 2-node failure, the repair bandwidth will be4× ( )M 1 + M1k = 64 blocks. To update one fragment M1 ofthe file, the update bandwidth will be M1kn = 20 blocks.Scheme B: Similar to Scheme A, the file is divided into 4fragments of size M 1 = 12. In the case of 1-node failure, therepair bandwidth will be 4× M12= 24. To update one fragmentM 1 of the file, the update bandwidth will be M1k n = 20.Alex method [1]: Divide the file into 1 fragment of sizeM 2 = 48. The repair bandwidth of 1-node failure willbe 1 × ( M 2)k+1k 2 = 32 and for 2-node failure requires1× ( )M 2 + M2k = 64. The update bandwidth of one fragmentM 2 will be M2kn = 80. The repair and update bandwidth ofother methods are computed in the same manner and shownas in Table. III. Note that C.R.C method cannot repair 1-nodefailure with optimal bandwidth.From Table. III, it can be seen that Schemes A and Ccan achieve the optimal bandwidth for repairing 1- or 2-node failure, which is the same as homogeneous DSS. Byscarifying the MDS property, Scheme B requires a lower repairbandwidth for 1-node failure. It can be seen that all proposedschemes have an advantage of small update bandwidth incompare with the other schemes except the CRC method.However, the CRC method is not practical since it cannotB. Data AvailabilityIn this subsection, we employ the framework proposed in[16] to measure and compare the data availability betweenour proposed non-homogenous DSS schemes and traditionalhomogenous DSS schemes to show the efficiency of ourproposed schemes. Let [p 1 ,··· ,p h ] be the nodes’ online probabilityof h nodes in the (n,k,h) DSS. Let the power set of h,2 h , denote the set of all possible combinations of online nodes.Let A ⊂ 2 h represents one of these possible combinations.Then, we will use Q A to represent the event that combinationA occurs. Since node availabilities are independent, we havePr[Q A ] = ∏ i∈Ap i∏j∈2 h \A(1−p j ) (19)Let x i be the number of data blocks stored instorage node i, for example x i = 1, it meansα i = M k. The data allocation of our schemes will be(x 1 = 2,x 2 = 1,··· ,x k+1 = 1,x k+2 = 0). Let L k ⊂ 2 h bethe subset containing those combinations of available nodeswhich together store k different redundant blocks.{ }L k =A : A ∈ 2 h , ∑ i∈Ax i ≥ k(20)Since the retrieval process needs to download k differentblocks out of the total n redundant blocks, the probabilityof successful recovery for an allocation (x 1 ,··· ,x n ) can bemeasured asPr[successful recovery] = ∑ A∈L kPr[Q A ]= ∑ [ ∏A∈L k i∈A p ∏ ]i j∈2 h \A (1−p j)(21)To compare the data availability, we examine a scenario ofnode online probability where the online probability of supernode is greater than the other node p 1 ≥ p 2 = p 3 = ··· =p n = p. The data availability of homogeneous Pr homo (e.g.scheme in [1]) and non-homogeneousPr non−homo DSS (e.g.Schemes A and C, since Scheme B is based on non-MDS code,it is excluded in this study as its availability is calculated in adifferent manner) can be computed by the following equations:Pr homo = p k+1 +(k+1)(1−p)p k +k(k +1)p 1 (1−p) 2 p k−12(22)Pr non−homo = p k +kp 1 (1−p)p k−1 k(k +1)+ p 1 (1−p) 2 p k−12(23)Let p 1 = χp where χ ≥ 1. The condition Pr non−homo ≥Pr homo will induceχ ≥ p/(p+ 1 2(1−p)[(k −1)−(k +1)p]).It can be seen that if p ≤ k−1k+1 , then p/(p + 1 2 (1 −p)[(k −1)−(k +1)p]) ≤ 1 ≤ χ. Therefore, Pr non−homo ≥Pr homo for all p ≤ k−1k+1. We run the simulations for the case

Table IICOMPARISON OF NON-HOMOGENEOUS MODEL VS. TRADITIONAL MODEL BASED ON (k +2,k) CODES WHEREM AND γ ARE THE FILE SIZE AND REPAIRBANDWIDTH, RESPECTIVELY. SMALL VALUE OF M,γ AND q MEAN EFFICIENT.Scheme A&C Scheme B Alex [1] Perm. code [5] Tamo [14] C.R.C [10]M = 2 k−1 kq ≥ 2k −1M = 2 k−1 kq ≥ 2k −1M = 2 k+1 kq ≥ 2k +3M = 2 k kq ≥ 2k +1M = 2 k kq ≥ 2k +1M = 2kq ≥ n1- node failure γ = M k+1γ = M γ = M k+1γ = M k+1γ = M k+1N.Ak 2 2 k 2 k 2 k 22- node failures γ = M + M N.A γ = M + M γ = M + M γ = M + M γ = M + M kk k k kTable IIINUMERICAL RESULTS OF STORING A FILE OF SIZE 48 BLOCKS USING THE(5,3) MDS CODES IN NON-HOMOGENEOUS AND HOMOGENEOUS DSS.Scheme A&C Scheme B Alex[1] Perm. code[5] Tamo [14] C.R.C [10]M 1 = 12q ≥ 5M 1 = 12q ≥ 5M 2 = 48q ≥ 9M 3 = 24q ≥ 7M 4 = 24q ≥ 7M 5 = 6q ≥ 51-node failure γ = 32 γ = 24 γ = 32 γ = 32 γ = 32 N.A2-node failures γ = 64 N.A γ = 64 γ = 64 γ = 64 γ = 64update≤ 12 data blockδ = 20 δ = 20 δ = 80 δ = 40 δ = 40 δ = 10Data Availability0.90.850.80.750.70.650.60.55p = 0.6HomogeneousNon−homogeneous (proposed)p = 0.650.50.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95Figure 8. A comparision of data availability between non-homogeneous DSSand homogeneous DSS.of k = 4, p = 0.6 and p = 0.65 and obtain the resultin Fig. 8. It can be seen that for p = k−1k+1= 0.6, dataavailability of non-homogeneous DSS scheme outperforms thehomogeneous DSS scheme. For p = 0.65 > k−1k+1, the nonhomogeneousschemes also have a big improvement whenp 1 has a high online availability. Therefore, it can be seenthat our proposed non-homogeneous DSS schemes achieve ahigher data availability than the traditional homogeneous DSS.The gap between the two becomes larger when the onlineavailability of the super node increases, e.g. when p 1 is greaterthan 25% of p, the data availability of the proposed nonhomogenousover homogenous DSS is increased by 10%.p 1V. CONCLUSIONSWe proposed three distributed storage schemes for nonhomogeneousDSS with high rate (k + 2,k) codes. Two ofthe schemes make use of MDS code, and can achieve optimalMkrepair bandwidth of k+12at smaller finite field q and 75%smaller fragment M than [2]. Small M and q are desirable,because they reduce the update bandwidth and complexity.Another scheme based on non-MDS code can achieve asmaller repair bandwidth than the optimal bandwidth based onMDS code by M 2kfor 1-node failure. We further demonstratethat in such non-homogeneous DSS, if we can ensure onesuper node with a higher online probability than the othernodes, we can achieve a higher data availability than thehomogeneous DSS.ACKNOWLEDGEMENTThis research is partly supported by the International DesignCenter (grant no. IDG31100102 & IDD11100101). Li’s workis supported in part by the National Science Foundation underGrants No. CCF-0829888, CMMI-0928092, and EAGER-1133027.REFERENCES[1] D.S. Papailiopoulos, A.G. Dimakis, V.R. Cadambe, “Repair optimalerasure codes through Hadamard designs,” the 49th Annual AllertonConference on Communication, Control and Computation, pp.1382-1389, Sep. 2011.[2] A.G. Dimakis, P. Godfrey, M. Wainwright, and K. Ramchandran,“Network coding for distributed storage systems,” IEEE Trans. Inform.Theory, pp.4539-4551, Sep. 2010.[3] Y.Wu, A.G. Dimakis, “Reducing repair traffic for erasure coding basedstorage via interference alignment,” IEEE International Symposium onInformation Theory, pp.2276-2280, July 2009.[4] R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G. M. Voelker, “Totalrecall: System support for automated availability management,” the 1stSymp. Networked Systems Design and Implementation (NSDI), Mar.2004.[5] V.R. Cadambe, C. Huang, J. Li, “Permutation code: optimal exact-repairof a single failed node in MDS code based distributed storage systems,”IEEE International Symposium on Information Theory, pp.1225-1229,Aug. 2011.[6] V.R. Cadambe, S.A. Jafar, H. Maleki, “Asymptotic interference alignmentfor exact repair in distributed storage systems,” the 44th Signals,Systems and Computers (ASILOMAR), pp.1617-1621, Nov. 2010.[7] D. Cullina, A. G. Dimakis, and T. Ho, “Searching for minimum storageregenerating codes,” Allerton Conf. Control Comput. Commun., Sep.2009.[8] F. Dabek, J. Li, E. Sit, J. Robertson, M. Kaashoek, and R. Morris,“Designing a DHT for low latency and high throughput,” the 1st Symp.Networked Systems Design and Implementation (NSDI), Mar. 2004.[9] D. Grolimund, “Wuala - A Distributed File System”, Google Tech Talkhttp://www.youtube.com/watch?v=3xKZ4KGkQY8

[10] Cooperative regenerating codes, http://home.ie.cuhk.edu.hk/~wkshum/papers/CRC4.pdf[11] N.B. Shah, K.V. Rashmi, P.V. Kumar, “A flexible class of generatingcodes for distributed storage,” IEEE International Symposium on InformationTheory, pp.1943-1947, Jun. 2010.[12] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, andJ. Kubiatowicz, “Maintenance-free global data storage,” IEEE InternetComputer, pp.40-49, Sep. 2011.[13] C. Suh, K. Ramchandran, “Exact-repair MDS code construction usinginterference alignment,” IEEE Trans. Inform. Theory, pp.1425-1442,Mar. 2011.[14] I. Tamo, Z. Wang, J. Bruck, “MDS array codes with optimal rebuilding,”IEEE International Symposium on Information Theory, pp.1240-1244,Aug. 2011.[15] C.Armstrong, A. Vardy, “Distributed storage with communication cost,”the 49th Annual Allerton Conference on Communication, Control andComputation, pp.1358-1365, Sep. 2011.[16] L. Pamies-Juarez, P. Garcia-Lopez, M. Sanchez-Artigas, B. Herrera, “Towardsthe design of optimal data redundancy schemes for heterogeneouscloud storage infrastructures”, Computer Network, pp. 1100-1113, Nov.2011.

Non-homogeneous distributed storage systems - Singapore ...

Create successful ePaper yourself

Delete template?

Save as template?