25.07.2013 Views

Replication in Node Partitioned Data Warehouses - Universidade de ...


Replication in Node Partitioned Data Warehouses

Pedro Furtado
University of Coimbra
Departamento de Engenharia Informática
Pólo II, Pinhal de Marrocos
Coimbra
Portugal
pnf@dei.uc.pt

Abstract

In this paper we concentrate on guaranteeing efficient availability and promoting manageability in a node partitioned data warehouse (NPDW). The objective is that the system be always-on and always efficient, even when entire parts of it are taken offline for maintenance and management functions, such as loading with new data or other DBA functionality. Replication has already been studied for parallel databases in general. We investigate how alternative replication strategies can be applied to the NPDW context and analyze advantages and drawbacks against metrics.

1. Introduction

Parallel architectures can speedup significantly the processing over large data warehouses. We have been pursuing the idea of replacing fully dedicated and powerful servers by a possibly non-dedicated network of low-cost, under-utilized computers to hold and process data warehouses. The data warehouse can reach giga or even terabytes and is typically organized as a set of multidimensional schemas [ ]. There are typically some very big relations (facts, storing historical detail such as each individual sale of each product in each store of a retail chain) and smaller relations (dimensions) with descriptive properties for the dimensions (e.g. product, store, time). In that context, partitioning refers to dividing relations into nodes somehow to take advantage of parallel node processing. We have discussed horizontal partitioning strategies for NPDW in [ ] and showed that a careful partitioning strategy over a switched network environment can achieve acceptable speedups. However, availability is an issue in such a context, so that availability-oriented replication becomes a major necessity as a way to provide availability. A replica is a "standby" copy of some data that can be activated at any moment in case of unavailability or failure of the node holding the "original", so that processing resumes as usual. If processing with unavailable nodes is implemented efficiently, unavailability becomes less onerous to the whole system and it also becomes feasible to stop a set of nodes for data loading, maintenance, upgrading or other management activities without any major repercussions to processing. The system remains always-on and always efficient.

Replica placement has been studied in the context of generic parallel and distributed databases in which the relations are not partitioned [ ]. We review those works in the related work section. In this paper we discuss replication for availability in the NPDW context and discuss their use for both tolerating node failures and allowing multiple nodes to be offline simultaneously for loading or administration. We compare the approaches from the perspective of efficiency. Our main contributions include: showing how replication strategies can be applied to a workload-based pre-partitioned NPDW setting and how processing can incorporate the replicas in case of node failures; analyzing alternatives against relevant metrics; evaluating the alternatives with emphasis on efficiency and flexibility for allowing multiple offline nodes; analyzing the tradeoff between efficiency and the capacity to take multiple nodes offline simultaneously.

The paper is organized as follows: section 2 discusses related work. Section 3 overviews the Node Partitioned Data Warehouse. Sections 4 and 5 discuss replication alternatives and section 6 compares the approaches. Section 7 contains concluding remarks and future work.


2. Related Work

Figure 1. Basic Partitioning Example in NPDW (TPC-H schema)

The most relevant related work for this paper concerns on replication strategies, but we also review briefly partitioning. Some of the most promising partitioning and placement approaches focus on query workload-based partitioning choice [ ]. The idea in those works is to use the query workload to determine the most appropriate partitioning attributes, which should be related to typical query access patterns. All those works focus mainly on hash-partitioning for efficient parallel join processing [ ] (also reviewed in [ ]). Our previous work on the NPDW [ ] proposes and analyzes generic data partitioning strategies independently of the underlying database server and targeted at node partitioned data warehouses. Our purpose in this paper is to study availability and replication concerns to the NPDW design.

Replication has been studied in the parallel database context. In Tandem's NonStop SQL [ ], the use of mirrored disk drives offers a high level of availability but does a poor job of distributing the load of a failed processor. If a processor fails, the substitute processor will have to handle the disks of the failed processor as well as its own, essentially doubling the processing time. In this paper we apply this strategy as the Full replication (FR) option. Teradata's scheme [ ] assumes relation clusters (group of nodes) and can backup a partitioned copy of a relation by placing it in the N-1 other nodes of the relation-cluster with N nodes. Although this scheme balances the processing in case of failures, if more than one node is unavailable in the cluster the system stops. In chained declustering [ ], two declustered copies are kept such that the fragments of the second declustered copy are placed in different nodes from the ones of the primary copy. This strategy improves availability while maintaining the performance level of the Teradata scheme. In [ ], interleaved declustering divides the disks into clusters and fully declusters relation partitions into the corresponding cluster. In [ ] the authors compare high-availability media recovery techniques in a generic OLTP environment, including Teradata's interleaved declustering.

Recent work on replication includes [ ]. The authors use data replication to improve data availability and query load balancing while dealing with consistency problems. They propose a lazy preventive data replication solution in [ ] and a strategy to scale up the solution in [ ]. The work in [ ] studies similar approaches when applied to WAN environments. They identify the most crucial bottlenecks of the existing protocols and propose optimizations that alleviate the identified problems.

While these works focus on generic replication strategies for availability considering non-partitioned relations and/or OLTP loads, we discuss, analyze and evaluate replication strategies on the specific context of the Node Partitioned Data Warehouse and also consider if the strategies allow multiple nodes to be offline simultaneously for maintenance or management.

3. The NPDW

The NPDW is a design for efficient processing of the data warehouse over low-cost computer nodes on a possibly non-dedicated switched network. The objective is not to assume any specialized hardware or interconnects, so that the NPDW is able to run, for instance, in a … Mbps switched LAN. Parallelism is obtained by dividing the data set initially into the disks of individual nodes, so that each node is able to access its data locally and data is exchanged between nodes when necessary. In order to deliver a near to linear speedup over the node-partitioned NPDW context, it is necessary to find suitable partitioning and placement strategies for the data, which may reduce the need to exchange data between nodes. This issue is discussed in detail in [ ] and we only review it in this paper. Our objective was for the NPDW to be able to process efficiently not only simple star schemas [ ] but also more complex data warehouse schemas such as TPC-H [ ]. In a partitioning and placement scheme, each relation can essentially be partitioned (divided into partitions or fragments) or copied in their entirety into all nodes of a group. In order to simplify our discussion we assume that they are either copied or partitioned into all nodes, that is, the group is "all nodes". We also simplify the discussion by considering homogeneous nodes, so that each node has the same load. This constraint can be eliminated by taking into account node performances in the initial placement and subsequent reorganizations for load balancing. Relations that are copied into all nodes are also denoted as replicated relations. The decision to replicate relations for performance reasons is an output from partitioning, not availability-related replication, but the resulting replicas are of course also useful for availability. In order to distinguish a replica dictated by a partitioning algorithm from one dictated from availability, we denote partitioning replication as P-replication.

Partitioned relations can be divided using a round-robin, random, range or hash-based scheme. The NPDW uses horizontal hash-partitioning, as this approach facilitates key-based tuple location and join operations. Figure 1 shows the partitioning and placement of relations for the TPC-H benchmark [ ] after the workload-based algorithm in [ ] was applied (LI: lineitem, O: orders, PS: partsupp, P: part, S: supplier, C: customer). In that Figure, dashed rectangles represent fully partitioned relations; dashed arrows represent "repartition joins" (RJ), joins that require data to be shipped between nodes; bold arrows represent "equi-partitioned joins" (EJ), joins that do not require data to be shipped between nodes, because the intervening data sets are partitioned by the join key; and normal arrows represent "P-replicated joins" (RRJ), joins that do not require data to be shipped between nodes because one of the intervening relations is P-replicated. Repartitioning refers to the need to exchange data between nodes in order to reorganize two data sets so that they become equi-partitioned (partitioned by the same attribute). In order to choose the most appropriate partitioning alternative, we must use a strategy such as workload-based partitioning [ ]. The idea is to choose partitioning keys that "maximize" the amount of EJ as opposed to RJ by looking at the query workload. Additionally, for relations that are small in comparison to the data set that would need to be repartitioned to join with them, RRJ may be preferable [ ], as it avoids potentially large repartitioning overheads. This is the reason why smallest relations (C and S) are P-replicated, as it avoids the need to ship larger data between nodes to join with those smaller data sets.
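As a rough illustration of equi-partitioning, the sketch below hash-partitions two toy relations by the same key and joins them node by node without shipping any data. The relation contents, the modulo hash function and the names `partition` and `local_join` are assumptions made for this example only; they are not taken from the NPDW implementation.

```python
from collections import defaultdict

def partition(rows, key_index, n_nodes):
    """Distribute rows over n_nodes by hashing (here: modulo) the key attribute."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[row[key_index] % n_nodes].append(row)
    return nodes

# Toy relations, both carrying the order key in position 0 (assumed data).
orders = [(o_key, f"order-{o_key}") for o_key in range(1, 9)]
lineitems = [(o_key, line) for o_key in range(1, 9) for line in (1, 2)]

N = 4
orders_by_node = partition(orders, 0, N)
lineitems_by_node = partition(lineitems, 0, N)

def local_join(node):
    """Equi-partitioned join (EJ): all matches are found inside one node."""
    local_orders = dict(orders_by_node[node])
    return [(o_key, line, local_orders[o_key])
            for o_key, line in lineitems_by_node[node]
            if o_key in local_orders]

joined = [row for node in range(N) for row in local_join(node)]
```

Because both relations are partitioned by the same attribute, every lineitem finds its order locally; partitioning one of them by a different attribute would leave matches on other nodes and force a repartition join.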

Query processing over a parallel database, and in particular over the NPDW, follows roughly the steps in Figure 3, which we describe in more detail in [ ]. Figure 2 illustrates a simple example. Consider a sum query. Each node needs to apply exactly the same initial query (or, more generically, a modified query) on its partial data, and the results are merged by applying a merge query again at the merging node with the partial results coming from the processing nodes.

More generically, the typical query processing cycle is shown in Figure 3 and a complete example is given in Figure 4. The Rewrite Query step prepares the node and merge query components from the original submitted query. The Send Query step forwards the node query into all nodes in the NPDW, which process the query locally in the Compute Partial Result step. Each node then sends its partial result to the submitter node (Send Partial Results), which applies the merge query in the Apply Merge Query step. The Redistribute step redistributes results into processing nodes if required (for some queries containing subqueries, in which case more than one processing cycle may be required).

Figure 2. Typical Query over NPDW (labels: Submitter Node, Rewrite Query, Send to nodes, Computing Nodes; example query: SUM(X) over FACT GROUP BY dimAttrs)

Figure 3. Query Processing Steps in NPDW (labels: Send Query; Compute Partial Result, with SUM(X) over n FACT GROUP BY dimAttrs at each node; Send Partial Results; Apply Merge Query, with SUM(SUMs) over UNION(Partial_Sums) GROUP BY dimAttrs; Redistribute)


Figure 5. Schema in Node X with replicated Schema from Node Y

In the query rewriting and partial-result steps of Figure 4, we can see that aggregation primitives are computed at each node. The most common primitives are:

Linear sum (LS = SUM(X));
Sum of squares (SS = SUM(X²));
Number of elements (N);
Extremes (MAX and MIN).

Query submission:
Select sum(a), count(a), average(a), max(a), min(a), stddev(a), group_attributes
From fact, dimensions (join)
Group by group_attributes;

Query rewriting and distribution to each node:
Select sum(a), count(a), sum(a x a), max(a), min(a), group_attributes
From fact, dimensions (join)
Group by group_attributes;

Compute partial results:
Select sum(a), count(a), sum(a x a), max(a), min(a), group_attributes
From fact, dimensions (join)
Group by group_attributes;

Results collecting:
Create cached table PRqueryX(node, suma, counta, ssuma, maxa, mina, group_attributes)
as <insert received results>;

Results merging:
Select sum(suma), sum(counta), sum(suma) / sum(counta), max(maxa), min(mina),
(stddev derived from sum(ssuma), sum(suma) and sum(counta)), group_attributes
From UNION_ALL(PRqueryX), dimensions (join)
Group by group_attributes;

Figure 4. Basic Aggregation Query Steps
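The merging of partial aggregates described above can be sketched as follows. This is a minimal illustration, not the NPDW code: the function names and the sample data are hypothetical, and the population form of the standard deviation (SS/N minus the squared mean) is an assumption about how the merge query combines the primitives.

```python
import math

def partial_primitives(values):
    """Primitives one node computes over its local partition."""
    return {
        "ls": sum(values),                 # linear sum, SUM(X)
        "ss": sum(v * v for v in values),  # sum of squares, SUM(X*X)
        "n": len(values),                  # number of elements
        "max": max(values),
        "min": min(values),
    }

def merge(partials):
    """Merge step: combine per-node primitives into the final aggregates."""
    ls = sum(p["ls"] for p in partials)
    ss = sum(p["ss"] for p in partials)
    n = sum(p["n"] for p in partials)
    mean = ls / n
    variance = ss / n - mean * mean  # population variance from LS, SS and N
    return {
        "sum": ls,
        "count": n,
        "average": mean,
        "max": max(p["max"] for p in partials),
        "min": min(p["min"] for p in partials),
        "stddev": math.sqrt(max(variance, 0.0)),
    }

# Three nodes, each holding a fragment of the fact data (assumed values).
node_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
result = merge([partial_primitives(values) for values in node_data])
```

Average and standard deviation cannot be merged directly from per-node averages or deviations, which is why the rewritten node query ships sums, counts and sums of squares instead.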

Although we have discussed and evaluated extensively partitioning and processing choices for the NPDW in previous works, we did not discuss availability, which is nevertheless very important in the potentially unreliable environment for which NPDW is designed to run.


A discussion of availability for the NPDW brings up several issues. For instance, network failures, failure of the submitter or computing nodes, loading failures, availability monitoring and so on. Each of these issues requires specific solutions. For instance, network failures can be accommodated using backup connections, unavailability of submitter node can be accommodated by allowing more than one node to be a potential submitter and routing client requests into available nodes. Failure of the submitter node in the middle of query processing can be handled by redirecting partial results into another node or resubmitting the query. These issues are part of our current and future work on the subject. In this paper we restrict our attention to the unavailability of computing nodes, replication alternatives to achieve high availability and processing efficiency in the presence of replication and unavailability.

4. Availability-targeted Replication over NPDW

Consider first that the basic replication unit in NPDW is the node. A whole copy of relation partitions from one node can be placed in another node and, in case of failure, the replacement node will process "twice" the amount of data: its own node data and the one it is replacing. In practice, P-replicated relations (small dimensions) do not need to be replicated again for availability. Figure 5 shows the schema of a node X with replicated data from another node Y. Node X can now replace node Y in case of unavailability of Y.

We will also discuss in the next section availability strategies that slice the replication units further and divide the slices by more than one node. This strategy improves the efficiency of processing in case of node unavailability. For instance, Li_y in Figure 5 will be replaced by Li_yj (j = 1..m) and divided into m nodes. In this case the unit of replication will be the slice.

There is also another requirement concerning replication slices. Consider a partition Li_i of a relation Li that is placed at a node X. As depicted in Figures 1 and 5, relations are partitioned by a partitioning key (typically hash-partitioned) and placed in equi-partitioned fashion when possible (e.g. Li and O are both partitioned by O_KEY and tuples with a specific value of O_KEY are placed on the same node). The requirement is that replication slices also be organized by partitioning key in a similar way, so that tuples with the same key will still be co-located.

With respect to query processing with replicas, there are two issues: which nodes process which replicas and how they extend their processing to handle the replicas. The first issue is a scheduling problem which is not our main concern in this paper and for which we use a simple greedy solution:

Each node processes its own data;
For each unavailable node
Choose replica-holding node with less load to process its replica
(if more than one have same load, choose closest);

Although this algorithm does not guarantee balanced distribution of load, it is sufficient for our purposes and if there is imbalance in the result (e.g. the top load being a node with much more load than the other ones), a second step can try to reallocate the processing of one or more replicas from that node.
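The greedy scheduling rule can be sketched as below. This is an illustrative reading of the pseudocode, under stated assumptions: each live node starts with one unit of load for its own data, each whole-node replica adds one unit, and "closest" is modeled by a caller-supplied `distance` function. The names `assign_replicas`, `holders` and `ring_distance` are hypothetical.

```python
def assign_replicas(nodes, unavailable, replica_holders, distance):
    """Greedily assign each unavailable node's replica to a live holder.

    nodes: iterable of node ids
    unavailable: set of offline node ids
    replica_holders: dict mapping a node to the nodes holding its replica
    distance: function (a, b) -> number, used only to break load ties
    """
    load = {n: 1 for n in nodes if n not in unavailable}  # own data = 1 unit
    assignment = {}
    for down in sorted(unavailable):
        candidates = [h for h in replica_holders[down] if h in load]
        if not candidates:
            raise RuntimeError(f"no live replica for node {down}")
        # Least loaded holder first; ties go to the closest holder.
        chosen = min(candidates, key=lambda h: (load[h], distance(h, down)))
        assignment[down] = chosen
        load[chosen] += 1
    return assignment, load

N = 6
# Assumed layout: each node's data is replicated on its next two neighbors.
holders = {i: [(i + 1) % N, (i + 2) % N] for i in range(N)}

def ring_distance(a, b):
    """Distance between two nodes numbered on a ring of N nodes."""
    return min(abs(a - b), N - abs(a - b))

assignment, load = assign_replicas(range(N), {0, 1}, holders, ring_distance)
```

With nodes 0 and 1 offline, node 2 takes over node 0 and node 3 takes over node 1, since node 2 is already carrying an extra replica by the time node 1 is considered.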

We now concentrate on how to handle replicas while processing queries. A node running a replica (or slice) can process its data set and the replica independently, as if it represented "two virtual nodes" running two independent instances of the query processing cycle of Figure 3. These computations yield two partial results, as if they were the partial results from two separate nodes, which can be merged locally using the merge step of that cycle before sending a single partial result to the merger node. Normal processing then resumes as before, with every node sending its results to the merging node(s). This strategy is not the most efficient, because the replacement node processes the whole data separately for both virtual nodes and applies an extra merge query.

A better alternative is to scan the union of partitioned relations. Scan operations over partitioned relations now scan both the node's data and the replica's data, and the query proceeds as in a single node, with the query optimizer choosing the best query plan. This alternative is better because it avoids extra merging overhead and also the need to join twice with replicated relations that appears if the virtual nodes approach is used instead (one join for each virtual node), while scan-union requires a single processing of replicated relations. As the scan-union alternative is more efficient than the virtual nodes approach, we adopted scan-union in NPDW (the experimental evaluation is based on scan-union).
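The difference between the two alternatives can be illustrated with a small in-memory SQLite example. The table names (`fact_own`, `fact_replica`, `dim`), their contents and the node query are assumptions made for this sketch; NPDW itself is independent of the underlying database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim (d_key INTEGER PRIMARY KEY, d_name TEXT);  -- P-replicated dimension
    CREATE TABLE fact_own (d_key INTEGER, x REAL);              -- this node's partition
    CREATE TABLE fact_replica (d_key INTEGER, x REAL);          -- replica of the offline node
    INSERT INTO dim VALUES (1, 'a'), (2, 'b');
    INSERT INTO fact_own VALUES (1, 10), (2, 20);
    INSERT INTO fact_replica VALUES (1, 1), (2, 2), (2, 3);
""")

NODE_QUERY = """
    SELECT d.d_name, SUM(f.x), COUNT(f.x)
    FROM {source} AS f JOIN dim AS d ON d.d_key = f.d_key
    GROUP BY d.d_name ORDER BY d.d_name
"""

# Virtual nodes: run the node query twice (two joins with dim), then merge.
partials = []
for table in ("fact_own", "fact_replica"):
    partials += conn.execute(NODE_QUERY.format(source=table)).fetchall()
virtual = {}
for name, total, count in partials:
    prev_total, prev_count = virtual.get(name, (0.0, 0))
    virtual[name] = (prev_total + total, prev_count + count)

# Scan-union: one query over the union of partition and replica, one join.
union_source = "(SELECT * FROM fact_own UNION ALL SELECT * FROM fact_replica)"
scan_union = {
    name: (total, count)
    for name, total, count in conn.execute(NODE_QUERY.format(source=union_source))
}
```

Both approaches yield the same partial result for the node; scan-union simply reaches it with a single join against the replicated dimension and no local merge step.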

5. Alternative Replication Strategies

In this section we consider and analyze alternative replication strategies. We analyze the advantages of each strategy using as metrics: degree of fault tolerance (how many nodes can be unavailable or fail simultaneously); efficiency (performance upon node failure); provision for taking several nodes offline simultaneously for data loading or other management or maintenance activities. For instance, it may be possible to take half the nodes offline for loading while the system remains online, then switch to loading the other half while never stopping the availability status of the system.

5.1. Full Replicas (FR)

The simplest replica placement strategy involves replicating each node's data into at least one other node. In case of failure of one node, a node containing the replica resumes the operation of the failed node. A simple placement algorithm considering R replicas is:

Number nodes linearly;
For each node i
For replica = 1 to R
data for node i is also placed in node (i + replica) MOD N;

Metrics:

Degree of fault tolerance: R nodes when considering R replicas;

Efficiency (performance upon node failure): processing time doubles when a node fails;

Provision for taking several nodes offline simultaneously: can take multiple nodes offline simultaneously, as long as the set of unavailable nodes does not include all R+1 copies of any node. For example, in Figure 6 with two replicas, shaded boxes may be unavailable and the system still works, because nodes 3, 6 and 9 contain replicas of their two closest neighbors. This suggests that up to (R/(R+1))·N nodes can be offline simultaneously, if chosen carefully.

Figure 6. Availability in FR
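The FR placement and its availability condition can be checked with a short sketch. It assumes that the k-th replica of node i goes to node (i + k) MOD N and that nodes are numbered from 0; the function names are hypothetical.

```python
def fr_placement(n_nodes, r):
    """Full replicas: node i's data is also placed in nodes (i+k) mod N, k = 1..R."""
    return {i: [(i + k) % n_nodes for k in range(1, r + 1)]
            for i in range(n_nodes)}

def system_available(n_nodes, r, offline):
    """True if every node's data still has at least one copy on an online node."""
    holders = fr_placement(n_nodes, r)
    return all(
        any(copy not in offline for copy in [i] + holders[i])
        for i in range(n_nodes)
    )

# Nine nodes, two replicas: with 0-based numbering, keeping nodes 2, 5 and 8
# online corresponds to keeping nodes 3, 6 and 9 of the nine-node example.
two_thirds_offline = {0, 1, 3, 4, 6, 7}
ok = system_available(9, 2, two_thirds_offline)
# Taking a node together with both of its replica holders offline loses data.
broken = system_available(9, 2, {0, 1, 2})
```

Six of nine nodes is exactly (R/(R+1))·N for R = 2, but only because the offline set was chosen so that no node loses all three of its copies.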

The major drawback of this simple strategy is processing efficiency when a few nodes become unavailable. Consider an NPDW system with N homogeneous nodes. Using a simplified linear model, assume that each node contains and processes about 1/N of the data in O(1/N) of the time it would take to process the whole data. If one node fails, the node replacing it with the replica will take (at least) about twice as long, O(2/N), even though all the other nodes will take O(1/N). The replica effort is placed on a single node, even though other nodes are less loaded.

5.2. Fully Partitioned Replicas (FPR)

Instead of having full replicas in a single node, much more efficiency results if replicas are partitioned into as many slices as there are nodes minus one. If there are N nodes, a replica is partitioned into N-1 slices and each slice is placed in one node. The replica of node i is now dispersed into all nodes except node i. The following algorithm can be used to place the slices:

Number nodes linearly;
The data for node i is partitioned into N-1 numbered slices, starting at 1;
For slice x from 1 to N-1:
Place slice x in node (i+x) MOD N.

This strategy is the most efficient one because, considering N nodes, each replica slice has 1/(N-1) of the data and each node has to process only that fraction in excess in case of a single node being unavailable. If a node becomes unavailable, the remaining nodes will process their data together with the replica slices corresponding to the unavailable node. However, in this case it is not possible to stop more than one node if there is a single replica, because all nodes that remain active are needed to process a slice from the replica. In order to allow up to R nodes to become unavailable, there must be R non-overlapping replica slice sets. Two replicas are non-overlapped iff the equivalent slices of the two replicas are not placed in the same node. Consider that R replicas are to be created (tolerance to unavailability of R nodes). In order to avoid slice overlapping, the following placement algorithm is used:

Number nodes linearly;
The copy of the data of node i is partitioned into N-1 numbered slices, starting at 1.
For j = 0 to R-1:
For slice x from 1 to N-1:
Place slice x in node (i+j+x) MOD N
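The slice placement and the non-overlap property can be sketched as follows. The sketch assumes R slice sets indexed j = 0..R-1 and applies the formula (i + j + x) MOD N literally; `fpr_placement` and `non_overlapping` are hypothetical names.

```python
def fpr_placement(i, n_nodes, r):
    """Slice placement for node i: r slice sets of N-1 slices each.

    Returns {(j, x): node} for slice set j and slice x, using (i + j + x) mod N.
    """
    return {
        (j, x): (i + j + x) % n_nodes
        for j in range(r)
        for x in range(1, n_nodes)
    }

def non_overlapping(placement, n_nodes, r):
    """Equivalent slices of any two slice sets must sit on different nodes."""
    return all(
        len({placement[(j, x)] for j in range(r)}) == r
        for x in range(1, n_nodes)
    )

placement = fpr_placement(0, 5, 2)
first_set_nodes = sorted(placement[(0, x)] for x in range(1, 5))
disjoint = non_overlapping(placement, 5, 2)
```

Shifting each slice set by j keeps equivalent slices apart. Taken literally, though, the formula sends slice x = N - j of a later slice set back to node i itself (here `placement[(1, 4)]` is node 0); a practical layout would presumably skip the owner node, which this sketch does not attempt.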

Metrics:

Degree of fault tolerance: R nodes, when R replicas are used;

Efficiency (performance upon node failure): processing time increases proportionally to size of slice (fraction 1/(N-1));

Provision for taking several nodes offline simultaneously: need multiple non-overlapping replicas.

5.3. Partitioned Replicas (PR)

Replicas may be partitioned into fewer than N-1 slices (in an NPDW with N nodes). If replicas are partitioned into x slices, we denote it by PR(x). If x = N-1, we have a fully partitioned replica. A very simple algorithm to generate X slices is:

Number nodes linearly;
The data for node i is partitioned into X slices starting at 1;
For slice set j = 0 to R-1:
For slice x from 1 to X:
Place slice x in node (i+j+x) MOD N

If we desire y nodes to be able to come offline simultaneously when a single replica is used, then the y nodes must not contain replica slices of each other. In order to achieve this, we can divide the nodes into groups that we want to take offline simultaneously. Then we guarantee by placement that replica slices of the nodes in a group are not placed in any node of that group and therefore we can take the whole group offline simultaneously for maintenance or other functionality.

For instance, Figure 7 shows twelve nodes organized into two groups G1 and G2. Replicas of each node are PR(6) and the slices are placed in the other group. The labels R1 and R2 in the Figure represent the replicas of nodes of each group and indicate that they are placed in the other group. The replicas are fully partitioned into the other group.

Figure 7. Grouping Replicas

Using this strategy, it is possible to take a whole group (6 nodes) offline simultaneously. The system will run slightly slower than if we had a single node offline with 12 full replica slices, because slices are larger. This layout guarantees availability to failures of a single node (R=1) but also of any number of nodes from a single group.
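The grouped layout can be sketched as below for a single replica. The sketch assumes equal-sized groups of consecutively numbered nodes and places each node's slices in the next group (with two groups, simply the other group); the function names are hypothetical.

```python
def grouped_placement(n_nodes, n_groups):
    """Place the replica slices of each node in the next group (single replica).

    Nodes 0..N-1 are split into equal consecutive groups; each node's replica
    is cut into as many slices as the target group has nodes, one per node.
    Assumes n_nodes is divisible by n_groups.
    """
    size = n_nodes // n_groups
    groups = [list(range(g * size, (g + 1) * size)) for g in range(n_groups)]
    placement = {}
    for g, members in enumerate(groups):
        target = groups[(g + 1) % n_groups]
        for node in members:
            placement[node] = list(target)  # slice k of `node` lives on target[k]
    return groups, placement

def survives(placement, offline):
    """Every offline node must have all of its replica slices on online nodes."""
    return all(
        all(holder not in offline for holder in placement[node])
        for node in offline
    )

groups, placement = grouped_placement(12, 2)
whole_group_down = survives(placement, set(groups[0]))
cross_group_down = survives(placement, {0, 6})
```

Taking all six nodes of one group offline is tolerated, whereas taking one node from each group is not, because each of them holds a slice of the other's replica.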

We denote this strategy by PRG(g,x) (g groups with x elements each) or PR(x), for simplicity and considering equal-sized groups. It works like FR at the inter-group level and FPR within each group. If we use this strategy with R replicas and R+1 groups, the system can tolerate failures or unavailability of nodes from up to R groups. More groups allow more nodes to be unavailable but slices will be larger, leading to possibly slower processing when groups are offline.

Metrics:

Degree of fault tolerance: X nodes from a single group. If R replicas over R+1 groups are used, the system can tolerate failures or unavailability of nodes from up to R groups.

Efficiency (performance upon node failure): processing time increases proportionally to the size of the slice (fraction 1/X).

Provision for taking several nodes offline simultaneously: whole groups can be taken offline.
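The efficiency metric can be made concrete with a simple load model. This sketch is an assumption layered on the text, not the authors' cost model: it supposes equal-sized node partitions, processing time proportional to the data a node scans, and a query that finishes when the slowest node does. The name `relative_time_pr` is hypothetical.

```python
from fractions import Fraction


def relative_time_pr(x, offline_in_group):
    """Estimated processing time under PR(x), relative to all nodes online.

    Each offline node's replica is cut into x slices, one per node of the
    other group, so every node of that group scans its own partition (1)
    plus one slice of size 1/x for each offline node.
    """
    if not 0 <= offline_in_group <= x:
        raise ValueError("offline_in_group must be between 0 and x")
    return 1 + Fraction(offline_in_group, x)


print(relative_time_pr(6, 1))  # 7/6: one node of a 6-node group offline
print(relative_time_pr(6, 2))  # 4/3: two nodes of the same group offline
print(relative_time_pr(2, 1))  # 3/2: smaller groups mean larger slices
```

Under these assumptions the penalty per offline node is exactly the slice fraction 1/X, which is why more, smaller groups trade flexibility for slower processing.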

Comparative Analysis

In this analysis we focus on the balance between efficient availability, assessed by analyzing the performance under node unavailability, and the flexibility to take multiple nodes offline. We consider the use of full replicas (FR), fully partitioned replicas (FPR) and partitioned replicas (PR). The analysis involved measuring the response time of NPDW on low-cost PCs ([...] MHz, [...] MB RAM); [...] GB TPC-H [ ] was manually set up into [...] and [...] nodes, with partitioning and placement as described in section [...]. We then measured the response time for query [...] of TPC-H without nodes offline and compared the result to the response time with nodes offline. The query is reproduced below for reference; the query parameters were generated as described in the TPC-H specification, and the results are the average of [...] runs.

-- x, y, z and w stand for the query parameters
select nation, o_year, sum(amount) as sum_profit
from (
    select n_name as nation, year(o_orderdate) as o_year,
           l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from tpcd.part, tpcd.supplier, tpcd.lineitem, tpcd.partsupp,
         tpcd.orders, tpcd.nation
    where s_suppkey = l_suppkey and ps_suppkey = l_suppkey
      and ps_partkey = l_partkey and p_partkey = l_partkey
      and o_orderkey = l_orderkey
      and s_nationkey = n_nationkey
      and p_name like x and n_nationkey > y
      and o_orderpriority = 'z' and ps_availqty > w
) as profit
group by nation, o_year
order by nation, o_year desc;

Figure [...] shows the response time (min:sec) when [...] out of [...] nodes are offline (line). The alternatives compared are: "online", where every node is online; FPR, fully partitioned replicas (with nodes offline); PR([...]), partitioned replicas with two groups of [...] nodes each; and PR([...]), partitioned replicas with [...] groups of [...] nodes each. It also shows the minimum number of replicas necessary to provide the required availability. These results show the much larger penalty incurred by FR and the excessive number of replicas required for FPR to allow [...] nodes offline simultaneously. PR([...]), partitioned replicas with two [...]-element groups, is a good choice, as it requires a single replica and still obtains a good response time.

Figure [...]. Response time and number of replicas ([...] nodes fail); query [...] response time (min:sec). Alternatives shown: online, FPR, PR([...]), PR([...]), FR.

The results for NPDW with [...] nodes are shown in Figure [...]. In this case we consider [...] unavailable nodes instead of the [...] of the previous results, and the pair PR([...]), PR([...]) instead of PR([...]) and PR([...]).

Figure [...]. Response time and number of replicas ([...] nodes fail); query [...] response time (min:sec). Alternatives shown: online, FPR, PR([...]), PR([...]), FR.

The trend is similar to the one observed in Figure [...], the main difference being that the response times are much larger in every case because there are only half the number of nodes ([...] nodes in Figure [...] versus [...] nodes in Figure [...]). In this case PR([...]) seems to be the best choice, as it avoids the cost of FR or PR([...]) and, simultaneously, the requirement of FPR that there be at least [...] replicas of each node.

Figure [...] compares the response time on NPDW with [...] nodes versus NPDW with [...] nodes. These results show that, although the response time with [...] nodes is much larger than that with [...] nodes, as expected, the comparison between alternative replication schemes follows a similar trend.

These experimental results have shown that it is advantageous to consider partitioned replicas instead of simply full replicas if the system is to offer efficient availability. With such a capability, the system can be always on and always efficient, even though parts of it are taken offline for maintenance or management functions such as loading with new data or DBA functionality. We are currently testing the strategies over additional query workloads with varied characteristics.

Figure [...]. Comparison: [...] nodes versus [...] nodes. Number of replicas and response time (min:sec) for each node count. Alternatives shown: online, FPR, PR(5), PR(10), PR(2), PR(5).

Conclusions and Future Work

The work presented in this paper focused on replication for efficient availability on the Node Partitioned Data Warehouse (NPDW). After reviewing placement and processing issues over the NPDW, we have compared alternative replica strategies using metrics that included efficiency, degree of tolerance to node failures and capacity to allow multiple nodes to be offline simultaneously. The alternatives, ranging from full replication to various degrees of partitioned replication, were compared experimentally from the perspective of performance degradation when nodes go offline. We concluded that replicas partitioned by groups are the most advantageous alternative for NPDW if we consider both performance and flexibility in allowing multiple nodes to be taken offline simultaneously for maintenance or loading reasons. Besides extensive testing of the approaches, our future work in this subject includes automating replication and recovery, as well as automated data warehouse loading with the system always on, using the PR strategies described in this paper.

References

[ ] Coulon, C., E. Pacitti, P. Valduriez, "Scaling up the Preventive Replication of Autonomous Databases in Cluster Systems", Vecpar, International Conference, Valencia, Spain, June.

[ ] Copeland, G., Tom Keller, "A comparison of high-availability media recovery techniques", in Procs. of the ACM International Conf. on Management of Data.

[ ] DeWitt, D., Gerber, R., "Multiprocessor Hash-Based Join Algorithms", Proceedings of the Eleventh Conference on Very Large Databases, Stockholm, Sweden, August.

[ ] Furtado, P., "The Issue of Large Relations in Node-Partitioned Data Warehouses", International Conference on Database Systems for Advanced Applications (DASFAA), Beijing, China, April.

[ ] Furtado, P., "Experimental Evidence on Partitioning in Parallel Data Warehouses", DOLAP Workshop of the Intl. Conference on Information and Knowledge Management (CIKM), Washington, November.

[ ] Furtado, P., "Efficiently Processing Query-Intensive Databases over a Non-dedicated Local Network", Nineteenth International Parallel and Distributed Processing Symposium, Denver, Colorado, USA, May.

[ ] Hsiao, H., David J. DeWitt, "Replicated Data Management in the Gamma Database Machine", Workshop on the Management of Replicated Data.

[ ] Hsiao, H., David J. DeWitt, "Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines", ICDE.

[ ] Hsiao, H., David J. DeWitt, "A Performance Study of Three High Availability Data Replication Strategies", PDIS.

[ ] Kimball, R., "The Data Warehouse Toolkit", New York.

[ ] Kitsuregawa, M., Tanaka, H., and Motooka, T., "Application of Hash to Database Machine and its Architecture", New Generation Computing.

[ ] Lin, Y., B. Kemme, R. Jimenez-Peris, "Consistent Data Replication: Is it feasible in WANs?", in International Euro-Par Conference, Lisboa, Portugal, August.

[ ] Pacitti, E., M. Özsu, C. Coulon, "Preventive Multi-Master Replication in a Cluster of Autonomous Databases", International Euro-Par Conference, Klagenfurt, Austria, August.

[ ] Rao, J., Zhang, C., Megiddo, N., Lohman, G., "Automating Physical Database Design in a Parallel Database", ACM International Conference on Management of Data, Madison, Wisconsin, USA, June.

[ ] Tandem Database Group, "NonStop SQL, A Distributed, High-Performance, High-Reliability Implementation of SQL", Workshop on High Perform. Trans. Sys., CA, Sept.

[ ] Teradata, "DBC Database Computer System Manual", Release [...], Teradata, Nov.

[ ] TPC Benchmark H, Transaction Processing Council, June. Available at http://www.tpc.org
