19. Distributed Databases

  • No tags were found...

19. Distributed Databases

Distributed Database System• A distributed database system consists of looselycoupled sites that share no physical component.• Database systems that run on each site are independentof each other.• Transactions may access data at one or more sites.3

Distributed Databases• In a homogeneous distributed database• All sites have identical software• All sites are aware of each other and agree to cooperate in processinguser requests.• Each site surrenders part of its autonomy in terms of right to changeschemas or software.• Appears to user as a single system.• In a heterogeneous distributed database• Different sites may use different schemas and software.Difference in schema is a major problem for query processing.Difference in software is a major problem for transaction processing.• Sites may not be aware of each other and may provide onlylimited facilities for cooperation in transaction processing.4

Distributed Database: Outline• Introduction• Distributed Data Storage• Distributed Transactions• Concurrency Control in Distributed DatabasesDistributed Query Processing and Optimization5

Distributed Data Storage• We assume relational data model.• Replication• System maintains multiple copies of data, stored in differentsites, for faster retrieval and fault tolerance.• Fragmentation• Relation is partitioned into several fragments stored in distinctsites• Replication and fragmentation can be combined.• Relation is partitioned into several fragments: systemmaintains several identical replicas of each such fragment.6

Data Replication• A relation or fragment of a relation is replicated if it isstored redundantly in two or more sites.• Full replication of a relation is the case where therelation is stored at all sites.• Fully redundant databases are those in which everysite contains a copy of the entire database.7

Data Replication (Cont.)• Advantages of Replication• Availability: failure of site containing relation r does not result inunavailability of r is replicas exist.• Parallelism: queries on r may be processed by several nodes in parallel.• Reduced data transfer: relation r is available locally at each sitecontaining a replica of r.• Disadvantages of Replication• Increased cost of updates: each replica of relation r must be updated.• Increased complexity of concurrency control: concurrent updates todistinct replicas may lead to inconsistent data unless special concurrencycontrol mechanisms are implemented.One solution: choose one copy as primary copy and apply concurrencycontrol operations on primary copy8

Data Fragmentation• A relation r is partitioned into fragments r 1 , r 2 , …, r nwhich contain sufficient information to reconstructrelation r.• Horizontal fragmentation: each tuple of r isassigned to one or more fragments• Vertical fragmentation: the schema for relation r issplit into several smaller schemas• All schemas must contain a common candidate key (orsuperkey) to ensure lossless join property.• A special attribute, the tuple-id attribute may be added toeach schema to serve as a candidate key.9

Horizontal Fragmentation account Relationbranch_name account_number balanceHillsideHillsideHillsideA-305A-226A-15550033662account 1 = branch_name=“Hillside” (account )branch_name account_number balanceValleyviewValleyviewValleyviewValleyviewA-177A-402A-408A-639205100001123750account 2 = branch_name=“Valleyview” (account )10

Vertical Fragmentation of employee Relationbranch_name customer_name tuple_idHillsideHillsideValleyviewValleyviewHillsideValleyviewValleyviewLowmanCampCampKahnKahnKahnGreen1234567deposit 1 = branch_name, customer_name, tuple_id (employee )account_number balance tuple_idA-305A-226A-177A-402A-155A-408A-639500336205100006211237501234567deposit 2 = account_number, balance, tuple_id (employee)11

Advantages of Fragmentation• Horizontal fragmentation:• allows parallel processing on fragments of a relation• allows a relation to be split so that tuples are located wherethey are most frequently accessed• Vertical fragmentation:• allows tuples to be split so that each part of the tuple isstored where it is most frequently accessed• tuple-id attribute allows efficient joining of verticalfragments• allows parallel processing on a relation• Vertical and horizontal fragmentation can be mixed.• Fragments may be successively fragmented to an arbitrarydepth.12

• Data transparencyData Transparency• Degree to which system user may remain unaware of thedetails of how and where the data items are stored in adistributed system.• Consider transparency issues in relation to:• Fragmentation transparency• Replication transparency• Location transparency13

Distributed Database: Outline• Introduction• Distributed Data Storage• Distributed Transactions• Concurrency Control in Distributed DatabasesDistributed Query Processing and Optimization14

Distributed Transactions• Transaction may access data at several sites.• Each site has a local transaction manager that is responsiblefor:• Maintaining a log for recovery purposes.• Participating in coordinating the concurrent execution of the transactionsexecuting at that site.• Each site has a transaction coordinator that is responsible for:• Starting the execution of transactions that originate at the site.• Distributing subtransactions at appropriate sites for execution.• Coordinating the termination of each transaction that originates at the site,which may result in the transaction being committed at all sites or abortedat all sites.15

Transaction System Architecture16

System Failure Modes• Failures unique to distributed systems:• Failure of a site.• Loss of messagesHandled by network transmission control protocols such as TCP/IP• Failure of a communication linkHandled by network protocols, by routing messages via alternativelinks• Network partitionA network is said to be partitioned when it has been split into two ormore subsystems that lack any connection between themNote: a subsystem may consist of a single node• Network partitioning and site failures are generallyindistinguishable.17

Commit Protocols• Commit protocols are used to ensure atomicity acrosssites• A transaction which executes at multiple sites must either becommitted at all the sites, or aborted at all the sites.• It’s not acceptable to have a transaction committed at one siteand aborted at another.• The two-phase commit (2PC) protocol is widely used• The three-phase commit (3PC) protocol is morecomplicated and more expensive, but avoids somedrawbacks of two-phase commit protocol. Thisprotocol is not used in practice.18

Two Phase Commit Protocol (2PC)• Assumes fail-stop model• Failed sites simply stop working, and do not cause any otherharm, such as sending incorrect messages to other sites.• Execution of the protocol is initiated by the coordinatorafter the last step of the transaction has been reached.• The protocol involves all the sites at which thetransaction is executed.• Participants• Let T be a transaction initiated at site S i , and let thetransaction coordinator at S i be C i19

Phase 1: Obtaining a Decision• Coordinator asks all participants to prepare to committransaction T.• C i adds the entry to the log and forces log tostable storage.• C i sends prepare T messages to all participants.• Upon receiving the message, the transaction manager ata site determines if it can commit the transaction.• If not, add an entry to the log and send abort Tmessage to C i• If the transaction can be committed, then:Adds the entry to the logForces all entries for T to stable storageSend ready T message to C i20

Phase 2: Recording the Decision• T can be committed if C i receives a ready T messagefrom all the participating sites; otherwise T must beaborted.• Coordinator adds a decision entry, or, to the log and forces the entry onto stablestorage. Once the entry reaches stable storage it isirrevocable (even if failures occur).• Coordinator sends a message to each participantinforming it of the decision (commit or abort).• Participants take appropriate action locally.21

Handling of Site Failure• When site S k recovers, it examines its log to determine the fate oftransactions active at the time of the failure.• Log contain entry: site executes redo (T)• Log contains entry: site executes undo (T)• Log contains entry: site must consult C i to determine the fate ofT.If T committed, redo (T)If T aborted, undo (T)• If the log contains no control entries concerning T, S k replies thatit failed before responding to the prepare T message from C i• Since the failure of S k precludes the sending of such a response, C i mustabort T• S k must execute undo (T)22

Handling of Coordinator Failure• If coordinator fails while the commit protocol for T is executingthen participating sites must decide on T’s fate:• If an active site contains a record in its log, then T must becommitted.• If an active site contains an record in its log, then T must beaborted.• If some active participating site does not contain a record in itslog, then the failed coordinator C i cannot have decided to commit T. Cantherefore abort T.• If none of the above cases holds, then all active sites must have a record in their logs, but no additional control records (such as of ). In this case active sites must wait for C i to recover, tofind decision.• Blocking problem: active sites may have to wait for failedcoordinator to recover.23

Handling of Network Partition• If the coordinator and all its participants remain in one partition,the failure has no effect on the commit protocol.• If the coordinator and its participants belong to severalpartitions:• Sites that are not in the partition containing the coordinator think thecoordinator has failed, and execute the protocol to deal with failure of thecoordinator.No harm results, but sites may still have to wait for decision fromcoordinator.• The coordinator and the sites in the same partition think that the sites inthe other partition(s) have failed, and follow the usual commit protocol.Again, no harm results24

Distributed Database: Outline• Introduction• Distributed Data Storage• Distributed Transactions• Concurrency Control in Distributed DatabasesDistributed Query Processing and Optimization25

Concurrency Control• Modify concurrency control schemes for use indistributed environment.• We assume that each site participates in the executionof a commit protocol to ensure global transactionautomicity.• We assume all replicas of any item are updated• How to relax this in case of site failures is covered in thetextbook.26

Single-Lock-Manager Approach• System maintains a single lock manager that resides ina single chosen site, say S i .• When a transaction needs to lock a data item, it sends alock request to S i and lock manager determineswhether the lock can be granted immediately.• If yes, lock manager sends a message to the site whichinitiated the request.• If no, request is delayed until it can be granted, at whichtime a message is sent to the initiating site.27

Single-Lock-Manager Approach (Cont.)• The transaction can read the data item from any one ofthe sites at which a replica of the data item resides.• Writes must be performed on all replicas of a data item.• Advantages• Simple implementation• Simple deadlock handling• Disadvantages• Bottleneck: lock manager site becomes a bottleneck• Vulnerability: system is vulnerable to lock manager sitefailure.28

Distributed Lock Manager• In this approach, functionality of locking isimplemented by lock managers at each site• Lock managers control access to local data items• AdvantageBut special protocols may be used for replicas• Work is distributed and can be made robust to failures• Disadvantage• Deadlock detection is more complicated. Lock managerscooperate for deadlock detection29

Distributed Database: Outline• Introduction• Distributed Data Storage• Distributed Transactions• Concurrency Control in Distributed DatabasesDistributed Query Processing and Optimization30

Distributed Query Processing• For centralized systems, the primary criterion formeasuring the cost of a particular strategy is thenumber of disk accesses.• In a distributed system, other issues must be taken intoaccount:• Data transmission cost over the network (often critical).• The potential gain in performance from having several sitesprocess parts of the query in parallel.• Translating algebraic queries on fragments• It must be possible to construct relation r from its fragments.• Replace relation r by the expression to construct relation rfrom its fragments.31

Query Transformation Example• Consider the horizontal fragmentation of the account relationaccount 1 = branch_name = “Hillside” (account )account 2 = branch_name = “Valleyview” (account )• The query branch_name = “Hillside” (account ) becomes branch_name = “Hillside” (account 1 account 2 )which is optimized into branch_name = “Hillside” (account 1 ) branch_name = “Hillside” (account 2 )• Since account 1 has only tuples pertaining to the Hillside branch, we caneliminate the selection operation.• Apply the definition of account 2 to obtain branch_name = “Hillside” ( branch_name = “Valleyview” (account))• This expression always gives an empty set.• Final strategy is for the Hillside site to return account 1 as theresult of the query.32

Simple Join Processing• Consider the following relational algebra expression inwhich the three relations are neither replicated norfragmentedaccount depositor branch• account is stored at site S 1• depositor at S 2• branch at S 3• For a query issued at site S I , the system needs toproduce the result at site S I33

Possible Query Processing Strategies• Ship copies of all three relations to site S I and choose astrategy for processing the entire locally at site S I.• Ship a copy of the account relation to site S 2 andcompute temp 1 = accountdepositor at S 2 . Shiptemp 1 from S 2 to S 3 , and compute temp 2 = temp 1branch at S 3 . Ship the result temp 2 to S I .• Devise similar strategies, exchanging the roles S 1 , S 2 ,S 3 .• Must consider following factors:• amount of data being shipped• cost of transmitting a data block between sites• relative processing speed at each site34

Semijoin Strategy• Let r 1 be a relation with schema R 1 stores at site S 1.Let r 2 be a relation with schema R 2 stores at site S 2.• Evaluate the expression r 1 r 2 and obtain the resultat S 1 .1. Compute temp 1 R 1 R2 (r 1) at S 1 .2. Ship temp 1 from S 1 to S 2 .3. Compute temp 2 r 2 temp 1 at S 24. Ship temp 2 from S 2 to S 1 .5. Compute r 1 temp 2 at S 1 . This is the same as r 1 r 2 .35

Formal Definition• The semijoin of r 1 with r 2 , is denoted by:r 1 r 2• it is defined by: (r R 1 1 r 2 )• Thus, r 1 r 2 selects only those tuples of r 1 thatcontribute to r 1 r 2 .• In step 3 above, temp 2 =r 2 r 1 .• temp 2 = r 2 temp 1 = r 2 R 1 R2 (r 1) = R 2 (r 2 r 1 )• For joins of several relations, the above strategy can beextended to a series of semijoin steps.36

Join Strategies that Exploit Parallelism• Consider r 1 r 2 r 3 r 4 where relation r i is stored at siteS i . Suppose the result must be presented at site S 1 .• r 1 is shipped to S 2 and r 1 r 2 is computed at S 2 ;simultaneously r 3 is shipped to S 4 and r 3 r 4 is computed atS 4• S 2 ships tuples of (r 1 r 2 ) to S 1 as they are produced;S 4 ships tuples of (r 3 r 4 ) to S 1 as they are produced.• Once tuples of (r 1 r 2 ) and (r 3 r 4 ) arrive at S 1 ,(r 1 r 2 ) (r 3 r 4 ) is computed in parallel with thecomputation of (r 1(r 3 r 4 ) at S 4 .r 2 ) at S 2 and the computation of37

Reduced Query due to FragmentationE(Eno, Ename, Title) G(Eno, Jno, Resp, Dur)• E is fragmented into E 1, E 2and E 3as follows:• E 1= Enoe3(E)• E 2= e3e6(E)• G is fragmented into G 1and G 2as follows:• G 1= Enoe3(G)• G 2= Eno>e3(G)• Consider the following query-1:SELECT *FROMWHEREE, GE.Eno = G.Eno38

Query Tree for query-1: Eno Enoe3(E)E 1 E 2 E 3 G 1 G 2 Eno>e3(G)39

Reduced Query Tree for query-1: Eno Eno EnoE 1 G 1 E 2 G 2 E 3 G 2Advantages of the reduced query tree:• Provide parallelism• Eliminate unnecessary work40

Join ExampleConsider the following setting and statistics:R(A, B)site = 1card (R) =10,000R B = projection of R onto Bcard(R B ) = 2,000Suppose card(R B S)=5,000and card(R S B )=2,500S(B, C)site = 2card (S) = 50,000S B = projection of S onto Bcard(S B ) =5,000Attribute A have length = 116; attribute B have length = 4;attribute C have length = 76; and assume thatc 0= initial set-up cost between any two sites = 10c 1= transmission cost per data unit between any two sites = 1 per 1000 bytesTwo naïve strategies and their costs(1) Join in site 1: Cost = 10 + (76+4)*50000/1000 = 4010(2) Join in site 2: Cost = 10 + (116+4)*10000/1000 = 121041

Join Example (cont.)Consider the following setting and statistics:R(A, B)site = 1card (R) =10,000R B = projection of R onto Bcard(R B ) = 2,000Suppose card(R B S)=5,000and card(R S B )=2,500S(B, C)site = 2card (S) = 50,000S B = projection of S onto Bcard(S B ) =5,000Attribute A have length = 116; attribute B have length = 4;attribute C have length = 76; and assume thatc 0= initial set-up cost between any two sites = 10c 1= transmission cost per data unit between any two sites = 1 per 1000 bytes(1) Join in site 1 using semijoin• Cost: 10 + 4*2000/1000 + 10 + (76+4)*5000/1000 = 428(2) Join in site 2 using semijoin• Cost: 10 + 4*5000/1000 + 10 + (116+4)*2500/1000 = 34042

Semijoin Selectivity Factor• Semi-join selectivity factor in R < S is defined as theexpected fraction of the tuples of R which belong to theresult.card A( R S) card ( R)• An estimation of semijoin selectivity factor is: nn• Or, an easier but more rough estimation is~ RSRcard(R A S A )card(R)card ( A(S))card ( domain[A])43

ExerciseConsider the following setting and statistics:R(A,B)site = 1card (R) =10,000R B = projection of R onto Bcard(R B ) = 2,000n RS = card(R B S B ) = 500S(B,C)site = 2card (S) = 50,000S B = projection of S onto Bcard(S B ) =5,000Assume that tuples in both R and S are uniformly distributedover attribute B.Q: Estimate the semi- join selectivity factor of R < S.44

E(Eno, Ename, Title)Reduced Query ExampleFragmentation is as follows:E 1= Title=’Programmer’(E)E 2= Title’Programmer’(E)G 1= G < EnoE 1G 2= G < EnoE 2G(Eno, Jno, Resp, Dur)Query-2:SELECT *FROME, GWHEREE.Eno = G.Eno ANDE.Title = “Mech. Eng.”45

Query Tree for query-2: Title=“Mech.Eng” EnoE 1 E 2 G 1 G 246

Reduced Query Tree for query-2: Eno Title=“Mech.Eng”E 2G 2 Title’Programmer’(E)G < EnoE 247

Summary• Heterogeneous and Homogeneous DatabasesDistributed Data Storage• Replication• Vertical fragmentation, horizontal fragmentation• Distributed Transactions• Two phase commit protocol• Concurrency Control in Distributed Databases• Single vs. distributed lock manager• Distributed Query Processing• Cost measure• Processing in the presence of fragments48

1. Spatial Databases (SW6)• Spatial data model• Spatial data object types• Spatial relationships• R-tree• Basic idea and structure• Insertion and deletion• Spatial queries• Filter-and-refinement paradigm• Range query (processing via R-tree)• Nearest neighbor query (processing via R-tree)49

2. Parallel and Distributed Databases (SW6)• Parallel databases• Architectures• IO parallelism and partitioningRound-robin, hash, and range• Parallel sort and parallel join• Distributed databases• Replication and fragmentation• Two phase commit protocol• Concurrency control• Distributed query processing50

More magazines by this user
Similar magazines