Data Replication in Data Intensive Scientific Applications

[Figure 10: two panels plotting average file access time (seconds) for the Distributed and Cascading algorithms under the Random, TS-Static, and TS-Dynamic access patterns. Panel (a) varies the total number of files (1,000 to 1,000,000); panel (b) varies the storage capacity of each node (1 to 1,000 GB).]

Fig. 10. Performance comparison between our distributed algorithm and Cascading in a typical cluster environment. In (a), the storage capacity of each node is 500 GB; in (b), the number of data files in the cluster is 500,000. Each data file size is 1 GB.

[Figure 11: two panels plotting average file access time (seconds) for the Distributed and Cascading algorithms under the Random, TS-Static, and TS-Dynamic access patterns. Panel (a) varies the total number of files (1,000 to 1,000,000); panel (b) varies the storage capacity of each node (1 to 1,000 GB).]

Fig. 11. Performance comparison between our distributed algorithm and Cascading in a cluster environment with full connectivity. In (a), the storage capacity of each node is 500 GB; in (b), the number of data files in the cluster is 500,000. Each data file size is 1 GB.

storage, and executing the next job. Otherwise, we observe the performance differences shown in Figure 7. Second, comparing performance across the different access patterns shows that for our distributed algorithm, the percentage increase in access time due to the dynamic pattern is very small, around 5% to 8%. This shows that our algorithm adjusts well to the dynamic access pattern in a typical Grid environment. Third, Figure 9(b) shows that as the storage capacity of each site increases, the performance difference between our distributed algorithm and Cascading shrinks for all three access patterns. For example, when the storage capacity is 1 TB, Cascading yields more than twice the file access time of our distributed algorithm, while at 400 TB it incurs only 40% more. This shows that in the more stringent storage scenario, our caching algorithm is a better mechanism for reducing file access time, and thus job execution time.
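To make the quoted percentages concrete, the following sketch (our illustration, not from the paper; all access times are hypothetical placeholders chosen only to mirror the reported trends) shows how a relative-increase metric reproduces the two figures discussed above: the 5% to 8% dynamic-pattern increase, and Cascading costing more than twice versus 40% more access time at the two storage capacities.

```python
# Hypothetical measured average file access times (seconds); the helper
# computes the percentage increase of one measurement over a baseline.

def relative_increase(new: float, base: float) -> float:
    """Percentage increase of `new` over `base`."""
    return (new - base) / base * 100.0

# Dynamic-pattern overhead for the distributed algorithm (Grid case):
t_static = 100_000.0    # under TS-Static, hypothetical
t_dynamic = 106_500.0   # under TS-Dynamic, hypothetical
print(f"dynamic-pattern increase: {relative_increase(t_dynamic, t_static):.1f}%")
# -> 6.5%, within the 5%-8% range reported for the Grid environment

# Cascading vs. distributed at two storage capacities, hypothetical times:
for cap_tb, t_casc, t_dist in [(1, 220_000.0, 100_000.0),
                               (400, 70_000.0, 50_000.0)]:
    print(f"{cap_tb} TB: Cascading costs "
          f"{relative_increase(t_casc, t_dist):.0f}% more access time")
# -> 120% more at 1 TB (more than twice), 40% more at 400 TB
```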

Comparison in Typical Cluster Environment. In a typical cluster environment, a site often has 1,000 to 10,000 nodes (note the difference between sites and nodes, as explained in Section III), each with small local storage in the range of 10 GB to 1 TB. The network bandwidth within a cluster is typically 1 GB/s. We use the parameters in Table III to simulate the cluster environment. Figure 10 shows the performance comparison of our distributed algorithm and Cascading under different access patterns. We observe that for our distributed algorithm, the percentage increase in access time due to the dynamic pattern is relatively large, around 40% to 100%. This shows that our algorithm does not adjust well to the dynamic access pattern in a typical cluster environment. This is because in a cluster, the diameter of the network (the number of hops in the longest shortest path between two cluster nodes) is much larger than in the typical Grid environment, and the bandwidth in a cluster is much smaller than in Grids. As a result, when the file access pattern of each site changes in the middle of the simulation, it takes longer to propagate that information to other nodes.
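The diameter argument can be made concrete with a small sketch. The code below is our illustration, not the paper's simulator: the topologies, node count, and message size are hypothetical. It computes the diameter as the longest shortest path in hops via breadth-first search, then uses it in a back-of-envelope estimate of how long a changed access pattern takes to reach the farthest node.

```python
from collections import deque

def diameter(adj: dict[int, list[int]]) -> int:
    """Longest shortest path, in hops, over all pairs of nodes (BFS per source)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def ring(n: int) -> dict[int, list[int]]:
    """Ring topology: a stand-in for a large-diameter cluster interconnect."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def clique(n: int) -> dict[int, list[int]]:
    """Fully connected topology, as in the Figure 11 experiment."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

# Per-hop time to forward a 1 MB update is message_size / bandwidth, so the
# worst-case propagation delay scales with the diameter.
msg_mb = 1.0
for name, topo, bw_mb_s in [("cluster (ring, 1 GB/s)", ring(100), 1000.0),
                            ("full connectivity", clique(100), 1000.0)]:
    d = diameter(topo)
    print(f"{name}: diameter = {d} hops, "
          f"propagation ~ {d * msg_mb / bw_mb_s * 1000:.0f} ms")
```

Under these assumptions, the 100-node ring has diameter 50 versus 1 for the fully connected network, so a changed access pattern takes roughly 50 times longer to reach the farthest node; this is the mechanism behind the slower adaptation observed in the cluster experiments.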
