20.01.2014 Views

Workshop proceeding - final.pdf - Faculty of Information and ...

Workshop proceeding - final.pdf - Faculty of Information and ...

Workshop proceeding - final.pdf - Faculty of Information and ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Data Management in Cloud Scientific Workflow Systems<br />

Dong Yuan 1 *<br />

* Presenter<br />

1. <strong>Faculty</strong> <strong>of</strong> <strong>Information</strong> <strong>and</strong> Communication Technologies, Swinburne University <strong>of</strong><br />

Technology, Hawthorn, Melbourne, Australia 3122<br />

Data-intensive scientific applications are posing many challenges in distributed<br />

computing systems. In the scientific field, the application data are expected to double every<br />

year over the next decade <strong>and</strong> further. With this continuing data explosion, high performance<br />

computing systems are needed to store <strong>and</strong> process data efficiently, <strong>and</strong> workflow<br />

technologies are facilitated to automate these scientific applications. Scientific workflows are<br />

typically very complex. They usually have a large number <strong>of</strong> tasks <strong>and</strong> need a long time for<br />

execution. Running scientific workflow applications usually need not only high performance<br />

computing resources but also massive storage. The emergence <strong>of</strong> cloud computing<br />

technologies <strong>of</strong>fers a new way to develop scientific workflow systems. Scientists can upload<br />

their data <strong>and</strong> launch their applications on the scientific cloud workflow systems from<br />

everywhere in the world via the Internet, <strong>and</strong> they only need to pay for the resources that they<br />

use for their applications. As all the data are managed in the cloud, it is easy to share data<br />

among scientists. This kind <strong>of</strong> model is very convenient for users, but remains a big challenge<br />

to the system. This seminar will discuss several research topics <strong>of</strong> data management in<br />

scientific cloud workflow systems.<br />

Keywords<br />

component, formatting, style, styling, insert (key words)<br />

1. Introduction<br />

Data-intensive scientific applications are posing many challenges in distributed computing systems.<br />

In many scientific research fields, like astronomy, high-energy physics <strong>and</strong> bio-informatics, scientists<br />

need to analyse terabytes <strong>of</strong> data either from existing data resources or collected from physical<br />

devices. During these processes, similar amounts <strong>of</strong> new data might also be generated as intermediate<br />

or <strong>final</strong> products [9]. According to [20], in the scientific field, the application data are expected to<br />

double every year over the next decade <strong>and</strong> further. With this continuing data explosion, high<br />

performance computing systems are needed to store <strong>and</strong> process data efficiently, <strong>and</strong> workflow<br />

technologies are facilitated to automate these scientific applications. Scientific workflows are typically<br />

very complex. They usually have a large number <strong>of</strong> tasks <strong>and</strong> need a long time for execution. Running<br />

scientific workflow applications usually need not only high performance computing resources but also<br />

massive storage [9].<br />

Nowadays, popular scientific workflows are <strong>of</strong>ten deployed in grid systems [16] because they have<br />

high performance <strong>and</strong> massive storage. However, building a grid system is extremely expensive <strong>and</strong> it<br />

is normally not open for scientists all over the world. The emergence <strong>of</strong> cloud computing technologies<br />

<strong>of</strong>fers a new way to develop scientific workflow systems.<br />

Since late 2007 the concept <strong>of</strong> cloud computing was proposed [22], it has been utilised in many<br />

areas with some success. Cloud computing is deemed as the next generation <strong>of</strong> IT platforms that can<br />

deliver computing as a kind <strong>of</strong> utility [8]. Foster et al. made a comprehensive comparison <strong>of</strong> grid<br />

computing <strong>and</strong> cloud computing [11]. Some features <strong>of</strong> cloud computing also meet the requirements<br />

<strong>of</strong> scientific workflow systems. Cloud computing systems provide the high performance <strong>and</strong> massive<br />

storage required for scientific applications in the same way as grid systems, but with a lower<br />

infrastructure construction cost among many other features, because cloud computing systems are<br />

composed <strong>of</strong> data centres which can be clusters <strong>of</strong> commodity hardware [22]. Research into doing<br />

science <strong>and</strong> data-intensive applications on the cloud has already commenced [18], such as early<br />

experiences like Nimbus [14] <strong>and</strong> Cumulus [21] projects. The work by Deelman et al. [10] shows that<br />

67

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!