Workshop proceeding - final.pdf - Faculty of Information and ...

Workshop proceeding - final.pdf - Faculty of Information and ...

Workshop proceeding - final.pdf - Faculty of Information and ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

To the best of our knowledge, research on cost-effectively managing the application data of cloud scientific workflow systems remains largely unexplored. My research will focus on this gap and develop novel strategies for data storage, placement and replication in cloud scientific workflow systems.

3. Research problems and methodologies

A. Data storage strategy

1) Problem analysis

Traditionally, scientific workflows are deployed on high performance computing facilities, such as clusters and grids. Scientific workflows are often complex, with huge volumes of intermediate data generated during their execution. How to store these intermediate data is normally decided by the scientists who use the scientific workflows, because the clusters and grids serve only certain institutions. The scientists may store the intermediate data that are most valuable to them, based on the storage capacity of the system. However, in many scientific workflow systems the storage capacity is limited, as in the pulsar searching workflow we introduced, and the scientists have to delete all the intermediate data because of the storage limitation. This storage bottleneck can be avoided if we run scientific workflows in the cloud.

In a cloud computing environment, the system can theoretically offer unlimited storage resources. All the intermediate data generated by scientific cloud workflows can be stored, if we are willing to pay for the required resources. However, in scientific cloud workflow systems, whether or not to store intermediate data is no longer an easy decision, for two reasons.

a) All the resources in the cloud carry certain costs, so whether we store or generate an intermediate dataset, we have to pay for the resources used. The intermediate datasets vary in size and have different generation costs and usage rates. Some of them may be used often whilst others may not. On one hand, it is most likely not cost effective to store all the intermediate data in the cloud. On the other hand, if we delete them all, the regeneration of frequently used intermediate datasets imposes a high computation cost. We need a strategy that balances the generation cost and the storage cost of the intermediate data, in order to reduce the total cost of the scientific cloud workflow system.
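The tradeoff in a) can be illustrated as a per-period cost comparison: keep a dataset when its storage cost rate is below the expected cost of regenerating it on demand. This is only a hedged sketch of the idea, not the strategy developed in this research; the function name, parameters and prices are all hypothetical assumptions.

```python
# Illustrative sketch (not the paper's algorithm): decide whether storing an
# intermediate dataset is cheaper than deleting it and regenerating on demand.
# All names and pricing figures are hypothetical assumptions.

def should_store(size_gb, storage_price, generation_cost, usage_rate):
    """Compare costs per unit time (e.g. per month).

    size_gb         -- dataset size in GB
    storage_price   -- storage price per GB per month
    generation_cost -- one-off compute cost to regenerate the dataset
    usage_rate      -- expected number of accesses per month
    """
    storage_cost_rate = size_gb * storage_price            # cost of keeping it
    regeneration_cost_rate = generation_cost * usage_rate  # cost of deleting it
    return storage_cost_rate <= regeneration_cost_rate

# A small, frequently used dataset is worth keeping:
print(should_store(size_gb=2, storage_price=0.02, generation_cost=5.0, usage_rate=10))    # True
# A large, rarely used dataset is better deleted and regenerated on demand:
print(should_store(size_gb=500, storage_price=0.02, generation_cost=1.0, usage_rate=0.5)) # False
```

The same comparison generalises to the whole system by summing over datasets, which is why usage rates and generation costs must be tracked per dataset.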

b) The scientists can no longer predict the usage rate of the intermediate data. For a single research group whose data resources are used only by its own scientists, the scientists may predict the usage rate of the intermediate data and decide whether to store or delete them. However, a scientific cloud workflow system is not developed for a single scientist or institution; rather, it is developed for scientists from different institutions to collaborate and share data resources. The users of the system could be anonymous users from the Internet. We must have a strategy for storing the intermediate data that is based on the needs of all the users and reduces the cost of the whole system.

Hence, for scientific cloud workflow systems, we need a strategy that can automatically select and store the most appropriate intermediate datasets. Furthermore, this strategy should be cost effective, reducing the total cost of the whole system.

2) Methodology

Scientific workflows have many computation and data intensive tasks that generate intermediate datasets of considerable size, and dependencies exist among these datasets. Data provenance in workflows is an important kind of metadata, in which the dependencies between datasets are recorded. A dependency depicts the derivation relationship between workflow intermediate datasets. For scientific workflows, data provenance is especially important because, after the execution, some intermediate datasets may be deleted, but sometimes the scientists have to regenerate them for either reuse or reanalysis. Data provenance records the information of how the intermediate datasets were generated, which is very important for the scientists. Furthermore, regeneration of the intermediate datasets from the input data may be very time consuming and therefore carry a high cost. With data provenance information, the regeneration of a demanded dataset may instead start from some stored intermediate datasets.

In the scientific cloud workflow system, data provenance is recorded during workflow execution. Taking advantage of data provenance, we can build an IDG. Once any intermediate dataset has been generated in the system, whether stored or deleted, its reference is recorded in the IDG. Based on the IDG, we can calculate the generation cost of every intermediate dataset in the scientific cloud workflow system.
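The generation-cost calculation described above can be sketched as a traversal of the dependency graph: a dataset's cost is zero if it is stored, and otherwise the cost of its deriving task plus the cost of materialising its parents. This is a hypothetical illustration under assumed class and field names, not the system's actual IDG implementation.

```python
# Hypothetical sketch: estimating regeneration cost from provenance
# dependencies. Class and attribute names are assumptions for illustration.

class Dataset:
    def __init__(self, name, compute_cost, parents=(), stored=False):
        self.name = name
        self.compute_cost = compute_cost  # cost of the task that derives it
        self.parents = list(parents)      # datasets it is derived from
        self.stored = stored              # currently kept in cloud storage?

def generation_cost(ds):
    """Cost to materialise `ds`: zero if it is stored, otherwise the cost of
    its deriving task plus the cost of materialising every parent."""
    if ds.stored:
        return 0.0
    return ds.compute_cost + sum(generation_cost(p) for p in ds.parents)

# Chain raw -> d1 -> d2 -> d3; d1 is kept, so d3 regenerates from d1,
# not from the raw input data.
raw = Dataset("raw", 0.0, stored=True)
d1 = Dataset("d1", 10.0, [raw], stored=True)
d2 = Dataset("d2", 4.0, [d1])
d3 = Dataset("d3", 6.0, [d2])
print(generation_cost(d3))  # 10.0 = 6.0 + 4.0, since d1 costs nothing to fetch
```

Recording references to deleted datasets in the graph is what makes this possible: even an absent dataset keeps its deriving-task cost and parent links, so its regeneration cost stays computable.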

