Reliability Analysis of Grid Computing Systems

Y.S. Dai, M. Xie, K.L. Poh
engp0495@nus.edu.sg, isexiem@nus.edu.sg, isepohkl@nus.edu.sg
Department of Industrial and System Engineering, National University of Singapore

Abstract

A grid computing system differs from conventional distributed computing systems in its focus on large-scale resource sharing, where processors and communication have a significant influence on grid computing reliability. Most previous research on conventional small-scale distributed systems ignored communication time and processing time when studying distributed program reliability, which is not practical in the analysis of grid computing systems. This paper describes the properties of grid computing systems and presents algorithms to analyze grid program and system reliability.

Key words: Grid system, Reliability, Distributed systems.

1. Introduction

"Grid" computing systems have emerged as an important new field, distinguished from conventional distributed systems by their focus on large-scale resource sharing, innovative applications, and high-performance orientation [1-2]. A grid computing system is a kind of wide-area distributed system. The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-distributed programs [3-4]. The sharing that grid computing is concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies. Thus, the communication between computing programs and resources significantly affects grid computing reliability.

From the viewpoint of a grid computing program, the program reliability can be defined as the probability of successful execution of the given program running on multiple nodes and exchanging information with the remote resources of other nodes.
From the system point of view, the reliability of the grid system can be defined as the probability that all of the grid computing programs are executed successfully in the grid computing environment. In fact, grid program/system reliability is a special case of distributed program/system reliability.

There are many algorithms for the study of distributed program reliability (DPR) and distributed system reliability (DSR). The paper by Kumar et al. [5] appears to be the first to define the DPR and DSR. They constructed a distributed model which assumed that the probabilities for links and nodes to be operational were constant. Kumar [6] also developed a fast algorithm to evaluate the DPR and DSR. To further improve the speed of reliability assessment, Chen and Huang [7] proposed the FST-SPR algorithm, which reduces the number of subgraphs generated during reliability evaluation. Follow-up research [8-11] continued the study of the DPR and DSR based on the model of Kumar et al. [5].

The assumption that the operational probabilities of nodes and links are constant may be feasible in conventional small-scale distributed systems, where the communication time can be negligible. However, for a grid computing system, the size of the information communicated through the network and the processing time of the computing programs cannot be ignored. Hence, most of the previous models and algorithms for conventional small-scale distributed computing systems cannot be directly applied to study the reliability of grid computing systems.

In order to accurately assess the reliability of a grid computing system given its large-scale character, this paper describes the properties of grid computing systems and develops algorithms to derive the grid program and system reliability, taking into account the communication time of wide-area transfers. The rest of this paper is organized as follows.
Section 2 describes grid computing systems and defines the concepts of grid program and system reliability. Section 3 develops algorithms to derive grid program and system reliability. Section 4 presents a numerical example to show the procedures and feasibility of the algorithms.

2. Grid computing system

In grid computing systems, the programs of each grid computing element are concerned not primarily with file exchange but rather with direct access to remote resources [3], such as printers, computers, fax machines, software, and data.
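
The Introduction argues that communication time and processing time cannot be ignored when assessing grid reliability. The Python sketch below illustrates that point only; it is not the algorithm developed in this paper. It assumes, purely for illustration, that each node and link has a constant failure rate (so the probability of surviving a duration t is exp(-rate * t)), that failures are independent, that the program needs its node and every listed link, and that communication time equals data size divided by bandwidth. The function names, the tuple layout, and all numeric values are hypothetical.

```python
import math


def survival_probability(failure_rate: float, duration: float) -> float:
    """P(no failure during `duration`), assuming a constant failure rate."""
    return math.exp(-failure_rate * duration)


def program_reliability(node_failure_rate, processing_time, link_transfers):
    """Reliability of one program under the illustrative assumptions above.

    link_transfers: list of (link_failure_rate, data_size, bandwidth) tuples,
    one per remote resource the program must reach (hypothetical layout).
    """
    # The node must survive for the whole processing time.
    reliability = survival_probability(node_failure_rate, processing_time)
    for link_failure_rate, data_size, bandwidth in link_transfers:
        # Each link must survive for as long as the transfer occupies it.
        communication_time = data_size / bandwidth
        reliability *= survival_probability(link_failure_rate, communication_time)
    return reliability


# Same node and link, two data sizes: the larger transfer takes longer.
small = program_reliability(0.001, 10.0, [(0.002, 100.0, 50.0)])
large = program_reliability(0.001, 10.0, [(0.002, 10000.0, 50.0)])
print(f"small transfer: {small:.4f}")
print(f"large transfer: {large:.4f}")
```

Under these assumptions the larger transfer yields a lower reliability for the same link, which a model with a constant operational probability per link cannot express.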
