Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych
A system automating repairs of IT systems<br />
(System automatyzujący naprawy systemów IT)<br />
MSc MAREK KAMIŃSKI<br />
Gdańsk University of Technology, Faculty of Electronics, Telecommunication
Lufthansa Systems Poland, Sp. z o.o., Gdańsk<br />
The rapid, ongoing computerization of almost every domain of our
lives has become an undisputed fact. Because a huge number of IT
systems of various kinds need supervision and maintenance 24 hours
a day, 365 days a year, many IT companies now offer their customers
a new service: remote monitoring, technical support, and assistance
in taking care of their systems.
Monitoring is relatively easy to implement because many monitoring
systems have already been developed [1], and they usually accomplish
their task satisfactorily. We distinguish traditional monitoring
systems [2-7] (usually with centralized monitoring logic) from those
dedicated to monitoring grid or cluster structures [8-11] (usually
with distributed monitoring logic). Regardless of the kind of
monitoring system, monitoring aims to give administrators of the
monitored systems a clear indication of what is wrong. The next step
is to solve the problem (to repair the system), so integrating
monitoring and repair seems natural.
Repairs, however, are usually more complicated than monitoring,
because they often require manual, time-consuming intervention by
administrators. The author's observations show that, in many cases,
even repairs can be automated and, moreover, integrated with the
monitoring task. Automating them is a complex problem and still an
area of active research. The solutions proposed so far are usually
too simple to handle complex repairs in real-world situations, while
others have vulnerabilities that rule out industrial use. Some
solutions rely on timeouts or on retrying failed actions a finite
number of times, in the hope that the problem will not recur, or on
event handlers [4-6], which are simple executions of remote commands
[7] that tell the system which program to run when monitoring returns
an undesired result. Some solutions address only the automation of
administrative tasks [12,13] and do not integrate with monitoring.
Other attempts focus on describing the architecture a system should
have in order to facilitate automatic repair [14-16]. Efforts have
also been made to integrate the Nagios Monitoring System [4-6] with
the CFengine system (in one particular situation, concerning network
problems [17]), and some works indicate the directions in which
monitoring and healing (repairing) can go [18].
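The two simple mechanisms mentioned above (retrying a failed action a finite number of times, and an event handler that maps an undesired monitoring result to a program to execute) can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the names `retry`, `EVENT_HANDLERS` and `handle_event`, the state label "CRITICAL", and the placeholder command are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import Callable, Dict, List


def retry(action: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Run `action` up to `max_attempts` times; stop at the first success."""
    for _ in range(max_attempts):
        if action():
            return True
    return False


# Hypothetical handler table: monitoring state -> command to execute.
# The command is a harmless placeholder; a real handler would typically
# point at a restart or cleanup script.
EVENT_HANDLERS: Dict[str, List[str]] = {
    "CRITICAL": [sys.executable, "-c", "pass"],
}


def handle_event(state: str, timeout_s: float = 10.0) -> bool:
    """Run the command registered for `state`; report whether it succeeded."""
    command = EVENT_HANDLERS.get(state)
    if command is None:
        return False  # no handler registered for this state
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # the handler did not finish within the timeout
    return result.returncode == 0
```

A call such as `retry(lambda: handle_event("CRITICAL"))` combines both mechanisms. The sketch also shows their limitation: the handler is a single fixed command with no knowledge of why the check failed, which is why such approaches struggle with complex repairs.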
Goal and context<br />
This article focuses mainly on the Repair Management System (RMS),
one part of the Repair Management Framework (RMF) developed to
automate the process of repairing IT systems. The RMF also comprises
the Repair Management Model (RMM) and the repair library. Both were
formally and precisely specified using the Z notation [19-21] and are
described in [22]. The RMM introduces two mathematical models (a
model of monitoring and a model of repair processes) general enough
to cover the existing problems and solutions, while the RMS, with its
complex architecture, builds on those concepts to incorporate
existing monitoring solutions and use them to trigger and conduct
repairs automatically. The repair library introduces an abstraction
layer (in the form of an API, Application Programming Interface) that
makes repair algorithms easy to construct. The algorithms take
advantage of the programming language they are embedded in and of
a set of predefined routines (procedures and functions) that hide
from programmers many unimportant details of monitoring and
repairing the particular problems they solve. All parts of the RMF
have already been instantiated and are being tested at Lufthansa
Systems Poland.
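The idea of a repair library as an abstraction layer can be sketched as follows. This is only an illustration of the concept and not the RMF repair library itself, whose actual API is not shown in this article. The class `RepairLibrary`, its methods `is_healthy` and `perform`, and the check and action names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RepairLibrary:
    """Hypothetical facade hiding how checks and actions are carried out."""
    checks: Dict[str, Callable[[], bool]]
    actions: Dict[str, Callable[[], None]]
    log: List[str] = field(default_factory=list)

    def is_healthy(self, check: str) -> bool:
        """Run a named monitoring check and return its result."""
        return self.checks[check]()

    def perform(self, action: str) -> None:
        """Record and execute a named repair action."""
        self.log.append(action)
        self.actions[action]()


def repair_web_server(lib: RepairLibrary) -> bool:
    """A repair algorithm written in plain Python on top of the library."""
    if lib.is_healthy("http"):
        return True                      # nothing to repair
    lib.perform("restart_service")       # first, cheapest repair attempt
    if lib.is_healthy("http"):
        return True
    lib.perform("notify_administrator")  # escalate when the repair fails
    return False


# Simulated environment standing in for a real monitored system.
state = {"up": False}
lib = RepairLibrary(
    checks={"http": lambda: state["up"]},
    actions={
        "restart_service": lambda: state.update(up=True),
        "notify_administrator": lambda: None,
    },
)
```

Calling `repair_web_server(lib)` on the simulated environment restarts the service and then finds the check healthy. The repair algorithm uses ordinary control flow from the host language, while the details of how a check or an action is executed stay behind the library routines.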
Definition of the solved problem<br />
The problem addressed in this article may be briefly defined<br />
as follows:<br />
• An IT company continuously supervises the operation of many
objects of various kinds,
54 ELEKTRONIKA 11/2009