Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych
• data identifying the repairs (repair details): their
names, the start and finish times of their threads,
the PIDs (process identifiers) of those threads, and the names of the
related scripts with the repair algorithms, binding repairs to the
scopes of the problems that trigger them,
• information on the connections made (to particular machines
and as particular users) within the complex network
topology of the enterprise (the connection tree).
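The two kinds of data listed above can be pictured as a small record type plus a tree. The following is only an illustrative sketch in Python (the repair API described in this article was written in Perl); the class names `RepairRecord` and `Connection`, and all field names, are hypothetical and do not come from the RMS itself.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterator, List, Optional, Tuple


@dataclass
class Connection:
    """One node of a connection tree: a machine reached as a given user."""
    host: str
    user: str
    children: List["Connection"] = field(default_factory=list)

    def add(self, host: str, user: str) -> "Connection":
        """Record a further connection made from this node."""
        child = Connection(host, user)
        self.children.append(child)
        return child

    def paths(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield every root-to-leaf path as a tuple of 'user@host' strings."""
        here = prefix + (f"{self.user}@{self.host}",)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


@dataclass
class RepairRecord:
    """Identity data of one repair (all field names are assumptions)."""
    name: str                  # name of the repair
    script: str                # script holding the repair algorithm
    problem_scope: str         # scope of the problem that triggered it
    pid: int                   # PID of the repair thread
    started: datetime          # thread start time
    finished: Optional[datetime] = None   # thread finish time, if finished
    connections: Optional[Connection] = None

    def duration_seconds(self) -> Optional[float]:
        """Return the run time in seconds, or None while still running."""
        if self.finished is None:
            return None
        return (self.finished - self.started).total_seconds()
```

Keeping the connection tree as a field of the record would let a report show, for each repair, both how long it ran and which machines it reached as which user.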
Case study of an example repair
This subsection describes an example problem and its repair algorithm.
Brief description of the solved problem
The examined problem is database-related and may be characterized
as follows:
• each change in the database is appended to journal files (also
known as redo log files),
• once a day, after a successful backup of the whole database
(called a checkpoint here), the journal files are deleted
and their creation starts again from scratch,
• the journal files are created and enlarged by
a database archiver process,
• a lack of space in the filesystem holding those files
stops the database.
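The condition that triggers this problem can be detected by watching how full the journal filesystem is. The sketch below is a minimal Python illustration under stated assumptions: the thresholds, the action names and the function names are all hypothetical, and the decision rule is not the repair algorithm from the article (which is given only as a flowchart). It merely reflects the fact that journal files are removed after a successful checkpoint.

```python
import shutil


def used_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def choose_action(fraction: float, warn: float = 0.80,
                  critical: float = 0.95) -> str:
    """Map filesystem usage to a hypothetical repair action.

    Assumed policy: below `warn` nothing is done; between `warn` and
    `critical` an administrator is notified; at or above `critical`
    a checkpoint is requested so that the journal files can be deleted.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction >= critical:
        return "checkpoint_now"
    if fraction >= warn:
        return "notify_admin"
    return "ok"


if __name__ == "__main__":
    # "." stands in for the journal directory, which is site-specific.
    frac = used_fraction(".")
    print(f"usage {frac:.0%}: {choose_action(frac)}")
```

Separating the measurement (`used_fraction`) from the decision (`choose_action`) keeps the policy testable without touching a real filesystem.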
Flowchart of the solution
The solution to this problem is depicted in Fig. 2.
Conclusions
The RMS and all other components of the RMF have been
successfully implemented. The formal specifications of the RMM
and the repair library formed the basis for further development
and left room for different implementations sharing the
same idea. The RMS, together with the prototype repair library (the
repair API), is under test at the Lufthansa Systems Poland
company. The API was implemented in the Perl programming
language, and the first experiments show that the system
and its API meet the expectation of providing adequate
support for the repair process. The experiments involved the
problem described above; once its repair procedure had been
represented as a flowchart, it was easily implemented using
routines derived from the formal model.
Further experiments, aimed at assessing the effectiveness
and efficiency of the RMS and benchmarking it against
traditional approaches, are in the planning phase. The
expected benefits of the proposed approach include greater
reuse of repair procedures, better reliability of the repair
process, a significant increase in performance, and better manageability
due to improved documentation. The RMS incorporates
existing monitoring systems, and it is already noticeable that
repairs have become faster and well documented (so their results can
be included in internal reporting systems), and that administrators
can focus on more complicated tasks.
ELEKTRONIKA 11/2009