24.11.2014 Views

Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych


• data regarding their identity (details of repairs): their names, the times their threads were started and finished, the PIDs (Process IDentifiers) of those threads, and the names of the related scripts containing the repair algorithms, which bind repairs to the scopes of the problems that trigger them,

• information on the connections made (to particular machines and as particular users) within the complex network topology of the enterprise (the connection tree).
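The two kinds of stored information listed above could be modelled roughly as follows. This is only a minimal sketch in Python (the API described in this paper was written in Perl), and every class, field, and method name here (`RepairRecord`, `ConnectionNode`, `problem_scope`, `paths`, and so on) is an assumption made for illustration, not part of the original system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class RepairRecord:
    """Identity data for one repair (all field names are hypothetical)."""
    name: str
    pid: int                 # PID of the thread that ran the repair
    script: str              # name of the script holding the repair algorithm
    problem_scope: str       # scope of the problem that triggered the repair
    started: datetime
    finished: Optional[datetime] = None  # None while the repair is still running

    def duration_seconds(self) -> Optional[float]:
        """Return the run time in seconds, or None if the repair has not finished."""
        if self.finished is None:
            return None
        return (self.finished - self.started).total_seconds()


@dataclass
class ConnectionNode:
    """One node of the connection tree: a machine reached as a particular user."""
    host: str
    user: str
    children: List["ConnectionNode"] = field(default_factory=list)

    def connect(self, host: str, user: str) -> "ConnectionNode":
        """Record a further connection made from this node and return the new node."""
        child = ConnectionNode(host, user)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> List[str]:
        """List every root-to-leaf connection path as a readable string."""
        here = f"{prefix}{self.user}@{self.host}"
        if not self.children:
            return [here]
        result: List[str] = []
        for child in self.children:
            result.extend(child.paths(here + " -> "))
        return result
```

Keeping the connection tree separate from the repair records lets the same tree be reused by several repairs that reach the same machines through the same intermediate hosts.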

Case study of an exemplary repair

This subsection describes an exemplary problem and its repair algorithm.

Brief description of the solved problem

The examined problem is database-related and may be characterized as follows:

• each change in the database is appended to journal files (also known as redo log files),

• once a day, after a successful backup of the whole database (called here a checkpoint), the journal files are deleted and their creation starts again from scratch,

• the journal files are created and enlarged by a database archiver process,

• a lack of space in the filesystem where those files reside stops the database.
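The condition that triggers this problem, a journal filesystem running out of space, can be detected with a simple usage check. The sketch below covers only that detection step and does not reproduce the repair procedure itself. It is written in Python rather than the Perl of the original API, and the function names, the `SpaceStatus` type, and the 90% threshold are all assumptions chosen for illustration.

```python
import shutil
from typing import NamedTuple


class SpaceStatus(NamedTuple):
    used_fraction: float   # share of the filesystem already in use, 0.0 to 1.0
    needs_repair: bool     # True when usage has reached the threshold


def assess_usage(total_bytes: int, free_bytes: int, threshold: float = 0.9) -> SpaceStatus:
    """Decide whether a filesystem is full enough to warrant a repair.

    The 0.9 default threshold is a hypothetical value, not taken from the paper.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_fraction = 1.0 - free_bytes / total_bytes
    return SpaceStatus(used_fraction, used_fraction >= threshold)


def check_journal_filesystem(path: str, threshold: float = 0.9) -> SpaceStatus:
    """Check the filesystem that holds the journal (redo log) files at `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return assess_usage(usage.total, usage.free, threshold)
```

A monitoring loop could call `check_journal_filesystem` on the journal directory at regular intervals and hand over to the repair procedure when `needs_repair` is true, so that the database is not stopped by a full filesystem.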

Flowchart of the solution

The solution to this problem is depicted in Fig. 2.

Conclusions

The RMS and all other components of the RMF have been successfully implemented. The formal specifications of the RMM and of the repair library formed the basis for further development and left room for different implementations sharing the same idea. The RMS, together with the prototype repair library (the repair API), is under test at the Lufthansa Systems Poland company. The API was implemented in the Perl programming language, and the first experiments show that the system and its API meet the expectation of providing adequate support for the repair process. The experiments involved the problem described above and its repair procedure, which, once represented as a flowchart, was easily implemented using routines derived from the formal model.

Further experiments, aimed at assessing the effectiveness and efficiency of the RMS and benchmarking it against traditional approaches, are in the planning phase. The expected benefits of the proposed approach include increased reuse of repair procedures, better reliability of the repair process, a significant increase in performance, and better manageability due to improved documentation. The RMS incorporates existing monitoring systems, and it is already noticeable that repairs have become faster and well documented (so their results can be included in internal reporting systems), and that administrators may focus on more complicated tasks.


ELEKTRONIKA 11/2009 57
