Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych

• data identifying the repairs (details of repairs): their names, the times at which their threads were started and finished, the PIDs (process identifiers) of those threads, the names of the related scripts containing the repair algorithms, and the bindings between repairs and the scopes of the problems that trigger them,

• information on the connections made (to particular machines and as particular users) within the complex network topology of the enterprise (the connection tree).
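The data listed in the two items above can be pictured as a small record type plus a tree. The following is only a minimal Python sketch under stated assumptions, not the actual RMS schema: the class names `RepairRecord` and `ConnectionNode`, all field names, and the sample values in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ConnectionNode:
    """One hop in the connection tree: a machine reached as a particular user."""
    host: str
    user: str
    children: List["ConnectionNode"] = field(default_factory=list)

    def paths(self, prefix=()):
        """Yield every root-to-leaf path as a tuple of 'user@host' strings."""
        here = prefix + (f"{self.user}@{self.host}",)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


@dataclass
class RepairRecord:
    """Identity of a single repair (all field names are assumptions)."""
    name: str                 # name of the repair
    pid: int                  # PID of the thread running the repair
    script: str               # script holding the repair algorithm
    problem_scope: str        # scope of the problem that triggers the repair
    started: datetime         # time the thread was started
    finished: Optional[datetime] = None  # None while the repair is running

    def duration_seconds(self):
        """Elapsed time in seconds, or None if the repair has not finished."""
        if self.finished is None:
            return None
        return (self.finished - self.started).total_seconds()
```

For instance, a tree built as `ConnectionNode("gateway", "admin", [ConnectionNode("db1", "dbadmin")])` has the single path `("admin@gateway", "dbadmin@db1")`, which is the route a repair would follow to reach the database host.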

Case study of an exemplary repair

This subsection describes an exemplary problem and its repair algorithm.

Brief description of the solved problem 

The examined problem is database-related and may be characterized as follows:

• each change in the database is appended to journal files (also known as redo log files),

• once a day, after a successful backup of the whole database (called here a checkpoint), the journal files are deleted and their creation starts again from scratch,

• the journal files are created and enlarged by a database archiver process,

• a lack of space in the filesystem holding those files stops the database.
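The condition that makes this problem critical, a journal filesystem running out of space, can be checked with a few lines of code. The sketch below is written in Python for illustration and does not reproduce the repair algorithm itself; the function names, the 10% threshold and the example path are assumptions, not values taken from the RMS.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Defensive guard for pseudo-filesystems that report no capacity.
        return 0.0
    return usage.free / usage.total


def needs_repair(free, threshold=0.10):
    """True when the free fraction has dropped below the (assumed) threshold."""
    return free < threshold
```

A monitoring check could then call `needs_repair(free_fraction("/path/to/journal/files"))`, where the path is a placeholder for the directory holding the journal files, and start a repair when the result is true, before the archiver process fails to enlarge a journal file and the database stops.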

Flowchart of the solution 

The way this problem is solved is depicted in Fig. 2. 

Conclusions 

The RMS and all other components of the RMF have been successfully implemented. The formal specifications of the RMM and the repair library formed the basis for further development and left room for different implementations sharing the same idea. The RMS and the prototype repair library (the repair API) are being tested at the Lufthansa Systems Poland company. The API was implemented in the Perl programming language, and the first experiments show that the system and its API meet the expectation of providing adequate support for the repair process. The experiments involved the problem described above; once its repair procedure had been represented as a flowchart, it was easily implemented using routines derived from the formal model.

Further experiments, aimed at assessing the effectiveness and efficiency of the RMS and benchmarking it against traditional approaches, are in the planning phase. The expected benefits of the proposed approach include greater reuse of repair procedures, better reliability of the repair process, a significant increase in performance, and better manageability due to improved documentation. The RMS incorporates existing monitoring systems, and it is already noticeable that repairs have become faster and well documented (so their results can be included in internal reporting systems), and that administrators may focus on more complicated tasks.


ELEKTRONIKA 11/2009 57
