
Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych


• information on their state comes from various sources (e.g. monitoring systems, e-mails sent by regularly executed maintenance tasks, and customers),
• those objects reside on machines connected by a network with a complex topology,
• an undesired state of specified objects, or of a combination of them, indicates problems that must be solved (usually as soon as possible),
• a large number of repairs are performed manually, although they are repeatable and could evidently be specified formally and precisely.

Main ideas and terms

The following subsections explain the main ideas and terms used in the developed solution and introduce a unified naming convention covering both of the discussed problems:

Monitoring

An analysis of multiple monitoring systems made it possible to formulate a more general description of monitoring. Monitoring can be perceived as the process of examining the states of groups of objects of any kind and of reporting any abnormalities. Each object may assume one of a precisely determined number of mutually exclusive states, taking their values from a precisely determined domain. Some values mean that the state is correct, while others mean that the object is associated with a problem that needs resolving.
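The idea of a finite domain of mutually exclusive states, only some of which are correct, can be sketched with a Python enumeration. The state names and the choice of which ones count as correct are illustrative assumptions and are not taken from the RMS.

```python
from enum import Enum


class FsState(Enum):
    """Hypothetical state domain of a file-system object (names are illustrative)."""
    OK = "ok"
    WARNING = "warning"
    FULL = "full"
    UNMOUNTED = "unmounted"


# The subset of the domain regarded as correct (an assumption); any other
# value means the object is associated with a problem that needs resolving.
CORRECT_STATES = frozenset({FsState.OK})


def needs_resolving(state: FsState) -> bool:
    """An object holds exactly one enum member at a time, so its states are mutually exclusive."""
    return state not in CORRECT_STATES


print(needs_resolving(FsState.OK))    # False
print(needs_resolving(FsState.FULL))  # True
```

Because the domain is a closed enumeration, a value outside it cannot be represented, which matches the "precisely determined domain" described above.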

We may introduce a simple method of identifying those objects, which consists in associating each of them with a complex variable that has a precisely determined name. Monitoring then becomes the process of continuously checking (or observing) the values of many variables and of extracting from their space those variables that do not fulfill the required constraints. The names of those variables are bound to the real monitored objects in monitoring procedures. Most of those objects can be placed in the leaves of a tree representing the hierarchy of the monitored world. It is therefore convenient to create a space of variables with qualified names, resembling the directory structure of a hard disk drive, the DNS (Domain Name System of Internet names) or the MIB tree of SNMP (Simple Network Management Protocol). Staying with the directory-structure analogy, objects (and hence variables) are represented by files, while directories represent the units that group them.
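A name space of this kind can be sketched as a tree addressed by dotted, qualified names. Inner nodes act as directories (grouping units) and childless nodes act as files (variables). The class and method names are hypothetical; only the dotted naming style follows the article's example variable eclipse.oracle.TESTDB.fs.usage. The second and third variables in the usage lines are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional, Tuple


@dataclass
class Node:
    """A grouping unit ("directory") or, if it has no children, a variable ("file")."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    value: Optional[Any] = None


class MonitoringTree:
    def __init__(self) -> None:
        self.root = Node()

    def _find(self, name: str) -> Node:
        node = self.root
        for part in (name.split(".") if name else []):
            node = node.children[part]  # KeyError for an unknown name
        return node

    def set(self, name: str, value: Any) -> None:
        """Store a value under a qualified name, creating grouping units on demand."""
        node = self.root
        for part in name.split("."):
            node = node.children.setdefault(part, Node())
        node.value = value

    def get(self, name: str) -> Any:
        return self._find(name).value

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, Any]]:
        """Yield (qualified name, value) for every variable below prefix, in name order."""
        stack = [(prefix, self._find(prefix))]
        while stack:
            path, node = stack.pop()
            if not node.children:
                yield path, node.value
            # Push in reverse order so that children are popped alphabetically.
            for key in sorted(node.children, reverse=True):
                stack.append((f"{path}.{key}" if path else key, node.children[key]))


tree = MonitoringTree()
tree.set("eclipse.oracle.TESTDB.fs.usage", 87)
tree.set("eclipse.oracle.TESTDB.fs.inodes", 40)  # hypothetical variable
tree.set("eclipse.cpu.load", 0.7)                # hypothetical variable
print(list(tree.walk("eclipse.oracle")))
# [('eclipse.oracle.TESTDB.fs.inodes', 40), ('eclipse.oracle.TESTDB.fs.usage', 87)]
```

Walking a prefix is the analogue of listing a directory recursively, which is how a whole group of monitored objects could be selected at once.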

For example, eclipse.oracle.TESTDB.fs.usage may be a variable representing the percentage usage of a file system on the machine eclipse, belonging to the oracle DBMS (DataBase Management System) identified by the SID (System IDentifier) TESTDB. The values of these variables change over time. How often they change depends on the monitoring mechanisms applied, on how frequently those mechanisms process the data coming from the monitored objects, on the stability of the objects' states and on the nature of the objects themselves. Sometimes a state also needs to be accompanied by some details. To summarize:

• the state of a monitored object is represented by the value of the complex variable associated with it,
• each variable has a name that unambiguously identifies its position in the monitoring tree,
• variable values change over time and can be given optional details.
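The three properties listed above can be sketched as a small record type: a qualified name plus a time-ordered history of values, each optionally carrying details. The class names and the choice of keeping a full history are assumptions made for illustration, not the RMS data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)
class Sample:
    """One observed value of a variable, with optional details bound to it."""
    timestamp: datetime
    value: Any
    details: Optional[str] = None


@dataclass
class Variable:
    """A complex variable: a qualified name plus its time-varying values."""
    name: str
    history: List[Sample] = field(default_factory=list)

    def update(self, value: Any, details: Optional[str] = None) -> None:
        """Record a new observation; the newest sample is the current state."""
        self.history.append(Sample(datetime.now(timezone.utc), value, details))

    @property
    def current(self) -> Optional[Sample]:
        return self.history[-1] if self.history else None


usage = Variable("eclipse.oracle.TESTDB.fs.usage")
usage.update(72)
usage.update(97, details="archive log directory growing")  # hypothetical detail
print(usage.current.value, usage.current.details)  # 97 archive log directory growing
```

A real monitoring store might keep only the latest sample. The history is kept here only to make the "changeable in time" property visible.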

Monitoring is the process of continuously checking (or observing) the values of the variables that make up the monitoring tree.

A problem is a situation in which one or more variables have undesired values. Undesired values, and hence problems, are expressed by predicates over the problem-related variables of the monitoring tree. More precisely, a problem is defined as a situation in which any of those predicates is unfulfilled (i.e. false). The scope of a problem is the set of problem-related variables that falsify the problem-related predicate.
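The definition above can be sketched directly. Each problem is a predicate over its problem-related variables; when the predicate is false, the problem occurs and its scope is the set of variables the predicate was evaluated on. The 90% threshold, the problem name fs_full and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Mapping, Optional

Values = Mapping[str, float]


@dataclass(frozen=True)
class ProblemDefinition:
    name: str
    variables: FrozenSet[str]            # problem-related variables
    predicate: Callable[[Values], bool]  # True means the values are desired


def check(problem: ProblemDefinition, values: Values) -> Optional[FrozenSet[str]]:
    """Return the problem scope when the predicate is false, otherwise None."""
    relevant = {name: values[name] for name in problem.variables}
    return None if problem.predicate(relevant) else frozenset(relevant)


def monitor(problems: Iterable[ProblemDefinition], values: Values) -> Dict[str, FrozenSet[str]]:
    """One monitoring pass: map each currently occurring problem to its scope."""
    found = {}
    for problem in problems:
        scope = check(problem, values)
        if scope is not None:
            found[problem.name] = scope
    return found


USAGE = "eclipse.oracle.TESTDB.fs.usage"
fs_full = ProblemDefinition("fs_full", frozenset({USAGE}), lambda v: v[USAGE] < 90)
print(monitor([fs_full], {USAGE: 72}))  # {}
print(monitor([fs_full], {USAGE: 97}))  # {'fs_full': frozenset({'eclipse.oracle.TESTDB.fs.usage'})}
```

Predicates may span several variables (e.g. a combination of object states), in which case the scope contains all of them.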

Repair

Each problem that the RMS is able to solve is associated with one or more repair procedures. Repair is the process of using those procedures to bring monitored objects back from undesired states to desired ones. A repair is initiated (triggered) when the RMS detects, in the monitoring tree, a set of unprocessed variables that make up the scope of a problem associated with at least one repair procedure.
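The triggering rule can be sketched as follows. A repair is queued only when the problem has at least one registered repair procedure and none of the variables in its scope has already been handed to a repair. Marking a variable unprocessed again when a new value arrives is an assumption about how "unprocessed" could be tracked; all names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, FrozenSet, List, Set, Tuple

# A repair procedure receives the problem name and its scope (hypothetical signature).
RepairProcedure = Callable[[str, FrozenSet[str]], None]


class RepairTrigger:
    def __init__(self, procedures: Dict[str, List[RepairProcedure]]) -> None:
        self.procedures = procedures      # problem name -> repair procedures
        self.processed: Set[str] = set()  # variables already handed to a repair
        self.queue: Deque[Tuple[str, FrozenSet[str]]] = deque()

    def on_new_value(self, variable: str) -> None:
        """Assumption: a fresh observation makes the variable unprocessed again."""
        self.processed.discard(variable)

    def on_problem(self, problem: str, scope: FrozenSet[str]) -> bool:
        """Queue a repair if the problem has a procedure and its scope is unprocessed."""
        if not self.procedures.get(problem) or not scope.isdisjoint(self.processed):
            return False
        self.processed |= scope
        self.queue.append((problem, scope))
        return True


trigger = RepairTrigger({"fs_full": [lambda name, scope: None]})
scope = frozenset({"eclipse.oracle.TESTDB.fs.usage"})
print(trigger.on_problem("fs_full", scope))  # True: repair queued
print(trigger.on_problem("fs_full", scope))  # False: scope already processed
```

Requiring a disjoint scope prevents the same unchanged observation from spawning duplicate repairs.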

Repair procedures are interactive algorithms written in a high-level programming language. They use standard flow-control instructions and take advantage of the other features of the language in which they are embedded. These algorithms execute so-called corrective steps, whose invocations may be perceived as calls of external routines (procedures or functions). External routines are a well-known programming-language concept. Like internal routines, they resemble black boxes with a precisely determined interface (taking determined input and returning determined output). Unlike internal routines, however, their bodies are defined outside the algorithms that call them. In the RMS, these routines are exported, before execution, to the remote machines related to the problems they are needed to solve. They are then executed on those machines with the access rights of the users specified in the repair algorithms.
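The following sketch shows a repair procedure as ordinary code whose corrective steps are calls of external routines. Each step has a fixed interface, and its body (here a shell script string) is defined outside the algorithm and run on a remote host as a given user. The executor is a hypothetical stand-in that only records calls and replays canned outputs, so nothing is actually exported or executed. The step names, scripts, user and threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CorrectiveStep:
    """External routine: fixed interface, body defined outside the repair algorithm."""
    name: str
    body: str  # script to export to the remote machine (hypothetical; never run here)


@dataclass
class FakeRemoteExecutor:
    """Stand-in for 'export the step to a host and run it as a user'.

    It records each call and replays canned outputs, so the sketch runs locally.
    """
    canned: Dict[str, List[str]]
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def run(self, step: CorrectiveStep, host: str, user: str) -> str:
        self.log.append((host, user, step.name))
        outputs = self.canned.get(step.name, [])
        return outputs.pop(0) if outputs else ""


CHECK_USAGE = CorrectiveStep("check_usage", "df -P /u01 | awk 'NR==2 {print $5}'")
PURGE_OLD_ARCHIVES = CorrectiveStep("purge_old_archives", "find /u01/arch -mtime +7 -delete")


def repair_fs_full(executor: FakeRemoteExecutor, host: str, user: str = "oracle") -> bool:
    """Repair procedure: ordinary flow control wrapped around corrective-step calls."""
    def usage_percent() -> int:
        return int(executor.run(CHECK_USAGE, host, user).rstrip("%"))

    if usage_percent() < 90:
        return True  # already back in a desired state
    executor.run(PURGE_OLD_ARCHIVES, host, user)
    return usage_percent() < 90


executor = FakeRemoteExecutor({"check_usage": ["97%", "41%"]})
print(repair_fs_full(executor, "eclipse"))  # True
print([step for _, _, step in executor.log])
# ['check_usage', 'purge_old_archives', 'check_usage']
```

Keeping step bodies out of the algorithm lets the same repair logic run against any host, with the executor deciding how a step is shipped and under which account it runs.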

Programmers writing repair procedures may use the constructs encapsulated in routines provided by the repair library (the repair API). Each repair is started in an environment containing all the necessary information about the particular problem that initiated it, and this information is easily retrievable with the repair library.
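A repair environment and a few library routines for reading it might look like the sketch below. The environment layout, the routine names and the assumption that the first component of a qualified name is the host are all hypothetical. The host assumption merely follows the shape of the article's example eclipse.oracle.TESTDB.fs.usage.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Set


@dataclass(frozen=True)
class RepairEnvironment:
    """Problem information handed to a repair when it starts (hypothetical layout)."""
    problem: str
    scope: Mapping[str, Any]  # problem-related variable -> value at detection time


# Hypothetical repair-library routines that hide the environment's layout
# from the authors of repair procedures.
def problem_name(env: RepairEnvironment) -> str:
    return env.problem


def scope_value(env: RepairEnvironment, variable: str) -> Any:
    return env.scope[variable]


def affected_hosts(env: RepairEnvironment) -> Set[str]:
    """Assumes that qualified names start with the machine name."""
    return {name.split(".", 1)[0] for name in env.scope}


env = RepairEnvironment(
    "fs_full",
    MappingProxyType({"eclipse.oracle.TESTDB.fs.usage": 97}),  # read-only view
)
print(problem_name(env), scope_value(env, "eclipse.oracle.TESTDB.fs.usage"))  # fs_full 97
print(affected_hosts(env))  # {'eclipse'}
```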

Construction of the RMS

The ideas and considerations outlined in the previous subsections, together with the existing industrial reality and the monitoring systems already in use, have led to the following construction of the RMS:

Architecture of the RMS

The architecture of the RMS is shown in Figure 1. MONJAMI and HVRMONITOR correspond to the systems described in [2,3]. The RMS collects data coming from various sources, initiates repairs and manages their whole lifecycle, during which each repair may be in one of the following states:

• pending: the need for a repair has been detected and the appropriate repair is waiting to be started,
• running: the repair has been started and is in progress,
• suspended: the repair has been suspended because it needs additional information,
• finished: the repair has finished (regardless of its result) and its result has been saved.
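The four repair states can be sketched as a small state machine. The article lists the states but not the permitted transitions. The table below (pending to running, running to suspended or finished, suspended back to running) is an assumption inferred from the state descriptions, and the class names are hypothetical.

```python
from enum import Enum
from typing import Any, Dict, FrozenSet


class RepairState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUSPENDED = "suspended"
    FINISHED = "finished"


# Allowed transitions: an assumption, not a table given in the article.
TRANSITIONS: Dict[RepairState, FrozenSet[RepairState]] = {
    RepairState.PENDING: frozenset({RepairState.RUNNING}),
    RepairState.RUNNING: frozenset({RepairState.SUSPENDED, RepairState.FINISHED}),
    RepairState.SUSPENDED: frozenset({RepairState.RUNNING}),
    RepairState.FINISHED: frozenset(),
}


class Repair:
    def __init__(self, problem: str) -> None:
        self.problem = problem
        self.state = RepairState.PENDING  # need for repair detected, not yet started
        self.result: Any = None

    def move_to(self, new: RepairState) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new.value} is not allowed")
        self.state = new

    def finish(self, result: Any) -> None:
        """Finish the repair and save its result, whether it succeeded or not."""
        self.move_to(RepairState.FINISHED)
        self.result = result


repair = Repair("fs_full")
for state in (RepairState.RUNNING, RepairState.SUSPENDED, RepairState.RUNNING):
    repair.move_to(state)  # e.g. suspended while waiting for additional information
repair.finish("failed")
print(repair.state.value, repair.result)  # finished failed
```

Rejecting undeclared transitions makes lifecycle errors (such as restarting a finished repair) fail loudly rather than silently corrupting the repair's record.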

Features of the RMS

The most important features of the RMS are the following:
• it starts subsequent repairs in parallel (not sequentially),
• repairs are started in the order in which repair requests arrive in the repair queue,

ELEKTRONIKA 11/2009 55
