Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych
• information on their state comes from various sources (e.g. monitoring systems, emails sent by regularly executed maintenance tasks, customers),
• those objects exist on machines connected by a network with a complex topology,
• an undesired state of specified objects, or of a combination of them, indicates problems that must be solved (usually as soon as possible),
• a large number of repairs are conducted manually, although they are repeatable and can evidently be specified formally and precisely.
Main ideas and terms<br />
The following subsections explain the main ideas and terms used in the developed solution and introduce a unified naming convention for both of the discussed problems:
Monitoring<br />
Analysis of multiple monitoring systems led to a more general description of monitoring: it can be perceived as a process of examining the states of groups of objects of any kind and of reporting any abnormalities. Each object may assume a precisely determined number of mutually exclusive states, whose values are taken from a precisely determined domain. Some values mean that the state is correct, while others mean that the object is associated with a problem that needs resolving.
We may introduce a simple method of identifying those objects, which consists in associating each of them with a complex variable of a precisely determined name. Monitoring then becomes a process of continuously checking (or observing) the values of many variables and of extracting from their space those variables that do not fulfill the required constraints. Binding the names of those variables to the real monitored objects is done in monitoring procedures. Because most of those objects may be placed in the leaves of a tree representing the hierarchy of the monitored world, it is convenient to create a space of variables with qualified names, resembling the directory structure of a hard disk drive, the Internet's DNS (Domain Name System), or the MIB tree of SNMP (Simple Network Management Protocol). Staying with the directory-structure analogy, objects (and thus variables) are represented by files, while directories represent the units grouping them.
eclipse.oracle.TESTDB.fs.usage may be, for example, a variable representing the percentage usage of a filesystem on the machine eclipse, belonging to the DBMS (DataBase Management System) oracle identified by the SID (System IDentifier) TESTDB. The values of those variables change over time, and the frequency of their changes depends on the applied monitoring mechanisms, on how often those mechanisms process the data coming from the monitored objects, on the stability of the objects' states, and on their character. Sometimes there is also a need to bind a state with some details. Summarizing, we may say:
• the state of a monitored object is represented by the value of the complex variable associated with it,
• each variable has a name that unambiguously identifies its position in the monitoring tree,
• the values of variables change over time and can be assigned optional details.
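The three points above can be illustrated with a minimal Python sketch. It is not the RMS implementation: the class names `Variable` and `MonitoringTree`, the second variable name, and the sample values and details are all assumptions made for illustration. Only the dot-qualified naming scheme and the example name eclipse.oracle.TESTDB.fs.usage come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variable:
    """State of one monitored object: a value plus optional details."""
    value: object
    details: Optional[str] = None


class MonitoringTree:
    """Hypothetical store keyed by dot-qualified names; the dots encode the hierarchy."""

    def __init__(self) -> None:
        self._vars: Dict[str, Variable] = {}

    def set(self, name: str, value: object, details: Optional[str] = None) -> None:
        # Values change over time, so setting a name again overwrites its state.
        self._vars[name] = Variable(value, details)

    def get(self, name: str) -> Variable:
        return self._vars[name]

    def children(self, prefix: str) -> List[str]:
        """Names of all variables located below the given 'directory'."""
        return sorted(n for n in self._vars if n.startswith(prefix + "."))


tree = MonitoringTree()
# The value 93 and the details string are invented sample data.
tree.set("eclipse.oracle.TESTDB.fs.usage", 93, details="/u01 almost full")
# Assumed second variable, added only to show grouping under one 'directory'.
tree.set("eclipse.oracle.TESTDB.listener.status", "up")
```

A flat dictionary is enough here because the qualified name already carries the full path; a nested structure would only be needed if the grouping units themselves had to hold data.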
Monitoring is a process of continuously checking (or observing) the values of the variables making up the monitoring tree. A problem is a situation in which one or more variables have undesired values. Undesired values, and thus problems, are expressed by predicates over the problem-related variables of the monitoring tree. More precisely, a problem is defined as a situation in which any of those predicates is unfulfilled (that is, false), and the scope of a problem is the set of problem-related variables that falsify the problem-related predicate.
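As a hedged illustration of the definitions of problem and scope, the sketch below evaluates one predicate over a snapshot of variable values. The 90 percent threshold, the second variable name (PRODDB), the sample values, and the function names are assumptions. For simplicity the predicate is checked per variable, whereas the text allows predicates over several variables at once.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical snapshot of the monitoring tree: qualified name -> current value.
values: Dict[str, float] = {
    "eclipse.oracle.TESTDB.fs.usage": 93.0,
    "eclipse.oracle.PRODDB.fs.usage": 41.0,  # assumed name, for contrast
}


def fs_usage_ok(value: float) -> bool:
    """Predicate that must hold; the 90 percent limit is an assumed threshold."""
    return value < 90.0


def problem_scope(
    values: Dict[str, float],
    names: Iterable[str],
    predicate: Callable[[float], bool],
) -> List[str]:
    """Return the problem-related variables that falsify the predicate."""
    return sorted(n for n in names if not predicate(values[n]))


# Problem-related variables: here, every filesystem usage variable.
related = [n for n in values if n.endswith(".fs.usage")]
scope = problem_scope(values, related, fs_usage_ok)
# A problem exists exactly when the scope is non-empty.
has_problem = bool(scope)
```

With the sample data, only the TESTDB variable falsifies the predicate, so it alone makes up the scope of the problem.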
Repair<br />
Each problem that the RMS is able to solve is associated with one or more repair procedures, and repair is the process of using those procedures to bring monitored objects back from undesired states to desired states. A repair is initiated (triggered) when the monitoring tree is found to contain a set of unprocessed variables that make up the scope of a problem associated with at least one repair procedure.
Repair procedures are interactive algorithms written in a high-level programming language; they use the standard flow-control instructions and take advantage of other features of the language they are embedded in. Those algorithms execute so-called corrective steps, whose invocations may be perceived as calls to external routines (procedures or functions). External routines are a well-known programming-language concept: like internal ones, they resemble black boxes with a precisely determined interface (taking determined input and returning determined output) but, unlike internal ones, they have their bodies defined outside the algorithms calling them. In the case of the RMS, they are exported, before execution, to the remote machines related to the problems they are needed to solve, and they are executed on those machines with the access rights of users defined in the repair algorithms.
Programmers writing repair procedures may use the means of expression encapsulated in routines provided by the repair library (the repair API). Each repair is started in an environment containing all the necessary information on the particular problem that initiated it, and this information is easily retrievable through the repair library.
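The relation between a repair procedure and its corrective steps might look roughly like the following Python sketch. Everything in it is hypothetical: the source does not show the repair API, so `CorrectiveStep`, `call_step`, the step bodies, the path, and the returned strings are invented for illustration, and the export to a remote machine is replaced by a local call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class CorrectiveStep:
    """An external routine: fixed interface, body defined outside the algorithm."""
    host: str  # machine the step would be exported to
    user: str  # account whose access rights the step would run with
    body: Callable[[Dict[str, Any]], Dict[str, Any]]


def call_step(step: CorrectiveStep, args: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for export and remote execution: the body simply runs locally.
    return step.body(args)


def repair_full_filesystem(
    problem: Dict[str, Any],
    find_logs: CorrectiveStep,
    remove_logs: CorrectiveStep,
) -> str:
    """Hypothetical repair procedure built from ordinary flow control."""
    found = call_step(find_logs, {"path": problem["path"]})
    if not found["files"]:
        return "suspended"  # nothing to delete: additional information is needed
    call_step(remove_logs, {"files": found["files"]})
    return "finished"


removed: List[str] = []  # records what the stub 'deleted'


def find_logs_body(args: Dict[str, Any]) -> Dict[str, Any]:
    # Stub body: pretends that one old log file exists under the given path.
    return {"files": [args["path"] + "/old.log"]}


def remove_logs_body(args: Dict[str, Any]) -> Dict[str, Any]:
    # Stub body: records the files instead of deleting anything.
    removed.extend(args["files"])
    return {"removed": len(args["files"])}


find_logs = CorrectiveStep("eclipse", "oracle", find_logs_body)
remove_logs = CorrectiveStep("eclipse", "oracle", remove_logs_body)
result = repair_full_filesystem({"path": "/u01/logs"}, find_logs, remove_logs)
```

The `problem` dictionary plays the role of the environment in which a repair is started; in the RMS this information would be retrieved through the repair library rather than passed as a plain argument.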
Construction of the RMS<br />
The ideas and considerations outlined in the previous subsections, together with existing industrial practice and the monitoring systems already employed, have led to the following construction of the RMS:
Architecture of the RMS<br />
The architecture of the RMS is shown in Figure 1. MONJAMI and HVRMONITOR correspond to the systems described in [2,3]. The RMS collects data coming from various sources, initiates repairs, and manages their whole lifecycle. Each repair may assume one of the following states:
• pending: the need for a repair has been detected and the appropriate repair is waiting to be started,
• running: the repair has been started and is in progress,
• suspended: the repair has been suspended because it needs additional information,
• finished: the repair has finished (regardless of its result) and its result has been saved.
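The four states can be modeled as a small state machine, sketched below in Python. The state names come from the list above, but the allowed transitions are an assumption: the text names the states without saying which moves between them are legal. The `Repair` class and the problem name passed to it are likewise hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class RepairState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUSPENDED = "suspended"
    FINISHED = "finished"


# Assumed transition table; the source lists only the states themselves.
ALLOWED: Dict[RepairState, Set[RepairState]] = {
    RepairState.PENDING: {RepairState.RUNNING},
    RepairState.RUNNING: {RepairState.SUSPENDED, RepairState.FINISHED},
    RepairState.SUSPENDED: {RepairState.RUNNING},
    RepairState.FINISHED: set(),
}


class Repair:
    """Hypothetical record of one repair and its lifecycle state."""

    def __init__(self, problem: str) -> None:
        self.problem = problem
        self.state = RepairState.PENDING  # every repair starts out pending

    def move_to(self, new_state: RepairState) -> None:
        # Reject any move that the assumed table does not permit.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state


repair = Repair("filesystem usage too high")
repair.move_to(RepairState.RUNNING)
repair.move_to(RepairState.SUSPENDED)  # waiting for additional information
repair.move_to(RepairState.RUNNING)
repair.move_to(RepairState.FINISHED)
```

Keeping the transitions in one table makes the assumed lifecycle easy to adjust if the real system allows other moves, for example cancelling a pending repair.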
Features of the RMS<br />
The most important features of the RMS are the following:
• it starts subsequent repairs in parallel (not sequentially),
• repairs are started in the order in which repair requests enter the repairs queue,
ELEKTRONIKA 11/2009 55