ADMIN Magazine Sample PDF
Daemon Monitoring<br />
Nuts and Bolts<br />
Figure 1: After starting, the script outputs the log at the console: availability,<br />
error, restart, database running.<br />
Almost all known databases include<br />
a client program for the shell – for<br />
example, mysql for MySQL or psql for<br />
PostgreSQL. Alternatively, you can<br />
use ODBC to access the database in<br />
your scripted monitoring, such as the<br />
isql tool provided by the Unix ODBC<br />
project.<br />
For ease of access, you might need to<br />
set up a (non-privileged) user, a database,<br />
and a table for the test query<br />
on the database server. If you choose<br />
the ODBC option, you also need a<br />
.odbc.ini file with the right access<br />
credentials.<br />
The psql shell client for the Postgres<br />
database also poses the problem of<br />
non-standard exit codes. 1 stands for<br />
an error in the query, although the<br />
connect attempt has been successful;<br />
2 indicates a connection error.<br />
A connection test with psql would<br />
look like this:<br />
psql -U User -d Database -c "select * from test_table;"<br />
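A monitoring script can translate these exit codes into readable states. The following is only a minimal sketch: the function names psql_status and check_database are assumptions made for illustration, and User, Database, and test_table are the placeholders from the command above.

```shell
#!/bin/sh
# Map a psql exit code to a short status word.
# Codes 1 and 2 follow the description in the text; any other
# non-zero value is reported as "unknown".
psql_status() {
    case "$1" in
        0) echo "ok" ;;
        1) echo "query_error" ;;
        2) echo "connection_error" ;;
        *) echo "unknown" ;;
    esac
}

# Run the test query quietly and print the resulting status word.
check_database() {
    psql -U User -d Database -c "select * from test_table;" >/dev/null 2>&1
    psql_status $?
}
```

Calling check_database then prints a single word that the rest of the script can log or act on.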
For ODBC access, you would need to<br />
pipe the SQL query to the client:<br />
echo "select * from test_table;" | isql ODBC_data_source user<br />
For the cups printer daemon, lpq<br />
gives you a simple method of checking<br />
whether the daemon is alive. If<br />
you need to check access to individual<br />
printers, you additionally need<br />
to provide the print queue name and<br />
then evaluate the exit code of grep. So that<br />
the exit code reflects the printer state,<br />
grep checks for the output that<br />
you receive if the printer is active:<br />
lpq -Pprinter | grep -q "printer is ready"<br />
To match the output from lpq, you<br />
need to modify the search string for<br />
grep.<br />
The ping command checks network<br />
connections. The exit error codes<br />
differ, depending on your operating<br />
system: The FreeBSD ping uses 2, the<br />
Linux ping uses 1.<br />
The number of test packets is restricted<br />
by the -c packets option;<br />
this improves the script run time and<br />
avoids unnecessary network traffic. If<br />
you use the IP address as the target,<br />
you avoid the risk of false positives<br />
from buggy name resolution.<br />
ping -c1 ip_address<br />
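One way to absorb the differing exit codes is to treat every non-zero value as a failure. The sketch below assumes that this is precise enough for your purposes; the function name host_reachable is hypothetical and not part of the original listings.

```shell
#!/bin/sh
# Return 0 if the target answers a single test packet, 1 otherwise.
# Any non-zero ping exit code counts as a failure, so the FreeBSD (2)
# and Linux (1) values are handled alike.
host_reachable() {
    if ping -c1 "$1" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

In a script, this could be used as: host_reachable ip_address || echo "host down"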
Sensor scripts can obviously be extended<br />
to cover many other system<br />
parameters, such as disk space usage<br />
(df), logged in users (who), and much,<br />
much more.<br />
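As an illustration of such an extension, a disk space sensor might look like the following sketch. It assumes the POSIX output format of df -P, in which the fifth column of the second line holds the usage as a percentage; the function names parse_usage and disk_over_threshold are made up for this example.

```shell
#!/bin/sh
# Extract the usage percentage from "df -P" output read on stdin.
# The second line carries the data; column 5 looks like "42%".
parse_usage() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if the file system holding path $1 is more than $2 percent full.
disk_over_threshold() {
    usage=$(df -P "$1" | parse_usage)
    [ "$usage" -gt "$2" ]
}
```

A call such as disk_over_threshold / 90 then succeeds only if the root file system is more than 90 percent full.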
If an error occurs or a threshold value<br />
is exceeded, the script can use this<br />
information to generate a message<br />
and notify the system administrator.<br />
The message text should include the<br />
hostname, date, and time. Messages<br />
can be stored in a file to which the<br />
administrator has permanent access.<br />
To follow new entries as they arrive,<br />
simply display the logfile in a terminal<br />
with tail -f. Other forms<br />
of communication are also possible,<br />
such as texting.<br />
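A message function along these lines could look like the sketch below. The logfile path and the name log_message are assumptions for illustration only; uname -n supplies the hostname.

```shell
#!/bin/sh
# LOGFILE is a hypothetical default path; adjust it to your environment.
LOGFILE="${LOGFILE:-/tmp/daemon-monitor.log}"

# Append one line with date, time, hostname, and the message text.
log_message() {
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" "$*" >> "$LOGFILE"
}
```

After a call such as log_message "database restarted", the administrator can watch the file with tail -f.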
If the shell script has the correct<br />
privileges, it can intervene<br />
and restart a daemon, remove lock<br />
files, or even reboot the whole system.<br />
Because you should avoid running<br />
this kind of script as root, you<br />
can instead set up special users and<br />
groups to own the script and the<br />
process (which is the case with many<br />
daemons).<br />
Database Restart<br />
The sample script in Listing 1 monitors<br />
an active database instance and<br />
notifies the administrator if the database<br />
happens to fail and then is<br />
successfully restarted (Figure 1). If<br />
it can’t start the daemon, it waits<br />
for the administrator to step in and<br />
handle the situation.<br />
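The check-restart-notify logic described here can be sketched roughly as follows. This is not the article's Listing 1, only a minimal illustration under assumptions: the function name monitor_once and its two arguments (a check command and a restart command) are hypothetical, and the messages are merely modeled on the Figure 1 caption.

```shell
#!/bin/sh
# One monitoring pass: run the check; on failure, report the error,
# attempt a restart, and report whether the service came back.
# $1 = check command, $2 = restart command (hypothetical placeholders).
monitor_once() {
    if $1; then
        echo "available"
        return 0
    fi
    echo "error"
    $2
    if $1; then
        echo "restarted"
        return 0
    fi
    echo "restart failed, waiting for administrator"
    return 1
}
```

A surrounding loop would call monitor_once at regular intervals and stop restarting once it returns a non-zero value.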
Printer Restart<br />
The second sample script relates to<br />
the printing service. The one shown<br />
here is taken from a production example,<br />
in which the cupsd server had<br />
an unknown problem with a network<br />
printer. The printer was disabled time<br />
and time again, causing no end of<br />
frustration to users and unnecessary<br />
work for the system admins. The shell<br />
script shown in Listing 2 doesn’t<br />
output messages; instead, it simply<br />
restarts the service. Either run these<br />
scripts manually (for a temporary fix<br />
or quick check) or as RC scripts.<br />
Conclusions<br />
Administrators don’t need a complex<br />
monitoring framework that covers<br />
every aspect of the environment and<br />
has a multi-week learning curve.<br />
With some scripting know-how, you<br />
can easily create your own shell<br />
scripts to monitor server daemon processes<br />
and restart them autonomously<br />
if so desired. The use of shell scripts<br />
to monitor daemons and other system<br />
functions is by no means restricted to<br />
small embedded systems. With scripts<br />
tailored to match your requirements,<br />
you can establish your own troubleshooting<br />
arsenal.<br />
The Author<br />
Harald Zisler has worked with Unix-flavored<br />
operating systems since the early 1990s.<br />
Listing 2: CUPS Monitoring<br />
#!/bin/sh<br />
# Re-enable the print queue lp whenever lpq stops reporting it as ready.<br />
<br />
while true<br />
do<br />
  # grep -q exits non-zero if the status line is missing<br />
  if ! lpq -Plp | grep -q "lp is ready"<br />
  then<br />
    cupsenable lp<br />
  fi<br />
<br />
  # wait 15 seconds before the next check<br />
  sleep 15<br />
done<br />
www.admin-magazine.com<br />
Admin 01<br />