Avaya MCC1 Maintenance Procedures page 125

The Watchdog tries to recreate the application a specified number of times. If unsuccessful after that
number of tries within the specified retry interval, the Watchdog runs the application's "total failure"
script.
For Communication Manager, the recovery script kills every Communication Manager process. Its
total-failure script kills the Communication Manager processes and causes a Linux reboot.
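The retry behavior described above can be sketched as a small counter over a time window. This is only an illustration, not Avaya's implementation: the class name RetryPolicy, its parameters, and the sliding-window interpretation of the "retry interval" are all assumptions made for the example.

```python
import time
from collections import deque


class RetryPolicy:
    """Hypothetical model of restart attempts for one application."""

    def __init__(self, max_retries, retry_interval_s, clock=time.monotonic):
        self.max_retries = max_retries            # "specified number of times"
        self.retry_interval_s = retry_interval_s  # "specified retry interval"
        self.clock = clock                        # injectable for testing
        self.attempts = deque()                   # timestamps of recent restarts

    def on_failure(self):
        """Return "restart" or "total_failure" for a newly detected failure."""
        now = self.clock()
        # Assumption: attempts older than the retry interval no longer count.
        while self.attempts and now - self.attempts[0] > self.retry_interval_s:
            self.attempts.popleft()
        if len(self.attempts) >= self.max_retries:
            # All allowed tries were used within the interval.
            return "total_failure"
        self.attempts.append(now)
        return "restart"
```

A caller would run the application's recovery script on "restart" and its total-failure script on "total_failure"; the script names themselves are not given in this section, so they are left out of the sketch.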
Watchdog and Linux
The Watchdog monitors several Linux services/daemons. Because the Linux init process originally started
these processes, Watchdog cannot use the SIGCHLD signal to monitor them. Instead,
Watchdog uses a thread to periodically check the validity of the process identifier for each monitored
process. If an identifier is invalid, the Watchdog calls a Linux script to stop and then restart the
particular service. The Linux services monitored by Watchdog are:
atd – at daemon (runs programs at specific times)
crond – cron daemon (runs programs periodically)
dbgserv – provides debugging services
httpd – Apache hypertext transfer protocol server (provides Web service)
inetd – Internet server daemon (provides telnet/rlogin/etc. connectivity)
klogd – Linux kernel log daemon (manages logging from Linux kernel/drivers)
prune – monitors and cleans up partitions
syslogd – Linux system log daemon (manages logging from Linux services and applications)
xntpd – network time protocol daemon (manages clock synchronizations across the network)
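A process that is not your child cannot be watched with SIGCHLD, but its process identifier can still be probed. The sketch below shows one common POSIX way to do that, sending signal 0 with os.kill. Whether Watchdog uses this exact mechanism is an assumption; the function names and the pids_by_service mapping are hypothetical, and the periodic thread and the restart script are omitted.

```python
import os


def pid_is_valid(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but is owned by another user
    return True


def find_dead_services(pids_by_service):
    """Return the names of services whose recorded PID is no longer valid."""
    return sorted(
        name for name, pid in pids_by_service.items() if not pid_is_valid(pid)
    )
```

A monitoring thread could call find_dead_services at a fixed period and hand each returned name to the stop-and-restart script.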
Watchdog's HiMonitor
The Watchdog's HiMonitor checks for runaway processes and terminates them. HiMonitor deals with an
infinitely looping process that prevents lower-priority processes from running. More specifically, the
high-priority HiMonitor process periodically looks for responses from the low-priority LoMonitor
process. If a response is present, HiMonitor resets Watchdog's timer. If not, HiMonitor issues and logs a
top command to determine which processes are taking up CPU resources. HiMonitor then takes one of
three recovery actions in this order:
1. If a process within Watchdog's or the Process Manager's Linux process group is consuming too
   high a percentage (percentage set in watchd.conf) of CPU occupancy, HiMonitor kills the
   process.
2. If no process is using too high a percentage, but more than 100 instances of the same monitored
   process are running, HiMonitor reboots Linux.
3. Otherwise, HiMonitor does nothing and waits for the system to recover on its own.
If LoMonitor does not respond to a preset threshold of HiMonitor checks, then, as a final recovery action,
HiMonitor reboots Linux.
CAUTION:
Escalate to an Avaya engineer for guidance with this recovery, because it is potentially
disruptive. A process can legitimately occupy abnormally high amounts of processor time
due to server load, and killing it could make the server totally unavailable.
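The ordering of the three HiMonitor recovery actions can be illustrated as a pure decision function over a top-like snapshot. This is a sketch under stated assumptions, not the actual HiMonitor code: the function name, the dictionary keys, and the choice to kill the single heaviest offender are hypothetical, and cpu_limit_pct stands in for the percentage configured in watchd.conf (the setting's name is not given here).

```python
from collections import Counter


def choose_recovery(processes, cpu_limit_pct, max_instances=100):
    """Pick a recovery action from a snapshot of running processes.

    processes: list of dicts with hypothetical keys "pid", "name", "cpu_pct",
    and "in_watchdog_group" (True for Watchdog's or Process Manager's group).
    Returns ("kill", pid), ("reboot", None), or ("wait", None).
    """
    # Action 1: a process in the monitored groups exceeds the CPU limit.
    hogs = [
        p for p in processes
        if p["in_watchdog_group"] and p["cpu_pct"] > cpu_limit_pct
    ]
    if hogs:
        worst = max(hogs, key=lambda p: p["cpu_pct"])  # assumption: kill the worst
        return ("kill", worst["pid"])
    # Action 2: more than max_instances copies of one monitored process.
    counts = Counter(p["name"] for p in processes if p["in_watchdog_group"])
    if any(n > max_instances for n in counts.values()):
        return ("reboot", None)
    # Action 3: do nothing and let the system recover on its own.
    return ("wait", None)
```

Keeping the decision separate from the kill and reboot side effects makes the ordering easy to check against a recorded snapshot before anything disruptive runs, which fits the caution above.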
Maintenance Procedures
December 2003
Server initialization, recovery, and resets
S8100 Initialization
125
