Avaya S8700 Maintenance Manual page 68

For multi-connect configurations
Initialization and Recovery
syslogd – Linux system log daemon (manages logging from Linux services)
xntpd – Network Time Protocol daemon (manages clock synchronization
across the network)
Watchdog's HiMonitor
Watchdog's HiMonitor checks for runaway processes and terminates them.
It handles the case of an infinitely looping process that prevents lower-priority
processes from running. More specifically, the high-priority HiMonitor process
periodically (at the interval set in watchd.conf) looks for responses from the
low-priority LoMonitor process. If a response is present, HiMonitor resets
Watchdog's timer. If not, HiMonitor runs the "top" command and logs its output
to determine which processes are taking up CPU resources. HiMonitor then takes
one of three recovery actions, in this order:
1. If a process within Watchdog's or the Process Manager's Linux process
group is consuming too high a percentage of CPU occupancy (percentage
set in /etc/opt/ecs/watchd.conf), HiMonitor kills the process.
2. If no process is using too high a percentage, but more than 100 instances
of the same monitored process are running, HiMonitor reboots Linux.
3. If neither condition applies, HiMonitor does nothing and waits for the
system to recover on its own.
If LoMonitor fails to respond to a preset threshold of HiMonitor checks
(currently 5 of 7), then HiMonitor reboots Linux as a final recovery action.
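The ordering of the three recovery actions and the final reboot check can be illustrated with a minimal Python sketch. It is not Avaya's implementation. The names ProcSample, choose_recovery, and lomonitor_unresponsive are hypothetical, the input is assumed to be already-parsed "top" output, the occupancy threshold is assumed to be non-zero, and "5 of 7" is read here as 5 missed responses among the 7 most recent checks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProcSample:
    """One row of parsed "top" output (hypothetical structure)."""
    name: str
    cpu_percent: float
    monitored: bool  # in Watchdog's or the Process Manager's process group


def choose_recovery(samples, cpu_threshold, max_instances=100):
    """Return ("kill", name), ("reboot", name) or ("wait", None)."""
    monitored = [s for s in samples if s.monitored]

    # Action 1: a monitored process exceeds the CPU-occupancy threshold.
    hogs = [s for s in monitored if s.cpu_percent > cpu_threshold]
    if hogs:
        worst = max(hogs, key=lambda s: s.cpu_percent)
        return ("kill", worst.name)

    # Action 2: more than max_instances copies of one monitored process.
    counts = Counter(s.name for s in monitored)
    for name, count in counts.items():
        if count > max_instances:
            return ("reboot", name)

    # Action 3: do nothing and let the system recover on its own.
    return ("wait", None)


def lomonitor_unresponsive(history, misses=5, window=7):
    """history holds one bool per check, True if LoMonitor responded.

    Assumed reading of "5 of 7": at least `misses` failures among the
    last `window` checks triggers the final reboot.
    """
    recent = history[-window:]
    return recent.count(False) >= misses
```

In this sketch an unmonitored process is never killed, however much CPU it uses, which mirrors the restriction in step 1 to the two process groups.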
CAUTION:
Escalate to an Avaya engineer for explicit guidance with this recovery, since
it is potentially disruptive. A process can legitimately occupy abnormally
high amounts of processor time due to server load, and killing it could make
the server totally unavailable.
However, with an engineer's guidance, recovery can be disabled by setting
the sampling-interval or occupancy-threshold values to "0." More likely, the
sampling-interval and CPU-occupancy thresholds will need to be fine-tuned
to values that don't cause erroneous recovery attempts.
NOTE:
The value of the sampling interval must be greater than or equal to "0". If
set to "0", the "top" command is not run and no recovery is performed.
Also, the threshold CPU-occupancy percentage must be between "0" and
"100". If set to "0", no recovery is performed, but the "top" command's
output is still logged. Setting the threshold to "0" may help achieve stability
by collecting useful data without disrupting the processes.
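The effect of the two settings described in this note can be summarized in a short Python sketch. The function name monitor_mode and its parameter names are hypothetical and are not the actual key names used in watchd.conf; only the value ranges and the meaning of "0" come from the note.

```python
def monitor_mode(sampling_interval, occupancy_threshold):
    """Describe what HiMonitor would do for a pair of settings."""
    # Range checks stated in the note.
    if sampling_interval < 0:
        raise ValueError("sampling interval must be >= 0")
    if not 0 <= occupancy_threshold <= 100:
        raise ValueError("occupancy threshold must be between 0 and 100")

    if sampling_interval == 0:
        # "top" is not run, and no recovery is performed.
        return {"run_top": False, "recover": False}
    if occupancy_threshold == 0:
        # "top" output is logged, but no recovery is performed.
        return {"run_top": True, "recover": False}
    # Normal operation: log "top" output and apply recovery actions.
    return {"run_top": True, "recover": True}
```

A zero threshold combined with a non-zero interval is the data-gathering mode: it keeps the "top" log while leaving every process alone.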
Issue 1 May 2002
555-233-143