Chapter 8: Process Monitoring and Troubleshooting (Cisco CRS-1 Carrier Routing System Router)

System Manager
Each process is assigned a job ID (JID) when it is started. The JID stays the same when the process is
stopped and restarted. Each process is also assigned a process ID (PID) when it is started, but the PID
changes each time the process is stopped and restarted.
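The JID/PID distinction can be modeled in a few lines. The following Python sketch is purely illustrative and is not IOS XR code: the `Process` class, its methods, the process name, and the PID counter starting at 1000 are hypothetical. It only shows that the JID is fixed when the record is created, while every start draws a fresh PID.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

# Hypothetical PID source: every start draws the next number, so a restart
# never reuses the previous PID (real PID allocation is not described here).
_pid_counter = itertools.count(1000)


@dataclass
class Process:
    name: str
    jid: int                    # assigned once; unchanged across restarts
    pid: Optional[int] = None   # reassigned on every start

    def start(self) -> None:
        self.pid = next(_pid_counter)

    def stop(self) -> None:
        self.pid = None

    def restart(self) -> None:
        self.stop()
        self.start()


proc = Process("example_proc", jid=7)   # hypothetical name and JID
proc.start()
first_pid = proc.pid
proc.restart()
print(proc.jid, first_pid, proc.pid)    # JID stays 7; the PID differs
```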
The System Manager (sysmgr) is the foundational process of the system. It is responsible for monitoring,
starting, stopping, and restarting almost all processes on the system. Whether a process is restarted is
predefined by its respawn flag (on or off), and sysmgr honors that setting. The sysmgr is the parent of
all processes started at boot-up and by configuration. Two sysmgr instances, one active and one standby,
run on each node, providing hot-standby redundancy at the process level. Each active process is
registered with SysDB, and after the active sysmgr starts a process, sysmgr is notified once that process
is running. If the active sysmgr process dies, the standby process takes over the active role and a new
standby process is spawned.
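As a rough illustration of two behaviors just described, honoring a per-process respawn flag and promoting the standby sysmgr when the active one dies, here is a simplified Python model. `ManagedProcess`, `SysMgrInstance`, `Node`, and the process names are hypothetical; the real sysmgr also handles SysDB registration and process creation, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManagedProcess:
    name: str
    respawn: bool          # predefined respawn flag (on or off)
    running: bool = True


class SysMgrInstance:
    """Hypothetical stand-in for one sysmgr instance on a node."""

    def __init__(self, role: str) -> None:
        self.role = role
        self.respawned: List[str] = []

    def on_process_exit(self, proc: ManagedProcess) -> None:
        proc.running = False
        if proc.respawn:               # honor the respawn flag
            proc.running = True        # restart the process
            self.respawned.append(proc.name)


class Node:
    """One node with an active and a standby sysmgr instance."""

    def __init__(self) -> None:
        self.active = SysMgrInstance("active")
        self.standby = SysMgrInstance("standby")

    def on_active_sysmgr_death(self) -> None:
        # The standby takes over the active role and a new standby is created.
        self.active = self.standby
        self.active.role = "active"
        self.standby = SysMgrInstance("standby")


node = Node()
node.active.on_process_exit(ManagedProcess("proc_a", respawn=True))
node.active.on_process_exit(ManagedProcess("proc_b", respawn=False))
print(node.active.respawned)                 # ['proc_a']
node.on_active_sysmgr_death()
print(node.active.role, node.standby.role)   # active standby
```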
The sysmgr instance running on a line card (LC) handles all system management duties for that node,
such as process creation, respawning, and generating core dumps.
The sysmgr itself is started at boot-up by the initialization process. Once sysmgr is running, the
initialization process hands over ownership of all the processes it started to sysmgr and exits.

Watchdog System Monitor

The Watchdog System Monitor (wdsysmon) keeps historical data on processes and posts this
information to a fault detector dynamic link library (DLL), which can then be queried by manageability
applications. Once per minute, wdsysmon polls the kernel for process data. This data is stored in a
database maintained by the fm_fd_wdsysmon.dll fault detector, which is loaded by wdsysmon.
For more information on wdsysmon and memory thresholds, see the "Watchdog System Monitor"
section on page 9-197 in Chapter 9, "Troubleshooting Memory."
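The polling loop can be sketched as follows. This is a hypothetical Python model, not the wdsysmon implementation: `ProcessSample`, `FaultDetectorStore` (standing in for the database kept by fm_fd_wdsysmon.dll), the per-process history limit, and the `read_kernel` callback are all assumptions. Only the one-minute polling interval comes from the text.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List

POLL_INTERVAL_S = 60  # wdsysmon polls the kernel once per minute


@dataclass
class ProcessSample:
    timestamp: float
    pid: int
    name: str
    thread_states: List[str]   # thread state is returned with the process data


class FaultDetectorStore:
    """Hypothetical stand-in for the fault detector's historical database."""

    def __init__(self, max_samples: int = 60) -> None:
        self.max_samples = max_samples   # assumed retention per process
        self.history: Dict[int, Deque[ProcessSample]] = {}

    def record(self, sample: ProcessSample) -> None:
        queue = self.history.setdefault(sample.pid, deque(maxlen=self.max_samples))
        queue.append(sample)   # oldest sample is dropped once the limit is hit

    def query(self, pid: int) -> List[ProcessSample]:
        # What a manageability application might read back.
        return list(self.history.get(pid, ()))


def run_poller(read_kernel: Callable[[], List[ProcessSample]],
               store: FaultDetectorStore,
               cycles: int,
               sleep: Callable[[float], None] = time.sleep) -> None:
    for _ in range(cycles):
        for sample in read_kernel():
            store.record(sample)
        sleep(POLL_INTERVAL_S)
```

In a test, `sleep` can be replaced with a no-op, for example `run_poller(fake_reader, store, cycles=3, sleep=lambda s: None)`, so the loop runs without waiting a minute per cycle.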

Deadlock detections

Because thread state is returned along with the process data, wdsysmon can attempt to detect deadlocks.
Wdsysmon specifically looks for mutex deadlocks and local Inter-Process Communication (IPC) hangs;
only local IPC deadlocks can be detected. When deadlocks are detected, debugging information is
collected in disk0:/wdsysmon_debug.
Deadlocked processes can be stopped and restarted manually using the process restart command.
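The source does not describe how wdsysmon searches the thread state for mutex deadlocks. One standard technique, shown here only as an assumption, is to build a wait-for graph in which each blocked thread points at the thread it is waiting on (for example, the owner of a mutex) and to look for a cycle. The function name and thread names in this Python sketch are hypothetical. In this model a thread waiting on a remote peer would simply have no edge to follow, since only locally visible thread state feeds the graph.

```python
from typing import Dict, List, Optional


def find_deadlock(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the threads in one wait-for cycle, or None if there is none.

    waits_for maps each blocked thread to the single thread it is waiting
    on (for example, the owner of the mutex it is blocked on). Threads that
    are not blocked do not appear as keys.
    """
    cleared = set()  # threads whose wait chain is known to end without a cycle
    for start in waits_for:
        if start in cleared:
            continue
        path: List[str] = []
        position: Dict[str, int] = {}
        node = start
        # Follow the chain until it reaches an unblocked or already-cleared thread.
        while node in waits_for and node not in cleared:
            if node in position:          # revisited: the chain loops back
                return path[position[node]:]
            position[node] = len(path)
            path.append(node)
            node = waits_for[node]
        cleared.update(path)
    return None


# Hypothetical threads: t1 holds mutex M1 and waits on M2, t2 the reverse.
print(find_deadlock({"t1": "t2", "t2": "t1", "t3": "t1"}))  # ['t1', 't2']
print(find_deadlock({"t1": "t2", "t2": "t3"}))              # None
```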

Hang detection

When an event manager is created in the system, the event manager library registers it with wdsysmon.
Wdsysmon expects to hear a periodic "pulse" from every registered event manager in the system. When an
event manager's pulse is missing, wdsysmon runs a debug script that shows exactly what the thread that
created the event manager is doing.
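The pulse scheme is a heartbeat check. The Python sketch below is a hypothetical model rather than the wdsysmon code: `PulseMonitor`, the event manager names, the timeout value, and the `on_missing` callback (standing in for running the debug script) are all assumptions, since the text does not say how often pulses are expected.

```python
from typing import Callable, Dict, List


class PulseMonitor:
    """Hypothetical heartbeat tracker for registered event managers."""

    def __init__(self, timeout_s: float, on_missing: Callable[[str], None]) -> None:
        self.timeout_s = timeout_s      # assumed limit; not given in the source
        self.on_missing = on_missing    # stands in for running the debug script
        self.last_pulse: Dict[str, float] = {}

    def register(self, event_manager: str, now: float) -> None:
        # Called when an event manager is created (registration with wdsysmon).
        self.last_pulse[event_manager] = now

    def pulse(self, event_manager: str, now: float) -> None:
        if event_manager in self.last_pulse:   # ignore unregistered senders
            self.last_pulse[event_manager] = now

    def check(self, now: float) -> List[str]:
        """Report every registered event manager whose pulse is overdue."""
        missing = [em for em, last in self.last_pulse.items()
                   if now - last > self.timeout_s]
        for em in missing:
            self.on_missing(em)
        return missing


reports: List[str] = []
monitor = PulseMonitor(timeout_s=10.0, on_missing=reports.append)
monitor.register("em_a", now=0.0)
monitor.register("em_b", now=0.0)
monitor.pulse("em_a", now=8.0)
print(monitor.check(now=12.0))   # ['em_b']
```

This sketch reports an overdue event manager on every check until it pulses again; a production monitor would likely suppress repeated reports.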
Cisco IOS XR Troubleshooting Guide for the Cisco CRS-1 Router, OL-21483-02, page 8-172
