4
Software Exception Handling
This chapter describes the software exception handling features built into Extreme hardware and
software products to detect and respond to problems to maximize switch reliability and availability.
This chapter contains the following sections:
• Overview of Software Exception Handling Features on page 37
• Configuring System Recovery Actions on page 40
• Configuring Reboot Loop Protection on page 42
• Dumping the System Memory on page 44
Overview of Software Exception Handling Features
When you use the Extreme Advanced System Diagnostics, whether you run them manually or automatically,
keep in mind several things that can affect the operation of the diagnostics, the reliable operation of
the switch itself, or both:
• System watchdog behavior
• System software exception recovery behavior (configuration options)
• Redundant MSM behavior (and failover, in BlackDiamond systems)
System Watchdog Behavior
The system watchdog is a self-monitoring diagnostic mechanism that monitors the CPU to ensure
that it does not remain trapped in a processing loop.
In normal operation, the system's task processing periodically resets and restarts the watchdog
timer, maintaining uninterrupted system operation. If the watchdog timer expires before task
processing restarts it, the system is presumed to be malfunctioning, and the watchdog expiry
triggers a reboot of the master MSM.
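
The reset-and-expire cycle described above can be illustrated with a small simulation. This is only a sketch in Python, not Extreme's implementation: the class name WatchdogTimer, the function simulate, the tick-based clock, and the timeout value are all assumptions made for illustration.

```python
class WatchdogTimer:
    """Toy watchdog model (hypothetical): counts down one step per tick
    unless normal task processing restarts it."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self):
        # Normal task processing resets the timer and restarts the countdown.
        if not self.expired:
            self.remaining = self.timeout_ticks

    def tick(self):
        # One unit of time passes; expiry is latched when the count reaches zero.
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True


def simulate(timeout_ticks, task_ran_on_tick):
    """Run the model over a list of booleans, where task_ran_on_tick[i] is True
    if task processing ran (and restarted the watchdog) on tick i.
    Returns the tick index at which the watchdog expired, or None."""
    watchdog = WatchdogTimer(timeout_ticks)
    for tick_index, task_ran in enumerate(task_ran_on_tick):
        if task_ran:
            watchdog.kick()
        watchdog.tick()
        if watchdog.expired:
            # In the real system, this is the point where the reboot is triggered.
            return tick_index
    return None


if __name__ == "__main__":
    healthy = [True] * 10
    stuck = [True, True, False, False, False, False]  # task stops after tick 1
    print("healthy system expiry tick:", simulate(3, healthy))
    print("stuck system expiry tick:", simulate(3, stuck))
```

In this model, a task loop that keeps running never lets the countdown reach zero, while a task loop that stops running (for example, a CPU trapped in a processing loop) lets the timer expire after the configured number of ticks.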
Depending on the persistence of an error and the system recovery actions configured in the
config sys-recovery-level command (reboot, shutdown, system dump, or, in the case of BlackDiamond
systems equipped with redundant MSMs, MSM failover), the reboot might cause the system to
perform the configured system recovery actions.
Advanced System Diagnostics and Troubleshooting Guide