BlackDiamond System with Two MSMs - Extreme Networks BlackDiamond 6804 Troubleshooting Manual

Advanced system diagnostics and troubleshooting guide

Diagnostics

BlackDiamond System with Two MSMs

During the scanning period, the module is taken offline. Expect a minimum offline time of 90 seconds.
Up to eight single-bit errors can be corrected, with minimal loss to the total memory buffers.
In extremely rare cases, memory scanning detects non-correctable errors. In these cases, the
condition is noted, but no corrective action is possible. In the manual mode of memory scanning,
the module is returned to online service after all possible corrective actions have been taken.
During the memory scan, CPU utilization is high and is mostly dedicated to running the
diagnostics, as is normal when running any diagnostic on the modules. During this time, network
functions that depend on timely participation by this system, for example STP and OSPF, could be
adversely affected.
The alarm-level option of the global system health check facility does not attempt to diagnose a
suspected module; instead, it simply logs a message at a specified level.
The auto-recovery option, by contrast, does attempt to diagnose and recover a failed module, up to a
configured number of times. Plan carefully before you use this command option. If you enable the
system health check facility on the switch and configure the auto-recovery option to use the offline
auto-recovery action, the system removes a module from service and performs extended diagnostics as
soon as a module failure is suspected. If the number of auto-recovery attempts exceeds the configured
threshold, the system removes the module from service. The module is then permanently marked "down,"
is left in a non-operational state, and cannot be used in a system running ExtremeWare 6.2.2 or
later. A message to this effect is posted to the system log.
NOTE
The behavior described above is user-configurable. You can instead enable the system health check
facility on the switch and configure the auto-recovery option to use the online auto-recovery
action, which keeps a suspect module online regardless of the number of errors detected.
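The choice between the alarm-level option and the two auto-recovery actions can be summarized as a small decision function. The following Python sketch is only a model of the behavior described in the text, not ExtremeWare code: the names HealthCheckPolicy, Action, and next_action, and the threshold value 3, are hypothetical, and treating "exceeds the configured threshold" as "the next attempt would be one more than max_attempts" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LOG_ONLY = "log a message at the configured alarm level"
    RUN_DIAGNOSTICS = "take module offline and run extended diagnostics"
    MARK_DOWN = "remove module from service and mark it down"
    KEEP_ONLINE = "keep suspect module online"


@dataclass
class HealthCheckPolicy:
    # Hypothetical model of the settings described in the text.
    auto_recovery: bool    # False models the alarm-level option
    offline: bool = True   # True: offline action, False: online action
    max_attempts: int = 3  # illustrative threshold, not a documented default


def next_action(policy: HealthCheckPolicy, attempts_so_far: int) -> Action:
    """Return what the system would do when a module failure is suspected."""
    if not policy.auto_recovery:
        # Alarm-level option: no diagnosis, only a log message.
        return Action.LOG_ONLY
    if not policy.offline:
        # Online action: module stays in service regardless of error count.
        return Action.KEEP_ONLINE
    if attempts_so_far >= policy.max_attempts:
        # One more attempt would exceed the configured threshold.
        return Action.MARK_DOWN
    return Action.RUN_DIAGNOSTICS


if __name__ == "__main__":
    policy = HealthCheckPolicy(auto_recovery=True, offline=True, max_attempts=3)
    for attempts in range(5):
        print(attempts, next_action(policy, attempts).value)
```

The model makes the operational risk of the offline action visible: once the attempt count reaches the threshold, the only remaining outcome is a module that is marked down.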
Example log messages for modules taken offline:
01/31/2002 01:16.40 <CRIT:SYST> Sys-health-check [ACTION] (PBUS checksum)
(CARD_HWFAIL_PBUS_CHKSUM_EDP_ERROR) slot 3
01/31/2002 01:16.40 <INFO:SYST> Card in slot 1 is off line
01/31/2002 01:16.40 <INFO:SYST> card.c 2035: Set card 1 to Non-operational
01/31/2002 01:16.40 <INFO:SYST> Card in slot 2 is off line
01/31/2002 01:16.44 <INFO:SYST> card.c 2035: Set card 2 to Non-operational
01/31/2002 01:16.44 <INFO:SYST> Card in slot 3 is off line
01/31/2002 01:16.46 <INFO:SYST> card.c 2035: Set card 3 to Non-operational
01/31/2002 01:16.46 <INFO:SYST> Card in slot 4 is off line
01/31/2002 01:16.46 <INFO:SYST> card.c 2035: Set card 4 to Non-operational
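When many such entries accumulate, it can help to extract the affected slots from a saved copy of the log. The following Python sketch parses the sample messages above. The line layout (date, time, severity:subsystem tag, message) is inferred only from this sample, and the function names parse_log and summarize are hypothetical. A line that does not start with a timestamp is treated as a continuation of the previous entry, which is an assumption based on the wrapped first message.

```python
import re

# Layout inferred from the sample: MM/DD/YYYY HH:MM.SS <SEVERITY:SUBSYSTEM> message
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}\.\d{2}) "
    r"<(?P<severity>[A-Z]+):(?P<subsystem>[A-Z]+)> (?P<message>.*)$"
)
OFFLINE_RE = re.compile(r"Card in slot (\d+) is off line")
NONOP_RE = re.compile(r"Set card (\d+) to Non-operational")

SAMPLE = """\
01/31/2002 01:16.40 <CRIT:SYST> Sys-health-check [ACTION] (PBUS checksum)
(CARD_HWFAIL_PBUS_CHKSUM_EDP_ERROR) slot 3
01/31/2002 01:16.40 <INFO:SYST> Card in slot 1 is off line
01/31/2002 01:16.40 <INFO:SYST> card.c 2035: Set card 1 to Non-operational
01/31/2002 01:16.40 <INFO:SYST> Card in slot 2 is off line
01/31/2002 01:16.44 <INFO:SYST> card.c 2035: Set card 2 to Non-operational
01/31/2002 01:16.44 <INFO:SYST> Card in slot 3 is off line
01/31/2002 01:16.46 <INFO:SYST> card.c 2035: Set card 3 to Non-operational
01/31/2002 01:16.46 <INFO:SYST> Card in slot 4 is off line
01/31/2002 01:16.46 <INFO:SYST> card.c 2035: Set card 4 to Non-operational
"""


def parse_log(text):
    """Split log text into entries, joining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries:
            # No timestamp: assume the previous message wrapped onto this line.
            entries[-1]["message"] += " " + line
    return entries


def summarize(entries):
    """Collect critical messages and the slots reported offline or non-operational."""
    offline, non_operational, critical = set(), set(), []
    for entry in entries:
        message = entry["message"]
        if entry["severity"] == "CRIT":
            critical.append(message)
        match = OFFLINE_RE.search(message)
        if match:
            offline.add(int(match.group(1)))
        match = NONOP_RE.search(message)
        if match:
            non_operational.add(int(match.group(1)))
    return {
        "critical": critical,
        "offline": sorted(offline),
        "non_operational": sorted(non_operational),
    }


if __name__ == "__main__":
    print(summarize(parse_log(SAMPLE)))
```

Patterns for other message types would need to be added from real log output; only the three message shapes shown in the sample are recognized here.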