Extreme Networks ExtremeWare Command Reference Manual page 709

Version 7.5

configure sys-health-check auto-recovery

In the BlackDiamond switches, the auto-recovery option configures the number of times the system
health checker attempts to automatically reset a faulty module and bring it online. If the system health
checker fails more than the configured number of attempts, it sets the module to card-down. This
threshold applies only to BlackDiamond I/O modules.
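The retry threshold described above can be sketched as follows. This is a minimal illustration in Python, not ExtremeWare code: the function name `run_auto_recovery`, the `try_reset` callback, and the returned state strings are hypothetical, and the sketch assumes the configured value counts reset attempts.

```python
from typing import Callable


def run_auto_recovery(max_attempts: int, try_reset: Callable[[], bool]) -> str:
    """Try to reset a faulty I/O module up to max_attempts times.

    try_reset is a hypothetical hook that returns True when the module
    comes back online. The returned strings are illustrative state names.
    """
    for _ in range(max_attempts):
        if try_reset():
            return "online"
    # Every configured attempt failed, so the module is set to card-down.
    return "card-down"
```

With a threshold of 3, a module that recovers on the third reset ends up online, while a module that never recovers is left in card-down.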
In ExtremeWare 6.2.1 or later, when auto-recovery is configured, the occurrence of three consecutive
checksum errors causes the packet memory (PM) defect detection program to be run against the I/O
module. Checksum errors can include internal and external MAC port parity errors, EDP checksum
errors, and CPU packet or diagnostic packet checksum errors. If defects are detected, the module is
taken off line, the memory defect information is recorded in the module EEPROM, the defective buffer
is mapped out of further use, and the module is returned to an operational state. A maximum of eight
defects can be stored in the EEPROM.
After the PM defect detection and mapping process has been run, a module is considered failed and is
taken off line in the following circumstances:
• More than eight defects are detected.
• Three consecutive checksum errors were detected by the health checker, but no new defects were
found by the memory scanning and mapping process.
• After defects were detected and mapped out, the same checksum errors are again detected by the
system health checker.
The auto-recovery repetition value is ignored in these cases. In any of these cases, please contact
Extreme Technical Support.
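The three failure conditions listed above can be summarized as a small decision function. This is a hedged sketch in Python, not the actual PM defect detection program: the function and parameter names are hypothetical, and it assumes that "more than eight defects" refers to the total number of defects detected on the module.

```python
MAX_EEPROM_DEFECTS = 8  # maximum number of defects the module EEPROM can store


def pm_scan_outcome(total_defects: int, new_defects: int, errors_recur: bool) -> str:
    """Classify a module after the packet memory scan and mapping process.

    total_defects: defects detected on the module, including new ones (assumed meaning)
    new_defects:   defects found by this scan
    errors_recur:  True if the same checksum errors reappear after mapping
    """
    if total_defects > MAX_EEPROM_DEFECTS:
        return "failed"       # more than eight defects were detected
    if new_defects == 0:
        return "failed"       # checksum errors occurred, but the scan found nothing new
    if errors_recur:
        return "failed"       # mapping out the defects did not clear the errors
    return "operational"      # defects recorded and mapped out, module back in service
```

In each "failed" case the module is taken off line regardless of the auto-recovery repetition value.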
Auto-recovery mode only affects an MSM if the system has no slave MSM. If the faulty module is the
only MSM in the system, auto recovery automatically resets the MSM and brings it back on line.
Otherwise, auto-recovery has no effect on an MSM.
If you specify the online option, the module is kept on line, but the following error messages are
recorded in the log:
<WARN:SYST> card_db.c 832: Although card 2 is back online, contact Tech. Supp. for assistance.
<WARN:SYST> card_db.c 821: Card 2 has nonrecoverable packet memory defect.
To view the status of the system health checker, use the show diagnostics command.
To enable the health checker, use the enable sys-health-check command.
To disable the health checker, use the disable sys-health-check command.
The alarm-level system-down option is especially useful in an ESRP configuration where the entire
system is backed by an identical system. By powering down the faulty system, you ensure that erratic
ESRP behavior in the faulty system does not affect ESRP performance, and you ensure full system
failover to the redundant system.
If you are using ESRP with ESRP diagnostic tracking enabled in your configuration, the system health
check failure will automatically reduce the ESRP priority of the system to the configured failover
priority. This allows the healthy standby system to take over ESRP and become responsible for handling
traffic.
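The effect of ESRP diagnostic tracking on priority can be illustrated with a short Python sketch. The function name, its parameters, and the priority values in the usage note are hypothetical and chosen only for illustration; this is not how ExtremeWare implements ESRP.

```python
def effective_esrp_priority(configured_priority: int,
                            failover_priority: int,
                            diagnostic_tracking: bool,
                            health_check_failed: bool) -> int:
    """Return the ESRP priority a system would advertise.

    With diagnostic tracking enabled, a system health check failure drops
    the priority to the configured failover priority. Otherwise the
    configured priority is used unchanged.
    """
    if diagnostic_tracking and health_check_failed:
        return failover_priority
    return configured_priority
```

For example, with an assumed configured priority of 100 and a failover priority of 10, a failed health check on a system with tracking enabled yields 10, which lets the healthy standby system take over.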
I/O module faults are permanently recorded on the module EEPROM. A module that has failed a
system health check cannot be brought back online.