Extreme Networks ExtremeWare Command Reference Manual page 706

Version 7.5

Commands for Status Monitoring and Statistics
The auto-recovery option configures the number of times the system health checker attempts to
automatically reset a faulty module and bring it online. If the system health checker fails more than the
configured number of attempts, it sets the module to card-down.
NOTE
This threshold applies only to BlackDiamond I/O modules.
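The retry behavior described above can be pictured with a short Python sketch. This is an illustration only, not ExtremeWare code: the function name auto_recover, the reset_module callback, and the returned state strings are hypothetical, and the sketch assumes that the module is set to card-down once every configured attempt has failed.

```python
from typing import Callable


def auto_recover(reset_module: Callable[[], bool], max_attempts: int) -> str:
    """Try to reset a faulty module up to max_attempts times.

    reset_module is a hypothetical callback that returns True when the
    module comes back online. The return value is the resulting state.
    """
    for _ in range(max_attempts):
        if reset_module():
            return "online"
    # Assumption: all configured attempts failed, so the module is
    # marked card-down, as the auto-recovery description states.
    return "card-down"


if __name__ == "__main__":
    # Simulated module that recovers on the third reset attempt.
    outcomes = iter([False, False, True])
    print(auto_recover(lambda: next(outcomes), max_attempts=3))
```

With a limit of 3, the simulated module above recovers on the last permitted attempt; with a limit of 2 the same sequence would end in card-down.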
In ExtremeWare 6.2.1 or later, when auto-recovery is configured, the occurrence of three consecutive
checksum errors causes the packet memory (PM) defect detection program to be run against the I/O
module. Checksum errors may include internal and external MAC port parity errors, EDP checksum
errors, and CPU packet or diagnostic packet checksum errors. If defects are detected, the module is
taken offline, the memory defect information is recorded in the module EEPROM, the defective buffer is
mapped out of further use, and the module is returned to operational state. A maximum of 8 defects
can be stored in the EEPROM.
After the PM defect detection and mapping process has been run, a module is considered failed and is
taken offline in the following circumstances:
• More than eight defects are detected.
• Three consecutive checksum errors were detected by the health checker, but no new PM defects were
found by the PM defect detection process.
• After defects were detected and mapped out, the same checksum errors are again detected by the
system health checker.
The auto-recovery repetition value is ignored in these cases. In any of these cases, please contact
Extreme Technical Support.
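The three failure conditions listed above can be summarized as a small decision function. The Python sketch below is a reading aid under stated assumptions, not the switch's actual logic: the function and parameter names are hypothetical, and it assumes the function is evaluated only after three consecutive checksum errors have already triggered a PM defect detection run.

```python
MAX_EEPROM_DEFECTS = 8  # the text states that at most 8 defects fit in the EEPROM


def module_failed_after_pm_scan(total_defects: int,
                                new_defects: int,
                                checksum_errors_recurred: bool) -> bool:
    """Return True if the module is considered failed and taken offline.

    total_defects: defects detected on the module, including earlier ones.
    new_defects: defects newly found by this PM defect detection run.
    checksum_errors_recurred: True if the same checksum errors reappear
        after the defects were mapped out.
    """
    if total_defects > MAX_EEPROM_DEFECTS:
        return True  # more than eight defects are detected
    if new_defects == 0:
        return True  # checksum errors occurred, but no new PM defect was found
    if checksum_errors_recurred:
        return True  # mapping out the defects did not stop the errors
    return False     # defects mapped out; module returns to operation


if __name__ == "__main__":
    print(module_failed_after_pm_scan(3, 1, False))  # False
    print(module_failed_after_pm_scan(9, 1, False))  # True
```

In any case where this sketch returns True, the auto-recovery repetition value would be ignored, as stated above.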
If you specify the online option, the module is kept online, but the following error messages are
recorded in the log:
<WARN:SYST> card_db.c 832: Although card 2 is back online, contact Tech. Supp. for assistance.
<WARN:SYST> card_db.c 821: Card 2 has nonrecoverable packet memory defect.
To view the status of the system health checker, use the show diagnostics command.
To enable the health checker, use the enable sys-health-check command.
To disable the health checker, use the disable sys-health-check command.
The alarm-level system-down option is especially useful in an ESRP configuration where the entire
system is backed by an identical system. By powering down the faulty system, you ensure that erratic
ESRP behavior in the faulty system does not affect ESRP performance, and you ensure full system
failover to the redundant system.
If you are using ESRP with ESRP diagnostic tracking enabled in your configuration, the system health
check failure will automatically reduce the ESRP priority of the system to the configured failover
priority. This allows the healthy standby system to take over ESRP and become responsible for handling
traffic.
I/O module faults are permanently recorded on the module's EEPROM. A module that has failed a
system health check cannot be brought back online.
ExtremeWare 7.5 Command Reference Guide