Permanent Failures; Responding To Reported Failures - Extreme Networks ExtremeWare Version 7.8 Troubleshooting Manual


Failures of this type are the result of software or hardware systems entering an abnormal operating state in which normal switch operation might, or might not, be impaired.

Permanent Failures

The most detrimental conditions that cause packet error events are those that result in permanent errors. These errors arise from a failure within the switch fabric that causes data to be corrupted in a systematic fashion. These permanent hardware defects might, or might not, affect normal switch operation. They cannot be resolved by user intervention and will not resolve themselves. You must replace hardware to resolve permanent errors.

Responding to Reported Failures

Before ExtremeWare 7.1, the fabric checksum validation mechanisms in ExtremeWare detected and reported all checksum validation failures. The resulting mix of message types in the system log could cause confusion about the true nature of a failure and the appropriate response. This confusion often led to unnecessary diversion of resources and unnecessary service interruptions, because operators attempted to respond to reported errors that presented no actual threat to network operation.

In ExtremeWare 7.1, the responsibility for reporting checksum errors shifted from the low-level bus monitoring and data integrity verification subsystem, which monitors the operation of all data and control busses within the switch, to the higher-level intelligent layer that is responsible for interpreting the test results and reporting them to the user. Rather than simply inserting every checksum validation error in the system log, the higher-level interpreting and reporting subsystem monitors checksum validation failures and inserts error messages in the system log only when it is likely that a systematic hardware problem is the cause of the checksum validation failures.

NOTE

The intent of the higher-level interpreting and reporting subsystem is to remove the burden of interpreting and classifying messages from the operator. The subsystem automatically differentiates between harmless checksum error instances and service-impacting checksum error instances.

The interpreting and reporting subsystem uses measurement periods that are divided into a sequence of 20-second windows. Within each window, reports from the low-level bus monitoring subsystem are collected and stored in an internal data structure for that window. These reports are divided into two major categories: slow-path reports and fast-path reports.

• Slow-path reports come from monitoring control busses and the CPU-to-switch fabric interface. The slow-path reporting category is subdivided into different report message subcategories, depending on whether they come from CPU data monitoring, CPU health check tests, or backplane health check tests.

• Fast-path reports come from direct monitoring of the switch fabric data path. The fast-path reporting category is subdivided into different report message subcategories, depending on whether they come from monitoring internal or external MAC counters associated with each switch fabric in the switching system.
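
The windowing and categorization described above can be pictured with a small sketch. This is an illustration only, not ExtremeWare code: the 20-second window length and the two categories with their subcategories come from the text, while the function names, the label strings, the report tuple layout, and the use of a per-window counter are assumptions made for the example. The manual does not describe the actual internal data structure or the criteria used to decide when a message is logged, so neither is modeled here.

```python
from collections import Counter
from dataclasses import dataclass, field

WINDOW_SECONDS = 20  # window length stated in the manual

# Label strings are hypothetical; only the grouping follows the manual.
SUBCATEGORIES = {
    "slow-path": {"cpu-data", "cpu-health-check", "backplane-health-check"},
    "fast-path": {"internal-mac", "external-mac"},
}


@dataclass
class Window:
    """Reports collected during one 20-second window (assumed structure)."""
    index: int
    counts: Counter = field(default_factory=Counter)


def window_index(timestamp: float) -> int:
    """Map a timestamp in seconds to the index of its 20-second window."""
    return int(timestamp // WINDOW_SECONDS)


def collect(reports):
    """Group (timestamp, category, subcategory) reports by window.

    Returns a dict mapping window index to a Window whose counter is keyed
    by (category, subcategory).
    """
    windows = {}
    for timestamp, category, subcategory in reports:
        if subcategory not in SUBCATEGORIES.get(category, ()):
            raise ValueError(f"unknown report type: {category}/{subcategory}")
        idx = window_index(timestamp)
        window = windows.setdefault(idx, Window(idx))
        window.counts[(category, subcategory)] += 1
    return windows


if __name__ == "__main__":
    sample = [
        (0.5, "slow-path", "cpu-data"),
        (19.9, "fast-path", "internal-mac"),
        (20.0, "fast-path", "internal-mac"),
        (25.0, "fast-path", "internal-mac"),
    ]
    for idx, window in sorted(collect(sample).items()):
        print(idx, dict(window.counts))
```

Keeping a separate counter per window and per subcategory lets a later interpretation step compare how often each report type recurs across consecutive windows, which is one plausible way to separate isolated events from systematic ones.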
