Diagnostics: A Brief Historical Perspective - Extreme Networks BlackDiamond 6804 Troubleshooting Manual

Advanced system diagnostics and troubleshooting guide

Introduction

Diagnostics: A Brief Historical Perspective

Diagnostic utility programs were created to aid in troubleshooting system problems by detecting and
reporting faults so that operators or administrators could fix them. While this approach does
help, it has some key limitations:
• It is, at its base, reactive, meaning a failure must occur before the diagnostic test can be used to look
for a cause for the failure.
• It can be time-consuming, because the ability to troubleshoot a failure successfully based on the
information provided by the diagnostic test depends greatly on the types of information reported
by the test and the level of detail in the information.
Because users of mission-critical networks and network applications are becoming increasingly
dependent on around-the-clock network access and the highest performance levels, any downtime or
service degradation is disruptive and costly. Time lost to an unexpected failure, compounded by more
time lost while someone attempts to track down and fix the failure, has therefore become less and less
acceptable.
The process of improving diagnostic tests to minimize failures and their impact is a kind of feedback
system: What you learn through the use of the diagnostics improves your understanding of hardware
failure modes; what you learn from an improved understanding of hardware failure modes improves
your understanding of the diagnostics.
The goal of the current generation of ExtremeWare diagnostics is to help users achieve the highest levels
of network availability and performance by providing a suite of diagnostic tests that moves away from
a reactive stance (wherein a problem occurs and then you attempt to determine what caused the
problem) to a proactive stance (wherein the system hardware, software, and diagnostics work together
to reduce the number of failures and the amount of downtime) through:
• More accurate reporting of errors (fewer false notifications; more information about actual errors)
• Early detection of conditions that lead to a failure (so that corrective action can be taken before the
failure occurs)
• Automatic detection and correction of packet memory errors in the system's control and data planes
Administrators will now find a greatly reduced MTTR (mean time to repair) due to fast and accurate
fault identification. Multiple modules will no longer need to be removed and tested; faulty components
will usually be identified directly. Over time, there should be a significant reduction in the number of
problems found.
NOTE
In spite of the improved ExtremeWare hardware diagnostics, some network events might still occur,
because software is incapable of detecting and preventing every kind of failure.