Extreme Networks BlackDiamond 6804 Troubleshooting Manual

Advanced system diagnostics and troubleshooting guide

Advanced System Diagnostics and Troubleshooting Guide

Extreme Networks, Inc.
3585 Monroe Street
Santa Clara, California 95051
(888) 257-3000
http://www.extremenetworks.com
Published: March 2005
Part number: 100189-00 Rev. 01


Summary of Contents for Extreme Networks BlackDiamond 6804

  • Page 1: Troubleshooting Guide

    Advanced System Diagnostics and Troubleshooting Guide Extreme Networks, Inc. 3585 Monroe Street Santa Clara, California 95051 (888) 257-3000 http://www.extremenetworks.com Published: March 2005 Part number: 100189-00 Rev. 01...
  • Page 2 Extreme Networks, Inc., which may be registered or pending registration in certain jurisdictions. The Extreme Turbodrive logo is a service mark of Extreme Networks, which may be registered or pending registration in certain jurisdictions. Specifications are subject to change without notice.
  • Page 3: Table Of Contents

    Diagnostics Support The BlackDiamond Systems BlackDiamond 6800 Series Hardware Architecture Differences The BlackDiamond Backplane BlackDiamond I/O Modules Management Switch Modules BlackDiamond MSM Redundancy Causes of MSM Failover and System Behavior Alpine Systems Summit Systems Advanced System Diagnostics and Troubleshooting Guide...
  • Page 4 Packet Errors and Packet Error Detection Overview Definition of Terms Standard Ethernet Detection for Packet Errors on the Wire Extreme Networks’ Complementary Detection of Packet Errors Between Wires Hardware System Detection Mechanisms Software System Detection Mechanisms Failure Modes Transient Failures...
  • Page 5 Automatic Packet Memory Scan (via sys-health-check) Memory Scanning and Memory Mapping Behavior Limited Operation Mode Effects of Running Memory Scanning on the Switch Summit, Alpine, or BlackDiamond with a Single MSM BlackDiamond System with Two MSMs Interpreting Memory Scanning Results...
  • Page 6 Viewing Diagnostics Results Example Log Messages for FDB Scan Diagnostic Failures Example FDB Scan Results from the show diagnostics Command Example Output from the show switch command Example Output from the show fdb remap Command Chapter 6 Additional Diagnostics Tools...
  • Page 7 Chapter 7 Troubleshooting Guidelines Contacting Extreme Technical Support Americas TAC Asia TAC EMEA TAC Japan TAC What Information Should You Collect? Analyzing Data Diagnostic Troubleshooting Extreme Networks’ Recommendations Using Memory Scanning to Screen I/O Modules Appendix A Limited Operation Mode and Minimal Operation Mode...
  • Page 9: Preface

    Introduction This guide describes how to use the ExtremeWare hardware diagnostics suite to test and validate the operating integrity of Extreme Networks switches. The tools in the diagnostic suite are used to detect, isolate, and treat faults in a system.
  • Page 10: Related Publications

    • ExtremeWare Software User Guide, Software Version 6.2.2. • ExtremeWare Software Command Reference, Software Version 6.2.2. • ExtremeWare Error Message Decoder. Documentation for Extreme Networks products is available on the World Wide Web at the following location: http://www.extremenetworks.com/ Advanced System Diagnostics and Troubleshooting Guide...
  • Page 11: Chapter 1 Introduction

    Introduction This guide describes how to use the ExtremeWare hardware diagnostics suite to test and validate the operating integrity of Extreme Networks switches. The tools in the diagnostic suite are used to detect, isolate, and treat faults in a system.
  • Page 12: Diagnostics: A Brief Historical Perspective

    Introduction Diagnostics: A Brief Historical Perspective Diagnostic utility programs were created to aid in troubleshooting system problems by detecting and reporting faults so that operators or administrators could go fix the problem. While this approach does help, it has some key limitations: •...
  • Page 13: Overview Of The Extremeware Diagnostics Suite

    The extended diagnostics include the packet memory scan, which checks the packet memory area of the switch fabric for defects and maps out defective blocks. This test can be run by itself, as part of the slot-based extended diagnostics, or can be invoked from within the system health checks.
  • Page 15: Hardware Architecture

    • Summit Systems on page 23 Diagnostics Support The ExtremeWare diagnostic suite applies only to Extreme Networks switch products based on the “inferno” series chipset. Equipment based on this chipset is referred to as being “inferno” series or “i” series products: the BlackDiamond family of core chassis switches (6804, 6808, and 6816), the Alpine systems (3802, 3804, 3808), and the Summit “i”-series stackables (Summit1i, Summit5i, Summit7i,...
  • Page 16: The Blackdiamond Systems

    MSMs. • BlackDiamond 6808—Modular chassis with passive backplane; eight chassis slots for I/O modules; two chassis slots for MSMs. • BlackDiamond 6804—Modular chassis with passive backplane; four chassis slots for I/O modules; two chassis slots for MSMs. Management Bus...
  • Page 17: The Blackdiamond Backplane

    Data traffic is carried on four AUI links between each MSM and each I/O slot on BlackDiamond 6804 and BlackDiamond 6808 systems, and on two AUI links between each MSM and each I/O slot on BlackDiamond 6816 systems. Device management occurs on a 32-bit PCI bus connecting MSMs and I/O modules.
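The link counts above can be turned into a quick per-MSM total. This is a minimal sketch rather than anything taken from the guide: the function name is hypothetical, and the sixteen I/O slots for the BlackDiamond 6816 are an assumption, because the chassis description is cut off in this summary.

```python
# Data links between each MSM and each I/O slot, as stated in the text.
LINKS_PER_SLOT = {"6804": 4, "6808": 4, "6816": 2}

# I/O slot counts per chassis. The 6804 and 6808 values come from the
# chassis descriptions; the 6816 value (16) is an assumption.
IO_SLOTS = {"6804": 4, "6808": 8, "6816": 16}


def data_links_per_msm(model: str) -> int:
    """Return the number of backplane data links one MSM terminates."""
    return IO_SLOTS[model] * LINKS_PER_SLOT[model]


if __name__ == "__main__":
    for model in ("6804", "6808", "6816"):
        print(f"BlackDiamond {model}: {data_links_per_msm(model)} links per MSM")
```

For the BlackDiamond 6808 this gives 32, which matches the 32 MSM fabric ports listed in the backplane health check report later in the guide.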
  • Page 18: Blackdiamond I/O Modules

    Each BlackDiamond I/O module has a built-in switching fabric (see Figure 3) giving the module the capability to switch local traffic on the same module. Traffic that is destined for other modules in the chassis travels across the backplane to the MSMs, where it is switched and sent to its destination I/O module.
  • Page 19: Management Switch Modules

    MSMs. Management Switch Modules As its name indicates, the Management Switch Fabric Module (MSM) serves a dual role in the system: it is equipped to act as the internal switch fabric for data that is being transferred between I/O modules in the chassis, and to handle the upper-layer processing and system management functions for the switch.
  • Page 20: Blackdiamond Msm Redundancy

    CPU Control Path Data Path The master MSM CPU subsystem actively manages the switch and the task of switching packets in the CPU control (or management) path. The slave MSM CPU subsystem is in standby mode, but is checked periodically by the master MSM CPU (via EDP) to determine whether it is still available.
  • Page 21 The MSM-3 uses new technology to provide “hitless” failover, meaning the MSM-3 transitions through a failover with no traffic loss and no switch downtime, while it maintains active links and preserves layer 2 state tables. Contrast this performance to normal failover with MSM64i modules, which can take the switch down for approximately 30 seconds.
  • Page 22: Alpine Systems

    • Active backplane—Alpine switches use an active backplane that uses the same basic set of ASICs (the switch engine ASIC and the address filtering and queue management ASIC) and memory (packet memory for storing packets; OTP RAM, PQ RAM, and VPST RAM) that are used on the BlackDiamond MSMs and I/O modules, so it offers wire-speed switching.
  • Page 23: Summit Systems

    (DRAM, NVRAM, and flash memory), console port connectors, management interface, and a PCMCIA slot. The Summit switching fabric subsystem uses the same basic set of ASICs (the switch engine ASIC and the address filtering and queue management ASIC) and memory (packet memory for storing packets;...
  • Page 25: Overview

    Packet Errors and Packet Error Detection This chapter describes some of the factors that might result in packet errors in the switch fabric and the kinds of protection mechanisms that are applied to ensure that packet error events are minimized and handled appropriately.
  • Page 26: Definition Of Terms

    Checksum A value computed by running actual packet data through a polynomial formula. Checksums are one of the tools used by Extreme Networks in attempts to detect and manage packet error events. Packet checksum A checksum value that is computed by the MAC chip when the packet is transferred from the MAC chip to the switch fabric.
  • Page 27: Standard Ethernet Detection For Packet Errors On The Wire

    Redundancy Check (CRC) built into the IEEE 802.3 specification. As the sending switch assembles a frame, it performs a CRC calculation on the bits in that frame and stores the results of that calculation in the frame check sequence field of the frame. At the receiving end, the switch performs an identical CRC calculation and compares the result to the value stored in the frame check sequence field of the frame.
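The sender-side calculation and receiver-side comparison described above can be sketched in a few lines. This is an illustration only, not switch code: it uses Python's `zlib.crc32`, which implements the same CRC-32 polynomial as IEEE 802.3, and the helper names and the sample payload are hypothetical.

```python
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Sender side: compute the CRC over the frame and append it as a 4-byte FCS."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + fcs.to_bytes(4, "little")


def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the stored FCS."""
    body, stored_fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    computed = (zlib.crc32(body) & 0xFFFFFFFF).to_bytes(4, "little")
    return computed == stored_fcs


# A frame that arrives intact passes the check.
frame = append_fcs(b"example payload")

# Flipping a single bit "on the wire" makes the comparison fail.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

if __name__ == "__main__":
    print("intact frame valid:   ", fcs_is_valid(frame))
    print("corrupted frame valid:", fcs_is_valid(corrupted))
```

Note that this check only covers errors introduced between the sending and receiving MACs; it says nothing about corruption inside a switch after the frame check sequence has been verified.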
  • Page 28: Hardware System Detection Mechanisms

    Extreme Networks switch. Hardware System Detection Mechanisms All Extreme Networks switches based on the “i”-series switch fabric validate data integrity internal to the switch fabric using a common checksum verification algorithm. Using Figure 8 as a generalized...
  • Page 29: Software System Detection Mechanisms

    If a mismatch is found, the switch fabric reports the checksum error condition to the CPU as it passes the packet up to the CPU. These types of checksum errors are one instance of a class of checksum errors known as slow-path checksum errors.
  • Page 30: Failure Modes

    Diagnostics, the appearance of a checksum error message in the system log—for example—indicates that the normal error detection mechanisms in the switch have detected that the data in a packet has been modified inappropriately. While checksums provide a strong check of data integrity, they must be qualified according to their risk to the system and by what you can do to resolve the problem.
  • Page 31: Permanent Failures

    The most detrimental set of conditions that result in packet error events are those that result in permanent errors. These types of errors arise from some failure within the switch fabric that causes data to be corrupted in a systematic fashion. These permanent hardware defects might, or might not, affect normal switch operation.
  • Page 32: Error Message Format

    (ingresses) and exits (egresses) the switch fabric MACs. (Extreme switches pro-actively scan for fault conditions throughout the switch architecture, and these packet types are all part of this effort. A checksum on one of these packet types could have its root in packet memory, because all of these test packet types are stored for a time in the packet memory.
  • Page 33: Fabric Checksum Error Message Logging

    Sys-health-check [CPU] checksum error on slot <slot_number> This message indicates that the switch fabric detected an error on a frame destined for the CPU. This error was most likely introduced in slot <slot_number>. If the CPU or backplane health check counters for <slot_number>...
  • Page 34: Checksum Message Examples

    Packet Errors and Packet Error Detection NOTE The prev= and cur= counters described above are 8-bit counters, and can wrap around, so the actual number of checksum errors detected in the previous 15 seconds might not present an accurate count. Checksum Message Examples 12/15/2003 16:57.04 <CRIT:SYST>...
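The wraparound caveat in the note above is easy to see with modular arithmetic. The sketch below assumes, for illustration only, that the number of new errors is the difference between the cur= and prev= values; the function name is hypothetical.

```python
COUNTER_MODULUS = 256  # an 8-bit counter holds values 0..255


def errors_since_previous(prev: int, cur: int) -> int:
    """Smallest non-negative error count consistent with an 8-bit counter.

    If 256 or more errors occurred in the interval, the counter wrapped
    at least once and this result undercounts by a multiple of 256.
    """
    return (cur - prev) % COUNTER_MODULUS


if __name__ == "__main__":
    # The counter wrapped past 255: 250 -> 4 is 10 errors, not -246.
    print(errors_since_previous(250, 4))
    # Identical values could mean 0 errors, or 256, or 512, and so on.
    print(errors_since_previous(10, 10))
```

This is why the logged values should be read as a lower bound on the number of checksum errors, not as an exact count.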
  • Page 35: Panic/Action Error Messages

    Extreme diagnostics suite. For example, any other component in the path between the ingress and egress points could malfunction, resulting in a corrupted checksum.
  • Page 36: Panic/Action Message Example

    Packet Errors and Packet Error Detection by a packet memory failure, but that there are other possibilities as well. The packet memory scan should always be used in conjunction with the extended diagnostics to check the integrity of all the components.
  • Page 37: Software Exception Handling

    Software Exception Handling This chapter describes the software exception handling features built into Extreme hardware and software products to detect and respond to problems to maximize switch reliability and availability. This chapter contains the following sections: • Overview of Software Exception Handling Features on page 37 •...
  • Page 38: System Software Exception Recovery Behavior

    Software Exception Handling The system-watchdog feature is enabled by default. The CLI commands related to system-watchdog operation are: enable system-watchdog disable system-watchdog NOTE During the reboot cycle, network redundancy protocols will work to recover the network. The impact on the network depends on the network topology and configuration (for example, OSPF ECMP versus a large STP network on a single domain).
  • Page 39 For example, if you select the “shutdown” option, one of the actions the software routine performs is to instruct the slave MSM in a BlackDiamond switch not to monitor the master MSM, to prevent MSM failover from occurring.
  • Page 40: Configuring System Recovery Actions

    The remaining completion action for the system-dump option, maintenance-mode, leaves the switch in whatever state the dump transfer puts it in. Some subsystems might not work correctly, or work at all after a system dump.
  • Page 41: Usage Notes

    Usage Notes When you configure the actions: • All watchdogs and timers are stopped. • All tasks except the following are suspended: — tSwFault — tLogTask — tSyslogTask — tShell — tConsole — tExtTask — tExcTask — The root task •...
  • Page 42: Configuring Reboot Loop Protection

    • To view the current settings, use the • The reboot loop protection settings are stored in the switch memory, but are not saved in the switch configuration. In a BlackDiamond switch equipped with redundant MSM64i modules, the...
  • Page 43 Configuring Reboot Loop Protection The synchronize command does transfer the reboot loop protection settings to the synchronized MSM64i.
  • Page 44: Dumping The System Memory

    You can dump (copy and transfer) the contents of the system DRAM memory to a remote TFTP host so that it can be passed to an Extreme Networks technical support representative who will examine and interpret the dump results. The system dump only works through the Ethernet management port.
  • Page 45: Initiating A Manual System Dump

    where: critical • To turn off the system dump action in the system-recovery-level process, use the following command: unconfigure system-dump • To display the configured values for the use the following command: show system-dump When neither the server IP address nor the timeout parameter has been configured, the show system-dump Server ip : none Dump timeout : none...
  • Page 47: Diagnostic Test Functionality

    Diagnostics This chapter describes how to configure and use the Extreme Advanced System Diagnostics. This chapter contains the following sections: • Diagnostic Test Functionality on page 47 • System Health Checks: A Diagnostics Suite on page 51 • Power On Self Test (POST) on page 55 •...
  • Page 48: How The Test Affects The Switch

    (see Figure 10) or the data bus during the test: • In a passive test, the test merely scans switch traffic (packet flow) for packet memory errors. • In an active test, the test originates test messages (diagnostic packets) that it sends out and then validates to verify correct operation.
  • Page 49 Diagnostic tests are processed by the CPU. When invoked, each diagnostic test looks for different things (device problems, communication-path problems, etc.), and uses either the control bus or the data bus, or—in some cases—both buses to perform the test. For example, Figure 9 shows a simplified example of the CPU health check test. The CPU health check test sends five different diagnostic packets across the control bus to each I/O module.
  • Page 50 Diagnostics Figure 10: Backplane health check paths (BlackDiamond architecture) I/O Module AFQM ASIC (Quake) ASIC (Twister) CPU loads test packet to MSM fabric Test packet transferred on data bus (fast path) Test packet returned to Management Bus NVRAM E-NET Console PCMCIA AFQM ASIC...
  • Page 51: System Health Checks: A Diagnostics Suite

    — Normal, extended, and packet memory scan — Run on demand by user command — Offer configurable levels — Remove the switch fabric from service for the duration of the tests • Background packet memory scanning and mapping — Checks all packet storage memory for defects —...
  • Page 52: The Role Of Memory Scanning And Memory Mapping

    The packet memory scan examines every node of packet memory to detect packet errors by writing data to packet memory, then reading and comparing results. The test is invasive and takes the switch fabric offline to perform the test.
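The write, read, and compare cycle described above can be sketched against a simulated memory. Everything here is hypothetical (the class, the test patterns, and the stuck-bit fault model); it illustrates the technique and is not the ExtremeWare implementation.

```python
from typing import Dict, List, Optional, Tuple


class SimulatedMemory:
    """Byte-addressable memory where selected bits can be stuck at zero."""

    def __init__(self, size: int, stuck_low: Optional[Dict[int, int]] = None):
        self.cells = bytearray(size)
        # Maps address -> bit mask of bits that always read back as 0.
        self.stuck_low = stuck_low or {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & ~self.stuck_low.get(addr, 0) & 0xFF

    def read(self, addr: int) -> int:
        return self.cells[addr]


def scan(memory: SimulatedMemory, size: int,
         patterns: Tuple[int, ...] = (0x55, 0xAA)) -> List[int]:
    """Write each pattern to every address, read it back, and compare.

    Returns the addresses that failed at least one comparison. These are
    the candidates for being mapped out of use.
    """
    defective = []
    for addr in range(size):
        for pattern in patterns:
            memory.write(addr, pattern)
            if memory.read(addr) != pattern:
                defective.append(addr)
                break
    return defective


if __name__ == "__main__":
    mem = SimulatedMemory(16, stuck_low={5: 0x04, 9: 0x02})
    print("defective addresses:", scan(mem, 16))
```

The two complementary patterns (0x55 and 0xAA) exercise every bit in both states, which is why a single stuck bit is caught regardless of its position. Because the scan overwrites memory contents, it also shows why the real test has to take the switch fabric offline.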
  • Page 53: Modes Of Operation

    Memory scanning is supported on the following platforms and modules: • BlackDiamond 6816, BlackDiamond 6808, and BlackDiamond 6804 • BlackDiamond modules: MSM, F96Ti, F48Ti, G8Xi, G8Ti, G12SXi, G16Xi, G16Ti, 10G Xenpak • Alpine 3808, Alpine 3804, and Alpine 3802 (manual mode only) •...
  • Page 54: The Role Of Processes To Monitor System Operation

    When you are in the process of implementing the ExtremeWare diagnostics, keep in mind the software fault recovery features built into Extreme hardware and software products to detect and respond to problems to maximize switch reliability and availability. The System-Watchdog, System-Recovery-Mode, and Reboot-Loop-Protection functions ensure that the switch can not only pass all POST test diagnostics, but also verify that all processes continue to perform properly during runtime operation.
  • Page 55: Power On Self Test (Post)

    The pre-POST test is a bootup process that tests CPU memory, Universal Asynchronous Receiver/Transmitter (UART) parts, ASIC registers and memory. The POST tests the following switch elements (depending on the module type: MSM or I/O module): • Register ASIC on the CPU •...
  • Page 56: Runtime (On-Demand) System Diagnostics

    ASIC, ASIC-memory, and packet loopback tests. The extended tests take a maximum of 15 minutes. • On Demand Packet Memory Scan on page 59—The packet memory test scans the switch fabric in the switch (Summit or Alpine) or the module in the specified slot (BlackDiamond only) for single-bit packet memory defects.
  • Page 57: Running The Normal Diagnostics On Blackdiamond Systems

    The impact of the normal diagnostics depends on the switch type and—in the case of the BlackDiamond switch—whether the module type being tested is an MSM or an I/O module (see Table 4).
  • Page 58: Extended System Diagnostics

    • Additional loop-back tests: Big packet (4k) MAC, transceiver, VLAN NOTE Only run these diagnostics when the switch or module can be brought off-line. The tests performed are extensive and affect traffic that must be processed by the system CPU, because the diagnostics are processed by the system CPU.
  • Page 59: Running The Extended Diagnostics On Summit Systems

    The impact of the extended diagnostics depends on the switch type and—in the case of the BlackDiamond switch—whether the extended diagnostics are run on an MSM in a switch with a single MSM or on an I/O module (see Table 7). When you enter the...
  • Page 60: Running The Packet Memory Scan Diagnostics On Summit Systems

    90 seconds and the module remains offline for the duration of the scan. For Summit and Alpine systems, the test is initiated by manual command; the entire switch is taken offline while the test is running and is then rebooted.
  • Page 61 Errors not mapped; switch kept online. • Switch enters limited commands mode. • Errors mapped; switch kept online. • >7 Errors not mapped; switch enters limited commands mode. • Switch kept online. • Errors not mapped; switch kept online. • >7 Errors not mapped;...
  • Page 62 Diagnostics Table 6 describes the behavior of the switch if you run diagnostics manually using the CLI command with the run diagnostics configuration, the mode selected (online or offline) using the command, and the number of errors to be detected.
  • Page 63: Limited Operation Mode

    CLI commands are active. Ports are powered down so that links to adjacent devices do not come up. The switch fabric is not operational. Limited operation mode allows diagnostic work to be done on failed devices while redundant backup devices continue to operate.
  • Page 64: Blackdiamond System With Two Msms

    Keep in mind that the behavior described above is configurable by the user, and that you can enable the system health check facility on the switch and configure the auto-recovery option to use the online auto-recovery action, which will keep a suspect module online regardless of the number of errors detected.
  • Page 65: Interpreting Memory Scanning Results

    Interpreting Memory Scanning Results If single-bit permanent errors are detected during the memory scanning process, these errors will be mapped out of the general memory map with only a minimal loss to the total available memory on the system. Example show diagnostics ------------------------------------------------------------------------ Diagnostic Test Result run on Thu Jan 23 14:24:44 2003...
  • Page 66: Per-Slot Packet Memory Scan On Blackdiamond Switches

    Diagnostics Per-Slot Packet Memory Scan on BlackDiamond Switches While the system health check auto-recovery mode is effective at recovering from suspected failures, it does not provide the level of granularity and control over recovery options that many network administrators require. The per-slot packet memory scan capability on BlackDiamond switches gives administrators the ability to set the recovery behavior for each module—an important distinction when only certain modules can be taken offline, while others must remain online no matter what the error condition.
  • Page 67: System Impact Of Per-Slot Packet Memory Scanning

    You should take great care to ensure that a module in this state is identified and replaced as soon as possible.
  • Page 68 In an OSPF network, for example, after the shutdown/reboot is initiated, the adjacent OSPF routers will drop routes to the faltering switch. Very little traffic loss should occur during the network reconvergence, because traffic is simply routed around the affected switch via pre-learned routes.
  • Page 69: System (Cpu And Backplane) Health Check

    The purpose of the system health check feature is to ensure that communication between the CPU on the management switch module (MSM) and all I/O cards within the chassis is functioning properly. The system health checking cycle consists of two parts: •...
  • Page 70: Health Check Functionality

    These modes are configured by two separate CLI commands, described below. Alarm-Level Response Action To configure the switch to respond to a failed health check based on alarm-level, use this command: config sys-health-check alarm-level [card-down | log | system-down | traps] where: (BlackDiamond only.) Posts a CRIT message to the log, sends an SNMP trap, and...
  • Page 71: Backplane Health Check

    Alpine or Summit Switches. To configure the switch to respond to a failed health check by attempting to perform auto-recovery (packet memory scanning and mapping), use this command: config sys-health-check alarm-level auto-recovery [offline | online] When system health checks fail at a specified frequency, packet memory scanning and mapping is invoked automatically.
  • Page 72 Diagnostics Backplane Health Check Diagnostic Results—Example 1. Example 1 shows the report from one MSM, MSM-A in a BlackDiamond 6808 switch. If two MSMs are in the chassis, both MSM-A and MSM-B are reported. Total Tx Total Rv MSM-A Port...
  • Page 73 Port 32: (chan3, The report in Example 1 shows 32 ports from the MSM switch fabric to the backplane; four channels; four MSM ports to each BlackDiamond 6808 I/O module slot. In this example, chassis slots 1, 7, and 8 are populated with I/O modules capable of responding to the backplane health check packets from the system health checker.
  • Page 74 Diagnostics Backplane Health Check Diagnostic Results—Example 2. Example 2 shows a report for MSM-A again, but this time with missed and corrupted packets on different channels going to more than one I/O module slot. In example 2, the missed packets and corrupted packets on channels going to more than one I/O module (slots 1, 4, and 7 in this example) indicate what is most likely a problem with MSM-A, itself.
  • Page 75 Backplane Health Check Diagnostic Results—Example 3. Example 3 shows a report for MSM-A again, but with missed and corrupted packets on channels going to the same slot. In example 3, the corrupted packets on channels going to the same I/O module (slot 7 in this example) indicate what is most likely a problem with the I/O module in slot 7.
  • Page 76: Analyzing The Results

    Diagnostics Backplane Health Check Diagnostic Results—Example 4. Example 4 shows a report for MSM-A again, but with small numbers of missed packets on channels going to different slots. In example 4, the small numbers of missed packets (fewer than five) indicate what is most likely not a serious hardware problem.
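The reasoning in Examples 2 through 4 can be summarized as a small decision rule. The sketch below is a simplification under stated assumptions: the function name and input shape are hypothetical, per-channel detail is collapsed into one count per slot, and the threshold of five comes from the "fewer than five" wording in Example 4. It is a reading aid, not a replacement for the log analysis the guide recommends.

```python
from typing import Dict


def likely_fault(problem_counts: Dict[int, int], minor_threshold: int = 5) -> str:
    """Suggest where to look first, given missed + corrupted packets per slot.

    problem_counts maps an I/O slot number to the combined number of
    missed and corrupted backplane health check packets for that slot.
    """
    significant = sorted(
        slot for slot, count in problem_counts.items() if count >= minor_threshold
    )
    if not significant:
        # Example 4: small counts are most likely not a serious problem.
        return "probably not a serious hardware problem"
    if len(significant) == 1:
        # Example 3: errors confined to one slot point at that I/O module.
        return f"suspect the I/O module in slot {significant[0]}"
    # Example 2: errors toward several slots point at the MSM itself.
    return "suspect the MSM"


if __name__ == "__main__":
    print(likely_fault({1: 40, 4: 12, 7: 95}))
    print(likely_fault({1: 0, 7: 230, 8: 0}))
    print(likely_fault({1: 2, 7: 1, 8: 3}))
```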
  • Page 77: Cpu Health Check

    • If a health check checksum error message appears in the log, and the output of the show diagnostics use those two sources of information to determine the location of the problem. • If backplane health check counts for missing or corrupted packets are increasing, but the log shows no checksum error messages, the problem is probably a low-risk, transient problem—possibly a busy CPU.
  • Page 78: Viewing Cpu Health Check Diagnostic Results-Show Diagnostics Command

    Type 4). Counts are maintained for transmitted, received, missed, and corrupted health check packets for each I/O module in the switch. As in the backplane health check, small numbers of missed health check packets are probably okay; large numbers of missed health check packets indicate a systematic problem, such as an MSM control bus transceiver problem or other serious problem.
  • Page 79 • CPU health check failures might indicate a faulty transceiver on one of the MSMs, but might also indicate other I/O control bus failures. Always use log messages in conjunction with the output of show diagnostics • If a health check checksum error message appears in the log, and the output of the show diagnostics use those two sources of information to determine the location of the problem.
  • Page 80: Transceiver Diagnostics

    Diagnostics Transceiver Diagnostics The transceiver diagnostics test the integrity of the management bus transceivers used for communication between the ASICs in the Inferno chipset and the CPU subsystem. (See Figure 10.) These diagnostics write test patterns to specific ASIC registers, read the registers, then compare results, looking for errors in the communication path.
  • Page 81: System Impacts Of The Transceiver Diagnostics

    1 to 8. If you do not specify a value, the test uses the default of 3 errors. NOTE Extreme Networks recommends against changing the default transceiver test threshold value. The default value of 3 errors is adequate for most networks.
  • Page 82: Viewing Diagnostics Results

    Use the following commands to view information related to the transceiver diagnostic test: show log show diagnostics show switch Example Log Messages for Transceiver Diagnostic Failures • If the transceiver diagnostic test detects a failure, any of the following messages will appear in the log one time.
  • Page 83 slot 8 F48Ti Operational QUAKE slot 8 F48Ti Operational TWISTER MSM-A Operational UART MSM-A Operational FLASH MSM-A Operational SRAM MSM-A Operational NVRAM MSM-A Operational ENET MSM-A FABRIC Operational QUAKE MSM-A FABRIC Operational TWISTER MSM-A SLAVE Operational MAC MSM-A SLAVE Operational QUAKE MSM-A SLAVE Operational TWISTER...
  • Page 84: Example-Show Switch Command

    Operational TWISTER MSM-A SLAVE Operational MAC MSM-A SLAVE Operational QUAKE MSM-A SLAVE Operational TWISTER Example—show switch Command The following is a display example for the Sysname: BD6808 License: Full L3 + Security SysHealth Check: Enabled. Recovery Mode: None Transceiver Diag: Enabled.
  • Page 85: Fdb Scan

    You can scan the FDB on a stand-alone switch, or scan on a slot-by-slot or backplane basis on a modular switch. Using the enable fdb-scan check configuration.
  • Page 86: Related Commands

    These commands are described in the sections that follow. The default settings are: disabled, log, 30 seconds. Enabling FDB Scanning You can scan the FDB on a stand-alone switch, on the backplane of a modular switch, or on a module in a slot of a modular switch. To enable FDB scanning, use this command: enable fdb-scan [all | slot {{backplane} | <slot number>...
  • Page 87: System Impact Of The Fdb Scan Diagnostic

    config fdb-scan period <1-60> The interval is a number in the range from 1 to 60 seconds. The default is 30 seconds. We recommend a period of at least 15 seconds. If you attempt to configure a period of fewer than 15 seconds, the system displays the following warning message: Setting period below (15) may starve other tasks.
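When the period is set from a provisioning script, the range and the 15-second recommendation above can be checked before the command is sent. This is a hypothetical helper, not part of ExtremeWare; only the command syntax, the 1 to 60 range, and the warning text come from the guide.

```python
from typing import Optional, Tuple

MIN_PERIOD = 1
MAX_PERIOD = 60
RECOMMENDED_MIN = 15


def fdb_scan_period_command(seconds: int) -> Tuple[str, Optional[str]]:
    """Build the CLI command for an FDB scan period and an optional warning.

    Raises ValueError when the value is outside the range the CLI accepts.
    """
    if not MIN_PERIOD <= seconds <= MAX_PERIOD:
        raise ValueError(
            f"period must be between {MIN_PERIOD} and {MAX_PERIOD} seconds"
        )
    warning = None
    if seconds < RECOMMENDED_MIN:
        # Same text the switch displays for periods shorter than 15 seconds.
        warning = "Setting period below (15) may starve other tasks."
    return f"config fdb-scan period {seconds}", warning


if __name__ == "__main__":
    print(fdb_scan_period_command(30))
    print(fdb_scan_period_command(10))
```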
  • Page 88: Viewing Diagnostics Results

    Example Log Messages for FDB Scan Diagnostic Failures Look for the following types of messages in the log: FDB Scan: max number of remaps ( This message indicates that the FDB scan cannot re-map any more FDB entries. The value num is the maximum number of entries that can be remapped (should be 8), slot indicates the chassis slot, and entry indicates the entry.
  • Page 89: Example Output From The Show Switch Command

    Example Output from the show switch command For an example of the section on page 84. The output from this command will indicate whether FDB scanning is enabled and will also indicate the failure action to be taken. Example Output from the show fdb remap Command...
  • Page 91: Additional Diagnostics Tools

    Additional Diagnostics Tools This chapter describes additional diagnostic tools that can be used to detect and help in resolving system problems. This chapter contains the following sections: • Temperature Logging on page 92 • Syslog Servers on page 93 Advanced System Diagnostics and Troubleshooting Guide...
  • Page 92: Temperature Logging

    The recommended ambient operating temperature for Extreme Networks switches is 32° to 104° F (0° to 40° C), but this range represents the absolute limits of the equipment. Whenever possible, the temperature should be kept at approximately 78°...
  • Page 93: Syslog Servers

    Log information is critical not only to troubleshoot a failed system, but also to identify contributing conditions that might lead to future failures. One major problem with internal switch logs is that only a limited amount of memory can be allocated for logs. After 1,000 messages, the log wraps and the first messages are lost.
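The wrapping behavior described above can be modeled with a fixed-size buffer. The sketch below uses Python's `collections.deque` as a stand-in for the switch's internal log; the message strings are made up, and only the 1,000-message capacity comes from the text.

```python
from collections import deque

LOG_CAPACITY = 1000  # internal log size stated in the guide

# A deque with maxlen silently discards the oldest entry when full,
# which models a log that wraps.
log = deque(maxlen=LOG_CAPACITY)

for i in range(1, 1201):
    log.append(f"message {i}")

if __name__ == "__main__":
    print("entries kept:", len(log))
    print("oldest entry:", log[0])
    print("newest entry:", log[-1])
```

After 1,200 messages the first 200 are gone, which is exactly the evidence a post-failure investigation tends to need. Forwarding messages to an external syslog server avoids that loss.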
  • Page 94: Adding A Syslog Server

    Network Impact of the Syslog Server Facility Network impact depends on the volume of log messages sent to the syslog server. But even under extreme conditions, the relative brevity of log messages means that even a very large message volume should not adversely affect network throughput.
  • Page 95: Contacting Extreme Technical Support

    If you have a network issue that you are unable to resolve, contact the nearest Extreme Networks TAC. The TAC will create a service request (SR) and manage all aspects of the service request until the question or issue that spawned the service request is resolved.
  • Page 96: Asia Tac

    For a detailed description of the Extreme Networks TAC program and its procedures, including service request information requirements and return materials authorization (RMA) information requirements, please refer to the Extreme Networks What You Need to Know TAC User Guide at this Web location: http://www.extremenetworks.com/services/wwtac/TacUserGuide.asp...
  • Page 97: What Information Should You Collect

    — Trend: Recurrent event? Frequency? Etc. — If the problem was resolved, what steps did you take to diagnose and resolve the problem? • Optional information (upon request from Extreme Networks TAC personnel) — System dump (CPU memory dump) • Additional CLI commands for information include: —...
  • Page 98: Diagnostic Troubleshooting

    FDB scan will mark it as suspect (suspect entries are marked with an “S”). Look at the output of the show fdb remap command. Address suspect entries by manually removing the entries and re-adding them. Consult Extreme Networks TAC if this is not possible. • In the output from the show diagnostics incrementing, it might indicate a transceiver problem.
  • Page 99: Extreme Networks' Recommendations

    Extreme Networks’ Recommendations Extreme Networks strongly recommends that you observe the process shown in Figure 11 and outlined in the steps that follow when dealing with checksum errors. [Figure 11: Diagnostic Troubleshooting Process. Flowchart not reproduced; its first step reads “Customer experiences checksum errors on Inferno/Triumph”.]
  • Page 100 Did the extended diagnostics (plus the packet memory scan) detect errors? • If no errors were detected, you should call the Extreme Networks TAC. The next action will be determined by the frequency with which the error occurs and other problem details.
  • Page 101: Using Memory Scanning To Screen I/O Modules

    To do this, schedule an extended maintenance window and prepare the system for a temporary ExtremeWare upgrade. Do not convert or save the configuration on the switch. It is not possible to run an ExtremeWare 6.2.2 (or later) configuration correctly on an older version of ExtremeWare.
  • Page 103: Limited Operation Mode And Minimal Operation Mode

    • Minimal Operation Mode on page 104 Limited Operation Mode A switch enters limited operation mode after a catastrophic reboot. As the switch boots, it checks to see whether a catastrophe caused this reboot. If that is the case, the switch enters limited operation mode.
  • Page 104: Triggering Limited Operation Mode

    In limited operation mode, you must use the error code so that the module—in the case of a BlackDiamond system, or the switch—in the case of an Alpine or Summit system, can be brought up after the next reboot.
  • Page 105: Bringing A Switch Out Of Minimal Operation Mode

    To detect a reboot loop, a timestamp and a counter are saved. Each time the switch reboots because of a software crash or exception, the counter is incremented. A user-executed command clears the timestamp and counter to prevent false reboot loop protection. This action also allows the user to bring the switch out of minimal operation mode so that the system can come up normally after the failure has been identified and fixed.
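    The timestamp-and-counter scheme can be sketched as follows. This is an illustration under stated assumptions, not the ExtremeWare implementation: the class name, the threshold of 3 crashes, the 600-second window, and the rule that an expired window restarts the count are all hypothetical, because the excerpt does not give these details.

```python
class RebootLoopGuard:
    """Track crash-induced reboots using a saved timestamp and counter."""

    def __init__(self, threshold=3, window_seconds=600):
        # threshold and window_seconds are assumed values for illustration.
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.first_crash_time = None   # saved timestamp
        self.counter = 0               # saved crash-reboot counter

    def record_crash_reboot(self, now):
        """Record a reboot caused by a crash; return True if a loop is detected."""
        expired = (
            self.first_crash_time is None
            or now - self.first_crash_time > self.window_seconds
        )
        if expired:
            # Assumed behavior: start a new window with this crash.
            self.first_crash_time = now
            self.counter = 1
        else:
            self.counter += 1
        return self.counter >= self.threshold

    def clear(self):
        """Model the user-executed action that resets the timestamp and counter."""
        self.first_crash_time = None
        self.counter = 0
```

    In this sketch, calling clear() after the failure has been fixed corresponds to the user action that lets the system come up normally on the next boot.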
  • Page 107: Reference Documents

    Other Documentation Resources Extreme Networks customer documentation is available at the following Web site: http://www.extremenetworks.com/services/documentation/ The customer documentation support includes: • ExtremeWare Software User Guide • ExtremeWare Command Reference Guide Use the user guide and the command reference guide to verify whether the configuration is correct.
  • Page 108 Use the release notes to check for known issues, supported limits, bug fixes from higher ExtremeWare versions, etc. (Release notes are available to all customers who have a service contract with Extreme Networks via eSupport. The release notes are provided by product, under the Software Downloads area of eSupport.)
