Sfm Channel Monitoring; Software Component Health Monitoring; System Health Monitoring; Failure And Event Logging - Dell Force10 S4810P Configuration Manual


For ExaScale, the RPM alone periodically sends out test frames that loop back through the SFM.
The loopback health check determines the overall status of the backplane and can identify a faulty
SFM. If three consecutive RPM loopbacks fail, the software initiates a fault isolation procedure
that disables one SFM at a time, in sequence, and repeats the same loopback test.
Refer to the E-Series TeraScale Debugging and Diagnostics and E-Series ExaScale Debugging and
Diagnostics chapters for details on the different system checks performed.
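The loopback and isolation sequence described above can be sketched in Python. This is an illustration only, not FTOS code: the names `monitor`, `isolate_faulty_sfm`, and `loopback_ok` are hypothetical, and the rule that an SFM is the suspect when the loopback passes with that SFM disabled is an assumption, since the text does not say how the result of each isolation step is interpreted.

```python
from typing import Callable, List, Optional, Set

# Three consecutive failed loopbacks trigger isolation (value taken from the text).
FAILURE_THRESHOLD = 3


def isolate_faulty_sfm(
    sfm_ids: List[int],
    loopback_ok: Callable[[Set[int]], bool],
) -> Optional[int]:
    """Disable one SFM at a time and rerun the loopback test.

    `loopback_ok(disabled)` is a hypothetical probe that returns True when
    the loopback passes with the given set of SFMs disabled.
    """
    for sfm in sfm_ids:
        if loopback_ok({sfm}):
            # Assumption: a pass with this SFM out of the path marks it as faulty.
            return sfm
    return None


def monitor(
    results: List[bool],
    sfm_ids: List[int],
    loopback_ok: Callable[[Set[int]], bool],
) -> Optional[int]:
    """Walk a sequence of loopback results and start isolation when needed."""
    consecutive = 0
    for passed in results:
        # A passing loopback resets the count of consecutive failures.
        consecutive = 0 if passed else consecutive + 1
        if consecutive >= FAILURE_THRESHOLD:
            return isolate_faulty_sfm(sfm_ids, loopback_ok)
    return None
```

In this sketch a single passing loopback clears the failure count, so two failures, a pass, and another failure do not start isolation.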

SFM Channel Monitoring

PCDFO is supported only on platform:
Another test that checks the integrity of the data plane is the Per-channel De-skew FIFO Overflow
(PCDFO) test. Each ingress and egress Buffer and Traffic Manager (BTM/FPTM) maintains nine channel
connections to the SFM. The PCDFO test detects a faulty channel on an SFM, RPM, or line card by
creating a test frame and striping it across all nine SFM channels between the eBTM/eFPTM and
iBTM/iFPTM. The eBTM/eFPTM must receive each segment of striped data within a specified time for the
segments to be considered properly aligned in time. Skews smaller than the specified time are
tolerated because of buffering within the BTM/FPTM. If a segment is not received within the specified
time, the skew is treated as a fault, and FTOS initiates additional tests to isolate it.
For more information on the PCDFO test, see PCDFO events or E-Series ExaScale Debugging and
Diagnostics.
Note: The BTM applies to E-Series TeraScale, and the FPTM applies to the E-Series ExaScale.
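The striping and alignment check can be illustrated with a short Python sketch. It models only the idea described above: the byte-wise round-robin striping, the nanosecond units, and the function names `stripe` and `misaligned_channels` are assumptions made for illustration, and the actual check is performed in hardware.

```python
from typing import Dict, List

# Nine SFM channel connections per BTM/FPTM (value taken from the text).
NUM_CHANNELS = 9


def stripe(frame: bytes, channels: int = NUM_CHANNELS) -> List[bytes]:
    """Split a test frame into one segment per channel, round-robin by byte."""
    return [frame[i::channels] for i in range(channels)]


def misaligned_channels(arrival_ns: Dict[int, float], window_ns: float) -> List[int]:
    """Return channels whose segment arrived too late or not at all.

    Arrival times are compared with the earliest segment; a skew up to
    `window_ns` is tolerated, mirroring the buffering in the BTM/FPTM.
    """
    if not arrival_ns:
        # No segment arrived, so every channel is suspect.
        return list(range(NUM_CHANNELS))
    first = min(arrival_ns.values())
    bad = []
    for channel in range(NUM_CHANNELS):
        arrived = arrival_ns.get(channel)
        if arrived is None or arrived - first > window_ns:
            bad.append(channel)
    return bad
```

A non-empty result corresponds to the point at which FTOS would start its additional tests to isolate the fault.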

Software Component Health Monitoring

Each line card and the RPM run a number of software components. FTOS performs a periodic health
check on each of these components by querying the status of a flag, which the corresponding
component must reset within a specified time.
If any health check on the RPM fails, FTOS fails over to the standby RPM. If any health check on a
line card fails, FTOS resets the card to bring it back to the correct state.
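The flag-based check above behaves like a watchdog, which can be sketched as follows. The `Component` class, the function names, the returned action strings, and the timeout values are all hypothetical; the text does not document how FTOS represents the flag or the timer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    name: str
    location: str  # "rpm" or "linecard"
    # Time at which the monitor raised the flag; None once the component reset it.
    flag_set_at: Optional[float] = None


def raise_flag(component: Component, now: float) -> None:
    """Monitor side: raise the flag and remember when."""
    component.flag_set_at = now


def component_resets(component: Component) -> None:
    """Component side: a healthy component clears its flag."""
    component.flag_set_at = None


def check(component: Component, now: float, timeout_s: float) -> str:
    """Return the action for this component at time `now`."""
    if component.flag_set_at is None or now - component.flag_set_at <= timeout_s:
        return "ok"
    # The flag was not reset in time: the recovery depends on where the component runs.
    if component.location == "rpm":
        return "failover-to-standby-rpm"
    return "reset-line-card"
```

For example, a component on the RPM whose flag is still set after the timeout yields `"failover-to-standby-rpm"`, while the same condition on a line card yields `"reset-line-card"`.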

System Health Monitoring

FTOS also monitors the overall health of the system. Key parameters such as CPU utilization, free
memory, and error counters (CRC failures, packet loss, and so on) are measured, and a parameter that
crosses its threshold can be used to initiate a recovery mechanism.
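Threshold-based monitoring of this kind might look like the sketch below. The parameter names and limit values are invented for illustration and are not FTOS defaults; the `breach_when_above` switch is an assumption that covers parameters such as free memory, where a low value is the problem.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    limit: float
    # True: breach when the value rises above the limit (CPU, error counters).
    # False: breach when the value falls below the limit (free memory).
    breach_when_above: bool = True


def breached(samples: Dict[str, float], thresholds: Dict[str, Threshold]) -> List[str]:
    """Return the names of the parameters that crossed their threshold."""
    names = []
    for name, threshold in thresholds.items():
        if name not in samples:
            continue  # No measurement for this parameter in this cycle.
        value = samples[name]
        if threshold.breach_when_above:
            crossed = value > threshold.limit
        else:
            crossed = value < threshold.limit
        if crossed:
            names.append(name)
    return names


# Hypothetical limits, for illustration only.
EXAMPLE_THRESHOLDS = {
    "cpu_pct": Threshold(90.0),
    "free_mem_mb": Threshold(64.0, breach_when_above=False),
    "crc_errors": Threshold(100.0),
}
```

Each name returned by `breached` would be a candidate for triggering a recovery action.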

Failure and Event Logging

Dell Force10 systems provide multiple options for logging failures and events.