Failure Detection in Cisco ASR 9000 Series nV Edge System; Scenarios for High Availability - Cisco ASR 9000 Series System Configuration Manual

Configuring the nV Edge System on the Cisco ASR 9000 Series Router

Failure Detection in Cisco ASR 9000 Series nV Edge System

In the Cisco ASR 9000 Series nV Edge system, when the Primary DSC node fails, the RSP in the Backup
DSC node becomes Primary. It executes the duties of the master RSP that hosts the active set of control plane
processes. In a typical nV Edge system, where the Primary and Backup DSC nodes are hosted on
separate racks, failure of the Primary DSC is detected through communication between the racks.
These mechanisms are used to detect RSP failures across rack boundaries:
• FPGA state information detected by the peer RSP in the same chassis is broadcast over the control links.
This information is sent if any state change occurs and periodically every 200ms.
• The UDLD state of the inter rack control or data links is sent to the remote rack, with failures detected
at an interval of 500ms.
• A keep-alive message is sent between RSP cards through the inter rack control links, with a failure
detection time of 10 seconds.
A Split Brain is a condition where the inter rack links between the routers in a Cisco ASR 9000 Series nV
Edge system fail and, as a result, the nodes on both routers start to act as the primary node. To detect and
avoid Split Brain, messages are sent between the racks. These occur at 200ms intervals across the
inter-rack data links.

Scenarios for High Availability

These are some sample scenarios for failure detection:
1 Single RSP Failure in the Primary DSC node - The Standby RSP within the same chassis initially detects
the failure through the backplane FPGA. When it detects the failure, this RSP transitions to the
active state and notifies the Backup DSC node about the failure through the inter-chassis control link
messaging.
2 Failure of Primary DSC node and the Standby peer RSP - There are multiple cases where this scenario
can occur, such as power-cycle of the Primary DSC rack or simultaneous soft reset of both RSP cards
within the Primary rack.
a The remote rack failure is initially detected by UDLD failure on the inter rack control link. The Backup
DSC node checks the UDLD state on the inter rack data link. If the rack failure is confirmed by failure
of the data link as well, then the Backup DSC node becomes active.
b UDLD failure detection occurs every 500ms but the time between control link and data link failure
can vary since these are independent failures detected by the RSP and line cards. A windowing period
of up to 2 seconds is needed to correlate the control and data link failures and to allow split brain
detection messages to be received. The keep-alive messaging between RSPs acts as a redundant detection
mechanism, if the UDLD detection fails to detect a reset RSP card.
3 Failure of Inter Rack Control links (Split Brain) - This failure is initially detected by the UDLD protocol
on the Inter Rack Control links. In this case, the Backup DSC continues to receive UDLD and keep-alive
messages through the inter rack data link. As discussed in Scenario 2, a windowing period of two
seconds is allowed to synchronize between the control and data link failures. If the data link has not failed,
or Split Brain packets are received across the Management LAN, then the Backup DSC rack reloads to
avoid the split brain condition.
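
The decision the Backup DSC makes in Scenarios 2 and 3 can be summarized as a small truth table. The Python sketch below is only an illustrative model of that logic as this section describes it, not Cisco's implementation: the names WindowObservation, Action, and backup_dsc_action are hypothetical, and the sketch assumes that all observations have already been collected over the 2-second windowing period.

```python
from dataclasses import dataclass
from enum import Enum

# Timer values quoted in this section, in milliseconds.
FPGA_BROADCAST_MS = 200        # periodic FPGA state broadcast
UDLD_DETECT_MS = 500           # UDLD failure detection interval
KEEPALIVE_DETECT_MS = 10_000   # keep-alive failure detection time
SPLIT_BRAIN_PROBE_MS = 200     # split brain message interval
CORRELATION_WINDOW_MS = 2_000  # window to correlate control/data link failures


class Action(Enum):
    NO_CHANGE = "no change"
    BECOME_ACTIVE = "Backup DSC becomes active"
    RELOAD = "Backup DSC rack reloads"


@dataclass
class WindowObservation:
    """Hypothetical summary of what the Backup DSC saw during the window."""
    control_link_failed: bool
    data_link_failed: bool
    split_brain_packets_on_mgmt_lan: bool = False


def backup_dsc_action(obs: WindowObservation) -> Action:
    """Model the Backup DSC decision described in Scenarios 2 and 3."""
    if not obs.control_link_failed:
        # No UDLD failure on the inter rack control link: nothing to decide.
        return Action.NO_CHANGE
    if not obs.data_link_failed or obs.split_brain_packets_on_mgmt_lan:
        # The Primary rack is still reachable, so this is a split brain
        # risk (Scenario 3): the Backup DSC rack reloads.
        return Action.RELOAD
    # Control and data links both failed and no split brain packets were
    # received: the rack failure is confirmed (Scenario 2).
    return Action.BECOME_ACTIVE


if __name__ == "__main__":
    print(backup_dsc_action(WindowObservation(True, True)).value)
    print(backup_dsc_action(WindowObservation(True, False)).value)
```

In this model, the Backup DSC takes over only when both the control link and the data link have failed and no Split Brain packets arrive over the Management LAN; every other outcome after a control link failure leads to a reload of the Backup DSC rack.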
Cisco ASR 9000 Series Aggregation Services Router nV System Configuration Guide, Release 5.3.x