Changes to BGP Multipath; Delayed Installation of ECMP Routes into BGP; RDMA over Converged Ethernet (RoCE) Overview - Dell S4048-ON Configuration Manual

Changes to BGP Multipath

The BGP multipath and ECMP behavior changes when the system becomes active after a fast-boot restart. The system delays computing and installing additional paths to a destination into the BGP routing information base (RIB) and the forwarding table for a period of time. Additional paths, if any, are computed and installed automatically, without manual intervention, under any of the following conditions:
• 30 seconds after the system returns online following a restart
• After all established peers have synchronized with the restarting system
• A combination of the previous two conditions
One possible impact of this behavior change: if the traffic to a destination exceeds the volume that one path can carry, a portion of that traffic might be dropped for a short duration (30-60 seconds) after the system comes up.
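
The trigger logic can be pictured with the following Python sketch. It is a minimal model of the description above, assuming a simple peer table with a synchronized flag; it is not Dell Networking OS source code.

    import time

    # Illustrative model of when the restarting system computes and
    # installs additional ECMP paths. The 30-second figure comes from
    # the text above; the peer-table layout is an assumption.

    ECMP_DELAY_SECONDS = 30

    def ready_for_additional_paths(boot_time, peers):
        """True once either trigger fires: the 30-second post-restart
        timer has expired, or every established peer has synchronized
        with the restarting system."""
        timer_expired = time.monotonic() - boot_time >= ECMP_DELAY_SECONDS
        peers_synced = bool(peers) and all(p["synchronized"] for p in peers)
        return timer_expired or peers_synced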

Delayed Installation of ECMP Routes Into BGP

The current FIB component of Dell Networking OS has some inherent inefficiencies in handling a large number of ECMP routes (routes with multiple equal-cost next hops). To work around this when fast boot is configured, BGP delays the installation of ECMP routes; it does so only if the system comes up through a fast-boot reload. The BGP route selection algorithm selects only one best path to each destination and delays installing additional ECMP paths until at least 30 seconds have elapsed from the time the first BGP peer is established. Once this time has elapsed, all routes in the BGP RIB are processed for additional paths.
While this change ensures that at least one path to each destination enters the FIB as quickly as possible, it prevents additional paths from being used even when they are available. This trade-off is considered acceptable.
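
A minimal Python sketch of this selection behavior follows. The RIB layout, cost field, and function names are illustrative assumptions, not OS9 internals: one best path per prefix is installed immediately, and the remaining equal-cost paths are added only after the 30-second window has passed.

    # Hedged sketch of the delayed-ECMP behavior described above.

    ECMP_DELAY_SECONDS = 30

    def path_cost(path):
        # stand-in for the metric used to judge "equal cost"
        return path["cost"]

    def install_routes(bgp_rib, fib, first_peer_established_at, now):
        """Install one best path per prefix right away; add the other
        equal-cost paths only once 30 seconds have elapsed since the
        first BGP peer was established."""
        ecmp_allowed = (first_peer_established_at is not None
                        and now - first_peer_established_at >= ECMP_DELAY_SECONDS)
        for prefix, paths in bgp_rib.items():
            best = min(paths, key=path_cost)      # best-path selection
            selected = [best]
            if ecmp_allowed:
                selected += [p for p in paths
                             if p is not best and path_cost(p) == path_cost(best)]
            fib[prefix] = selected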

RDMA Over Converged Ethernet (RoCE) Overview

This functionality is supported on the S4048-ON platform.
RDMA is a technology that lets a virtual machine (VM) transfer data directly into the memory of another VM, which enables VMs to connect to storage networks. With RoCE, RDMA forwards data without passing it through the CPU and the main memory path of TCP/IP. In a deployment where the RoCE network and the normal IP network run as two separate networks, routable RoCE (RRoCE) combines them and sends the RoCE frames over the IP network. This method of transmission encapsulates RoCE packets in IP packets; in effect, RRoCE sends InfiniBand (IB) packets over IP. IB supports input and output connectivity for the internet infrastructure, enables network topologies to span large geographical boundaries, and supports the creation of next-generation I/O interconnect standards in servers.
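
The encapsulation layering can be sketched with the scapy packet library (an illustrative choice; the manual does not reference it). RRoCE corresponds to RoCE v2, which carries the InfiniBand transport payload inside UDP/IP using the IANA-assigned UDP destination port 4791; the addresses and payload bytes below are placeholders.

    from scapy.all import Ether, IP, UDP, Raw

    # Placeholder for the InfiniBand Base Transport Header and payload
    ib_payload = b"\x00" * 12

    pkt = (Ether()
           / IP(src="10.0.0.1", dst="10.0.0.2", tos=0)  # DSCP left untagged
           / UDP(sport=49152, dport=4791)               # RoCE v2 UDP port
           / Raw(load=ib_payload))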
When a storage area network (SAN) is connected over an IP network, the following conditions must be satisfied:
• Faster connectivity: QoS for RRoCE enables faster, lossless disk input and output services.
• Lossless connectivity: VMs require connectivity to the storage network to be lossless at all times. When a planned upgrade of network nodes occurs, especially with top-of-rack (ToR) nodes that are a single point of failure for the VMs, disk I/O operations are expected to resume within 20 seconds. If the disk is not accessible within 20 seconds, the VMs behave in unexpected and undefined ways. You can optimize the boot time of ToR nodes that represent a single point of failure to reduce the outage in traffic-handling operations.
RRoCE traffic is bursty and can consume the entire 10-Gigabit Ethernet interface. Although RRoCE and normal data traffic are usually propagated in separate portions of the network, some topologies require combining the RRoCE and data traffic in a single network structure. RRoCE traffic is marked with dot1p priorities 3 and 4 (code points 011 and 100, respectively), and these queues are strict-priority and lossless. DSCP code points are not tagged for RRoCE. Both ECN and PFC are enabled for RRoCE traffic. For normal IP or data traffic that is not RRoCE-
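
The marking and queue treatment just described can be summarized in a short Python sketch; the dictionary fields and the weighted fallback scheduler for non-RRoCE traffic are illustrative assumptions, not switch firmware behavior.

    # Queue treatment per dot1p priority, per the description above.

    RROCE_PRIORITIES = {3, 4}  # dot1p code points 011 and 100

    def queue_profile(dot1p):
        if dot1p in RROCE_PRIORITIES:
            # RRoCE queues: strict-priority, lossless, PFC and ECN enabled
            return {"scheduling": "strict", "lossless": True,
                    "pfc": True, "ecn": True}
        # normal IP/data traffic: best-effort, lossy (assumed scheduler)
        return {"scheduling": "weighted", "lossless": False,
                "pfc": False, "ecn": False}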