High availability overview
Communication interruptions can seriously affect widely deployed value-added services such as IPTV
and video conferencing. Therefore, the basic network infrastructure must be able to provide high
availability.
The following are effective ways to improve availability:
• Increasing fault tolerance
• Speeding up fault recovery
• Reducing the impact of faults on services
Availability requirements
Availability requirements fall into three levels based on purpose and implementation.
Table 1 Availability requirements
Level | Requirement | Solution
1 | Decrease system software and hardware faults | Hardware: simplifying circuit design, enhancing production techniques, and performing reliability tests. Software: reliability design and testing.
2 | Protect system functions from being affected if faults occur | Device and link redundancy and deployment of switchover strategies
3 | Enable the system to recover as fast as possible | Fault detection, diagnosis, isolation, and recovery technologies
The level 1 availability requirement should be considered during the design and production process of
network devices. Level 2 should be considered during network design. Level 3 should be considered
during network deployment, according to the network infrastructure and service characteristics.
Availability evaluation
Mean Time Between Failures (MTBF) and Mean Time to Repair (MTTR) are used to evaluate the
availability of a network.
MTBF
MTBF is the predicted elapsed time between inherent failures of a system during operation. It is typically
measured in hours. A higher MTBF means higher availability.
MTTR
MTTR is the average time required to repair a failed system. MTTR in a broad sense also involves spare
parts management and customer services.
MTTR = fault detection time + hardware replacement time + system initialization time + link recovery time
+ routing time + forwarding recovery time
A smaller value for each item means a smaller MTTR and higher availability.
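The MTTR sum described in the availability evaluation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original guide. The class name RepairTimes, the function availability, and all sample values are hypothetical. The ratio MTBF / (MTBF + MTTR) is the commonly used steady-state availability formula; the text above does not state it and it is assumed here only to show how the two metrics combine.

```python
from dataclasses import dataclass, fields


@dataclass
class RepairTimes:
    """Components of MTTR, all in hours (hypothetical container)."""
    fault_detection: float
    hardware_replacement: float
    system_initialization: float
    link_recovery: float
    routing: float
    forwarding_recovery: float

    def mttr(self) -> float:
        # MTTR is the sum of every component time listed above.
        return sum(getattr(self, f.name) for f in fields(self))


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Illustrative numbers only; real values depend on the device and network.
    times = RepairTimes(
        fault_detection=0.5,
        hardware_replacement=2.0,
        system_initialization=0.25,
        link_recovery=0.125,
        routing=0.0625,
        forwarding_recovery=0.0625,
    )
    mttr = times.mttr()
    print(f"MTTR: {mttr} hours")
    print(f"Availability: {availability(997.0, mttr):.4%}")
```

Under this assumed formula, reducing any single component (for example, faster fault detection) lowers MTTR and raises the computed availability, which matches the statement that a smaller value for each item means higher availability.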