Preventing Split-Brain Conditions - Oracle ZFS Storage Appliance Administration Manual

"Shutting Down a Clustered Configuration (CLI)" on page 199

Preventing Split-Brain Conditions

A common failure mode in clustered systems is known as split-brain; in this condition, each of
the clustered controllers believes its peer has failed and attempts takeover. Absent additional
logic, this condition can cause a broad spectrum of unexpected and destructive behavior that
can be difficult to diagnose or correct. The canonical trigger for this condition is the failure of
the communication medium shared by the controllers; in the case of the Oracle ZFS Storage
Appliance, this would occur if the cluster I/O links fail. In addition to the built-in triple-link
redundancy (only a single link is required to avoid triggering takeover), the appliance software
will also perform an arbitration procedure to determine which controller should continue with
takeover.
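
The link-redundancy rule above can be illustrated with a minimal sketch. The function name and the link labels are hypothetical and are not part of the appliance software; the sketch only encodes the stated rule that a single working cluster I/O link is enough to avoid triggering takeover.

```python
def peer_unreachable(link_up):
    """Return True only when every cluster I/O link is down.

    link_up maps a (hypothetical) link label to True if that link works.
    One working link is enough to keep the peer reachable.
    """
    return not any(link_up.values())


# Illustrative link labels; the real link names may differ.
one_link_left = {"link0": False, "link1": False, "link2": True}
all_links_down = {"link0": False, "link1": False, "link2": False}

takeover_needed_one_left = peer_unreachable(one_link_left)
takeover_needed_all_down = peer_unreachable(all_links_down)
```

Only the second case would lead a controller to attempt takeover and enter arbitration.
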
A number of arbitration mechanisms are employed by similar products; typically they entail
the use of quorum disks (using SCSI reservations) or quorum servers. To support the use
of ATA disks without the need for additional hardware, the Oracle ZFS Storage Appliance
uses a different approach relying on the storage fabric itself to provide the required mutual
exclusivity. The arbitration process consists of attempting to perform a SAS ZONE LOCK
command on each of the visible SAS expanders in the storage fabric, in a predefined order.
Whichever appliance is successful in its attempts to obtain all such locks will proceed with
takeover; the other will reset itself. Because a clustered appliance that boots and detects that its
peer is unreachable will attempt takeover and enter the same arbitration process, the controller that
lost arbitration will reset in a continuous loop until at least one cluster I/O link is restored. This
ensures that the subsequent failure of the other controller will not result in an extended outage.
These SAS zone locks are released when failback is performed or when approximately 10 seconds
have elapsed since the
controller in the AKCS_OWNER state most recently renewed its own access to the storage
fabric.
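
The arbitration procedure described above can be sketched as a small simulation. This is a toy model under stated assumptions, not the appliance's implementation: the Expander class, the arbitrate function, and the use of alphabetical name order as the "predefined order" are all hypothetical, and lock expiry after roughly 10 seconds is not modeled.

```python
class Expander:
    """Toy stand-in for a SAS expander that grants its zone lock to one holder."""

    def __init__(self, name):
        self.name = name
        self.holder = None  # controller currently holding the lock, if any

    def try_lock(self, controller):
        """Grant the lock if it is free or already held by this controller."""
        if self.holder is None or self.holder == controller:
            self.holder = controller
            return True
        return False


def arbitrate(controller, visible_expanders):
    """Attempt to lock every visible expander in a fixed order.

    Returns "takeover" if all locks were obtained, otherwise "reset".
    Sorting by name is an assumed stand-in for the predefined order.
    """
    for expander in sorted(visible_expanders, key=lambda e: e.name):
        if not expander.try_lock(controller):
            return "reset"
    return "takeover"


# Both controllers see the same two expanders; A happens to arbitrate first.
fabric = [Expander("exp0"), Expander("exp1")]
result_a = arbitrate("A", fabric)
result_b = arbitrate("B", fabric)
```

Because both controllers request the locks in the same order, the second controller fails on the first expander it tries and resets without holding any lock.
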
This arbitration mechanism is simple, inexpensive, and requires no additional hardware, but
it relies on the clustered appliances both having access to at least one common SAS expander
in the storage fabric. Under normal conditions, each appliance has access to all expanders,
and arbitration will consist of taking at least two SAS zone locks. It is possible, however, to
construct multiple-failure scenarios in which the appliances do not have access to any common
expander. For example, if two of the SAS cables are removed or a disk shelf is powered down,
the two appliances will have access to disjoint subsets of expanders. In this case, each appliance
will successfully lock all reachable expanders, conclude that its peer has failed, and attempt to
proceed with takeover. This can cause unrecoverable hangs due to disk affiliation conflicts and/
or severe data corruption.
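
The failure case above can be reproduced in a toy model. This is a sketch under assumptions, not the appliance's code: the function name, the controller labels "A" and "B", and the expander names are hypothetical, and sorted name order stands in for the predefined lock order.

```python
def arbitration_outcomes(visible_a, visible_b):
    """Simulate controllers A and B locking the expanders each can reach.

    visible_a and visible_b are sets of (hypothetical) expander names.
    Returns a tuple (outcome_a, outcome_b) of "takeover" or "reset".
    """
    locks = {}  # expander name -> controller holding its zone lock

    def run(controller, visible):
        for name in sorted(visible):
            # setdefault takes the lock if free, else reports the current holder
            if locks.setdefault(name, controller) != controller:
                return "reset"
        return "takeover"

    return run("A", visible_a), run("B", visible_b)


# Normal case: both controllers reach every expander.
shared = arbitration_outcomes({"exp0", "exp1"}, {"exp0", "exp1"})

# One common expander is still enough to pick a single winner.
partial = arbitration_outcomes({"exp0", "exp1"}, {"exp1", "exp2"})

# Multiple-failure case: no common expander, so both controllers "win".
disjoint = arbitration_outcomes({"exp0"}, {"exp1"})
```

In the disjoint case neither controller ever meets a lock held by its peer, which is why both conclude that they may proceed with takeover.
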
Note that while the consequences of this condition are severe, it can arise only in the case
of multiple failures (often only in the case of 4 or more failures). The clustering solution