Chapter 2. Hardware Installation and Operating System Configuration
2.5.2. Configuring a Fence Device
Fence devices enable a node to power-cycle another node before restarting its services as
part of the failover process. The ability to remotely disable a node ensures data integrity is
maintained under any failure condition. Deploying a cluster in a production environment
requires the use of a fence device. Only development (test) environments should use a
configuration without a fence device. Refer to Section 2.1.2 Choosing the Type of Fence
Device for a description of the various types of power switches.
In a cluster configuration that uses fence devices such as power switches, each node is
connected to a switch through either a serial port (for two-node clusters) or network
connection (for multi-node clusters). When failover occurs, a node can use this connection to
power-cycle another node before restarting its services.
Fence devices protect against data corruption if an unresponsive (or hanging) node
becomes responsive after its services have failed over, and issues I/O to a disk that is also
receiving I/O from another node. In addition, if CMAN detects node failure, the failed
node will be removed from the cluster. If a fence device is not used in the cluster, then
a failed node may result in cluster services being run on more than one node, which can
cause data corruption and possibly system crashes.
A node may appear to hang for a few seconds if it is swapping or has a high system
workload. For this reason, adequate time is allowed prior to concluding that a node has
failed.
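The grace period described above can be sketched as follows. This is an illustration only, not the cluster software's actual logic: the function name, the heartbeat timestamps, and the `grace_seconds` parameter are all hypothetical.

```python
def is_node_failed(last_heartbeat: float, now: float, grace_seconds: float) -> bool:
    """Hypothetical check: treat a node as failed only after it has been
    silent for longer than the grace period, so that a node which is merely
    swapping or heavily loaded is not declared dead too early."""
    return (now - last_heartbeat) > grace_seconds


# A node silent for 5 seconds with a 10-second grace period is still
# considered alive; after 11 seconds of silence it is considered failed.
still_alive = not is_node_failed(last_heartbeat=100.0, now=105.0, grace_seconds=10.0)
declared_failed = is_node_failed(last_heartbeat=100.0, now=111.0, grace_seconds=10.0)
```

A longer grace period reduces false failure detections at the cost of a slower failover.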
If a node fails, and a fence device is used in the cluster, the fencing daemon power-cycles
the hung node before restarting its services. This causes the hung node to reboot in a clean
state and prevents it from issuing I/O and corrupting cluster service data.
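The ordering described above (fence first, then restart services) can be sketched as follows. This is a minimal illustration, not the fencing daemon's implementation: `recover_services`, `fence`, and `restart` are hypothetical names, and refusing to restart services when fencing cannot be confirmed is an assumption made for this sketch.

```python
from typing import Callable, List


def recover_services(failed_node: str,
                     fence: Callable[[str], bool],
                     restart: Callable[[str], None],
                     services: List[str]) -> bool:
    """Hypothetical failover step: power-cycle the failed node, and only
    then restart its services on the surviving node."""
    # Fence first. If fencing cannot be confirmed, do not start the services
    # (an assumption of this sketch): the failed node might still write to
    # shared storage.
    if not fence(failed_node):
        return False
    # The failed node is now known to be rebooting in a clean state, so the
    # services can be restarted safely.
    for service in services:
        restart(service)
    return True
```

The point of the ordering is that no service is started on a second node while the first node could still issue I/O.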
When used, fence devices must be set up according to the vendor instructions; however,
some cluster-specific tasks may be required to use them in a cluster. Consult the
manufacturer documentation on configuring the fence device. Note that the cluster-specific
information provided in this manual supersedes the vendor information.
When cabling a physical fence device such as a power switch, take special care to
ensure that each cable is plugged into the appropriate port and configured correctly. This is
crucial because there is no independent means for the software to verify correct cabling.
Failure to cable correctly can lead to an incorrect node being power cycled or fenced off from
shared storage via fabric-level fencing, or to a node inappropriately concluding that it has
successfully power cycled a failed node.
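Because the software cannot verify the physical cabling, it can help to at least check the recorded node-to-port assignments for obvious mistakes before relying on them. The sketch below is a hypothetical helper, not part of the cluster software: the `port_map` structure and the switch and node names are assumptions. It catches only duplicate assignments in the record and cannot detect a cable plugged into the wrong port.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (switch name, outlet or port number); both values are illustrative.
Outlet = Tuple[str, int]


def find_port_conflicts(port_map: Dict[str, Outlet]) -> Dict[Outlet, List[str]]:
    """Return every outlet that is recorded for more than one node."""
    nodes_by_outlet: Dict[Outlet, List[str]] = defaultdict(list)
    for node, outlet in port_map.items():
        nodes_by_outlet[outlet].append(node)
    # Keep only the outlets shared by two or more nodes.
    return {outlet: sorted(nodes)
            for outlet, nodes in nodes_by_outlet.items()
            if len(nodes) > 1}


# Hypothetical record in which node1 and node2 were both written down
# against outlet 1 of the same switch.
conflicts = find_port_conflicts({
    "node1": ("switch-a", 1),
    "node2": ("switch-a", 1),
    "node3": ("switch-a", 2),
})
```

An empty result means only that the record is internally consistent; the cabling itself still has to be checked by hand.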
2.5.3. Configuring UPS Systems
Uninterruptible power supplies (UPS) provide a highly available source of power. Ideally, a
redundant solution should be used that incorporates multiple UPS systems (one per server).
For maximal fault tolerance, it is possible to incorporate two UPS systems per server as
well as APC Automatic Transfer Switches to manage the power and shutdown of the
server. The choice between these solutions depends solely on the level of availability desired.