Troubleshooting - Red Hat ENTERPRISE LINUX 5 Configuration Manual

Fence devices

Chapter 4.

Troubleshooting

The following is a list of some problems you may see regarding the configuration of fence devices as
well as some suggestions for how to address these problems.
• If your system does not fence a node automatically, you can try to fence the node from the
command line using the fence_node command, as described at the end of each of the fencing
configuration procedures. The fence_node command performs I/O fencing on a single node by reading
the fencing settings from the cluster.conf file for the given node and then running the
configured fencing agent against the node. For example, the following command fences node
clusternode1.example.com:
# /sbin/fence_node clusternode1.example.com
If the fence_node command is unsuccessful, you may have made an error in defining the fence
device configuration. To determine whether the fencing agent itself is able to talk to the fencing
device, you can execute the I/O fencing command for your fence device directly from the command
line. As a first step, you can execute the fencing agent with the -o status option specified. For example,
if you are using an APC switch as a fencing device, you can execute a command such as the following:
# /sbin/fence_apc -a (ipaddress) -l (login) ... -o status -v
You can also use the I/O fencing command for your device to fence the node. For example, for an
HP ILO device, you can issue the following command:
# /sbin/fence_ilo -a myilo -l login -p passwd -o off -v
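The sequence described in the item above (try fence_node, then query the fencing agent directly) can be sketched as a small shell helper. This is only an illustration: the function name diagnose_fence, the FENCE_NODE override variable, and the return codes are assumptions made for this sketch and are not part of the cluster software. Note that a successful fence_node call really does fence the node.

```shell
#!/bin/sh
# Hypothetical helper, not part of the cluster software.
# FENCE_NODE may be overridden (assumption of this sketch) for dry runs.
FENCE_NODE="${FENCE_NODE:-/sbin/fence_node}"

# Usage: diagnose_fence NODE AGENT_STATUS_COMMAND [ARGS...]
# Return codes (chosen for this sketch):
#   0  fence_node succeeded
#   1  fence_node failed, but the agent status command succeeded
#   2  fence_node failed and the agent status command failed as well
diagnose_fence() {
    node="$1"
    shift

    # Step 1: fence through cluster.conf, as the cluster itself would.
    if "$FENCE_NODE" "$node"; then
        echo "fence_node succeeded for $node"
        return 0
    fi
    echo "fence_node failed for $node; querying the fencing agent directly" >&2

    # Step 2: run the agent's own status command, supplied by the caller.
    if "$@"; then
        echo "agent reached the device; review the fence device definition in cluster.conf" >&2
        return 1
    fi
    echo "agent could not reach the device; check address, login, and connectivity" >&2
    return 2
}
```

The helper would be called with the node name followed by the status command for your own agent, for example the fence_apc command shown above with your address and login filled in. A return code of 1 points toward the fence device definition in cluster.conf, while 2 suggests the agent itself cannot talk to the device.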
• Check the version of firmware you are using in your fence device. You may want to consider
upgrading your firmware. You may also want to search Bugzilla to see if there are any known issues
affecting your firmware version.
• If a node in your cluster is repeatedly getting fenced, it means that one of the nodes in your cluster
is not seeing enough "heartbeat" network messages from the node that is getting fenced. Most of
the time, this is a result of flaky or faulty hardware, such as bad cables or bad ports on the network
hub or switch. Test your communications paths thoroughly without the cluster software running to
make sure your hardware is working correctly.
• If a node in your cluster is repeatedly getting fenced right at startup, it may be due to system
activities that occur when a node joins a cluster. If your network is busy, your cluster may decide
it is not getting enough heartbeat packets. To address this, you may have to increase the
post_join_delay setting in your cluster.conf file. This delay is basically a grace period to
give the node more time to join the cluster.
In the following example, the fence_daemon entry in the cluster configuration file shows a
post_join_delay setting that has been increased to 600 seconds.
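A minimal sketch of such an entry is shown here. Only the post_join_delay value comes from the text above; the post_fail_delay attribute and its value are included as an illustrative assumption, and the rest of your cluster.conf file will differ.

```xml
<!-- Illustrative fragment of cluster.conf; only post_join_delay="600" is taken from the text. -->
<!-- post_fail_delay="0" is an assumed value shown for context. -->
<fence_daemon post_fail_delay="0" post_join_delay="600"/>
```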