Inaccessible Quorum Partitions; System Panic; Total Network Connection Failure - Red Hat CLUSTER MANAGER - INSTALLATION AND Administration Manual

B.3.2 System Panic
A system panic (crash) is a controlled response to a software-detected error. A panic attempts to return
the system to a consistent state by shutting down the system. If a cluster system panics, the following
occurs:
1. The functional cluster system detects that the cluster system that is experiencing the panic is not
updating its timestamp on the quorum partitions and is not communicating over the heartbeat
channels.
2. The cluster system that is experiencing the panic initiates a system shutdown and reboot.
3. If power switches are used, the functional cluster system power-cycles the cluster system that is
experiencing the panic.
4. The functional cluster system restarts any services that were running on the system that
experienced the panic.
5. When the system that experienced the panic reboots and can join the cluster (that is, the system can
write to both quorum partitions), services are re-balanced across the member systems, according
to each service's placement policy.
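
The detection and recovery steps above can be sketched as a small decision function. This is only an
illustrative model, not the Cluster Manager implementation: the PeerView class, the function names, the
timeout threshold, and the action strings are all hypothetical. It shows that the peer is treated as
failed only when both the quorum timestamp and the heartbeat are stale, and that the power-cycle step
applies only when power switches are configured.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PeerView:
    """What the functional system observes about its peer (hypothetical model)."""
    seconds_since_timestamp: float  # age of the peer's last quorum-partition timestamp
    seconds_since_heartbeat: float  # age of the last message on any heartbeat channel
    services: List[str] = field(default_factory=list)


def peer_has_failed(peer: PeerView, timeout: float) -> bool:
    # Step 1: both signals must be stale. `timeout` is an assumed threshold.
    return (peer.seconds_since_timestamp > timeout
            and peer.seconds_since_heartbeat > timeout)


def recovery_plan(peer: PeerView, timeout: float, has_power_switch: bool) -> List[str]:
    """Return the ordered actions the functional system would take."""
    if not peer_has_failed(peer, timeout):
        return []
    plan: List[str] = []
    if has_power_switch:
        plan.append("power-cycle peer")  # step 3, only with power switches
    # Step 4: take over every service the failed peer was running.
    plan.extend(f"restart {name}" for name in peer.services)
    return plan


if __name__ == "__main__":
    panicked = PeerView(30.0, 30.0, ["database", "web"])
    print(recovery_plan(panicked, timeout=10.0, has_power_switch=True))
```

A peer whose heartbeat is still fresh yields an empty plan in this sketch, because the text requires
both signals to be missing before the functional system takes over.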
B.3.3 Inaccessible Quorum Partitions
Inaccessible quorum partitions can be caused by the failure of a SCSI (or Fibre Channel) adapter that
is connected to the shared disk storage, or by a SCSI cable becoming disconnected from the shared disk
storage. If one of these conditions occurs, and the SCSI bus remains terminated, the cluster behaves
as follows:
1. The cluster system with the inaccessible quorum partitions notices that it cannot update its
timestamp on the quorum partitions and initiates a reboot.
2. If the cluster configuration includes power switches, the functional cluster system power-cycles
the rebooting system.
3. The functional cluster system restarts any services that were running on the system with the
inaccessible quorum partitions.
4. If the cluster system reboots and can join the cluster (that is, the system can write to both
quorum partitions), services are re-balanced across the member systems, according to each service's
placement policy.
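
Seen from the affected system, the behavior above amounts to a small state machine. The sketch below is
an assumption-laden illustration rather than the product's actual logic: the state names ("member",
"rebooting", "not_joined") and the next_state function are hypothetical. It encodes two rules from the
text: a member that cannot update its quorum timestamp reboots, and a rebooted system rejoins only if it
can write to both quorum partitions.

```python
def next_state(state: str, timestamp_update_ok: bool,
               first_writable: bool, second_writable: bool) -> str:
    """Advance the affected system by one step (hypothetical state names)."""
    if state == "member":
        # Step 1: a failed timestamp update triggers a reboot.
        return "member" if timestamp_update_ok else "rebooting"
    if state in ("rebooting", "not_joined"):
        # Step 4: joining requires write access to both quorum partitions.
        can_join = first_writable and second_writable
        return "member" if can_join else "not_joined"
    raise ValueError(f"unknown state: {state!r}")


if __name__ == "__main__":
    state = "member"
    # The adapter fails: the timestamp update no longer succeeds.
    state = next_state(state, False, False, False)
    print(state)
    # After the reboot only one partition is writable, so the system stays out.
    state = next_state(state, True, True, False)
    print(state)
```

Keeping the join check separate from the timestamp check mirrors the text, which describes them as
distinct conditions at different points in the sequence.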
B.3.4 Total Network Connection Failure
A total network connection failure occurs when all the heartbeat network connections between the
systems fail. This can be caused by one of the following:
Appendix B: Supplementary Software Information