Node Failure; Disk Failure; Data Divergence; Geomirror State Map Devices - IBM TotalStorage NAS Gateway 500 Administrator's Manual

state map devices so that data at the rejoining site is brought up to date. If writes
occurred on both sites during the site isolation situation, you must unify the state
maps on the GeoMirror devices to correct any data divergence that occurred. Then
you bring up the nodes at the rejoining site.

Node failure

When a node at a site with more than one node fails, messages are logged to the
system administrator. You should periodically check the logs as part of normal
maintenance procedures. You can also have messages sent to the console.
Clustering handles recovery from failures of local nodes, networks, and adapters.
No GeoMirror synchronization is needed after the failure of a local peer, as long as
the clustering failover succeeds or the configuration is concurrent. IP address
takeover is not supported on Remote Mirroring networks.

Disk failure

You can use the gmd_update_state utility to recover from failures such as the loss
of a hard drive that is not mirrored in the logical volume. Use this utility, for
example, if a hard disk is replaced at a site. You can mark all the cells associated
with the GeoMirror device logical volumes on the new hard disk as stale. This
causes the synchronization process to update the device from the peer device at
the other site. We suggest that you implement LVM mirroring or use RAID
techniques. Some I/O errors that occur when writing to GMDs might cause
inconsistencies that are not reflected in the GMD state map. Unrecoverable disk I/O
errors are recorded in the AIX® error log. That log should be monitored to allow for
controlled recovery in case these disk errors occur.
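
The effect of marking cells stale can be pictured with a small conceptual model. The following Python sketch is illustrative only: it does not show the gmd_update_state interface, and the names SiteCopy, mark_stale, and synchronize are hypothetical. It also assumes that the stale flags are kept with the copy that is being refreshed, which the text above does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SiteCopy:
    """Hypothetical model of one site's copy of a GeoMirror device."""
    blocks: list[bytes]                            # data, one entry per cell
    stale: set[int] = field(default_factory=set)   # cells that need refreshing


def mark_stale(copy: SiteCopy, cells: range) -> None:
    """Flag every cell in `cells` (for example, those on a replaced disk) as stale."""
    copy.stale.update(cells)


def synchronize(target: SiteCopy, peer: SiteCopy) -> int:
    """Refresh each stale cell on `target` from `peer`; return the number copied."""
    copied = 0
    for cell in sorted(target.stale):
        target.blocks[cell] = peer.blocks[cell]
        copied += 1
    target.stale.clear()
    return copied


# Assumed scenario: cells 1 and 2 lived on the disk that was replaced.
local = SiteCopy(blocks=[b"a", b"", b"", b"d"])
remote = SiteCopy(blocks=[b"a", b"b", b"c", b"d"])
mark_stale(local, range(1, 3))
refreshed = synchronize(local, remote)
```

In this model only the flagged cells are copied, which is why marking just the cells on the new disk limits the amount of data that has to cross between sites.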

Data divergence

Attention: Follow all instructions in this section with extreme care. If you do not
follow instructions exactly, loss of data is very likely.

When the state maps for a GeoMirror device have cells that are marked stale in
both sites, the GMDs cannot be started because the clustering manager cannot
determine which data is most recently written. This data divergence occurs when
the sites are not communicating, and information is written to the volumes at both
sites without being mirrored to the other site. To recover the mirrored data,
the state maps must be unified. The following procedures explain the key points of
the state maps and the unifying process.
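
The divergence condition described above can be expressed as a simple check. This Python sketch is a conceptual model under stated assumptions, not part of the Remote Mirroring software: each state map is reduced to a set of stale cell numbers, and the function name and the three result labels are hypothetical.

```python
from __future__ import annotations


def classify(site_a_stale: set[int], site_b_stale: set[int]) -> str:
    """Classify a GeoMirror device from the stale cells in each site's state map."""
    if site_a_stale and site_b_stale:
        # Stale cells at both sites: neither copy can be taken as the newest,
        # so the state maps must be unified before the device can start.
        return "diverged"
    if site_a_stale or site_b_stale:
        # Stale cells at one site only: ordinary synchronization applies.
        return "needs-sync"
    return "in-sync"


# Assumed scenario: both sites accepted writes while they were isolated.
status = classify({3, 4}, {9})
```

Note that the check does not require the same cells to be stale at both sites. Following the description above, any stale cells in both state maps are enough to prevent the GMDs from starting.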

GeoMirror state map devices

The GeoMirror state map device is a key component in the process of recovery
after various types of failures. Communication failures, power failures, and other
short-term failures are considered site failures. Depending on the application, site
failures can cause the surviving copy of the data to continue to change while the
mirror copy is not available. When a site fails and recovers, the Remote Mirroring
software synchronizes the GeoMirror device. It reads the appropriate state map for
each node in order to reconstruct and update the mirror on the recovered node.
This process of synchronizing the GeoMirror device is automatic. The intervention
of the system administrator might be required in a failure such as the loss of a hard
drive that is not mirrored in the logical volume.

Note: Be extremely careful if you modify a state map. Each node in the remote
mirror configuration maintains, or shares, a state map for each of its
Appendix C. Remote Mirroring problem determination