What To Check After Running The System Recovery - IBM Storwize V7000 Unified Problem Determination Manual

3. Refer to "What to check after running the system recovery" for what to do
with volumes that have been corrupted by the loss of data from the
write-cache.
4. Recreate all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.

What to check after running the system recovery

Several tasks must be completed before you use the system.
The recovery procedure re-creates the old system from the quorum data. However,
some things cannot be restored, such as cached data or the system data that manages
in-flight I/O. The loss of this in-flight state affects RAID arrays that manage internal
storage. The detailed map of where data is out of synchronization has been
lost, so all parity information must be restored and mirrored pairs must
be brought back into synchronization. Normally this results in old or stale
data being used, so only writes that were in flight are affected. However, if the array had
already lost redundancy (for example, it was syncing, or its RAID status was degraded
or critical) before the error that required system recovery, the situation is more
severe. In this situation, check the internal storage:
• Parity arrays will likely be syncing to restore parity; they do not have
redundancy while this operation proceeds.
• Because there is no redundancy in this process, bad blocks might have been
created where data is not accessible.
• Parity arrays could be marked as corrupt. This indicates that the extent of lost
data is wider than in-flight I/O, and in order to bring the array online, the data
loss must be acknowledged.
• RAID-6 arrays that were actually degraded prior to the system recovery might
require a full restore from backup. For this reason, it is important to have at
least a capacity match spare available.
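The internal-storage checks above can be partly scripted. The following is a minimal sketch, not an IBM-supplied tool. It assumes that you have captured a colon-delimited array listing (for example, from the `lsarray -delim :` CLI command) into a string, and that the listing has a header row with `mdisk_name` and `raid_status` columns. The function name `arrays_needing_attention` and the sample rows are hypothetical; adjust the column names to match the output of your system.

```python
import csv
import io


def arrays_needing_attention(listing: str) -> dict:
    """Return {array name: raid_status} for every array that is not 'online'.

    ASSUMPTION: `listing` is colon-delimited text whose header row includes
    'mdisk_name' and 'raid_status' columns.
    """
    reader = csv.DictReader(io.StringIO(listing.strip()), delimiter=":")
    flagged = {}
    for row in reader:
        status = (row.get("raid_status") or "").strip()
        # Anything other than 'online' (syncing, degraded, and so on)
        # means the array should be looked at before the system is used.
        if status != "online":
            flagged[row["mdisk_name"]] = status
    return flagged


# Illustrative sample only; this is not output from a real system.
SAMPLE = """\
mdisk_id:mdisk_name:status:raid_status:raid_level
0:mdisk0:online:online:raid5
1:mdisk1:online:syncing:raid5
2:mdisk2:degraded:degraded:raid6
"""

if __name__ == "__main__":
    for name, status in arrays_needing_attention(SAMPLE).items():
        print(f"{name}: {status}")
```

A syncing array reported by such a check has no redundancy until the sync completes, so it is worth re-running the check until every array reports online.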
Be aware of these differences regarding the recovered configuration:
• FlashCopy mappings are restored as "idle_or_copied" with 0% progress. Both
volumes must have been restored to their original I/O groups.
• The management ID is different. Any scripts or associated programs that refer to
the system-management ID of the clustered system (system) must be changed.
• Any FlashCopy mappings that were not in the "idle_or_copied" state with 100%
progress at the point of disaster have inconsistent data on their target disks.
These mappings must be restarted.
• Intersystem remote copy partnerships and relationships are not restored and
must be re-created manually.
• Consistency groups are not restored and must be re-created manually.
• Intrasystem remote copy relationships are restored if all dependencies were
successfully restored to their original I/O groups.
• The system time zone might not have been restored.
• The GPFS system quorum state held on the control enclosure might not have
been restored.
• Any Global Mirror secondary volumes on the recovered system might have
inconsistent data if there was replication I/O from the primary volume cached
on the secondary system at the point of the disaster. A full synchronization is
required when re-creating and restarting these remote copy relationships.
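Because every FlashCopy mapping comes back as "idle_or_copied" with 0% progress, the recovered system cannot tell you which mappings were incomplete at the point of disaster. The following is a minimal sketch of one way to work that out, assuming you kept a colon-delimited mapping listing from before the disaster (for example, saved output of `lsfcmap -delim :`) with `name`, `status`, and `progress` columns. The function name `mappings_to_restart`, the column names, and the sample rows are assumptions for illustration only.

```python
import csv
import io


def mappings_to_restart(saved_listing: str) -> list:
    """Return names of mappings that were NOT idle_or_copied at 100% progress.

    ASSUMPTION: `saved_listing` is a pre-disaster, colon-delimited listing
    whose header row includes 'name', 'status', and 'progress' columns.
    """
    reader = csv.DictReader(io.StringIO(saved_listing.strip()), delimiter=":")
    names = []
    for row in reader:
        complete = (
            row["status"].strip() == "idle_or_copied"
            and row["progress"].strip() == "100"
        )
        # Incomplete mappings have inconsistent target data after recovery.
        if not complete:
            names.append(row["name"])
    return names


# Illustrative sample only; this is not output from a real system.
SAVED = """\
id:name:status:progress
0:fcmap0:idle_or_copied:100
1:fcmap1:copying:42
2:fcmap2:idle_or_copied:0
"""

if __name__ == "__main__":
    print(mappings_to_restart(SAVED))
```

If no pre-disaster listing exists, the conservative choice is to treat every mapping target as inconsistent and restart all of the mappings.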
