What To Check After Running The System Recovery - IBM Storwize V7000 Unified Problem Determination Manual


Example
Perform the following steps to recover an offline volume after the recovery
procedure has completed:
1. Delete all IBM FlashCopy function mappings and Metro Mirror or Global
Mirror relationships that use the offline volumes.
2. Run the recovervdisk or recovervdiskbysystem command. This only brings the
volume back online so that you can attempt to deal with the data loss.
Contact IBM Remote Technical Support to help you with recovering from file
volumes that have been corrupted by data lost from the write-cache. They
might ask you to refer to "Recovering a GPFS file system" on page 161 and
help you with interpreting the results from the chkfs CLI command.
3. Refer to "What to check after running the system recovery" for what to do
with volumes that have been corrupted by the loss of data from the
write-cache.
4. Recreate all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.
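
Step 1 requires knowing which mappings touch the offline volumes. The
following Python sketch shows one way to work that out from a saved listing.
It is an illustration only: the colon-delimited layout, the column names
(name, source_vdisk_name, target_vdisk_name), the sample rows, and the helper
names parse_delimited and mappings_using are all assumptions, not output or
tooling from a real system.

```python
def parse_delimited(text, delim=":"):
    """Turn a header line plus data lines into a list of dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split(delim)
    return [dict(zip(header, ln.split(delim))) for ln in lines[1:]]


def mappings_using(offline_volumes, mappings):
    """Return names of mappings whose source or target volume is offline."""
    offline = set(offline_volumes)
    return [
        m["name"]
        for m in mappings
        if m["source_vdisk_name"] in offline or m["target_vdisk_name"] in offline
    ]


# Hypothetical sample listing, for illustration only.
SAMPLE_FCMAPS = """
id:name:source_vdisk_name:target_vdisk_name:status:progress
0:fcmap0:vol_a:vol_a_copy:idle_or_copied:100
1:fcmap1:vol_b:vol_b_copy:copying:40
2:fcmap2:vol_c:vol_c_copy:idle_or_copied:100
"""

# Hypothetical offline volumes: one is a source, the other a target.
to_delete = mappings_using(["vol_b", "vol_c_copy"], parse_delimited(SAMPLE_FCMAPS))
print(to_delete)
```

Keeping the resulting list is useful for step 4, because the same mappings
have to be re-created after the volumes are back online.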

What to check after running the system recovery

Several tasks must be performed before you use the system.
The recovery procedure re-creates the old system from the quorum data.
However, some things cannot be restored, such as cached data or the system data
that manages in-flight I/O. This latter loss of state affects RAID arrays that
manage internal storage. The detailed map of where data is out of
synchronization has been lost, meaning that all parity information must be
restored and mirrored pairs must be brought back into synchronization. Normally
this results in either old or stale data being used, so only writes in flight
are affected. However, if the array had lost redundancy (such as syncing, or
degraded or critical RAID status) before the error that required system
recovery, the situation is more severe. In this situation, check the internal
storage:
- Parity arrays will likely be syncing to restore parity; they do not have
  redundancy while this operation proceeds.
- Because there is no redundancy in this process, bad blocks might have been
  created where data is not accessible.
- Parity arrays could be marked as corrupt. This indicates that the extent of
  lost data is wider than in-flight I/O, and in order to bring the array
  online, the data loss must be acknowledged.
- RAID-6 arrays that were degraded prior to the system recovery might require
  a full restore from backup. For this reason, it is important to have at
  least a capacity match spare available.
Be aware of these differences regarding the recovered configuration:
- FlashCopy mappings are restored as "idle_or_copied" with 0% progress. Both
  volumes must have been restored to their original I/O groups.
- The management ID is different. Any scripts or associated programs that refer
  to the system-management ID of the clustered system (system) must be changed.
- Any FlashCopy mappings that were not in the "idle_or_copied" state with 100%
  progress at the point of disaster have inconsistent data on their target
  disks. These mappings must be restarted.
- Intersystem remote copy partnerships and relationships are not restored and
  must be re-created manually.
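
The restart rule for FlashCopy mappings can be sketched in Python as follows.
This assumes that a record of each mapping's state and progress from before
the disaster is available, which the recovered system itself does not provide
because every mapping comes back as "idle_or_copied" with 0% progress. The
record layout, the field names, the sample values, and the helper name
needs_restart are hypothetical.

```python
def needs_restart(status, progress):
    """A target is consistent only if the mapping was idle_or_copied at 100%."""
    return not (status == "idle_or_copied" and int(progress) == 100)


# Hypothetical pre-disaster record, for illustration only.
pre_disaster = [
    {"name": "fcmap0", "status": "idle_or_copied", "progress": "100"},
    {"name": "fcmap1", "status": "copying", "progress": "40"},
    {"name": "fcmap2", "status": "idle_or_copied", "progress": "0"},
    {"name": "fcmap3", "status": "stopped", "progress": "100"},
]

restart = [
    m["name"] for m in pre_disaster if needs_restart(m["status"], m["progress"])
]
print(restart)
```

Both conditions have to hold for a mapping to be left alone, so the check is
written as a single negated conjunction rather than two separate filters.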