IBM Midrange System DS4000 Series Hardware Manual page 451

Midrange System Storage DS4000/DS5000 Series

High risk PFA
A PFA threshold is exceeded on a drive that is a member of an array in
which no further drive can fail without data loss. This is either a
RAID 0 array or a degraded RAID 1, 3, 5, or 6 array. Take immediate
action to avoid data loss. Some of the possible recovery actions are
discussed later in this section.
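The high risk condition above can be expressed as a small check. This is only an illustrative sketch: the function name `is_high_risk_pfa` and its parameters are hypothetical and are not part of Storage Manager or any IBM tool.

```python
# RAID levels named in the text as redundant when the array is healthy.
REDUNDANT_RAID_LEVELS = {1, 3, 5, 6}


def is_high_risk_pfa(raid_level: int, array_degraded: bool) -> bool:
    """Return True if a PFA on a member drive counts as high risk.

    Hypothetical helper: encodes only the rule stated in the text
    (RAID 0, or a degraded RAID 1, 3, 5, or 6 array).
    """
    if raid_level == 0:
        # No redundancy: any single drive failure loses data.
        return True
    # A redundant array is high risk only once it is already degraded.
    return raid_level in REDUNDANT_RAID_LEVELS and array_degraded


if __name__ == "__main__":
    print(is_high_risk_pfa(0, False))
    print(is_high_risk_pfa(5, True))
    print(is_high_risk_pfa(5, False))
```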
When impending drive failure is detected, the affected disk remains powered and spinning.
With low or medium risk PFAs, the recovery actions are nondisruptive. The affected drive
needs to be manually failed before it can be safely replaced. This is performed in the Storage
Manager Subsystem Management window physical view by highlighting the affected drive
and selecting Advanced → Recovery → Fail Drive. Once in a failed state, the drive can be
handled as a normal faulty drive with the procedure described in 7.9.1, "Managing disk
failures" on page 429.
The recovery options for high risk PFAs are different. It is a good idea to back up all data on
the affected logical drives and then proceed with the steps in either "PFA warning on a disk in
a RAID 0 array" or "PFA warning on a disk in a degraded array" on page 434.
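The choice between these recovery paths can be summarized in a short sketch. The function `recovery_path` and its arguments are hypothetical names used only for illustration. The returned strings paraphrase the text and do not correspond to any Storage Manager command.

```python
from typing import List


def recovery_path(high_risk: bool, raid_level: int) -> List[str]:
    """Return the recovery outline the text gives for a PFA.

    Hypothetical helper: `high_risk` is assumed to have been determined
    already (RAID 0, or a degraded redundant array).
    """
    if not high_risk:
        # Low or medium risk: nondisruptive recovery.
        return [
            "Manually fail the affected drive",
            "Replace it as a normal faulty drive (7.9.1, Managing disk failures)",
        ]
    # High risk: back up first, then follow the matching procedure.
    steps = ["Back up all data on the affected logical drives"]
    if raid_level == 0:
        steps.append("Follow: PFA warning on a disk in a RAID 0 array")
    else:
        steps.append("Follow: PFA warning on a disk in a degraded array")
    return steps


if __name__ == "__main__":
    for step in recovery_path(high_risk=True, raid_level=0):
        print(step)
```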
PFA warning on a disk in a RAID 0 array
An array is configured without redundancy (RAID 0) with the understanding that a single disk
failure results in data loss. Only temporary or non-critical data should be stored on the
associated logical drives. Therefore, the main PFA recovery action for RAID 0 arrays is a
disruptive procedure with all associated LUNs being inaccessible while the affected drive is
replaced and data restored.
Perform these steps:
1. Stop all I/O to the affected logical drives.
2. Volume Copy can be used as an alternative to tape backup and restore. This function is
only available with the optional premium feature. If any of the affected logical drives are
also source or target logical drives in a Volume Copy operation that is either Pending or In
Progress, you must stop the copy operation before continuing. Go to the Copy Manager by
selecting Logical Drive → VolumeCopy → Copy Manager, highlight each copy pair that
contains an affected logical drive, and select Copy → Stop.
3. If you have FlashCopy logical drives associated with the affected logical drives, these
FlashCopy logical drives will no longer be valid. Perform any necessary operations (such
as backup) on the FlashCopy logical drives and then delete them.
4. Highlight the affected drive in the Physical View of the Subsystem Management window
and select Advanced → Recovery → Fail Drive. The amber fault LED illuminates on the
affected disk. The affected logical drives become Failed.
5. Replace the failed drive.
6. Highlight the array associated with the replaced drive in the Logical View of the Subsystem
Management window and select Advanced → Recovery → Initialize → Array. The
logical drives in the array are initialized, one at a time.
To monitor initialization progress for a logical drive, highlight the logical drive in the Logical
View of the Subsystem Management window and select Logical Drive → Properties.
Note that after the operation in progress has completed, the progress bar is no longer
displayed in the Properties dialog.
When initialization is completed, all logical drives in the array have the Optimal status.
7. Use operating system tools to re-discover the initialized LUNs.
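As a planning aid, the steps above can be sketched as an ordered checklist in which the Volume Copy and FlashCopy steps appear only when they apply. The function `raid0_pfa_plan` and its flags are hypothetical. The sketch does not drive Storage Manager, and each action is still performed in the GUI as described.

```python
from typing import List


def raid0_pfa_plan(volume_copy_active: bool, has_flashcopy: bool) -> List[str]:
    """Build the ordered checklist for a PFA on a RAID 0 member drive.

    Hypothetical helper: the two flags describe the affected logical
    drives and are assumed to be known by the administrator.
    """
    steps = ["Stop all I/O to the affected logical drives"]
    if volume_copy_active:
        # Applies only to copy operations that are Pending or In Progress.
        steps.append("Stop the Volume Copy operations in the Copy Manager")
    if has_flashcopy:
        # FlashCopy logical drives become invalid, so use them first.
        steps.append("Back up as needed, then delete the FlashCopy logical drives")
    steps += [
        "Fail the affected drive",
        "Replace the failed drive",
        "Initialize the array",
        "Wait until all logical drives in the array are Optimal",
        "Re-discover the initialized LUNs with operating system tools",
    ]
    return steps


if __name__ == "__main__":
    plan = raid0_pfa_plan(volume_copy_active=True, has_flashcopy=False)
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

Keeping the optional steps conditional preserves the order of the procedure: copy operations and FlashCopy logical drives are dealt with before the drive is failed.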
Chapter 7. Advanced maintenance, troubleshooting, and diagnostics
433
