Understanding The Medium Errors And Bad Blocks - IBM Storwize V7000 Unified Problem Determination Manual

- A battery must restart a maintenance discharge because the previous
maintenance cycle was disrupted by an ac power outage.
If a system suffers repeated ac power failures without enough time between
the failures to complete battery conditioning, then neither battery is
considered when calculating whether there is sufficient charge to protect the
system. In these circumstances, the system enters service state and does not permit
I/O operations to be restarted until the batteries have charged and one of the
batteries has completed a maintenance discharge. This activity takes approximately
10 hours.
If one of the batteries in a system fails and is not replaced, it prevents the other
battery from performing a maintenance discharge. Not only does this condition
reduce the lifetime of the remaining battery, but it also prevents a maintenance
discharge cycle from occurring after the battery has provided protection for at least
2 critical saves or 10 brownouts. Preventing this maintenance cycle from occurring
increases the risk that the system accumulates a sufficient number of power
outages to cause the remaining battery to be discounted when calculating whether
there is sufficient charge to protect the system. This condition results in the system
entering service state while the one remaining battery performs a maintenance
discharge. I/O operations are not permitted during this process. This activity takes
approximately 10 hours.
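The thresholds described above can be written as a small rule. The following Python sketch is illustrative only: the function names and the partner_healthy parameter are assumptions made for this example, not part of the system firmware or CLI. Only the two limits (2 critical saves, 10 brownouts) and the dependency on the other battery come from the text.

```python
# Limits taken from the text above.
CRITICAL_SAVE_LIMIT = 2
BROWNOUT_LIMIT = 10


def maintenance_discharge_due(critical_saves: int, brownouts: int) -> bool:
    """Return True once a battery has covered enough power events
    that a maintenance discharge cycle should occur."""
    return critical_saves >= CRITICAL_SAVE_LIMIT or brownouts >= BROWNOUT_LIMIT


def can_start_discharge(due: bool, partner_healthy: bool) -> bool:
    """Hypothetical helper: a failed, unreplaced partner battery
    prevents the remaining battery from performing the discharge."""
    return due and partner_healthy


# A battery that has covered 2 critical saves is due, but it cannot
# start the cycle while the other battery is failed.
due = maintenance_discharge_due(critical_saves=2, brownouts=3)
blocked = not can_start_discharge(due, partner_healthy=False)
```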

Understanding the medium errors and bad blocks

A storage system returns a medium error response to a host when it is unable to
successfully read a block. The Storwize V7000 Unified response to a host read
follows this behavior.
The volume virtualization that the system provides extends the period during
which a medium error is returned to a host: the error can persist after the data
has been moved off the storage where the read failed. Because of this difference
from non-virtualized systems, the Storwize V7000 Unified uses the term bad blocks
rather than medium errors.
The Storwize V7000 Unified allocates volumes from the extents that are on the
managed disks (MDisks). The MDisk can be a volume on an external storage
controller or a RAID array that is created from internal drives. In either case,
depending on the RAID level used, there is normally protection against a read
error on a single drive. However, it is still possible to get a medium error on a
read request if multiple drives have errors or if the drives are rebuilding or are
offline due to other issues.
The Storwize V7000 Unified provides migration facilities to move a volume from
one underlying set of physical storage to another, and to replicate a volume by
using FlashCopy, Metro Mirror, or Global Mirror. In all these cases, the migrated
volume or the replicated volume returns a medium error to the host when the host
reads a logical block address that could not be read on the original volume. The
system maintains tables of bad blocks to record which logical block addresses
cannot be read. These tables are associated with the MDisks that are providing
storage for the volumes.
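The way a bad block follows the data through a migration can be illustrated with a small model. The Python sketch below is a conceptual illustration, not the system's implementation: the MDisk class, the migrate function, and the one-to-one mapping of logical block addresses between source and target are all assumptions made for the example. The real system maps volumes onto extents and keeps its own internal tables.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class MediumError(Exception):
    """Raised when a read hits a block that is recorded as bad."""


@dataclass
class MDisk:
    """Hypothetical model of an MDisk with an associated bad block table."""
    name: str
    bad_blocks: set[int] = field(default_factory=set)
    data: dict[int, bytes] = field(default_factory=dict)

    def read(self, lba: int) -> bytes:
        if lba in self.bad_blocks:
            raise MediumError(f"{self.name}: LBA {lba} is marked bad")
        # Unwritten blocks read back as a zero byte in this model.
        return self.data.get(lba, b"\x00")


def migrate(source: MDisk, target: MDisk, lbas: range) -> None:
    """Copy blocks to the target. A block that cannot be read on the
    source is recorded as a bad block on the target instead."""
    for lba in lbas:
        try:
            target.data[lba] = source.read(lba)
        except MediumError:
            target.bad_blocks.add(lba)


old = MDisk("mdisk0", bad_blocks={7}, data={6: b"a"})
new = MDisk("mdisk1")
migrate(old, new, range(0, 10))

try:
    new.read(7)
    error_after_migration = False
except MediumError:
    error_after_migration = True
```

In this model the target storage is healthy, yet a read of block 7 still fails after the migration, because the unreadable block was carried over as an entry in the target's table.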
The dumpmdiskbadblocks command and the dumpallmdiskbadblocks command are
available to query the location of bad blocks.
Important: The dumpmdiskbadblocks command outputs only the virtual medium errors
that have been created, not a list of the actual medium errors on MDisks or drives.