In a RAID-10 configuration, you will need to add three lines to the /etc/fstab file, one for each of
the RAID arrays. A mount point does not need to be specified for /dev/md0 or /dev/md1. If no
mount point is specified, you will see error messages during startup, but the RAID-10 array will still
initialize and mount correctly.
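As an illustration only, the /etc/fstab entries could look something like the following. This sketch
assumes that /dev/md0 and /dev/md1 are the two RAID-1 pairs, that /dev/md2 is the RAID-10 array
built on top of them, and that the array holds an ext3 filesystem mounted at /data; the mount point,
filesystem type, and mount options are assumptions and will differ on your system.

    # RAID-1 pairs; these entries do not need a real mount point
    /dev/md0    none     ext3    defaults    0 0
    /dev/md1    none     ext3    defaults    0 0
    # Top-level RAID-10 array carrying the data filesystem
    /dev/md2    /data    ext3    defaults    1 2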
Please note that these configuration files are only meant as examples, and your /etc/raidtab file
will differ based on your specific hard drive configuration.

Disk Failure and Recovery

Spare Disks and Disk Failure

Spare disks are disks that do not take part in the RAID configuration until one of the active disks fails.
At that point, the failed device is marked as "bad" and reconstruction of the RAID array begins
immediately on the first available spare disk.
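For illustration, a spare disk is typically declared when the array is defined. The sketch below
assumes a two-disk RAID-1 array /dev/md0 built from /dev/sda2 and /dev/sdb2, with /dev/sdc2 held
as the spare; the device names are examples only. With raidtools the spare is listed in /etc/raidtab
using a spare-disk entry, and with mdadm it is supplied through --spare-devices:

    # /etc/raidtab excerpt (raidtools)
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          1
        persistent-superblock   1
        chunk-size              4
        device                  /dev/sda2
        raid-disk               0
        device                  /dev/sdb2
        raid-disk               1
        device                  /dev/sdc2
        spare-disk              0

    # Equivalent array creation with mdadm
    mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
        /dev/sda2 /dev/sdb2 /dev/sdc2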
Tip:
When reconstruction occurs, if multiple bad blocks have built up on the
active disks over time, the reconstruction process can sometimes trigger the
failure of one of the "good" disks, leading to failure of the entire array.
However, performing regular filesystem checks (fsck) of the entire RAID
filesystem should almost completely eliminate this risk.
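One way to keep such checks regular, assuming the RAID filesystem is ext3 on a device such as
/dev/md2 (the device name and filesystem type are assumptions), is to have the filesystem checked
automatically at boot after a fixed number of mounts or a fixed time interval:

    # Force a full fsck at boot every 20 mounts or every month, whichever comes first
    tune2fs -c 20 -i 1m /dev/md2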
Spare disks are not required for a RAID configuration, but they are highly recommended. While most
RAID levels can handle the failure of one physical disk, the failure of a second disk will cause the
entire array to fail, so it is recommended to start rebuilding the array as quickly as possible after a
disk failure. When a disk fails, the crashed disk is marked as "faulty." Faulty disks still look and
behave as members of the RAID array; they are simply treated as inactive parts of the filesystem.
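The same "faulty" marking can also be applied by hand, for example to retire a disk that is reporting
errors before it fails outright. A minimal sketch, using the example device names from this paper:

    # Mark /dev/sdc2 in /dev/md0 as faulty using raidtools
    raidsetfaulty /dev/md0 /dev/sdc2

    # Or with mdadm
    mdadm /dev/md0 -f /dev/sdc2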
When a disk fails, information regarding the failure will appear in the standard log and stat files.
Looking in /proc/mdstat will show information regarding the drives in the RAID array. RAID role
numbers show the role that each disk plays in the RAID configuration; for an array with n disks,
disks with RAID role numbers of n or greater are designated spare disks. A failed disk will be
marked with an "F" and will be replaced with the non-failed device that has the lowest role number
of n or greater.
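For example, the current state of every md array can be checked with:

    # Show all md arrays; a failed member is flagged with (F),
    # and each device is followed by its RAID role number in brackets
    cat /proc/mdstat

The exact layout of the output varies between kernel versions.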
Removing and replacing a failed disk can be done as follows:
1.  Remove the failed disk from the RAID array by running the command:
        raidhotremove /dev/md0 /dev/sdc2
    where /dev/md0 is the array containing the failed disk and /dev/sdc2 is the name of the
    failed drive. If you wish to use mdadm instead of raidtools, the command is:
        mdadm /dev/md0 -r /dev/sdc2
    Please note that raidhotremove cannot be used to pull a disk out of a running array, and
    should only be used for removing failed disks.
2.  After recovery ends, a new disk should be designated as /dev/sdc2, or whichever disk was
    the one that failed. This new disk now needs to be added to the same array:
        raidhotadd /dev/md0 /dev/sdc2
    Using mdadm, the command is:
        mdadm /dev/md0 -a /dev/sdc2
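After the disk has been added, the state of the array and the progress of any rebuild can be
checked, for example, with the following commands (device names as in the steps above):

    # Re-read /proc/mdstat every five seconds to watch the rebuild
    watch -n 5 cat /proc/mdstat

    # Or query the array directly for its state and rebuild progress
    mdadm --detail /dev/md0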