Hot-Plug Capabilities - HP DL740 - ProLiant - 4 GB RAM Manual

Hot Plug RAID Memory technology for fault tolerance and scalability

hot-plug capabilities

If the signal from the ECC logic to the MUX indicates that the data is good, a parity
compare logic circuit (for example, PC1, PC2, PC3, or PC4 in figure 6) compares the
data from the ECC logic with the regenerated data from the RAID memory logic. If all the
data words in a read transaction are good, then the original data and the data from the
RAID memory logic should be identical. If they are not, a data error undetectable by ECC
has occurred. In a memory subsystem protected only by ECC, such an occurrence, although
rare, would result in bad data being passed along as if it were good.
However, with Hot Plug RAID Memory, the parity compare fails in such a situation and
initiates a non-maskable interrupt (NMI), preventing the transmission of corrupt data. This
feature makes Hot Plug RAID Memory virtually immune to data corruption.
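The parity compare described above can be illustrated with a minimal software sketch. It assumes a stripe of four data words plus one XOR parity word (one word per cartridge); that layout, the names regenerate and read_stripe, and the ParityMismatch exception standing in for the NMI are all hypothetical and are not taken from the HP design.

```python
from functools import reduce
from operator import xor


class ParityMismatch(Exception):
    """Hypothetical stand-in for the NMI raised when the parity compare fails."""


def regenerate(stripe, index):
    """Rebuild the word at `index` by XOR-ing every other word in the stripe."""
    return reduce(xor, (w for i, w in enumerate(stripe) if i != index), 0)


def read_stripe(stripe):
    """Return the data words, or raise if a word disagrees with its regenerated value.

    `stripe` is assumed to hold four data words followed by one parity word.
    """
    for i in range(len(stripe) - 1):
        if stripe[i] != regenerate(stripe, i):
            raise ParityMismatch(f"word {i} differs from regenerated data")
    return stripe[:-1]


# Example stripe: four data words and their XOR parity.
data = [0x1111, 0x2222, 0x4444, 0x8888]
good = data + [reduce(xor, data, 0)]
```

Calling read_stripe(good) returns the four data words. Flipping any bit in one word makes the comparison fail, so the sketch raises instead of returning corrupt data.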
The redundancy in Hot Plug RAID Memory provides the ability to hot plug memory
cartridges without bringing down the server. This gives unprecedented levels of memory
availability and scalability within industry-standard servers. Hot Plug RAID Memory
enables the following abilities while the system is running:
Hot replace: replacing a failed DIMM
Hot add: adding a DIMM to a memory cartridge
Hot upgrade: replacing a set of DIMMs with different (higher capacity) ones
Hot-replace capability is offered in a driverless implementation that requires no support
from the operating system. ProLiant servers with Hot Plug RAID Memory have hot-replace
capability directly out of the box, regardless of the operating system used. This operating
system independence was achieved using System Management Mode (SMM), a mode of
Intel processors. Use of SMM eliminated the need for HP engineers to develop driver
software for every OS and removed the maintenance associated with those drivers.
When an administrator initiates a hot-replace operation, the memory controller tells the
server to ignore the cartridge of memory where the hot-replace operation will be
performed. Until the hot-replace operation is completed, memory transactions use the
other four memory cartridges, which remain protected by ECC. Thus, the memory subsystem
operates in a nonredundant mode, like today's ECC memory subsystems. At this point, the
cartridge containing the failed DIMM can be removed from the system, the DIMM replaced,
and the cartridge reinserted. Once the memory cartridge is back online, full redundancy
is restored.
When a cartridge is inserted back into the system, Hot Plug RAID Memory automatically
rebuilds the data across all the memory cartridges. Rebuilding data can degrade memory
performance briefly, but a rebuild for 4 GB of memory takes only about 30 seconds, a small
price to pay for avoiding downtime while increasing fault tolerance.
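As a rough illustration of the rebuild step, the sketch below recomputes the contents of a reinserted cartridge from the other four and works out the throughput implied by the 30-second figure. It assumes XOR parity across five equally sized cartridges and treats 4 GB as 4 GiB. The function names rebuild_cartridge and rebuild_rate_mib_per_s are hypothetical.

```python
from functools import reduce
from operator import xor


def rebuild_cartridge(cartridges, missing):
    """Recompute every word of cartridge `missing` from the surviving cartridges.

    Each cartridge is modeled as a list of words; words at the same index
    across cartridges are assumed to form one XOR-parity stripe.
    """
    survivors = [c for i, c in enumerate(cartridges) if i != missing]
    return [reduce(xor, column, 0) for column in zip(*survivors)]


def rebuild_rate_mib_per_s(size_gib, seconds):
    """Average rebuild throughput in MiB/s for `size_gib` GiB rebuilt in `seconds`."""
    return size_gib * 1024 / seconds
```

Under these assumptions, rebuild_rate_mib_per_s(4, 30) comes to roughly 137 MiB/s, which gives a sense of the memory bandwidth the rebuild consumes while it runs.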
After the RAID logic rebuilds the data, a verify procedure confirms that the rebuild
operation was successful. During a verify procedure, every address location in memory is
read. Errors found are reported to the system. If the verify procedure does not confirm
that the rebuild operation was successful, the memory will not be brought online until the
problem is corrected. The verify command can also be initiated independently of a
hot-plug procedure. For example, an administrator can set up a routine that will run the verify
procedure periodically and report any errors before they cause problems. This type of
proactive monitoring program further reduces downtime.
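The verify procedure can be sketched in the same style. The source states only that every address is read and that errors are reported; the XOR consistency check used here, and the names verify and bring_online, are assumptions made for illustration.

```python
from functools import reduce
from operator import xor


def verify(cartridges):
    """Read every address and return the addresses whose stripe fails the XOR check.

    Words at the same address across all cartridges (data plus parity) are
    assumed to XOR to zero when the stripe is consistent.
    """
    errors = []
    for address, column in enumerate(zip(*cartridges)):
        if reduce(xor, column, 0) != 0:
            errors.append(address)
    return errors


def bring_online(cartridges):
    """Return (online, errors); memory stays offline while any error remains."""
    errors = verify(cartridges)
    return len(errors) == 0, errors
```

A periodic monitoring routine of the kind described above could call verify on a schedule and log the returned addresses, while bring_online mirrors the rule that memory is not brought online after a failed verify.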