Memory Fault Handling - Sun Microsystems Sun Fire T2000 Service Manual


DIMMs are installed in groups of eight, called ranks (ranks 0 and 1). At a minimum, rank 0 must be fully populated with eight DIMMs of the same capacity. A second rank of DIMMs of the same capacity can be added to fill rank 1.

See Section 5.2.3, "Removing DIMMs" on page 5-12 for instructions about adding memory to a server.
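The population rules above can be expressed as a small check. The following Python sketch is illustrative only: the function name and the data layout (one list of DIMM capacities per rank) are hypothetical, and the requirement that rank 1 match the capacity of rank 0 is an assumption about how "the same capacity" should be read.

```python
from typing import List, Sequence

DIMMS_PER_RANK = 8  # from the text: DIMMs are installed in groups of eight


def validate_ranks(rank0: Sequence[int], rank1: Sequence[int] = ()) -> List[str]:
    """Return a list of rule violations; an empty list means the layout is valid.

    Each sequence holds the capacity (for example, in MB) of every DIMM
    installed in that rank. This helper is hypothetical, not a Sun tool.
    """
    problems = []
    if len(rank0) != DIMMS_PER_RANK:
        problems.append("rank 0 must hold exactly eight DIMMs")
    if len(set(rank0)) > 1:
        problems.append("rank 0 DIMMs must all have the same capacity")
    if rank1:  # rank 1 is optional, but must be complete if used
        if len(rank1) != DIMMS_PER_RANK:
            problems.append("rank 1 must be empty or hold exactly eight DIMMs")
        if len(set(rank1)) > 1:
            problems.append("rank 1 DIMMs must all have the same capacity")
        # Assumption: rank 1 uses the same capacity as rank 0.
        if rank0 and rank1[0] != rank0[0]:
            problems.append("rank 1 capacity must match rank 0")
    return problems
```

For example, `validate_ranks([1024] * 8)` returns an empty list, while a half-populated rank 0 is reported as a violation.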

3.1.1.2 Memory Fault Handling

The server uses advanced ECC technology, also called chipkill, that corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. If a DRAM fails, the DIMM continues to function.
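To make the correctability condition concrete, the sketch below models it in Python. It is not the ECC algorithm itself. It only tests whether a given pattern of flipped bits fits the stated rule, under the assumption (not stated in this manual) that each aligned 4-bit nibble of the data word is supplied by a single DRAM. The function name and the bit-mask representation are hypothetical.

```python
NIBBLE_BITS = 4  # from the text: errors are corrected on nibble boundaries


def is_correctable(error_mask: int) -> bool:
    """Return True if every flipped bit lies inside one aligned 4-bit nibble.

    error_mask has a 1 in each bit position that is in error.
    Assumption: one aligned nibble corresponds to one DRAM.
    """
    if error_mask < 0:
        raise ValueError("error_mask must be non-negative")
    if error_mask == 0:
        return True  # no bits in error, nothing to correct
    # Position of the lowest bit in error.
    lowest_bit = (error_mask & -error_mask).bit_length() - 1
    # Start of the aligned nibble that contains that bit.
    nibble_start = (lowest_bit // NIBBLE_BITS) * NIBBLE_BITS
    nibble_mask = 0xF << nibble_start
    # Correctable only if no error bit falls outside that nibble.
    return (error_mask & ~nibble_mask) == 0
```

Under this model, four bad bits in one nibble (`0b1111 << 8`) are correctable, but two bad bits that straddle a nibble boundary (`0b11000`, bits 3 and 4) are not.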

The following server features independently manage memory faults:

■ POST – Based on ALOM CMT configuration variables, POST runs when the server is powered on. In normal operation, the default configuration of POST (diag_level=min) provides a check to ensure the server will boot. Normal operation applies to any boot of the server not intended to test power-on errors, hardware upgrades, or repairs. Once the Solaris OS is running, PSH provides run-time diagnosis of faults.

  When a memory fault is detected, POST displays the fault with the device name of the faulty DIMMs, logs the fault, and disables the faulty DIMMs by placing them in the ASR blacklist. For a given memory fault, POST disables half of the physical memory in the system. When this offlining process occurs in normal operation, you must replace the faulty DIMMs based on the fault message and enable the disabled DIMMs with the ALOM CMT enablecomponent command.

  In other than normal operation, POST can be configured to run various levels of testing (see TABLE 3-9 and TABLE 3-10) and can thoroughly test the memory subsystem based on the purpose of the test. However, with thorough testing enabled (diag_level=max), POST finds faults and offlines memory devices with errors that could be correctable with PSH. Thus, not all memory devices detected and offlined by POST need to be replaced. See Section 3.4.5, "Correctable Errors Detected by POST" on page 3-36.

■ Solaris Predictive Self-Healing (PSH) technology – A feature of the Solaris OS, PSH uses the fault manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a unique fault ID (UUID) and logged. PSH reports the fault and provides a recommended proactive replacement for the DIMMs associated with the fault.
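The way the POST diagnostic level affects follow-up work can be summarized in a short Python sketch. This is a reading aid under stated assumptions, not a tool shipped with the server: the function names are hypothetical, and only the two `diag_level` values named in this section (`min` and `max`) are modelled.

```python
from typing import List


def memory_after_post_fault(total_gb: float) -> float:
    """Physical memory left online after POST offlines DIMMs for one fault.

    From the text: for a given memory fault, POST disables half of the
    physical memory in the system.
    """
    return total_gb / 2


def post_follow_up(diag_level: str) -> List[str]:
    """Suggested follow-up after POST offlines memory (hypothetical helper)."""
    if diag_level == "min":
        # Normal operation: the offlined DIMMs are treated as faulty.
        return [
            "replace the faulty DIMMs named in the fault message",
            "re-enable the disabled DIMMs with the ALOM CMT enablecomponent command",
        ]
    if diag_level == "max":
        # Thorough testing can offline devices whose errors PSH could correct.
        return [
            "check whether the detected errors are correctable (Section 3.4.5)",
            "replace only the DIMMs that are confirmed faulty",
        ]
    raise ValueError("only diag_level 'min' and 'max' are modelled here")
```

For example, a server with 16 GB installed would be left with 8 GB online after a single memory fault is offlined by POST.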
