007-4377-002
Non-uniform Memory Access (NUMA)
In DSM systems, memory is physically located at various distances from the processors.
As a result, memory access times (latencies) are different, or "non-uniform." For example,
it takes less time for a processor to reference its local memory than to reference remote
memory.
Reliability, Availability, and Serviceability (RAS)
The Altix 3700 Bx2 server series components have the following features to increase the
reliability, availability, and serviceability (RAS) of the systems:
• Power and cooling:
  – Power supplies are redundant and can be hot-swapped.
  – Bricks have overcurrent protection.
  – Fans are redundant and can be hot-swapped.
  – Fans run at multiple speeds in all bricks except the optional R-brick. Speed
    increases automatically when temperature increases or when a single fan fails.
• System monitoring:
  – System controllers monitor the internal power and temperature of the bricks,
    and automatically shut down bricks to prevent overheating.
  – Memory, L2 cache, L3 cache, and all external bus transfers are protected by
    single-bit error correction and double-bit error detection (SECDED).
  – The NUMAlink interconnect network is protected by cyclic redundancy check
    (CRC).
  – The L1 primary cache is protected by parity.
  – Each brick has failure LEDs that indicate the failed part; LEDs are readable via
    the system controllers.
  – Systems support the optional Embedded Support Partner (ESP), a tool that
    monitors the system; when a condition occurs that may cause a failure, ESP
    notifies the appropriate SGI personnel.
  – Systems support remote console and maintenance activities.
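The SECDED protection listed above can be illustrated with a small sketch. The example below is not the code used by the Altix hardware, whose word and check-bit widths are not described here; it assumes a toy 4-bit data word protected by an extended Hamming code (three check bits plus one overall parity bit). The function names `encode`, `decode`, and `flip` are hypothetical and chosen only for this illustration. The overall parity bit alone behaves like simple parity protection: it can signal that a bit flipped, but not which one. Combined with the check bits, a single flipped bit can be located and corrected, and two flipped bits can be detected.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # Hamming positions that carry data bits
CHECK_POSITIONS = (1, 2, 4)     # Hamming positions that carry check bits


def encode(nibble):
    """Return an 8-bit SECDED codeword (list of 0/1) for a 4-bit value."""
    code = [0] * 8  # index 0: overall parity bit; indices 1..7: Hamming(7,4)
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in CHECK_POSITIONS:
        # Check bit p gives even parity over every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def decode(code):
    """Return (value, status); value is None if the error is uncorrectable."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i          # XOR of set positions is 0 for a valid word
    odd_parity = sum(code) % 2     # 1 means an odd number of bits flipped
    fixed = list(code)
    if odd_parity:
        # Treated as a single-bit error; syndrome 0 means the parity bit itself.
        fixed[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even overall parity but a nonzero syndrome: two bits flipped.
        return None, "double-bit error detected"
    else:
        status = "ok"
    value = sum(fixed[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))
    return value, status


def flip(code, *positions):
    """Return a copy of code with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))               # clean word
    print(decode(flip(word, 6)))      # one flipped bit
    print(decode(flip(word, 2, 6)))   # two flipped bits
```

In this sketch, a clean word decodes with status "ok", a word with one flipped bit decodes to the original value with status "corrected", and a word with two flipped bits is reported as a double-bit error rather than being silently miscorrected.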