distributed or physically inaccessible machines. In addition, ALOM CMT enables
you to run diagnostics (such as POST) remotely that would otherwise require
physical proximity to the server's serial port.
You can configure ALOM CMT to send email alerts of hardware failures, hardware
warnings, and other events related to the server or to ALOM CMT. The ALOM CMT
circuitry runs independently of the server, using the server's standby power.
Therefore, ALOM CMT firmware and software continue to function when the server
operating system goes offline or when the server is powered off. ALOM CMT
monitors the following server components:
■ CPU temperature conditions
■ Hard drive status
■ Enclosure thermal conditions
■ Fan speed and status
■ Power supply status
■ Voltage levels
■ Faults detected by POST (power-on self-test)
■ Solaris Predictive Self-Healing (PSH) diagnostic facilities
For information about configuring and using the ALOM system controller, refer to
the latest Advanced Lights Out Manager (ALOM) CMT Guide.
2.1.4 System Reliability, Availability, and Serviceability
Reliability, availability, and serviceability (RAS) are aspects of a system's design that
affect its ability to operate continuously and to minimize the time necessary to
service the system. Reliability refers to a system's ability to operate continuously
without failures and to maintain data integrity. System availability refers to the
ability of a system to recover to an operational state after a failure, with minimal
impact. Serviceability relates to the time it takes to restore a system to service
following a system failure. Together, reliability, availability, and serviceability
features provide for near-continuous system operation.
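The relationship between these three terms can be illustrated with the conventional steady-state availability formula, MTBF / (MTBF + MTTR). This formula and the function names below are general reliability-engineering conventions assumed here for illustration. They are not taken from this manual, and the figures in the usage note are not measurements of this server. A minimal sketch:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a system is operational.

    mtbf_hours: mean time between failures (a reliability measure).
    mttr_hours: mean time to repair (a serviceability measure).
    Both parameters are hypothetical inputs, not values from the manual.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year, in hours, for a given availability."""
    return (1.0 - avail) * hours_per_year
```

For example, `availability(999, 1)` evaluates to 0.999, and `annual_downtime_hours(0.999)` is roughly 8.76 hours per year. Under this model, shortening repair time (for instance, with hot-swappable components) raises availability even when the failure rate is unchanged.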
To deliver high levels of reliability, availability, and serviceability, the server offers
the following features:
■ Hot-pluggable hard drives
■ Redundant, hot-swappable power supplies (two)
■ Redundant, hot-swappable fan units (three)
■ Environmental monitoring
■ Error detection and correction for improved data integrity
■ Easy access for most component replacements
■ Extensive POST tests that automatically delete faulty components from the
  configuration
2-6
Sun Fire T2000 Server Service Manual • July 2007