Automatic System Recovery (Asr) - Sun Microsystems Fire V240 Administration Manual

If the kernel hangs and the watchdog times out, ALOM reports and logs the event
and performs one of three user-configurable actions:
xir: this is the default action. It causes the server to sync the filesystems and
restart. If the sync hangs, ALOM falls back to a hard reset after 15 minutes.
reset: this is a hard reset. It results in a rapid system recovery, but diagnostic
data regarding the hang is not stored, and filesystem damage may result.
none: the system is left in the hung state indefinitely after the watchdog timeout
has been reported.
For more information, see the sys_autorestart section of the ALOM Online Help
that is contained on the Sun Fire V210 and V240 Server Documentation CD.
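
The three actions above can be summarized as a small decision model. The following Python sketch is illustrative only: it is not ALOM code, and the names AutoRestart and watchdog_actions are hypothetical. It only restates the behavior described in the text, including the 15-minute fallback when the sync hangs.

```python
from enum import Enum


class AutoRestart(Enum):
    """Hypothetical labels for the three actions described in the text."""
    XIR = "xir"      # default: sync filesystems, then restart
    RESET = "reset"  # immediate hard reset
    NONE = "none"    # leave the system hung


SYNC_FALLBACK_MINUTES = 15  # fallback delay stated in the text


def watchdog_actions(policy, sync_completes=True):
    """Return the ordered steps taken after a watchdog timeout (model only)."""
    steps = ["report and log watchdog timeout"]
    if policy is AutoRestart.XIR:
        steps.append("sync filesystems")
        if sync_completes:
            steps.append("restart")
        else:
            # The sync itself hung, so fall back to a hard reset.
            steps.append(f"hard reset after {SYNC_FALLBACK_MINUTES} minutes")
    elif policy is AutoRestart.RESET:
        steps.append("hard reset")
    # AutoRestart.NONE: nothing further happens; the system stays hung.
    return steps
```

For example, watchdog_actions(AutoRestart.XIR, sync_completes=False) ends with the hard-reset fallback, while watchdog_actions(AutoRestart.NONE) contains only the report-and-log step.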

Automatic System Recovery (ASR)

Note – Automatic System Recovery (ASR) is not the same as Automatic Server
Restart, which the Sun Fire V210 and V240 servers also support.
Automatic System Recovery (ASR) consists of self-test features and an
auto-configuring capability to detect failed hardware components and unconfigure them.
By doing this, the server is able to resume operating after certain non-fatal hardware
faults or failures have occurred.
If a component is one that is monitored by ASR, and the server is capable of
operating without it, the server will automatically reboot if that component should
develop a fault or fail.
ASR monitors the following components:
Memory modules
If a fault is detected during the power-on sequence, the faulty component is
disabled. If the system remains capable of functioning, the boot sequence continues.
If a fault occurs on a running server, and it is possible for the server to run without
the failed component, the server automatically reboots. This prevents a faulty
hardware component from keeping the entire system down or causing the system to
crash repeatedly.
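
The two cases above (a fault found during power-on versus a fault on a running server) can be sketched as a simple decision function. This is a minimal illustration and not firmware logic: the function name asr_response and the phase strings are hypothetical, and the cases the text does not describe are reported as unspecified rather than guessed.

```python
def asr_response(phase, can_run_without):
    """Model the ASR behavior described in the text.

    phase: "power-on" or "running" (hypothetical labels).
    can_run_without: True if the server can operate without the component.
    """
    if phase not in ("power-on", "running"):
        raise ValueError(f"unknown phase: {phase!r}")

    if phase == "power-on":
        # The faulty component is always disabled during power-on.
        if can_run_without:
            return "disable component; boot sequence continues"
        return "disable component; outcome not specified in the text"

    # Fault on a running server.
    if can_run_without:
        return "server automatically reboots"
    return "outcome not specified in the text"
```

Calling asr_response("running", True) returns the automatic-reboot case; the following power-on sequence would then disable the faulty component as in the first branch.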
To support such a degraded boot capability, the OpenBoot firmware uses the IEEE 1275
Client Interface (via the device tree) to mark a device as either failed or disabled by
creating an appropriate status property in the device tree node. The Solaris
operating environment does not activate a driver for any subsystem so marked.
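
The marking scheme can be pictured with a toy device tree. The sketch below is a simplified model in Python, not OpenBoot or Solaris code. The node paths, the helper names mark_device and attachable, and the literal status strings are assumptions for illustration; the text does not give the exact property values the firmware writes.

```python
# Status values that, in this model, stop a driver from being activated.
# The literal strings are placeholders, not confirmed firmware values.
UNUSABLE = {"failed", "disabled"}


def mark_device(tree, path, status):
    """Create a status property on a device node (firmware side of the model)."""
    if status not in UNUSABLE:
        raise ValueError(f"unsupported status: {status!r}")
    tree[path]["status"] = status


def attachable(tree):
    """Return the nodes the OS would activate drivers for (OS side of the model)."""
    return [path for path, props in tree.items()
            if props.get("status") not in UNUSABLE]


# Hypothetical device tree with two memory nodes.
device_tree = {
    "/memory-bank@0": {},
    "/memory-bank@1": {},
}
mark_device(device_tree, "/memory-bank@1", "failed")
usable_nodes = attachable(device_tree)
```

After the call to mark_device, usable_nodes contains only the unmarked node, which mirrors how a marked subsystem is skipped during a degraded boot.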