Troubleshoot With Diagnostic Tools - Oracle Exadata X10M Service Manual


Table 2-2 (Cont.) Server Cooling Issues

Cooling Issue: Hardware Component Failure

Description: Components, such as power supplies and fan modules, are an integral part of
the server cooling system. When one of these components fails, the server internal
temperature can rise. This rise in temperature can cause other components to enter into
an over-temperature state. Some components, such as processors, might overheat when they
are failing, which can also generate an over-temperature event.
To reduce the risk related to component failure, power supplies and fan modules are
installed in pairs to provide redundancy. Redundancy ensures that if one component in
the pair fails, the other functioning component can continue to maintain the subsystem.

Troubleshoot With Diagnostic Tools

The server and its accompanying software and firmware contain diagnostic tools and
features that can help you isolate component problems, monitor the status of a
functioning system, and exercise one or more subsystems to disclose more subtle or
intermittent hardware-related problems.
Each diagnostic tool has its own specific strength and application. Review the tools
listed in this section and determine which tool might be best to use for your situation.
After you determine the tool to use, you can access it locally, while at the server, or
remotely. The selection of diagnostic tools available for your server ranges in complexity
from a comprehensive validation test suite (Oracle VTS) to a chronological event log
(Oracle ILOM event log). The selection of diagnostic tools also includes standalone
software packages, firmware-based tests, and hardware-based LED indicators.
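
As an illustration of how a chronological event log can help isolate a cooling problem,
the following minimal Python sketch filters a list of log records for over-temperature
events and orders them oldest first. The `Event` record layout, the
`over_temperature_timeline` function, and the sample records are all hypothetical and
invented for this example; they do not reflect the actual Oracle ILOM event log format.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical log record; not the real Oracle ILOM log format."""
    timestamp: datetime
    component: str
    message: str


def over_temperature_timeline(events, keyword="over-temperature"):
    """Return the events whose message mentions the keyword, oldest first."""
    matches = [e for e in events if keyword in e.message.lower()]
    return sorted(matches, key=lambda e: e.timestamp)


# Hypothetical sample records, deliberately listed out of order.
sample_events = [
    Event(datetime(2024, 1, 5, 10, 4), "processor", "Over-temperature event"),
    Event(datetime(2024, 1, 5, 10, 1), "fan module", "Component failed"),
    Event(datetime(2024, 1, 5, 10, 2), "power supply", "Over-temperature event"),
]

for event in over_temperature_timeline(sample_events):
    print(event.timestamp.isoformat(), event.component, event.message)
```

Reading the matching events in time order can show which component reported the
condition first, which is often a useful starting point for deciding what to inspect.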
The following table summarizes the diagnostic tools that you can use when
troubleshooting or monitoring your server.
Hardware Component Failure (Table 2-2, continued)

Action: Investigate the cause of the over-temperature event, and replace failed
components immediately. See Diagnosing Server Component Hardware Faults.

Prevention: Component redundancy is provided to allow for component failure in critical
subsystems, such as the cooling subsystem. However, once a component in a redundant
system fails, the redundancy no longer exists, and the risk for server shutdown and
component failures increases. Therefore, it is important to maintain redundant systems
and replace failed components immediately.
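
The redundancy reasoning above can be sketched in a few lines of Python. This is only an
illustrative model, assuming each redundant subsystem is a pair of components that are
each either working or failed; the names `RedundantPair`, `Redundancy`, and
`needs_replacement` are hypothetical and are not part of any Oracle tool.

```python
from dataclasses import dataclass
from enum import Enum


class Redundancy(Enum):
    INTACT = "intact"  # both components in the pair are working
    LOST = "lost"      # one component failed; the survivor has no backup
    FAILED = "failed"  # both components failed; the subsystem is down


@dataclass
class RedundantPair:
    """Hypothetical model of a redundant pair, such as two fan modules."""
    name: str
    first_ok: bool
    second_ok: bool

    def state(self) -> Redundancy:
        working = int(self.first_ok) + int(self.second_ok)
        if working == 2:
            return Redundancy.INTACT
        if working == 1:
            return Redundancy.LOST
        return Redundancy.FAILED


def needs_replacement(pair: RedundantPair) -> bool:
    """A pair needs service as soon as redundancy is no longer intact."""
    return pair.state() is not Redundancy.INTACT


fans = RedundantPair("fan modules", first_ok=True, second_ok=False)
print(fans.name, fans.state().value, needs_replacement(fans))
```

The point of the model is that a pair with one failed component still keeps the
subsystem running, but it is already flagged for replacement because a second failure
would take the subsystem down.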