Huawei TaiShan Troubleshooting Manual page 73

TaiShan Servers
Troubleshooting
5 Diagnosing and Rectifying Faults
Issue 12 (2022-08-30)    Copyright © Huawei Technologies Co., Ltd.    66

Fault Symptom / Diagnosis Method / Conclusion

Diagnosis Method: Use iBMC to locate the fault, for example, the DIMM, drive, or mainboard component for which an alarm is reported.
Conclusion: Circuit hardware is faulty.

Diagnosis Method: If the OS logs contain read-only file system records, use Smart Provisioning to rate the drive and decide whether to replace the drive based on the result.
Conclusion: A drive fault occurred.

Diagnosis Method: Check whether there is a Machine Check Exception issue. Locate such a fault by checking /var/log/mce.log and the error codes in the Kdump information from the serial port.
Conclusion:
● The hardware is faulty.
● The software or hardware interface setting is incorrect.

Diagnosis Method: Collect the following information:
● For new servers, confirm the proportion of abnormal servers and check whether normal and abnormal servers have the same configurations.
● For existing servers, confirm the number of abnormal servers and check whether the issues occur under specific circumstances.
● Check iBMC for hardware alarms.
After collecting the preceding information, determine whether it is a single-server issue or a hardware issue. Run Smart Provisioning for fault locating.
Conclusion: Locate the fault based on the report.
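
The log check for read-only file system records can be scripted. The following is a minimal Python sketch, not a tool described in this manual. The function names and the two message patterns are assumptions chosen for illustration, and the exact wording of the records depends on the OS and file system. The same helper could be pointed at /var/log/mce.log with patterns suited to that file, which are not specified here.

```python
import re
from typing import Iterable, List, Pattern, Sequence

# Assumed message fragments (hypothetical defaults); adjust them to
# whatever the OS on the server actually writes to its logs.
READ_ONLY_PATTERNS: Sequence[Pattern[str]] = (
    re.compile(r"read-only file system", re.IGNORECASE),
    re.compile(r"remounting filesystem read-only", re.IGNORECASE),
)


def find_matching_lines(
    lines: Iterable[str],
    patterns: Sequence[Pattern[str]] = READ_ONLY_PATTERNS,
) -> List[str]:
    """Return the log lines that match at least one of the patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in patterns)
    ]


def scan_log_file(
    path: str,
    patterns: Sequence[Pattern[str]] = READ_ONLY_PATTERNS,
) -> List[str]:
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, "r", errors="replace") as handle:
        return find_matching_lines(handle, patterns)


if __name__ == "__main__":
    # Illustrative lines only, not real server output.
    sample = [
        "kernel: EXT4-fs (sda2): Remounting filesystem read-only\n",
        "sshd: session opened for user root\n",
        "cp: cannot create regular file: Read-only file system\n",
    ]
    for hit in find_matching_lines(sample):
        print(hit)
```

Any hit only indicates that the drive should be rated with Smart Provisioning. The decision to replace the drive still follows from that result.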
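
The information-collection step (the proportion of abnormal servers, and whether normal and abnormal servers share the same configuration) can be sketched as a small comparison. This is an illustrative Python sketch under stated assumptions. The record layout (`name`, `abnormal`, `config`) and both function names are hypothetical, and the inventory data would have to be gathered separately, for example from iBMC.

```python
from typing import Any, Dict, List, Set

Server = Dict[str, Any]  # hypothetical record: name, abnormal, config


def abnormal_ratio(servers: List[Server]) -> float:
    """Fraction of servers flagged abnormal (0.0 for an empty list)."""
    if not servers:
        return 0.0
    return sum(1 for s in servers if s["abnormal"]) / len(servers)


def config_keys_that_differ(
    servers: List[Server],
) -> Dict[str, Dict[str, Set[Any]]]:
    """List config keys whose value sets differ between the two groups.

    Config values must be hashable (strings, numbers, tuples).
    """
    normal: Dict[str, Set[Any]] = {}
    abnormal: Dict[str, Set[Any]] = {}
    for server in servers:
        bucket = abnormal if server["abnormal"] else normal
        for key, value in server["config"].items():
            bucket.setdefault(key, set()).add(value)

    differing: Dict[str, Dict[str, Set[Any]]] = {}
    for key in sorted(set(normal) | set(abnormal)):
        n_values = normal.get(key, set())
        a_values = abnormal.get(key, set())
        if n_values != a_values:
            differing[key] = {"normal": n_values, "abnormal": a_values}
    return differing


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    fleet = [
        {"name": "srv1", "abnormal": False,
         "config": {"bios": "1.0", "dimm": "32G"}},
        {"name": "srv2", "abnormal": True,
         "config": {"bios": "1.1", "dimm": "32G"}},
    ]
    print(abnormal_ratio(fleet))
    print(config_keys_that_differ(fleet))
```

A key that differs between the two groups is only a lead for further checking. It does not by itself show whether the issue is a single-server issue or a hardware issue.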
