Collecting Diagnostic Data - IBM Power System 8335-GCA Manual

Problem analysis, system parts, and locations

Table 30. Determining a verification action for GPUs, PCIe adapters, and devices
Adapter type
Devices that are controlled by a RAID adapter
Devices that are not controlled by a RAID adapter
GPU
Network adapter
RAID adapter

Collecting diagnostic data

Learn how to collect diagnostic data to send to IBM service and support.
To collect diagnostic data, complete the following steps:
1. Is the operating system available?
If yes, continue with step 2.
If no, continue with step 3 on page 110.
2. To collect diagnostic data from the operating system, complete the following steps:
a. Log in as root user.
Verification action (Table 30): Devices that are controlled by a RAID adapter
Complete the following steps:
1. Install the arcconf utility for the RAID adapter.
2. Type ARCCONF GETSMARTSTATS 1 at the command prompt and press Enter.
3. Verify that the Self-Monitoring, Analysis and Reporting Technology (SMART) health assessment for the device passed.
Verification action (Table 30): Devices that are not controlled by a RAID adapter
Complete the following steps:
1. Install the smartmontools utility.
2. Type apt-get install smartmontools at the command prompt and press Enter.
3. At the command prompt, type smartctl --all /dev/sdx, where x is the letter that is associated with the drive.
4. Verify that the SMART health assessment passed.
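The SMART health check above can be scripted once smartmontools is installed. The following Python sketch is an illustration only and is not part of the IBM procedure. It assumes that smartctl reports its verdict on a line such as `SMART overall-health self-assessment test result: PASSED` (typical for ATA drives) or `SMART Health Status: OK` (typical for SCSI and SAS drives). The function names `smart_health_passed` and `check_drive` are hypothetical. Output wording can differ by drive type and smartmontools version, so a result of `None` means that the output should be reviewed manually.

```python
import re
import subprocess
from typing import Optional

# Assumed wording of the verdict line in smartctl output:
# the first pattern is typical for ATA drives, the second for SCSI/SAS drives.
_HEALTH_PATTERNS = (
    re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)"),
    re.compile(r"SMART Health Status:\s*(\S+)"),
)


def smart_health_passed(smartctl_output: str) -> Optional[bool]:
    """Return True or False for the health verdict, or None if no verdict line is found."""
    for pattern in _HEALTH_PATTERNS:
        match = pattern.search(smartctl_output)
        if match:
            return match.group(1).upper() in ("PASSED", "OK")
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run `smartctl --all <device>` (requires root) and evaluate its output."""
    result = subprocess.run(
        ["smartctl", "--all", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses nonzero exit codes as status bits
    )
    return smart_health_passed(result.stdout)
```

For example, `check_drive("/dev/sda")` would run the same command as the manual step for the drive `sda` and return the parsed verdict.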
Verification action (Table 30): GPU
Complete the following steps:
1. Type nvidia-smi -L at the command prompt and press Enter. Verify that the GPU is listed.
2. Type nvidia-smi -q at the command prompt and press Enter. Verify that no errors are listed.
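The "GPU is listed" check can be automated by parsing the output of `nvidia-smi -L`. The sketch below is a hedged illustration, not IBM's method. It assumes that each GPU appears on a line of the form `GPU <index>: <name> (UUID: <uuid>)`, and the function names are hypothetical. Reviewing the `nvidia-smi -q` output for errors is left as a manual step, because the fields that count as errors depend on the GPU and driver.

```python
import re
from typing import List, Tuple

# Assumed line shape for `nvidia-smi -L`: "GPU <index>: <name> (UUID: <uuid>)".
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)\s*$")


def parse_gpu_list(output: str) -> List[Tuple[int, str, str]]:
    """Return (index, name, uuid) for every GPU line found in the output."""
    gpus = []
    for line in output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def gpu_is_listed(output: str, index: int) -> bool:
    """Return True if a GPU with the given index appears in the output."""
    return any(gpu_index == index for gpu_index, _, _ in parse_gpu_list(output))
```

The text passed to these functions would come from running `nvidia-smi -L` on the system, for example through `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`.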
Verification action (Table 30): Network adapter
Complete the following steps:
1. At the command prompt, type ethtool ethx, where x is the number of the physical port that you are testing. Verify that the connection speed that is indicated in the output is correct.
2. Perform a ping test to verify the network connectivity.
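The speed check in the ethtool step can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: it assumes that ethtool prints lines such as `Speed: 10000Mb/s` and `Link detected: yes`, and the function names and the expected-speed parameter are hypothetical. The manual does not state the expected speed, so the caller must supply it for the port under test.

```python
import re
from typing import Optional


def parse_speed_mbps(ethtool_output: str) -> Optional[int]:
    """Return the reported link speed in Mb/s, or None if no numeric speed is reported."""
    match = re.search(r"^\s*Speed:\s*(\d+)Mb/s", ethtool_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def link_detected(ethtool_output: str) -> Optional[bool]:
    """Return True or False from the 'Link detected' line, or None if the line is absent."""
    match = re.search(r"^\s*Link detected:\s*(yes|no)", ethtool_output, re.MULTILINE)
    return None if match is None else match.group(1) == "yes"


def speed_is_correct(ethtool_output: str, expected_mbps: int) -> bool:
    """Compare the reported speed with the speed expected for this port."""
    return parse_speed_mbps(ethtool_output) == expected_mbps
```

A port that has no link typically does not report a numeric speed, in which case `parse_speed_mbps` returns `None` and `speed_is_correct` returns `False`.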
Verification action (Table 30): RAID adapter
Complete the following steps:
1. Install the arcconf utility for the RAID adapter.
2. Type ARCCONF GETLOGS 1 STATS at the command prompt and press Enter.
3. Verify that usage statistics are returned. The presence of usage statistics indicates that the adapter is functioning properly.
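Table 30 is in effect a lookup from adapter type to verification commands. A minimal Python sketch of that lookup is shown here as an aid for scripting. The command strings are copied from the table, including the `x` placeholders, while the dictionary name and the helper `commands_for` are hypothetical. The ping test for network adapters is omitted because the table does not name a command or a target for it.

```python
from typing import Dict, List

# Adapter type -> commands named in the Table 30 verification action.
# "x" in /dev/sdx and ethx is a placeholder that must be replaced by the operator.
VERIFICATION_COMMANDS: Dict[str, List[str]] = {
    "Devices that are controlled by a RAID adapter": ["ARCCONF GETSMARTSTATS 1"],
    "Devices that are not controlled by a RAID adapter": [
        "apt-get install smartmontools",
        "smartctl --all /dev/sdx",
    ],
    "GPU": ["nvidia-smi -L", "nvidia-smi -q"],
    "Network adapter": ["ethtool ethx"],
    "RAID adapter": ["ARCCONF GETLOGS 1 STATS"],
}


def commands_for(adapter_type: str) -> List[str]:
    """Return the commands for an adapter type, matching the name case-insensitively."""
    wanted = adapter_type.strip().lower()
    for name, commands in VERIFICATION_COMMANDS.items():
        if name.lower() == wanted:
            return list(commands)
    raise ValueError(f"No Table 30 entry for adapter type: {adapter_type!r}")
```

Keeping the commands as data rather than running them directly leaves the operator in control of substituting the drive letter or port number before anything is executed.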
Beginning troubleshooting and problem analysis
109
