Monitor the InfiniBand Fabric
15.1.14 Determine Which Links Are Experiencing Significant Errors
You can use the ibdiagnet command to inject test packets into each link and
determine which links are experiencing symbol errors and recovery errors.
On the command-line interface (CLI), run the following command:
# ibdiagnet -c 100 -P all=1
In this instance of the ibdiagnet command, 100 test packets are injected into each
link and the -P all=1 option returns all counters that increment during the test.
In the output of the ibdiagnet command, search for the symbol_error_counter
string. The matching line contains the symbol error count in hexadecimal, and the
preceding lines identify the node and port with the errors. Symbol errors are minor
errors. If relatively few occur during the diagnostic, you can continue to monitor
the link.
Also search the output of the ibdiagnet command for the
link_error_recovery_counter string.
The matching line contains the recovery error count in hexadecimal, and the
preceding lines identify the node and port with the errors. Recovery errors are major
errors, and you must investigate the affected links to find the cause of the rapid
symbol error propagation.
The ibdiagnet.log file contains a log of the test.
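The two searches described above can be sketched as a small shell helper that picks out both counters from saved ibdiagnet output and prints each value in decimal. This is only an illustration: the function name summarize_error_counters is hypothetical, and it assumes that the value appears on the same line as the counter name as a 0x-prefixed hexadecimal number, which you should confirm against your own output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ibdiagnet tooling.
# Reads saved ibdiagnet output on standard input and prints
# "<counter name> <decimal value>" for each matching line.
# Assumption: the hexadecimal value (0x...) is on the same line as the name.
summarize_error_counters() {
    grep -E 'symbol_error_counter|link_error_recovery_counter' |
    while IFS= read -r line; do
        name=$(printf '%s\n' "$line" |
            grep -oE 'symbol_error_counter|link_error_recovery_counter' | head -n 1)
        hex=$(printf '%s\n' "$line" | grep -oE '0x[0-9a-fA-F]+' | head -n 1)
        # Skip lines that mention a counter but carry no hexadecimal value.
        [ -n "$hex" ] || continue
        # printf accepts a 0x-prefixed argument for %d and prints it in decimal.
        printf '%s %d\n' "$name" "$hex"
    done
}

# Example use (the output file name is arbitrary):
#   ibdiagnet -c 100 -P all=1 > fabric_check.out
#   summarize_error_counters < fabric_check.out
# To see the node and port lines that precede each counter (5 is a guess
# at how many preceding lines are needed):
#   grep -B 5 -E 'symbol_error_counter|link_error_recovery_counter' fabric_check.out
```

Any link that appears with a nonzero link_error_recovery_counter value is a candidate for the investigation described above.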
15.1.15 Check All Ports
To perform a quick check of all ports of all nodes in your InfiniBand fabric, you can
use the ibcheckstate command.
On the command-line interface (CLI), run the following command:
# ibcheckstate -v
The output is displayed, as in the following example:
# Checking Switch: nodeguid 0x0021283a8389a0a0
Node check lid 15: OK
Port check lid 15 port 23: OK
Port check lid 15 port 19: OK
.
.
.
# Checking Ca: nodeguid 0x0003ba000100e388
Node check lid 14: OK
Port check lid 14 port 2: OK
## Summary: 5 nodes checked, 0 bad nodes found
## 10 ports checked, 0 ports with bad state found
#
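If you run this check from a script, the summary lines can be tested automatically. The following is a minimal sketch, assuming the summary wording matches the example output above. The function name fabric_state_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ibcheckstate tooling.
# Reads ibcheckstate output on standard input. Returns 0 when the summary
# reports no bad nodes and no ports with bad state, and 1 otherwise.
# Assumption: the summary lines are worded as in the example output.
fabric_state_ok() {
    summary=$(grep -E 'bad nodes found|ports with bad state found' || true)
    bad_nodes=$(printf '%s\n' "$summary" |
        sed -n 's/.* \([0-9][0-9]*\) bad nodes found.*/\1/p')
    bad_ports=$(printf '%s\n' "$summary" |
        sed -n 's/.* \([0-9][0-9]*\) ports with bad state found.*/\1/p')
    # Treat a missing summary as a failure rather than as a clean fabric.
    [ -n "$bad_nodes" ] && [ -n "$bad_ports" ] || return 1
    [ "$bad_nodes" -eq 0 ] && [ "$bad_ports" -eq 0 ]
}

# Example use:
#   if ibcheckstate -v | fabric_state_ok; then
#       echo "All ports OK"
#   else
#       echo "Bad state found or no summary produced; review the full output"
#   fi
```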
15-10 Oracle Exalogic Elastic Cloud Machine Owner's Guide
Note:
According to the InfiniBand specification, which requires a bit error rate (BER)
of 10^-12, the maximum allowable symbol error rate is 120 errors per hour.
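To compare an observed symbol error count with the 120 errors per hour limit, you need the period over which the count accumulated. The following sketch performs that comparison. The function name symbol_error_rate_ok is hypothetical, and it assumes that you know the elapsed time in seconds for the count you supply.

```shell
#!/bin/sh
# Hypothetical helper for comparing a symbol error count against an hourly limit.
# Arguments: count (decimal or 0x-prefixed hexadecimal),
#            elapsed seconds,
#            optional limit in errors per hour (default 120, from the note).
# Returns 0 when the observed rate is within the limit, 1 otherwise.
symbol_error_rate_ok() {
    count=$(printf '%d' "$1")   # converts a 0x... value to decimal
    seconds=$2
    limit=${3:-120}
    # count / (seconds / 3600) <= limit, rearranged to avoid division.
    [ $((count * 3600)) -le $((limit * seconds)) ]
}

# Example use: 0x3c (60) symbol errors counted over 30 minutes.
#   if symbol_error_rate_ok 0x3c 1800; then
#       echo "Within the allowable rate"
#   else
#       echo "Exceeds the allowable rate"
#   fi
```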