IBM Elastic Storage System 3000 6.0.2 Service Guide SC28-3187-01...
IBM welcomes your comments; see the topic “How to submit your comments” on page xi. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
Contents
Figures: page v
Tables: page vii
About this information: page ix
Who should read this information: page ix
IBM Elastic Storage System information units: page ix
Related information: page ix
Conventions used in this information: page x
How to submit your comments: page xi
Chapter 1. Events: page 1
Array events: page 1
Canister events: page 2...
Figures
1. Unlocking the drive and release latch: page 21
2. Removing the drive: page 21
3. Inserting the new drive: page 22
4. Completing the drive installation: page 22
5. Correct drive blank orientation: page 23
6. Details of Power Supply Units in the management GUI: page 24
7.
Tables
1. Conventions: page x
2. Events for the Array component: page 1
3. Events for the Canister component: page 2
4. Events for the Enclosure component: page 4
5. Events for the physical disk component: page 8
6. Events for the Recovery group component: page 11
7.
This information is intended for administrators of IBM Elastic Storage System (ESS) that includes IBM Spectrum Scale RAID.
IBM Elastic Storage System information units
IBM Elastic Storage System (ESS) 3000 documentation consists of the following information units.
Information unit | Type of information | Intended users...
– IBM System Storage DCS3700 Storage Subsystem and DCS3700 Storage Subsystem with Performance Module Controllers: Installation, User's, and Maintenance Guide, GA32-0959-07: http://www.ibm.com/support/docview.wss?uid=ssg1S7004920
• For information about the IBM Power Systems EXP24S I/O Drawer (FC 5887), see IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/8247-22L/p8ham/p8ham_5887_kickoff.htm
•...
Vertical lines in the left margin of the document indicate technical changes to the information.
How to submit your comments
To contact the IBM Spectrum Scale development organization, send your comments to the following email address: scale@us.ibm.com
The old RAS event log is emptied automatically. You can verify that the event log is empty either by running the mmhealth node eventlog command or by checking the IBM Spectrum Scale GUI.
Note: The event logs are updated only the first time that IBM Spectrum Scale is upgraded to version 5.0.5.3 or later.
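The note above depends on comparing dotted version levels such as 5.0.5.3. The following Python sketch shows one way to make that comparison. It assumes that the installed level is already available as a plain dotted string (how you obtain it is not covered here), and the function and constant names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Convert a dotted version string such as '5.0.5.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`.

    Shorter versions are padded with zeros, so '5.1' compares like '5.1.0.0'.
    """
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Threshold taken from the note above (hypothetical constant name).
EVENTLOG_UPDATE_LEVEL = "5.0.5.3"

for level in ("5.0.4.3", "5.0.5.3", "5.1.0.0"):
    print(level, is_at_least(level, EVENTLOG_UPDATE_LEVEL))
```

Comparing the parts as integers rather than as strings matters here: a plain string comparison would rank 5.0.10.0 below 5.0.9.0.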
... | STATE_CHANGE | WARNING | Temperature sensor {0} I2C bus is failed. | The temperature sensor I2C bus failed. | The mmlsenclosure command reports the temperature sensor with a failure. | Check the temperature status by using the mmlsenclosure command.
... | STATE_CHANGE | ERROR | The inspection of the CPU slots found a mismatch ... | Number of populated CPU slots, number of enabled CPUs, ... | The /opt/ibm/gss/tools/bin/ess3kplt command ... | Check for specific events related to CPUs by using the mmhealth command.
... | ... | ... | ... {0} is not available. | ... the adapter is not available. | | ... BIOS level using the mmlsfirmware command.
adapter_bios_ok | STATE_CHANGE | INFO | The BIOS level of adapter {0} is correct. | The BIOS level of the adapter is correct. | |
Table 4. Events for the Enclosure component (continued)
Event | Event Type | Severity | Message | Description | Cause | User Action
adapter_bios_wrong | STATE_CHANGE | WARNING | The bios level of adapter {0} is wrong. | The bios level of the adapter is wrong. | | Check the installed BIOS level using the ...
... The fan state is ok.
fan_speed_high | STATE_CHANGE | WARNING | Fan {0} speed is too high | The fan speed is out of the tolerance range | | Check the enclosure cooling module LEDs for fan faults.
Table 4. Events for the Enclosure component (continued)
Event | Event Type | Severity | Message | Description | Cause | User Action
fan_speed_low | STATE_CHANGE | WARNING | Fan {0} speed is too low | The fan speed is out of the tolerance range | | Check the enclosure cooling module LEDs for fan faults.
... BPM, which ... condition for the NVRAM ... encountered the drive of the disk. ... errors from FSP log or call home data, and replace the faulty NVDIMM cards, BPM or both as soon as possible.
... | ... | ... | ... | Pdisks found on this node. | Pdisks found |
gnr_pdisk_found | INFO_ADD_ENTITY | INFO | GNR pdisk {0} was found. | A GNR pdisk listed in the IBM Spectrum Scale configuration was detected. | |
gnr_pdisk_maintena... | STATE_CHANGE | WARNING | GNR pdisk {0} is in maintenance. | The GNR pdisk is in ... | The mmlspdisk ... | Complete the ...
... 0 indicates that full life remains, and 100 indicates that the drive is at or past its end of life. The drive must be replaced when the value exceeds 100", "state":"DEGRADED" }.
... | ... | ... | ... {0} is not active. | The recovery group is not active. | |
gnr_rg_found | INFO_ADD_ENTITY | INFO | GNR recovery group {0} was found. | A GNR recovery group listed in the IBM Spectrum Scale configuration was detected. | |
gnr_rg_ok | STATE_CHANGE | INFO | GNR recoverygroup {0} is ok. | The recovery group is ... | |
... | ... | ... | ... {0} failed. | ... hardware state using xCAT. | |
server_power_supply_aux_line_12V_ok | STATE_CHANGE | INFO | AUX Line 12V of Power Supply {0} is ok. | The GUI checks the hardware state using xCAT. | The hardware part is ok. | None.
Table 7. Server events (continued)
Event | Event Type | Severity | Message | Description | Cause | User Action
server_power_supply_aux_line_12V_failed | STATE_CHANGE | ERROR | AUX Line 12V of Power Supply {0} failed. | The GUI checks the hardware state using xCAT. | The hardware part failed. | None.
server_power_supply_fan_ok | STATE_CHANGE | INFO | Fan of Power...
... | ... | ... | ... {0} failed. | ... hardware state using xCAT. | |
server_pci_ok | STATE_CHANGE | INFO | All PCIs of server {0} are fully available. | The GUI checks the hardware state using xCAT. | The hardware part is ok. | None.
Table 7. Server events (continued)
Event | Event Type | Severity | Message | Description | Cause | User Action
server_pci_failed | STATE_CHANGE | ERROR | At least one PCI of server {0} failed. | The GUI checks the hardware state using xCAT. | The hardware part failed. | None.
server_ps_conf_ok | STATE_CHANGE | INFO | All Power... | The GUI... | |
... | ... | ... | ... | ... IBM Spectrum Scale configuration was detected. | |
gnr_vdisk_offline | STATE_CHANGE | ERROR | GNR vdisk {0} is offline. | The vdisk state is offline. | |
gnr_vdisk_ok | STATE_CHANGE | INFO | GNR vdisk {0} is ok. | The vdisk state is ok. | |
... | INFO_DELETE_ENTITY | INFO | GNR vdisk {0} has vanished. | A GNR vdisk listed in the IBM Spectrum Scale configuration was not detected. | ... vdisk, listed in the Spect... | Run the mmlsvdisk command to verify that all expected GNR vdisks exist.
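Each event in the tables above pairs a severity with a message template in which {0} stands for the affected entity. As an illustration only, the following Python sketch transcribes a few of those events and fills the {0} placeholder with str.format. The EVENTS dictionary, the helper functions, and the entity names (vd1, fan2, vd3) are hypothetical and are not part of the mmhealth interface.

```python
# A few events transcribed from the tables above: name -> (severity, message template).
EVENTS = {
    "adapter_bios_ok": ("INFO", "The BIOS level of adapter {0} is correct."),
    "adapter_bios_wrong": ("WARNING", "The bios level of adapter {0} is wrong."),
    "fan_speed_high": ("WARNING", "Fan {0} speed is too high"),
    "fan_speed_low": ("WARNING", "Fan {0} speed is too low"),
    "gnr_vdisk_offline": ("ERROR", "GNR vdisk {0} is offline."),
    "gnr_vdisk_ok": ("INFO", "GNR vdisk {0} is ok."),
    "server_pci_failed": ("ERROR", "At least one PCI of server {0} failed."),
    "server_pci_ok": ("INFO", "All PCIs of server {0} are fully available."),
}

# Severities that call for a user action in this sketch.
ATTENTION_SEVERITIES = {"WARNING", "ERROR"}


def render(event: str, entity: str) -> str:
    """Fill the {0} placeholder of an event message with the affected entity name."""
    severity, template = EVENTS[event]
    return f"{severity} {event}: {template.format(entity)}"


def needs_attention(raised: list[tuple[str, str]]) -> list[str]:
    """Render the raised (event, entity) pairs whose severity is WARNING or ERROR."""
    return [render(ev, ent) for ev, ent in raised if EVENTS[ev][0] in ATTENTION_SEVERITIES]


# Hypothetical raised events; the entity names are made up for illustration.
raised = [("gnr_vdisk_ok", "vd1"), ("fan_speed_high", "fan2"), ("gnr_vdisk_offline", "vd3")]
for line in needs_attention(raised):
    print(line)
```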
Chapter 2. Servicing
Service information is intended for IBM-authorized service personnel only. Consult the terms of your warranty to determine the extent to which you can attempt to accomplish any IBM ESS 3000 system maintenance.
IBM service support representatives and lab-based services personnel can access service information through the following link.
The drive associated with the pdisk name in the previous command should now have a flashing amber fault LED, which indicates that it is safe to remove the drive.
Removing the disk physically
1. Press the blue touchpoint to unlock the latching handle, as shown in this figure.
Figure 1. Unlocking the drive and release latch
2. Lower the handle and slide the drive out of the enclosure, as shown in this figure.
Figure 2. Removing the drive
Replacing the drive
1. Ensure that the LED indicators are at the top of the drive.
2.
The following pdisks will be formatted on node ess01io1:
mmvdisk: /dev/sdrk
mmvdisk:
mmvdisk: Location SX32901810-11 is Enclosure 2 Drive 11.
mmvdisk: Pdisk e2s11 of RG BB01L successfully replaced.
mmvdisk: Carrier resumed.
6. Slide the replacement drive blank into the empty drive slot.
Removing and replacing a power supply unit
You can remove and replace either of the two hot-swap redundant power supply units (PSUs) in an IBM Elastic Storage System 3000 control enclosure. These redundant power supplies operate in parallel: if one fails, the other continues to power the enclosure.
When you replace this part, you must follow the recommended procedures for handling electrostatic discharge (ESD)-sensitive devices.
1. In the management GUI, you can identify the faulty PSU from the Monitoring > Hardware page.
You can also run the mmhealth node show enclosure command on the canister of the affected enclosure. To identify the affected enclosure, run the mmhealth cluster show enclosure command; the faulty enclosure is in a DEGRADED or FAILED state. A faulty power supply is indicated by the power_supply_failed, power_supply_absent, power_high_power, power_high_current, or power_no_power event.
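The decision described above can be summarized as a small check. This Python sketch assumes that you have already read the enclosure state and the names of its active events (for example, from the mmhealth output); the function name and the sample input are hypothetical, and only the state names and event names come from the text.

```python
# Enclosure states and event names taken from the text above.
FAULTY_STATES = {"DEGRADED", "FAILED"}
PSU_FAULT_EVENTS = {
    "power_supply_failed",
    "power_supply_absent",
    "power_high_power",
    "power_high_current",
    "power_no_power",
}


def psu_fault_events(enclosure_state: str, active_events: list[str]) -> list[str]:
    """Return the active events that point to a faulty PSU.

    An enclosure that is not DEGRADED or FAILED is treated as healthy, so an
    empty list is returned regardless of the events.
    """
    if enclosure_state.upper() not in FAULTY_STATES:
        return []
    return [ev for ev in active_events if ev in PSU_FAULT_EVENTS]


# Hypothetical input: in practice the state and event names come from mmhealth.
print(psu_fault_events("DEGRADED", ["fan_speed_high", "power_supply_absent"]))
```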
... 5 minutes. Operating for longer than this period might cause the control enclosure to shut down due to overheating.
• No tools are required to complete this task. Do not remove or loosen any screws.
1. Remove the power supply unit. For more information, see “Removing and replacing a power supply unit” on page 23.
Removing the power interposer
2. Remove the power interposer by pulling on the blue handle that is located beneath the PSU slot. Figure 9 on page 27 shows an example.
• All new or existing building blocks must be at the ESS 5.3.5.1 or ESS 3000 6.0.0.1 level. If the setup has any protocol nodes, these nodes must also be upgraded to the ESS 5.3.5.1 level (underlying code plus IBM Spectrum Scale 5.0.4.2, verified by using the gssinstallcheck command).
• When the resizing is done and the upgraded ESS 3000 is back online, you can perform other ESS and GPFS operations.
Note: GPFS preferentially uses the new network shared disks (NSDs) to store the data of a new file system. Because the four new NSDs are the same as the four original NSDs, the workload per server is the same as it was before.
14. Install the hardware – add an adapter to the available middle slot in canister B.
15. Reinsert canister B into the enclosure and do basic checks.
16. Incorporate new interfaces to the existing single master bond in canister B.
17. Start GPFS in canister B.
18. Mount file systems on canister B.
• Steps 1–3 are expected to be customer tasks.
• Steps 4–6 are SSR tasks.
• Steps 7–12 are expected to be customer tasks.
• Steps 13–15 are SSR tasks.
• Steps 16–18 are expected to be customer tasks.
Summary
The goal of this procedure is primarily to add a third high-speed adapter to each ESS 3000 canister.
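The task split listed above can also be expressed as a lookup table, which makes the hand-off points between the customer and the SSR explicit. This is a hypothetical Python sketch; only the step ranges and owners come from the procedure.

```python
# Step ranges and owners as listed in the procedure (range end is exclusive).
STEP_OWNERS = [
    (range(1, 4), "customer"),
    (range(4, 7), "SSR"),
    (range(7, 13), "customer"),
    (range(13, 16), "SSR"),
    (range(16, 19), "customer"),
]


def owner_of(step: int) -> str:
    """Return who is expected to perform a given step of the 18-step adapter MES."""
    for steps, owner in STEP_OWNERS:
        if step in steps:
            return owner
    raise ValueError(f"step {step} is outside the 18-step procedure")


def handoffs() -> list[int]:
    """Return the steps at which responsibility changes hands."""
    return [steps.start for steps, _ in STEP_OWNERS[1:]]


print([owner_of(s) for s in (1, 5, 12, 14, 17)])
print(handoffs())
```

The lookup agrees with the task labels in the detailed steps, where steps 6 and 15 are marked (SSR task) and steps 7 and 17 are marked (Customer task).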
Remove the top cover of canister A and install one adapter in PCIe slot 3, which is located between the two existing adapters. For more information about these steps, see the IBM Elastic Storage System 3000: Service Guide.
6. Insert canister A into the enclosure again and do basic checks. (SSR task)
a.
c. Connect the new high-speed cables to the new adapter, if the cables are provided by the customer.
d. Perform basic checks through the technician port in canister A by using the “essserv1” service ID.
e. Perform the following checks in the essutils menu:
i) Option 2.
Display the state of the node class that is associated with the ESS 3000.
# mmgetstate -N <ESS 3000 node class>
Example:
[root@ess3k5a ~]# mmvdisk nodeclass list
node class                  recovery groups
--------------------------  --------------------------
ece_nc_1                    ece_rg_1
ess_x86_64_mmvdisk_78E016N  ess3k_78E016N
ess_x86_64_mmvdisk_78E05N1  ess3k_78E05N1
gssio1_ibgssio2_ib          rg_gssio1-ib, rg_gssio2-ib
Remove the top cover of canister B and install an adapter in PCIe slot 3, which is located between the two existing adapters. For more information about these steps, see the IBM Elastic Storage System 3000: Service Guide.
15. Insert canister B into the enclosure again and do basic checks. (SSR task)
a.
...  8d5550ae-52e4-4a2e-8037-e09c2df60dbc  ethernet
servport  28be0740-4dfd-4646-8f06-331de13f2c7f  ethernet
17. Start GPFS in canister B. (Customer task)
# mmstartup -N <canister B>
a. Verify that GPFS is active on canisters A and B.
... Update the verbsPorts list, first start GPFS manually.
ii) Ensure that the correct entries are listed in verbsPorts for the target node class by issuing the following commands:
/opt/ibm/ess/tools/samples/essServerConfig.sh <node class name>
# mmlsconfig -Y | grep -i verbsPort
Example:...
Note: These images show the PCIe ports for two adapters for each canister. The MES upgrade installs the third adapter, with two more ports, in the PCIe slot between the original two adapters.
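Step 1 of the procedure identifies the node class of the ESS 3000 enclosure from the mmvdisk nodeclass list output. The following Python sketch parses output laid out like the example in that step into a node class to recovery group mapping. It assumes the column layout of that example (a header line, a dashed separator, then one node class per line followed by comma-separated recovery groups). The ess3k_ prefix filter is a hypothetical heuristic based only on the example names, not a documented naming rule.

```python
# Output copied from the example in step 1.
SAMPLE = """\
node class                  recovery groups
--------------------------  --------------------------
ece_nc_1                    ece_rg_1
ess_x86_64_mmvdisk_78E016N  ess3k_78E016N
ess_x86_64_mmvdisk_78E05N1  ess3k_78E05N1
gssio1_ibgssio2_ib          rg_gssio1-ib, rg_gssio2-ib
"""


def parse_nodeclass_list(text: str) -> dict[str, list[str]]:
    """Map each node class to its recovery groups."""
    result: dict[str, list[str]] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header, and the dashed separator.
        if not stripped or stripped.startswith("node class") or stripped.startswith("-"):
            continue
        # First whitespace-separated token is the node class; the rest is the RG list.
        parts = stripped.split(None, 1)
        rgs = [rg.strip() for rg in parts[1].split(",")] if len(parts) > 1 else []
        result[parts[0]] = rgs
    return result


def ess3000_nodeclasses(mapping: dict[str, list[str]]) -> list[str]:
    """Hypothetical filter: node classes with a recovery group named ess3k_*."""
    return [nc for nc, rgs in mapping.items() if any(rg.startswith("ess3k_") for rg in rgs)]


classes = parse_nodeclass_list(SAMPLE)
print(ess3000_nodeclasses(classes))
```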
Online adapter MES procedure
1. Prepare canister A (bottom canister slot) for an adapter MES. (Customer task)
a. Determine the node class of the ESS 3000 enclosure where you want to perform the MES.
# mmvdisk nodeclass list
b. Display the state of the node class that is associated with the ESS 3000.
# mmgetstate -N <ESS 3000 node class>
Example:
[root@ess3k5a ~]# mmvdisk nodeclass list...
7. Incorporate new interfaces to the existing single master bond in canister A. (Customer task) To configure each of the two new ports to be either InfiniBand or Ethernet, see the ConnectX-5 VPI support on ESS 3000 topic in the IBM Elastic Storage System 3000: Quick Deployment Guide. Contact technical support, if required.
mmlsconfig::0:1:::verbsPorts:mlx5_2/1 mlx5_3/1:ess_x86_64_mmvdisk_6:
mmlsconfig::0:1:::verbsPorts:mlx5_0/1:gss_ppc64:
9. Mount the file systems on canister A again. (Customer task)
a. Check the mounted file systems and mount them on canister A again, if necessary.
# mmlsmount fs1 -L
# mmlsmount all -L
b. Move the quorum node back to its original state, if necessary.
10.
16. Incorporate new interfaces to the existing single master bond in canister B. (Customer task) To configure each of the two new ports to be either InfiniBand or Ethernet, see the ConnectX-5 VPI support on ESS 3000 topic in the IBM Elastic Storage System 3000: Quick Deployment Guide. Contact technical support, if required.
Ensure that the correct entries are listed in verbsPorts for the target node class by issuing the following commands:
/opt/ibm/ess/tools/samples/essServerConfig.sh <node class name>
# mmlsconfig -Y | grep -i verbsPort
Example:
[root@ess3k5a ~]# mmlsconfig -Y | grep -i verbsPort
mmlsconfig::0:1:::verbsPorts:mlx5_1/1::...
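The verbsPorts check above relies on reading the colon-separated mmlsconfig -Y lines. The following Python sketch extracts the port list for each node class. The field positions (6 for the attribute name, 7 for the space-separated ports, 8 for the node class) are inferred from the example lines in this procedure and are an assumption, as are the expected port names in the example, such as mlx5_4/1, which are hypothetical.

```python
# Example lines copied from this procedure.
SAMPLE_LINES = [
    "mmlsconfig::0:1:::verbsPorts:mlx5_2/1 mlx5_3/1:ess_x86_64_mmvdisk_6:",
    "mmlsconfig::0:1:::verbsPorts:mlx5_0/1:gss_ppc64:",
]


def parse_verbs_ports(lines: list[str]) -> dict[str, list[str]]:
    """Return {node class: [device/port, ...]} from mmlsconfig -Y verbsPorts lines.

    Assumed layout (inferred from the examples): field 6 is the attribute name,
    field 7 the space-separated port list, and field 8 the node class.
    """
    result: dict[str, list[str]] = {}
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) < 9 or fields[6] != "verbsPorts":
            continue
        scope = fields[8] or "(no node class)"
        result[scope] = fields[7].split()
    return result


def missing_ports(configured: list[str], expected: list[str]) -> list[str]:
    """Return expected device/port entries that are not yet listed in verbsPorts."""
    return sorted(set(expected) - set(configured))


ports = parse_verbs_ports(SAMPLE_LINES)
# Hypothetical expectation: the new adapter adds mlx5_4/1 to this node class.
print(missing_ports(ports["ess_x86_64_mmvdisk_6"], ["mlx5_2/1", "mlx5_3/1", "mlx5_4/1"]))
```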
The goal of this procedure is to add memory to each ESS 3000 canister. After the physical memory is installed, the customer can complete the operation by increasing the GPFS page pool. For more information, see the Planning for hardware chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide.
After the basic checks are complete, place everything back into the frame and reinsert the power cables. This step restarts the nodes. You can use the procedure in the Installing chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide to do the following steps:
i) Connect your laptop point-to-point to the technician port of each canister.
Before you do the next step, ensure that the node servers are active by issuing the following command:
# mmgetstate -a
You can also use the following command to check the pagepool:
# mmvdisk server list --nc <node class name> --config
# mmhealth node show
ESS 3000 storage drives concurrent MES upgrade
An online IBM Elastic Storage System 3000 (ESS 3000) MES upgrade is supported for customers who want to upgrade a 12-drive ESS 3000 to a 24-drive ESS 3000. The new NVMe drives must be the same size as the existing 12 drives. This MES upgrade doubles the available storage capacity in the existing ESS 3000.
... IBM Spectrum Scale 5.0.4.3 must be verified by using the gssinstallcheck or essinstallcheck command).
• The system must be healthy before the ESS 3000 storage MES upgrade.
• The existing ESS 3000 must be a properly installed 12-drive NVMe system, with the 12 NVMe drives correctly located in slots 1–6 and 13–18.
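The slot requirement above can be checked mechanically. In a 24-slot enclosure where the existing drives occupy slots 1–6 and 13–18, the slots left for the 12 new drives are, by elimination, 7–12 and 19–24; because the new drives are the same size as the existing ones, the raw drive capacity doubles. This Python sketch assumes that you already have the set of populated slot numbers from some inventory source; the helper names are hypothetical.

```python
TOTAL_SLOTS = 24
# Slots that must hold the existing 12 NVMe drives, per the prerequisites above.
EXISTING_SLOTS = set(range(1, 7)) | set(range(13, 19))


def slots_to_fill() -> list[int]:
    """Slots left empty in a correctly installed 12-drive system."""
    return sorted(set(range(1, TOTAL_SLOTS + 1)) - EXISTING_SLOTS)


def layout_problems(populated: set[int]) -> list[str]:
    """Compare the populated slot numbers with the required 12-drive layout."""
    problems = []
    for slot in sorted(EXISTING_SLOTS - populated):
        problems.append(f"slot {slot}: drive expected but missing")
    for slot in sorted(populated - EXISTING_SLOTS):
        problems.append(f"slot {slot}: drive present outside slots 1-6 and 13-18")
    return problems


print(slots_to_fill())
```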
The customer can use the new space by creating new vdisk sets from the available space. If the vdisk sets are added to the existing file system (this is optional), a restripe operation can then be run. For more information, see IBM Spectrum Scale: Administration Guide.
Example: Manually restarting GPFS on the ESS 3000 canisters
You can manually stop and start GPFS to apply the node configuration changes on both canisters.
ess3ka-ib        active
ess3kb-ib        active
6. Repeat the mmshutdown command and the mmstartup command on canister B.
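Before and after each restart, the mmgetstate output can be checked for nodes that are not active. This Python sketch assumes only what the example above shows, namely that each data line ends with a node name followed by its GPFS state; header and separator lines are skipped. The helper names are hypothetical.

```python
def node_states(output: str) -> dict[str, str]:
    """Map node name -> GPFS state from output lines ending in '<node name> <state>'."""
    states: dict[str, str] = {}
    for line in output.splitlines():
        tokens = line.split()
        # Skip blank lines, separators made of dashes, and a header ending in 'state'.
        if len(tokens) < 2 or tokens[-1].startswith("-") or tokens[-1] == "state":
            continue
        states[tokens[-2]] = tokens[-1]
    return states


def inactive_nodes(states: dict[str, str]) -> list[str]:
    """Return the nodes whose state is anything other than 'active'."""
    return sorted(name for name, state in states.items() if state != "active")


# Output as shown in the example above.
example = "ess3ka-ib        active\ness3kb-ib        active\n"
print(inactive_nodes(node_states(example)))
```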
Table 10. FRU Part Numbers (continued)
Description | Part Number
END7 Power Cable - Drawer to IBM PDU - C13/C20 (250V/10A) for India | 0000001PP687
END5 power cord (9.2 ft), Drawer to IBM PDU - C13/C20 (250V/10A) for India | 0000001PP688
Trusted Platform Module (TPM) | ...
100M QSFP28 AOC 100Gb Ethernet cable | 0000001FT728
15M QSFP28 AOC 100Gb Ethernet cable | 0000001FT730
END3 Power Cable - Drawer to IBM PDU - C13/C14, 200-240V/10A for India | 0000001KV679
END0 Power Cord M (6.5 foot), Drawer to IBM PDU - C13/C14 (250V/10A) for India | 0000001KV680
Table 11. Cable Part Numbers (continued)
Description | Part Number
END1 Power Cord M (9 foot), Drawer to IBM PDU - C13/C14 (250V/10A) for India | 0000001KV681
END2 Power Cord m (14 ft), Drawer to IBM PDU - C13/C14 (250V/10A) for India | 0000001KV682...
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
• See refers you from a non-preferred term to the preferred term or from an abbreviation to the spelled-out form.
• See also refers you to a related or contrasting term.
For other terms and definitions, see the IBM Terminology website: http://www.ibm.com/software/globalization/terminology
building block
A pair of servers with shared disk enclosures attached.
... ESS to a single cluster in the ESS when the other clusters in the ESS fail. See also cluster.
(3) The routing of all transactions to a second controller when the first controller fails. See also cluster.
failure group
A collection of disks that share common access paths or adapter connections and could all become unavailable through a single hardware failure.
FEK
See file encryption key (FEK).
file encryption key (FEK)
A key used to encrypt sectors of an individual file. See also encryption key.
file system
The methods and data structures used to control how data is stored and retrieved.
... Provides a way to control the bundling of several physical ports together to form a single logical channel.
logical partition (LPAR)
A subset of a server's hardware resources virtualized as a separate computer, each with its own operating system. See also node.
LPAR
See logical partition (LPAR).
management network
A network that is primarily responsible for booting and installing the designated server and compute nodes from the management server.
management server (MS)
An ESS node that hosts the ESS GUI and xCAT and is not connected to storage. It must be part of a GPFS cluster.
... A server that is used to store master encryption keys.
... See recovery group (RG).
recovery group data (RGD)
Data that is associated with a recovery group.
RKM server
See remote key management server (RKM server).
... See Serial Attached SCSI (SAS).
secure shell (SSH)
A cryptographic (encrypted) network protocol for initiating text-based shell sessions securely on remote computers.
Serial Attached SCSI (SAS)
A point-to-point serial protocol that moves data to and from computer storage devices such as hard drives and tape drives.
service network
A private network that is dedicated to managing POWER8 servers.
Recovery group events 11
server events 11
virtual disk events 16
documentation ix
resources ix
IBM Elastic Storage System 3000 28
IBM Spectrum Scale
events 1, 2, 4, 8, 11, 16
RAS events 1, 2, 4, 8, 11, 16
information overview ix...