IBM Elastic Storage System 3000
Version 6.0.1
Service Guide
IBM
SC28-3158-00

Summary of Contents for IBM Elastic Storage System 3000

  • Page 1 IBM Elastic Storage System 3000 Version 6.0.1 Service Guide SC28-3158-00...
  • Page 2 IBM welcomes your comments; see the topic “How to submit your comments” on page xi. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
  • Page 3 Contents: Figures (v); Tables (vii); About this information (ix): Who should read this information (ix), IBM Elastic Storage System information units (ix), Related information (ix), Conventions used in this information (x), How to submit your comments (xi); Chapter 1. Events (1): Array events (1), Canister events...
  • Page 4 Index (63)...
  • Page 5 Figures: 1. Unlocking the drive and release latch (21); 2. Removing the drive (21); 3. Inserting the new drive (22); 4. Completing the drive installation (22); 5. Correct drive blank orientation (23); 6. Details of Power Supply Units in the management GUI (24); 7.
  • Page 7 Tables: 1. Conventions (x); 2. Events for the Array component (1); 3. Events for the Canister component (1); 4. Events for the Enclosure component (4); 5. Events for the physical disk component (8); 6. Events for the Recovery group component (11); 7.
  • Page 9 This information is intended for administrators of IBM Elastic Storage System (ESS) that includes IBM Spectrum Scale RAID. IBM Elastic Storage System information units IBM Elastic Storage System (ESS) 3000 documentation consists of the following information units. Information unit Type of information Intended users...
  • Page 10 Enter. In command examples, a backslash indicates that the command or coding example continues on the next line. For example: mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \ -E "PercentTotUsed < 85" -m p "FileSystem space used" {item} Braces enclose a list from which you must choose an item in format and syntax descriptions.
  • Page 11 How to submit your comments To contact the IBM Spectrum Scale development organization, send your comments to the following email address: scale@us.ibm.com...
  • Page 12 IBM Elastic Storage System 3000: Service Guide...
  • Page 13 The recorded events can also be displayed through the GUI. The following sections list the RAS events that are applicable to various components of the IBM Spectrum Scale system: Array events The following table lists the events that are created for the Array component.
  • Page 14 ... Message: {0} measured a low temperature value. Description: ... fallen below the actual low critical threshold value for at least one sensor. Cause: ... mmlsenclosure command reports the temperature sensor with a failure. User action: ... status by using the mmlsenclosure command...
  • Page 15 ... (STATE_CHANGE, ERROR). Message: The inspection of the CPU slots found a mismatch ... Description: Number of populated CPU slots, number of enabled CPUs, ... Cause: The /opt/ibm/gss/tools/bin/ess3kplt command ... User action: Check for specific events related to CPUs by using the mmhealth command.
  • Page 16 ... Message: The firmware level of adapter {0} is wrong. Description: ... of the adapter is wrong. User action: Check the installed BIOS level using the mmlsfirmware command. current_failed (STATE_CHANGE, ERROR). Message: currentSensor {0} failed. Description: The currentSensor state is failed...
  • Page 17 If there is an issue with the SAS HBA or SAS Cable, reboot the node to see if this resolves the issue. If not, contact your IBM representative...
  • Page 18 ... Message: ... reports high current. Description: The DC power supply current is greater than the threshold. power_high_voltage (STATE_CHANGE, WARNING). Message: Power supply {0} reports high voltage. Description: The DC power supply voltage is greater than the threshold...
  • Page 19 Table 4. Events for the Enclosure component (continued). Columns: Event, Event Type, Severity, Message, Description, Cause, User Action. power_no_power (STATE_CHANGE, WARNING). Message: Power supply {0} has no power. Description: Power supply has no input AC power. The power supply may be turned off or disconnected from the AC supply.
  • Page 20 ... (STATE_CHANGE, ERROR). Message: The NVDIMM of the pdisk {0} is failed. Description: The nvram drive of the disk is in error state. Cause: tsls nvra msta mand show fail state nvra drive of the disk...
  • Page 21 ... Message: ... missing. Description: The pdisk state is missing. gnr_pdisk_needanalysis (STATE_CHANGE, ERROR). Message: GNR pdisk {0} needs analysis. Description: The GNR pdisk has a problem that has to be analyzed and solved ... Cause: mmls pdis ... User action: Contact IBM support if you are not sure how to solve this problem.
  • Page 22 ... Description: ... GNR will read-only from this disk. Cause: ... mand show s that pdisk state conta VWCE. User action: ... using the sg_wr_modes command. ssd_endurance_ok (STATE_CHANGE, INFO). Message: ssdEndurancePercentage of GNR pdisk {0} is ok. Description: ssdEndurancePercentage value is ok...
  • Page 23 ... Message: ... {0} is not active. Description: The recovery group is not active. gnr_rg_found (INFO_ADD_ENTITY, INFO). Message: GNR recovery group {0} was found. Description: A GNR recovery group listed in the IBM Spectrum Scale configuration was detected. gnr_rg_ok (STATE_CHANGE, INFO). Message: GNR recoverygroup {0} is ok. Description: The recovery group is...
  • Page 24 ... Message: ... {0} is ok. Description: ... hardware state using xCAT. server_power_supply_oc_line_12V_failed (STATE_CHANGE, ERROR). Message: OC Line 12V of Power Supply {0} failed. Description: The GUI checks the hardware state using xCAT. Cause: The hardware part failed. User action: None...
  • Page 25 Table 7. Server events (continued). Columns: Event, Event Type, Severity, Message, Description, Cause, User Action. server_power_supply_ov_line_12V_ok (STATE_CHANGE, INFO). Message: OV Line 12V of Power Supply {0} is ok. Description: The GUI checks the hardware state using xCAT. Cause: The hardware part is ok. User action: None.
  • Page 26 ... Message: ... Backplane {0} is ... Description: ... checks the hardware state using xCAT. Cause: ... part is ok. dasd_backplane_failed (STATE_CHANGE, ERROR). Message: DASD Backplane {0} failed. Description: The GUI checks the hardware state using xCAT. Cause: The hardware part failed. User action: None...
  • Page 27 Table 7. Server events (continued). Columns: Event, Event Type, Severity, Message, Description, Cause, User Action. server_cpu_ok (STATE_CHANGE, INFO). Message: All CPUs of server {0} are fully available. Description: The GUI checks the hardware state using xCAT. Cause: The hardware part is ok. User action: None. server_cpu_failed (STATE_CHANGE, ERROR). Message: At least one CPU...
  • Page 28 ... Message: ... healthy. Description: ... checks the hardware state using xCAT. Cause: The hardware part is ok. User action: None. server_failed (STATE_CHANGE, ERROR). Message: The server {0} failed. Description: The GUI checks the hardware state using xCAT. Cause: The hardware part failed. User action: None...
  • Page 29 ... Message: ... degraded. Description: The vdisk state is degraded. gnr_vdisk_found (INFO_ADD_ENTITY, INFO). Message: GNR vdisk {0} was found. Description: A GNR vdisk listed in the IBM Spectrum Scale configuration was detected. gnr_vdisk_offline (STATE_CHANGE, ERROR). Message: GNR vdisk {0} is offline. Description: The vdisk state is offline. gnr_vdisk_ok...
  • Page 31 Chapter 2. Servicing Service information is intended for IBM authorized service personnel only. Consult the terms of your warranty to determine the extent to which you can attempt to accomplish any IBM ESS 3000 system maintenance. IBM service support representatives and lab-based services personnel can access service information through the following link.
  • Page 32 The drive associated with the pdisk name in the previous command should now have a flashing amber fault LED to indicate that it is safe to remove this drive. Removing the disk physically 1. Press the blue touchpoint to unlock the latching handle, as shown in this figure...
  • Page 33 Figure 1. Unlocking the drive and release latch 2. Lower the handle and slide the drive out of the enclosure, as shown in this figure. Figure 2. Removing the drive Replacing the drive 1. Ensure that the LED indicators are at the top of the drive. 2.
  • Page 34 The following pdisks will be formatted on node ess01io1: mmvdisk: /dev/sdrk mmvdisk: mmvdisk: Location SX32901810-11 is Enclosure 2 Drive 11. mmvdisk: Pdisk e2s11 of RG BB01L successfully replaced. mmvdisk: Carrier resumed...
  • Page 35 6. Repeat the steps listed in the Preparing disks for replacement, Removing the disk physically and Replacing the drive sections for each pdisk that needs to be replaced as marked in the output of the mmvdisk pdisk list --replace command. Removing and replacing a drive blank Use the following procedures to remove a faulty drive slot filler and replace it with a new one from stock.
  • Page 36 • This procedure requires access to either the management GUI or the CLI as a root user. IBM service personnel need to coordinate with the customer to work on this procedure. • Do not insert a PSU if the PSU slot does not contain a power interposer.
  • Page 37 1. In the management GUI, you can identify the faulty PSU from the Monitoring > Hardware page. You can also run the mmhealth node show enclosure command on the canister of the affected enclosure. To identify the affected enclosure, run the mmhealth cluster show enclosure command. The faulty enclosure will be in a DEGRADED or FAILED state.
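The check described on page 37 can be narrowed down with a small filter. The following is a minimal sketch, not part of the guide: `unhealthy_lines` is a hypothetical helper name, and the filter assumes only that the state keywords DEGRADED and FAILED appear as whole words somewhere in the command output. It makes no assumption about the column layout.

```shell
#!/usr/bin/env bash
# Sketch: keep only the lines of mmhealth output that mention a DEGRADED or
# FAILED state. unhealthy_lines is a hypothetical helper; it reads text on
# stdin and prints the matching lines unchanged.
unhealthy_lines() {
  # -w matches whole words only; "|| true" means no match is not an error.
  grep -E -w 'DEGRADED|FAILED' || true
}
```

On a canister, it might be used as `mmhealth cluster show enclosure | unhealthy_lines`. An empty result means that no line mentioned either state.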
  • Page 38 5 minutes. Operating for longer than this period might cause the control enclosure to shut down due to overheating. • No tools are required to complete this task. Do not remove or loosen any screws...
  • Page 39 1. Remove the power supply unit, as described in “Removing and replacing a power supply unit” on page Removing the power interposer 2. Remove the power interposer by pulling on the blue handle that is located beneath the PSU slot. Figure 9 on page 27 shows an example.
  • Page 40 MES instructions. ESS 3000 storage drives MES upgrade An offline IBM Elastic Storage System 3000 (ESS 3000) MES upgrade is supported for customers who want to upgrade a 12-drive ESS 3000 to a 24-drive ESS 3000.
  • Page 41 • When the resizing is done and the upgraded ESS 3000 is back online, you can perform other ESS and GPFS operations. Note: GPFS preferentially uses the new network shared disks (NSDs) to store the data of a new file system. GPFS has four new NSDs that are the same as the four original NSDs; the workload per server is the same as it was before.
  • Page 42 The goal of this procedure is primarily to add a third high-speed adapter into each ESS 3000 canister. The customer can add supported InfiniBand or Ethernet adapters into the third PCI slot. • The PCI address is af:00.1 • The adapter type is ConnectX-5 [ConnectX-5 Ex]...
  • Page 43 Figure 12. Ethernet ports on canister 1 (upper canister) Figure 13. Ethernet ports on canister 2 (lower canister) Note: These images show the PCIe ports for two adapters for each canister. For the MES upgrade, ESS 3000 has a third adapter with two more ports. Offline adapter MES procedure 1.
  • Page 44 Confirm with the customer that all required steps were completed, and then disconnect the power cables. b. To shut down the storage enclosure, unplug the power cords on both sides of the ESS 3000 system...
  • Page 45 After the basic checks are complete, place everything back into the frame and reinsert the power cables. This step restarts the nodes. You can use the procedure in the Installing chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide to do the following steps: 1) Connect your laptop point-to-point to each container technician port.
  • Page 46 When the server is up again, do a basic ping test between the canisters over the high-speed interface. c. If the ping is successful, start GPFS again by issuing the following command: # mmstartup -N <node class name>...
  • Page 47 d. Ensure that node servers are active before you do the next step by issuing the following command: # mmgetstate -a e. Turn on the GPFS automount by issuing the following command: # mmchfs <filesystem> -A yes f. Turn on the GPFS autoload by issuing the following command: # mmchconfig autoload=yes g.
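Steps e and f on page 47 can be wrapped in a short script. The following is a minimal sketch, not part of the guide: `run` and `enable_automount_and_autoload` are hypothetical helper names, and the file system name is a placeholder that you pass in. The sketch does not perform step d. Confirm with `mmgetstate -a` that the nodes are active before you call it. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of steps e and f. Both helper names are hypothetical.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Argument 1: the file system name (placeholder, supplied by the caller).
enable_automount_and_autoload() {
  local filesystem=$1
  run mmchfs "$filesystem" -A yes || return 1   # step e: GPFS automount on
  run mmchconfig autoload=yes || return 1       # step f: GPFS autoload on
}
```

For example, `DRY_RUN=1 enable_automount_and_autoload <filesystem>` shows the two commands without changing the cluster, which allows a review before the real run.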
  • Page 48 The goal of this procedure is to add memory to each ESS 3000 canister. When the physical memory is installed, the customer can complete the operation by increasing the GPFS page pool. For more information, see the Planning for hardware chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide.
  • Page 49 After the basic checks are complete, place everything back into the frame and reinsert the power cables. This step restarts the nodes. You can use the procedure in the Installing chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide to do the following steps: 1) Connect your laptop point-to-point to each container technician port.
  • Page 50 When the server is up again, do a basic ping test between the canisters over the high-speed interface. c. If the ping is successful, start GPFS again by issuing the following command...
  • Page 51 # mmhealth node show ESS 3000 storage drives concurrent MES upgrade An online IBM Elastic Storage System 3000 (ESS 3000) MES upgrade is supported for customers who want to upgrade a 12-drive ESS 3000 to a 24-drive ESS 3000. To upgrade the system, the NVMe drives with the same size as the existing 12 drives must be used. This MES upgrade doubles the available storage capacity in the existing ESS 3000.
  • Page 52 • All new or existing building blocks must be at the ESS 5.3.5.2 or ESS 3000 6.0.0.2 level. If the setup has any protocol nodes, these nodes must also be upgraded to ESS 5.3.5.2 levels (underlying code IBM Spectrum Scale 5.0.4.3 must be verified by using the gssinstallcheck or essinstallcheck command).
  • Page 53 7. To check whether all 24 NVMe drives have the latest firmware level, issue the following command from one of the canisters: # mmlsfirmware --type drive Example output columns: type, product id, enclosure serial number, firmware level, available firmware, location...
  • Page 54 ... 1612 GiB 3.84TB NVMe G3; ess3k_mySN e1s15 3576 GiB 1622 GiB 3.84TB NVMe G3; ess3k_mySN e1s16 3576 GiB 1632 GiB 3.84TB NVMe G3; ess3k_mySN e1s17 3576 GiB 1640 GiB 3.84TB NVMe G3...
  • Page 55 ess3k_mySN e1s18 3576 GiB 1600 GiB 3.84TB NVMe G3; ess3k_mySN e1s19 3576 GiB 2892 GiB 3.84TB NVMe G3; ess3k_mySN e1s20 3576 GiB 2900 GiB 3.84TB NVMe G3; ess3k_mySN e1s21 3576 GiB 2892 GiB 3.84TB NVMe G3; ess3k_mySN e1s22 3576 GiB 2902 GiB 3.84TB NVMe G3; ess3k_mySN...
  • Page 56 4. Check the state of GPFS on both canisters by issuing the following command: # mmgetstate -N this ESS 3000 node class A sample output is as follows: Node number Node name GPFS state -------------------------------------------...
  • Page 57 ess3ka-ib arbitrating ess3kb-ib active 5. Issue the following command until GPFS is in the active state on both canisters: # mmgetstate -N this ESS 3000 node class A sample output is as follows: Node number Node name GPFS state ------------------------------------------- ess3ka-ib active ess3kb-ib...
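Step 5 on page 57 amounts to polling `mmgetstate` until no node reports a state other than active. The following is a minimal sketch, not part of the guide: `all_nodes_active` and `wait_for_gpfs` are hypothetical helper names, and the parser assumes that the GPFS state is the last whitespace-separated field of each node row, as in the sample output above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every node row in mmgetstate-style output on stdin
# ends with the state "active". all_nodes_active is a hypothetical helper.
all_nodes_active() {
  awk '
    # Skip the header row, the dashed rule, and blank lines.
    /GPFS state/ || /^[-[:space:]]*$/ { next }
    # Count node rows and rows whose last field is not "active".
    { rows++; if ($NF != "active") bad++ }
    # Exit 0 only if at least one node row was seen and none was bad.
    END { exit !(rows > 0 && bad == 0) }
  '
}

# Hypothetical polling loop. Argument 1 is your ESS 3000 node class,
# argument 2 is the pause between checks in seconds (default 10).
wait_for_gpfs() {
  local node_class=$1 interval=${2:-10}
  until mmgetstate -N "$node_class" | all_nodes_active; do
    sleep "$interval"
  done
}
```

With the first sample output, where one canister is still arbitrating, `all_nodes_active` returns a nonzero status, so the loop keeps waiting. Empty output also counts as not active, so a failed `mmgetstate` call does not end the wait early.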
  • Page 59 (part number: description) 0000001FT769: Altsrc 2P100HP (*Arab Nation orders, China Source); 0000001FT812: 2-Port 100G LP (*Arab Nation orders, China Source); 0000001LL518: Node canister: ESS 3000 (5141-AF8); 0000001PP687: END7 Power Cable - Drawer to IBM PDU - C13/C20 (250V/10A) for India. © Copyright IBM Corp. 2019, 2020...
  • Page 60 Table 10. FRU Part Numbers (continued) (part number: description) 0000001PP688: END5 power cord (9.2 ft), Drawer to IBM PDU - C13/C20 (250V/10A) for India; 0000001YM315: Trusted Platform Module (TPM); 0000001YM705: Drive Blank; 0000001YM789: DIMM Filler; PCIe riser card with bracket assembly...
  • Page 61 ... C14, 200-240V/10A for India; 0000001KV680: END0 Power Cord M (6.5 foot), Drawer to IBM PDU - C13/C14 (250V/10A) for India; 0000001KV681: END1 Power Cord M (9 foot), Drawer to IBM PDU - C13/C14 (250V/10A) for India...
  • Page 62 Table 11. Cable Part Numbers (continued) (part number: description) 0000001KV682: END2 Power Cord M (14 ft), Drawer to IBM PDU - C13/C14 (250V/10A) for India; 0000002CL468: 5M, Blue Ethernet Cat 5E cable; 0000002CL469: 5M, Green Ethernet Cat 5E cable; 5M, Yellow Ethernet Cat 5E cable...
  • Page 63 IBM Knowledge Center (www.ibm.com/support/knowledgecenter). Keyboard navigation: This product uses standard Microsoft Windows navigation keys. IBM and accessibility: See the IBM Human Ability and Accessibility Center (www.ibm.com/able) for more information about the commitment that IBM has to accessibility...
  • Page 65 Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
  • Page 66 IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
  • Page 67 • See refers you from a non-preferred term to the preferred term or from an abbreviation to the spelled- out form. • See also refers you to a related or contrasting term. For other terms and definitions, see the IBM Terminology website (opens in new window): http://www.ibm.com/software/globalization/terminology building block A pair of servers with shared disk enclosures attached.
  • Page 68 See also cluster. (3) The routing of all transactions to a second controller when the first controller fails. See also cluster. failure group A collection of disks that share common access paths or adapter connection, and could all become unavailable through a single hardware failure...
  • Page 69 See file encryption key (FEK). file encryption key (FEK) A key used to encrypt sectors of an individual file. See also encryption key. file system The methods and data structures used to control how data is stored and retrieved. file system descriptor A data structure containing key information about a file system.
  • Page 70 Provides a way to control the bundling of several physical ports together to form a single logical channel. logical partition (LPAR) A subset of a server's hardware resources virtualized as a separate computer, each with its own operating system. See also node. LPAR See logical partition (LPAR)...
  • Page 71 management network A network that is primarily responsible for booting and installing the designated server and compute nodes from the management server. management server (MS) An ESS node that hosts the ESS GUI and xCAT and is not connected to storage. It must be part of a GPFS cluster.
  • Page 72 Data that is associated with a recovery group. RKM server See remote key management server (RKM server). SAS See Serial Attached SCSI (SAS). secure shell (SSH) A cryptographic (encrypted) network protocol for initiating text-based shell sessions securely on remote computers...
  • Page 73 Serial Attached SCSI (SAS) A point-to-point serial protocol that moves data to and from such computer storage devices as hard drives and tape drives. service network A private network that is dedicated to managing POWER8® servers. Provides Ethernet-based connectivity among the FSP, CPC, HMC, and management server. SMP See symmetric multiprocessing (SMP).
  • Page 75 Recovery group events 11 server events 12 virtual disk events 17 documentation ix resources ix IBM Elastic Storage System 3000 28 IBM Spectrum Scale events 1, 4, 8, 11, 12, 17 RAS events 1, 4, 8, 11, 12, 17 information overview ix...
  • Page 80 IBM® Product Number: 5765-DME 5765-DAE SC28-3158-00...
