Contents
About ClusterStor MMU and AMMU Addition (page 3)
Add an MMU or AMMU to a ClusterStor L300 and L300N System (page 5)
MMU/AMMU Addition Process (page 8)
Prerequisites for MMU/AMMU Addition (page 8)
Cable MMU/AMMU Hardware to the Storage System (page 11)
Unmount Lustre and Auto-Discovery (page 13)
Verify MDS Node Discovery
ClusterStor MMU/AMMU Addition (3.1) H-6167
February 2019: Release of the 3.1 version

Scope and Audience
The procedure in this publication is intended to be performed only by qualified Cray personnel.

Typographic Conventions
Monospace: A monospace font indicates program code, reserved words or library functions,...
(2) AMMUs (one [1] in the base rack).
CAUTION: This procedure is intended to be performed only by qualified Cray personnel.
Lustre 2.5 and later releases support Phase 1 of the Lustre Distributed Namespace (DNE) feature, which allows multiple MDTs, served by multiple MDS nodes, to be configured and operated as part of a single file system.
Add an MMU or AMMU to a ClusterStor L300 and L300N System

Figure 1. MMU Network Overview

For systems using an AMMU instead of an MMU in the base rack, a maximum of one additional AMMU may be added to a storage rack.

MMU and AMMU Hardware
An MMU is a 2U storage enclosure that consists of the following:
●...
● The two (2) MDSes operate as an HA pair.
● Under normal operation, each MDS owns and operates one of the two Lustre Metadata Targets (MDTs) provided by the MMU/AMMU.
●...
MMUs and AMMUs are added to storage racks only. Only one (1) MMU/AMMU may be added per storage rack, up to a maximum of eight (8) total MMUs or two (2) total AMMUs for a system. Contact Cray Support to add an MMU or AMMU to a storage rack.

Prerequisites for MMU/AMMU Addition
The following prerequisites must be met before performing this procedure.
Firmware Version
To check the minimum version on L300 and L300N systems, run this command from the MGS or MDS node:
[admin@n002]$ sudo sg_map -x -i | grep SP-
To check the minimum version on ClusterStor 9000 systems that have been upgraded to 3.1, run this command from the MGS or MDS node:
[admin@n002]$ sudo sg_map -x -i | grep XYRATEX
...
management and network switches and before any SSUs. In a standard 42U storage rack, the MMU enclosure should occupy rack positions 37 and 38. An AMMU enclosure should occupy rack positions 35 through 38.
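The placement limits stated in this section (one MMU/AMMU per storage rack, no more than eight MMUs or two AMMUs per system) can be checked against a planned addition before any hardware is ordered or racked. The following Python sketch is illustrative only: the function name check_plan and its input layout are hypothetical, and it assumes the system-wide totals include the unit already installed in the base rack, which matches the wording "two (2) AMMUs (one [1] in the base rack)". Mixed MMU/AMMU configurations are not modeled.

```python
from __future__ import annotations

from collections import Counter

# Limits stated in the procedure. Assumption: the system-wide totals
# include the MMU or AMMU that is already installed in the base rack.
MAX_PER_STORAGE_RACK = 1
MAX_TOTAL = {"MMU": 8, "AMMU": 2}


def check_plan(base_unit: str, additions: list[tuple[str, str]]) -> list[str]:
    """Return human-readable problems with a proposed addition plan.

    base_unit: "MMU" or "AMMU", the unit type in the base rack.
    additions: (storage_rack_name, unit_type) pairs, e.g. ("R1C2", "MMU").
    An empty list means the plan stays within the stated limits.
    """
    problems = []

    # At most one added unit per storage rack.
    per_rack = Counter(rack for rack, _ in additions)
    for rack, count in sorted(per_rack.items()):
        if count > MAX_PER_STORAGE_RACK:
            problems.append(
                f"{rack}: {count} units requested, limit is {MAX_PER_STORAGE_RACK}"
            )

    # System-wide totals, counting the base-rack unit as well.
    totals = Counter(kind for _, kind in additions)
    totals[base_unit] += 1
    for kind, count in sorted(totals.items()):
        if kind not in MAX_TOTAL:
            problems.append(f"unknown unit type {kind!r}")
        elif count > MAX_TOTAL[kind]:
            problems.append(f"{count} {kind}s in total, limit is {MAX_TOTAL[kind]}")
    return problems
```

For example, check_plan("AMMU", [("R1C2", "AMMU"), ("R1C3", "AMMU")]) reports that the plan would bring the system to three AMMUs.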
Cray Support. The publication Sonexion® USM Firmware Update Guide H-6137, available on http://pubs.cray.com, may also be helpful for ClusterStor 9000 and Sonexion 2000 systems that have been upgraded to release 3.1 and higher.

Estimated Service Interval
The following table shows time estimates for each MMU/AMMU being added to the system. Schedule an appropriate service interval with the customer.
Figure 4. AMMU Server Cabling

For recommended port assignments for both the MMU and AMMU, contact Cray Support to obtain internal cabling documentation.
Figure 5. AMMU SAS Connections

Unmount Lustre and Auto-Discovery
Procedure
1. Log in to the primary MGMT node via SSH:
[Client]$ ssh -l admin primary_MGMT_node
2. In this and the following steps, verify that client systems have unmounted the Lustre file system and that Lustre has been stopped.
error: get_param: /proc/ {fs,sys} {lnet,lustre} ///exports/*/uuid: Found no match
3. If a client system needs to be unmounted manually, run the following command:
[Client]$ sudo umount lustre_mountpoint
Example:
[Client]$ sudo umount /mnt/lustre
4.
cls12345n004    2 / 2    cls12345n005    /dev/md0, /dev/md2
cls12345n005    2 / 2    cls12345n004    /dev/md1, /dev/md3
In the output, under the heading "Targets," look for "0" in the first character position. For example, 0 / 4 or 0 / 0.
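When many nodes are listed, the Targets check described above can be scripted. This is a minimal Python sketch, not a Cray tool: it assumes each node row of cscli show_nodes output starts with the hostname and that the first "N / M" pair on the row is the Targets field, with N as the number of started targets. The function name is hypothetical. It returns the nodes whose first Targets number is not 0, that is, the nodes on which Lustre still appears to be running.

```python
from __future__ import annotations

import re

# Assumed row layout: hostname first, then (somewhere later) the Targets
# field written as "started / total", e.g. "cls12345n004 2 / 2 ...".
ROW = re.compile(r"^(?P<host>\S+)\s.*?(?P<started>\d+)\s*/\s*(?P<total>\d+)")


def nodes_with_started_targets(show_nodes_output: str) -> dict[str, int]:
    """Map hostname -> started-target count for rows whose count is not 0."""
    running = {}
    for line in show_nodes_output.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # headers, separators, prompts, truncated rows
        started = int(match.group("started"))
        if started > 0:
            running[match.group("host")] = started
    return running
```

An empty result corresponds to every row showing "0" in the first position of its Targets field.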
If the script prints nothing, the new settings are valid. Exit root and proceed to Step on page 16. If the script prints an error, the new settings are invalid; contact Cray Support.
13. Power on the MMU/AMMU for discovery in increasing node-number order, starting with the lowest-numbered node, as follows (see the following figure).
Figure 6. MDS Node Layout

The management software's auto-discovery feature discovers the new MDS nodes and prepares them (the management software handles the order of the individual controllers within an MMU/AMMU).
[admin@cls12345n000 ~]$ cscli show_nodes
-------------------------------------------------------------------------------
Hostname       Role     Power   Service   Targets   Partner        HA
                        state   state                              Resources
-------------------------------------------------------------------------------
cls12345n000   MGMT                       0 / 0     cls12345n001   None
cls12345n001   (MGMT)                     0 / 0     cls12345n000   None
cls12345n002...
If any of the expected entries in the references below is missing, or any entry is out of sequence, stop and contact Cray Support. Do not add any additional MMUs or AMMUs. If all expected entries are listed and in the correct sequence, continue the MMU/AMMU Addition procedure.
localhost6.localdomain6
172.16.2.1      puppet
172.16.2.2      nfsserv
172.16.2.3      cls12345n000 cls12345n000-eth0
172.16.0.101    cls12345n000-ipmi
172.18.1.1      cls12345n000 cls12345n000-ib0
169.254.0.1     cls12345n000-ha cls12345n000-eth10
172.16.2.4      cls12345n001 cls12345n001-eth0
172.16.0.102    cls12345n001-ipmi
172.18.1.2      cls12345n001 cls12345n001-ib0
169.254.0.2     cls12345n001-ha cls12345n001-eth10
172.16.2.5      cls12345n002 cls12345n002-eth0
172.16.0.104...
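Part of the requirement that entries be present and in sequence can be checked mechanically on /etc/hosts text. The Python sketch below is an assumption-laden aid, not a replacement for the manual review: it assumes primary node names follow the cls12345nNNN pattern seen in the sample, ignores aliases such as cls12345n000-eth0, and reads "in sequence" as node numbers increasing by one in order of first appearance. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical naming pattern taken from the sample output: <prefix>n<NNN>,
# e.g. cls12345n002. Aliases such as cls12345n002-eth0 do not match.
NODE = re.compile(r"^[a-z]+\d+n(?P<num>\d{3})$")


def node_numbers(hosts_text: str) -> list[int]:
    """Node numbers in order of first appearance in /etc/hosts text."""
    seen: list[int] = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        for name in fields[1:]:  # fields[0] is the IP address
            match = NODE.match(name)
            if match and int(match.group("num")) not in seen:
                seen.append(int(match.group("num")))
    return seen


def sequence_problems(numbers: list[int]) -> list[str]:
    """Report each place where the node numbering does not step up by one."""
    return [
        f"n{prev:03d} is followed by n{cur:03d}"
        for prev, cur in zip(numbers, numbers[1:])
        if cur != prev + 1
    ]
```

Any reported gap is a reason to stop and contact Cray Support, as stated above.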
Example output for the command sudo ibhosts:
NOTE: The sudo ibhosts command will not work on a system that uses Intel Omni-Path switches.
[admin@cls12345n000 ~]$ sudo ibhosts
: 0x001e6703003e36f7 ports 1 "cls12345n002 HCA-1"
: 0x001e67030039b77c ports 1 "cls12345n001 HCA-1"
...
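On InfiniBand systems, the ibhosts listing can be compared against the hostnames expected for the new MDS nodes. This Python sketch assumes, as in the sample, that each line ends with a quoted description whose first word is the hostname; the expected names cls12345n006 and cls12345n007 are taken from later sample output and are examples only. Per the note above, it does not apply to Omni-Path systems.

```python
from __future__ import annotations

import re

# Each ibhosts line ends with a quoted node description, e.g.
# ': 0x001e6703003e36f7 ports 1 "cls12345n002 HCA-1"'.
DESCRIPTION = re.compile(r'"(?P<desc>[^"]*)"\s*$')


def ib_hostnames(ibhosts_output: str) -> set[str]:
    """Hostnames taken from the first word of each quoted description."""
    names = set()
    for line in ibhosts_output.splitlines():
        match = DESCRIPTION.search(line)
        if match:
            words = match.group("desc").split()
            if words:
                names.add(words[0])
    return names


def missing_nodes(ibhosts_output: str, expected: list[str]) -> list[str]:
    """Expected hostnames that do not appear in the ibhosts output."""
    return sorted(set(expected) - ib_hostnames(ibhosts_output))
```

For example, missing_nodes(output, ["cls12345n006", "cls12345n007"]) returns the new MDS nodes that the fabric does not yet report.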
Run this command for each of the new MDS nodes. An MMU/AMMU needs only two arrays; it is recommended to use md0 and md1.
Sample command and output for the md0 array:
[admin@cls12345n000 ~]$ cscli configure_mds -n cls12345n006 -b md0:cls12345
Assigning filesystems to arrays on cls12345n006...
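Because the command is run once per new MDS node, it can help to prepare both command lines for review before running them. This Python sketch only builds strings and runs nothing. It assumes the -b argument takes the form array:fsname (as in the md0:cls12345 sample) and that the first new node gets md0 and its partner gets md1, mirroring a later sample in which cls12345n006 owns /dev/md0 and cls12345n007 owns /dev/md1. The helper name is hypothetical.

```python
from __future__ import annotations

import shlex


def configure_mds_commands(nodes: list[str], fsname: str,
                           arrays: tuple[str, ...] = ("md0", "md1")) -> list[str]:
    """Build one cscli configure_mds command line per new MDS node.

    Assumption: nodes[i] is paired with arrays[i] (md0 first, then md1).
    The commands are returned for review; nothing is executed here.
    """
    if len(nodes) != len(arrays):
        raise ValueError("expected exactly one array per new MDS node")
    return [
        shlex.join(["cscli", "configure_mds", "-n", node, "-b", f"{array}:{fsname}"])
        for node, array in zip(nodes, arrays)
    ]
```

shlex.join (Python 3.8 and later) quotes any argument that would need it, so unusual node or file system names are not split by the shell.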
● a 5U84 SSU at rack position 1U with two 5U84 ESUs at rack positions 6U and 11U
Example for current rack R1C2:
[admin@cls12345n000 ~]$ cscli rack show -n R1C2
{'position': '6U', 'model': '5U84', 'serial_no': 'SHG1000576Y0F7C', 'purpose': 'oss', 'product_id': 'UD-8435-CS-9000'}
{'position': '1U', 'model': '5U84', 'serial_no': 'SHG1000615Z0J6A', 'purpose':...
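The rack show output can be used to confirm that the rack units reserved for the new enclosure (37 and 38 for an MMU, 35 through 38 for an AMMU, per the placement rules in this procedure) are still free. This Python sketch assumes each enclosure is printed as a Python-style dict literal, as in the sample, and infers enclosure height from the model name (for example, 5U84 is taken to be 5 units tall, which matches the 1U/6U/11U spacing above). Both are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

import ast
import re


def parse_rack_show(output: str) -> list[dict]:
    """Parse the dict-literal lines printed by 'cscli rack show'."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            entries.append(ast.literal_eval(line))  # literals only, no code execution
    return entries


def occupied_units(entries: list[dict]) -> set[int]:
    """Rack units in use, assuming a model such as '5U84' is 5 units tall."""
    used: set[int] = set()
    for entry in entries:
        start = int(entry["position"].rstrip("U"))
        height = re.match(r"(\d+)U", entry["model"])
        if height is None:
            raise ValueError(f"cannot infer height of model {entry['model']!r}")
        used.update(range(start, start + int(height.group(1))))
    return used


def units_free(entries: list[dict], first: int, last: int) -> bool:
    """True if rack units first..last (inclusive) are all unoccupied."""
    return occupied_units(entries).isdisjoint(range(first, last + 1))
```

Usage: units_free(entries, 37, 38) for an MMU, or units_free(entries, 35, 38) for an AMMU.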
7. Delete rack777 from the rack list:
[admin@cls12345n000 ~]$ cscli rack delete -n rack777
rack: rack777: rack has been deleted
rack777: rack has been deleted
8. Save all the rack changes from the commands above:
[admin@cls12345n000 ~]$ cscli rack apply
This command may take a while (~ 5 minutes)...
0 / 1 cls12345n006 /dev/md1
12. Modify the LNET configuration if required. Contact Cray Support for guidance if needed.
a. Update the ip2nets.dat file.
b. Update the routes.dat file.
The following step (c) is required even if fine-grained routing (FGR) is not used, because the beSysNetConfig script applies some additional tuning.
Update GEM GOBI firmware on systems running releases later than 2.1.0 SU-004. For additional information about updating GEM GOBI firmware, contact Cray Support.
b. Update GEM USM firmware for systems running releases prior to 2.1.0 SU-005. For additional information about updating USM firmware, contact Cray Support.
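The two firmware conditions above meet at a boundary: with whole-number SU levels, "later than 2.1.0 SU-004" and "prior to 2.1.0 SU-005" split releases into two non-overlapping groups. This Python sketch makes that split explicit. It assumes release strings look like "2.1.0 SU-004" or "3.1", and it treats a missing minor or patch number or a missing SU suffix as 0; both are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Accepts strings such as "2.1.0 SU-004" or "3.1". Assumption: a missing
# minor/patch number or SU suffix is treated as 0.
RELEASE = re.compile(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:\s+SU-(\d+))?")


def release_key(release: str) -> tuple[int, int, int, int]:
    """Turn a release string into a (major, minor, patch, su) tuple."""
    match = RELEASE.fullmatch(release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, patch, su = (int(g) if g else 0 for g in match.groups())
    return (major, minor, patch, su)


def gem_firmware_updates(release: str) -> list[str]:
    """Which GEM firmware update the two conditions call for on a release."""
    key = release_key(release)
    updates = []
    if key > release_key("2.1.0 SU-004"):  # "later than 2.1.0 SU-004"
        updates.append("GEM GOBI")
    if key < release_key("2.1.0 SU-005"):  # "prior to 2.1.0 SU-005"
        updates.append("GEM USM")
    return updates
```

Tuple comparison orders releases field by field, so 3.1 sorts after every 2.1.0 SU level.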
Sample Output for configure_mds
The following is example output for the configure_mds command used in step on page 22 of Complete the MMU/AMMU Addition Procedure on page 21.
[admin@cls12345n000 ~]$ cscli configure_mds -A
configure_mds: Performing new MDS nodes...
cls12345n006: ... Wiping disks in enclosure /dev/sg22
cls12345n006: ... base mdn for encl: /dev/sg22 is 0 -- 2 data raids
cls12345n006: ... Make 2 RAID10 arrays in enclosure /dev/sg22
cls12345n006: ...
local: lustre_config: Operating on the following nodes: cls12345n006,cls12345n007
local: lustre_config: ******** Lustre cluster configuration BEGIN ********
local: lustre_config: There is no MGS target in the /mnt/mgmt/var/lib/puppet/files/lustre/tmpAWegnF file.
local: lustre_config: Creating the mount point /data/cls12345n006:md0 on cls12345n006
local: lustre_config: Adding module options to cls12345n006
local: lustre_config: Formatting Lustre target /dev/md0 on cls12345n006...
[try 1]:None
cls12345n006: Set 'lcnmon' option: id=lcnmon-meta_attributes-target-role name=target-role=Started
cls12345n007: Set 'lcnmon' option: id=lcnmon-meta_attributes-target-role name=target-role=Started
cls12345n006: A value for 'target-role' already exists in child 'lcnmon', performing update on that instead of 'cln-lcnmon'
cls12345n006: Set 'lcnmon' option: id=lcnmon-meta_attributes-target-role name=target-role=Started
cls12345n007: A value for 'target-role' already exists in child 'lcnmon', performing update on that instead of 'cln-...
cls12345n006: /var/lib/pacemaker/cib/cib-8.raw
cls12345n006: /var/lib/pacemaker/cib/cib-8.raw.sig
cls12345n006: /var/lib/pacemaker/cib/cib-9.raw
cls12345n006: /var/lib/pacemaker/cib/cib-9.raw.sig
cls12345n006: /var/lib/pacemaker/cib/cib.last
cls12345n006: /var/lib/pacemaker/cib/cib.xml
cls12345n006: /var/lib/pacemaker/cib/cib.xml.sig
cls12345n006: /var/lib/pacemaker/cores/
cls12345n006: /var/lib/pacemaker/pengine/
cls12345n007:
cls12345n007: sent 93577 bytes received 724 bytes 188602.00 bytes/sec
cls12345n007: total size is 91348 speedup is 0.97
cls12345n006:...
Contact Cray Support if you believe that it is necessary to remove an MMU/AMMU. This is a support-assisted operation and should not be attempted without direction from experienced Level 2 or Level 3 support personnel.
Tips and Tricks
[Client]$ ssh -l admin primary_MGMT_node
2. Run the remove_unit command:
[admin@cls12345n000]$ cscli remove_unit -n 1st_nodename, 2nd_nodename
Optionally, the additional parameter -p filename may be used to direct the remove_unit process to save YAML to a specific location and filename. This YAML is intended for use by manufacturing only and can be used with the restore_unit command.
c. Clean the Puppet history of the node stuck in the discovery process from the server:
[admin@cls12345n000]$ puppet cert --clean discovery-mac-address
where mac-address is the MAC address of the new node, which is printed in its prompt.
5. Retry the discovery process on the troubled node:
[Node]$ puppet agent -tv
6.
This removes the certificate for the failed node and reboots it. Upon power-up, the node should resynchronize with the server and come online cleanly.

SSH Connection Refused: MDS/OSS Node Not Fully Booted
About this task
Contact Cray Support before performing this procedure.
Follow this cleanup procedure (from the MGMT node) when SSH fails to connect after a reboot of the MDS or OSS nodes, or during any procedure in which SSH fails to connect. This procedure cleans the Puppet certificates and allows the MDS or OSS node to complete the boot process and accept SSH connections.
Procedure
Clear the assignment:
[admin@cls12345n000]$ /opt/xyratex/bin/beNewOSS --clear -n 'nodename'

Check ARP, Local Hosts, and DHCP
The following commands check the ARP table, local hosts, and DHCP during node discovery:
[admin@cls12345n000]$ arp -a | sort
[admin@cls12345n000]$ arp -av | grep cls12345n000 | grep -v ipmi | sort
[admin@cls12345n000]$ grep cls12345n000 /etc/dhcp/dhcpd.conf
[admin@cls12345n000]$ cat /etc/hosts
[admin@cls12345n000]$ pdsh -a date 2>/dev/null | sort...
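The grep and cat commands above are meant for inspection by eye. When comparing the DHCP and local-hosts views of many nodes, a small offline cross-check can flag disagreements. This Python sketch assumes /etc/dhcp/dhcpd.conf uses standard ISC dhcpd host declarations of the form host NAME { ... fixed-address ADDR; } (not confirmed for ClusterStor), and it compares each address against the first /etc/hosts line that lists the same name. The function names and sample addresses in the usage note are hypothetical.

```python
from __future__ import annotations

import re

# Assumed ISC dhcpd syntax: host NAME { hardware ethernet ...; fixed-address ADDR; }
HOST_DECL = re.compile(r"host\s+(?P<name>[\w.-]+)\s*\{(?P<body>[^}]*)\}")
FIXED = re.compile(r"fixed-address\s+(?P<addr>[^;\s]+)\s*;")


def dhcp_addresses(dhcpd_conf: str) -> dict[str, str]:
    """Map host name -> fixed-address from dhcpd.conf text."""
    result = {}
    for decl in HOST_DECL.finditer(dhcpd_conf):
        fixed = FIXED.search(decl.group("body"))
        if fixed:
            result[decl.group("name")] = fixed.group("addr")
    return result


def hosts_addresses(hosts_text: str) -> dict[str, str]:
    """Map each name to the first address that lists it in /etc/hosts text."""
    result: dict[str, str] = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            result.setdefault(name, fields[0])
    return result


def mismatches(hosts_text: str, dhcpd_conf: str) -> list[str]:
    """Names whose dhcpd.conf address is missing from, or differs in, /etc/hosts."""
    hosts = hosts_addresses(hosts_text)
    problems = []
    for name, addr in sorted(dhcp_addresses(dhcpd_conf).items()):
        if name not in hosts:
            problems.append(f"{name}: in dhcpd.conf but not in /etc/hosts")
        elif hosts[name] != addr:
            problems.append(f"{name}: dhcpd.conf has {addr}, /etc/hosts has {hosts[name]}")
    return problems
```

An empty list means every DHCP host entry has a matching /etc/hosts address; any other result points to the entries worth inspecting with the commands above.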
cls12345n004    0 / 4    cls12345n005    /dev/md0, /dev/md2, /dev/md4, /dev/md6
cls12345n005    0 / 4    cls12345n004    /dev/md1, /dev/md3, /dev/md5, /dev/md7
cls12345n006    0 / 1    cls12345n007    /dev/md0
cls12345n007    0 / 1    cls12345n006    /dev/md1
In this case, the file system name is cls12345.
b.