Page 1
Maintain Install and maintain NetApp July 05, 2024 This PDF was generated from https://docs.netapp.com/us-en/ontap-systems/a1k/maintain-overview.html on July 05, 2024. Always check docs.netapp.com for the latest.
Maintain Maintain AFF A1K hardware For the AFF A1K storage system, you can perform maintenance procedures on the following components. Boot media The boot media stores a primary and secondary set of ONTAP image files that the system uses when it boots.
(SSN). Boot media Overview of boot media replacement - AFF A1K The boot media stores a primary and secondary set of system (boot image) files that the system uses when it boots. Depending on your network configuration, you can perform either a nondisruptive or disruptive replacement.
Page 5
Enter the onboard security key-manager sync command: security key-manager onboard sync Enter the 32 character, alphanumeric onboard key management passphrase at the prompt. If the passphrase cannot be provided, contact NetApp Support. mysupport.netapp.com b. Verify the column displays...
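The passphrase shape stated in this step (32 alphanumeric characters) can be sanity-checked before it is typed at the prompt. The following is a minimal Python sketch, not a NetApp tool: the function name `looks_like_okm_passphrase` is hypothetical, and it checks only the length and character set mentioned above. The actual rules enforced by ONTAP may differ.

```python
import string

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits only.
ALLOWED = set(string.ascii_letters + string.digits)


def looks_like_okm_passphrase(candidate: str) -> bool:
    """Return True if candidate is exactly 32 ASCII alphanumeric characters."""
    return len(candidate) == 32 and all(ch in ALLOWED for ch in candidate)
```

A check like this catches only obvious transcription mistakes, such as a missing character. It cannot confirm that the passphrase is the correct one for the cluster.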
Page 6
You can safely shut down the impaired controller. Shut down the impaired controller - AFF A1K After completing the NVE or NSE tasks, you need to complete the shutdown of the impaired controller. Shut down or take over the impaired controller using the appropriate...
Page 7
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 8
When the impaired controller shows Waiting for giveback…, press Ctrl-C, and then respond y. Replace the boot media - AFF A1K To replace the boot media, you must remove the System Management module from the back of the system, remove the impaired boot media, install the replacement boot media in the System Management module, and transfer the ONTAP image from a USB flash drive to the replacement boot media.
Page 9
System Management module cam latch Boot media locking button Boot media 1. If you are not already grounded, properly ground yourself. 2. Unplug the power supply cables from the PSUs from the controller. If your storage system has DC power supplies, disconnect the power cable block from the power supply units (PSUs).
Page 10
Downloads section on the NetApp Support Site ◦ If NVE is supported, download the image with NetApp Volume Encryption, as indicated in the download button. ◦ If NVE is not supported, download the image without NetApp Volume Encryption, as indicated in the download button.
Page 11
Other parameters might be necessary for your interface. You can enter help ifconfig at the firmware prompt for details. Boot the recovery image - AFF A1K You must boot the ONTAP image from the USB drive, restore the file system, and verify the environmental variables.
Page 12
MetroCluster IP configuration Encryption restore - AFF A1K You must complete steps specific to systems that have Onboard Key Manager (OKM), NetApp Storage Encryption (NSE) or NetApp Volume Encryption (NVE) enabled using settings you captured at the beginning of this procedure.
Page 13
If NSE or NVE is enabled along with Onboard or external Key Manager, you must restore the settings you captured at the beginning of this procedure. Steps 1. Connect the console cable to the target controller.
Page 14
Option 1: Systems with onboard key manager server configuration Restore the onboard key manager configuration from the ONTAP boot menu. Before you begin You need the following information while restoring the OKM configuration: • Cluster-wide passphrase entered while enabling onboard key management.
Page 15
Enter the backup data: --------------------------BEGIN BACKUP-------------------------- 0123456789012345678901234567890123456789012345678901234567890123 1234567890123456789012345678901234567890123456789012345678901234 2345678901234567890123456789012345678901234567890123456789012345 3456789012345678901234567890123456789012345678901234567890123456 4567890123456789012345678901234567890123456789012345678901234567 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 0123456789012345678901234567890123456789012345678901234567890123 1234567890123456789012345678901234567890123456789012345678901234 2345678901234567890123456789012345678901234567890123456789012345 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
---------------------------END BACKUP--------------------------- 5. The recovery process will be completed.
Page 16
Trying to recover keymanager secrets..Setting recovery material for the onboard key manager Recovery secrets set successfully Trying to delete any existing km_onboard.wkeydb file. Successfully recovered keymanager secrets. ******************************************************************** *************** * Select option "(1) Normal Boot." to complete recovery process. * Run the "security key-manager onboard sync"...
Page 17
8. From the partner node, give back the partner controller: storage failover giveback -fromnode local -only-cfo-aggregates true 9. Once the controller has booted with only the CFO aggregate, run the security key-manager onboard sync command: 10. Enter the cluster-wide passphrase for the Onboard Key Manager: Enter the cluster-wide passphrase for the Onboard Key Manager: All offline encrypted volumes will be brought online and the corresponding volume encryption keys (VEKs) will be restored...
Page 18
Normal Boot. Boot without /etc/rc. Change password. Clean configuration and initialize all disks. Maintenance mode boot. Update flash from backup config. Install new software first. Reboot node. Configure Advanced Drive Partitioning. (10) Set Onboard Key Manager recovery secrets. (11) Configure node for external key management. Selection (1-11)? 11 2.
Page 19
Example Enter the client certificate (client.crt) file contents: -----BEGIN CERTIFICATE----- MIIDvjCCAqagAwIBAgICN3gwDQYJKoZIhvcNAQELBQAwgY8xCzAJBgNVBAYTAlVT MRMwEQYDVQQIEwpDYWxpZm9ybmlhMQwwCgYDVQQHEwNTVkwxDzANBgNVBAoTBk5l MSUbQusvzAFs8G3P54GG32iIRvaCFnj2gQpCxciLJ0qB2foiBGx5XVQ/Mtk+rlap Pk4ECW/wqSOUXDYtJs1+RB+w0+SHx8mzxpbz3mXF/X/1PC3YOzVNCq5eieek62si Fp8= -----END CERTIFICATE----- Enter the client key (client.key) file contents: -----BEGIN RSA PRIVATE KEY----- MIIEpQIBAAKCAQEAoU1eajEG6QC2h2Zih0jEaGVtQUexNeoCFwKPoMSePmjDNtrU MSB1SlX3VgCuElHk57XPdq6xSbYlbkIb4bAgLztHEmUDOkGmXYAkblQ= -----END RSA PRIVATE KEY----- Enter the KMIP server CA(s) (CA.pem) file contents: -----BEGIN CERTIFICATE----- MIIEizCCA3OgAwIBAgIBADANBgkqhkiG9w0BAQsFADCBjzELMAkGA1UEBhMCVVMx 7yaumMQETNrpMfP+nQMd34y4AmseWYGM6qG0z37BRnYU0Wf2qDL61cQ3/jkm7Y94...
Page 20
5. Select option 1 from the boot menu to continue booting into ONTAP. ******************************************************************** *************** * Select option "(1) Normal Boot." to complete the recovery process. ******************************************************************** *************** Normal Boot. Boot without /etc/rc. Change password. Clean configuration and initialize all disks. Maintenance mode boot.
-node * -type all -message MAINT=END command. Return the failed part to NetApp - AFF A1K Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
Page 22
• You must always capture the controller’s console output to a text log file. This provides you a record of the procedure so that you can troubleshoot any issues that you might encounter during the replacement process. Shut down the impaired controller - AFF A1K...
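Once the console output has been captured to a text log file as described above, the log can be searched for known console messages when troubleshooting. The following Python sketch is illustrative only: the function names are hypothetical, and the default marker is the "Waiting for giveback" message quoted elsewhere in this procedure.

```python
from pathlib import Path
from typing import List


def find_marker_lines(log_text: str, marker: str = "Waiting for giveback") -> List[int]:
    """Return the 1-based numbers of the lines in log_text that contain marker."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]


def find_marker_in_file(path: str, marker: str = "Waiting for giveback") -> List[int]:
    """Read a captured console log from disk and search it for marker."""
    # errors="replace" keeps the search going if the capture contains stray bytes.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_marker_lines(text, marker)
```

The search uses a plain substring match, so the marker is found whether the console prints it with trailing dots or an ellipsis character.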
Page 23
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 24
When the impaired controller shows Waiting for giveback…, press Ctrl-C, and then respond y. Replace the controller module hardware - AFF A1K To replace the controller, you must remove the impaired controller, move FRU components to the replacement controller module, install the replacement controller module in the enclosure, and then boot the system to Maintenance mode.
Page 25
If the NVRAM status LED is flashing, it could mean the controller module was not taken over or halted properly (uncommitted data). If the impaired controller module was not successfully taken over by the partner controller module, contact NetApp Support before continuing with this procedure.
Page 26
a Locking cam latches 4. Slide the controller module out of the enclosure and place it on a flat, stable surface. Make sure that you support the bottom of the controller module as you slide it out of the enclosure. Step 2: Move the fans You must move the five fan modules from the impaired controller module to the replacement controller module.
Page 27
5. Repeat the preceding steps for the remaining fan modules. Step 3: Move the NV battery Move the NV battery to the replacement controller. 1. Open the NV battery air duct cover and locate the NV battery. NV battery air duct cover NV battery plug NV battery pack 2.
Page 28
System DIMM 2. Note the orientation of the DIMM in the socket so that you can insert the DIMM in the replacement controller module in the proper orientation. 3. Eject the DIMM from its slot by slowly pushing apart the two DIMM ejector tabs on either side of the DIMM, and then slide the DIMM out of the slot.
Page 29
-node * -type all -message MAINT=END Restore and verify the system configuration - AFF A1K After completing the hardware replacement, you verify the low-level system configuration of the replacement controller and reconfigure system settings as necessary.
Page 30
4. Confirm that the setting has changed: ha-config show Give back the controller - AFF A1K Continue the replacement procedure by giving back the controller. Step 1: Give back the controller 1. If your storage system has Encryption configured, you must restore Storage or Volume Encryption functionality using the following procedure to reboot the system: a.
Page 31
If the giveback is vetoed, you can consider overriding the vetoes. Find the High-Availability Configuration content for your version of ONTAP 9 d. After the giveback has been completed, confirm that the HA pair is healthy and that takeover is possible: storage failover show 3.
-node local -auto-giveback true Step 2: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
Page 33
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 34
take over the controller so that the healthy controller continues to serve data from the impaired controller storage. • If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy controller shows false for eligibility and health, you must correct the issue before shutting down the impaired controller;...
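The precheck described above (no healthy controller may show false for eligibility or health before the impaired controller is shut down) can be expressed as a simple filter. This is a minimal Python sketch under stated assumptions: the `NodeStatus` record and its field names are hypothetical stand-ins for whatever cluster status you have collected, and nothing here queries ONTAP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    # Hypothetical record; populate it from the cluster status you collected.
    name: str
    health: bool
    eligibility: bool


def nodes_blocking_shutdown(nodes: List[NodeStatus], impaired: str) -> List[str]:
    """Return the nodes, other than the impaired one, that show false for
    health or eligibility and therefore must be corrected first."""
    return [
        node.name
        for node in nodes
        if node.name != impaired and not (node.health and node.eligibility)
    ]
```

An empty result means that none of the supplied records blocks the shutdown. It does not replace the quorum check itself.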
Page 35
If the NVRAM status LED is flashing, it could mean the controller module was not taken over or halted properly (uncommitted data). If the impaired controller module was not successfully taken over by the partner controller module, contact NetApp Support before continuing with this procedure.
Page 36
a Locking cam latches 4. Slide the controller module out of the enclosure and place it on a flat, stable surface. Make sure that you support the bottom of the controller module as you slide it out of the enclosure. Step 3: Replace a DIMM You must replace a DIMM when the system reports a permanent failure condition for that DIMM.
-node * -type all -message MAINT=END Step 5: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
7. Align the bezel with the ball studs, and then gently push the bezel onto the ball studs. 8. Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
Page 39
Step 1: Shut down the impaired controller Shut down or take over the impaired controller using one of the following options.
Page 40
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 41
take over the controller so that the healthy controller continues to serve data from the impaired controller storage. • If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy controller shows false for eligibility and health, you must correct the issue before shutting down the impaired controller;...
Page 42
The cam button moves away from the enclosure. b. Rotate the cam latch down as far as it will go. c. Remove the impaired NVRAM module from the enclosure by hooking your finger into the cam lever opening and pulling the module out of the enclosure. Cam locking button DIMM locking tabs 5.
Page 43
tray down. 4. Remove the target NVRAM module from the enclosure. Cam locking button DIMM locking tabs 5. Set the NVRAM module on a stable surface. 6. Locate the DIMM to be replaced inside the NVRAM module. Consult the FRU map label on the side of the NVRAM module to determine the locations of DIMM slots 1 and 2.
Page 44
Step 5: Reassign disks You must confirm the system ID change when you boot the controller and then verify that the change was implemented. Disk reassignment is only needed when replacing the NVRAM module and does not apply to NVRAM DIMM replacement. Steps 1.
Page 45
The output from the storage failover show command should not include the System ID changed on partner message. 5. Verify that the disks were assigned correctly: storage disk show -ownership The disks belonging to the controller should show the new system ID. In the following example, the disks owned by node1 now show the new system ID, 151759706: node1:>...
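The ownership check in this step amounts to confirming that every disk owned by the controller reports the new system ID. The Python sketch below illustrates that comparison on already-collected records. The dictionary keys (`disk`, `owner`, `owner_id`) and the disk names in the usage example are hypothetical; they are not the column names or output format of `storage disk show -ownership`.

```python
from typing import Dict, List


def disks_with_stale_owner(
    disks: List[Dict[str, str]], node: str, new_system_id: str
) -> List[str]:
    """Return the disks owned by node whose owner ID is not new_system_id."""
    return [
        record["disk"]
        for record in disks
        if record["owner"] == node and record["owner_id"] != new_system_id
    ]
```

For example, with the new system ID from this step:

```python
records = [
    {"disk": "disk-a", "owner": "node1", "owner_id": "151759706"},
    {"disk": "disk-b", "owner": "node1", "owner_id": "000000000"},
]
stale = disks_with_stale_owner(records, "node1", "151759706")  # ["disk-b"]
```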
-node * -type all -message MAINT=END Step 6: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
Page 47
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 48
take over the controller so that the healthy controller continues to serve data from the impaired controller storage. • If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy controller shows false for eligibility and health, you must correct the issue before shutting down the impaired controller;...
Page 49
If the NVRAM status LED is flashing, it could mean the controller module was not taken over or halted properly (uncommitted data). If the impaired controller module was not successfully taken over by the partner controller module, contact NetApp Support before continuing with this procedure.
Page 50
a Locking cam latches 4. Slide the controller module out of the enclosure and place it on a flat, stable surface. Make sure that you support the bottom of the controller module as you slide it out of the enclosure. Step 3: Replace the NV battery Remove the failed NV battery from the controller module and install the replacement NV battery.
-node * -type all -message MAINT=END Step 5: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
Page 52
Option 1: Add an I/O module to a storage system with empty slots You can add an I/O module into an empty module slot in your storage system. Step 1: Shut down the impaired node Shut down or take over the impaired controller using one of the following options.
Page 53
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task If you have a cluster with more than two nodes, it must be in quorum.
Page 54
Steps 1. If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport message: system node autosupport invoke -node * -type all -message MAINT=number_of_hours_downh The following AutoSupport message suppresses automatic case creation for two hours: cluster1:*> system node autosupport invoke -node * -type all -message MAINT=2h 2.
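The AutoSupport message above follows a fixed pattern in which only the number of hours changes. As a small illustration, the Python sketch below builds the command strings shown in this document for a given maintenance window. The function names are hypothetical, and the sketch only formats text; it does not connect to a cluster or run anything.

```python
def autosupport_maint_command(hours: int) -> str:
    """Build the AutoSupport command that suppresses automatic case creation
    for the given number of hours."""
    if hours <= 0:
        raise ValueError("hours must be a positive integer")
    return f"system node autosupport invoke -node * -type all -message MAINT={hours}h"


def autosupport_maint_end_command() -> str:
    """Build the AutoSupport command that ends the maintenance window."""
    return "system node autosupport invoke -node * -type all -message MAINT=END"
```

Calling `autosupport_maint_command(2)` reproduces the two-hour example given in this step.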
Page 55
Cam locking button a. Depress the cam latch on the blanking module in the target slot. b. Rotate the cam latch down as far as it will go. For horizontal modules, rotate the cam away from the module as far as it will go. c.
Page 56
Option 2: Add an I/O module in a storage system with no empty slots You can change an I/O module in an I/O slot in a fully populated system by removing an existing I/O module and replacing it with a different I/O module. 1.
Page 57
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task If you have a cluster with more than two nodes, it must be in quorum.
Page 58
Configuration State is configured and that the nodes are in an enabled and normal state (metrocluster node show). Steps 1. If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport command: system node autosupport invoke -node * -type all -message MAINT=number_of_hours_downh The following AutoSupport command suppresses automatic case creation for two hours: cluster1:*>...
Page 59
Cam locking button a. Depress the cam latch button. The cam latch moves away from the chassis. b. Rotate the cam latch down as far as it will go. For horizontal modules, rotate the cam away from the module as far as it will go. c.
Page 60
Hot-add a shelf. 15. Repeat these steps for controller B. Replace I/O module - AFF A1K Use this procedure to replace a failed I/O module. • You can use this procedure with all versions of ONTAP supported by your storage system.
Page 61
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task If you have a cluster with more than two nodes, it must be in quorum.
Page 62
Configuration State is configured and that the nodes are in an enabled and normal state (metrocluster node show). Steps 1. If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport command: system node autosupport invoke -node * -type all -message MAINT=number_of_hours_downh The following AutoSupport command suppresses automatic case creation for two hours: cluster1:*>...
Page 63
I/O cam latch Make sure that you label the cables so that you know where they came from. 5. Remove the target I/O module from the enclosure: a. Depress the cam button on the target module. The cam button moves away from the enclosure. b.
4. If automatic giveback was disabled, reenable it: storage failover modify -node local -auto-giveback true Step 4: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return &...
Secure the power cable to the PSU using the power cable retainer. Once power is restored to the PSU, the status LED should be green. 7. Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return &...
Page 66
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 67
take over the controller so that the healthy controller continues to serve data from the impaired controller storage. • If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy controller shows false for eligibility and health, you must correct the issue before shutting down the impaired controller;...
Page 68
If the NVRAM status LED is flashing, it could mean the controller module was not taken over or halted properly (uncommitted data). If the impaired controller module was not successfully taken over by the partner controller module, contact NetApp Support before continuing with this procedure.
Page 69
a Locking cam latches 4. Slide the controller module out of the enclosure and place it on a flat, stable surface. Make sure that you support the bottom of the controller module as you slide it out of the enclosure. Step 3: Replace the RTC battery Remove the failed RTC battery and install the replacement RTC battery.
Page 70
4. If automatic giveback was disabled, reenable it: storage failover modify -node local -auto-giveback true Step 6: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return &...
Replace System management module - AFF A1K The System Management module, located at the back of the controller in slot 8, contains onboard components for system management, as well as ports for external management. The target controller must be shut down to replace an impaired System Management module or replace the boot media.
Page 72
Option 1: Most systems To shut down the impaired controller, you must determine the status of the controller and, if necessary, take over the controller so that the healthy controller continues to serve data from the impaired controller storage. About this task •...
Page 73
take over the controller so that the healthy controller continues to serve data from the impaired controller storage. • If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy controller shows false for eligibility and health, you must correct the issue before shutting down the impaired controller;...
Page 74
System Management module cam latch a. If you are not already grounded, properly ground yourself. Make sure NVRAM destage has completed before proceeding. b. Remove any cables connected to the System Management module. Make sure that you label where the cables were connected, so that you can connect them to the correct ports when you reinstall the module.
Page 75
Boot media locking button Boot media a. Press the blue boot media locking button in the impaired System Management module. b. Rotate the boot media up and slide it out of the socket. 3. Install the boot media in the replacement System Management module: a.
Page 76
NetApp Support to register the serial number. Step 5: Return the failed part to NetApp Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return & Replacements page for further information.
Page 77
NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.