IBM Power Systems 7063-CR1 Service Manual

Problem analysis, system parts, and locations

Summary of Contents for IBM Power Systems 7063-CR1

  • Page 1 Power Systems Problem analysis, system parts, and locations for the 7063-CR1...
  • Page 4 Note Before using this information and the product it supports, read the information in “Safety notices” on page v, “Notices” on page 45, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125–5823....
  • Page 5: Table Of Contents

    Notices (page 45); Accessibility features for IBM Power Systems servers...
  • Page 7: Safety Notices

    Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard:
      • If IBM supplied the power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 8 – For racks with AC power, connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet supplies proper voltage and phase rotation according to the system rating plate. – For racks with a DC power distribution panel (PDP), connect the customer’s DC power source to the PDP.
  • Page 9 Each rack cabinet might have more than one power cord.
      – For AC powered racks, be sure to disconnect all power cords in the rack cabinet when directed to disconnect power during servicing.
      – For racks with a DC power distribution panel (PDP), turn off the circuit breaker that controls the power to the system unit(s), or disconnect the customer’s DC power source, when directed to disconnect power during servicing.
  • Page 10 CAUTION: Removing components from the upper positions in the rack cabinet improves rack stability during relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a room or building.
      • Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack cabinet.
  • Page 11 DANGER: Rack-mounted devices are not to be used as shelves or work spaces. (L002) (L003)...
  • Page 12 DANGER: Multiple power cords. The product might be equipped with multiple AC power cords or multiple DC power cables. To remove all hazardous voltages, disconnect all power cords and power cables. (L003) (L007) CAUTION: A hot surface nearby. (L007) (L008)...
  • Page 13 Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call 1-800-426-4333. Have the IBM part number for the battery unit available when you call. (C003)
  • Page 14 (C048) Power and cabling information for NEBS (Network Equipment-Building System) GR-1089-CORE The following comments apply to the IBM servers that have been designated as conforming to NEBS (Network Equipment-Building System) GR-1089-CORE:...
  • Page 15 The equipment is suitable for installation in the following:
      • Network telecommunications facilities
      • Locations where the NEC (National Electrical Code) applies
    The intrabuilding ports of this equipment are suitable for connection to intrabuilding or unexposed wiring or cabling only. The intrabuilding ports of this equipment must not be metallically connected to the interfaces that connect to the OSP (outside plant) or its wiring.
  • Page 17: Beginning Troubleshooting And Problem Analysis

    Missing or faulty PCIe adapter or device. Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.

    Determining the problem analysis procedure to perform
    Learn how to identify the correct problem analysis procedure to perform.
  • Page 18: Resolving A BMC Access Problem

    3. Can you boot the system to the Petitboot menu?
       Yes: Continue with the next step.
       No: Go to “Resolving a system firmware boot failure” on page 5.
    4. Is video displayed on the video graphics array (VGA) monitor?
       Yes: Continue with the next step.
  • Page 19 Note: If the IP address setting is incorrect, go to the Configuring the firmware IP address website (http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwenablenetwork.htm). If the MAC address is 00:00:00:00:00:00, go to “Contacting IBM service and support” on page 35.
    5. Are you able to log in to the BMC web interface?
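
The all-zero MAC address check in the note on page 19 is easy to automate when triaging several systems. The following is a minimal Python sketch, assuming the MAC address has already been obtained as a text string; the function name is_unset_mac is hypothetical and is not part of any IBM tool.

```python
def is_unset_mac(mac: str) -> bool:
    """Return True if every octet of the MAC address is zero.

    Hypothetical helper: the manual only says that a MAC address of
    00:00:00:00:00:00 is a reason to contact IBM service and support.
    """
    octets = mac.strip().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    # int(..., 16) also rejects octets that are not hexadecimal.
    return all(int(octet, 16) == 0 for octet in octets)


if __name__ == "__main__":
    print(is_unset_mac("00:00:00:00:00:00"))
    print(is_unset_mac("70:e2:84:14:0a:5b"))
```
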
  • Page 20: Resolving A Power Problem

    Go to step 11.
    10. Complete the following steps:
        a. Type cd /var/petitboot/mnt/dev/sdb1 and press Enter.
        b. To update the BMC firmware, type the following command and press Enter: ./pUpdate -f bmc.bin -i bt, where bmc.bin is the name of the BMC image file.
        c.
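
Steps 10a and 10b on page 20 can be captured in a small helper when the update instructions are prepared for several systems. The following Python sketch only builds the command line that the manual gives (./pUpdate -f <image> -i bt) and does not run it. The function name build_bmc_update_command is hypothetical, and the default directory is simply the path shown in step 10a, which might differ on another system.

```python
import shlex
from pathlib import PurePosixPath
from typing import List, Tuple


def build_bmc_update_command(
    image_name: str,
    mount_dir: str = "/var/petitboot/mnt/dev/sdb1",  # path from step 10a
) -> Tuple[str, List[str]]:
    """Return (working directory, argument list) for the BMC update.

    Hypothetical helper: it mirrors the command printed in the manual
    and adds no options of its own.
    """
    # Accept a bare file name only, because the manual runs pUpdate
    # from the directory that contains the image.
    if image_name in ("", ".", "..") or PurePosixPath(image_name).name != image_name:
        raise ValueError("pass only the file name of the BMC image")
    return mount_dir, ["./pUpdate", "-f", image_name, "-i", "bt"]


if __name__ == "__main__":
    cwd, argv = build_bmc_update_command("bmc.bin")
    print(f"cd {shlex.quote(cwd)} && {shlex.join(argv)}")
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the image file name contains spaces.
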
  • Page 21: Resolving A System Firmware Boot Failure

    This ends the procedure.

    Resolving a system firmware boot failure
    Learn how to identify the service action that is needed to resolve a failure while booting your system firmware.
    Does the baseboard management controller (BMC) respond to commands and are you able to access the BMC web interface?
    Note: To determine whether the BMC responds to commands, run the following ipmitool command: ipmitool -I lanplus -U <username>...
  • Page 22: Resolving An Operating System Boot Failure

    2. Complete the following steps, one at a time until the problem is resolved: a. Ensure that the VGA cable is properly seated to the server port and to the monitor port. Verify that your monitor and your VGA cable are working properly by testing them on a system that is known to be working properly.
  • Page 23
    7. If you obtained the mvcli utility from a USB drive that is inserted into one of the USB ports of the system, type /tmp/media/mvcli. Otherwise, type /tmp/mvcli.
    8. To check the status of the RAID virtual disk, type info -o vd and press Enter. To check the status of the physical disks, type info -o pd and press Enter.
  • Page 24: Resolving A Hardware Problem

    3. Was a service action identified?
       Yes: Continue with the next step.
       No: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35. This ends the procedure.
    4. Did the service action fix the problem?
       Yes: This ends the procedure.
  • Page 25 Yes: Continue with the next step and list SELs remotely over the LAN.
    No: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
    6. Use the ipmitool command to examine SELs.
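
Step 6 on page 25 lists SELs either on the system itself or remotely over the LAN. The following Python sketch shows how the two invocations differ. The exact command that the manual uses is not visible in this excerpt, so the sel elist subcommand (a standard ipmitool subcommand) and the helper name sel_list_command are assumptions.

```python
from typing import List, Optional


def sel_list_command(
    host: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Build an ipmitool argument list that lists system event logs.

    Hypothetical helper. Without a host the command runs in-band on the
    system; with a host it uses the lanplus interface over the LAN.
    """
    if host is None:
        return ["ipmitool", "sel", "elist"]
    if not user or not password:
        raise ValueError("remote access needs a BMC user name and password")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "sel", "elist",
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, used here as a placeholder.
    print(sel_list_command())
    print(sel_list_command("192.0.2.10", "<username>", "<password>"))
```
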
  • Page 26 If this SEL event continues to be logged, go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
    1Cxxxxxxxxxx: Go to Getting fixes and update the system firmware to the most recent level of firmware that is available.
  • Page 27 Yes: Continue with the next step.
    No: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
    12. Did you find only one SEL event that requires a service action as defined in step 10?
  • Page 28 The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, go to “Identifying a service action by using sensor and event information for the 7063-CR1” on page 14 and use the sensor name, sensor ID, and event description that you recorded to determine the service action to perform.
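
The sensor ID field described on page 28 has the form sensor name (sensor ID), which matches the way sensors are written in Table 3, for example CPU Func 1 (0x0C). The following Python sketch splits such a field into its two parts. The names Sensor and parse_sensor_field are hypothetical, and the sketch assumes that the ID is written in hexadecimal with a 0x prefix, as it is in Table 3.

```python
import re
from typing import NamedTuple


class Sensor(NamedTuple):
    name: str
    sensor_id: int


# "sensor name (sensor ID)", with the ID in 0x-prefixed hexadecimal.
_SENSOR_FIELD = re.compile(
    r"^\s*(?P<name>.+?)\s*\((?P<id>0x[0-9A-Fa-f]+)\)\s*$"
)


def parse_sensor_field(field: str) -> Sensor:
    """Split a 'sensor name (sensor ID)' field into name and numeric ID."""
    match = _SENSOR_FIELD.match(field)
    if match is None:
        raise ValueError(f"unexpected sensor ID field: {field!r}")
    return Sensor(match["name"], int(match["id"], 16))


if __name__ == "__main__":
    sensor = parse_sensor_field("CPU Func 1 (0x0C)")
    print(sensor.name, hex(sensor.sensor_id))
```
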
  • Page 29: Identifying Service Action Keywords In System Event Logs

    20. The service actions for all of the events that were identified in step 18 on page 12 must be performed to successfully complete the repair. Record the SEL record IDs for the events that you identified in step 18 on page 12. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display SEL details for each SEL record ID that you recorded.
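
Step 20 takes the SEL record ID from the leftmost digits of each SEL and then displays the details for that record. The following Python sketch shows one way to do both. It assumes that ipmitool separates the fields of a listed SEL with "|" and prints the record ID in hexadecimal, and it uses the ipmitool sel get subcommand to display the details. The helper names and the sample line are hypothetical.

```python
import string
from typing import List


def sel_record_id(sel_line: str) -> str:
    """Return the record ID, which is the leftmost field of a listed SEL."""
    record_id = sel_line.split("|", 1)[0].strip()
    if not record_id or any(ch not in string.hexdigits for ch in record_id):
        raise ValueError(f"no SEL record ID found in: {sel_line!r}")
    return record_id


def sel_get_command(record_id: str) -> List[str]:
    """Build the ipmitool argument list that displays one SEL record.

    Assumption: the ID from the list output is hexadecimal, so it is
    passed with a 0x prefix.
    """
    return ["ipmitool", "sel", "get", f"0x{record_id}"]


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the manual.
    line = "  1a | 04/27/2017 | 10:15:02 | Processor CPU Func 1 | IERR | Asserted"
    print(sel_get_command(sel_record_id(line)))
```
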
  • Page 30: Identifying A Service Action By Using Sensor And Event Information For The 7063-CR1

    Memory service action keywords
      • Configuration Error
      • Transition to Non-recoverable
      • Predictive Failure
    Processor service action keywords
      • IERR
      • Transition to Non-recoverable
      • Predictive Failure
      • Device Disabled
    Power supply service action keywords
      • Power Supply Failure Detected
      • Predictive Failure
      • Power Supply Input Lost or AC DC
      • Power Supply Input Lost Or Out of Range...
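
The keyword lists on page 30 lend themselves to a simple text scan of each SEL event description. The following Python sketch checks a description against the keywords that are visible in this excerpt. The power supply list is cut off in the excerpt, so the table here is incomplete, and the names SERVICE_ACTION_KEYWORDS and matching_keywords are hypothetical.

```python
from typing import Dict, List, Tuple

# Keywords copied from the excerpt of page 30. The excerpt is truncated,
# so further keywords might exist in the full manual.
SERVICE_ACTION_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "memory": (
        "Configuration Error",
        "Transition to Non-recoverable",
        "Predictive Failure",
    ),
    "processor": (
        "IERR",
        "Transition to Non-recoverable",
        "Predictive Failure",
        "Device Disabled",
    ),
    "power supply": (
        "Power Supply Failure Detected",
        "Predictive Failure",
        "Power Supply Input Lost or AC DC",
        "Power Supply Input Lost Or Out of Range",
    ),
}


def matching_keywords(category: str, event_description: str) -> List[str]:
    """Return the service action keywords found in an event description."""
    text = event_description.casefold()
    return [
        keyword
        for keyword in SERVICE_ACTION_KEYWORDS[category]
        if keyword.casefold() in text
    ]


if __name__ == "__main__":
    print(matching_keywords("processor", "IERR | Asserted"))
```

An empty result means that none of the listed keywords was found, not that the event can be ignored; the procedure in the manual still decides what to do next.
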
  • Page 31 Table 3. Sensor information, event description, and service action for the 7063-CR1
      Sensor name (Sensor ID): System Temp (0x01)
      Event description: Transition to Critical from Less Severe
      Service action: Ensure that there are no air flow obstructions at the front or at the rear of the system....
  • Page 32 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): OCC Active 1 (0x08)
      Event description: Device Disabled
      Service action: Replace CPU 1. Go to “7063-CR1 locations” on page 37 to identify the physical location and removal and replacement procedure.
  • Page 33 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): CPU Func 1 (0x0C)
      Event description: IERR; Transition to Non-recoverable; Predictive Failure...
      Service action: Replace CPU 1. Go to “7063-CR1 locations” on page 37 to identify the physical location and removal and...
  • Page 34 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): P1M1-DIMMA Func (0x10); P1M1-DIMMB Func (0x11); P1M2-DIMMA Func (0x14); P1M2-DIMMB Func (0x15)...
      Event description: Memory Device Disabled; Uncorrectable Memory Error; Memory Scrub Failed...
      Service action: No service action is required.
  • Page 35
      Sensor name (Sensor ID): System Event (0x35)
      Event description: Undetermined system hardware failure
      Service action: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
      Event description: System Reconfigured; OEM System boot event...
      Service action: No service action is required.
  • Page 36 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): SAS Temp (0x4A); HDD Temp (0x4B)...
      Event description: Transition to Critical from Less Severe
      Service action: Ensure that the ambient temperature is within operating specifications. Ensure that there are no blockages to...
  • Page 37 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): Mem Buf Temp 1 (0x5E); Mem Buf Temp 2 (0x5F)...
      Event description: Transition to Critical from Less Severe
      Service action: Ensure that there are no air flow obstructions at the front or at the rear...
  • Page 38 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): VBAT (0x9C)
      Event description: Transition to Non-recoverable; Lower Non-recoverable – going...
      Service action: Replace the time-of-day battery. Go to “7063-CR1 locations” on page 37 to identify the physical location and removal and replacement procedure.
  • Page 39 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): CPU1 Power (0xA2); PCIE CPU1 Pwr (0xA6)...
      Event description: Lower Non-critical – going low; Lower Non-critical –...
      Service action: No service action required.
  • Page 40 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): Freq Limit Pwr 1 (0xA9); Freq Limit Pwr 2 (0xAD)
      Event description: Performance Met
      Service action: If Asserted is in the event description, no service action is required.
  • Page 41 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): CPU Core Func 1 (0xC8); CPU Core Func 2 (0xC9)...
      Event description: IERR; Transition to Non-recoverable...
      Service action: Replace system processor CPU 1. Go to “7063-CR1 locations” on page 37 to identify the physical location and...
  • Page 42 Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): FAN1 (0xE3); FAN2 (0xE4)
      Event description: Transition to Critical from Less Severe
      Service action: If the sensor name is FAN1, replace Fan 1. If the sensor name is FAN2, replace Fan 2.
  • Page 43: Isolation Procedures

    Table 3. Sensor information, event description, and service action for the 7063-CR1 (continued)
      Sensor name (Sensor ID): PS1 Status (0xF3); PS2 Status (0xF4)
      Event description: Predictive Failure; Power Supply Input Out of Range...
      Service action: If the sensor name is PS1 Status, replace PSU 1. If the sensor name is PS2 Status, replace PSU 2.
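
The fan and power supply rows of Table 3 map a sensor name directly to the part to replace. The following Python sketch encodes only those four mappings from pages 42 and 43. The names REPLACEABLE_UNITS and unit_to_replace are hypothetical, and the lookup does not check the event description, which the table also requires before a part is replaced.

```python
from typing import Dict, Optional

# Sensor name -> part to replace, from the fan and power supply rows
# of Table 3. Other rows of the table are not covered here.
REPLACEABLE_UNITS: Dict[str, str] = {
    "FAN1": "Fan 1",
    "FAN2": "Fan 2",
    "PS1 Status": "PSU 1",
    "PS2 Status": "PSU 2",
}


def unit_to_replace(sensor_name: str) -> Optional[str]:
    """Return the part named in Table 3 for a sensor, or None if unlisted."""
    return REPLACEABLE_UNITS.get(sensor_name.strip())


if __name__ == "__main__":
    for name in ("FAN2", "PS1 Status", "System Temp"):
        print(name, "->", unit_to_replace(name))
```
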
  • Page 44: EPUB_PRC_FIND_DECONFIGURE_PART Isolation Procedure

    Yes: Continue with the next step and list SELs remotely over the LAN.
    No: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
    3. Use the ipmitool command to examine system event logs (SELs).
  • Page 45: EPUB_PRC_PHYP_CODE Isolation Procedure

    Update the system firmware image. Go to Getting fixes and update the system firmware with the most recent level of firmware. Then, reboot the system. If the system firmware update does not resolve the problem, go to “Contacting IBM service and support” on page 35. This ends the procedure.

    EPUB_PRC_ALL_PROCS isolation procedure
    A problem was detected with a system processor.
  • Page 46: EPUB_PRC_LVL_SUPPORT Isolation Procedure

    Continue with the next step. Go to “Contacting IBM service and support” on page 35. This ends the procedure. 5. For each of the SELs that you identified in step 4 on page 29, determine the sensor name that is associated with each SEL.
  • Page 47: EPUB_PRC_PROC_AB_BUS Isolation Procedure

    Yes: Continue with the next step and list SELs remotely over the LAN.
    No: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
    3. Use the ipmitool command to examine system event logs (SELs).
  • Page 48: EPUB_PRC_POWER_ERROR Isolation Procedure

    Does the problem persist?
    Yes: Replace the system backplane. If the replacement of the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 35. This ends the procedure.
    No: This ends the procedure.

    EPUB_PRC_POWER_ERROR isolation procedure
    A power problem occurred.
  • Page 49: EPUB_PRC_HB_CODE Isolation Procedure

    Yes: Continue with the next step and list SELs remotely over the LAN.
    No: Go to “Collecting diagnostic data” on page 35. Then, go to “Contacting IBM service and support” on page 35.
    4. Use the ipmitool command to examine system event logs (SELs).
  • Page 50: EPUB_PRC_TOD_CLOCK_ERR Isolation Procedure

    Does the problem persist?
    Yes: Replace the system backplane. If the replacement of the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 35. This ends the procedure.
    No: This ends the procedure.

    EPUB_PRC_TOD_CLOCK_ERR isolation procedure
    A diagnostic function detected a problem with the time of day or clock function.
  • Page 51: Collecting Diagnostic Data

    Follow the instructions to install and run the system event log collection tool. Then, continue with the next step.
    5. Send the data that you collected during this procedure to IBM service and support. This ends the procedure.

    Contacting IBM service and support
    You can contact IBM service and support by telephone or through the IBM Support Portal.
  • Page 52 Customers in the United States, United States territories, or Canada can place a hardware service request online. To place a hardware service request online, go to the IBM Support Portal (http://www.ibm.com/support/home/). For up-to-date telephone contact information, go to the Directory of worldwide contacts website (www.ibm.com/planetwide/).
  • Page 53: Finding Parts And Locations

    7063-CR1. HDD 1 USB cable and connectors See Removing and replacing the USB cable and connectors in the 7063-CR1. Note: Your 7063-CR1 might include a front serial port; however, this port is not supported. © Copyright IBM Corp. 2017, 2019...
  • Page 54 Figure 2. Top view
    Table 12. Top view locations (columns: Index number, FRU description, FRU removal and replacement procedures)
      • Disk drive backplane: See Removing and replacing the disk drive backplane in the 7063-CR1.
      • Fan 1: See Removing and replacing fans in the 7063-CR1.
  • Page 55 Table 12. Top view locations (continued) (columns: Index number, FRU description, FRU removal and replacement procedures)
      • Time-of-day battery: See Removing and replacing the time-of-day battery in the 7063-CR1.
      • System backplane: See Removing and replacing the system backplane in the 7063-CR1.
      • PSU 1: See Removing and replacing a power supply in the 7063-CR1.
  • Page 56: 7063-CR1 Parts

    After you identify the part number of the part that you want to order, go to Advanced Part Exchange Warranty Service. Registration is required. If you are not able to identify the part number, go to Contacting IBM service and support....
  • Page 57 Rack final assembly
    Figure 5. Rack final assembly
    Table 15. Rack final assembly part numbers (columns: Index number, Part number, Units per assembly, Description)
      • 00E4982: Slide rail kit - contains left and right slide rails and attaching screws
      • 00E4982: Slide rail kit - contains left and right slide rails and attaching screws...
  • Page 58 System parts
    Figure 6. System parts
    Table 16. System parts (columns: Index number, Part number, Units per assembly, Description)
      • Top cover assembly
      • Screws
      • PCIe cage
      • 00E5001: PCIe riser
      • 00E5029: 1U UIO NIC PCIe adapter with integrated 4-port 10 GbE Base-T (RJ-45), Intel XL710, and CAPI
      • 01EM016: Power supply
      • 01KL860...
  • Page 59 Table 16. System parts (continued) (columns: Index number, Part number, Units per assembly, Description)
      • Screws
      • 00E4998: Fan holder
      • Air baffle (right side)
      • Air baffle (left side)
    Additional system parts
    Figure 7. Additional system parts...
  • Page 60 Table 17. Additional system parts (columns: Index number, Part number, Units per assembly, Description)
      • 02CL585: 8 GB, 2400 MHz 1RX8 DDR4 RDIMM (Micron Technology, Inc.)
      • 02CL407: System backplane kit (includes system backplane and vacuum pen)
      • Screws
      • 01KL666: System processor kit (includes 6 core 2.095 GHz system processor module, system processor tray, and vacuum pen)
      • 00E4891: Heat sink kit (includes heat sink and thermal interface...
  • Page 61: Notices

    Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
  • Page 62: Accessibility Features For IBM Power Systems Servers

    All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without notice. Dealer prices may vary. This information is for planning purposes only. The information herein is subject to change before the products described become available.
  • Page 63: Privacy Policy Considerations

    This product uses standard navigation keys. Interface information The IBM Power Systems servers user interfaces do not have content that flashes 2 - 55 times per second. The IBM Power Systems servers web user interface relies on cascading style sheets to render content properly and to provide a usable experience.
  • Page 64: Trademarks

    IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
  • Page 65 Warning: This is a Class A product. In a domestic environment, this product may cause radio interference, in which case the user may be required to take adequate measures. VCCI Statement - Japan The following is a summary of the VCCI Japanese statement in the box above: This is a Class A product based on the standard of the VCCI Council.
  • Page 66 Warning: This is a Class A product. In a domestic environment this product may cause radio interference in which case the user will be required to take adequate measures. IBM Taiwan Contact Information:...
  • Page 67 To ensure this, the devices must be installed and operated as described in the manuals. Furthermore, only cables recommended by IBM may be connected. IBM assumes no responsibility for compliance with the protection requirements if the product is modified without the consent of IBM or...
  • Page 68: Class B Notices

    Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. Proper cables and connectors are available from IBM-authorized dealers. IBM is not responsible for any radio or television interference caused by unauthorized changes or modifications to this equipment.
  • Page 69 European Community contact: IBM Deutschland GmbH Technical Regulations, Abteilung M456 IBM-Allee 1, 71139 Ehningen, Germany Tel: +49 800 225 5426 email: halloibm@de.ibm.com VCCI Statement - Japan Japan Electronics and Information Technology Industries Association Statement This statement explains the Japan JIS C 61000-3-2 product wattage compliance.
  • Page 70 To ensure this, the devices must be installed and operated as described in the manuals. Furthermore, only cables recommended by IBM may be connected. IBM assumes no responsibility for compliance with the protection requirements if the product is modified without the consent of IBM or...
  • Page 71: Terms And Conditions

    Permissions for the use of these publications are granted subject to the following terms and conditions. Applicability: These terms and conditions are in addition to any terms of use for the IBM website. Personal Use: You may reproduce these publications for your personal, noncommercial use provided that all proprietary notices are preserved.
  • Page 74 IBM®...