Sun Microsystems Sun Enterprise 10000 Dynamic Reconfiguration User Manual

Sun Enterprise™ 10000 Dynamic
Reconfiguration User Guide
Sun Microsystems, Inc.
901 San Antonio Road
Palo Alto, CA 94303-4900
U.S.A. 650-960-1300
Part No. 806-2249-10
February 2000, Revision 01
Send comments about this document to: docfeedback@sun.com

Summary of Contents for Sun Microsystems Sun Enterprise 10000 Dynamic

  • Page 2 Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Solstice, DiskSuite, SunFastEthernet, Ultra Enterprise, Sun Enterprise, OpenBoot, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc.
  • Page 3 Sun Enterprise 10000 SSP Attributions: This software is copyrighted by the Regents of the University of California, Sun Microsystems, Inc., and other parties. The following terms apply to all files associated with the software unless explicitly disclaimed in individual files.
  • Page 5: Table Of Contents

    Contents Introduction to DR 1 DR Configuration Issues 3 dr-max-mem Variable 3 To Enable the Kernel Cage 3 Configuration for DR Detach 4 I/O Devices 4 Memory 5 Pageable and Nonpageable Memory 5 Target Memory Constraints 6 Correctable Memory Errors 6 To Re-Enable Dump Detection 7 Swap Space 7 Reconfiguration After a DR Operation 7...
  • Page 6 To Detach a Board With Hostview 32 To Detach a Board By Using dr(1M) 34 Viewing Domain Information 37 To View Domain Information with Hostview 38 To Specify How Windows Are Updated 38 Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 7 To View DR CPU Configuration Information 39 To View DR Memory Configuration Information 41 To View DR Device Configuration Information 43 To View DR Device Detailed Information 44 To View DR OBP Configuration Information 45 To View the DR-Unsafe Devices 46...
  • Page 9 Figures Attach Board and Domain Selection Window 20 FIGURE 3-1 Dynamic Reconfiguration Window With init attach Button 21 FIGURE 3-2 Dynamic Reconfiguration Window With the complete Button 22 FIGURE 3-3 Detach—Board and Domain Selection Window 32 FIGURE 3-4 Dynamic Reconfiguration Window With the drain Button 33 FIGURE 3-5 System Information Buttons 38 FIGURE 3-6...
  • Page 11: Before You Read This Book

    Preface This book describes the Dynamic Reconfiguration (DR) feature, which enables you to logically attach and detach system boards from the Sun Enterprise™ 10000 server while other domains continue running. Before You Read This Book This book is intended for the Sun Enterprise 10000 server system administrator who has a working knowledge of UNIX®...
  • Page 12 See one or more of the following sources for this information: AnswerBook2 online documentation for the Solaris operating environment, particularly those dealing with Solaris system administration Other software documentation that you received with your system Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 13: Typographic Conventions

    Typographic Conventions

    Typeface or Symbol: AaBbCc123
    Meaning: The names of commands, files, and directories; on-screen computer output
    Examples: Edit your .login file. Use ls -a to list all files. % You have mail.

    Typeface or Symbol: AaBbCc123
    Meaning: What you type, when contrasted with on-screen computer output
    Examples: Password:

    AaBbCc123...
  • Page 14: Related Documentation

    The docs.sun.com web site enables you to access Sun technical documentation on the Web. You can browse the docs.sun.com archive or search for a specific book title or subject at: http://docs.sun.com Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 15: Sun Welcomes Your Comments

    Sun Welcomes Your Comments We are interested in improving our documentation and welcome your comments and suggestions. You can email your comments to us at: docfeedback@sun.com Please include the part number (806-2249-10) of your document in the subject line of your email.
  • Page 17: Introduction To Dr

    While DR operations are being performed within a domain, the dr_daemon(1M) (see the Sun Enterprise 10000 Dynamic Reconfiguration Reference Manual) and the operating environment write messages regarding the status or exceptions of DR requests to the domain syslog message buffer (/var/adm/messages) and the SSP...
  • Page 18 DR operation from being started in a different domain. A partially completed DR operation must be finished before a subsequent DR operation is permitted in the same domain. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 19: Dr Configuration Issues

    CHAPTER: DR Configuration Issues

    This chapter describes how to configure a domain for all DR operations and capabilities.

    Caution – To prevent disk controller renumbering, be careful when choosing the slot into which a board is inserted. For more information, see “Reconfiguration After a DR Operation”...
  • Page 20: Configuration For Dr Detach

    Note – When memory (swapfs) or swap space on a disk is detached, there must be enough memory or swap space remaining in the domain to accommodate currently running programs. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 21: Memory

    A board that hosts non-vital system resources can be detached whether or not there are alternate paths to the resources. Before the board can be detached, all of the devices on the board must be closed, all of its file systems must be unmounted, and its swap partitions must be deleted.
  • Page 22: Target Memory Constraints

    When the SSP detects correctable memory errors, it initiates a record-stop dump to save the diagnostic data, which can interfere with a DR Detach operation. Therefore, Sun Microsystems suggests that when a record-stop occurs from a correctable memory error, you allow the record-stop dump to complete before you initiate a DR Detach operation.
  • Page 23: To Re-Enable Dump Detection

    To Re-Enable Dump Detection

    1. Log in to the SSP as the user ssp.
    2. Disable record-stop dump detection:
       SSP% edd_cmd -x stop
       This command suspends all event detection on all of the domains.
    3. Monitor the in-progress record-stop dump:
       SSP% ps -ef | grep hpost
       In the grep(1) output, the -D option of hpost indicates that a record-stop dump is in progress.
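The check in step 3 can be scripted. The following is a minimal sketch, assuming that the text of `ps -ef` is passed in as a string and that a record-stop dump shows up as an `hpost` process with an argument beginning with `-D`, as described above. The function name and the exact layout of the process lines are hypothetical.

```python
def record_stop_dump_in_progress(ps_output: str) -> bool:
    """Return True if any hpost process in the given ps -ef text carries a -D option.

    Assumption: the option appears as its own argument that starts with "-D".
    """
    for line in ps_output.splitlines():
        fields = line.split()
        for index, field in enumerate(fields):
            # Match the hpost command itself, with or without a leading path.
            if field == "hpost" or field.endswith("/hpost"):
                arguments = fields[index + 1:]
                if any(arg.startswith("-D") for arg in arguments):
                    return True
    return False
```

The `grep hpost` process that appears in its own output is ignored, because `hpost` is its last argument and no `-D` option follows it.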
  • Page 24: When To Reconfigure

    1 are named /dev/dsk/cXtYdZsW where: X is the disk controller number, Y, in most cases, corresponds to the disk target number, Z corresponds to the logical unit number, and Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 25: Dr And Ap Interaction

    W corresponds to the partition number. When the reconfiguration sequence is executed after a board is detached, the /dev links for all of the disk partitions on that board are deleted. The remaining boards retain their current numbering. Disk controllers on a newly inserted board are assigned the next available lowest number by disks(1M).
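The naming scheme above can be illustrated with a short parser. This is a sketch only: the class and function names are hypothetical, and `next_controller_number` reflects one reading of "next available lowest number" (the smallest controller number not already in use), which is an assumption rather than a description of how disks(1M) is implemented.

```python
import re
from typing import Iterable, NamedTuple


class DiskSlice(NamedTuple):
    controller: int  # X in cXtYdZsW
    target: int      # Y
    lun: int         # Z
    partition: int   # W


_DISK_PATH = re.compile(r"^/dev/dsk/c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_disk_path(path: str) -> DiskSlice:
    """Split a /dev/dsk/cXtYdZsW name into its four numeric parts."""
    match = _DISK_PATH.match(path)
    if match is None:
        raise ValueError(f"not a cXtYdZsW disk path: {path!r}")
    return DiskSlice(*(int(group) for group in match.groups()))


def next_controller_number(in_use: Iterable[int]) -> int:
    """Return the smallest non-negative controller number not in use (assumed policy)."""
    used = set(in_use)
    number = 0
    while number in used:
        number += 1
    return number
```

For example, `parse_disk_path("/dev/dsk/c1t3d0s2")` yields controller 1, target 3, logical unit 0, and partition 2.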
  • Page 26: Rpc Time-Out Or Loss Of Connection

    The dr_daemon(1M), which runs in each domain, communicates with Hostview and the dr(1M) shell application (both of which run on the SSP) by way of Remote Procedure Calls (RPCs). If an RPC time-out or connection failure is reported during Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 27: System Quiescence Operation

    a DR operation, check the domain. The daemon must be configured in the /etc/inetd.conf file of the domain. The following line (which appears on a single line) must be present in the file: 300326/4 tli rpc/tcp wait root \ /platform/SUNW,Ultra-Enterprise-10000/lib/dr_daemon/ dr_daemon If the DR daemon is configured in /etc/inetd.conf, kill the dr_daemon(1M) if it is currently running.
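Checking for the daemon entry can be automated. The sketch below assumes that the contents of /etc/inetd.conf have already been read into a string, and it only looks for an uncommented line whose first field begins with the RPC program number 300326 shown above and that mentions dr_daemon. The function name is hypothetical.

```python
DR_DAEMON_RPC_PROGRAM = "300326"  # program number from the inetd.conf line above


def has_dr_daemon_entry(inetd_conf_text: str) -> bool:
    """Return True if an active (uncommented) dr_daemon entry is present."""
    for raw_line in inetd_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # The first field has the form program/version, for example 300326/4.
        program = fields[0].split("/")[0]
        if program == DR_DAEMON_RPC_PROGRAM and "dr_daemon" in line:
            return True
    return False
```

A commented-out entry is treated as missing, which matches the requirement that the line be present and active in the file.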
  • Page 28: Suspend-Safe/Suspend-Unsafe Devices

    All other I/O devices are suspend-unsafe when open. Note – At the time of this printing, the drivers released by Sun Microsystems™ that are known to be suspend-safe are st, sd, isp, esp, fas, sbus, pci, pei-pci, qfe, hme (SunFastEthernet™), nf (NPI-FDDI), qe (Quad Ethernet), le (Lance Ethernet),...
  • Page 29: Special Handling For Tape Devices

    Special Handling for Tape Devices For the Solaris 8 operating environment, tape devices that are natively supported by Sun Microsystems™ are suspend-safe and detach-safe (see st(7D) for a list of natively-supported drives). If a system board that you are detaching contains a natively-supported tape device, you can safely detach the board without suspending the device.
  • Page 30: Special Handling Of Sun Storedge A3000

    Using standard Solaris interfaces to manually close and to unload all such drivers on the board. See modload(1M) in the SunOS Reference Manual. Detaching the system board in the normal fashion. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 31: Dr And Ddi_Suspend/Ddi_Resume

    Note – It should be possible to unplumb all network drivers. However, this action is rarely tested in normal environments and may result in driver error conditions. If you use DR, Sun Microsystems suggests that you test these driver functions during the qualification and installation phases of any suspend-unsafe device.
  • Page 32 Caution – Using the force option to quiesce the operating environment, without first successfully quiescing the controller, can result in a domain failure and subsequent reboot. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 33: Using Dynamic Reconfiguration

    CHAPTER: Using Dynamic Reconfiguration

    Attaching a System Board

    This section gives a broad overview of the actions that occur when you execute DR Attach. For step-by-step instructions, see “To Attach a Board With Hostview”. You can attach system boards that are present in the machine, powered on, and not part of an active domain (that is, not being used by an operating environment).
  • Page 34: Complete Attach

    After a board is successfully attached, you have the option of reconfiguring the I/O devices. See “Reconfiguration After a DR Operation” on page 7 for more information. This operation can take several minutes to complete. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 35: Attach Buttons

    Attach Buttons When you perform an attach operation using the Hostview GUI (which transparently calls a separate executable: drview(1M)), the following buttons appear at various times during the attach process: init attach – Begins the attach operation (see “Init Attach” on page 17). After the operation has completed successfully, the label on this button changes to complete.
  • Page 36: Figure 3-1 Attach Board And Domain Selection Window

    If any errors occur, the error messages appear in the main Hostview window. Otherwise, the Dynamic Reconfiguration window is displayed with the init attach button visible (FIGURE 3-2 on page 21).
  • Page 37: Figure 3-2 Dynamic Reconfiguration Window With Init Attach Button

    Dynamic Reconfiguration DR - Attach Board properties Attaching Board: Target Domain: Information: Checking environment . . . Establishing Control Board Server connection . . . Initializing SSP SNMP MIB . . . Establishing communication with DR daemon . . . xf2: System Status - Summary BOARD # : 5 6 7 physically present.
  • Page 38: Figure 3-3 Dynamic Reconfiguration Window With The Complete Button

    Caution – Before you choose the reconfig button, be sure to read “Reconfiguration After a DR Operation” on page 7. 10. Click the dismiss button. The DR Attach operation is complete. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 39: To Attach A Board By Using Dr(1M)

    To Attach a Board By Using dr(1M) Note – The following procedure explains how to attach a board by using dr(1M) with SSP version 3.1, or higher. If you are using SSP version 3.0, refer to a previous version of the Dynamic Reconfiguration User’s Guide. Before you perform the following steps, read “Attaching a System Board”...
  • Page 40 OBP display to see an inventory of the board resources. dr> drshow board_number OBP If you wish to abort the attach operation, use the abort_attach(1M) command. dr> abort_attach board_number Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 41 If you wish to complete the board attach operation, use the complete_attach(1M) command. dr> complete_attach 6 Completing attach for board 6..Checking IDN state of domain_name_a : UP Issuing IDN UNLINK (domain_name_a) Verifying IDN UNLINK... IDN (XM) UNLINK succeeded (domain_name) ...Checking IDN state of domain_name_a : UP ...Checking IDN state of domain_name_b : UP Initiating IDN LINK...
  • Page 42: Detaching A System Board

    CPU and I/O resources on the board. However, less memory is available to the domain. Note – After memory is drained, enough memory and swap space must remain in the domain to accommodate the current workloads. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 43: Complete Detach

    During the drain period, Hostview and dr(1M) are available to monitor the detach progress. You can view the current status of the drain operation, including the number of memory pages remaining to be drained, and the usage of devices on the board.
  • Page 44: Non-Network Devices

    You must perform certain tasks for non-network devices. Although the following list of tasks implies a sequence, strict adherence to the order is not necessary.
  • Page 45: Processes

    1. If the redundancy features of Alternate Pathing or Solstice DiskSuite mirroring are used to access a device connected to the board, reconfigure these subsystems so that the device or network is accessible using controllers on other system boards. Note that for Alternate Pathing 2.1, the system automatically switches the disk devices to an alternate interface if one is available.
  • Page 46: Processors

    You can now attach the board to another domain, power it off and remove it by way of hot-swapping, leave it in the system unattached, or reattach it at a later time.
  • Page 47: Hostview Detach Buttons

    Hostview Detach Buttons The Hostview detach window displays the following buttons at various times during a detach operation: Hostview Buttons TABLE 3-1 Button Description drain Drains the memory (see “Drain” on page 26). After the drain operation is finished, the drain button becomes the complete button.
  • Page 48: To Detach A Board With Hostview

    If the target domain is not currently booted, the detach operation simply manipulates the domain configuration file on the SSP. However, if the domain is running, the following window is displayed (FIGURE 3-5 on page 33).
  • Page 49: Figure 3-5 Dynamic Reconfiguration Window With The Drain Button

    Dynamic Reconfiguration DR - Detach Board properties Detaching Board: Source Domain: Information: Checking environment . . . Establishing Control Board Server connection . . . Initializing SSP SNMP MIB . . . Establishing communication with DR daemon . . . xf2: System Status - Summary BOARD # : 6 7 physically present.
  • Page 50: To Detach A Board By Using Dr(1M)

    26. The process of detaching a board is very similar with either Hostview or dr(1M). The basic concepts are not repeated in this section. The dr(1M) program was introduced in Chapter 1. 1. Set SUNW_HOSTNAME to the appropriate domain using the domain_switch(1M) command. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 51 2. Use the dr(1M) command in an SSP Window to bring up the dr(1M) prompt. In the following example, the target domain is called xf3. % dr Checking environment... Establishing Control Board Server connection... Initializing SSP SNMP MIB... Establishing communication with DR daemon... xf3: Domain Status - Summary BOARD #: 0 1 2 5 6 8 9 10 11 13 physically present.
  • Page 52 The drain(1M) command initiates the drain operation and returns to the shell prompt immediately. You can monitor the progress of the drain operation with the following command: dr> drshow board_number drain Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 53: Viewing Domain Information

    Note – In addition, you can initiate the drain with the wait option of the drain(1M) command, which does not return to the shell prompt until after the drain has completed. Refer to drain(1M) for more information regarding the wait option.
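Because drain(1M) returns immediately unless the wait option is used, a script that drives a detach has to poll for completion. The following is a generic polling sketch, not an interface to dr(1M): `get_percent_complete` is a hypothetical callable that the caller supplies (for example, one that extracts a completion percentage from the drain status display), and the interval and time-out values are arbitrary.

```python
import time
from typing import Callable


def wait_for_drain(
    get_percent_complete: Callable[[], int],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the drain reports 100 percent; return False if the time-out expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if get_percent_complete() >= 100:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

A fixed polling interval is used because the time needed to drain each memory page varies, so progress cannot be predicted from earlier readings.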
  • Page 54: To View Domain Information With Hostview

    If you click the All button, all of the currently enabled windows are displayed. To Specify How Windows Are Updated 1. Click the Properties button in the Dynamic Reconfiguration window ( FIGURE 3-7 Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 55: Figure 3-7 Dr Properties Window

    DR Unsafe Devices Auto Update System Information Displays: Update Interval (secs) save reset dismiss help DR Properties Window FIGURE 3-7 2. To cause displays to be updated, set Auto Update Domain Information Displays to On (the default). 3. Set the Update Interval to a value (in seconds) to determine how often updates occur.
  • Page 56: Figure 3-8 Dr Cpu Configuration Window

    Threads may be bound to a processor by use of the pbind(1M) command. PROCS Displays the process IDs of the user processes that are bound to a CPU. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 57: Figure 3-9 Dr Memory Configuration Window

    To View DR Memory Configuration Information Click the memory button. The DR Memory Configuration window is displayed ( FIGURE 3-9 DR Memory Configuration System Memory Sizes (MB) Current System: 2048 Attached Capacity: 18432 63488 dr-max-mem: 20480 65536 Memory Detach: enabled Memory Configuration for Board 0 Memory Size(MB): 1024...
  • Page 58 The drain operation is finished. Memory Drain Information Reduction Amount of memory to be removed from domain usage when the board is detached Remaining in Domain memory size after the board is detached Domain Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 59: Figure 3-10 Dr Device Configuration Window

    DR Memory Configuration Information (Continued) TABLE 3-3 Percent Complete How far the drain operation has progressed. Note that the time required to drain each memory page is not constant. Some memory pages take longer to drain than others. Drain Start Time The time the drain operation was started.
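The relationship between the Reduction and Remaining in Domain fields can be shown with simple arithmetic. This sketch assumes that Remaining in Domain equals the current domain memory minus the memory on the board being detached; the function name and the `required_mb` parameter (the memory the running workload needs) are hypothetical.

```python
def memory_after_detach(domain_mb: int, board_mb: int, required_mb: int = 0) -> dict:
    """Estimate the memory figures for detaching one board.

    Assumption: the whole of the board's memory leaves the domain.
    """
    if board_mb > domain_mb:
        raise ValueError("board memory cannot exceed domain memory")
    remaining_mb = domain_mb - board_mb
    return {
        "reduction_mb": board_mb,
        "remaining_mb": remaining_mb,
        # True when enough memory is left for the stated workload.
        "sufficient": remaining_mb >= required_mb,
    }
```

With a 2048-MB domain and a 1024-MB board, the reduction is 1024 MB and 1024 MB remain in the domain.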
  • Page 60: Figure 3-11 Dr Detail Device Window

    If a controller or network interface is part of the AP database, the window indicates that it is active or that it is an AP alternate. For active AP alternates, the usage of the AP metadevice is displayed. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 61: To View Dr Obp Configuration Information

    To View DR OBP Configuration Information Note – The information in the DR OBP Configuration window is derived from the OBP device tree, and is less detailed than the information that is available from the other windows described in this section. For example, in the init attach state, only the I/O adapters are known—not the devices attached to those controllers nor the memory interleave configuration.
  • Page 62: Figure 3-12 Dr Obp Configuration Window

    Memory Size (MB): 1024 dismiss DR OBP Configuration Window FIGURE 3-12 To View the DR-Unsafe Devices Click the unsafe button. The DR Unsafe Devices window is displayed ( FIGURE 3-13 Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 63: Figure 3-13 Dr Unsafe Devices Window

    DR Unsafe Devices Unsafe devices which are currently open: No Unsafe Devices are Open dismiss DR Unsafe Devices Window FIGURE 3-13 The DR Unsafe Devices window shows the suspend-unsafe devices that are open across the entire domain, not just those that are resident on the selected system board.
  • Page 65 APPENDIX: DR Error Messages. This appendix contains a list of some of the error messages that you might see while you are performing DR operations. The list does not include Protocol Independent Module (PIM) layer errors, which are more generic than the error messages in the following tables.
  • Page 66 DR daemon without properly configuring the network services on the domain. Normally, network services spawn the DR daemon in response to an incoming RPC from the SSP. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 67 DR Daemon Start-Up Error Messages TABLE A-1 Error Message Probable Cause Suggested Action Cannot fork: descriptive The DR daemon could not fork a The descriptive error message message process from which to run its RPC corresponds to an errno_value and server.
  • Page 68 An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 69 Memory Allocation Error Messages (Continued) TABLE A-2 Error Message Probable Cause Suggested Action While it queried the system First, check the size of the daemon by DR Error: malloc information, the DR daemon could not using the ps(1) command. Normally, failed (AP ctlr_t array) errno_description allocate enough memory for a...
  • Page 70 An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 71 Memory Allocation Error Messages (Continued) TABLE A-2 Error Message Probable Cause Suggested Action While it queried the system First, check the size of the daemon by DR Error: malloc information, the DR daemon could not using the ps(1) command. Normally, failed (dr_io) errno_description allocate enough memory for a...
  • Page 72 An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 73 Memory Allocation Error Messages (Continued) TABLE A-2 Error Message Probable Cause Suggested Action While it queried the system First, check the size of the daemon by DR Error: malloc information, the DR daemon could not using the ps(1) command. Normally, failed allocate enough memory for a the daemon uses about 300- to 400-...
  • Page 74 An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
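The advice to retry after an EAGAIN error follows a common pattern: retry a bounded number of times, and give up on any other error. The sketch below is generic Python, not part of the DR software; `operation` is a hypothetical callable that raises `OSError` on failure, and the attempt count and delay are arbitrary.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_eagain(
    operation: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying only when it fails with EAGAIN."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Any other errno, or the last attempt, is reported to the caller.
            if exc.errno != errno.EAGAIN or attempt == attempts:
                raise
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

If the retries are exhausted, the final error is raised, which corresponds to the point at which the tables suggest stopping and restarting the daemon.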
  • Page 75 DR Driver Failures The following table contains the DR driver failures that are sent to the system logs and to the SSP applications. In general, refer to the descriptions of the daemon and PSM errors for details about what goes to the system logs and what goes to the SSP. Note –...
  • Page 76 You can find explanations of DR: Error: the error numbers in the /usr/ abort_detach: include/sys/sfdr.h header file. CONFIGURE ioctl failed DR: Error: get_dr_state: ioctl failed DR: Error: get_dr_status: ioctl failed Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 77 PSM Error Messages The following table contains a list of PSM error messages that are sent to the system logs and to the SSP applications. PSM Error Messages TABLE A-4 Error Message Probable Cause Suggested Action An internal driver failed. None 1 SFDR_ERR_INTERNAL Failed to suspend devices.
  • Page 78 24 SFDR_ERR_CPUSTOP Failed to move the clock-signal CPU. None 25 SFDR_ERR_JUGGLE_ BOOTPROC Could not cancel a RELEASE Retry the Abort Detach 26 SFDR_ERR_CANCEL operation. operation after the Drain operation is complete. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 79 General Failures The following table contains a list of the general failure error messages that are sent to the system logs and/or to the SSP applications. General Failure Error Messages TABLE A-5 Error Message Probable Cause Suggested Action The DR daemon could not fork off a The errno_description offers DR Error: Cannot fork() process .
  • Page 80 (for example, memory or modification time [non- CPUs added), it is probed or deprobed fatal]. by OBP so that OBP can inform other programs of the change. Then, the modification time is updated. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 81 Protocol and Communication Error Messages The following table contains the protocol and communication error messages that are sent to the system logs and/or the SSP applications. Protocol and Communication Failure Error Messages TABLE A-6 Error Message Probable Cause Suggested Action The RPC is attempting to perform a Check the SSP network DR Error:...
  • Page 82 Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 83 Protocol and Communication Failure Error Messages (Continued) TABLE A-6 Error Message Probable Cause Suggested Action The RPC is attempting to perform a Check the SSP network DR Error: detach_board: DR operation on a board number that connection and/or the SSP invalid board number is not in the range of valid numbers.
  • Page 84 Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 85 Protocol and Communication Failure Error Messages (Continued) TABLE A-6 Error Message Probable Cause Suggested Action The RPC is attempting to perform a Check the SSP network DR Error: get_cpu_info: DR operation on a board number that connection and/or the SSP invalid board number is not in the range of valid numbers.
  • Page 86 DR daemon. You may need Attach operation from the SSP. to reboot the domain to recover from this error. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 87 Attach-Related Failure Error Messages (Continued) TABLE A-7 Error Message Probable Cause Suggested Action The board entered the FATAL state Reboot the domain. DR Error: Cannot abort after the abort command was issued, attach. Board ineligible causing the abort operation to fail and for further DR operations.
  • Page 88 The manual interfaces on the board could be busy, execution of the command so manual intervention may be may yield more detailed needed. information about the failure. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 89 Detach-Related Failure Error Messages (Continued) TABLE A-8 Error Message Probable Cause Suggested Action The ifconfig(1M) command failed to Log in to the domain, and, if ifconfig unplumb failed. unplumb the network interfaces. The possible, unplumb the network ifconfig(1M) command unplumbs interfaces manually by using and brings down the network the ifconfig(1M) command...
  • Page 90 However, to the DR driver Detach operations to and daemon, the board is not part of determine if the error is the domain. recoverable. Stop and start the DR daemon and driver. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 91 Detach-Related Failure Error Messages (Continued) TABLE A-8 Error Message Probable Cause Suggested Action The proper sequence of board states Examine the state of the board DR Error: detach_board: has not been followed, meaning that by using the invalid board state the board went into the error state or dr_cmd_board_states(?) that an earlier failure in the drain-...
  • Page 92 Retry the board is detached. DR operation after you have solved the error. If no fix is apparent, stop and restart the DR daemon, then retry the DR operation. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 93 Auto-Configuration Error Messages The following table contains the list of auto-configuration error messages that are sent to the system logs and/or to the SSP applications. Auto-Configuration Error Messages TABLE A-9 Error Message Probable Cause Suggested Action The autoconfig(1M) command failed Use the DR Error: Complete pending because a DR operation was still...
  • Page 94 . . . error descriptions reconfigure the operating description and/or error environment. number that is sent with the error message to determine why the command failed. Manually run the command on the domain. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 95 System Exploration Error Messages The following table contains the system exploration error messages that are sent to the system logs and/or to the SSP applications. System Exploration Error Messages TABLE A-10 Error Message Probable Cause Suggested Action The DR daemon made an incorrect Analyze what caused this error Cannot open /etc/ decision about the detachability and...
  • Page 96 Kbytes. If the size is not within memory cannot be calculated, then this range, stop and start the DR the effects of removing a board from daemon and driver. the domain cannot be calculated as well. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 97 System Exploration Error Messages (Continued) TABLE A-10 Error Message Probable Cause Suggested Action The DR daemon encountered a Determine what caused this get_net_config_info: failure while it tried to obtain error by using the errno_value, interface_name no address information about a network then correct the error.
  • Page 98 Sun service representative, providing as much information from the system logs as possible. Sun Enterprise 10000 Dynamic Reconfiguration User Guide • February 2000...
  • Page 99 System Exploration Error Messages (Continued) TABLE A-10 Error Message Probable Cause Suggested Action The DR daemon found a symbolic Remove the symbolic link so Recursive symlink found link as it walked the /dev and that the test can be retried. ‘symbolic_link_name’.
  • Page 100 Sun service representative, providing as much information from the system logs as possible.
  • Page 101 TABLE A-10 System Exploration Error Messages (Continued). Error Message: Unable to open hostname_file (errno=errno_value). Probable Cause: The information that is needed to test each active network device could not be acquired. Suggested Action: Analyze what caused this error by using the open(2) man page...
  • Page 102 Sun service representative, providing as much information from the system logs as possible.
  • Page 103 TABLE A-10 System Exploration Error Messages (Continued). Error Message: walk_dir: dirlist buffer overflow. Probable Cause: As it walked the /dev and /devices directories, the DR daemon encountered too many directories,... Suggested Action: Check the /dev and /devices directories for recursive symbolic links.
  • Page 104 This error may also explain why queries for memory information or detachability tests are failing due to incorrect reporting of memory sizes.
  • Page 105 TABLE A-10 System Exploration Error Messages (Continued). Error Message: DR Error: device tree not built. Probable Cause: The libdevinfo API failed to build the device tree for the system board. Suggested Action: Make sure that the correct version of the libdevinfo is...
  • Page 106 Error Message: ...info is incomplete [non-fatal]. Probable Cause: ...CPUs (either online or offline). Therefore, the information about each CPU in the CPU Configuration window will not be accurate.
  • Page 107 TABLE A-10 System Exploration Error Messages (Continued). Error Message: DR Error: build_rpc_info: bad slot number. Probable Cause: The device tree was built incorrectly. Several functions create the device tree for a system board by searching... Suggested Action: Check the size of the DR daemon by using the ps(1) command.
  • Page 108 Probable Cause: ...be returned from an RPC. Suggested Action: Report this error to your Sun service representative, providing as much information from the system logs as possible.
  • Page 109 OpenBoot PROM Error Messages. The following table lists the OpenBoot™ PROM (OBP) error messages that are sent to the system logs and/or to the SSP applications. TABLE A-11 OBP Error Messages. Probable Cause: This message indicates that corrupted... Suggested Action: This is a non-fatal error...
  • Page 110 Sun service representative, providing as much information from the system logs as possible.
  • Page 111 TABLE A-11 OBP Error Messages (Continued). Error Message: DR Error: close error on /dev/openprom. Probable Cause: The DR daemon failed to close the entry point for the OBP driver. Suggested Action: Determine what caused this error by using the error messages that preceded this error message.
  • Page 112 Probable Cause: ...DR daemon constructs a string for the device, marking it as “(unknown, major_number)”. Suggested Action: ...not constitute a correctable error. The daemon can use the major number to identify the drive.
  • Page 113 TABLE A-12 Unsafe-Device Query Error Messages (Continued). Error Message: WARNING: board board_number not checked for unsafe devices. Probable Cause: While the DR daemon was examining the system boards for unsafe devices, the daemon encountered a failure that... Suggested Action: You may have to stop and restart the DR daemon to recover the domain from this...
  • Page 114 Probable Cause: ...I/O controller, but the response was incorrect. Suggested Action: ...specific details about this failure, or an error number may be available. Also, check the ap_daemon(1M) man page for more details about this error.
  • Page 115 TABLE A-13 AP-Related Error Messages (Continued). Error Message: Cannot find physical device for AP_alias. Probable Cause: The physical device name that corresponds with the AP alias could not be found. Suggested Action: Make sure that AP works properly. Check to see if all of...
  • Page 117 Index: abort button, 19, 31; active DR operations, only one, 2; Alternate Pathing (AP) and DR, 4; alternate pathing and vital partitions during detach, 4; amount of memory attachable, 42; board, attach, 20; buttons: abort, 19, 31; complete, 19, 22, 31, 34; CPU, 39; device, 43; dismiss, 19, 31; drain, 31, 33...
  • Page 118 Hostview, 32; detach-safe, 14; detach-safe tape devices, 13; detach-unsafe, 14; detach-unsafe devices present, cannot force detach, 15; file systems unmounted before detach, 5; files: .postrc, and memory interleaving, 5
  • Page 119 st.conf (ST_UNLOADABLE flag and tape devices), 13; force button, 31; force quiesce, how to, 12; forcible conditions and quiesce failures, 12; hard lock on file systems (lockfs) before detach, 5; help button, 19, 31; memory, configuring for detach, 5; memory, determining if nonpageable memory is present, 6; memory, pageable and nonpageable, 5; memory, total size (all boards), 42; network between SSP and UE10000, and detach, 4...
  • Page 120 ...11; suspending OS and real-time processes, 11; suspending OS and suspend-unsafe devices, 11; suspending OS during detach, and nonpageable memory, 5, 11; suspend-safe device, 12; viewing system information, 37
  • Page 121 windows: CPU configuration, 39; detach parameter selection, 32; device configuration, 43; device detail, 44; DR parameter selection, 20; dynamic reconfiguration, 21; memory configuration, 41; unsafe devices, 47...
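The suggested actions for the "Recursive symlink found" and "walk_dir: dirlist buffer overflow" messages in Table A-10 both involve checking the /dev and /devices directories for recursive symbolic links. The following Python sketch shows one possible way to list candidate links. It is not part of the DR software. The function name find_recursive_symlinks is hypothetical, and the sketch assumes that "recursive" means a link that resolves to its own directory or to an ancestor of that directory.

```python
import os


def find_recursive_symlinks(root):
    """Return sorted paths of symlinks under root that point back up the tree.

    Assumption (not from the manual): a link counts as recursive when its
    resolved target is the link's own directory or one of its ancestors.
    """
    hits = []
    # followlinks=False keeps os.walk from descending into the links itself.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        parent = os.path.realpath(dirpath)
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # Prefix test on whole path components, so /a/b is not
            # mistaken for an ancestor of /a/bc.
            prefix = target.rstrip(os.sep) + os.sep
            if parent == target or parent.startswith(prefix):
                hits.append(path)
    return sorted(hits)
```

Calling find_recursive_symlinks("/dev") and find_recursive_symlinks("/devices") on the domain would return the links to review. Whether a reported link should actually be removed is left to the administrator.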
