IBM Power Systems 775 Manual page 289

For aix and linux hpc solution

EMS and HMCs
The next level of dependency is the EMS and HMCs. These items are started at the same
time after the network switches are started.
Frames
After the EMS and HMCs are started, we begin to start the 775 hardware by powering on
all of the frames. The frames are dependent on the switches and the EMS to come up
properly.
CECs
After the frames are powered on, the CECs are powered on. The CECs depend on the
switches, EMS, and frames. Applying power to the CECs brings up the HFI network
hardware, which is critical to distributing the operating system to diskless nodes, as well
as for application communication.
Service Node
The SN is started after the CECs are powered on. The SN is dependent on the switches,
EMS, frame, and CEC.
I/O node
The I/O node is started after the SN is operational. The I/O node is dependent on the
switches, EMS, frame, CEC, and SN.
Compute nodes
The compute nodes are started last, after the SN and I/O nodes are up and operational.
The login and compute nodes require the SN to be operational to load their OS images.
Compute nodes depend on Ethernet switches, EMS, frame, CEC, SN, and I/O nodes.
After the compute nodes start, the hardware startup process ends and the administrator
begins to evaluate the HPC cluster state by checking the various components of the cluster.
After the HPC stack is verified, the cluster startup is complete.
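The dependency chain described above can be expressed as a small graph and turned into startup stages. The following Python sketch is only an illustration and not part of xCAT or of the documented procedure: the component labels and the startup_stages helper are names chosen here, and the dependencies are copied from the preceding paragraphs. It assumes Python 3.9 or later for the standard graphlib module.

```python
from graphlib import TopologicalSorter

# Component -> components that must be up first, as listed in the text.
# The labels are illustrative names, not xCAT object names.
DEPENDS_ON = {
    "switches": set(),
    "EMS": {"switches"},
    "HMC": {"switches"},
    "frame": {"switches", "EMS"},
    "CEC": {"switches", "EMS", "frame"},
    "service node": {"switches", "EMS", "frame", "CEC"},
    "I/O node": {"switches", "EMS", "frame", "CEC", "service node"},
    "compute node": {
        "switches", "EMS", "frame", "CEC", "service node", "I/O node",
    },
}


def startup_stages(depends_on):
    """Group components into stages; items in one stage can start together."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Everything whose dependencies are already satisfied is ready now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(startup_stages(DEPENDS_ON), start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Under these assumptions the EMS and HMCs land in the same stage, which matches the statement that they are started at the same time, and every other component forms a stage of its own.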
Each customer might define other node types to meet site-specific needs, such as nodes
responsible for login and data backup. These nodes must be brought up last, after the rest
of the cluster is up and running. Because these nodes are outside the HPC cluster support
and software, and the nature of their startup is unknown, they are outside the scope of this
publication and are not part of any timing of the cluster startup process.
Startup procedure
This section describes the startup procedure. The following sections describe the
prerequisites, the process for this step, and the verification for completion. Some assumptions
are made about the current site state, which must be met before starting this process. These
assumptions include cooling and power, and initial configuration and verification of the cluster
that is performed during installation.
Before we begin with the startup procedure, we describe the benefit of using xCAT group
names. xCAT supports group names, which group devices or nodes logically, for example
by node type. We recommend that the following node groups be in place before performing
this procedure: frame, cec, bpa, fsp, service, storage, and compute. Other node groups
might be used to serve site-specific purposes.
Creating node groups significantly enhances the ability to start a group of nodes at the same
time. Without these definitions, an administrator must issue many separate commands
instead of a single command.
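To make the difference concrete, the following Python sketch builds (but does not run) the xCAT rpower command lines needed to query power status with and without a node group. It is a hedged illustration only: the node names c001 to c004 are hypothetical, the power_commands helper is a name chosen here, and the stat action is used because it only reads state.

```python
import shlex


def power_commands(targets, action="stat"):
    """Return one rpower command line per target (a node or a group name)."""
    return [shlex.join(["rpower", target, action]) for target in targets]


# Hypothetical node names; a real site uses its own naming scheme.
compute_nodes = [f"c{n:03d}" for n in range(1, 5)]

# Without a group definition: one command per node.
per_node = power_commands(compute_nodes)

# With the recommended "compute" group: a single command covers all members.
per_group = power_commands(["compute"])

print(f"{len(per_node)} commands without a group:")
for line in per_node:
    print("  " + line)
print(f"{len(per_group)} command with a group:")
print("  " + per_group[0])
```

The gap grows with the size of the cluster: the per-node list has one entry for every node, while the group form stays at one command.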
Chapter 5. Maintenance and serviceability