IBM Power Systems 775 Manual page 297

For AIX and Linux HPC solution

Timing the shutdown process: When timing the shutdown process for any shutdown
benchmarking, the time needed to drain running jobs must not be counted as part of the IBM
HPC cluster shutdown process. The drain time is excluded because it depends on how
long each user job takes to run to completion, not on the shutdown itself. The official
timing of the IBM HPC cluster shutdown process begins after all of the jobs are stopped.
For a complete site shutdown process, the time it takes to drain the jobs might be
included, but that time varies depending on where each job is in its execution.
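The timing rule above can be sketched as a small shell helper that measures only the shutdown phase. This is a minimal sketch, not part of the documented procedure: the function names elapsed_seconds and time_phase are hypothetical, as are the drain_jobs and shutdown_cluster placeholders in the comments, and it assumes that the local date command supports the +%s format.

```shell
#!/bin/sh
# Sketch only: time the shutdown phase and leave the drain phase untimed.
# All function names here are hypothetical helpers, not IBM-supplied commands.

# elapsed_seconds START END: print the difference between two epoch timestamps.
elapsed_seconds() {
    echo $(( $2 - $1 ))
}

# time_phase COMMAND [ARGS...]: run the command, discard its output,
# and print the wall-clock seconds that it took.
time_phase() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    elapsed_seconds "$start" "$end"
}

# Intended use (placeholders for site procedures):
#   drain_jobs                    # not timed: depends on user job run times
#   time_phase shutdown_cluster   # official shutdown time starts here
```

For a complete site shutdown measurement, the drain step might also be wrapped in time_phase and reported separately.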
Cluster shutdown process
The following sections describe the steps to shut down the cluster. Each step outlines the
necessary process and verifications that must be completed before moving to the next step.
The cluster shutdown process is faster than the startup process. During startup, some
daemons must be started manually, and hardware verification takes longer than shutting
down the system.
User access
Care must be taken to ensure that any users of the cluster are logged off and that their
access is stopped during this process.
Site utility functions
Any site-specific utility nodes that are used to load, back up, or restore user data or other
data that is related to the cluster must be disabled or stopped.
Preparing to stop LoadLeveler
To shut down the cluster, all user jobs must be drained or canceled. It is assumed that the
administrator has a thorough understanding of job management and scheduling to drain or
cancel all jobs. There are two methods to accomplish this task: draining the jobs and
canceling the jobs.
Environmental site conditions affect whether to drain or cancel the jobs. If the shutdown is a
scheduled shutdown with sufficient time for the jobs to complete, then draining the jobs is the
best practice. A shutdown that does not allow time for all of the jobs to complete requires
that the jobs be canceled.
The shutdown must be scheduled and prepared for in advance, especially when jobs are
drained, to allow sufficient time for the jobs to complete.
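The choice between draining and canceling can be expressed as a simple comparison of the time available against the time the longest remaining job still needs. The following is only an illustrative sketch: choose_method is a hypothetical helper, and both inputs (minutes until the shutdown must begin, and minutes the longest job still needs) are assumed to be estimated by the administrator.

```shell
#!/bin/sh
# Sketch only: pick a job-stopping method from two administrator estimates.
# choose_method is a hypothetical helper, not a LoadLeveler command.

# choose_method MINUTES_UNTIL_SHUTDOWN LONGEST_REMAINING_JOB_MINUTES
# Prints "drain" when the jobs have time to finish, otherwise "cancel".
choose_method() {
    if [ "$1" -ge "$2" ]; then
        echo drain
    else
        echo cancel
    fi
}

# Example: 120 minutes available, longest job needs 45 more minutes.
#   choose_method 120 45    # prints: drain
```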
Method 1: Draining LoadLeveler jobs
Draining the jobs is the preferred method, but it is possible only when there is time to allow
the jobs to complete before the cluster must be shut down.
Example 5-18 shows how to drain the jobs on compute and service nodes.
Example 5-18 Draining the jobs on compute nodes and schedule jobs on service nodes
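# Drain the startd daemon on every node in the compute node group
for n in $(nodels compute)
do
    llctl -h "$n" drain startd
done
# Run llctl drain on every node in the service node group
xdsh service -v llctl drain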
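After the drain commands are issued, the running jobs still need time to finish before the shutdown can proceed. The following is a minimal sketch of a polling loop for that wait. The wait_for_drain function is hypothetical, and so is the counting command passed to it: it is assumed to be a site-supplied command that prints the number of jobs that are still running (for example, a wrapper that a site derives from llq output).

```shell
#!/bin/sh
# Sketch only: poll until no jobs remain or a poll limit is reached.
# wait_for_drain and the counting command are hypothetical, site-supplied helpers.

# wait_for_drain COUNT_CMD MAX_POLLS INTERVAL_SECONDS
# COUNT_CMD must print the number of jobs that are still running.
# Prints "drained" and returns 0, or prints "timeout" and returns 1.
wait_for_drain() {
    count_cmd=$1
    max_polls=$2
    interval=$3
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        remaining=$($count_cmd)
        if [ "$remaining" -eq 0 ]; then
            echo "drained"
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    echo "timeout"
    return 1
}

# Example: check once a minute for up to two hours.
#   wait_for_drain count_running_jobs 120 60
```

If the loop reports a timeout, the remaining jobs were not able to complete in the time that was allowed, and canceling them is the alternative.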
Chapter 5. Maintenance and serviceability