LoadLeveler - IBM Power Systems 775 Manual

For AIX and Linux HPC solution

Figure 1-68 HPCS Toolkit high-level design flow

1.9.8 LoadLeveler

LoadLeveler is a parallel job scheduling system that allows users to run more jobs in less time
by matching the processing needs and priority of each job with the available resources, which
maximizes resource utilization. LoadLeveler also provides a single point of control for
effective workload management and supports high-availability configurations. LoadLeveler
also offers detailed accounting of system utilization for tracking or charge back.
When jobs are submitted to LoadLeveler, they are not necessarily executed in the order of
submission. Instead, LoadLeveler dispatches jobs based on their priority, resource
requirements, and special instructions. For example, administrators can specify that long-running
jobs run only during off-hours, that short-running jobs are scheduled around long-running jobs, or
that jobs that belong to certain users or groups are prioritized. In addition, resources can be
tightly controlled. The use of individual machines can be limited to specific times, users, or job
classes, or LoadLeveler can use machines only when the keyboard and mouse are inactive.
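The dispatch behavior described above can be illustrated with a minimal sketch. It is not LoadLeveler code and does not use any LoadLeveler API: the `Job` fields, the `dispatch` function, and the rule that a higher priority value goes first are all assumptions made for illustration. It shows only the general idea that jobs start by priority, resource fit, and special instructions rather than by submission order.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All fields are hypothetical; they are not LoadLeveler keywords.
    name: str
    priority: int                 # assumption: higher value is dispatched first
    nodes: int                    # number of nodes the job requests
    off_hours_only: bool = False  # models "long-running jobs run only during off-hours"


def dispatch(queue, free_nodes, is_off_hours):
    """Return the names of the jobs to start now, in dispatch order."""
    started = []
    # sorted() is stable, so jobs with equal priority keep their submission order.
    for job in sorted(queue, key=lambda j: -j.priority):
        if job.off_hours_only and not is_off_hours:
            continue  # special instruction: hold this job until off-hours
        if job.nodes <= free_nodes:
            free_nodes -= job.nodes
            started.append(job.name)
        # Jobs that do not fit are skipped, so smaller jobs can run around them.
    return started


# The list order is the submission order.
queue = [
    Job("nightly-sim", priority=5, nodes=8, off_hours_only=True),
    Job("small-test", priority=1, nodes=1),
    Job("urgent-fix", priority=9, nodes=2),
    Job("big-run", priority=7, nodes=4),
]

print(dispatch(queue, free_nodes=4, is_off_hours=False))
# ['urgent-fix', 'small-test']
```

In this example the job that was submitted first does not start at all during working hours, and the low-priority one-node job starts ahead of `big-run` only because `big-run` no longer fits in the remaining nodes. A real scheduler would also account for run-time limits and reservations, which this sketch leaves out.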
LoadLeveler tracks the total resources that are used by each serial or parallel job and offers
several reporting options to track jobs and utilization by user, group, account, or type over a
specified time. To support charge back for resource use, LoadLeveler can incorporate machine
speed to adjust charge back rates and can be configured to require an account for each job.
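As a rough illustration of charge back that is adjusted by machine speed, the following sketch totals charges per account. The formula (CPU seconds multiplied by a machine speed factor and a base rate), the record layout, and every name in the code are assumptions for illustration. They do not reflect the actual LoadLeveler accounting format or configuration.

```python
from collections import defaultdict

# Hypothetical base rate, in arbitrary currency units per CPU second.
RATE_PER_CPU_SECOND = 0.5


def job_charge(cpu_seconds, machine_speed):
    """Charge for one job; faster machines cost more (assumed linear scaling)."""
    return cpu_seconds * machine_speed * RATE_PER_CPU_SECOND


def charges_by_account(records):
    """Sum the charges per account and reject jobs that have no account."""
    totals = defaultdict(float)
    for rec in records:
        if not rec.get("account"):
            # Models a configuration that requires an account for each job.
            raise ValueError(f"job {rec['job']} has no account")
        totals[rec["account"]] += job_charge(rec["cpu_seconds"], rec["machine_speed"])
    return dict(totals)


# Hypothetical usage records, one per completed job.
records = [
    {"job": "a1", "account": "physics", "cpu_seconds": 100, "machine_speed": 1.0},
    {"job": "a2", "account": "physics", "cpu_seconds": 100, "machine_speed": 2.0},
    {"job": "b1", "account": "biology", "cpu_seconds": 40, "machine_speed": 1.5},
]

print(charges_by_account(records))
# {'physics': 150.0, 'biology': 30.0}
```

The two physics jobs use the same CPU time, but the one on the machine with speed factor 2.0 is charged twice as much, which is the kind of rate adjustment the paragraph above refers to.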
LoadLeveler supports high-availability configurations to ensure reliable operation and
automatically monitors the available compute resources to ensure that no jobs are scheduled
to failed machines. For more information about LoadLeveler, see this website:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.loadl.doc/llbooks.html
As shown in Figure 1-69 on page 94, LoadLeveler includes the following characteristics:
Split into Resource Manager and Job Scheduler
Better third-party scheduler support through LL RM APIs
Performance and Scaling improvements for large core systems
Multi-Cluster support
Faster and scalable job launch as shown in Figure 1-69 on page 94
Workflow support with enhanced reservation function
Database option with xCAT integration

[Figure 1-68, HPCS Toolkit high-level design flow. Labels recovered from the figure: Original Program; Execution File; Data Collection (pSigma); Program Information; Bottleneck Discovery Engine (Data Centric Analysis); Performance Data (Memory, MPI, I/O, ...), including HPM, FPU stalls, L2 misses, and MPI; Performance Bottleneck (e.g. Communication imbalance: Array A); Solution Determination Engine (Alternate Scenario Prediction); Modified Program (e.g. Block cyclic distribution of A).]

Chapter 1. Understanding the IBM Power Systems 775 Cluster
