Managing Jobs With Large Numbers Of Tasks (Up To 1024 K) - IBM Power Systems 775 Manual

For AIX and Linux HPC Solution

In Figure 2-13, node 0 and node 1 each include a CAU and a processor. Node 2 does not have a
CAU, and node 3 does not have a processor (an uncommon configuration).
Figure 2-13 CAU tree physical connectivity example
Specifying the number of CAU groups
The performance of small broadcast and reduction operations is significantly enhanced with
the use of the CAU resources of the HFI. You specify the number of CAU groups for an
application by using the MP_COLLECTIVE_GROUPS environment variable or the
-collective_groups command line flag. Allowable values include any number that is greater
than zero and less than the number of available CAU groups. The default value is zero.
Important: The CAU is not used with operations that are larger than 64 bytes.
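As a rough illustration of the rules above, the following Python sketch builds an environment for a job launch with MP_COLLECTIVE_GROUPS set, and checks whether an operation is small enough for the CAU. The helper names (collective_groups_env, cau_eligible) and the idea of passing the number of available CAU groups as a parameter are assumptions made for this example; they are not part of PE.

```python
import os

# From the text: the CAU is not used for operations larger than 64 bytes.
CAU_MAX_BYTES = 64


def collective_groups_env(requested, available, base_env=None):
    """Return a copy of the environment with MP_COLLECTIVE_GROUPS set.

    Hypothetical helper. It accepts 0 (the documented default) or any
    value greater than zero and less than the number of available groups.
    """
    if requested != 0 and not (0 < requested < available):
        raise ValueError(
            f"MP_COLLECTIVE_GROUPS must be 0 or between 1 and {available - 1}"
        )
    env = dict(os.environ if base_env is None else base_env)
    env["MP_COLLECTIVE_GROUPS"] = str(requested)
    return env


def cau_eligible(message_bytes):
    """True if an operation of this size is small enough for the CAU."""
    return message_bytes <= CAU_MAX_BYTES
```

The returned dictionary could then be passed as the env argument of subprocess.run when the job is launched. The same setting can be given on the command line with the -collective_groups flag instead.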
2.4.5 Managing jobs with large numbers of tasks
In some situations, you might want to run large jobs that involve large numbers of tasks. PE
can support a maximum of 1024 K tasks in a job, with the following exceptions:
- Because of constraints that are imposed by node architecture and interconnect
  technology, 1024 K task support is limited to 64-bit applications.
- When mixed User Space and shared memory jobs are run, architectural limits for other
  components (for example, HFI windows) might reduce the number of tasks supported.
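To make the limit concrete, the short Python sketch below works out the task count that 1024 K corresponds to, assuming K denotes 1024, and applies the 64-bit restriction. The function check_job_size and its status strings are hypothetical. The limit for applications that are not 64-bit is not given here, so the sketch only flags that case.

```python
# "1024 K" tasks, assuming K denotes 1024: 1024 * 1024 = 1,048,576 tasks.
MAX_TASKS = 1024 * 1024


def check_job_size(num_tasks, is_64bit):
    """Classify a requested task count against the documented maximum.

    Hypothetical helper: returns "ok", "exceeds-maximum", or
    "verify-non-64-bit-limit" (the text gives no figure for that case).
    """
    if num_tasks < 1:
        raise ValueError("a job needs at least one task")
    if num_tasks > MAX_TASKS:
        return "exceeds-maximum"
    if not is_64bit:
        return "verify-non-64-bit-limit"
    return "ok"
```

A mixed User Space and shared memory job might hit a lower ceiling than this check reports, because limits on other components such as HFI windows are not modeled.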
To prevent performance degradation, jobs that include large numbers of tasks need special
considerations. For example, you must use a different method for specifying the hosts on the
host list file. Also, the debuggers and other tools that you use must query and receive task
information differently, and must attach to jobs before the jobs start.
Managing task information when large jobs are run
When large jobs of up to 1024 K tasks are run, the amount of information that is generated
about the tasks and the operations between them is large. Writing such a large amount of
information to a file degrades performance. To avoid this effect on performance, it is important
for a tool (such as a debugger) to minimize the task information that is generated by
requesting only the task information that it needs. Also, the tool should request that POE
provide task information, and should receive notifications from POE, by using a socket
connection rather than by having the task information written to a file.
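The idea of requesting only the needed task information over a socket can be sketched in Python as follows. This is a toy model, not the POE interface: the JSON message format, the TASKS table, and the function names are all assumptions invented for illustration. A local socket pair stands in for the connection between a tool and the launcher.

```python
import json
import socket
import threading

# Hypothetical task table standing in for what a launcher knows.
TASKS = {
    0: {"host": "node0", "pid": 4101, "state": "running"},
    1: {"host": "node0", "pid": 4102, "state": "running"},
    2: {"host": "node1", "pid": 5120, "state": "running"},
}


def serve_one_request(conn):
    """Toy launcher side: reply with only the requested tasks and fields."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        reply = {
            str(task_id): {f: TASKS[task_id][f] for f in request["fields"]}
            for task_id in request["tasks"]
        }
        stream.write(json.dumps(reply) + "\n")
        stream.flush()


def query_tasks(conn, task_ids, fields):
    """Toy tool side: ask for a subset of tasks and fields over the socket."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        stream.write(json.dumps({"tasks": task_ids, "fields": fields}) + "\n")
        stream.flush()
        return json.loads(stream.readline())


def demo():
    """Run one request and reply exchange over a local socket pair."""
    tool_end, launcher_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(launcher_end,))
    server.start()
    result = query_tasks(tool_end, [0, 2], ["host"])
    server.join()
    return result
```

In this sketch the tool receives two small records instead of the full table, and nothing is written to a file. With task counts near 1024 K, the saving from that kind of filtering grows accordingly.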
Key to Figure 2-13: Cn = CAU on node "n"; Pn = processor on node "n"; HFIn = HFI on node "n"